Parallel Computing Notes


Flynn's Classification (1966)

A broad classification of parallel computing systems based on the number of instruction and data streams

  • SISD: Single Instruction, Single Data - the conventional uniprocessor

  • SIMD: Single Instruction, Multiple Data - distributed-memory SIMD (MPP, DAP, CM-1&2, MasPar); shared-memory SIMD (STARAN, vector computers)

  • MIMD: Multiple Instruction, Multiple Data - message-passing machines (Transputers, nCube, CM-5); non-cache-coherent SMPs (BBN Butterfly, T3D); cache-coherent SMPs (Sequent, Sun Starfire, SGI Origin). A small C sketch contrasting the SIMD and MIMD styles follows below.

  • MISD: Multiple Instruction, Single Data - no commercial examples.

    Source: MIT CSAIL.
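
    A minimal C sketch of the SIMD/MIMD distinction: the loop below is the kind of code a vectorising compiler maps onto one instruction stream applied across many data elements, while the threads are independent instruction streams. Function and thread names are illustrative only.

      #include <pthread.h>
      #include <stdio.h>

      /* SIMD flavour: one instruction stream, many data elements.
         A vectorising compiler can map this loop onto SIMD hardware
         (one add/multiply applied to several elements at once). */
      void saxpy(int n, float a, const float *x, float *y) {
          for (int i = 0; i < n; i++)
              y[i] = a * x[i] + y[i];
      }

      /* MIMD flavour: each thread is an independent instruction stream
         working on its own data. */
      void *worker(void *arg) {
          int id = *(int *)arg;
          printf("thread %d running its own instruction stream\n", id);
          return NULL;
      }

      int main(void) {
          float x[4] = {1, 2, 3, 4}, y[4] = {0};
          saxpy(4, 2.0f, x, y);              /* SIMD-style data parallelism */
          printf("y[3] = %f\n", y[3]);

          pthread_t t[2];
          int ids[2] = {0, 1};
          for (int i = 0; i < 2; i++)
              pthread_create(&t[i], NULL, worker, &ids[i]);
          for (int i = 0; i < 2; i++)
              pthread_join(t[i], NULL);      /* MIMD-style task parallelism */
          return 0;
      }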


    Cache Consistency

  • Bus Based Snooping

  • Makes use of the broadcast nature of a bus to maintain a consistent view of memory across caches

  • MSI/MESI/MOESI protocols (a minimal MESI state-machine sketch follows this list)

  • Cache tags must be looked up associatively from the front-side bus (FSB) as well as from the CPU side, to support snooping

  • The bus has an INVALIDATE signal that any node can drive to abort a cycle, allowing a dirty cache line to be written back before the cycle is restarted.

  • On MOESI-style systems, a bus cycle can be serviced directly from another cache (cache-to-cache transfer).

  • Multiple busses can be used for more bandwidth, but the overhead of associative snooping on every bus prevents scaling.

  • Example: the Sun Starfire (UE10000) has four busses connecting 16 system boards, each with 4 CPUs and local memory.
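
    A minimal sketch of the MESI transitions named above, written as a plain C state machine for one line in one cache. The event names and the 'shared' signal are simplified assumptions; real implementations also handle write-backs, upgrades and bus races.

      #include <stdio.h>

      /* MESI states for one cache line in one cache. */
      typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

      /* Events seen by that cache: local processor accesses, and
         bus transactions snooped from other caches. */
      typedef enum {
          PR_READ, PR_WRITE,   /* from the local CPU           */
          BUS_RD, BUS_RDX      /* snooped: remote read / write */
      } event_t;

      /* 'shared' says whether another cache asserted the shared
         line on the bus during our read miss (simplified). */
      mesi_t next_state(mesi_t s, event_t e, int shared) {
          switch (e) {
          case PR_READ:
              if (s == INVALID)            /* read miss: BusRd issued   */
                  return shared ? SHARED : EXCLUSIVE;
              return s;                    /* hit in S/E/M: no change   */
          case PR_WRITE:
              return MODIFIED;             /* I: BusRdX; S: invalidate others;
                                              E: silent upgrade; M: hit */
          case BUS_RD:                     /* another cache reads the line */
              if (s == MODIFIED || s == EXCLUSIVE)
                  return SHARED;           /* M also supplies / writes back data */
              return s;
          case BUS_RDX:                    /* another cache wants to write */
              return INVALID;              /* M writes the dirty line back first */
          }
          return s;
      }

      int main(void) {
          mesi_t s = INVALID;
          s = next_state(s, PR_READ, 0);   /* -> EXCLUSIVE */
          s = next_state(s, PR_WRITE, 0);  /* -> MODIFIED  */
          s = next_state(s, BUS_RD, 0);    /* -> SHARED    */
          printf("final state: %d\n", s);
          return 0;
      }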

    CC Network

  • A solution that "pipelines the bus".

  • Example: Scalable Coherent Interface (SCI)

  • Slotted ring instead of a bus (see the sketch after this list)

  • Still a broadcast medium, so snooping is still possible

  • Extensions to torus structures - scaling again pushes towards directory protocols.

  • Today, being re-invented as networks on a chip.
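
    A toy C sketch of the "pipelines the bus" idea: fixed slots circulate around the ring, so several transfers can be in flight at once while every node still sees (can snoop) each slot as it passes. Slot counts and packet fields are illustrative, not SCI-accurate.

      #include <stdio.h>

      #define NODES 4            /* nodes on the ring (illustrative) */
      #define SLOTS NODES        /* one slot per link segment        */

      typedef struct {
          int valid;             /* slot carries a packet?           */
          int src, dst;          /* packet header                    */
      } slot_t;

      int main(void) {
          slot_t ring[SLOTS] = {0};

          /* Node 0 injects a packet for node 2 into the empty slot
             currently passing it (slot 0 starts at node 0). */
          ring[0] = (slot_t){1, 0, 2};

          for (int cycle = 0; cycle < 2 * SLOTS; cycle++) {
              /* Each cycle every node snoops the slot in front of it;
                 the destination removes (consumes) the packet. */
              for (int n = 0; n < NODES; n++) {
                  int s = (n - cycle % SLOTS + SLOTS) % SLOTS;  /* slot at node n */
                  if (ring[s].valid && ring[s].dst == n) {
                      printf("cycle %d: node %d received packet from node %d\n",
                             cycle, n, ring[s].src);
                      ring[s].valid = 0;                        /* free the slot  */
                  }
              }
              /* Ring advance is modelled implicitly by the cycle index. */
          }
          return 0;
      }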

    Directory Protocols

  • Basis of many ccNUMA systems.

  • Keep sharing status with the memory instead of just in the caches.

  • Each block/line of memory has a directory entry storing which nodes currently hold a copy (see the sketch below).
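
    A minimal sketch, in C, of a directory entry with a sharer bit-vector and the usual textbook actions on read and write misses; the field names and message helpers are assumptions.

      #include <stdint.h>
      #include <stdio.h>

      #define MAX_NODES 64

      /* One directory entry per memory block, kept at the block's home node. */
      typedef enum { UNCACHED, SHARED_ST, EXCLUSIVE_ST } dstate_t;

      typedef struct {
          dstate_t state;
          uint64_t sharers;      /* bit i set => node i holds a copy */
      } dir_entry_t;

      /* Stand-ins for real interconnect messages (assumptions). */
      static void send_invalidate(int node)  { printf("  invalidate -> node %d\n", node); }
      static void fetch_from_owner(int node) { printf("  fetch/downgrade <- node %d\n", node); }

      /* Read miss from 'node': make memory up to date, add a sharer. */
      void read_miss(dir_entry_t *d, int node) {
          if (d->state == EXCLUSIVE_ST) {
              /* One node owns a dirty copy: pull data back, demote it to sharer. */
              for (int i = 0; i < MAX_NODES; i++)
                  if (d->sharers & (1ULL << i)) fetch_from_owner(i);
          }
          d->sharers |= 1ULL << node;
          d->state = SHARED_ST;
      }

      /* Write miss from 'node': invalidate every other copy, grant ownership. */
      void write_miss(dir_entry_t *d, int node) {
          for (int i = 0; i < MAX_NODES; i++)
              if ((d->sharers & (1ULL << i)) && i != node) send_invalidate(i);
          d->sharers = 1ULL << node;
          d->state = EXCLUSIVE_ST;
      }

      int main(void) {
          dir_entry_t d = { UNCACHED, 0 };
          read_miss(&d, 1);      /* node 1 reads  */
          read_miss(&d, 2);      /* node 2 reads  */
          write_miss(&d, 2);     /* node 2 writes: node 1 gets invalidated */
          printf("state=%d sharers=%llx\n", d.state, (unsigned long long)d.sharers);
          return 0;
      }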


    Traditional Switch Structures

  • Flits (flow-control units) are sent between nodes

  • Clos, Benes, Delta, and crossbar structures, originally from telephone switching systems

  • Blocking/nonblocking

  • Hypercubes mostly used

    A tesseract is a 4-D hypercube.

  • Wormhole and Manhattan routing to avoid blocking (a dimension-order routing sketch for a hypercube follows this list)
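
    A small C sketch of dimension-order (e-cube) routing on a hypercube, the deterministic scheme commonly paired with wormhole switching; node addresses are the usual binary labels.

      #include <stdio.h>

      /* Dimension-order (e-cube) routing on a d-dimensional hypercube:
         at each hop, flip the lowest-order bit in which the current
         node's address still differs from the destination's. */
      void ecube_route(unsigned src, unsigned dst, int dims) {
          unsigned cur = src;
          printf("%u", cur);
          while (cur != dst) {
              unsigned diff = cur ^ dst;          /* bits still to correct   */
              for (int b = 0; b < dims; b++) {
                  if (diff & (1u << b)) {         /* lowest differing dim    */
                      cur ^= 1u << b;             /* traverse that dimension */
                      break;
                  }
              }
              printf(" -> %u", cur);
          }
          printf("\n");
      }

      int main(void) {
          /* A tesseract (4-D hypercube) has 16 nodes, addresses 0..15. */
          ecube_route(0x5, 0xA, 4);   /* 0101 -> 1010: four hops */
          return 0;
      }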

    Supercomputer Interconnect

    SGI NUMAlink

    Example Supercomputer

  • Connection Machine(s)

  • CM-1 was a SIMD research project.

  • CM-5 was MIMD, using up to 65,000 SPARC microprocessors.

  • NCSA's CM-5 had 512 nodes, gigabytes of memory, and a 140-gigabyte parallel disk storage system called the Scalable Disk Array (SDA).

  • Both Cray and Thinking Machines saw software costs and revenues soon exceed those for hardware.

    SoC Bus Protocols

  • ARM AHB (Advanced High-performance Bus) - fixed handshake timing

  • Open Core Protocol (OCP) - a pipelineable bus protocol

  • ARM AXI - a switchable bus protocol with asynchronous send/receive matched by transaction IDs (see the toy model after this list).
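
    A toy model (not signal-level AXI) of the transaction-ID idea: requests and responses are decoupled, may complete out of order, and are matched back to the original request by ID. All names here are illustrative.

      #include <stdio.h>

      #define MAX_OUTSTANDING 8

      /* Pending-request table, indexed by transaction ID. */
      typedef struct { int valid; unsigned addr; } pending_t;

      static pending_t pending[MAX_OUTSTANDING];

      void issue_read(int id, unsigned addr) {
          pending[id] = (pending_t){1, addr};
          printf("issue  id=%d addr=0x%x\n", id, addr);
      }

      void response(int id, unsigned data) {
          if (pending[id].valid) {
              printf("retire id=%d addr=0x%x data=0x%x\n",
                     id, pending[id].addr, data);
              pending[id].valid = 0;
          }
      }

      int main(void) {
          issue_read(0, 0x1000);
          issue_read(1, 0x2000);
          issue_read(2, 0x3000);
          /* Responses come back out of order; the ID ties each one
             back to the request that caused it. */
          response(2, 0xCAFE);
          response(0, 0xBEEF);
          response(1, 0xF00D);
          return 0;
      }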

    SAN: Storage Area Network

  • FDDI (old), Hyperchannel, Fibre Channel

  • Data migration and backup does not need to pass through a host.

  • More recently, FireWire on PCs

  • Commercial offering: Connectrix.

  • New technologies such as WDM fibre may be used.

    Cluster Computing

  • Collection of identical computers on a fast LAN

  • Batch or interactive use/mix.

  • A control station maintains the job queue and collects results

  • Load balancing is provided

  • Dynamic migration is possible - requires 'mobile-ready' app code

  • Parallel Virtual Machine (PVM) libraries allow application-level pseudo supercomputing (a master/worker sketch follows this list).

  • Synchronisation Primitives - next lecture.

  • Examples: Beowulf, Condor, Xenoservers
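
    A minimal PVM 3 style master/worker sketch of application-level pseudo supercomputing. The executable name "worker", the message tags and the task count are assumptions, and error handling is omitted.

      #include <stdio.h>
      #include <pvm3.h>

      #define NTASKS     4
      #define TAG_WORK   1
      #define TAG_RESULT 2

      /* Master: spawn workers across the cluster, hand out one integer
         each, collect the squared results. (Sketch only.) */
      int main(void) {
          int tids[NTASKS];
          int spawned = pvm_spawn("worker", NULL, PvmTaskDefault, "", NTASKS, tids);

          for (int i = 0; i < spawned; i++) {
              int work = i + 1;
              pvm_initsend(PvmDataDefault);
              pvm_pkint(&work, 1, 1);
              pvm_send(tids[i], TAG_WORK);
          }
          for (int i = 0; i < spawned; i++) {
              int result;
              pvm_recv(-1, TAG_RESULT);        /* any worker, result tag */
              pvm_upkint(&result, 1, 1);
              printf("got result %d\n", result);
          }
          pvm_exit();
          return 0;
      }

      /* The matching worker (built as a separate executable "worker"):
           int work, res, parent = pvm_parent();
           pvm_recv(parent, TAG_WORK);  pvm_upkint(&work, 1, 1);
           res = work * work;
           pvm_initsend(PvmDataDefault); pvm_pkint(&res, 1, 1);
           pvm_send(parent, TAG_RESULT); pvm_exit();                      */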

    www.answers.com/topic/parallel-computing