Lecture 1 - Introduction to Concurrency and Parallelism
Tuesday, September 14, 2021 9:56 AM
Understand the different approaches for different problems
What approach to use when
Develop algorithms
Image Viewer:
• We can look in a number of directories and see a number of files; what we want to avoid is loading every file up front.
○ A folder can have 1000 files and we don't want to preload all 1000 of them.
▪ We can have a thread loading things in the background while we show something in the application in the foreground
• Task Parallelism
Second Coursework - Data Parallelism
1. Module Aims
a. Understand what we mean by concurrent and parallel systems
b. Build concurrent and parallel applications to solve particular problems
c. Analyze and measure concurrent and parallel applications to determine their performance
i. Run it many times and report an average
2. Coursework
a. First one - task parallelism
b. Second one - try to improve performance, then measure and report the performance gains
3. Work Plan
a. This module is algorithm- and technique-based; you need to understand the concepts and use them in particular ways
b. The directed reading and exercises are far more important to aid your understanding
c. This is also an opportunity to study GPU programming, which is ubiquitous in high-performance computing
4. Software
a. C++ (version 11)
i. Why C++?
1) Low-level language - more control over and understanding of what is going on
2) Used in high-performance computing
3) Interfaces with GPUs
ii. Use CMake (use what is there)
5. Concurrency and parallelism
a. Concurrency
i. Two or more tasks that give the illusion of executing at the same time (achieved by multiple threads, scheduled by the operating system)
a) Different threads do different things, but their execution cannot overlap because they share the processor's time
i) With concurrency we can have multiple threads executing on a single core: non-parallel execution, where one thread does a bit of work and then at some point stops and waits for something, not using CPU time. This can happen, for example, when you are trying to load a file from disk - the data travelling from the hard drive does not occupy CPU time, so the thread is just waiting for it to arrive.
While we are waiting for that data to come from the hard drive (no CPU time used), we can do some other work - why waste the time? This is the use case for more threads: the waiting thread effectively tells the operating system "I am waiting and have nothing to do, please let somebody else work".
Another thread then does a little bit of work while the first one waits for data from the hard drive, and it may in turn stop for whatever reason and pass execution to a different thread.
ii) The impression of doing a lot of things at once is given by allocating each thread a little bit of time, splitting the functions into execution at different times: when one stops, another starts.
The time slices are so short that you think things happen in parallel when they really are not.
2) The trick here is that different threads do different work, but on the same (single) processor - the threads share time on the processor
a) There is no overlap between their executions
b. Parallelism
i. Two or more tasks that actually execute simultaneously
1) Different things are done on different threads at the same time - there is an overlap
2) This can be achieved by having different tasks execute on different processor cores
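A minimal C++11 sketch (illustrative, not lecture code) tying this to the image-viewer example above: one thread loads resources in the background while the main thread handles the foreground. On a single core the two run concurrently (interleaved); on a multicore machine they may genuinely run in parallel. The function names are hypothetical.

#include <iostream>
#include <thread>

void load_resources() { std::cout << "loading files in the background\n"; }
void update_ui()      { std::cout << "drawing the application foreground\n"; }

int main() {
    std::thread loader(load_resources); // background task on its own thread
    update_ui();                        // foreground task on the main thread
    loader.join();                      // wait for the background task to finish
    return 0;
}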
6. Flynn's Taxonomy - a way to classify computer architectures in the context of parallel execution
i. Is the instruction operating on a single piece of data or on multiple pieces of data, at the same time or at different times?
Instruction - can be a computer function
a. The four categories:
i. Single Instruction, Single Data (SISD)
1) Traditional model
ii. Single Instruction, Multiple Data (SIMD)
1) We have a single instruction - for example, calculate the square root of a number
2) That instruction can execute on multiple data at once
3) We have a vector storing the numbers [9, 25, 36] and a special instruction to calculate the square roots of the numbers within the vector at the same time; the output of this instruction is another vector with the results, [3, 5, 6] (see the sketch after this list)
iii. Multiple Instruction, Single Data (MISD)
1) Uncommon
2) Used for fault tolerance
a) Multiple instructions work on the same piece of data, each independently generating a result from the same input
i) The outputs must be identical; otherwise there is some sort of error, and we will need to execute again to find out what it is
iv. Multiple Instruction, Multiple Data (MIMD)
1) The most general model
2) Two common variants:
a) SPMD (Single Program, Multiple Data)
i) Used in multi-node clusters
b) MPMD (Multiple Program, Multiple Data)
i) Implements the farmer-worker model
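A minimal SIMD sketch of the square-root example above (illustrative; the lecture shows no code). It assumes an x86 CPU and uses SSE intrinsics, whose 128-bit registers hold a fixed four floats, so the vector is padded with a fourth value, 49.

#include <xmmintrin.h> // x86 SSE intrinsics
#include <cstdio>

int main() {
    // _mm_set_ps takes its arguments in reverse lane order
    __m128 input  = _mm_set_ps(49.0f, 36.0f, 25.0f, 9.0f); // [9, 25, 36, 49]
    __m128 result = _mm_sqrt_ps(input); // one instruction, four square roots at once
    float out[4];
    _mm_storeu_ps(out, result);
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); // prints 3 5 6 7
    return 0;
}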
7. Task vs Data Parallelism
a. Task parallelism
i. Different tasks / functions are executed concurrently on different threads
1) Tasks may have to share access to common data (data is shared across the threads)
2) Example: UI thread, physics thread, resource-loading thread, etc.
a) The threads run independently
b. Data Parallelism
i. The same task is executed on different parts of the data (chunks of data)
1) Each thread runs the exact same function, but on a different part of the data
2) Each processor operates on a small part of the data
3) GPUs are designed for data parallelism
a) GPUs are not good for task parallelism
4) Example: pixel drawing
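A minimal data-parallelism sketch in C++11 (illustrative): every thread runs the exact same function, each on its own chunk of the data.

#include <cstddef>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1000000;
    std::vector<double> data(n, 2.0);
    unsigned p = std::thread::hardware_concurrency();
    if (p == 0) p = 4; // may return 0 if the core count is unknown
    const std::size_t chunk = n / p;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < p; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t == p - 1) ? n : begin + chunk; // last thread takes the remainder
        workers.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= data[i]; // the same task, on a different part of the data
        });
    }
    for (auto &w : workers) w.join();
    return 0;
}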
8. Multi-Threading
a. The first part of the module will cover traditional multi-threading approaches using C++11 or later versions
i. Threads - streams of instructions
ii. Mutexes - objects that allow threads to share the same resource exclusively: while one thread uses a resource, no other thread has access to it
iii. Condition variables - allow threads to wait until a particular condition occurs
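A minimal sketch of these three primitives together (illustrative): a worker thread waits on a condition variable until the main thread sets a shared flag under a mutex.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;                 // guards exclusive access to the shared flag
std::condition_variable cv;   // lets a thread wait until a condition occurs
bool ready = false;

int main() {
    std::thread worker([] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return ready; }); // sleep until ready becomes true
        std::cout << "worker: condition met, doing work\n";
    });
    {
        std::lock_guard<std::mutex> lock(m); // no other thread can touch ready here
        ready = true;
    }
    cv.notify_one(); // wake the waiting thread
    worker.join();
    return 0;
}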
9. Instruction-Level Parallelism
a. We will also look at how modern CPUs can perform parallel operations
i. This is done with:
1) Single Instruction, Multiple Data (SIMD)
a) Operates on fixed-size data
i) Very good for particular tasks, but very limited in what is possible
ii) Requires low-level manipulation of data and instructions, and is therefore difficult to port across different operating systems and processor architectures
2) Vector processing
a) More flexible - executes a single instruction on multiple data that is not of a fixed size
10. Distributed Parallelism
a. Use parallelism across different machines
i. We will look at how we can exploit multi-machine parallelism across a network (a minimal sketch follows below)
ii. We will also look at the problems of I/O-bound and CPU-bound workloads here as well
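A minimal multi-machine sketch, assuming MPI (a common choice for distributed parallelism; the lecture does not name a library here): each process, potentially on a different machine, reports its rank within the job.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's id in the job
    MPI_Comm_size(MPI_COMM_WORLD, &size); // total processes, across all machines
    std::printf("process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}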
11. GPU Programming
a. The latter part of the module will focus on GPU programming via OpenCL and CUDA
b. Shaders are little programs that execute physically on the graphics card, typically used in computer graphics
i. This is how GPU parallelism came to life
12. Why work in concurrency and parallelism?
a. It makes software run faster
i. Software developers relied on CPU performance increases to overcome limitations caused by larger computational problems
1) From 1986 to 2002, CPU speeds increased by an average of 50% per year
2) Since 2002 the average increase has been 20% per year, and that has been due to multicore
a) We can no longer rely on faster single-core performance to overcome our performance limitations
13. Moore's Law
a. Moore's Law states that the number of transistors per square inch doubles every 18 months
i. More transistors = more performance
b. The observation was made in 1965, and proved accurate for several decades
c. It is an observation, not a natural law:
i. Physical limits have caused the rate to reduce over the decades
ii. The rate will soon reach saturation
14. CPU Clock Speeds
a. A higher clock speed (GHz) does not by itself mean better performance - more cores is the answer
15. CPU Core Count
a. GPU cores are individually weak compared to CPU cores
b. Having more cores does not mean everything is better, due to memory issues
i. Data has to be communicated between the cores
16. CPU Architecture
a. Multiple levels of cache memory: L1, L2, L3
i. The larger a memory level is, the slower it is
ii. L3 is shared across all cores
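A minimal sketch (illustrative) of why the cache hierarchy matters: summing the same matrix row-by-row (sequential accesses that stay in the fast cache levels) versus column-by-column (large strides that keep missing the cache). The strided traversal is typically several times slower.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 4096;
    std::vector<int> mat(n * n, 1);
    long long sum = 0;

    auto t0 = std::chrono::high_resolution_clock::now();
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            sum += mat[r * n + c]; // row-major: consecutive addresses, cache-friendly
    auto t1 = std::chrono::high_resolution_clock::now();
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r)
            sum += mat[r * n + c]; // column-major: stride of n ints, cache-hostile
    auto t2 = std::chrono::high_resolution_clock::now();

    std::cout << "row-major:    " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "column-major: " << std::chrono::duration<double>(t2 - t1).count() << " s\n"
              << "(checksum " << sum << ")\n";
    return 0;
}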
17. Memory Bandwidth - biggest problem (performance issues)
a. Our biggest problem today comes from memory bandwidth
i. "I/O bound" - with more cores we have more memory accesses (and they might be scattered all over the place)
1) This means that each core spends more time waiting for memory reads/writes
b. More cores = more memory reads (usually discontiguous) = more time each core spends waiting on memory reads and writes
c. GPU read/write is even worse
i. Discrete GPU memory is physically far away from the CPU
ii. We will come back to this later in the module
18. Distributed and GPU computing have limitations too
a. The general rule of thumb you might believe is that GPUs and cluster computing will overcome our local performance limitations.
i. Memory transfers between the CPU and the GPU are very, very slow
ii. GPUs only solve particular types of problems well - data-parallel ones
iii. Distributed parallelism has to deal with latency and bandwidth issues
1) Even 1000 Mbit/s is very slow - you have to make your problems CPU-bound, not I/O-bound
19. Performance
a. Ideal scenario
i. T_parallel = T_serial / P, where P is the number of processors
1) In an ideal scenario we can achieve a parallel computation time of the serial time divided by the number of processors
a) If an execution takes 10 seconds and we have 5 processors, in an ideal world the time to execute the program with 5 cores would be 2 seconds
b. More realistic formulation
i. What if we throw more and more threads at our application? We do not typically get the perfect speed-up, and in some cases we might even end up with worse performance
1) Define the speed-up S of the application as the ratio of the serial time to the parallel time: S = T_serial / T_parallel
a) With a serial time of 10 seconds and a parallel time of 2 seconds, the speed-up would be 5 in an ideal world
b) If S = P we have the exceptional case of linear speed-up
2) The efficiency E is the speed-up divided by the number of processors: E = S / P
a) Since the speed-up is the ratio of serial time to parallel time, this is all divided by the number of processors
i) Rearranging: E = T_serial / (P × T_parallel)
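A minimal measurement sketch (illustrative): time a task over several runs and report the average, as the module aims suggest, then apply the speed-up and efficiency formulas to the lecture's example numbers. work() is a hypothetical stand-in for the task under test.

#include <chrono>
#include <iostream>

void work() { // hypothetical workload
    volatile double x = 0;
    for (int i = 0; i < 1000000; ++i) x += i;
}

int main() {
    const int runs = 10;
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::high_resolution_clock::now();
        work();
        auto end = std::chrono::high_resolution_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    std::cout << "average time: " << total_ms / runs << " ms\n";

    // The lecture's example: T_serial = 10 s, T_parallel = 2 s, P = 5
    const double t_serial = 10.0, t_parallel = 2.0;
    const int P = 5;
    const double S = t_serial / t_parallel; // speed-up = 5 (linear, since S = P)
    const double E = S / P;                 // efficiency = 1, the ideal case
    std::cout << "S = " << S << ", E = " << E << "\n";
    return 0;
}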