Saturday, January 24, 2009

What does multi-core HW mean for SW engineers?

Much of the world's SW is written to be executed on a single processor, and its writers validly assumed a single-processor machine. That assumption is now breaking down, as most new machines have multiple processors.
The individual processors, or cores, are each just as powerful as a traditional single-core machine. To get the best value from the HW, we want as much of the available resources as possible working for us while we wait for results, and idling when we have no jobs for them.
Traditional single-threaded code can only use one core, so if the user is running only one program, and it has a lot of work to do, the other cores are wasted.

Hmmm ... but isn't there a huge amount of code written using multiple processes, which could be altered for execution on multiple processors ?
Indeed, isn't there much code that deals with various HW IO interfaces that are essentially designed for multiple processing HW ?
Could the "OS" not be modified to select which processor to run the various processes on, perhaps with a "deployment" file prepared by the SW designers to assist in the spread of processes/threads between the various processors availablility, taking relative processor loading into consideration ?
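The "deployment file" idea can be sketched as a static assignment problem. The sketch below assumes a hypothetical deployment description that gives each process an estimated relative load, and places the processes, heaviest first, on whichever core currently carries the least load (the classic longest-processing-time-first greedy heuristic). The function `deploy` and the example loads are illustrative only, not a real OS interface.

```python
import heapq


def deploy(loads: dict[str, float], n_cores: int) -> dict[int, list[str]]:
    """Assign processes to cores, heaviest first, each to the least-loaded core.

    `loads` maps a (hypothetical) process name to its estimated relative load,
    as a deployment file prepared by the SW designers might declare it.
    """
    if n_cores < 1:
        raise ValueError("need at least one core")
    # Min-heap of (current load, core id): the top is always the least-loaded core.
    heap = [(0.0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    placement: dict[int, list[str]] = {core: [] for core in range(n_cores)}
    # Heaviest processes first; the name breaks ties so the plan is repeatable.
    for name, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        placement[core].append(name)
        heapq.heappush(heap, (core_load + load, core))
    return placement


if __name__ == "__main__":
    example = {"gui": 2.0, "codec": 5.0, "net": 3.0, "logger": 1.0, "db": 4.0}
    print(deploy(example, 2))
```

With the example loads on two cores, this puts codec, gui and logger on core 0 (total load 8) and db and net on core 1 (total load 7). On Linux such a plan could be applied with the real `taskset` utility or the `sched_setaffinity()` call, although the scheduler normally balances load on its own.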

If you're programming systems with multiple processes or applications, using IPC through files, sockets, named pipes etc., could the OS provide all the required mappings? YES!

Doesn't the OS itself run many programs that can be spread over the cores? Further, the modern PC user runs MANY programs at the same time. This is becoming common in embedded systems too, where Linux is widely available alongside other mature embedded OSs. In such an environment, can't the OS itself spread the load over the cores reasonably well, and greatly improve the end user's experience?
Let's assume for now that we're talking about single-threaded apps, which are perhaps where the majority of SW currently is. As the number of cores increases from 2 to 4 to 8 and beyond, these applications will use a smaller and smaller fraction of the available processor capacity (one core out of N), giving their users only a fraction of the performance that may be possible. This assumes that processor power is the limiting factor; in practice, communications bandwidth to distant servers, the HW, or other bottlenecks might be the real limit.

So, the ability to write stable, multi-threaded SW is cited as the scarce resource for making the best use of the new multi-core chips. OK, so at least I'm a member of that gang! :-)

Also, it would seem that anyone who has written device drivers and worked with ISRs will already have dealt with the issues that need to be addressed in optimising code for the new HW.
The SW should be capable of employing N processors, not just the 2 or 4 cores that today's platforms offer.
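Writing for N processors rather than a fixed 2 or 4 mostly means never hard-coding the core count. A minimal sketch, assuming the job is an independent sum over a list: ask the OS how many cores there are, split the data into that many contiguous chunks, and combine the partial results. The function names are hypothetical. Note that in CPython, threads do not execute Python bytecode in parallel (the GIL), so for CPU-bound work `ProcessPoolExecutor` from the same `concurrent.futures` module would be swapped in; threads are used here only to keep the sketch portable.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items: list, n: int) -> list[list]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    n = max(1, n)
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def parallel_sum_of_squares(values: list[int], n_workers: int = 0) -> int:
    """Sum of squares, one partial result per worker; 0 workers means one per core."""
    n = n_workers if n_workers > 0 else (os.cpu_count() or 1)  # cpu_count() may be None
    chunks = split_into_chunks(values, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = pool.map(lambda chunk: sum(v * v for v in chunk), chunks)
        return sum(partials)
```

Because the chunk count comes from the machine, the same code uses 2 cores today and 64 tomorrow without change.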

How will we tackle this? Should, or can, the OS be left to decide which core to run a new process on?

Would the OS be capable of moving processes between cores to balance the load?

Should threads all reside on the same core, or are there patterns where threads can be distributed across the cores?
Debugging multi-core applications is more difficult than debugging single-threaded ones, because scheduling is not deterministic: the relative timing of the processes/threads is not easily repeatable.
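A small illustration of why debugging is harder: the sketch below starts several threads that each record when they finish. The set of results is always the same, but the order in which they are recorded depends on timing and may differ from run to run, so a bug that depends on that order may not reproduce under the debugger. The random sleep is an assumption standing in for work whose duration varies with load, caches and interrupts; the function name is hypothetical.

```python
import random
import threading
import time


def finishing_order(n_threads: int = 4) -> list[int]:
    """Return the order in which n worker threads happened to finish."""
    order: list[int] = []
    lock = threading.Lock()  # protects `order`, which all workers share

    def worker(ident: int) -> None:
        time.sleep(random.random() / 100)  # stand-in for work of varying length
        with lock:
            order.append(ident)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    for _ in range(3):
        print(finishing_order())  # same members every time, order may vary
```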

But don't the existing multi-threaded programming patterns already have solutions for these problems? Aren't we just talking about getting more programmers to understand and use them?
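One example of such an existing pattern is the producer/consumer queue: the threads share nothing except a thread-safe queue, so they can run on any core without extra locking in the application code. A minimal sketch using Python's standard `queue.Queue`; the squaring "work", the sentinel convention and the function name are illustrative choices.

```python
import queue
import threading

_STOP = object()  # sentinel telling a consumer there is no more work


def run_pipeline(jobs: list[int], n_consumers: int = 3) -> list[int]:
    """A producer queues jobs; consumers take them, square them and report results."""
    work: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()

    def consumer() -> None:
        while True:
            job = work.get()
            if job is _STOP:
                break
            results.put(job * job)

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for c in consumers:
        c.start()
    for job in jobs:      # the producer: here simply the calling thread
        work.put(job)
    for _ in consumers:   # one sentinel per consumer so every one exits
        work.put(_STOP)
    for c in consumers:
        c.join()
    # Completion order is not deterministic, so sort before handing results back.
    return sorted(results.get() for _ in range(results.qsize()))
```

The only shared objects are the two queues, and `queue.Queue` does its own locking, which is why the consumers need no locks of their own.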

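Amdahl's law: if a fraction WP (0 <= WP <= 1) of the work can be executed in parallel and the remaining fraction (1 - WP) must be computed serially, then the speed-up on P processors is
$$S(P) = \frac{1}{(1 - WP) + WP/P},$$
so even on very many processors the acceleration is at best
$$S_{\max} = \lim_{P \to \infty} S(P) = \frac{1}{1 - WP}.$$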
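To put numbers on Amdahl's law: with a fraction WP of the work parallelisable and the rest serial, the speed-up on P processors is 1 / ((1 - WP) + WP/P), which tends to 1 / (1 - WP) as P grows. A minimal sketch; the function names are hypothetical.

```python
def amdahl_speedup(wp: float, p: float) -> float:
    """Amdahl's law: speed-up on p processors when a fraction wp of the work is parallel."""
    if not 0.0 <= wp <= 1.0:
        raise ValueError("wp must be in [0, 1]")
    if p < 1:
        raise ValueError("need at least one processor")
    return 1.0 / ((1.0 - wp) + wp / p)


def amdahl_limit(wp: float) -> float:
    """Best possible speed-up as p grows without bound: 1 / (1 - wp)."""
    return float("inf") if wp == 1.0 else 1.0 / (1.0 - wp)


if __name__ == "__main__":
    for p in (2, 4, 8, 16, 1024):
        print(p, round(amdahl_speedup(0.9, p), 2))
    print("limit", round(amdahl_limit(0.9), 2))
```

With WP = 0.9, eight cores give about 4.71x and even 1024 cores only reach about 9.91x, never the limit of 10x: the serial 10% dominates.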

Work Law:
$$P \cdot T_P \ge T_1$$

... where P is the number of processors, $T_P$ is the fastest possible execution time on P processors and $T_1$ is the time taken to do all the work on one processor.

Ehh ... essentially there are overheads ... and we recognise that some work cannot be run in parallel, right?

In problems where the work permits speed-up proportional to the number of processors:

$$\frac{T_1}{T_P} = k \cdot P \qquad (k \le 1)$$

we say that the possible speedup is linear. (!) Where k=1, we have perfect linear speedup (!!)
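In practice k is estimated from measurements: time the same job on one processor (T1) and on P processors (Tp), then k = T1 / (P * Tp). A minimal sketch with hypothetical function names and thresholds; the Work Law implies k <= 1, so a measured k > 1 is the "super-linear" case.

```python
def speedup_efficiency(t1: float, tp: float, p: int) -> float:
    """k = (T1 / Tp) / P: 1.0 is perfect linear speed-up, below 1.0 shows overheads."""
    if t1 <= 0 or tp <= 0 or p < 1:
        raise ValueError("times must be positive and p >= 1")
    return t1 / (p * tp)


def classify(k: float, tol: float = 1e-9) -> str:
    """Label a measured k using the terms from the text."""
    if k > 1.0 + tol:
        return "super-linear"
    if abs(k - 1.0) <= tol:
        return "perfect linear"
    return "below perfect (overheads)"
```

For example, a hypothetical job taking 100 s on one core and 30 s on four gives k = 100 / 120, about 0.83; 25 s on four cores would be perfect linear speed-up.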
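Can super-linear speed-up (k > 1) exist? The Work Law rules it out in principle, since it implies k <= 1, yet it is observed in practice [really? Usually through caching: P cores bring P caches, so a working set too large for one core's cache may fit in their combined caches, making each step cheaper than it was on one core.]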

Critical path: the longest chain of steps that must run one after another. You can't beat it, regardless of the number of processors.
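The critical path can be computed when the work is described as a graph of tasks with dependencies. A minimal sketch, assuming a hypothetical acyclic task graph given as task -> (duration, prerequisites): the total work T1 is the sum of all durations, the critical path (often written T_inf) is the longest chain of dependent durations, and no schedule on P processors can finish sooner than the larger of T1/P (the Work Law) and T_inf. Function and task names are illustrative.

```python
from functools import lru_cache


def work_and_span(tasks: dict[str, tuple[float, list[str]]]) -> tuple[float, float]:
    """Return (T1, T_inf) for an acyclic task graph: name -> (duration, prerequisites)."""

    @lru_cache(maxsize=None)
    def finish(name: str) -> float:
        # Earliest finish with unlimited processors: wait for the slowest prerequisite.
        duration, deps = tasks[name]
        return duration + max((finish(d) for d in deps), default=0.0)

    t1 = sum(duration for duration, _ in tasks.values())
    span = max((finish(name) for name in tasks), default=0.0)
    return t1, span


def lower_bound(tasks: dict[str, tuple[float, list[str]]], p: int) -> float:
    """No schedule on p processors can beat max(T1 / p, T_inf)."""
    t1, span = work_and_span(tasks)
    return max(t1 / p, span)


if __name__ == "__main__":
    graph = {
        "load": (2, []),
        "parse": (3, ["load"]),
        "render": (4, ["load"]),
        "save": (1, ["parse", "render"]),
    }
    print(work_and_span(graph), lower_bound(graph, 2))
```

For this example graph T1 = 10 and the critical path load -> render -> save takes 7, so two processors cannot finish in under 7 time units, and adding more processors does not help.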
....
Anyway, Amdahl's law killed multiprocessing back then, when problems were relatively simple. Now, with huge programs being written, the fraction of code that can be run in parallel is growing.
> Is this true?? Why?
People have a greater desire for prompt responses than greater throughput.
>> True for individuals, not for companies running servers. There are more individuals than servers.
>> What about embedded applications ?
.....
Don't use global variables - threads updating them concurrently cause race conditions in multi-threaded/multi-processor apps!
>> and they're hard to remember! (Encapsulation rules them out anyway.)
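Encapsulation alone does not make shared state thread-safe, but it gives the state one place to put its lock. A minimal sketch: instead of a global counter that every thread increments (a read-modify-write that can interleave and lose updates), the count lives inside an object whose only mutator takes a lock. The class and function names are hypothetical.

```python
import threading


class SafeCounter:
    """A counter whose state is private and whose updates are serialised by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # no other thread can interleave with this read-modify-write
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: SafeCounter, n_threads: int = 4, per_thread: int = 10_000) -> int:
    """Increment from several threads at once and return the final count."""

    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Four threads of 10,000 increments always end at exactly 40,000, because no update can be lost between another thread's read and write.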
..
