Wednesday, January 28, 2009

Digital Heritage

/.   http://tech.slashdot.org/article.pl?sid=09/01/27/0128207

[0]Hugh Pickens writes "The chief executive of the British Library, Lynne Brindley, says that our cultural heritage is at risk as the Internet evolves and technologies become obsolete, and that historians and citizens face [1]a 'black hole' in the knowledge base of the 21st century unless urgent action is taken to preserve websites and other digital records. For example, when Barack Obama was inaugurated as US president last week, [2]all traces of George W. Bush disappeared from the White House website. There were more than 150 websites relating to the 2000 Olympics in Sydney that vanished instantly at the end of the games and are now stored only by the National Library of Australia. 'If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics... the memory of the nation disappears too,' says Brindley. ... 

  1. http://www.guardian.co.uk/technology/2009/jan/25/internet-heritage


CFM >> This is appropriate? The world is growing its 'document' production systems, some now automated! The exponential growth in reading materials, audio and video records is undeniable.

BUT, the librarian is not tasked with capturing ALL of it!

The librarian is tasked with keeping a representative overview of things, within the budgets available. This forces SELECTION!

We need to select, from the vast chatter on the internet and in more traditional media, what we wish to be available to future generations and researchers. This selection process also needs to be matched with a system that helps a researcher "find" and perhaps "translate" the material efficiently as the quantity of material grows. Some system of 'valuing' documents is needed too; perhaps the Oxford and Cambridge university presses did this in times past, along with newspaper editors. Should there be a more democratic valuing system today?

I think not.  I think that various organisations will retain what they feel is important, and perhaps make that available to the internet users, or not. I think this is appropriate.  Having everything ever recorded in the internet search engines will dilute the value of the search engine !

Also consider that, with vast amounts of material, future researchers will only be able to use automated searches, and pulling out the really valuable documents (value being a property of the valuer, not the item) that are on target with respect to the query will become increasingly difficult.

Perhaps a new "Dewey Decimal Classification(*)" system, or PageRank(+) algorithm, is needed, designed to facilitate searches of documents based on tags, time-of-writing, perspective-of-author and perspective-of-reader. The last two properties here are obviously multi-dimensional, and may benefit from an agreed classification system themselves.

* (from Wikipedia) The Dewey Decimal Classification (DDC, also called the Dewey Decimal System) is a proprietary system of library classification developed by Melvil Dewey in 1876, and has since then been greatly modified and expanded through twenty-two major revisions, the most recent in 2004. The system is a method for placing books on library shelves in a specific and repeatable order that makes it easier to find any specific book or to return it to its proper place.

+ PageRank is a link analysis algorithm used by the Google Internet search engine that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references.
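
To make the "numerical weighting" idea concrete, here is a minimal sketch of PageRank computed by simple power iteration. It is an illustration only, not Google's implementation: the function name, the damping factor of 0.85, the fixed iteration count and the three-document example collection are all my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    damping and iterations are illustrative defaults (assumptions).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal weight everywhere
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its weight on to the pages it references.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical collection: A and B both cite C, and C cites A.
example_links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
```

In this toy collection C ends up with the highest weight because two documents reference it, and B the lowest because nothing references it. The "value" here comes purely from who cites whom, which is one automated answer to the valuing question above.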

Saturday, January 24, 2009

What does Multi-core HW mean for SW engineers ?

Much of the world's SW is written to be executed on a single processor. The writers validly assumed a single-processor machine. That assumption is breaking down now, with most new machines having multiple processors.
The individual processors or cores are just as powerful as the traditional single-core machines, but to get best value from the HW we want as many of the available resources as possible working for us while we're waiting for results. They should then idle when we have no jobs for them.
Traditional single-thread code can only use one core, so if the user is running only one program, and it has a lot of work to do, then the other available cores are wasted.

Hmmm ... but isn't there a huge amount of code written using multiple processes, which could be altered for execution on multiple processors ?
Indeed, isn't there much code that deals with various HW IO interfaces that are essentially designed for multiple processing HW ?
Could the "OS" not be modified to select which processor to run the various processes on, perhaps with a "deployment" file prepared by the SW designers to assist in spreading processes/threads across the available processors, taking relative processor loading into consideration ?
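
As a rough illustration of what such a hint to the OS can look like today, here is a small sketch using Python's os.sched_getaffinity and os.sched_setaffinity. These calls exist only on some platforms (Linux, for instance), so the sketch checks for them first. The helper names are my own, and pinning to the lowest-numbered core is just an arbitrary example policy, not a recommendation.

```python
import os


def allowed_cores():
    """Return the set of core indices this process may run on, or None if unknown."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)  # 0 means "the calling process"
    return None  # platform does not expose CPU affinity through this API


def pin_to_first_allowed_core():
    """Restrict this process to a single core (hypothetical example policy).

    Returns the new affinity set, or None where affinity is unsupported.
    """
    cores = allowed_cores()
    if not cores or not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {min(cores)})
    return os.sched_getaffinity(0)


cores = allowed_cores()
```

Left alone, the scheduler is free to place the process on any core in the allowed set, so a "deployment" file would really only be needed where the designers know something the OS cannot work out for itself.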

If you're programming systems with multiple processes or applications, employing IPC through files, sockets, named pipes etc, could the OS provide all the required mappings ?  YES !
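
A minimal sketch of the idea, assuming nothing more than a Python interpreter: the parent below talks to a separate child process over ordinary pipes, and neither side says anything about which core it runs on. The child program and the function name are made up for the illustration.

```python
import subprocess
import sys

# Hypothetical child program: reads lines from stdin, writes them back upper-cased.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def shout_via_child(text):
    """Send text to a separate OS process over a pipe and return its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=text,           # written to the child's stdin pipe
        capture_output=True,  # child's stdout/stderr come back over pipes
        text=True,
        check=True,
    )
    return result.stdout


reply = shout_via_child("hello from the parent\n")
```

The two processes are scheduled independently, so on a multi-core machine the OS can run them on different cores without any change to the code.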

Doesn't the OS itself have many programs working, which can be spread over the cores? Further, the modern PC user runs MANY programs at the same time. This is becoming common in embedded systems too, where Linux is widely available alongside other mature embedded OSs. In such an environment, can't the load be spread over the cores reasonably well by the OS itself, giving a great improvement to the end user's experience?
Let's assume that we're talking about single-thread apps for now; perhaps this is where the majority of SW currently is. As the number of cores increases from 2 to 4, 8, ... these applications will use a smaller and smaller fraction of the available processor capacity, providing their users with a fraction of the performance that may be possible. This assumes that processor power is the limiting factor, when communications bandwidth to distant servers, or HW, or other bottlenecks might actually be the limiting factor.

So, the ability to write stable, multi-threaded SW is cited as the scarce resource in optimising the employment of the new multi-core chips. Ok, so at least I'm a member of that gang! :-)

Also, it would seem that anyone who has written device drivers, and worked with ISRs, will have dealt with the issues that need to be addressed in optimising code for the new HW.
The SW should be capable of employing N processors, not just the 2 or 4 cores that today's platforms offer.

How will we tackle this ? Should, or can, the OS be left to decide which core to run a new process on ?

Would the OS be capable of moving processes between cores to load-balance ?

Should threads all reside on the same core, or are there patterns where threads can be distributed throughout the cores ?
Debugging multi-core applications is more difficult than debugging single-thread applications, as the scheduling is not deterministic. The timing between the processes/threads is not easily repeatable.

But don't the existing multi-threading programming patterns already have solutions for these problems? Aren't we just talking about getting more programmers to understand and use them ?

Amdahl's law : if work is split into work that can be executed in parallel (WP, a fraction in [0-1]) and work that must be computed serially (1-WP), then the possible speed-up on many processors is at best
acceleration  = 1/(1-WP).
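
A quick sketch to put numbers on this. The limit 1/(1-WP) is the formula above; the finite-processor form 1/((1-WP) + WP/P) is the usual textbook statement of Amdahl's law, which I have added here. The function names are my own.

```python
def amdahl_speedup(wp, processors):
    """Speed-up on a given number of processors when a fraction wp is parallel."""
    if not 0.0 <= wp <= 1.0:
        raise ValueError("wp must be a fraction between 0 and 1")
    if processors < 1:
        raise ValueError("need at least one processor")
    serial_part = 1.0 - wp           # always takes the same time
    parallel_part = wp / processors  # shrinks as processors are added
    return 1.0 / (serial_part + parallel_part)


def amdahl_limit(wp):
    """Best possible speed-up with unlimited processors: 1/(1-WP)."""
    if wp >= 1.0:
        return float("inf")  # no serial work at all
    return 1.0 / (1.0 - wp)


# Example: 90% of the work is parallelisable.
for p in (2, 4, 8, 1000):
    print(p, round(amdahl_speedup(0.9, p), 2))
print("limit", round(amdahl_limit(0.9), 2))
```

With WP = 0.9 the speed-up can never exceed about 10, however many cores are added. A purely single-threaded program (WP = 0) gets a speed-up of exactly 1 on any number of cores.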

Work Law :  P.Tp  >=  T1

... where P is the number of processors, Tp is the fastest possible execution time on P processors, and T1 is the time taken to do all the work on 1 processor.

Ehh ... essentially there are overheads ... and we recognise that some work can not be run in parallel, right ?

In problems where the work permits speed-up proportional to the number of processors:

T1/Tp = k.P  (k is less than, or equal to, 1)

we say that the possible speedup is linear. (!) Where k=1, we have perfect linear speedup (!!)
Super - linear can exist ... (k>1)  ? [really ? even with caching, .. ?]

Critical Path - can't beat this, regardless of the number of processors.
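
Putting the Work Law and the critical path together gives a simple lower bound on Tp, sketched below. The name t_inf for the length of the critical path (the time on an unlimited number of processors) is my own label, and the numbers are invented for the example.

```python
def lower_bound_time(t1, t_inf, processors):
    """Smallest possible Tp given total work t1, critical path t_inf and P processors.

    Work Law:      P * Tp >= T1   so  Tp >= T1 / P
    Critical path: Tp >= t_inf    (t_inf is an assumed name, not from the notes)
    """
    return max(t1 / processors, t_inf)


def max_speedup(t1, t_inf, processors):
    """Upper bound on T1 / Tp implied by the two limits above."""
    return t1 / lower_bound_time(t1, t_inf, processors)


# Invented example: 100 units of work, of which a chain of 10 must run in order.
for p in (1, 4, 10, 100):
    print(p, lower_bound_time(100, 10, p), max_speedup(100, 10, p))
```

Up to 10 processors the Work Law is the binding limit and the speed-up bound is linear (k = 1). Beyond that the critical path takes over and extra processors buy nothing.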
....
Anyway - Amdahl killed multiprocessing back then, when problems were relatively simple.  Now, with huge programs being written, the fraction of code that can be run in parallel is growing.
> Is this true ?? Why ?
People have a greater desire for prompt responses than greater throughput.
>> True for individuals, not for companies running servers. There are more individuals than servers.
>> What about embedded applications ?
.....
Don't use global variables - they cause race conditions in multi-threaded/processor apps !
>> and they're hard to remember ! (encapsulation rules these out anyway)
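
A small sketch of the encapsulated alternative, using Python's threading module. The shared count lives inside an object and is only touched while holding a lock, so no thread can see or write a half-finished update. The class and function names are my own, and the thread and iteration counts are arbitrary.

```python
import threading


class Counter:
    """Shared counter whose state is hidden behind a lock, not a global."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is the critical section.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_parallel(num_threads=4, increments_per_thread=10_000):
    """Have several threads increment one Counter and return the final total."""
    counter = Counter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = count_in_parallel()
```

Because every update goes through increment(), the total is always num_threads * increments_per_thread. With a bare global and no lock, two threads can read the same old value and one of the updates can be lost.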
..

Monday, January 5, 2009

Iterative v Incremental Design

Alistair Cockburn: Agile Software Development

Iterative refers to a scheduling and staging strategy that allows rework of pieces of the system.

Iterative development lets the team learn about the requirements and design of the system. Grady Booch calls this sort of learning “gestalt, round-trip design”, a term that emphasizes the human characteristic of learning by completing.

Iterative schedules are difficult to plan, because it is hard to guess in advance how many major learnings will take place. To get past this difficulty, some planners simply fix the schedule to contain three iterations: draft design, major design, and tested design.

Incremental refers to a scheduling and staging strategy in which pieces of the system are developed at different rates or times and integrated as they are developed.

Incremental development lets the team learn about its own development process as well as about the system being designed. After a section of the system is built, the team members examine their working conventions to find out what should be improved. They might change the team structure, the techniques or the deliverables.

Incremental is the simpler of the two methods to learn, because cutting the project into subprojects is not as tricky as deciding when to stop improving the product. Incremental development is a critical success factor for modern projects.

Sunday, January 4, 2009

Be Upbeat in a downturn

techdirt: Yes, we're in the midst of a brutal financial mess -- but that won't stop innovation. Yes, incumbent forces, with short-sighted plans and a desire to hold back the tides are annoying and disruptive (not in a good way) in the short run. But even they are finding they can't hold back progress. Robert Friedel has a wonderful book called A Culture of Improvement that details how we, as a society, are constantly looking to improve on what we already have. We add ideas and ingenuity to old concepts and build something better -- not because of the desire to grab some "intellectual property," but because of the desire to improve our own lot, to build a better tool that we want to use. Incumbent short-sighted players have been able to hinder and harm progress, but they can't keep it down completely. That culture of improvement can't be stopped entirely.