Computers are great tools for storing ridiculous amounts of data. Even for home users and desktop computers, it is not uncommon to have more than 100GB of mass storage. So nobody bothers nowadays with deleting any old stuff. Everything gets stored somewhere, and after a while one forgets about it. But what if you want to find that data again? What if you do not know the original context that data was in? How can you leverage all the knowledge that is somewhere on your hard disk, or even on your collection of dusty backup disks?
It seems to me that the current major challenge in software design is managing the vast amounts of data we all produce daily. We basically have sophisticated tools for pretty much everything in terms of data production or data consumption. But there are no good tools for data management. There are, of course, partial solutions to parts of the problem. Mac OS X is said to have good indexing and search functionality (unfortunately I don't have a Mac), Microsoft is developing WinFS storage (promising, but it seems to be stalled), Semantic Web technology is hyped (but still in its infancy), there are numerous music and photo management tools, etc. But there is no omnipresent data management tool (that I know of) that combines all of this. No tool, no operating system, that really cares about all the data filing needs of regular computer users. A tool you could, for example, ask to find all information about, say, Turkey on your disks, and it presents you with a list of all documents ranked by relevance to Turkey (e.g. the essay you wrote for arts class about the Hagia Sophia, or the entry from the CIA's world factbook, of which you have stored a copy on your hard disk), all files with Turkish music, all pictures in connection with Turkey (e.g. the ones you took on your last vacation), and so on (Internet bookmarks, e-mails, IM discussions, etc.). There is no need for the whole Semantic Web as long as we can't even manage the data on our local computers.
Search tools rely on two things: their understanding of the data itself and their access to metadata. For text files there are (emerging) ways to deal with them and put them into context. With hypertext it is even easier, because links are basically a type of metadata. For everything else, search tools can only do their work based on metadata (at least for now, and probably for some time to come). And metadata is scarce – simply because people are lazy.
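To illustrate why hypertext is the easy case: the links are sitting right there in the markup, waiting to be harvested. A minimal sketch in Python (my own illustration, not a real indexing tool) that pulls link targets out of an HTML fragment using only the standard library:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags -- the 'free' metadata
    that hypertext carries along with its content."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


doc = ('<p>See the <a href="factbook.html">CIA factbook entry</a> '
       'on <a href="turkey.html">Turkey</a>.</p>')
parser = LinkExtractor()
parser.feed(doc)
print(parser.links)  # -> ['factbook.html', 'turkey.html']
```

A real search tool would of course do much more (follow the links, weight them, index anchor text), but even this trivial pass yields metadata that plain text files simply don't offer.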
So what we currently really need in order to make the above dream possible are tools that produce as much (meaningful) metadata as possible. This means (1) making the generation of metadata as simple and natural as possible and (2) generating as much metadata as possible automatically. The former is mostly a question of interface design and standards. The latter is much harder to do. Wired has an interesting article about metadata collection for photographs.
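Some metadata can already be generated automatically with zero user effort, because the file system and the file name give it away. A small Python sketch (again my own illustration, using only the standard library) of the kind of baseline metadata a tool could collect for every file:

```python
import mimetypes
import os
import time


def auto_metadata(path):
    """Derive metadata without any user effort: file name, size,
    modification date, and a media type guessed from the extension."""
    st = os.stat(path)
    media_type, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": time.strftime("%Y-%m-%d", time.localtime(st.st_mtime)),
        "media_type": media_type or "application/octet-stream",
    }


# Demo: describe this script itself.
print(auto_metadata(__file__))
```

This is the easy 10 percent; the hard part is the semantic layer on top (who is in the photograph, what the essay is about), which no `os.stat` call will ever provide.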
There are already terabytes of metadata stored somewhere around the world, much of which is (freely) accessible via the Internet. This could be a good source for automatically generating metadata for your own files (I'm thinking, for instance, of music files) and for putting files into context, but unfortunately very little of these vast amounts of data is usable by computers. So now I'm contradicting myself, since for this you would need the Semantic Web.