Friday, March 31, 2006

Browsing through multimedia

Johan Oomen and I published our findings in a magazine for information professionals. Johan, project leader at The Netherlands Institute for Sound and Vision, and I, the information strategy consultant, have been the tandem carrying out the project for the last two years. The article, like the magazine, is in Dutch, but for English readers we are aiming to present the story at the European Streaming Media conference in London on October 12th-13th.

Sunday, March 26, 2006

Gobsmacking RFID...

Here comes miniature robosquad

Monday, March 20, 2006

History of the Internet

Just because I stumbled over it while gathering multimedia retrieval cases, I couldn't help sharing it with you: an interesting (and rather funny) video showing the team that developed ARPANet, the predecessor of the Internet. Widely circulated photo of the IMP Team (left to right): Truett Thatch, Bill Bartell (Honeywell), Dave Walden, Jim Geisman, Robert Kahn, Frank Heart, Ben Barker, Marty Thrope, Will Crowther, Severo Ornstein. Not pictured: Bernie Cosell.

In late 1968, the Advanced Research Projects Agency (ARPA) put out a Request for Quotation (RFQ) to build a network of four Interface Message Processors (IMPs). At BBN, Frank Heart assembled a proposal team that included Dave Walden, a young programmer with expertise in real-time systems; Bernie Cosell, an ace debugger; Severo Ornstein, a hardware ace; Will Crowther, a programmer who specialized in producing complex, tight code; and Bob Kahn, the consummate theoretician who understood error control and the problems associated with sending data over telephone lines. Source:
In the video, provided by Google Video, we can see interviews with J.C.R. Licklider, Fernando Corbató, Leonard Kleinrock, Bob Kahn, and others. Len Kleinrock's story about the first message over ARPANet:
"A month later the second node was added (at the Stanford Research Institute (SRI)) and the first Host-to-Host message ever to be sent on the Internet was launched from UCLA. This occurred in October when Kleinrock and one of his programmers proceeded to "logon" to the SRI Host from the UCLA Host. The procedure was to type in "log" and the system at SRI was set up to be clever enough to fill out the rest of the command, namely to add "in" thus creating the word "login". A telephone headset was mounted on the programmers at both ends so they could communicate by voice as the message was transmitted. At the UCLA end, they typed in the "L" and asked SRI if they received it; "got the L" came the voice reply. UCLA typed in the "O", asked if they got it, and received "got the O". UCLA then typed in the "G" and the darned system crashed! Quite a beginning. On the second attempt, it worked fine!"

European Archive

The European Archive is a non-profit foundation working towards universal access to all knowledge. The archive will achieve this through partnerships with libraries, museums, other collection bodies, and through building its own collections. The primary goal of collecting this knowledge is to make it as publicly accessible as possible, via the Internet and other means.

Rationale

With the Internet, tremendously rich parts of our cultural heritage could be freely accessible online. But most of them are still dormant (non-digitized collections) or disappearing (Web history). Massive digital collecting, digitizing, and storage techniques make it possible to preserve this rich material and give the public access to it. Mastering these techniques will be key in the coming years for the future of cultural heritage, both traditional and born-digital. Europe, cradle of a unique cultural heritage, has a special role to play in this regard. Even in a connected world, proximity and the specific legal and technical environment should not be underestimated, and having a European-based institution in this domain makes a difference. By developing a large-scale archiving architecture in Europe, and the competences that come with it, the European Archive intends to be a catalyst in the development of skills and know-how in the preservation of, and access to, digital collections. It also intends to bring to Europe a new type of cultural institution, focused on free public access to large, rich digital collections.

Web archiving

As the web has grown in importance as a publishing medium, we are behind in bringing into operation the archiving and library services that will provide enduring access to many important resources. Where some assumed web site owners would archive their own materials, this has generally not been the case. If properly archived, Web history can provide a tremendous base for time-based analysis of content and topology, including emerging communities, topics and trends, as well as an invaluable source of information for the future.

The foremost effort to archive the Web has been carried out in the US by the Internet Archive, a non-profit foundation based in San Francisco. Since 1996, the Internet Archive has taken large snapshots of the surface web every two months. This collection now offers 500 terabytes of data of major significance in all domains that have been impacted by the development of the Internet, that is, almost all. This represents a large amount of data (petabytes in the coming years) to crawl, organize and give access to. By partnering with the Internet Archive, the European Archive is laying the foundation of a global Web archive based in Europe.

Digitization

We have entered an era in which digitization of all significant cultural artefacts will be completed. Within the next decade, most published cultural content (books, music, images and moving images) will have been digitized. Recent commercial announcements have fostered awareness and started this movement, but limited it to a few major libraries, which leaves an opportunity for an open system to be pursued. This entails digitizing, preserving and providing access to the rich public domain of books, music, images and moving images on a large scale. By fostering the development of a large-scale archiving platform, the European Archive intends to facilitate the mastering of the processes and tools needed for digital public content archiving and distribution in Europe.

Infrastructure

With the technical support of the Internet Archive and XS4ALL, the European Archive has installed a repository with 250 terabytes (250,000 gigabytes) of capacity in Amsterdam, via which a large collection of digital material (text, music, moving images, software) can be accessed. The average download rate was over 350 megabits per second in December 2005, already making EA a significant content provider in Europe. The data organization is highly distributed (200 nodes on a cluster) to enable distributed processing. This already represents a significant step in establishing a large-scale European infrastructure to collect and archive digital material, and we plan to extend it to 1 petabyte (1,000 terabytes) within the coming years. The European Archive should be accessible in Europe's languages, so a multilingual web management system for large digital collections has been implemented. This system is based on a flat structure that is permanently indexed and updated. It allows light and flexible management of the web interface to collections, and can scale up without limit in the future.

An opportunity for Europe

We expect the European Archive to become an essential piece of the European cultural heritage landscape. In order to meet the goals of Lisbon 2010 for Europe to become

"the most dynamic knowledge economy in the world"

a large-scale public archive is a key component. It will enable free public access to a large portion of European cultural heritage and bring a tremendous quantity of rich, legal content to the broadband network infrastructure. It will give traditional heritage institutions a technology partner that enables them to make significant steps towards digitization and public accessibility of their collections, making Europe more visible and attractive in a globally networked world. By developing cutting-edge technology in the domain of massive digital collection acquisition, management and storage, it will become a centre of excellence in a key domain of tomorrow's Internet.

Munch: Video Retrieval system for Dutch national video archive

Multimedia aNalysis for Cultural Heritage (MuNCH) focuses on knowledge enrichment by means of automated analysis of digital images and video. With the advent of digital communication, we live in the exciting times of broadcasting and narrowcasting through the Internet, of passive and active viewers, of direct or delayed broadcast, and of digital pictures being delivered in the museum or at home. At the same time, picture and television archives are turning digital. In these demanding times, the archives are likely to be swamped with information requests unless they swiftly adapt to partially automatic annotation and digital retrieval. The aim of this project is to provide faster and more complete access to pictures in cultural archives through digital analysis. MuNCH is a project of the University of Amsterdam, the Vrije Universiteit Amsterdam, the Netherlands Institute for Sound and Vision and the association Digitaal Erfgoed Nederland. Project leader is Arnold Smeulders. Intelligent Systems Lab Amsterdam on the project:

In the MuNCH project - part of the CATCH programme by NWO - we aim to deliver a working video retrieval system to the Beeld en Geluid national (video) archive. A first version, making use of the engines for video analysis and interaction, is planned for the end of 2006. In MuNCH, we cooperate with Maarten de Rijke (UvA) on natural language processing, with Guus Schreiber (VU) on ontologies of video material, and of course with The Netherlands Institute for Sound and Vision, where Annemieke de Jong provides the content (essence + metadata).
Participants of Intelligent Systems Lab Amsterdam are:
  • Jan van Gemert
  • Jan-Mark Geusebroek
  • Arnold Smeulders (contact MuNCH)
  • Cees Snoek (contact TRECVID)
  • Marcel Worring (contact MediaMill)

MultiMatch: multilingual/multimedia access to cultural heritage

Our shared cultural heritage (CH) is an essential part of our European identity, transcending cultural and language barriers. The MultiMATCH project will enable users to explore and interact with online CH content, across media types and language boundaries, in ways that do justice to the multitude of existing perspectives.

The MultiMATCH search engine will:
  1. crawl the Internet to identify websites with CH information, locating relevant texts, images and videos, regardless of the source and target languages used to write queries and describe the results;
  2. automatically classify the results in a semantic-web compliant fashion, based on document content, its metadata, its context, and on the occurrence of relevant CH concepts in the document;
  3. automatically extract relevant information which will then be used to create cross-links between related material, such as the biography of an artist, exhibitions of his/her work, critical analyses, etc.;
  4. organize and further analyse the material crawled to serve focused queries generated from user-formulated information needs;
  5. interact with the user to obtain a more specific definition of initial information requirements; and finally organize and display search results in an integrated, user-friendly manner, allowing users to access and exploit the information retrieved regardless of language barriers.
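The focused crawling in step 1 can be pictured as a best-first search over the web graph: pages are fetched in order of estimated relevance to cultural heritage, and only relevant pages have their outlinks followed. Here is a toy sketch of that idea; the term list, scoring function, and `fetch` interface are hypothetical illustrations, not MultiMATCH's actual components:

```python
from heapq import heappush, heappop

# Hypothetical relevance scorer: fraction of cultural-heritage terms on a page.
CH_TERMS = {"museum", "heritage", "archive", "exhibition", "painting"}

def relevance(text: str) -> float:
    words = text.lower().split()
    return sum(w in CH_TERMS for w in words) / max(len(words), 1)

def focused_crawl(seed_urls, fetch, max_pages=100, threshold=0.01):
    """Best-first crawl: expand the most promising page next.
    `fetch(url)` is assumed to return (page_text, outlinks)."""
    frontier = [(-1.0, url) for url in seed_urls]  # max-heap via negated scores
    seen, collected = set(), []
    while frontier and len(collected) < max_pages:
        _, url = heappop(frontier)
        if url in seen:
            continue
        seen.add(url)
        text, outlinks = fetch(url)
        score = relevance(text)
        if score >= threshold:
            collected.append((url, score))
            # Outlinks inherit the parent's score as a crawl priority.
            for link in outlinks:
                if link not in seen:
                    heappush(frontier, (-score, link))
    return collected
```

Because irrelevant pages are neither collected nor expanded, the crawl stays focused on the CH portion of the web instead of drifting into the general crawl that a generic engine would perform.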
MultiMATCH will combine the advantages of both generic and specialised search facilities: it will be automatically updated via regular focused crawling of relevant web sites, and does not involve costly maintenance for manual indexing, classification or annotation of data. The system will be designed to support diverse user classes but also to assist CH institutions in disseminating their content widely and raising their visibility. MultiMatch is an IST-2005-2.5.10 project (Access to and preservation of cultural and scientific resources) and involves 11 teams from around Europe. Within the Amsterdam team, the people involved are:
  • Jaap Kamps
  • Maarten de Rijke
  • PhD student
  • PhD student
  • Postdoc
Content will be provided by Johan Oomen, The Netherlands Institute for Sound and Vision.

Culture around the Corner

Information about the nearest cultural event, museum or monument on your mobile phone, Personal Digital Assistant (PDA) or laptop is no longer something of the far future. ZaPPWeRK is the initiator of Culture around the Corner and selected parties to participate in this innovative project. The service works via text messaging or, technically more advanced, via the mobile portal Vodafone live!, where "Streetguide" indicates the exact location of the nearest building of interest. The website can be consulted via PDA and laptop. Whatever device you use, the system retrieves your position automatically and matches your location with that of the required place of interest.
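The location matching described above boils down to finding the point of interest with the smallest great-circle distance from the user's position. A minimal sketch (the POI list and coordinates are made-up examples, not the actual service's data):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def nearest_poi(user_lat, user_lon, pois):
    """pois: list of (name, lat, lon) tuples; returns the closest one."""
    return min(pois, key=lambda p: haversine_km(user_lat, user_lon, p[1], p[2]))
```

Given the user's position (from cell-ID or GPS), one call to `nearest_poi` is enough to pick the building of interest to show.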

Sunday, March 19, 2006

National Archives and Google Launch Pilot Project to Digitize and Offer Historic Films Online

Washington, D.C. and Mountain View, Calif. - Feb. 24, 2006 - "Archivist of the United States Allen Weinstein and Google (NASDAQ: GOOG) Co-Founder and President of Technology Sergey Brin today announced the launch of a pilot program to make holdings of the National Archives available for free online. (Image: The eagle has landed, 1969.) This non-exclusive agreement will enable researchers and the general public to access a diverse collection of historic movies, documentaries and other films from the National Archives via Google Video as well as the National Archives website." Source: NARA press release.

Wednesday, March 15, 2006

Quaero's 'stunningly ambitious' technological goals

That’s what The Economist wrote on March 9, 2006, in an article headed: "Search technology: Can an ambitious new European search engine, backed by the governments of France and Germany, challenge Google?" On various blogs people discuss Quaero, so do I. A great many of them state that the way Quaero will compete (as if) with Google and Yahoo! is unfair, since the French and German governments are pumping enormous funds into a collaborative economic effort in the image of the success of Airbus. Well, so what! Isn’t the American software industry built upon previous investments in governmental projects like DARPA and NASA? Google being a private company today doesn’t mean it never profited from governmental funds; didn’t Page and Brin study at Stanford, and doesn’t Stanford still hold the patent on the ranking mechanism... I rest my case. Far more interesting than French/German versus American competition is the scientific side of Quaero, that is: investing in building a new body of knowledge about multimedia retrieval. Really pushing the envelope of information retrieval by developing new technology to incorporate more than text-based search mechanisms. Searching by example, with an image as the example! One of the French project partners, LTU Technologies, has already been working on it for quite some time; now they have the opportunity to get to the next level. So does the German research team at the University of Karlsruhe, which is focused on speech-based retrieval. Furthermore, think of the massive impact on the multilinguality of the web by propagating keywords. Say two images match based on image properties; one has English metadata, the other French. Through propagation of these metadata onto each other (in the index), both pictures become retrievable in both languages. That’s kick-ass innovation. Let’s give Quaero a chance. Luckily Mrs. Merkel backs the initiative of her predecessor Schröder, so politics didn’t get in the way during the German government’s transition. Let the researchers do their work; don’t let Quaero get in harm’s way before something is actually built. It’s not about competing, it’s about the next leap in retrieval: multimedia retrieval! PS Quaero has one advantage over Google and the like: it doesn’t have difficult business models underneath, just like Google when it was still a project at Stanford some ten years ago, nurtured by ambitious students called Larry Page and Sergey Brin.
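The keyword-propagation idea sketched above is simple to state in code: if two images are visually similar enough, each inherits the other's keywords in the index, so both become retrievable in both languages. This toy sketch uses cosine similarity over made-up feature vectors; the function names, threshold, and data layout are my own illustrative assumptions, not Quaero's design:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def propagate_tags(images, threshold=0.9):
    """images: {id: {"features": [...], "tags": set of keywords}}.
    Visually matching pairs exchange keywords, so an image tagged
    'windmill' and its French twin tagged 'moulin' end up with both."""
    index = {img_id: set(data["tags"]) for img_id, data in images.items()}
    ids = list(images)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(images[a]["features"], images[b]["features"]) >= threshold:
                index[a] |= images[b]["tags"]
                index[b] |= images[a]["tags"]
    return index
```

A text query in either language now hits both images, which is exactly the cross-lingual effect the post describes.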

Tuesday, March 14, 2006

Google and its PageRank... how objective is it really?

Today I was wondering - don't worry, it happens quite often - I was wondering what the trade-off is between PageRank and AdSense. PageRank is the algorithm Google uses to make a bit of sense of the vast amount of information on the web and to add some sort of useful ranking to it when searched. AdSense sells advertising space related to search terms and page content. So, here I come: what if a big spender pays a lot for advertising related to a set of search terms... what happens to the integrity of the ranking (as if) of the pages relating to those terms?
Furthermore, an increasing share of e-businesses' revenue is directly related to the ranking of their sites in Google, hence they try to influence the ranking mechanism (a practice known as SEO) with all sorts of methods. Think of the things they probably do:
  • Pay attention to keyword inclusion and placement. Research most relevant AND most searched for keywords/phrases. Balance these with niche keywords/phrases to hit nr 1 spot for these niches.
  • Make sure especially your homepage is content-rich!
  • Create content-rich information pages to direct traffic to your site
  • Submit your site to online directories
  • Make sure your site is linked to from blogs, wikis, et cetera.
  • Multiply and conquer. Create a community of related sites that link to each other.
  • Create a structured sitemap page and submit it to Google Sitemaps
  • Consider using Google’s advertising programmes (AdWords and AdSense) to attract visitors instead of purely optimizing site ranking on Google index criteria (which can result in poor ranking on other search engines).
Sending a sitemap to Google is relatively new, but consider the last bullet: SEO by means of AdWords and AdSense. There could be a trade-off regarding the objectivity of ranking, don't you think? Remember EPIC: will newspapers become the news source for the elite while the rest eat Google's dog food? Don't get me wrong here, I google as much as any other web user; I just have some second thoughts and once in a while roll my own search engine.
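For readers who have never seen the algorithm itself: PageRank scores a page by the ranks of the pages linking to it, computed by power iteration over the link graph. This is a minimal textbook sketch of the classic formula, not Google's production system (which adds many undisclosed signals on top):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns a rank per page.
    Plain power iteration over the classic PageRank recurrence:
    rank(p) = (1 - d)/N + d * sum(rank(q)/outdegree(q) for q linking to p)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = rank[p] / len(outs)
                for q in outs:
                    if q in new:
                        new[q] += damping * share
            else:  # dangling page: spread its rank evenly over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

Note that nothing in this computation involves advertising spend; the worry in the post is about pressures outside the algorithm, not inside it.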

Edward Tufte

Edward Tufte doesn't need an elaborate introduction; he's simply one of the greatest authors on information visualisation. Well known for his books:

  • The Visual Display of Quantitative Information.
  • Visual Explanations: Images and Quantities, Evidence and Narrative
  • Envisioning Information
The last one is my favourite. Today I stumbled upon his website, while following a link on the blogroll of another aficionado, and saw Tufte's essay on PowerPoint for the first time. The picture of Stalin inspecting his army with added text balloons made me laugh out loud. Have a look yourself.

Monday, March 06, 2006


Can someone tell me why this icon set is called "Amsterdam"? Where are the corny clogs, canals, tulips, et cetera that are normally associated with Amsterdam... To the joy of the Dutch, the designer freed his mind of all those obvious things and came up with something completely new. Did he get detached from reality or did he just pick a name starting with "A" in the atlas? I just don't get it. Sorry icon-buffet-dot-com, you're just too far-out for me.

No more bookmarks, here comes tagging

Deb Richards dropped an interesting idea:

The idea is that we replace bookmarks entirely with the tagging concept. Instead of bookmarking a page or subscribing to an RSS feed, you just tag it. Tagging an item automatically stashes that URL into your profile's tags file/database. If you're tagging a web feed, it automatically turns it into a Live Bookmark (although we need to get rid of the "bookmark" term; it's not a book).

As it stands, browsers are adding "there's a feed here" indicators to the address bar. Our browser treats that as a "Store this as a Live Bookmark" button (which is sort of unuseful, really). The initial idea is to add to or replace the Feed button (the orange thing) with a Tag button.
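The data model behind this idea is tiny: one store mapping URLs to tag sets, with feeds distinguished only by a flag rather than by a separate bookmark type. A toy sketch (class and method names are my own, purely illustrative):

```python
class TagStore:
    """Toy profile tag store: tagging a URL replaces bookmarking it.
    A feed is just a tagged URL with a 'live' flag, not a separate
    bookmark type."""

    def __init__(self):
        self.entries = {}  # url -> {"tags": set, "is_feed": bool}

    def tag(self, url, *tags, is_feed=False):
        """Tag a URL; tagging again merges the new tags in."""
        entry = self.entries.setdefault(url, {"tags": set(), "is_feed": False})
        entry["tags"].update(tags)
        entry["is_feed"] = entry["is_feed"] or is_feed

    def by_tag(self, tag):
        """All URLs carrying a given tag, replacing bookmark folders."""
        return [url for url, e in self.entries.items() if tag in e["tags"]]
```

Retrieval by tag replaces the bookmark-folder hierarchy: one URL can live under as many tags as you like, which folders never allowed.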

Wednesday, March 01, 2006


Mobtagging, or 'folksonomy', is what happens when users freely apply and exchange metadata on online information. Have a look at this promo for a mobtagging project for urban art. It didn't take me long to start tagging altruistically, honestly and elaborately. Praise for the developers! The site is the result of a project done by Maarten Janssen and four fellow students at the Utrecht School of the Arts (HKU) in The Netherlands, where he studies Digital Media Design. I'm looking forward to his next project...

Wayfinding and ambient findability

I just found out about the new book by Peter Morville - one of the Peters in information architecture; thank you for not naming me Peter, mum and dad - it's called Ambient Findability.

How do you find your way in an age of information overload? How can you filter streams of complex information to pull out only what you want? Why does it matter how information is structured when Google seems to magically bring up the right answer to your questions? What does it mean to be "findable" in this day and age? This eye-opening new book examines the convergence of information and connectivity. Written by Peter Morville, author of the groundbreaking Information Architecture for the World Wide Web, the book defines our current age as a state of unlimited findability. In other words, anyone can find anything at any time. Complete navigability.

Morville discusses the Internet, GIS, and other network technologies that are coming together to make unlimited findability possible. He explores how the melding of these innovations impacts society, since Web access is now a standard requirement for successful people and businesses. But before he does that, Morville looks back at the history of wayfinding and human evolution, suggesting that our fear of being lost has driven us to create maps, charts, and now, the mobile Internet.

The book's central thesis is that information literacy, information architecture, and usability are all critical components of this new world order. Hand in hand with that is the contention that only by planning and designing the best possible software, devices, and Internet, will we be able to maintain this connectivity in the future. Morville's book is highlighted with full color illustrations and rich examples that bring his prose to life.

Check it out!

Information Architecture or Information Anxiety?

After the great book by Richard Saul Wurman I mentioned before, here's a new one by the man himself! Buy it, read it, take it all in, stop making confusagrams, and start building usable sites with useful stuff on them. This blog is not an example!

So what is the web 2.0 ?

According to Tim O'Reilly's article "What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software", these are the changes at hand:
Web 1.0 --> Web 2.0
DoubleClick --> Google AdSense
Ofoto --> Flickr
Akamai --> BitTorrent
mp3.com --> Napster
Britannica Online --> Wikipedia
personal websites --> blogging
evite --> upcoming.org and EVDB
domain name speculation --> search engine optimization
page views --> cost per click
screen scraping --> web services
publishing --> participation
content management systems --> wikis
directories (taxonomy) --> tagging ("folksonomy")
stickiness --> syndication