The European Archive is a non-profit foundation working towards universal access to all knowledge. The archive will achieve this through partnerships with libraries, museums and other collecting bodies, and by building its own collections. The primary goal of collecting this knowledge is to make it as publicly accessible as possible, via the Internet and other means.
The foremost effort to archive the Web has been carried out in the US by the Internet Archive, a non-profit foundation based in San Francisco. Since 1996, the Internet Archive has archived large snapshots of the surface of the Web every two months. The entire collection offers 500 terabytes of data of major significance for every domain that the development of the Internet has affected, which is to say almost all of them. This represents a large amount of data (petabytes in the coming years) to crawl, organize and give access to. By partnering with the Internet Archive, the European Archive is laying the foundation of a global Web archive based in Europe.
Digitization
We have entered an era in which the digitization of all significant cultural artefacts will be completed. Within the next decade, most published cultural content (books, music, images and moving images) will have been digitized. Recent commercial announcements have raised awareness and started this movement, but they are limited to a few major libraries, which leaves an opportunity for an open system to be pursued. This entails digitizing, preserving and providing access to the rich public domain of books, music, images and moving images on a large scale. By fostering the development of a large-scale archiving platform, the European Archive intends to facilitate mastery of the processes and tools needed for archiving and distributing digital public content in Europe.
Infrastructure
With the technical support of the Internet Archive and XS4ALL, the European Archive has installed a repository in Amsterdam with a capacity of 250 terabytes (250,000 gigabytes), through which a large collection of digital material (text, music, moving images, software) can be accessed. In December 2005, the download rate averaged over 350 megabits per second, already making the European Archive a significant content provider in Europe. The data is organized in a highly distributed way (200 nodes in a cluster) to enable distributed processing.
This achievement is already a significant step towards establishing in Europe an infrastructure to collect and archive digital material at large scale. We plan to extend it to 1 petabyte (1,000 terabytes) within the coming years. The European Archive should be accessible in Europe's languages. A multilingual web management system for large digital collections has been implemented. This system is based on a flat structure that is permanently indexed and updated. It allows light and flexible management of the web interface to collections, and can scale up without limit in the future.
An opportunity for Europe
We expect the European Archive to become an essential piece of the European cultural heritage landscape. To meet the Lisbon 2010 goal for Europe to become “the most dynamic knowledge economy in the world”, a large-scale public archive is a key component. It will enable free public access to a large portion of European cultural heritage and bring a tremendous quantity of rich, legal content to the broadband network infrastructure. It will give traditional heritage institutions a technology partner, enabling them to take significant steps towards the digitization and public accessibility of their collections, and making Europe more visible and attractive in a globally networked world. By developing cutting-edge technology for the acquisition, management and storage of massive digital collections, it will build a centre of excellence in a key domain of tomorrow's Internet.