ADOCO: Facilitating Quality Control in Mass Digitisation

Max Kaiser, Jeanna Nikolov-Ramirez, Georg Petz, Christa Müller, Martin Reisacher

Abstract


In a public-private partnership with Google, the Austrian National Library is systematically digitizing 600,000 public-domain books (about 200 million pages) in the project "Austrian Books Online" (ABO), which became operational in December 2010. The books are gradually made available via the library's Digital Library, Google Books and Europeana. With such large amounts of data, quality assurance is a challenge. Quality assurance work in ABO started in June 2011.

The presentation will explain the approach chosen by the Austrian National Library and describe how the quality chain has evolved over the past year. It will trace the processes established in the library for downloading the digitized files and checking their quality, both with semi-automated methods (such as combining text-field size and words per page to detect suspicious cropping) and through manual audits. Special software, ADOCO (ABO Download and Control), was implemented and is continuously developed to meet the needs of the quality auditing process. ADOCO enables simultaneous, multithreaded downloads. It is based on PrimeFaces and Spring Web Flow, wraps Linux command-line tools in Java (wget, tar, gpg, exiftool for image metadata, md5sum, etc.) and uses a MySQL database for technical and bibliographic metadata. It supports various searches and views on the relevant volumes.

The talk will cover considerations in establishing quality metrics and quality indicators, possible error phenomena, error severity scales, clustering of errors, the tools evaluated and used in the project so far, and the strategic approach to quality assurance in mass digitization. Additionally, we will outline future quality perspectives in mass digitization, such as ABO's collaborations with the IMPACT Centre of Competence for Digitisation and the FP7 project SCAPE, and the experiments and tools developed there. Requirements, challenges and opportunities from the first year of quality assurance will be shared as lessons learned.
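The abstract mentions combining text-field size and words per page to detect suspicious cropping, but does not state the rule. The Java sketch here shows one way such a check could work, under assumptions that are ours and not ADOCO's: per-page measurements (text-field area as a fraction of the image area, OCR word count) are already available, and a page is flagged when its text field is markedly larger than the volume median while its word count is markedly lower, a pattern consistent with a crop that cuts into the text. The class `CropCheck`, the record `Page` and the factors 1.2 and 0.8 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CropCheck {

    /** Hypothetical per-page measurements: text-field area as a fraction of image area, and OCR word count. */
    record Page(int number, double textFieldRatio, int words) {}

    /** Median of an array (the input is copied, not modified). */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Flags pages whose text field fills more of the image than is typical for the volume
     * (above areaFactor * median) while yielding fewer words (below wordFactor * median).
     * Both conditions must hold, so short pages with a small text field are not flagged.
     */
    static List<Integer> suspiciousPages(List<Page> pages, double areaFactor, double wordFactor) {
        double medianRatio = median(pages.stream().mapToDouble(Page::textFieldRatio).toArray());
        double medianWords = median(pages.stream().mapToDouble(Page::words).toArray());
        List<Integer> flagged = new ArrayList<>();
        for (Page p : pages) {
            if (p.textFieldRatio() > areaFactor * medianRatio && p.words() < wordFactor * medianWords) {
                flagged.add(p.number());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Invented sample volume: page 3 looks tightly cropped, page 5 is a short chapter ending.
        List<Page> volume = List.of(
                new Page(1, 0.62, 310), new Page(2, 0.60, 298), new Page(3, 0.88, 190),
                new Page(4, 0.61, 305), new Page(5, 0.30, 120));
        System.out.println(suspiciousPages(volume, 1.2, 0.8)); // prints [3]
    }
}
```

Requiring both signals is the point of combining the two measures: page 5 has the fewest words but a small text field, so word count alone would have produced a false alarm.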
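ADOCO is described as wrapping Linux command-line tools in Java and working with multiple threads. A minimal sketch of that pattern follows, assuming `ProcessBuilder` for the wrapping and a fixed-size `ExecutorService` for the concurrency; the source confirms neither. It uses `md5sum` because its output (`<hash>  <file>`) is easy to parse. The class `ToolWrapper` and its methods are hypothetical, and the same `run` helper could in principle invoke the other tools named in the abstract.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ToolWrapper {

    /** Runs an external command and returns its output; throws if the exit code is non-zero. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout so one stream is read to the end
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with " + exit + ": " + output);
        }
        return output;
    }

    /** Computes a file's MD5 via md5sum, taking the first whitespace-separated field of its output. */
    static String md5(Path file) throws IOException, InterruptedException {
        return run(List.of("md5sum", file.toString())).split("\\s+")[0];
    }

    /** Hashes files concurrently on a fixed-size pool and returns those whose hash differs from the expected one. */
    static List<Path> verifyAll(List<Path> files, List<String> expected, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> hashes = new ArrayList<>();
            for (Path f : files) {
                hashes.add(pool.submit(() -> md5(f)));
            }
            List<Path> mismatches = new ArrayList<>();
            for (int i = 0; i < files.size(); i++) {
                if (!hashes.get(i).get().equals(expected.get(i))) {
                    mismatches.add(files.get(i));
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two temporary files with identical content; the second is paired with a deliberately wrong hash.
        Path good = Files.createTempFile("page", ".txt");
        Path bad = Files.createTempFile("page", ".txt");
        Files.writeString(good, "hello\n");
        Files.writeString(bad, "hello\n");
        String helloMd5 = "b1946ac92492d2347c6235b4d2611184";
        List<Path> mismatches = verifyAll(List.of(good, bad), List.of(helloMd5, "0".repeat(32)), 2);
        System.out.println(mismatches.size() + " mismatch(es)"); // prints 1 mismatch(es)
        Files.delete(good);
        Files.delete(bad);
    }
}
```

Delegating to the system tools keeps the Java layer thin, at the cost of depending on those binaries being installed; a fixed pool size bounds how many external processes run at once.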

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.