BiblioCerberus is a network firewall for a repository of digitised books and documents. Implemented as a web application, it controls access to the digitised holdings of a large library and records usage statistics for them. BiblioCerberus is compatible with the document viewer component of the Vivaldi electronic library network.
BiblioCerberus structure
- Program interface subsystem. Customers access documents through a web service that works over HTTP; interaction with the service is secured with the HTTPS protocol.
- A subsystem of bibliographic entries. For every document identifier, the subsystem retrieves a bibliographic entry from the source configured for the system. The subsystem can cache bibliographic entries locally to speed up processing and reduce load on adjacent systems.
- A subsystem providing access to full texts, for working with electronic PDF documents. The following functions for working with documents have been implemented:
  - Receiving information about the document (number of pages)
  - Receiving information about the page sizes of the document
  - Receiving the content of the text matrix of the document
  - Receiving a list of words on a page with the coordinates of the words
  - Rasterising a page at a specified DPI, with an optional watermark applied during rasterisation
  - Extracting a page from a document as a separate PDF file
- A statistical subsystem. Every query, with all of its parameters, is entered into a database of document access statistics, together with the identifier of the customer who sent it. Aggregate statistics are calculated daily so that the statistical display interface stays responsive. In that interface, a period can be selected for which the main indicators are shown: how many documents were opened, how many pages were viewed, and how many pages were printed. Results can be filtered by customer or document identifier. A chart of a chosen indicator can be displayed for every day of the chosen period, and all access indicators can be broken down by customer and by document.
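The daily aggregation and filtering described for the statistical subsystem can be illustrated with a minimal in-memory sketch. The names `AccessEvent`, `aggregate_daily` and `indicator_by_day`, and the action labels `"open"`, `"view_page"` and `"print"`, are assumptions made for illustration; the actual system stores this data in a database.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessEvent:
    """One logged query (hypothetical shape, not the real table layout)."""
    day: date
    customer_id: str
    document_id: str
    action: str  # assumed labels: "open", "view_page" or "print"


def aggregate_daily(events):
    """Pre-compute counts per (day, customer, document, action)."""
    totals = Counter()
    for e in events:
        totals[(e.day, e.customer_id, e.document_id, e.action)] += 1
    return totals


def indicator_by_day(totals, action, start, end,
                     customer_id=None, document_id=None):
    """Return {day: count} for one indicator within [start, end].

    The optional customer_id / document_id arguments act as filters.
    """
    series = Counter()
    for (day, cust, doc, act), n in totals.items():
        if act != action or not (start <= day <= end):
            continue
        if customer_id is not None and cust != customer_id:
            continue
        if document_id is not None and doc != document_id:
            continue
        series[day] += n
    return dict(series)
```

Because the display interface reads only the pre-computed totals, the cost of drawing a chart depends on the number of aggregated rows rather than on the number of raw queries.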
BiblioCerberus operational scheme
The software interface subsystem enables customers to connect and send requests for library documents. To execute these requests, the system calls the document repository and bibliographic entry repository subsystems. The system is designed around separation of concerns and loose coupling between components. All customer requests are passed to the statistical subsystem so that they can be entered into the database.
The four subsystems interact through four interfaces (contracts). There are no direct dependencies between the functional blocks; all dependencies are expressed through interfaces and resolved by dependency injection.
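As an illustration of interface-based wiring, the following sketch shows three hypothetical contracts and a service that receives its collaborators through its constructor. All class and method names here are assumptions made for the example, not the system's actual contracts. `CachingSource` additionally sketches the local caching of bibliographic entries as a wrapper around any source.

```python
from typing import Protocol


class BibliographicSource(Protocol):
    def get_entry(self, document_id: str) -> str: ...


class DocumentStore(Protocol):
    def page_count(self, document_id: str) -> int: ...


class StatisticsSink(Protocol):
    def record(self, customer_id: str, method: str, document_id: str) -> None: ...


class CachingSource:
    """Wraps any BibliographicSource and keeps fetched entries in memory."""

    def __init__(self, inner: BibliographicSource) -> None:
        self._inner = inner
        self._cache = {}

    def get_entry(self, document_id: str) -> str:
        # Ask the wrapped source only on a cache miss.
        if document_id not in self._cache:
            self._cache[document_id] = self._inner.get_entry(document_id)
        return self._cache[document_id]


class ApiService:
    """Depends only on the contracts; concrete objects are injected."""

    def __init__(self, entries: BibliographicSource,
                 documents: DocumentStore, stats: StatisticsSink) -> None:
        self._entries = entries
        self._documents = documents
        self._stats = stats

    def get_entry(self, customer_id: str, document_id: str) -> str:
        # Every request is reported to the statistics sink first.
        self._stats.record(customer_id, "get_entry", document_id)
        return self._entries.get_entry(document_id)

    def page_count(self, customer_id: str, document_id: str) -> int:
        self._stats.record(customer_id, "page_count", document_id)
        return self._documents.page_count(document_id)
```

Since `ApiService` never names a concrete implementation, any object with matching methods can be passed in, which is what makes it possible to replace a repository or the statistics store without touching the other blocks.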
Administration interface and authentication
The BiblioCerberus administration interface is web-based and contains the following sections:
- The customer management section, which is a list of customers. Each customer is described by the following data:
  - Name
  - Identifier
  - API key
  - Activity indicator
- The statistics section.
Access to the administration interface is granted through federated authentication based on the WS-Federation Passive Requestor Profile protocol. In WS-Federation terminology, the document lending system is the Relying Party.
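In the passive requestor profile, the Relying Party starts a sign-in by redirecting the browser to the identity provider with a small set of query parameters (`wa`, `wtrealm`, and optionally `wreply` and `wctx`). The sketch below builds such a redirect URL. The function name, the endpoint and the realm value are placeholders chosen for illustration, and the sketch assumes the endpoint URL has no query string of its own.

```python
from urllib.parse import urlencode


def build_signin_url(sts_url: str, realm: str, reply_url: str,
                     context: str = "") -> str:
    """Build a WS-Federation passive sign-in redirect URL.

    sts_url   -- identity provider endpoint (placeholder, assumed to have no query)
    realm     -- identifier of the Relying Party (wtrealm)
    reply_url -- where the token response should be posted back (wreply)
    context   -- opaque value returned unchanged with the response (wctx)
    """
    params = {"wa": "wsignin1.0", "wtrealm": realm, "wreply": reply_url}
    if context:
        params["wctx"] = context
    return f"{sts_url}?{urlencode(params)}"
```

The identity provider authenticates the administrator and posts a signed token back to the reply address, where the Relying Party validates it before opening the administration interface.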
Main specifications
BiblioCerberus runs on the following hardware:
- 4 x 2.5 GHz 64-bit processor
- 8 GB RAM
- 1 TB disk storage space
- 1 Gbps network connection
BiblioCerberus runs in the following software environment:
- Windows Server 2008 R2
- Microsoft SQL Server 2008 R2
- Microsoft Internet Information Services 7.5
Releases
BiblioCerberus was created in January 2013. Three releases of the program have been published since then. BiblioCerberus has been deployed at the National Library.
Release 1.0
This release, which came out in December 2014, introduced a basic API for working with documents.
- The system had the following functions:
  - Receiving information about the number of pages in a document
  - Receiving bibliographic document entries in MODS format
  - Receiving information about the page sizes of a document
  - Receiving images of document pages
  - Full-text search inside documents
- The service authenticates the calling party by means of API keys, which are issued by the service administrator.
- All document page access events are saved in the database and are later used to view document access statistics. Statistics are available for any time period and can be filtered by document, collection or reading room.
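API-key authentication of a caller can be sketched as a lookup against the customer list. The `Customer` fields mirror the data shown in the customer management section; the function name and the rule that an inactive customer is rejected are assumptions made for illustration.

```python
import hmac
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Customer:
    name: str
    identifier: str
    api_key: str
    active: bool  # the "activity indicator"


def authenticate(customers: Iterable[Customer],
                 presented_key: str) -> Optional[Customer]:
    """Return the customer owning presented_key, or None if rejected."""
    presented = presented_key.encode("utf-8")
    for customer in customers:
        # Constant-time comparison avoids leaking key contents via timing.
        if hmac.compare_digest(customer.api_key.encode("utf-8"), presented):
            # Assumed policy: a known but deactivated customer is refused.
            return customer if customer.active else None
    return None
```

The returned customer's identifier is what would then be written alongside each access event, so that statistics can later be filtered by customer.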
Release 2.0
In January 2015 a new download method was added to the API. It returns the source PDF file, but only for open documents. A watermark with the name of the library is applied, and an option to turn the watermark off was also added.
Release 3.0
In release 3.0 (June 2015) the method of receiving PDF files was updated: customers could now request not only the whole file but also a section of it, beginning and ending at given byte offsets. The custom parameters were replaced with the standard HTTP Range header. Example of the header: Range: bytes=0-10.
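How a server might interpret such a header can be shown with a small parser for a single byte range. This is a sketch under stated assumptions, not the system's implementation: the function name is hypothetical, and multi-range requests (comma-separated ranges) are deliberately treated as invalid.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header_value: str, size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range Range header value such as "bytes=0-10".

    Returns inclusive (start, end) offsets clamped to the file size,
    or None if the value is malformed or cannot be satisfied.
    """
    match = _RANGE_RE.match(header_value.strip())
    if match is None or size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return (max(size - length, 0), size - 1)
    start = int(first)
    end = int(last) if last else size - 1  # "bytes=N-" runs to the end
    if start >= size or start > end:
        return None
    return (start, min(end, size - 1))
```

Both offsets are inclusive, so the example `bytes=0-10` selects the first eleven bytes of the file, that is `data[0:11]` in Python slicing terms.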
- A method for receiving a document safety indicator was added to the API. Whenever a method for working with documents is called, information about the request is stored in the DocumentEvent database table.
- All 32-bit dependencies were removed from the server so that it could use all available server memory and not just 4 GB.