Martin Czygan, Bootcamp Leader (Software Developer, Leipzig University Library,

David Aumüller (Software Developer, Leipzig University Library,

Leander Seige (Head of IT, Leipzig University Library,

Expected time slot: 4h

Audience: People interested in complex batch data processing, aggregated indices, library data, Python, and open source.

Expertise: To follow along with the code: Python and shell scripting in a Unix environment. To attend and follow: a basic understanding of programming and of data in libraries.

Required: Laptop with VirtualBox preinstalled.

Programming experience: Python basics, shell, standard set of Unix tools.

Short description:

Providing discovery systems for e-resources is essential for library services today. Commercial search engine indices have been a widely used solution in recent years. In contrast, running your own discovery service is undoubtedly a challenging task, but it promises full control over data processing, enrichment, performance, and quality. Building your own aggregated index of e-resources includes gathering the right mix of data sources, clearing licensing issues, and negotiating data availability. Technically, these threads are brought together by data harvesters, filters, and workflow orchestration tools.
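
To make the roles of harvesters, filters, and orchestration more concrete, here is a minimal sketch in plain Python. It is not the setup used in the bootcamp; the function names, the record fields, and the sample records are all hypothetical and chosen only for illustration.

```python
# A toy harvest -> filter -> normalize pipeline.
# Everything here (names, fields, sample data) is a hypothetical illustration.
from functools import partial

SAMPLE_RECORDS = [
    {"id": "1", "title": " Open Data in Libraries ", "source": "src-a", "license": "open"},
    {"id": "2", "title": "Restricted Report", "source": "src-a", "license": "unknown"},
    {"id": "7", "title": "Metadata Quality", "source": "src-b", "license": "licensed"},
]


def harvest(records):
    """Stand-in for a harvester: yields raw records one by one."""
    for record in records:
        yield record


def keep_licensed(records, allowed):
    """Filter: keep only records whose license status is in `allowed`."""
    for record in records:
        if record["license"] in allowed:
            yield record


def normalize(records):
    """Map each record to a common schema with a source-prefixed id."""
    for record in records:
        yield {
            "id": f"{record['source']}-{record['id']}",
            "title": record["title"].strip(),
        }


def build_index(records):
    """Tiny 'orchestrator': run the steps in a fixed order."""
    steps = [
        harvest,
        partial(keep_licensed, allowed={"open", "licensed"}),
        normalize,
    ]
    data = records
    for step in steps:
        data = step(data)
    return list(data)


index = build_index(SAMPLE_RECORDS)
```

In a real setting each step would read from and write to files or services, and a workflow tool would track which steps have already run; the sketch only shows how the pieces chain together.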

In this bootcamp, you will build your own aggregated index from scratch. We will introduce tools and technologies we use ourselves at Leipzig University Library. Most software is written by the community, some is written by us, but it’s all available as open source.

We will mostly use Python and shell scripting, so if you want to follow along with code, you should have basic familiarity with such an environment. If you have a less technical background, we invite you to take a look behind the curtain of a complex data processing


We will bring sample data and Linux virtual machines, which can be used with the VirtualBox hypervisor. If you are not a regular Linux user, you will probably need time to adapt. Pair (or group) programming is encouraged.