Thu 1b: TEI annotation and publication I
Thursday, 19/Sep/2019:
9:00am - 10:30am

Session Chair: Susanne Haaf, Berlin-Brandenburg Academy of Sciences and Humanities
Location: Lecture Hall HS 15.12
RESOWI building, section C, first floor

Analyzing and Visualizing Uncertain Knowledge: Introducing the PROVIDEDH Open Science Platform

A. Benito1, M. Doran3, J. Edmond3, M. Kozak2, C. Mazurek2, A. Rodríguez1, R. Therón1, E. Wandl-Vogt4

1University of Salamanca, Spain; 2Poznań Supercomputing and Networking Center, Poland; 3Trinity College Dublin, Ireland; 4Austrian Centre for Digital Humanities, Austria

Underlying uncertainty in DH research data affects decision-making and persists throughout a project's lifecycle. This uncertainty will always be present. Thus, efforts to provide technical support for humanistic research should focus on managing this uncertainty and making it more transparent, rather than on removing it.

Locating and tracing (certain types of) uncertainty through the evolution of a textual corpus can be done with TEI tags. However, these methods are not yet common practice. This paper addresses one possible barrier to wider use of these tags by providing a user-friendly interface for collaboratively annotating texts with uncertainty. We propose some minor extensions of the TEI specification that follow from our metrics of uncertainty. The first extension adds a new “category” attribute to the “certainty” element, required to indicate the source of uncertainty. The second extension changes the closed list of values of the “locus” attribute to an open list, so that the attribute to which the uncertainty refers can be indicated explicitly.
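As a rough illustration of what the two proposed extensions could look like in markup, the following Python sketch builds a single “certainty” element that carries the proposed “category” attribute and a “locus” value outside the current closed list. The helper name make_certainty, the target "#date-1", the locus value "when" and the category value "imprecision" are assumptions chosen for the example; they are not taken from the PROVIDEDH platform or from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def make_certainty(target, locus, cert, category, asserted_value=None):
    """Build a <certainty> element with the proposed extensions.

    `category` is the proposed new attribute (source of uncertainty);
    `locus` is treated as an open list, so any attribute name is accepted.
    All example values used below are hypothetical.
    """
    elem = ET.Element(f"{{{TEI_NS}}}certainty")
    elem.set("target", target)
    elem.set("locus", locus)
    elem.set("cert", cert)
    elem.set("category", category)
    if asserted_value is not None:
        elem.set("assertedValue", asserted_value)
    return elem


# Uncertainty about the @when attribute of a date, attributed to imprecision.
annotation = make_certainty("#date-1", "when", "low", "imprecision")
xml_text = ET.tostring(annotation, encoding="unicode")
print(xml_text)
```

A document using such markup would no longer validate against an unmodified TEI schema, so a customization declaring the new attribute and the opened value list would be needed.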

Additionally, the authors detected a need to describe the nature and type of uncertainties and to evaluate the degree of uncertainty that a piece of data introduces.

Our tools on the platform were developed following human-centered design, with a focus on easing uncertainty annotation and visualization, promoting the use of TEI standards, and making uncertainty play a more active role in the research process.

The platform fulfils the common needs of a complete research lifecycle by providing well-known technologies for basic tasks, such as versioning, file management, text editing, and reference resource management.

The authors aim to get feedback from the TEI community to improve their tools in a generic way and fulfill further needs of the audience.

The Prefabricated Website: Who Needs a Server Anyway?

M. D. Holmes1, J. Takeda2

1University of Victoria, Canada; 2University of British Columbia, Canada

Project Endings, a collaboration between digital humanists and librarians, is devising principles for building DH projects in ways that ensure that they remain viable, functional, and archivable into the distant future. Endings principles cover five components of project design:

- Data

- Documentation

- Processing

- Products

- Release Management

Previous Endings work has focused on Data and Products (Holmes 2017; Arneil & Holmes 2017) and diagnostic tools for monitoring project progress (Holmes & Takeda 2017 and 2018). This presentation will deal with the mechanics of Processing, focusing in particular on building large static sites which are resilient because they have no requirement for server-side technology at all. We will use the Map of Early Modern London project as a case study.

Comprising 2,000 TEI source files and 15,000 distinct entities, MoEML is a densely interlinked project that requires a sophisticated build process to create its website structure, the historical Agas Map interface, editions of primary source documents, various indexes and gazetteers, and encyclopedia entries. As a flagship Endings project, MoEML has been a testbed for the scalability of the Endings principles. The MoEML site has 9,000 HTML files, 26,000 XML files, and over 5,000 images, and is around 2 GB in size. Our presentation will cover a number of key techniques in the build process, including:

- Validation, validation, validation: XML, HTML, CSS, and TEI egXML example code are validated at every stage of the build process.

- Diagnostics to check all links and targets.

- Unique query-free URLs for all entities.

- Generating the gazetteer, which includes every variant spelling of every placename.

- Pre-generating HTML fragments for AJAX retrieval for every entity.

- Processing and rationalizing <rendition> elements and @style attributes.

- Using document type taxonomies to build sitemaps and breadcrumb trails.

- Filtering of images to include only those actually used.
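To make the link-and-target diagnostics more concrete, here is a minimal Python sketch of one way such a check could work on a static site. It is not the MoEML build process: the function find_broken_links, the flat layout in which every page is addressed by a plain file name, and the sample pages are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse


class LinkAndIdCollector(HTMLParser):
    """Collect href values and id attributes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "href" and tag == "a":
                self.links.append(value)


def find_broken_links(pages):
    """Return (page, link) pairs whose target page or fragment is missing.

    `pages` maps a file name to its HTML text. External links (anything
    with a URL scheme, or protocol-relative) are skipped.
    """
    collected = {}
    for name, html in pages.items():
        collector = LinkAndIdCollector()
        collector.feed(html)
        collected[name] = collector

    broken = []
    for name, collector in collected.items():
        for link in collector.links:
            if urlparse(link).scheme or link.startswith("//"):
                continue
            path, fragment = urldefrag(link)
            target = path or name  # "#frag" points into the same page
            if target not in collected:
                broken.append((name, link))
            elif fragment and fragment not in collected[target].ids:
                broken.append((name, link))
    return broken


# Hypothetical two-page site used only for illustration.
pages = {
    "index.html": '<a href="place.html#BISH1">ok</a>'
                  '<a href="missing.html">bad page</a>'
                  '<a href="#top">bad fragment</a>',
    "place.html": '<div id="BISH1">Bishopsgate</div>',
}
print(find_broken_links(pages))
```

Because a static site has no server to redirect or repair requests at run time, a check of this kind has to run at build time, and a build could be made to fail whenever the returned list is non-empty.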

correspSearch v2 – New ways of exploring correspondence

S. Dumont, S. Grabsch, J. Müller-Laackman

Berlin-Brandenburg Academy of Sciences and Humanities, Germany

The web service correspSearch has been in development since 2014. It aggregates correspondence metadata and offers it to the scientific community for research and retrieval. The data is obtained in the TEI-XML-based "Correspondence Metadata Interchange Format" (CMIF), developed by the TEI Correspondence SIG. A prototype was presented at the TEI Conference in Lyon in 2015.
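For readers unfamiliar with the kind of metadata involved, the following Python sketch reads one TEI “correspDesc” entry, the element on which CMIF is based, and extracts sender, receiver, place and date. The sample record, its example.org URLs and the function summarize are invented for illustration; this is not correspSearch code and does not use the correspSearch API.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical letter record; names, dates and URLs are made up.
SAMPLE = """<correspDesc xmlns="http://www.tei-c.org/ns/1.0"
    ref="https://example.org/letters/1">
  <correspAction type="sent">
    <persName ref="https://example.org/persons/a">Sender, Anna</persName>
    <placeName ref="https://example.org/places/berlin">Berlin</placeName>
    <date when="1810-05-01"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="https://example.org/persons/b">Receiver, Bernd</persName>
  </correspAction>
</correspDesc>"""


def summarize(xml_text):
    """Return a dict with the letter reference and one entry per action type."""
    desc = ET.fromstring(xml_text)
    summary = {"letter": desc.get("ref")}
    for action in desc.findall("tei:correspAction", NS):
        person = action.find("tei:persName", NS)
        place = action.find("tei:placeName", NS)
        date = action.find("tei:date", NS)
        summary[action.get("type")] = {
            "person": person.text if person is not None else None,
            "place": place.text if place is not None else None,
            "date": date.get("when") if date is not None else None,
        }
    return summary


record = summarize(SAMPLE)
print(record["sent"]["person"], "->", record["received"]["person"])
```

Using authority-file URIs in the ref attributes, rather than name strings alone, is what would allow records from different editions to be matched on the same person or place.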

Since 2017, the web service has been further developed in a project funded by the German Research Foundation (DFG). At the same time, the data quantity increased from around 25,000 to over 52,000 letters; many editions now offer letter metadata in CMIF. To enable even small edition projects to deliver data in CMIF, and to simplify the capture of letter metadata from printed editions, the CMIF Creator was developed in 2018. It allows convenient browser-based input and processing of metadata into CMIF.

Over time, development of the web service has focused on both the system architecture and the improvement of the search, which now, in line with the ongoing development of CMIF, captures letter content for the first time. In addition, several different editions of one letter can be linked to each other or connected to associated archives. correspSearch now also offers a map-based geographical search for places of writing and receipt. The API and the networking possibilities of correspSearch have been extended as well. With csLink, a JavaScript-based widget for presenting correspondence networks is now available on GitHub as open source software that can be integrated into any digital edition.

The article will present and discuss the further development of the web service, as well as the community’s experiences with the aggregation of metadata. During the presentation, the second version of correspSearch will be released as public beta.

