Wed 2b: TEI, formal ontologies, controlled vocabularies and Linked Open Data II
Wednesday, 18/Sep/2019:
11:00am - 12:30pm

Session Chair: Frank Fischer, Higher School of Economics, Moscow
Location: Lecture Hall HS 15.12
RESOWI building, section C, first floor

Modeling FRBR Entities and their Relationships with TEI: A Look at HallerNet Bibliographic Descriptions

A. Rojas Castro

Universität zu Köln, Cologne Center for eHumanities

The aim of this paper is to discuss the mapping between FRBR (Functional Requirements for Bibliographic Records) and TEI carried out to describe bibliographic objects and their relationships. Although some work has already been done in this area (Hawkins, 2008), as well as on the differences between database modeling and markup modeling (Eide, 2015) and between XML and linked data (Ciotti and Tomasi, 2016), this paper will argue that the TEI Guidelines are suitable for producing highly structured descriptions of bibliographic objects and their relationships.

For illustration purposes, the paper will use HallerNet, a web portal devoted to Albrecht von Haller (Bern, 1708-1777), a key figure of the Swiss Enlightenment, and to his circle of friends and collaborators. The website comprises digital editions and a large collection of 35,600 bibliographic records, among other types of objects. HallerNet has adapted the FRBR abstract model (IFLA, 2008) to define two bibliographic entities, the work (a distinct intellectual or artistic creation) and its manifestations (its physical embodiments), encoded with the TEI elements *bibl* and *biblStruct* respectively, as well as four potential relationships (embodimentOf, isPartOf, isAReviewOf and isASuccessorOf), encoded with *relatedItem*.
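A minimal Python sketch of what such records could look like, built with the standard library's ElementTree. It is not HallerNet's actual encoding: the record identifiers and bibliographic values are placeholders, and using `@type` for the relationship name and `@target` for the pointer on *relatedItem* is an assumption.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace

# The four relationship types named in the abstract.
RELATIONS = {"embodimentOf", "isPartOf", "isAReviewOf", "isASuccessorOf"}


def tei(tag):
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def work(work_id, author, title):
    """A FRBR work, encoded as a TEI <bibl>."""
    bibl = ET.Element(tei("bibl"), {XML_ID: work_id})
    ET.SubElement(bibl, tei("author")).text = author
    ET.SubElement(bibl, tei("title")).text = title
    return bibl


def manifestation(man_id, title, date, relations):
    """A FRBR manifestation, encoded as a TEI <biblStruct>.

    `relations` is a list of (relationship type, target xml:id) pairs,
    each written as <relatedItem type="..." target="#..."/>.
    """
    bs = ET.Element(tei("biblStruct"), {XML_ID: man_id})
    monogr = ET.SubElement(bs, tei("monogr"))
    ET.SubElement(monogr, tei("title")).text = title
    imprint = ET.SubElement(monogr, tei("imprint"))
    ET.SubElement(imprint, tei("date")).text = date
    for rel_type, target in relations:
        if rel_type not in RELATIONS:
            raise ValueError(f"unknown relationship type: {rel_type!r}")
        ET.SubElement(bs, tei("relatedItem"),
                      {"type": rel_type, "target": "#" + target})
    return bs


# Hypothetical records: one work and one manifestation that embodies it.
w = work("work_001", "Haller, Albrecht von", "Example work")
m = manifestation("man_001", "Example work, first edition", "1750",
                  [("embodimentOf", "work_001")])
print(ET.tostring(w, encoding="unicode"))
print(ET.tostring(m, encoding="unicode"))
```

Restricting relationship names to a closed set mirrors the four relationships listed above; in a real project this constraint would more likely live in the schema than in the script.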

Since the TEI Guidelines cover both metadata and data, their vocabulary and syntax can go beyond the representation of texts and facilitate the creation of bibliographic catalogues that group records into “families” based on shared characteristics (e.g. the same content in different languages, or different editions of the same work), and thus enable further browsing and discovery of records.
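To illustrate how such families might be derived, the sketch below groups manifestations by the work they embody. It assumes, hypothetically, that each *biblStruct* points to its work through `<relatedItem type="embodimentOf" target="#work-id"/>`; the sample catalogue is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical catalogue fragment: two manifestations of work w1
# (e.g. the same content in two languages) and one of work w2.
SAMPLE = """
<listBibl xmlns="http://www.tei-c.org/ns/1.0">
  <bibl xml:id="w1"><title>Work one</title></bibl>
  <bibl xml:id="w2"><title>Work two</title></bibl>
  <biblStruct xml:id="m1">
    <monogr><title>Work one (German)</title><imprint><date/></imprint></monogr>
    <relatedItem type="embodimentOf" target="#w1"/>
  </biblStruct>
  <biblStruct xml:id="m2">
    <monogr><title>Work one (French)</title><imprint><date/></imprint></monogr>
    <relatedItem type="embodimentOf" target="#w1"/>
  </biblStruct>
  <biblStruct xml:id="m3">
    <monogr><title>Work two</title><imprint><date/></imprint></monogr>
    <relatedItem type="embodimentOf" target="#w2"/>
  </biblStruct>
</listBibl>
"""


def families(catalogue_xml):
    """Group manifestation ids by the work they embody, following
    relatedItem[@type='embodimentOf']/@target."""
    root = ET.fromstring(catalogue_xml)
    groups = defaultdict(list)
    for bs in root.iterfind(".//tei:biblStruct", NS):
        for rel in bs.iterfind("tei:relatedItem[@type='embodimentOf']", NS):
            work_id = rel.get("target", "").lstrip("#")
            groups[work_id].append(bs.get(XML_ID))
    return dict(groups)


print(families(SAMPLE))  # {'w1': ['m1', 'm2'], 'w2': ['m3']}
```

The same traversal could follow the other relationship types (isPartOf, isAReviewOf, isASuccessorOf) to build further browsing paths.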

Karl Kraus contra …, or: text contra action

K. Prager¹, V. Hannesschlaeger², I. Boerner²

¹Ludwig Boltzmann Institute for Digital History, Austria; ²Austrian Academy of Sciences, Austria

In the project to be presented, the legal papers of the Austrian satirist Karl Kraus (1874-1936) are being edited according to the TEI Guidelines and will be published digitally and contextualized within Kraus’ oeuvre as a whole. Kraus welcomed the reform of the Austrian Press Law of 1922, which marked the beginning of the writer’s growing fondness for litigation. In the same year, Oskar Samek became his lawyer. Over the following 15 years, they were involved in more than 200 court actions together.

The material documenting these actions is the focus of our project. Even though the material’s volume (approx. 8,000 pages) is a challenge in itself, the most demanding aspect of these documents is their heterogeneity: typescripts, manuscripts, pre-printed forms, carbon copies, and receipts are only some of the material types we are working with. In addition to these diverse materialities, the heterogeneous functions of the materials (statements, summons, verdicts, correspondence, etc.) pose a challenge, because the exact function of each document type has to be understood before a document’s qualities can be encoded.

In this paper, we will focus on document characteristics that are not inherent in the text these documents carry, i.e. the documents’ functions in relation to real-world processes such as court actions and the daily procedures of a lawyer’s office. As suggested by Hannesschläger and Andorfer (2019: 8), “the Text Encoding Initiative’s guidelines, while the unquestionably best approach for encoding text inherent phenomena, reach their limits when used for encoding ‘real world phenomena’ related to text genesis”, which is why we are developing a taxonomy in SKOS format to model these processes. We will introduce the project, explain our approach and describe how our SKOS taxonomy is integrated into the TEI documents containing the texts of our edition.
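The abstract does not specify how the SKOS taxonomy is attached to the TEI documents. One common option, assumed here, is to point from an element's `@ana` attribute to a concept URI. The sketch reduces the SKOS concept scheme to a Python dictionary holding only `skos:prefLabel` and `skos:broader`; the base URI, the concept identifiers and the hierarchy are hypothetical (only the document-type names are taken from the abstract).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical namespace for the project's SKOS concept scheme.
BASE = "https://example.org/kraus-legal/taxonomy#"

# A SKOS concept scheme reduced to a dictionary:
# concept id -> (skos:prefLabel, skos:broader concept id or None).
# The hierarchy below is invented for illustration.
TAXONOMY = {
    "document": ("legal document", None),
    "court-document": ("court document", "document"),
    "summons": ("summons", "court-document"),
    "verdict": ("verdict", "court-document"),
    "statement": ("statement", "document"),
}


def broader_chain(concept_id):
    """Return the prefLabels from a concept up to its top concept."""
    labels = []
    while concept_id is not None:
        label, broader = TAXONOMY[concept_id]
        labels.append(label)
        concept_id = broader
    return labels


def document_functions(tei_xml):
    """Map each <div> xml:id to the taxonomy concepts referenced in its @ana."""
    root = ET.fromstring(tei_xml)
    result = {}
    for div in root.iter(TEI + "div"):
        for uri in div.get("ana", "").split():
            if uri.startswith(BASE):
                concept_id = uri[len(BASE):]
                result.setdefault(div.get(XML_ID), []).append(
                    broader_chain(concept_id))
    return result


SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div xml:id="doc_001" ana="https://example.org/kraus-legal/taxonomy#summons"><p/></div>
</body>"""

print(document_functions(SAMPLE))
# {'doc_001': [['summons', 'court document', 'legal document']]}
```

Keeping the hierarchy outside the TEI files means a document only needs the most specific concept; broader categories can be derived when indexing.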

Referencing an editorial ontology from the TEI: An attempt to overcome informal typologies

J. Šimek

Universität Heidelberg, Germany

The introduction of TEI P5 in 2007 was accompanied by efforts to map the contents of TEI documents to high-level conceptual models such as CIDOC CRM. These efforts focused on prosopographical information, connecting the textual content with index metadata. Moreover, a flexible use of the <taxonomy> element was introduced, allowing for ontology-like thesauri that can be referenced by pointers from within an edition.

While these mechanisms for named entities and terms enable powerful indexing, little attention has so far been given to formalizing how editorial and documentary typologies are handled in attributes such as @type, @function and @reason. These typologies refer to document types, textual and editorial phenomena, the processes of text production and text redaction, and similar categories of concepts that characterize the text itself rather than external entities referred to by the textual content.

Attributes like @type do not even permit the use of pointers to formal conceptual definitions, as their expected data type is free text rather than a URI pointer (although the <equiv> element in ODD specifications can map XML components to formal URIs externally).

This paper presents the attempt made at Heidelberg University Library to enable pointers from TEI documents to definitions of editorial phenomena managed in an OWL ontology (“heiEDITIONS Concepts”), in order to replace free-text attributes with URI pointer mechanisms. This strategy makes use of a few TEI attributes such as @ana, whose data type is “teidata.pointer”, and of some additional pointer attributes provided by a schema extension. A “private URI scheme” declared in the TEI header allows the use of abbreviated URI forms.
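In TEI P5, a private URI scheme of this kind is declared with `<prefixDef>` inside `<listPrefixDef>` in the `<encodingDesc>`: `@ident` names the prefix, `@matchPattern` is a regular expression applied to the part of the pointer after the colon, and `@replacementPattern` builds the full URI, with `$1`, `$2`, etc. referring to captured groups. The sketch below expands such abbreviated pointers in Python. The prefix `hec`, the ontology URI and the concept name `Deletion` are placeholders, not actual heiEDITIONS Concepts identifiers, and this sketch requires the pattern to match the whole remainder of the pointer.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical header: the prefix "hec" and the ontology URI are placeholders.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <encodingDesc>
    <listPrefixDef>
      <prefixDef ident="hec" matchPattern="([A-Za-z_]+)"
                 replacementPattern="https://example.org/heiEDITIONS-concepts#$1"/>
    </listPrefixDef>
  </encodingDesc>
</teiHeader>"""


def load_prefixes(header_xml):
    """Collect ident -> (matchPattern, replacementPattern) from <prefixDef>s."""
    root = ET.fromstring(header_xml)
    return {
        pd.get("ident"): (pd.get("matchPattern"), pd.get("replacementPattern"))
        for pd in root.iter(TEI + "prefixDef")
    }


def expand(pointer, prefixes):
    """Expand an abbreviated pointer such as 'hec:Deletion' to a full URI.

    Pointers whose prefix is not declared (e.g. ordinary https URIs)
    are returned unchanged.
    """
    prefix, sep, rest = pointer.partition(":")
    if not sep or prefix not in prefixes:
        return pointer
    match_pattern, replacement = prefixes[prefix]
    m = re.fullmatch(match_pattern, rest)  # whole remainder must match
    if m is None:
        raise ValueError(f"{pointer!r} does not match the pattern for {prefix!r}")
    # TEI writes captured groups as $1, $2, ... in replacementPattern.
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))) or "", replacement)


prefixes = load_prefixes(HEADER)
# @ana may hold several whitespace-separated pointers.
ana = "hec:Deletion https://example.org/other#x"
print([expand(p, prefixes) for p in ana.split()])
# ['https://example.org/heiEDITIONS-concepts#Deletion', 'https://example.org/other#x']
```

Because the expansion rule lives in the header, the ontology's base URI can change without touching every `@ana` value in the edition.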

The goal of this institutional strategy is not only to standardize the TEI encoding adopted by in-house edition projects and cooperative endeavours, but also to make the documentary and editorial terminology used in TEI code transparent.

Text Graph Ontology. A Semantic Web approach to represent genetic scholarly editions

P. Hinkelmanns

Universität Salzburg, Austria

Genetic editions enjoy sustained popularity in scholarly editing and literary studies. Recent examples are the Faust Edition (Bohnenkamp et al. 2016) and the edition of the works of Arthur Schnitzler (Burch et al. 2016), both of which aim at a complete reproduction of the text genesis.

Genetic editions require the reconstruction of complex text-genetic processes: which sequence of tokens forms a specific text state? Extending the model of the Text Encoding Initiative (TEI) with the elements required for genetic editions was the task of a working group that presented its results in a draft (Burnard et al. 2010). Parts of this draft have been incorporated into the TEI Guidelines, and with the TEI model, complex genetic editions can be realized. However, the underlying hierarchical graph structure makes it difficult to reconstruct and compare text gradients, i.e. the evolutionary stages of a text.

The reconstruction of a particular text state can be described as a path through the text. A flexible model of text as a graph can support work with genetic text editions beyond strictly hierarchical graphs. The model makes it possible to describe the relations between tokens and their relative dependencies in text genesis. In addition, the Text Graph Ontology enables the indexing of genetic text editions via the Semantic Web. Alongside the ontology, a converter from and to TEI-XML and a web-based viewer and editor will be presented. Semantic Web technologies make it easy to annotate the genetic scholarly edition and to link it to other resources.
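The idea of a text state as a path through a token graph can be sketched in a few lines of Python. This is a simplified illustration, not the Text Graph Ontology or its converter: tokens are nodes, each edge is labelled with the text states in which one token directly follows another, and reconstructing a state means following its edges from a start node. The class name `TextGraph`, the tokens and the two states `v1` and `v2` are invented.

```python
from collections import defaultdict


class TextGraph:
    """Hypothetical minimal text graph (not the Text Graph Ontology itself).

    Tokens are nodes; an edge src -> dst labelled with a text state
    (e.g. 'v1') says that dst directly follows src in that state.
    """

    START = "START"

    def __init__(self):
        self.tokens = {}                 # token id -> surface form
        self.edges = defaultdict(dict)   # source id -> {state: target id}

    def add_token(self, tid, form):
        self.tokens[tid] = form

    def link(self, src, dst, *states):
        for state in states:
            self.edges[src][state] = dst

    def path(self, state):
        """Follow the edges of one text state from START; return token ids."""
        ids, node = [], self.START
        while state in self.edges.get(node, {}):
            node = self.edges[node][state]
            if node in ids:
                raise ValueError(f"cycle in text state {state!r}")
            ids.append(node)
        return ids

    def render(self, state):
        return " ".join(self.tokens[t] for t in self.path(state))


g = TextGraph()
for tid, form in [("t1", "Der"), ("t2", "Mann"), ("t3", "geht"),
                  ("t4", "alte"), ("t5", "läuft")]:
    g.add_token(tid, form)
g.link("START", "t1", "v1", "v2")   # shared beginning
g.link("t1", "t2", "v1")            # v1: Der Mann
g.link("t1", "t4", "v2")            # v2: Der alte ...
g.link("t4", "t2", "v2")            # ... Mann
g.link("t2", "t3", "v1")            # v1 ends with "geht"
g.link("t2", "t5", "v2")            # v2 ends with "läuft" instead

print(g.render("v1"))  # Der Mann geht
print(g.render("v2"))  # Der alte Mann läuft
# Tokens present in v2 but not in v1:
print(sorted(set(g.path("v2")) - set(g.path("v1"))))  # ['t4', 't5']
```

Because both states share nodes, comparing them reduces to set operations over the token ids on each path, which the abstract argues is difficult with a strictly hierarchical model.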

A Tool for Adding Word Annotations to TEI Files

B. Gaiffe


We regularly face the problem of adding annotations to TEI files. For instance, we had to annotate a large TEI corpus of 5,300 files with part-of-speech tags and lemmas; we also had to annotate words split by line endings in an XML file that had to keep its shape (the lines of the XML transcription have to mirror the lines of the manuscript).

All these use cases can be reduced to inserting annotations taken from a tabular file (one column of which closely matches the textual content of the XML file) into the original XML file so that:

- the text content of the XML file is guaranteed to remain unmodified

- the new annotations are inserted as smoothly as possible.

Regarding the second point, inserting

"2nd NUM"

into the following fragment: <name>Elisabeth the 2<hi rend="sup">nd</hi></name> yields:

<name>Elisabeth the <num>2<hi rend="sup">nd</hi></num></name>
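The insertion behaviour in this example can be approximated with Python's ElementTree. The sketch below is not the presented tool, which parses an acyclic graph with the XML grammar; it only handles the simple case where the annotated span may split text nodes but must contain whole child elements, and it refuses any span that would split an element instead of correcting it. The function name `wrap_span` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _fill(container, items):
    """Write a list of strings and elements into ElementTree's text/tail model."""
    container.text = None
    last = None
    for item in items:
        if isinstance(item, str):
            if last is None:
                container.text = (container.text or "") + item
            else:
                last.tail = (last.tail or "") + item
        else:
            item.tail = None
            container.append(item)
            last = item


def wrap_span(parent, start, end, tag):
    """Wrap characters [start, end) of parent's text content in a new <tag>.

    Text nodes may be split, but child elements must lie entirely inside
    or outside the span; otherwise ValueError is raised and parent is left
    untouched. Zero-length children (e.g. <lb/>) at a span boundary end up
    inside the wrapper.
    """
    original = "".join(parent.itertext())
    # Flatten the parent's direct content into strings and child elements.
    pieces = [parent.text] if parent.text else []
    for child in list(parent):
        pieces.append(child)
        if child.tail:
            pieces.append(child.tail)
    before, inside, after = [], [], []
    pos = 0
    for piece in pieces:
        if isinstance(piece, str):
            n = len(piece)
            cut_start = min(max(start - pos, 0), n)
            cut_end = min(max(end - pos, 0), n)
            for bucket, part in ((before, piece[:cut_start]),
                                 (inside, piece[cut_start:cut_end]),
                                 (after, piece[cut_end:])):
                if part:
                    bucket.append(part)
        else:
            n = len("".join(piece.itertext()))  # excludes the child's tail
            if start <= pos and pos + n <= end:
                inside.append(piece)
            elif pos + n <= start:
                before.append(piece)
            elif pos >= end:
                after.append(piece)
            else:
                raise ValueError(f"span [{start}, {end}) splits <{piece.tag}>")
        pos += n
    # Only now modify the tree: rebuild parent around the new wrapper.
    for child in list(parent):
        parent.remove(child)
    wrapper = ET.Element(tag)
    _fill(wrapper, inside)
    _fill(parent, before + [wrapper] + after)
    # Sanity check: the text content must be exactly what it was.
    assert "".join(parent.itertext()) == original
    return wrapper


name = ET.fromstring('<name>Elisabeth the 2<hi rend="sup">nd</hi></name>')
text = "".join(name.itertext())
start = text.index("2nd")
wrap_span(name, start, start + len("2nd"), "num")
print(ET.tostring(name, encoding="unicode"))
# <name>Elisabeth the <num>2<hi rend="sup">nd</hi></num></name>
```

Working on character offsets of the text content, rather than on the markup, is what makes the "text unmodified" guarantee easy to check after every insertion.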

We will present a tool designed to perform such insertions. It relies on:

- aligning two texts with some tolerance for character approximations (many automatic tools slightly modify the text: for instance, "½" may become "1/2", "œ" may become "oe", "ü" may become "ue", "ß" may become "ss", etc.).

- parsing an acyclic graph with the grammar of the XML language in order to choose an appropriate way to embed the annotations.

- automatically making syntactic corrections when an annotation cannot be embedded as is (think of <a> <b> </a> </b>).
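The tolerant alignment step can be illustrated with a deliberately simple normalise-then-search strategy. It is a swapped-in technique, not the tool's own alignment: both texts are rewritten with a hypothetical substitution table built from the examples above, tokens are searched left to right in the normalised text, and an index map translates each match back to offsets in the original text. Unlike a real aligner, it fails on any modification not listed in the table.

```python
# Hypothetical substitution table built from the examples above:
# original character -> form an annotation tool may output instead.
EQUIVALENTS = {"½": "1/2", "œ": "oe", "ü": "ue", "ß": "ss"}


def normalise(text):
    """Return the normalised text and, for each of its characters,
    the index of the original character it came from."""
    chars, origin = [], []
    for i, ch in enumerate(text):
        rep = EQUIVALENTS.get(ch, ch)
        chars.append(rep)
        origin.extend([i] * len(rep))
    return "".join(chars), origin


def locate_tokens(text, tokens):
    """Map each token from a tagger's output back to (start, end) offsets
    in the original text, searching left to right and tolerating the
    substitutions listed in EQUIVALENTS."""
    norm, origin = normalise(text)
    spans, pos = [], 0
    for tok in tokens:
        ntok, _ = normalise(tok)
        if not ntok:
            raise ValueError("empty token")
        start = norm.find(ntok, pos)
        if start == -1:
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        end = start + len(ntok)
        spans.append((origin[start], origin[end - 1] + 1))
        pos = end
    return spans


text = "Herr Müller aß ½ Brot"
tokens = ["Herr", "Mueller", "ass", "1/2", "Brot"]
spans = locate_tokens(text, tokens)
print([text[s:e] for s, e in spans])  # ['Herr', 'Müller', 'aß', '½', 'Brot']
```

The resulting offsets refer to the original text, so they can be passed directly to an insertion step that works on character positions.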

The tool is still more of a proof of concept (for instance, it produces very few informative error messages). However, it has already been used:

- on a large annotation project (5,300 texts annotated with part-of-speech tags)

- with a range of automatic annotation tools (MElt, Talismane, TreeTagger, and Flair for named entity recognition).

