Conference Agenda


Session Overview
Wed 3b: TEI across corpora, languages, and cultures I
Wednesday, 18/Sep/2019: 1:30pm - 3:00pm

Session Chair: Georg Vogeler, University of Graz
Location: Lecture Hall HS 15.12
RESOWI building, section C, first floor


Growing collections of TEI texts: Some lessons from SARIT

P. McAllister

Institute for the Cultural and Intellectual History of Asia, Austrian Academy of Sciences, Austria

There is no certainty as to the size that a corpus of Indic texts would have, but it is certain that this is due, mainly, to its sheer extent, rather than to other factors like the frequent reappearance of lost works or of important new witnesses to existing ones. Any fancy about the constitution of a corpus of Indic texts, however narrowly defined, is quickly sobered by the recognition that such a project lies far beyond the capacity of a single generation of scholars.

Nevertheless, some attempts towards its realization do exist, as do very different ideas of how such a growing collection should be designed and how it could be maintained. The problems that all such attempts must overcome are not only technical but often also practical. For example, the common expectation that texts in a collection should be consistent, at least in their formal characteristics, can easily conflict with the highly specialized and sometimes changing interests of the scholars producing an edition of a work, or with the fact that individual sections of a single work are often edited by different scholars working with varying methods.

This talk will reflect on the choices made in SARIT, which attempts to provide an environment for an expanding collection of Indic texts. The proposed solution was, and with some modifications still is, to design a repository that, over the very long term, can grow and deepen in such a way that new contributions either improve the existing material or expand the collection with new material, without disrupting the general integrity and basic standards of such a collection.

Towards larger corpora of Indic texts: For now, minimize metatext

H. Trikha

Institute for the Cultural and Intellectual History of Asia, Austrian Academy of Sciences, Austria

The Digital Corpus of Vidyānandin’s works (DCVW) is an ongoing collection of digital text resources for the works of a 10th-century Sanskrit author. The resources are assembled and maintained in the context of my Indological research specialization, i.e., the history of an Indian philosophical tradition. A web interface allows users to access the resources and, to some extent, modify them.

The digital resources are XML files that are processed by a bundle of technologies in order to pursue specific research interests: searching for text strings, identifying dialogic or intertextual elements, finding differences between attestations, etc. In this context, the quality of the results depends on the quality of the resource files, which are assessed by three basic criteria: (a) the status of the separation of text and metatext, (b) the quality of the captured text, and (c) the compliance of the metatext with an established terminology. For the latter I use TEI markup on basically two levels: (1) markup for a precise identification of the attestation of the text and its specific editorial features and (2) markup to enrich the text from the perspective of my own research interests.
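
The separation of text and metatext named in criterion (a), and the string search it supports, can be illustrated with a minimal Python sketch. It is not the DCVW pipeline: the sample document, the choice of `<note>` as the only metatext element, and the function names `transmitted_text`, `body_paragraphs`, and `search` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented placeholder content, not a quotation from any edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Placeholder work</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Hypothetical printed edition</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>first claim <pb n="2"/>continues here<note>editor's remark</note> and ends</p>
      <p>second claim <quote>cited words</quote> follows</p>
    </body>
  </text>
</TEI>"""

# Assumption: only <note> counts as metatext inside the running text.
METATEXT = {TEI + "note"}


def transmitted_text(elem):
    """Return the text under elem, skipping metatext elements but keeping what follows them."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in METATEXT:
            parts.append(transmitted_text(child))
        # The tail is text of the parent that follows the child, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def body_paragraphs(root):
    """Return the whitespace-normalized text of each <p> in the body."""
    body = root.find(f".//{TEI}body")
    return [" ".join(transmitted_text(p).split()) for p in body.iter(TEI + "p")]


def search(root, needle):
    """Return (paragraph number, text) for every body paragraph containing needle."""
    return [(n, t) for n, t in enumerate(body_paragraphs(root), start=1) if needle in t]


root = ET.fromstring(SAMPLE)
print(body_paragraphs(root))
print(search(root, "cited"))
```

In this sketch the header carries the identification of the attestation (level 1), while the search runs only over body text with editorial notes filtered out, so a hit never comes from the metatext.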

The presentation will provide examples of the applied markup. I will argue that the use of tag sets within the first category is certainly an indispensable prerequisite for long-term efforts to build larger and larger corpora of Indic texts. The tag sets within the second category, on the other hand, seem to be of no relevance for this goal. The energy invested in the refinement of technically demanding tag sets is an asset for scholars who are so inclined. In the current state of Digital Indology, however, it is still necessary to develop standards for the discipline as a whole before we can start to agree on the basic ones.

Encoding history in TEI: A corpus-oriented approach for investigating Tibetan historiography

M. Fermer

Institute for the Cultural and Intellectual History of Asia, Austrian Academy of Sciences, Austria

My paper addresses a system for deriving historical data from Tibetan primary sources by applying semantic markup to the texts' key entities (i.e. persons, places, literary works and artefacts). This markup system follows the TEI P5 Guidelines and has been developed in the framework of the Sakya Research web application, which holds a large corpus of machine-readable sources in Classical Tibetan, ranging from medieval annals, chronologies, genealogies and histories to illustrious life stories of single Buddhist masters.

The markup applied to the digital collection has been designed in line with the historiographical nature of the texts: It captures information about historic agents, the places they visited, as well as artefacts and literary works mentioned in varying contexts along the chronological sequence of the individual texts.
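
As a rough illustration of how entity markup of this kind can be turned into countable data, the following Python sketch tallies tagged mentions and person-place co-occurrences per paragraph. It does not reproduce the Sakya Research schema: the element choices (`persName`, `placeName`, `title`, and `rs` with `type="artefact"`), the `ref` identifiers, the placeholder names, and the function names are assumptions, and the sample text is invented.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented placeholder passage; names and identifiers are hypothetical.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p><persName ref="#P001">Master A</persName> travelled to
     <placeName ref="#L001">Monastery X</placeName> and taught
     <title ref="#W001">Treatise T</title>.</p>
  <p><persName ref="#P002">Student B</persName> met
     <persName ref="#P001">Master A</persName> at
     <placeName ref="#L001">Monastery X</placeName>, where a
     <rs type="artefact" ref="#A001">statue</rs> was kept.</p>
</body>"""

# Assumed mapping from TEI elements to the four entity kinds named in the abstract.
KINDS = {
    TEI + "persName": "person",
    TEI + "placeName": "place",
    TEI + "title": "work",
}


def entity_kind(elem):
    """Return the entity kind encoded by elem, or None if it is not an entity."""
    if elem.tag == TEI + "rs" and elem.get("type") == "artefact":
        return "artefact"
    return KINDS.get(elem.tag)


def mention_counts(root):
    """Count mentions per (kind, ref) across the whole text."""
    counts = Counter()
    for elem in root.iter():
        kind, ref = entity_kind(elem), elem.get("ref")
        if kind and ref:
            counts[(kind, ref)] += 1
    return counts


def person_place_counts(root):
    """Count paragraphs in which a given person and place are mentioned together."""
    counts = Counter()
    for p in root.iter(TEI + "p"):
        persons = {e.get("ref") for e in p.iter(TEI + "persName") if e.get("ref")}
        places = {e.get("ref") for e in p.iter(TEI + "placeName") if e.get("ref")}
        for person in persons:
            for place in places:
                counts[(person, place)] += 1
    return counts


root = ET.fromstring(SAMPLE)
print(mention_counts(root))
print(person_place_counts(root))
```

Counting by `ref` rather than by surface string is the design choice that matters here: the same master can appear under different names or titles across texts, and a shared identifier is what lets mentions be aggregated beyond a single source.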

Using TEI markup in this way has proven particularly useful in my own research for depicting the social, geographical, artistic and doctrinal contexts of the texts' narrative subjects (and their authors). It allowed for tracking complex teacher-student relationships and exploring the geographic expanse of those masters' regional networks, to give two examples of how the empirical evidence from encoded texts can be assessed.

I will address the concept behind this markup and its potential for a quantitative, intertextual analysis that goes beyond single texts. What can Tibetan historians gain from markup technology if it is systematically applied to a wider corpus of literature?

I will argue that the data derived from a consistent encoding of primary literature on a large scale will gradually change our understanding of Tibetan history and historiography as a whole. At the same time, such a corpus-oriented approach to history raises a number of conceptual and practical questions about how, and in which form, encoded information can best be stored, analysed, displayed and reused.

Conference: TEI 2019