Download
1. Source files
The edition is encoded in XML and follows the TEI P5 Guidelines:
- TEI text: edition.xml
- TEI header, converted to HTML with this style sheet
Lemma descriptions and linguistic annotations are currently stored in a set of ad hoc XML files, formally described and documented in corresponding RELAX NG schema files:
- grammar.xml [schema]: POS tags, inflection classes, morphological tags
- lemmas.xml [schema]: lemma data, linked to Streitberg's dictionary
- tokens.xml [schema]: token data, linked to the TEI text, lemmas and items in the grammar file
- dictionary.xml [schema]: partial XML transcription of Streitberg's dictionary
Data files and schemas are also available in one ZIP archive (version 2026-06-25).
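As a rough illustration of how the token file can be joined to the lemma file, here is a minimal Python sketch using only the standard library. The element names (`lemma`, `token`), the attribute names (`id`, `form`, `lemma`) and the sample forms are assumptions made for this example; the actual structure is defined by the RELAX NG schemas listed above and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-ins for lemmas.xml and tokens.xml.
# Element and attribute names are assumptions, not the project's schema.
LEMMAS_XML = """
<lemmas>
  <lemma id="l1" form="qiman"/>
  <lemma id="l2" form="guþ"/>
</lemmas>
"""

TOKENS_XML = """
<tokens>
  <token id="t1" form="qam" lemma="l1"/>
  <token id="t2" form="guda" lemma="l2"/>
  <token id="t3" form="qemun" lemma="l1"/>
</tokens>
"""


def index_lemmas(lemmas_root):
    """Map each lemma id to its element for fast lookup."""
    return {lemma.get("id"): lemma for lemma in lemmas_root.iter("lemma")}


def tokens_by_lemma(tokens_root, lemma_index):
    """Group token forms under the form of the lemma they point to."""
    grouped = {}
    for token in tokens_root.iter("token"):
        lemma = lemma_index.get(token.get("lemma"))
        if lemma is None:
            continue  # skip tokens whose lemma reference cannot be resolved
        grouped.setdefault(lemma.get("form"), []).append(token.get("form"))
    return grouped


lemma_index = index_lemmas(ET.fromstring(LEMMAS_XML))
grouped = tokens_by_lemma(ET.fromstring(TOKENS_XML), lemma_index)
for lemma_form, token_forms in grouped.items():
    print(lemma_form, token_forms)
```

With the real files, the inline strings would be replaced by `ET.parse(...)` calls on the downloaded documents, and the names adjusted to match the schemas.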
Copyright status for data on the website is described here.
⁂
Dynamic pages on the website are generated from the data files with BaseX, an open-source database platform, and Saxon-HE, an open-source XSLT processor. (Both are also available in commercial editions or with paid support; this digital humanities project uses only the open-source versions.)
The data files were created in June 2026 and are subject to change, if only because the automatically generated annotations are still being disambiguated. A copy of the full database will be available later this year, once data and code are stable.
From 2004 to 2025, data on the website was served from a server-side MS Access database. The database was last edited on 2026-05-19 and is no longer updated, but it is still available in archived form:
- gotica.mdb.zip (*.mdb file)
- gotica.mdb.export.zip (XML export with schema, generated by Access)
- documentation: relational structure and some older notes on the dictionary
2. Derived formats
- Plain text (generated from the TEI document with this style sheet):
- JSON data for the client-side search engine:
- HTML 4.01†:
† This file is not derived from the master TEI document; it is a (very) old HTML file that predates the TEI edition.
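The plain-text version listed above is produced with an XSLT style sheet. For readers without an XSLT processor, the same idea can be approximated in Python. This is only a sketch under stated assumptions: the TEI namespace is the standard P5 one, but the use of `ab` elements as text blocks and the sample content are hypothetical and may not match edition.xml.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace

# Hypothetical miniature TEI document; the real edition.xml is structured
# according to the project's own encoding choices.
SAMPLE_TEI = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <ab n="1">jah   qam
      in garda</ab>
    <ab n="2">jah qeþun</ab>
  </body></text>
</TEI>
"""


def tei_to_plain_text(root):
    """Return one line of whitespace-normalized text per <ab> block."""
    text_element = root.find(f"{{{TEI_NS}}}text")  # skips the teiHeader
    lines = []
    for block in text_element.iter(f"{{{TEI_NS}}}ab"):
        raw = "".join(block.itertext())   # text content without markup
        lines.append(" ".join(raw.split()))  # collapse runs of whitespace
    return "\n".join(lines)


plain = tei_to_plain_text(ET.fromstring(SAMPLE_TEI))
print(plain)
```

Starting from the `text` element keeps header metadata out of the output. The block element name would need to be changed to whatever the edition actually uses.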