Download

1. Source files

The edition is encoded in XML and follows the TEI P5 Guidelines:

Lemma descriptions and linguistic annotations are currently stored in a set of ad hoc XML files, formally described and documented in corresponding RELAX NG schema files:

Data files and schemas are also available in one ZIP archive (version 2026-06-25).
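As a quick sanity check after downloading, the XML members of the archive can be tested for well-formedness without unpacking it. The following is a minimal sketch using only the Python standard library; the function name `summarize_archive` and the archive name `data.zip` are hypothetical, and the sketch assumes that data files end in `.xml` while schema files use another extension. It checks well-formedness only and does not validate against the RELAX NG schemas.

```python
import zipfile
import xml.etree.ElementTree as ET


def summarize_archive(source):
    """Map each .xml member of a ZIP archive to (root tag, element count).

    `source` is a path or a binary file-like object holding the archive.
    Members that are not well-formed XML map to None.
    """
    summary = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # assumption: schemas and other files use other extensions
            with archive.open(name) as member:
                try:
                    root = ET.parse(member).getroot()
                except ET.ParseError:
                    summary[name] = None  # not well-formed
                    continue
            # root.iter() yields the root element and all its descendants
            summary[name] = (root.tag, sum(1 for _ in root.iter()))
    return summary


if __name__ == "__main__":
    # "data.zip" is a placeholder for wherever the archive was saved.
    for name, info in summarize_archive("data.zip").items():
        print(name, info if info is not None else "NOT WELL-FORMED")
```

For a TEI P5 file the root tag should be reported in the TEI namespace, that is, as `{http://www.tei-c.org/ns/1.0}TEI`.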

Copyright status for data on the website is described here.

Dynamic pages on the website are generated from the data files with BaseX, an open-source database platform, and Saxon-HE, an open-source XSLT processor. (Both are also available commercially or with paid support; for this digital humanities project we stick to open-source tools.)

The data files were created in June 2026 and are subject to change, if only because the automatically generated annotations are still being disambiguated. A copy of the full database will be available later this year, once data and code are stable.

From 2004 to 2025, data on the website was served from a server-side MS Access database. The database was last edited on 2026-05-19 and is no longer updated, but it is still available in archived form:

2. Derived formats

This file is derived not from the master TEI document but from a (very) old HTML file that predates the TEI edition.