Introduction

Overview

This section of the website presents a digital edition of the Gothic Bible.

Read the tagged text with selected interlinear translations.
Click on words for lexical and morphosyntactic information.
Search the text using regular expressions.
Browse the dictionary.

Other resources:

Facsimile of Streitberg's Gotisches Elementarbuch (1920).
Facsimile of Streitberg's Gotisch-Griechisch-Deutsches Wörterbuch (1910) with preliminary text transcription.
TEI P5 edition and derived files.
Formal model of Gothic inflectional morphology (described below).
Information on the manuscripts, and a few pointers to other sites.

Source

The digital text is based on the edition of Wilhelm Streitberg:

Die gotische Bibel: Herausgegeben von Wilhelm Streitberg. (Germanische Bibliothek, 2. Abteilung, 3. Band)

1. Teil: Der gotische Text und seine griechische Vorlage. Mit Einleitung, Lesarten und Quellennachweisen sowie den kleineren Denkmälern als Anhang. Heidelberg: Carl Winter, 1919.

2. Teil: Gotisch-griechisch-deutsches Wörterbuch. Heidelberg: Carl Winter, 1910.

To avoid possible copyright issues, we used the 1919 edition. The Speyer fragment, discovered in 1970, is cited from Piergiuseppe Scardigli's Nachtrag zum ersten Band in the latest edition (2000), which is essentially a reprint. Differences between the readings in the 1919 and 2000 editions are listed below. The latest edition (ISBN: 3-8253-0745-X and 3-8253-0746-8) can be ordered from Universitätsverlag WINTER. Copies of earlier editions can generally be found at Abebooks or Zentrales Verzeichnis Antiquarischer Bücher.

Although it is not error-free, Streitberg's work is generally considered the standard edition; cf. James Marchand, WEMSK: This is the standard edition and has superseded all others. Its ‘reconstruction’ of the Greek Vorlage is seriously flawed by his dependence on von Soden, and he is given to conjectures, but this is the text you must use.

Magnús Snædal's indispensable Concordance to Biblical Gothic (Reykjavík 1998, 2005, 2013) includes the texts with numerous emendations and corrections to Streitberg's readings; unfortunately, it is currently out of print. Some of the minor fragments at Christian Petersen's website are based on his readings. Sadly, M. Snædal passed away in 2017.

The text was digitized in 1997 by Robert Tannert, David Landau and Tom De Herdt and has been thoroughly checked and proofread. In 2002, the reliability of the transcription was assessed by an automated collation with the electronic edition provided by the TITUS project. A simple script aligned the texts and compared them byte for byte, reporting about 800 differences. These were individually compared with printed copies of Streitberg's 1919 and 1965 editions, and corrected where necessary. Since both texts were independently digitized using different methods (scanning vs. typing), the chances that the same error would occur at the same location seem quite small, and the comparison should reveal most errors on both sides.
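The collation step can be illustrated with a short Python sketch. It is not the script used in 2002: it aligns two transcriptions word by word with the standard difflib module instead of comparing bytes, the function name collate is made up, and the second sample string contains an invented typing error.

```python
from difflib import SequenceMatcher


def collate(text_a, text_b):
    """Align two transcriptions word by word and return the differing spans."""
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep the kind of difference and the words involved on each side.
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


ours = "jah birodeins mikila bi ina was"
theirs = "jah birodeius mikila bi ina was"  # invented typo, for illustration only

for difference in collate(ours, theirs):
    print(difference)
```

Each reported difference still has to be checked against the printed edition, since the alignment alone cannot tell which of the two transcriptions is wrong.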

The current edition is an accurate transcription of Streitberg's 1919 text. Only a very small number of obvious printing errors have been corrected; most of them are mentioned in De Tollenaere 1976, others in Streitberg's Berichtigungen. (Some of the typos still appear in the 2000 edition.) Corrections are not marked in the text but listed in the front matter of the TEI edition. A copy of the list is provided below.

TEI encoding

The text is encoded in XML and follows the guidelines of the Text Encoding Initiative (TEI P5). We refer to the TEI header for detailed information. The file contains a complete transcription of the Gothic text in Streitberg's edition: it does not include the preface, introduction, Greek text, commentary, or information on the minor fragments, nor is it intended to reproduce the appearance of the original work. The project's primary goal is a linguistically annotated text based on Streitberg's readings, not a reproduction of his book. There are, however, plans to digitize Streitberg's critical apparatus, with references to more recent readings.

Conversion to TEI turned out to be surprisingly complicated, mainly because there are at least three overlapping layers of annotation: the logical structure of the canonical reference system (books, chapters, verses), the linguistic structure (sentences, clauses, words, morphemes; linked to part-of-speech tags and lemmas), and textual criticism (unclear or missing text, conjectures by Streitberg and his predecessors, variations between different witnesses). Ideally, one would also like to encode manuscript details, adding yet another level of markup: leaf, side, line, hand, marginal glosses, corrections made by the scribe, etc.

It is non-trivial to combine these levels into a single manageable and reasonably elegant data structure. Biblical verses do not always correspond to sentences, unclear readings often cross word boundaries (strictly speaking, Gothic does not even have word boundaries, since the manuscripts are written in scriptio continua), page and line breaks can occur virtually anywhere, etc.

The different layers form intersecting hierarchies that cannot easily be represented in a single XML document tree. This common problem—along with some standard solutions that inevitably boil down to trade-offs among various sets of advantages and disadvantages—is described in chapter 21 of the current TEI P5 Guidelines: Non-hierarchical Structures; see also Renear/Mylonas/Durand 1996, Durusau/O'Donnell 2001, among others.

We tried to pragmatically avoid the problem by:

Focusing on Streitberg's edition rather than a transcription of the manuscripts. A lemmatized and POS-tagged edition of a 6th-century copy of a 4th-century text will inevitably require a certain amount of abstraction and reconstruction. In that case, it probably makes sense to start from Streitberg's semi-critical edition rather than a strictly diplomatic edition like Uppström's work or the text files of the Codex Argenteus prepared by David Landau. The ideal, of course, would be to present them side by side, providing both philological accuracy and linguistic abstraction without forcing both levels into a single encoding.
Using the parallel segmentation method to record all readings in extenso, even when they are identical (cf. Birnbaum 1999). This mirrors Streitberg's decision to transcribe the full text of all witnesses. Variants have been marked up with a generic segmentation element.
Making sure that Streitberg's additions and deletions do not cross word boundaries, i.e. either contain one or more complete tokens (e.g. John 7:12 CA: “jah birodeins mikila <bi ina> was”) or are entirely contained within a single token (e.g. Timothy I 3:4 A: “ufhausjan[jan]dona”). As mentioned above, unclear text generally does cross word boundaries. This is circumvented by applying the so-called fragmentation technique, breaking what might be considered a single logical (but non-nesting) element into multiple smaller structural elements that fit within the dominant hierarchy but can be reconstituted virtually [TEI P5, 21.3], in other words converting overlapping sequences like “i{n him}inam” to “i{n} {him}inam”. As Streitberg uses italics to mark unclear spans of text, fragmentation is invisible.
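The fragmentation step can be sketched in a few lines of Python. This is an illustration only, not the project's conversion code: it assumes that unclear text is marked with curly braces as in the example above, and the function name fragment is hypothetical.

```python
import re


def fragment(text):
    """Split every {...} span at whitespace so that no span crosses a token boundary."""

    def split_span(match):
        # Close the span before each run of whitespace and reopen it afterwards.
        inner = re.sub(r"(\s+)", r"}\1{", match.group(1))
        # Drop empty spans that arise when the original span began or ended with a space.
        return ("{" + inner + "}").replace("{}", "")

    return re.sub(r"\{([^{}]*)\}", split_span, text)


print(fragment("i{n him}inam"))
```

Because the fragments stay adjacent, the original span can be reconstituted by merging neighbouring fragments that are separated only by whitespace.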

Linguistic annotations

At present, the TEI document does not yet contain POS tags or lexical information. The linguistic annotations are stored in a relational database, implicitly linked to the TEI text by means of corresponding token identifiers. The database contains a digital dictionary based on Streitberg 1910, a table of tokens, and a set of morphosyntactic tags. A program then tried to associate each token with one or more lemma/POS combinations, representing its possible lexical identity and grammatical category.
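As a rough illustration of such a layout, the following sketch builds a tiny in-memory SQLite database. All table names, column names, identifiers and tag labels are assumptions made for this example; the actual schema is not documented here.

```python
import sqlite3


def build_demo_db():
    """Create a minimal dictionary/token/analysis layout (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE lemma (id INTEGER PRIMARY KEY, form TEXT, pos TEXT);
        -- token.id is assumed to mirror the token identifier used in the TEI file
        CREATE TABLE token (id TEXT PRIMARY KEY, form TEXT);
        CREATE TABLE analysis (
            token_id TEXT REFERENCES token(id),
            lemma_id INTEGER REFERENCES lemma(id),
            tag TEXT
        );
    """)
    # Illustrative rows: one token with two candidate lemma/POS combinations.
    con.executemany("INSERT INTO lemma VALUES (?, ?, ?)",
                    [(1, "is", "pronoun"), (2, "itan", "verb")])
    con.execute("INSERT INTO token VALUES (?, ?)", ("w1", "ita"))
    con.executemany("INSERT INTO analysis VALUES (?, ?, ?)",
                    [("w1", 1, "neuter"), ("w1", 2, "1st sg. present")])
    return con


def analyses(con, token_id):
    """Return all (lemma, pos, tag) candidates recorded for one token."""
    rows = con.execute(
        """SELECT lemma.form, lemma.pos, analysis.tag
           FROM analysis JOIN lemma ON lemma.id = analysis.lemma_id
           WHERE analysis.token_id = ?
           ORDER BY lemma.id""",
        (token_id,),
    )
    return rows.fetchall()


con = build_demo_db()
print(analyses(con, "w1"))
```

Keeping the candidate analyses in a separate table allows a token to carry several lemma/POS combinations until it has been disambiguated.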

Since Gothic is a non-productive language with few extant texts, the corpus has been tagged by generating paradigms for every entry in the dictionary, in other words by building a lookup table of possible forms (±3600 lemmas yield ±250,000 inflected forms). This ‘naïve’ method worked reasonably well, mainly due to the relatively low degree of syncretism in Gothic inflectional morphology.
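The lookup-table idea can be sketched as follows. The example is deliberately tiny and not the project's code: it covers a single lemma and only the three paradigm slots (NS, AS, VS) that are spelled out in the u-stem example further down, and the names PARADIGMS and build_lookup are made up.

```python
from collections import defaultdict

# Generated paradigm for one lemma: slot -> set of forms (illustrative subset).
PARADIGMS = {
    "sunus": {"NS": {"sunus"}, "AS": {"sunu"}, "VS": {"sunau", "sunu"}},
}


def build_lookup(paradigms):
    """Invert lemma -> slot -> forms into form -> {(lemma, slot), ...}."""
    table = defaultdict(set)
    for lemma, slots in paradigms.items():
        for slot, forms in slots.items():
            for form in forms:
                table[form].add((lemma, slot))
    return dict(table)


lookup = build_lookup(PARADIGMS)
print(sorted(lookup["sunu"]))  # every slot that can produce the form "sunu"
```

Tagging a token then amounts to a dictionary lookup; a form that maps to more than one entry is ambiguous and needs further treatment.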

About 58% of the tokens in the Gothic Bible could be unambiguously linked to one lemma and one POS tag. Most of the remaining tokens could be lemmatized, but were morphologically ambiguous (e.g. nominative and accusative forms of neuter nouns). A small number of forms turned out to be lexically ambiguous as well: the word ita, for instance, can be a neuter pronoun (‘it’) or a verb (‘I eat’); the same applies to the very frequent form im: ‘(to) them’ or ‘I am’. The correct analysis can only be determined by looking at the context. Given the small size of the corpus (±67,400 tokens), we decided to disambiguate the data manually rather than write a full-fledged statistical or rule-based parser.
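The three categories distinguished here can be expressed as a small classification function. This is a sketch under assumptions: the candidate analyses, lemma assignments and tag labels below are illustrative and not taken from the database.

```python
from collections import Counter


def classify(analyses):
    """Classify one token by its set of (lemma, tag) candidate analyses."""
    if not analyses:
        return "unanalysed"
    lemmas = {lemma for lemma, _ in analyses}
    if len(lemmas) > 1:
        return "lexically ambiguous"        # more than one possible lemma
    if len(analyses) > 1:
        return "morphologically ambiguous"  # one lemma, several tags
    return "unambiguous"


# Hypothetical candidate analyses for three forms.
candidates = {
    "im": {("is", "pronoun, dative pl."), ("wisan", "verb, 1st sg. present")},
    "waurd": {("waurd", "noun, nominative sg."), ("waurd", "noun, accusative sg.")},
    "sunus": {("sunus", "noun, nominative sg.")},
}

summary = Counter(classify(a) for a in candidates.values())
print(summary)
```

A count of this kind over the whole corpus is what yields figures such as the share of unambiguous tokens.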

In the online editions, ambiguous forms are marked in orange. As soon as the automatically assigned annotations have all been verified, they will be added to the TEI file. The dictionary entries are based on Streitberg's Gotisch-griechisch-deutsches Wörterbuch (1910).

In order to generate the Gothic lexicon, we developed an application that reads a formal description of inflectional morphology (Gomorphv2, named after a C++ prototype that hard-coded Gothic morphology). The syntax is conceptually similar to MathML, a standard mathematical markup language defined by the W3C, and can in principle be used for any inflected language.

The model is based on inheritance: morphological classes can be derived from other classes, adding new rules or overriding rules defined in the parent class, e.g. ‘noun’ > ‘a-stems’ > ‘ja-stems’ > ‘Mja’ for the class of short masculine ja-stems in Gothic. (See Daelemans, Gazdar & De Smedt 1992 for an overview of inheritance in Natural Language Processing.) Morphological classes are defined by expressions involving parameters (e.g. Lemma), variables (e.g. Root, Suffix), functions (e.g. Umlaut), and two operators, ‘concatenation’ and ‘union’ (basically, each expression defines a regular language without using Kleene star). There is only one data type: a set of strings, which makes it easier to handle spelling variations or alternative forms. Functions are regular-expression substitutions that operate on each element of a string set. Finally, each expression has a specified range, i.e. it applies to a given subset of the entire paradigm (allowing us, for instance, to apply a function ‘Ablaut()’ to a variable ‘Root’ in the preterite only).

Here are a few examples taken from the definition of masculine u-stems in Gothic, written in pseudo-code:

	parameter Lemma = "sunus" [i.e. the default value]
	function DeriveRoot(): replace /us$/ with "" [i.e. strip final -us]
	function Phonology(): ... [normally inherited]
	variable Root(*) = DeriveRoot(Lemma)
	variable Form(*) = Phonology(Root • Suffix)
	variable Suffix(NS) = {"us"}
	variable Suffix(AS) = {"u"}
	...
	variable Suffix(VS) = {"au", "u"}
	...

... and in the Gomorphv2 format (somewhat simplified):

	<class name="Mu" description="Masculine u-stems" inherits="_uStems">
	  <parameters>
	    <parameter name="Lemma" default="sunus"/>
	  </parameters>
	  <functions>
	    <function name="DeriveRoot">
	      <rgx pattern="us$" replace=""/>
	    </function>
	    <!-- function Phonology inherited from parent class -->
	  </functions>
	  <paradigm>
	  <!-- variables Form and Root would normally be inherited from the parent class
	          but are included here for illustration: -->
	    <variable name="Form">
	      <assign range="*">
	        <apply-function name="Phonology">
	          <concatenation>
	            <var name="Root"/>
	            <var name="Suffix"/>
	          </concatenation>
	        </apply-function>
	      </assign>
	    </variable>
	    <variable name="Root">
	      <assign range="*">
	        <apply-function name="DeriveRoot">
	          <param name="Lemma"/>
	        </apply-function>
	      </assign>
	    </variable>
	    <variable name="Suffix">
	      <assign>
	        <list>
	          <literal value="us"/>
	          <literal value="u"/>
	          <literal value="au"/>
	          <literal value="aus"/>
	          <literal value="au|u" type="expression"/>
	          <literal value="jus"/>
	          <literal value="uns"/>
	          <literal value="um"/>
	          <literal value="iwe"/>
	          <null/>
	        </list>
	      </assign>
	    </variable>
	  </paradigm>
	</class>

The XML notation is admittedly verbose, but offers readily available parsers, validation, auto-completing editors with syntax highlighting, and conversion to other formats using XSLT. In the current implementation (a prototype written in Visual Basic), the XML specification is directly interpreted by a program that generates paradigms based on parameters supplied by the user or stored in a database. A more interesting approach would be to compile the specification, for instance by translating the morphological classes into Java classes or similar code.
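To give an impression of what such a compiled form might look like, here is a minimal sketch in Python rather than Java. It is not generated from Gothic.xml: the class UStems stands in for the parent class, the Phonology function is a placeholder that changes nothing, and only the three suffix slots shown in the pseudo-code are included.

```python
import re


def concat(left, right):
    """Concatenation on string sets: every left element joined to every right element."""
    return {l + r for l in left for r in right}


def substitute(pattern, replacement, strings):
    """A model 'function': a regex substitution applied to each element of a string set."""
    return {re.sub(pattern, replacement, s) for s in strings}


class UStems:
    """Parent class: defines Form and Root for all ranges (paradigm slots)."""

    suffixes = {}  # slot -> set of suffixes, supplied by subclasses

    def __init__(self, lemma):
        self.lemma = {lemma}  # the only data type is a set of strings

    def derive_root(self, strings):
        return strings  # overridden in subclasses

    def phonology(self, strings):
        return strings  # placeholder: the real rules are not reproduced here

    def root(self, slot):
        return self.derive_root(self.lemma)

    def form(self, slot):
        return self.phonology(concat(self.root(slot), self.suffixes[slot]))


class Mu(UStems):
    """Masculine u-stems: override root derivation and supply the suffixes."""

    suffixes = {"NS": {"us"}, "AS": {"u"}, "VS": {"au"} | {"u"}}  # union of alternatives

    def derive_root(self, strings):
        return substitute(r"us$", "", strings)  # strip final -us


print(sorted(Mu("sunus").form("VS")))
```

Inheritance in the host language takes over the role of the inherits attribute, and alternative forms come for free because every expression evaluates to a set.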

The specification of Gothic inflectional morphology can be downloaded (Gothic.xml) or browsed online in HTML, generated from the XML source using this stylesheet.

Digital transcriptions

As far as we know, the Gothic Bible (or more precisely, Streitberg's edition) has been digitized independently at least six times:

The introduction to De Tollenaere & Jones 1976 states: The original computer corpus for this work was punched in 1962 at the IBM Research Center in Yorktown Heights, New York, under the direction of Philip H. Smith, Jr. [...] The text was later updated according to the fifth edition of Streitberg (1965) and expanded at the Leiden Institute for Netherlandic Lexicology to include a new version of the Skeireins together with all available biblical and non-biblical texts in early Gothic.
In January 2003, Dr. F. De Tollenaere (1912–2009) wrote to me that the resulting tape might still be available at the Institute for Dutch Lexicology (INL) in Leiden.
A footnote in the same introduction mentions: Computer texts of Gothic have also been prepared in the past few years by William Estabrook and James W. Marchand. Estabrook has produced a word index and a reverse word list, Marchand a grammatical concordance. None of these, however, has yet been published.
See Marchand 1987: I have on my “little” 1984 issue AT the entire Greek New Testament, the King James Bible, and the Gothic Bible, with plenty of room left over for the software to interrogate them and I entered this text on punched cards in 1960, with a grammatical analysis of each word (my emphasis; cited from Christian Petersen: Gotica Minora, SYLLABUS-Verlag, Hanau 2002).
Wolfgang Griepentrog for the TITUS project in 1986–1988.
Magnús Snædal's Concordance to Biblical Gothic (1998) is obviously based on a digital transcription. The preface to the first edition mentions: work on the concordance began in 1991. According to the preface to the third edition (2013), only a very limited number of copies of that edition were printed; otherwise, distribution is electronic.
Ljuba Veselinova (Stockholm University, Department of Linguistics) scanned the Gospels. In an e-mail from 2003-11-09: I worked on Gothic many years ago and scanned the New Testament texts, intending to design a tagged corpus.
The text available here, digitized by R. Tannert, D. Landau and myself in 1997. I was not aware of other versions at the time.

Deviations from Streitberg's 1919 edition

A small number of obvious printing errors in the source text have been corrected. Most of them are mentioned in De Tollenaere, Felicien: Word-Indices and Word-Lists to the Gothic Bible and Minor Fragments, Leiden: E. J. Brill, 1976, others in Streitberg's Berichtigungen.

[1] Matthew 8:14 (CA): [jah] gasaƕ → jah gasaƕ, <jah> in heitom → in heitom: Streitberg 1919, Berichtigungen: S. 13 M 8,14 ist die Überlieferung beizubehalten: jah qimands Iesus in garda Paitraus jah gasaƕ swaihron is ligandein in heitom, vgl. E. A. Kock Kontinentalgermanische Streifzüge (Lund und Leipzig 1919) S. 1. Die Intonation bestätigt die Ursprünglichkeit der überlieferten Fassung.
[2] Matthew 10:36 (CA): innakundai is → innakundai is.: Missing period in the 1919 edition (corrected in the 2000 edition).
[3] John 6:22 (CA): þatai → þatei: De Tollenaere 1976: manuscript and Streitberg's first [1908] edition.
[4] John 6:22 (CA): sainaim → seinaim: Streitberg 1919, Berichtigungen: S. 33 J 6,22 lies siponjam seinaim statt sainaim.
[5] Luke 3:31 (CA): sanaus → sunaus: De Tollenaere 1976: manuscript.
[6] Luke 9:32 (CA): geseƕun → gaseƕun: De Tollenaere 1976: manuscript.
[7] Mark 4:39 (CA): da → du: De Tollenaere 1976: manuscript.
[8] Mark 12:36 (CA): fotaubaurd → fotubaurd: De Tollenaere 1976: manuscript. Streitberg's glossary has the correct form fotubaurd (Streitberg 1910, p. 36).
[9] Romans 9:27 (A): Iaraelis → Israelis: De Tollenaere 1976: manuscript.
[10] Corinthians I 7:5 (A): þaþroh~þan → þaþroþ~þan: Streitberg 1919, Anhang: K 7,5. þaþroþ|þan, nicht þaþroh|þan.; Streitberg 1919, Berichtigungen: S. 255 K 7,5 lies þaþroþ-þan statt þaþroh-þan.
[11] Colossians 1:21 (A): waurstwam ubilaim, 22 iþ nu gafriþodai → waurstwam ubilaim, iþ nu gafriþodai: Incorrect verse separation. De Tollenaere 1976: cf. B and verse separation in the Greek text.
[12] Timothy I 5:25 (A): þo[ei] → þoei: Streitberg 1919, Berichtigungen: S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.
[13] Timothy I 5:25 (B): þo[ei] → þoei: Streitberg 1919, Berichtigungen: S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.
[14] Titus 1:5 (B): [in þize] → in þize: Streitberg 1919, Berichtigungen: Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.
[15] Philemon 1:14 (A): sawswe → swaswe: De Tollenaere 1976: manuscript.
[16] Skeireins 5:2 (E): anþaranuhþan → anþaranuh þan, (a)nþaranuhþan → (a)nþaranuh þan: De Tollenaere 1976.
[17] Skeireins 6:6 (E): sumanuhþan → sumanuh þan, sumanuhþan → sumanuh þan: De Tollenaere 1976.

Differences between Streitberg's 1919 and 2000 editions

Listed below are differences between Streitberg's 1919 and 2000 editions. The list was compiled by collating the electronic edition with the text provided by the TITUS project, which is based on Streitberg's 1965 edition; it is probably not complete.

[1] Matthew 7:16 (CA): different interpretation

1919: lisand[a]
2000: lisanda

[2] Matthew 9:32 (CA): different segmentation

1919: utusiddjedun
2000: ut usiddjedun

Misprint or deliberate correction? If it was a deliberate correction, it seems inconsistent: cf. the verb innatgaggan (e.g. Luke 7:45 [CA]: innatiddja).

[3] Matthew 10:24 (CA): different punctuation

1919: laisarja nih
2000: laisarja, nih

[4] Matthew 10:36 (CA): different punctuation

1919: innakundai is
2000: innakundai is.

Period is missing in the 1919 edition. Corrected in this edition.

[5] John 6:22 (CA): correction in 2000

1919: sainaim
2000: seinaim

Streitberg 1919, Berichtigungen: S. 33 J 6,22 lies siponjam seinaim statt sainaim.

[6] John 11:18 (CA): different interpretation

1919: Iairusaulwmi[a]m
2000: Iairusaulwmiam

[7] John 12:14 (CA): different interpretation

1919: <jah> gasat ana ina
2000: gasat ana ina

[8] John 15:13 (CA): different interpretation

1919: friaþwa[i]
2000: friaþwai

[9] John 18:28 (CA): different interpretation

1919: praitoria<un>
2000: praitoria

[10] Luke 2:37 (CA): different interpretation

1919: widuwo <swe> jere
2000: widuwo jere

[11] Luke 3:19 (CA): different punctuation

1919: Herodes.
2000: Herodes,

[12] Luke 8:14 (CA): different interpretation

1919: þai[ei]
2000: þaiei

[13] Luke 9:37 (CA): different interpretation

1919: <afar>daga
2000: daga

[14] Luke 14:28 (CA): different interpretation

1919: habaiu <þo> du ustiuhan
2000: habaiu du ustiuhan

[15] Luke 18:38 (CA): misprint in 2000

1919: ubuhwopida
2000: ubuƕopida

De Tollenaere 1976: manuscript and Streitberg's second edition.

[16] Luke 18:39 (CA): different interpretation

1919: faur<a>gaggandans
2000: faurgaggandans

Streitberg 1919, Anhang: L 18,39. faurgaggandans CA, fauragaggandans GL [Gabelentz-Löbe]. Dieses ist intonationsgemäß und entspricht der Lesart προάγοντες; faurgaggandans könnte durch παράγοντες beeinflußt oder durch faurgaggandein· διαπορευομένου (V. 36) hervorgerufen sein.

[17] Luke 20:42 (CA): different interpretation

1919: psalmo<no>
2000: psalmo

Streitberg 1919, Anhang: L 20,42. psalmono für psalmo CA wird durch die Intonation gefordert. Die got. Flexion des Fremdworts ist wie so häufig vom Dativ Sg. ausgegangen, vgl. Akk. Sg. psalmon K 14,26.

[18] Mark 2:4 (CA): different interpretation

1919: [jah fralailotun]
2000: jah fralailotun

[19] Mark 10:46 (CA): different interpretation(s)

1919: Barteimai[a]us <sa> blinda
2000: Barteimaiaus blinda

[20] Mark 15:38 (CA): different interpretation

1919: faur[a]hah
2000: faurahah

[21] Mark 16:11 (CA): different punctuation

1919: ni galaubidedun.
2000: ni galaubidedun

[22] Romans 9:20 (A): inconsistent correction in 2000

1919: gadikis
2000: gadigis

Streitberg 1919 & 2000, apparatus: gadikis] A deutlich Br. [Wilhelm Braun], für gadigis.

[23] Romans 9:33 (A): different interpretation

1919: jah <sa> galaubjands
2000: jah sa galaubjands

[24] Romans 11:11 (A): misprint in 2000

1919: briggan
2000: briggau

De Tollenaere 1976: Streitberg's first and second edition.

[25] Corinthians I 7:5 (A): (incomplete) correction

1919: þaþroh-þan
2000: þaþroþ þan

Streitberg 1919, Anhang: K 7,5. þaþroþ|þan, nicht þaþroh|þan. The 2000 edition corrects the error, but the hyphen is missing.

[26] Corinthians I 11:23 (A): different punctuation

1919: galewiþs was, nam hlaif
2000: galewiþs was. nam hlaif

[27] Corinthians I 13:12 (A): different reading

1919: iþ þan ufkunna
2000: <iþ> þan ufkunna

[28] Corinthians I 15:21 (A): different punctuation

1919: dauþaize;
2000: dauþaize:

Probably due to poor facsimile reproduction in the 2000 edition.

[29] Corinthians II 6:18 (B): misprint in 2000

1919: dauhtrum
2000: dauhtram

De Tollenaere 1976: Streitberg's second edition.

[30] Ephesians 3:21 (A): different interpretation

1919: aikkles<jon>
2000: aikklesjon

[31] Galatians 2:6 (A): different segmentation

1919: anainsokun
2000: ana insokun

De Tollenaere 1976: Streitberg's second edition.

[32] Philippians 3:3 (A): misprint in 2000

1919: jan-ni
2000: jan ni

De Tollenaere 1976: Streitberg's first and second edition.

[33] Colossians 3:5 (B): different reading

1919: ubila(na)
2000: ubila

[34] Timothy I 4:14 (B): different interpretation

1919: praizbwtairei<n>s
2000: praizbwtaireis

[35] Timothy I 5:22 (B): misprint in 2000

1919: ni man<n>hun lagjais
2000: niman<n>hun nlagjais

De Tollenaere 1976: Streitberg's first and second edition.

[36] Timothy I 5:25 (A): correction in 2000

1919: þo[ei]
2000: þoei

Streitberg 1919, Berichtigungen: S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.

[37] Timothy I 5:25 (B): correction in 2000

1919: þo[(ei)]
2000: þo(ei)

Streitberg 1919, Berichtigungen: S. 427 T 5,25 tilge in AB die eckige Klammer und lies þoei: die Intonation verlangt die überlieferte Form.

[38] Timothy I 6:4 (B): inconsistent correction in 2000

1919: witands
2000: witāds

De Tollenaere 1976: expanded in Streitberg's first and second edition.

[39] Timothy II 4:14 (A): misprint in 2000

1919: usgildiþ
2000: us gildiþ

Due to missing soft hyphen at end of line.

[40] Titus 1:5 (B): correction in 2000

1919: [in þize]
2000: in þize

Streitberg 1919, Berichtigungen: Ebenso ist S. 445 Tit 1,5 die eckige Klammer bei in þize zu tilgen: der Wortlaut von B wird durch die Intonation als ursprünglich erwiesen.

[41] Nehemiah 6:15 (D): misprint in 2000

1919: ·n· dage
2000: ·n dage

[42] Nehemiah 7:3 (D): different punctuation

1919: und þatei urrinnai sunno
2000: und þatei urrinnai, sunno

[43] Nehemiah 7:17 (D): misprint in 2000

1919: <Az>gadis
2000: <Az->gadis

Error due to hyphenation in 1919.