Evaluating Cuneiform in the World Wide Web: fantastic use cases and where to find them
Adam Anderson
This research addresses the digital representation of cuneiform, the earliest multilingual orthography, and shows how its historical polyvalence and heterogeneous formats still complicate reproducible research today. Building on a UC Berkeley Discovery Project running since 2021, the talk presents a two-stage approach: (1) a document-oriented pipeline for structure-preserving, auditable conversion from transliteration to Unicode, and (2) a character-based pipeline for large-scale NLP scenarios. A systematic comparison of the central digital character lists (including OSL, Nuolenna, CuneiML, Text-Fabric, and Akkademia) yields a harmonized ontology with transparent provenance, and the character values are published as linked data in Wikidata. Dictionary-based matching and error analysis produce prioritized correction lists and quality metrics. The inconsistencies this uncovers are assessed for their impact on downstream tasks such as machine translation and then fed back into rule refinements and Unicode proposals. The results include a quality assurance harness for cuneiform rendering, the integration of script variants, and an open, testable framework for character normalization that combines philological accuracy with machine processability. Overall, the project demonstrates how born-digital workflows can create a robust, reusable infrastructure for computer-assisted Assyriology.
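As a rough illustration of what dictionary-based conversion from transliteration to Unicode involves, the Python sketch here looks each reading up in a sign list and tallies the unresolved readings into a frequency-ranked correction list. It is a minimal sketch, not the project's pipeline. The toy `SIGN_LIST`, the tokenization rule, and the helpers `convert_line` and `correction_list` are hypothetical. Only the Unicode character names, which are resolved through Python's standard `unicodedata.lookup`, are real.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical toy sign list: reading -> official Unicode character name.
# Several readings share one sign (polyvalence): "an", "dingir", "d" -> AN.
SIGN_LIST = {
    "a": "CUNEIFORM SIGN A",
    "an": "CUNEIFORM SIGN AN",
    "dingir": "CUNEIFORM SIGN AN",
    "d": "CUNEIFORM SIGN AN",  # divine determinative, written {d}
    "ba": "CUNEIFORM SIGN BA",
    "ki": "CUNEIFORM SIGN KI",
    "lugal": "CUNEIFORM SIGN LUGAL",
    "na": "CUNEIFORM SIGN NA",
}

# Assumed tokenization: readings within a word are separated by hyphens,
# dots, or determinative braces.
TOKEN_SPLIT = re.compile(r"[-.\s{}]+")


def convert_line(line, sign_list=SIGN_LIST):
    """Convert one transliterated line to Unicode.

    Returns (unicode_text, unknown_readings). Unresolved readings stay
    visible in brackets so the output remains auditable.
    """
    words, unknown = [], []
    for word in line.split():
        chars = []
        for reading in TOKEN_SPLIT.split(word.lower()):
            if not reading:
                continue  # empty pieces left over from leading braces
            name = sign_list.get(reading)
            if name is None:
                unknown.append(reading)
                chars.append(f"[{reading}]")
            else:
                chars.append(unicodedata.lookup(name))
        words.append("".join(chars))
    return " ".join(words), unknown


def correction_list(lines, sign_list=SIGN_LIST):
    """Rank unresolved readings across a corpus by frequency."""
    counts = Counter()
    for line in lines:
        _, unknown = convert_line(line, sign_list)
        counts.update(unknown)
    return counts.most_common()


if __name__ == "__main__":
    print(convert_line("lugal ki-na"))
    print(correction_list(["{d}en-lil2", "en ba"]))
```

Polyvalence shows up even in this toy list: `an`, `dingir`, and the determinative `{d}` all resolve to the same sign. The most frequent gaps (here `en`, then `lil2`) would be the first candidates for extending the sign list.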
Time & Location
Oct 24, 2025 | 04:00 PM c.t.
Berlin-Brandenburg Academy of Sciences and Humanities / Berlin-Brandenburgische Akademie der Wissenschaften
Unter den Linden 8, Lise-Meitner-Saal (07W04, opposite the Academy Library lending desk), Staatsbibliothek Berlin building, entrance on Unter den Linden
10117 Berlin
The seminar is generally streamed online, and some speakers attend in person. Zoom link: https://hu-berlin.zoom-x.de/j/62272165290?pwd=DmvBO97b3JAJIutndWU2bBILGaJ3AX.1