Marcus Pöckelmann, Mahmoud Kozae
The production of a digital synopsis of a chapter begins with transcribing the versions of the chapter preserved in multiple manuscripts. During transcription, the text of each manuscript is converted into a normalized orthography so that the digital corpus follows a uniform writing standard. Phonetic deviations are likewise adjusted to the norms of Classical Arabic, except for those features of Middle Arabic that are significant for diagnosing manuscripts.
Each transcribed chapter then undergoes a literary analysis. The result of this analysis is an edition segmented into “literary units”, each marked by a unique title. This segmentation identifies and describes the structure of a chapter as its content unfolds through wisdom sayings, dialogue, narration, and other types of material. The same procedure is applied to every further manuscript containing the chapter. LERA then uses the unique titles to detect similar passages and align them side by side. LERA also supports the analysis of variants in other ways, e.g., with CATview, a condensed visualization of the alignment that gives an overview of the synoptic edition and facilitates navigation within it.
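To illustrate how unique titles can drive an alignment, here is a minimal Python sketch. It is not LERA's implementation. The input format (a mapping from a hypothetical manuscript siglum to an ordered list of (title, text) pairs), the function name align_by_title, and the sample data are all assumptions made for illustration. Units carrying the same title are placed in one row, and a manuscript that lacks a unit gets an empty cell.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format (not LERA's): manuscript siglum -> the chapter as
# an ordered list of (unit_title, unit_text) pairs from the literary analysis.
Witnesses = Dict[str, List[Tuple[str, str]]]
Row = Tuple[str, Dict[str, Optional[str]]]


def align_by_title(witnesses: Witnesses) -> List[Row]:
    """Place units that carry the same title side by side in one row.

    Rows follow the order in which titles are first encountered, reading the
    manuscripts in the order given; a manuscript lacking a unit gets None.
    Titles are assumed to be unique within each manuscript.
    """
    sigla = list(witnesses)
    rows: Dict[str, Dict[str, Optional[str]]] = {}  # dicts keep insertion order
    for siglum, units in witnesses.items():
        for title, text in units:
            row = rows.setdefault(title, {s: None for s in sigla})
            row[siglum] = text
    return list(rows.items())


# Hypothetical sample: two manuscripts, "A" and "B", of the same chapter.
witnesses = {
    "A": [("Prologue", "text A1"), ("Saying 1", "text A2"), ("Dialogue", "text A3")],
    "B": [("Prologue", "text B1"), ("Dialogue", "text B2")],
}
for title, cells in align_by_title(witnesses):
    print(f"{title:<10}", " | ".join(cells[s] or "-" for s in witnesses))
```

Keying the rows on titles assigned during the literary analysis, rather than on string similarity, matches units even when their wording differs considerably between manuscripts. The sketch is deliberately simple: a unit missing from the first manuscript is appended after that manuscript's units, which can misplace it relative to its position in the manuscript where it does occur.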
LERA offers further features for the automatic detection of textual variation, for example passages unique to a single manuscript or passages shared exclusively by two manuscripts. Language-specific information is currently being updated to optimize LERA’s filters for Arabic texts.
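The detection of singular and exclusively shared passages can be sketched in the same way. How LERA itself performs this detection is not described here, so the Python below should be read as an assumption: it takes a hypothetical mapping from siglum to the titles of the units a manuscript contains, and decides solely on which manuscripts attest each title. The function names are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical input (not LERA's format): siglum -> titles of the units it contains.
Presence = Dict[str, List[str]]


def witnesses_per_unit(units: Presence) -> Dict[str, FrozenSet[str]]:
    """Map each unit title to the set of manuscripts that contain it."""
    found: Dict[str, Set[str]] = {}
    for siglum, titles in units.items():
        for title in titles:
            found.setdefault(title, set()).add(siglum)
    return {title: frozenset(sigla) for title, sigla in found.items()}


def singular_units(units: Presence) -> Dict[str, List[str]]:
    """Titles attested in exactly one manuscript, grouped by that manuscript."""
    result: Dict[str, List[str]] = {}
    for title, sigla in witnesses_per_unit(units).items():
        if len(sigla) == 1:
            (siglum,) = sigla
            result.setdefault(siglum, []).append(title)
    return result


def exclusively_shared(units: Presence) -> Dict[Tuple[str, ...], List[str]]:
    """Titles attested in exactly two manuscripts and in no other."""
    result: Dict[Tuple[str, ...], List[str]] = {}
    for title, sigla in witnesses_per_unit(units).items():
        if len(sigla) == 2:
            result.setdefault(tuple(sorted(sigla)), []).append(title)
    return result


# Hypothetical sample with three manuscripts.
units = {
    "A": ["Prologue", "Saying 1", "Dialogue"],
    "B": ["Prologue", "Dialogue", "Epilogue"],
    "C": ["Prologue", "Narration"],
}
print(singular_units(units))
print(exclusively_shared(units))
```

"Exclusively shared" only becomes informative with three or more manuscripts: with two, every unit the pair has in common would qualify.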