Yocum, Christopher Guy, “Text clustering and methods in the Book of Leinster”, Lash, Elliott, Fangzhe Qiu, and David Stifter (eds), Morphosyntactic variation in medieval Celtic languages: corpus-based approaches, Trends in Linguistics. Studies and Monographs 346, Berlin, Online: De Gruyter Mouton, 2020. 85–112. doi:10.1515/9783110680744-005.

“Text clustering and methods in the Book of Leinster”
Elliott Lash (ed.) • Fangzhe Qiu (ed.) • David Stifter (ed.), Morphosyntactic variation in medieval Celtic languages: corpus-based approaches (2020)
For a list of texts, see https://github.com/cyocum/bol_project.
Abstract (cited)

Most investigations of the Book of Leinster (hereafter LL) have used close reading, historical, and philological techniques to identify authors within LL (for instance, see Mac Gearailt 1993; Bhreathnach 2002; Mac Gearailt 1997–1998; Ó Lochlainn 1941–1942; Ó Lochlainn 1943–1944; Mac Eoin 1982: 113–114). While this has met with some success, the methods used are by their nature idiosyncratic and prone to individual scholarly opinion. One notable exception is Derick Thomson’s paper “The Poetry of Niall MacMhuirich”, which attempts to use statistical methods to attribute authorship of poems to Niall MacMhuirich (Thomson 1970). This paper will use methods of anonymous authorship attribution, which have been developed within the disciplines of machine learning and statistical analysis, to accomplish two goals: first, to demonstrate the means and methods of unsupervised machine learning techniques in early Irish literature; and second, to discuss the implications of applying this methodology to LL with a view towards a larger research project.

The paper will proceed in four stages. First, some scholarly literature concerning LL is reviewed. Second, the methods of data gathering and certain related problems, as well as the algorithms used in the analysis, are commented upon. Third, the outcome of the analysis is summarised. Fourth, the paper concludes with an examination of the contribution the analysis makes to the debate surrounding the authorship of LL.

Subjects and topics
early Irish literature
machine learning authorship attributions authority XML vectors k-medoids
