News:Text itemiser

From CODECS: Online Database and e-Resources for Celtic Studies

Introducing the Text Focaliser

17 Apr 2016

Have you ever had to work your way through a sizeable corpus of texts and wished you had a digital toolset to hand that allowed you not only to store annotations (such as tags and references to scholarly discussions), but to go about it in a more systematic, semantically structured and data-driven fashion? Better still, one that allowed you to perform such actions within a team or community whose members may be physically far removed from one another? You or your group may have been focussing on particular figures in history or literary characters, places real and imagined, literary and learned motifs or themes, legal topics, lexical items of interest, linguistic features, narrative devices, metrical forms, evidence of intertextuality, or dates and recurring events, etc. Having assembled the data, you want automated queries to present your data in conveniently arranged lists, tables, filtered views, maps and whatnot.

It may seem a bit much to ask for, at least if one were aiming for the ultimate ‘chef’s knife’ that can do anything with equal success, but it is not an idle hope. In fact, such a multi-purpose tool is precisely what has been the focus of preparations here at CODECS. Why? Behind the scenes, I had already experimented with tailor-made solutions to specific projects, but in the end, a more centralised effort seemed both more productive and more feasible than building/maintaining separate tools for many different specialised needs.

Enter the Text Focaliser (previously Text Itemiser), for lack of a more adequate, catchier or established term. The challenge has been to develop a set of tools and methods that is at once simple and accessible enough to be easily taught and used, and flexible enough to serve the wide range of research questions with which one may approach a text or corpus of texts. Broadly speaking, the process involves three main areas of activity.

1. Editing items

A special editing form allows editors to do two things: (1) single out an item of source information, or multiple items under a shared heading, and (2) add semantic annotations about each item. Typically, such selections focus on manageably small sections of text that are relatively coherent in themselves, but there are other possibilities. In lieu of a full presentation, here is a screenshot to give you a rough impression:

[Screenshot (2 December 2015): an earlier draft of Commentary on Félire Óengusso - 14 September (page accessible to editors)]

2. Controlled vocabularies

The use of controlled vocabularies is not new to this site, but work in this area has certainly improved and accelerated with the coming of a new data tool that takes full advantage of their possibilities. It involves setting up a thesaurus of predefined terms that provide reliable anchor points for semantic tagging (no. 1 above), but which are also structured and organised themselves in order to assist data queries (no. 3 below).

3. Queries

Semantic queries can act upon the available data and produce convenient overviews, for instance by rendering lists and tables, maps, filtered search and custom views. Some of these facilities are already available from this site, although they come with a soft warning.

A simple example:

1. An editor opens the form to edit a page about the introductory part of Aislinge Meic Con Glinne. At the point where this text makes a parody of the ‘four conditions of every composition’, the editor can add an appropriate tag to the ‘subject’ field. But which one?
2. The thesaurus includes a relevant term called four elements of composition (time, place, person and cause) (a variant of the six elementa narrationis in rhetoric), which is occasionally attested as a learned motif in Hiberno-Latin and Middle Irish writing. This term is indexed as a subclass of ‘numerical motifs’.
   1. If the term is present (which it is), the editor can select the value from the autocomplete box. If it isn't, the term can still be defined later on, or a value can be added to the ‘keywords’ field: this is a field for non-standardised tags, so that decisions can be deferred to a later, more convenient time or referred to other editors for consultation.
3. If we were to look for ‘numerical motifs’ in vernacular Irish writing and built a query to that effect, this example would turn up in our query results.
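
For readers who think more easily in code, the example above might be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not a description of how the site itself is built: the names `THESAURUS`, `Item`, `tag`, `is_a` and `query` are all hypothetical, the thesaurus is reduced to two terms, and the term label is shortened.

```python
from dataclasses import dataclass, field

# Hypothetical mini-thesaurus: each term points to its parent class
# (None marks a top-level term). The real vocabulary is far larger.
THESAURUS = {
    "numerical motifs": None,
    "four elements of composition": "numerical motifs",
}


@dataclass
class Item:
    """One annotated section of a text (field names are illustrative)."""
    text: str
    section: str
    subjects: list = field(default_factory=list)  # controlled terms
    keywords: list = field(default_factory=list)  # non-standardised tags


def tag(item, value):
    """Store a known term under 'subjects', anything else under 'keywords'."""
    if value in THESAURUS:
        item.subjects.append(value)
    else:
        item.keywords.append(value)  # decision deferred to a later time


def is_a(term, ancestor):
    """True if term is the ancestor itself or a (transitive) subclass of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = THESAURUS.get(term)
    return False


def query(items, ancestor):
    """Return every item with a subject that falls under the given class."""
    return [it for it in items if any(is_a(s, ancestor) for s in it.subjects)]


# Steps 1 and 2: tag the introductory part of the text.
items = [Item("Aislinge Meic Con Glinne", "introduction")]
tag(items[0], "four elements of composition")  # known term, goes to subjects
tag(items[0], "parody")                        # unknown term, goes to keywords

# Step 3: a query for the broader class finds the item via the subclass link.
hits = query(items, "numerical motifs")
```

The point of the sketch is the subclass link: the item is never tagged with ‘numerical motifs’ directly, yet the query finds it because the thesaurus records the narrower term as a subclass of the broader one.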

Of course, I have greatly simplified the process for the sake of clarity, at the expense of some intricacies, but the general idea stands. For now, these outlines are all I have time for, but I hope to dedicate the next series of posts to elucidating each of these areas in some detail and to offering some concrete examples.
