This week’s lab explores some of the markup and semantic design that makes resources like the Old Bailey Online so useful for text mining, by contrasting its technical approach with that of Artist’s Books Online.
Artist’s Books Online
Artist’s Books Online is a project, directed by Johanna Drucker, to create a digital repository of artists’ books and related material. The site is structured around a hierarchical conceptual framework of work, edition, object and image. A work is understood as an intellectual project: an idea in its most abstract form. This is then materialised in editions, or versions, and instantiated in specific objects.
In the digital repository, images are the scans or other electronic representations of an object. In addition to this hierarchy there are some container terms, such as exhibits, which allow parts of the repository to be curated into different collections, and essays, which draw on items from the repository. A similar hierarchy could be developed for the Old Bailey Online, consisting of proceedings, sessions, trials and offences, for example, but I’m not sure it has been modelled as explicitly as a hierarchy.
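To make the work, edition, object and image hierarchy more concrete, here is a minimal Python sketch of how it might be modelled as nested data classes. The class and field names are my own hypothetical choices, not taken from the Artist’s Books Online schema; the point is only to show how the levels nest and how a container such as an exhibit can point at objects from anywhere in the tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    # A scan or other electronic representation of one object (hypothetical field)
    filename: str


@dataclass
class BookObject:
    # A specific copy that instantiates an edition
    identifier: str
    images: List[Image] = field(default_factory=list)


@dataclass
class Edition:
    # A materialised version of the work
    label: str
    objects: List[BookObject] = field(default_factory=list)


@dataclass
class Work:
    # The intellectual project: the idea in its most abstract form
    title: str
    editions: List[Edition] = field(default_factory=list)

    def all_images(self) -> List[Image]:
        # Walk 'down' the hierarchy: work -> editions -> objects -> images
        return [img for ed in self.editions for obj in ed.objects for img in obj.images]


@dataclass
class Exhibit:
    # A container term: curates objects drawn from anywhere in the repository
    name: str
    objects: List[BookObject] = field(default_factory=list)


copy = BookObject("copy-1", [Image("cover.jpg"), Image("p1.jpg")])
work = Work("Example Work", [Edition("First edition", [copy])])
exhibit = Exhibit("Example exhibit", [copy])
print(len(work.all_images()))  # 2
```

The exhibit holds a reference to the same object rather than a copy, which is roughly how a curated collection can sit alongside the hierarchy without duplicating it.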
Because the items in the repository are encoded using the Text Encoding Initiative (TEI) scheme, the markup of each item is governed by a specification known as a Document Type Definition (DTD). The DTD encapsulates the project’s classification scheme and is used to validate each file, so that the markup conforms to it and will be suitable for the research approaches the resource wants to serve.
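As a rough illustration of what validation against a content model involves, the sketch below checks which child elements are allowed inside which parents. Python’s standard library cannot validate against a real DTD, so this is a simplified stand-in rather than DTD validation proper: it ignores element order and counts, and the element names are hypothetical, not the actual Artist’s Books Online DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, loosely equivalent to DTD declarations such as:
#   <!ELEMENT work (title, edition+)>
#   <!ELEMENT edition (object+)>
#   <!ELEMENT object (image*)>
# Only the set of allowed children is checked, not their order or number.
CONTENT_MODEL = {
    "work": {"title", "edition"},
    "edition": {"object"},
    "object": {"image"},
    "title": set(),
    "image": set(),
}


def validate(element: ET.Element) -> list:
    """Return a list of problems; an empty list means the markup conforms."""
    problems = []
    allowed = CONTENT_MODEL.get(element.tag)
    if allowed is None:
        problems.append(f"undeclared element <{element.tag}>")
        return problems
    for child in element:
        if child.tag not in allowed:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        problems.extend(validate(child))
    return problems


good = ET.fromstring("<work><title>Example</title><edition><object><image/></object></edition></work>")
bad = ET.fromstring("<work><title>Example</title><edition><image/></edition></work>")
print(validate(good))  # []
print(validate(bad))   # ['<image> not allowed inside <edition>']
```

A full validator (for example one built on a DTD-aware XML library) would also enforce ordering and cardinality, which is where much of a DTD’s control over contributors’ input comes from.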
Artist’s Books Online gives a lot of information about its DTD, including technical documentation of the tags that can be used to mark up a document and their descriptions. The technical methods for the Old Bailey Online provide much less detail. This is perhaps because the markup there was done by the project team as part of the original project, whereas Artist’s Books Online is open to new deposits and more of the onus is on contributors to mark up material appropriately.
Marking up is always an interpretative intervention in the text, but even more so with Artist’s Books Online, where the metadata includes not just descriptive metadata but annotations that allow critical commentary to be included. These contributions carry author attributions, making the interpretative act more visible than it normally is with objective, anonymous cataloguing.
Despite the detail that has gone into crafting the markup, I cannot see any way in the user interface to switch between an HTML-rendered view of an item’s data and the raw XML view, as is possible with the Old Bailey Online. None of the metadata is linked either, meaning that you cannot click through to another part of the site from an item’s metadata. The site’s structure lets you navigate ‘down’ into the hierarchy, but not as easily across it. Nor is there an API that I’ve been able to discover. The XML remains wrapped up inside the site.
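To show what linking the metadata might look like, here is a small sketch that indexes items by each metadata value and renders an item’s metadata with every value as a link to a browse page of other items sharing it. The item records, field names and the `/browse?field=…&value=…` URL pattern are all hypothetical; nothing here reflects how Artist’s Books Online is actually built.

```python
from collections import defaultdict
from html import escape
from urllib.parse import urlencode

# Hypothetical item records; field names and values are illustrative only
ITEMS = {
    "item-1": {"publisher": "Press A", "binding": "accordion"},
    "item-2": {"publisher": "Press A", "binding": "codex"},
    "item-3": {"publisher": "Press B", "binding": "accordion"},
}


def build_index(items):
    """Map each (field, value) pair to the items that share it."""
    index = defaultdict(list)
    for item_id, metadata in items.items():
        for field_name, value in metadata.items():
            index[(field_name, value)].append(item_id)
    return index


def linked_metadata_html(item_id, items):
    """Render an item's metadata with each value linking to a browse page."""
    rows = []
    for field_name, value in items[item_id].items():
        # Hypothetical browse URL pattern; the page would list index[(field, value)]
        href = "/browse?" + urlencode({"field": field_name, "value": value})
        rows.append(
            f'<dt>{escape(field_name)}</dt><dd><a href="{escape(href)}">{escape(value)}</a></dd>'
        )
    return "<dl>" + "".join(rows) + "</dl>"


index = build_index(ITEMS)
print(index[("binding", "accordion")])  # ['item-1', 'item-3']
print(linked_metadata_html("item-1", ITEMS))
```

The index is what enables navigating ‘across’ the collection: from one item’s binding or publisher straight to every other item with the same value.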
For all the well-crafted XML that is used to create each entry, it is not clear to me what this approach offers beyond other static or dynamic websites. The advantages of this architecture are described as being non-proprietary, platform independent, and providing better options for searching, processing and reuse than a standard website. You could imagine this site being implemented using WordPress, for example, or repository software, without the experience of the site being much different. You wouldn’t have the same semantics underneath, but it is not clear how much difference these make to the final presentation. They do, however, provide a higher level of control over input, albeit by making input more technical, and the files may be used by the project in ways that aren’t yet visible on the website.
I was reminded of Designing Shakespeare, a project undertaken by an academic at Royal Holloway. The original repository is now archived as a functional collection of web pages and a text database, but the academic wanted to develop the collection actively again and open it up to researchers, along with annotated texts and a mobile application for the general public.
I wasn’t sure how to help at the time, but exploring these resources makes it more obvious how a conceptual framework, semantic model, structured metadata, TEI-encoded texts and an API could really open this up as a resource for researchers and the general public. In future, LIS services may increasingly be expected to contribute their expertise to the conceptual and classification schemes for such projects, from the bid phase through to production, as part of multi-functional teams.
For Designing Shakespeare, a repository could be based on Luke Blaney’s Theatre Ontology and borrow concepts from Artist’s Books Online for the images and critical commentary. The data would also suit the type of visualisation and network analysis seen in the CKCC project from Utrecht, mapping who performed with whom and who played which characters in which productions, as a form of co-citation.
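The co-citation analogy can be sketched in a few lines: just as two papers are linked each time they are cited together, two performers are linked each time they appear in the same cast. The cast lists below are entirely hypothetical placeholders, not Designing Shakespeare data, and this is only one simple way of deriving weighted edges for a network.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast lists: production -> {performer: character}
PRODUCTIONS = {
    "Hamlet": {"Actor A": "Hamlet", "Actor B": "Ophelia", "Actor C": "Claudius"},
    "Macbeth": {"Actor A": "Macbeth", "Actor B": "Lady Macbeth"},
    "Twelfth Night": {"Actor B": "Viola", "Actor C": "Malvolio"},
}


def co_performance_counts(productions):
    """Count how often each pair of performers shared a production."""
    counts = Counter()
    for cast in productions.values():
        # sorted() gives each pair a stable order, so (A, B) and (B, A) are merged
        for pair in combinations(sorted(cast), 2):
            counts[pair] += 1
    return counts


def roles_played(performer, productions):
    """List (production, character) pairs for one performer."""
    return [(title, cast[performer]) for title, cast in productions.items() if performer in cast]


edges = co_performance_counts(PRODUCTIONS)
print(edges[("Actor A", "Actor B")])  # 2
print(roles_played("Actor C", PRODUCTIONS))  # [('Hamlet', 'Claudius'), ('Twelfth Night', 'Malvolio')]
```

The resulting counts could be treated as weighted edges and loaded into a network analysis or visualisation tool, with the characters played kept as attributes for each performer.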
With well-structured data as the foundation, there are so many possibilities for digital humanities resources that are both semantically and functionally rich. The separation of data from presentation and the provision of APIs mean the same data can be used in interfaces for different audiences. The digitisation, coding and platform development costs, however, are not insignificant.
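A minimal sketch of separating data from presentation: one record is stored once and then rendered two ways, as JSON of the kind an API might return to researchers and as simple HTML for a general audience. The record and its fields are hypothetical, chosen only to illustrate the idea.

```python
import json
from html import escape

# A hypothetical record, stored once and independent of any presentation
RECORD = {
    "id": "prod-42",
    "play": "Hamlet",
    "venue": "Example Theatre",
    "cast": [{"performer": "Actor A", "character": "Hamlet"}],
}


def as_json(record):
    """Machine-readable view, as an API might serve it to researchers."""
    return json.dumps(record, indent=2, ensure_ascii=False)


def as_html(record):
    """Human-readable view for a general audience."""
    items = "".join(
        f"<li>{escape(c['character'])}: {escape(c['performer'])}</li>" for c in record["cast"]
    )
    return (
        f"<h1>{escape(record['play'])}</h1>"
        f"<p>{escape(record['venue'])}</p>"
        f"<ul>{items}</ul>"
    )


print(as_html(RECORD))
```

Because neither view alters the record, a mobile app or a visualisation could be added as a third renderer without touching the underlying data.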
Again, I am struck by how much effort is needed to turn unstructured content into well-structured collections so that others can interrogate or reuse them in their research. This work is done either by enthusiasts or as part of funded projects. Many services have uncertain futures once the initial project is over, as it is unclear how ongoing hosting and maintenance will be funded, except on a best-efforts basis by host institutions.
“Who Pays?” remains one of the hardest research questions to answer.
Featured image: partial screenshot of some Old Bailey Online XML.