At the University of Oxford, the humanities benefit from an intense and constant interaction between cultural tradition and technological innovation. It was therefore with great expectations that the author of this post, supported by the Erasmus+ programme for staff mobility, took part in the Digital Humanities at Oxford Summer School from 2 to 6 July at Keble College.
The Digital Humanities at Oxford Summer School (DHOxSS), open to anyone with an interest in the Digital Humanities, offers eight workshops complemented by plenary lectures:
- An Introduction to Digital Humanities
- An Introduction to the Text Encoding Initiative
- Quantitative Humanities
- Digital Musicology
- From Text to Tech
- Hands-On Humanities Data Curation
- Linked Data for Digital Humanities
- Crowdsourced Research in the Humanities
Considering that Wikidata is essential for the entity-fishing service used by the HIRMEOS project to enrich the texts of the open access monographs published on its digital platforms, the author decided to attend the workshop introducing the concepts and technologies behind Linked Data and the Semantic Web and their significance for DH.
The workshop, organized and conducted by Dr. Terhi Nurmikko-Fuller, lecturer in Digital Humanities at the Centre for Digital Humanities Research at the Australian National University, enabled even those participants who, like the author, had little or no background in computer science to become familiar with the main concepts underlying the transformation of a simple dataset into a structured data system. During the workshop all participants were encouraged to put the notions acquired into practice, mainly by sketching ontologies, structuring data in the Turtle format and using the SPARQL query language.
Terhi and the two co-trainers John Pybus and Graham Klyne first introduced the notion of the Semantic Web. This is mainly an overall view of the web – I would call it a kind of ‘regulatory ideal’ of ‘computational reason’ – which manifests itself concretely in the effort to create a Web of Data, i.e. an architecture of (possibly open) linked data. These data should ideally meet the following standards:
★ They are available on the Web (in whatever format), but with an open licence, to be Open Data
★★ They are available as machine-readable structured data (e.g. Excel, not an image scan of a table)
★★★ They are in a non-proprietary format (e.g. CSV instead of Excel)
★★★★ They use open standards from the W3C (RDF and SPARQL) to identify things
★★★★★ They are linked to other people’s data to provide context
According to this paradigm, the web should become a system of data entities which are recognizable by unique identifiers (HTTP URIs), related to each other and created in such a way as to be readable by machines. The ultimate goal of the Semantic Web is therefore to have a hierarchical data architecture rather than a simple collection of documents. However, a total, all-encompassing architecture of the world (of data) remains an ideal far removed from actual practice. More relevant for practice is the general aim of this vision, namely a reduction of the complexity of a given dataset, which becomes possible when the set is structured according to specific knowledge needs. In their raw state, data are not really useful to the digital humanist. Only when structured by an ontology does a dataset become fully suitable for scholarly investigation. An ontology is an “explicit specification of a conceptualization” (Gruber, T. R.: A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 1993, pp. 199–220).
Ontologies consist of categories and relationships which are applied to specific datasets in order to confer a semantic structure on them. This means that an ontology is the result of a conscious selection according to specific research needs. Abstracting a little, we can thus think of an ontology as a structured complex of decisions allowing the interpretation of a dataset – where interpretation means reading and querying the data in order to obtain, in response, a subset of data linked to each other in an interesting way. Ontologies are therefore not something absolute, i.e. not independent of the activity of the subject who needs them to expand their knowledge. Ontologies cannot be considered a true or false reflection of an external reality (of data), but rather pragmatic constructions. More concretely, ontologies are semantic models articulated in a specific syntax, and their materiality is that of a piece of software.
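To make this idea of an “explicit specification of a conceptualization” concrete, here is a minimal sketch in plain Python (not a real ontology language; all class and property names are invented for illustration): a small schema of classes and their expected properties, plus a check of whether a record conforms to it – the “decisions” that make a dataset interpretable.

```python
# A toy "ontology": an explicit specification of which classes exist
# and which properties (with their expected ranges) apply to each.
# All names here are invented for illustration.
ONTOLOGY = {
    "Person": {"name": str, "birthYear": int},
    "Book":   {"title": str, "author": "Person"},
}

def conforms(record, cls, ontology=ONTOLOGY):
    """Check whether a record is a valid instance of a class."""
    spec = ontology.get(cls)
    if spec is None:
        return False
    for prop, expected in spec.items():
        if prop not in record:
            return False
        value = record[prop]
        if isinstance(expected, type):
            if not isinstance(value, expected):
                return False
        # the property points to an instance of another class
        elif not conforms(value, expected, ontology):
            return False
    return True

author = {"name": "Jane Austen", "birthYear": 1775}
book = {"title": "Emma", "author": author}
print(conforms(book, "Book"))    # True
print(conforms(author, "Book"))  # False
```

The point of the sketch is only that an ontology encodes choices – which classes matter, which properties count – rather than mirroring reality.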
In the course of the second day we learned that the development of an ontology usually involves the following phases:
- Specification, in which the reasons for and aims of the ontology are assessed and determined
- Conceptualisation, dedicated to planning out the structure, classes and properties of the ontology
- Formalisation, in which the ideas are realised in a model and the hierarchy of concepts is defined
- Implementation, in which the language, the editor software and, if necessary, the reasoner are selected
- Evaluation, in which the ontology is tested against SPARQL queries or through an online validator
- Documentation, in which the design decisions and their rationale are outlined for the benefit of other users
To understand how linked data architectures can be generated, we need to become familiar with some basic concepts and definitions. First of all, we spoke about RDF (Resource Description Framework), the data model used to formulate the links between the different URI entities in order to make their relationships readable by machines. Afterwards we learnt how these relationships formulated through RDF can be expressed in different formats, and that one of the most practical and functional of these is Turtle. In fact, Turtle is readable not only by machines but, with a little practice, even by our (in my case, totally inexperienced) human eyes. Through Turtle we can represent triples, i.e. the connections between entities according to a subject–predicate–object model, and then implement them in software. It was shown that an important reason for using Turtle is its similarity to the query language SPARQL, and we got to know a few of its syntactic elements.
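As a rough illustration of these ideas – in plain Python rather than a real triple store, and with invented example data – triples can be held as subject–predicate–object tuples (with URIs shortened to prefixed names, as Turtle allows) and queried with a simple pattern match, much as a SPARQL basic graph pattern does:

```python
# Triples as (subject, predicate, object) tuples; URIs are shortened
# to "prefix:name" strings, as Turtle would allow. Data is invented.
triples = {
    ("ex:Emma",       "ex:author", "ex:JaneAusten"),
    ("ex:Emma",       "rdf:type",  "ex:Book"),
    ("ex:JaneAusten", "rdf:type",  "ex:Person"),
}

def match(pattern, store):
    """Return triples matching a pattern; None plays the role of a SPARQL variable."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who wrote ex:Emma?" -- roughly: SELECT ?who WHERE { ex:Emma ex:author ?who }
for _, _, who in match(("ex:Emma", "ex:author", None), triples):
    print(who)  # ex:JaneAusten
```

Real SPARQL engines of course do far more (joins across patterns, filters, inference), but the core mechanism is this kind of matching over triples.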
In several hands-on sessions we then attempted to sketch a simple ontology to be applied to a dataset provided by the instructors, and afterwards to query it using SPARQL. The ontologies we had prepared were integrated into a dataset using several different programs: we exported our ontology from Protégé as .owl or .ttl and uploaded it in Turtle (.ttl) format into Web Karma together with the data in .csv format. Combining dataset and ontology, we could create a knowledge graph and export it from Web Karma as RDF. It was suggested to use Blazegraph to generate the graph database.
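The core of the Web Karma step – mapping tabular data onto an ontology to produce a graph – can be roughly imitated in a few lines of Python. The sample CSV, the "ex:" prefix and the column-to-property mapping below are invented for illustration; the real tool does this interactively and far more flexibly.

```python
import csv
import io

# An invented sample of the kind of .csv data we worked with.
CSV_DATA = """id,title,author
b1,Emma,Jane Austen
b2,Persuasion,Jane Austen
"""

# A minimal mapping from CSV columns to (invented) ontology properties.
MAPPING = {"title": "ex:title", "author": "ex:author"}

def rows_to_triples(csv_text, mapping):
    """Turn each CSV row into a small set of triples about one subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "ex:" + row["id"]
        triples.append((subject, "rdf:type", "ex:Book"))
        for column, predicate in mapping.items():
            triples.append((subject, predicate, row[column]))
    return triples

for t in rows_to_triples(CSV_DATA, MAPPING):
    print(t)
```

Each row becomes one subject with a type statement plus one triple per mapped column – which is essentially what the exported RDF contained, ready for loading into a store such as Blazegraph.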
To complete these exercises we were also introduced to various examples of ontologies used in the construction of linked data in a variety of disciplines, such as numismatics and musicology, and used for organizing metadata, exploring our cultural heritage, and visualizing data with innovative tools:
- Sharing the Wealth, Linking Disciplines: Linked Open Data for Numismatics, by Professor Andrew Meadows
- Linked Data for Digital Musicology, by Dr. Kevin Page
- Defining the Cultural Heritage Knowledge Graph, a session run by Dominic Oldman and Dr. Diana Tanase
- Linked Open Geodata with Recogito, a session run by Valeria Vitale
- OxLOD (Oxford Linked Open Data), the final session of the workshop, a talk by Dr. Athanasios Velios (University of Oxford)
- Linked Data and Digital Libraries, in which Professor Stephen Downie provided an insight into projects that combine Linked Data methodologies and technologies with data from Digital Libraries
In general, a particularly positive aspect of the workshop was the combination of three different didactic moments: theoretical explanations, practical exercises, and presentations of external projects. Perhaps, for the next edition, the time dedicated to the latter could be slightly reduced in favour of the first two session types. Nevertheless, such a workshop remains a unique opportunity to gain, in just five days, a general understanding of the workflow related to the creation of linked open data and knowledge graphs and, in addition, ‘to learn how to learn more’, i.e. to know what people who are not IT specialists can do in order to progress autonomously in the use of these tools.
The author was particularly interested in some aspects of the workshop related to Open Access and Open Science. It was very important to see more concretely why linked data, too, should be open: only as open data can they deploy their full potential for the digital humanities. Indeed, the vision of the Semantic Web confirms that open science and the digital humanities are not parallel paths, but two interconnected processes reinforcing each other.
Furthermore, considering the tasks of the HIRMEOS project, it was important to better understand how linked open data and the Semantic Web are going to play an important role in the enhancement of Open Access monographs. Linked data could indeed be enormously useful for improving the discoverability of monographs: converting library metadata into a system of linked data could be the way forward. What concrete practices such a transition will require is an open and fascinating question!