Seminars in Scholarly Text Encoding with TEI

Julia Flanders



I recently had the opportunity to define digital scholarship and as part of that definition I suggested that digital scholarship (like other forms of scholarship) is characterized by self-awareness and self-criticism on the one hand, and a willingness to expose one’s processes and results to critical scrutiny from others, on the other hand. It is the willingness to explain, to engage, and to be critiqued—the possibility of being interestingly wrong—in short, exercising something akin to what David Pye calls the workmanship of risk in the intellectual field. [slide 02]

These characteristics may seem to go against what we often look for in digitally encoded data: we more often seek consistency and reproducibility than we do originality and experimentation. However, there is an important strand of the TEI that needs to work in precisely this way, and the question is how to reach the parts of the scholarly community who are the real audience for and potential practitioners of this kind of encoding. The WWP has been calling this, informally, craft encoding to distinguish it from what we might call production encoding, and we recently had an opportunity to explore methods of teaching craft encoding to humanities scholars. This paper is an account of that process and of what we learned.

Between January 2007 and June 2009, the Brown University Women Writers Project offered eleven introductory seminars in text encoding, focusing on the use of the TEI Guidelines, and funded by the NEH. There were two central goals envisioned for this initiative:

The starting point for each event was a discussion of research methods and the kinds of digital representations that support them, as a way of situating text encoding within this landscape as a specific representational strategy. The core of each seminar was organized around participants’ own projects and sample texts, using these materials and participants’ interests to drive further discussion of how to create effective digital representations in TEI. The final segments of each seminar were typically aimed at giving participants an understanding of how text encoding projects are planned and conducted, including topics like digital publication tools, TEI customization, and project management. The seminars were aimed at faculty, students, and practitioners in the humanities with little or no technical experience but a strong interest in digital textuality.

In all, the seminars reached 220 participants from 71 different institutions, ranging from large research institutions (UCLA, University of Illinois) to small liberal-arts colleges (Wheaton College, Augustana College). The shortest event was a one-day seminar at Stanford, the first in the series; the longest were a full three days. [slide 03] The participants were drawn from across the academic spectrum, with a roughly even distribution between faculty, students, and staff (though it’s not obvious that that’s the most relevant division). [slide 04] The evaluation feedback that we received was extremely positive. Most participants found the seminars to be the right pace and length and considered the seminars to be highly relevant to their work. The comments included praise for both the design and conduct of the seminars, as some highlight quotes will illustrate. [slide 05]

In designing the curriculum for these seminars, we drew on the experience the WWP has gained from teaching TEI workshops in a variety of venues over the past five years. Although early on our audience was typically library staff and digital humanities practitioners, over time an increasing proportion have been humanities faculty and graduate students, and from their participation we observed that the intellectual hooks most helpful for understanding text encoding were drawn from textual editing, anthropology, and information modeling. What these connections bring to the fore is the role that the editor (observer, modeler) plays in shaping the representation that is created, and the disciplinary, strategic, and interpretive nature of the resulting digital object. The seminar curriculum emphasized these conceptual models and encouraged participants to think about the models their research demands. It also raised the question of how these models can be harmonized (through a standard encoding language like TEI) so as to permit communication between researchers.

Our goal was thus for participants in the seminars to come away, at a minimum, with an understanding of how text markup works: what kind of digital representation it can create, and how these can function in a digital scholarly context. We also wanted them to understand the role the TEI currently plays in supporting these activities, and to have some sense of how TEI data works in a digital project context: how such projects are developed and managed, and how they work as publications. To get a grounded understanding of these topics, participants would need some introduction to technical matters—for instance, the basic concepts of XML—but we wanted to be sure that the emphasis was never purely technical, and that even a discussion of XML well-formedness or data standards was contextualized so that their significance for the humanities could be easily grasped.

All of the seminars were structured similarly around a core set of topics, with several optional topics included based on local interests. [slide 06] The core topics provided an essential grounding in the TEI and text markup, and the optional topics focused on further dimensions of use and implementation. We also included two hands-on practice sessions of two or three hours each in almost every case, with the exception of Stanford and University of Nebraska. In addition, at some of the seminars (Wheaton, University of Washington, Miami University of Ohio, University of Nebraska) we included presentations on local projects.

The role of hands-on practice proved crucial in these seminars in ways that we had not predicted. It was clear to us from the start that we would need to ground our discussion of TEI markup with some opportunity for participants to practice what they had learned, and we designed the seminars around an alternation of concepts and practice: first, an opening discussion of background concepts (markup, XML, TEI), followed by hands-on practice; next, an exploration of more advanced encoding concepts (annotation, editorial markup, overlap) followed by hands-on practice; and finally, a discussion of how markup is used (project management, publication tools, schema customization). However, we discovered that the real role of the hands-on practice turned out not to be to anchor specific TEI concepts in people’s minds, or even really to teach them TEI at all, but rather to have participants use the markup process as a way of thinking about the text—thus helping to illustrate how much the process has to do with expressing the scholar’s own interests. This point was reinforced by our use of participants’ own sample texts for the hands-on practice, which ensured that participants were using texts in which they had some personal investment. In our use of the hands-on practice time we did not emphasize the production of correct TEI encoding beyond the basic requirements of validity: instead, we tried to reinforce the idea that the TEI can be used to express a very wide variety of research interests, and that in important ways the TEI markup reflects the information modeling that takes place within the researcher’s mind, rather than simply shaping the data to fit a standard external model. (In the seminars where TEI schema customization was covered, we were able to make this point more explicitly, since participants could see how they might modify the TEI to match their own needs more precisely.) 
The hands-on practice tended to generate questions that would not have arisen otherwise: for instance, concerning the semantics of markup, how to ensure consistency across projects, what level of encoding granularity is useful in different contexts, and similar questions of encoding strategy and its impact on the digital representation being created.

In a similar vein, and even more unexpectedly, we found that offering participants the ability to see their texts displayed had a remarkable effect on their engagement with the markup issues. In the early seminars we did not provide any mechanism for displaying the encoded texts (e.g. in a web browser), and although participants always asked “what will it look like?”, we took this question as an opportunity to discuss the separation of content from formatting, and the use of descriptive markup to produce multi-purpose source data that could be displayed in a variety of ways. However, part way through the series we developed a simple CSS stylesheet to accompany our TEI encoding template, and included a brief session in which we showed participants how to modify the style information so that they could choose (for instance) to display all personal names highlighted in red. This experiment was a success not only in giving participants a sense of completion (of being able to see their texts), but also more importantly in giving them a very concrete sense of how display (and other functions) can only work with what the data provides: if one wants to see rhyme words highlighted, they first must be represented explicitly in the markup. Enabling participants to design a very simple publication in this way gave them a motivation to work more intensively on the markup driving it.
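The dependence of display on markup can be illustrated with a minimal sketch. This is not the CSS stylesheet used in the seminars; it is a hypothetical Python stand-in that uses only the standard library, and the sample sentence, the `render` function, and the colour choices are assumptions made for illustration. It shows that a highlighting rule for personal names has an effect because `persName` elements are present, while a rule for place names does nothing because no place is encoded.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative sample (an assumption, not seminar material): two personal
# names are encoded, but the place name "London" is left as plain text.
SAMPLE = f"""<p xmlns="{TEI_NS}">A letter from <persName>Mary Astell</persName> to
<persName>John Norris</persName>, sent from London.</p>"""


def render(elem, highlight):
    """Render a TEI element as HTML text, colouring the elements named in `highlight`.

    `highlight` maps a TEI element name (without namespace) to a CSS colour.
    """
    local_name = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render(child, highlight))
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    if local_name in highlight:
        return f'<span style="color: {highlight[local_name]}">{inner}</span>'
    return inner


root = ET.fromstring(SAMPLE)

# Personal names are encoded, so this rule has something to act on.
html_names = render(root, {"persName": "red"})

# No <placeName> is encoded, so this rule changes nothing: "London" stays plain.
html_places = render(root, {"placeName": "blue"})

print(html_names)
print(html_places)
```

In this sketch the only way to make "London" appear in blue is to go back to the source and encode it, which is the motivation the display session gave participants.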

To support the seminar series we developed a set of materials to use in teaching the seminars themselves, and also for participants (and others) to use outside the seminar to refresh their memories or teach themselves further. All of these materials are permanently available from the WWP seminars site and are published under a Creative Commons Attribution-Noncommercial-ShareAlike license which permits free reuse. [slide 07]

The slides and lecture notes for all presentations given in these seminars (and in the other workshops we teach) are written in TEI, using a customized schema developed for this purpose. The source XML is converted to HTML slides and lecture note output using XSLT; the source and all derived files are published on the WWP site. We maintain a master set of materials which are reviewed and improved at intervals, and these are available at the WWP’s encoding resources page. In addition, the specific versions used in each seminar are archived with the materials for that seminar, so that participants can go back and review the materials as they were presented at the event. This system permits us to make and preserve event-specific changes and references. The slide schema itself and the stylesheets used to transform and display the HTML are also available from the WWP site.

Since hands-on practice is so central to these seminars, the schema participants use for this work plays a very important role in shaping what they can do in their encoding and how they experience the TEI language in practice. We wanted to provide a customization for use in the seminars that would be particularly suited to the kinds of encoding we would be teaching: focused on the representation of historical documents and other research materials. The schema we used for the seminars evolved somewhat during the course of the series as we shifted our approach. In the earlier seminars, we provided two schemas: a very simple one for the introductory exercises containing only the most basic elements, and a more complex schema for the more advanced practice later in the seminar. However, we found that we had better results in most cases if instead of starting with a set of simple exercises, we asked participants to begin encoding their own sample documents from the start: it meant that the materials were more engaging and participants were more strongly motivated to figure out how to represent them. We still offered participants the option of starting with our simpler examples (see below) but few tended to choose this option. As a result, participants tended to “outgrow” the simple schema fairly quickly, and this occasionally introduced a slight logistical hiccup at the point where we needed to show participants how to change their schema. It proved more straightforward to provide a single schema that was complex enough to accommodate all of the features of interest, and have participants use that from the start. In practice the disadvantage we had anticipated (that participants would find the larger schema confusing) did not tend to materialize.

To accompany the schema, we also provided participants with a document template: a short, valid TEI file using the seminar schema. This enabled them to begin encoding without having to worry at the outset about which elements from the TEI header were required for validity; they could simply open the file and begin transcription. In our earlier versions, this document template included some sample encoding, demonstrating the elements for basic prose, verse, drama, and letters. However, we found towards the end of the series that participants almost never used these examples but either deleted them instantly, or left them as a kind of appendage in the document. In some cases, the deletion caused difficulties when participants deleted too much or too little, leaving partial elements behind and causing errors. For this reason we finally modified the template to contain no sample material at all. For future events we will provide instead an encoded sample text for reference purposes.
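As a rough illustration of what such an empty template involves, the sketch below shows a minimal TEI document and a small check written in Python with the standard library only. The template text, the `REQUIRED` list, and the `missing_parts` function are assumptions made for this example, not the WWP's actual template or tooling. The check also covers only well-formedness and the presence of a few header elements that TEI requires; it is not a substitute for validating against the seminar schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical empty template: a header with the required parts of <fileDesc>
# and a body containing a single placeholder paragraph.
TEMPLATE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <publicationStmt><p>Unpublished seminar exercise.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a participant's sample document.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Transcription starts here.</p>
    </body>
  </text>
</TEI>"""

# Element paths (relative to <TEI>) that this sketch expects to find.
REQUIRED = [
    "teiHeader/fileDesc/titleStmt/title",
    "teiHeader/fileDesc/publicationStmt",
    "teiHeader/fileDesc/sourceDesc",
    "text/body",
]


def missing_parts(xml_text):
    """Return a list of problems: a parse error, or required paths not found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    ns = {"tei": TEI_NS}
    missing = []
    for path in REQUIRED:
        qualified = "/".join("tei:" + step for step in path.split("/"))
        if root.find(qualified, ns) is None:
            missing.append(path)
    return missing


# The untouched template passes the check.
intact = missing_parts(TEMPLATE)

# Deleting "too much" (here, the closing </body> tag) leaves a partial element
# behind, and the file is no longer well-formed.
damaged = missing_parts(TEMPLATE.replace("</body>", ""))

print(intact)
print(damaged)
```

The second case mirrors the problem described above: once a participant removes part of an element along with the sample encoding, the error appears before any transcription has begun.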

As mentioned above, we also found that participants benefited from being able to see their encoding displayed in a browser. We developed a skeletal CSS file for this purpose, and showed participants how to add style information to it to take advantage of their specific encoding.

We also developed a number of handouts on various topics to assist participants in the hands-on practice. [slide 08] All of the handouts are available for download from the WWP seminars site.

Finally—although our focus in planning and teaching these seminars has been chiefly on the text encoding content, we’ve also made some observations concerning the peculiar ergonomics and logistics of teaching this kind of seminar—a hybrid of presentation, discussion, and hands-on practice which requires a variety of different kinds of interaction. The technical logistics are of course important but also self-explanatory for this audience and we won’t belabor them here. But in fact our experience suggests that other kinds of logistical factors played a greater role in the success of these events. Because of the diversity of the audiences (including librarians, junior and senior faculty, students, technical staff, and others) it was important at each event to create an atmosphere where everyone could feel comfortable asking questions, expressing ignorance, and contributing a variety of expertise. The ergonomics of space turned out to have a significant impact on this level of comfort: the most successful seminars were those where the discussion took place around a central table, rather than with chairs or computers placed in rows. Similarly, very large rooms where participants were seated at a distance from one another (or with a great deal of space overhead) tended to diminish the vibrancy of discussion. These effects are not unexpected but their significance in this case was perhaps magnified, since a seminar on text encoding is not a genre of pedagogical encounter that participants would find familiar; we could not count on normal habits and expectations to provide a basis for interaction. When participants felt (because of proximity and sitting face to face) that they were all in this together, they seemed more inclined to take conversational risks, respond to each other’s observations, and generally work harder at the social encounter. 
When they were more isolated from one another, they seemed more inclined to leave the work to the instructors, and to treat the event as one of passive delivery rather than participation.

To evaluate the seminars, we created an online survey which we asked all participants to complete following each seminar. The total response rate was 34% (74 responses from a total of 220 participants) which was lower than we had hoped. Responses were drawn from across the participant population, however, with faculty, students, and staff responding in roughly the same proportion as their attendance overall. It’s probably fair to assume that responses came disproportionately from those who felt most positively about the seminar and were thus motivated to respond, so the results must be read with an admixture of caution. Nonetheless we felt the feedback was encouraging (and it was confirmed by the informal comments we received during the seminars). Each faculty member or graduate student who took away some greater interest in concepts of scholarly text representation and the use of digital markup is in a position to communicate that interest to many others through teaching and other interactions; library and IT staff are similarly able to point colleagues and collaborators in the direction of TEI if appropriate. The most important effects of the series may thus operate over the long term.

Quick overview of the survey feedback:

Overall, the feedback on the design and value of the seminars was positive; respondents indicated on the whole that the pace, length, and level of technical detail were just about right [slide 09] and also that the relevance and likely impact on their work were high [slide 10].

Addressing the question “What was the most useful concept or information you took away from the seminar?”, respondents tended to answer in general rather than specific terms: the following responses were typical [slide 11].

These kinds of answers suggest that the seminar did succeed at its most basic and important goal of giving participants an overall appreciation of the importance of markup of this kind for humanities scholarship, and a sense of the informational complexity that underlies high-quality digital resources. Some particularly telling responses also suggested that some participants, at least, understood the critical role humanities scholars need to play in developing these resources: [slide 12].