Quotations [053]


Encoding of quotations, distinction between use of <q> and <quote>, treatment of quotation marks


The WWP uses <q> to encode direct speech and reported thought. We use <quote> to encode material that a passage of text identifies as originating outside itself (regardless of where the material actually originates), or that a speaker within the text identifies as originating outside his or her current utterance. This includes proverbs, mottoes, sayings, and similar material.
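
A minimal sketch of the two elements side by side, using invented sentences. Reported thought is encoded with <q> exactly as spoken words are, while a saying that the passage presents as coming from elsewhere is encoded with <quote>. The text is illustrative only, and the rend= values simply follow the style of the examples at the end of this section.

```xml
<!-- Invented text. Reported thought: encoded with <q>, just like direct speech. -->
<p><q rend="pre(&ldquo;)post(&rdquo;)">How shall I ever bear it?</q> she thought,
  and turned away from the window.</p>

<!-- Invented text. A saying the passage identifies as originating elsewhere: <quote>. -->
<p>Over the door ran the old saying,
  <quote rend="slant(italic)">waste not, want not</quote>.</p>
```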

The WWP does not use the direct=, who=, or type= attributes on <q>.

When it is convenient to break a single quotation into multiple XML elements (to avoid overlap with other XML elements such as verse lines), we use the part= attribute of <q> or <quote>. Where part= is undesirable (for instance, when nested <q> or <quote> elements would make the part= sequence ambiguous), we instead give each segment an id= and link the segments with next= and prev=. We use part= or next= and prev= only when a <q> or <quote> element has been broken artificially to avoid overlap, not when a quotation is interrupted by the text itself (for instance, by “she said” or a similar narrative intervention).
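
A sketch of both mechanisms, using invented verse. It assumes that part= takes the standard TEI values I (initial), M (medial), and F (final), and that next= and prev= hold the bare id= value of the linked segment; both assumptions should be checked against the current WWP schema.

```xml
<!-- Invented verse. One utterance spans three lines, so the <q> is split into
     segments; part= (assumed TEI values I, M, F) records each segment's position. -->
<lg>
  <l>Then spake the shepherd: <q part="I">Come away,</q></l>
  <l><q part="M">The night is long, the road is cold,</q></l>
  <l><q part="F">And we must home before the dawn.</q></l>
</lg>

<!-- Invented verse. A split <quote> nested inside a split <q>: id=, next=, and
     prev= link each chain explicitly instead of relying on a part= sequence. -->
<lg>
  <l>She cried, <q id="q1a" next="q1b">My mother ever said,
    <quote id="qt1a" next="qt1b">The wind that blows</quote></q></l>
  <l><q id="q1b" prev="q1a"><quote id="qt1b" prev="qt1a">will bring thee home,</quote>
    and so it proved.</q></l>
</lg>
```

In the second stanza, part= alone would leave a reader to work out which initial segment pairs with which final one; the id= links make each chain unambiguous.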

The WWP does not attribute quotations beyond encoding whatever information the text itself provides. At present the WWP also makes no claims about the accuracy of quoted material or its authenticity (that is, whether a source for the quotation actually exists). Future research in this area would need to take into account which edition or copy of the source is being quoted, and what degrees of accuracy are of scholarly interest.

If a text quotes itself, or if a character within a text quotes material from elsewhere in the same text (for instance, a poem by another character), the same rules apply: the quoted material comes from outside the current utterance or the current passage, and is therefore encoded with <quote>. If the quoted material is also direct speech, it is encoded with <q> as well.
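
Two invented cases, as a sketch. In the first, a narrator quotes an earlier passage of the same text. In the second, a speaking character repeats words another character said earlier, so the repeated words receive both <quote> and <q>. The order of nesting shown (<q> inside <quote>) is an assumption; the rule above does not prescribe one.

```xml
<!-- Invented text. The narrator quotes an earlier passage of the same work: <quote>. -->
<p>As was observed in the first chapter,
  <quote rend="pre(&lsquo;)post(&rsquo;)">a woman's mind is her own estate</quote>,
  and so it proved.</p>

<!-- Invented text. A character's speech (<q>) repeats another character's earlier
     words, which are both quoted material (<quote>) and direct speech (<q>).
     Nesting <q> inside <quote> here is an assumption, not a stated WWP rule. -->
<p><q rend="pre(&ldquo;)post(&rdquo;)">He stood at the gate and told me,
  <quote rend="pre(&lsquo;)post(&rsquo;)"><q>I shall return before the spring.</q></quote></q></p>
```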

Quoted speech in our corpus may be marked in a number of ways, or may be left unmarked altogether. In some cases this makes it difficult to be certain where a given quotation begins and ends. In addition, the conventions for signalling direct and indirect speech have changed over the centuries, and our corpus contains transitional forms that can be hard to assign to either category. Our strategy for deciding which instances of quoted speech to encode can be summed up as follows:

1. Encode all quoted speech that is renditionally distinct, whether it is direct or indirect. Rendition here includes quotation marks as well as distinctive type styles (all caps, small caps, italics, black letter).

2. Also encode all instances of direct speech, whether renditionally distinct or not. Direct speech here means any speech which occurs in the first person singular or plural.

These two conditions can be summarized in a small table:

                             INDIRECT        DIRECT
Renditionally distinct       encode          encode
Not renditionally distinct   don’t encode    encode

3. Thus, when the only indication of a quotation is a phrase such as “she said”, with no renditional mark or other textual clue (such as a shift to the first person), the passage should not be encoded with <q>. For example, in the sentence “She said that she would never taunt the chicken again”, no <q> is needed.

4. When we are not sure exactly where quoted material begins or ends, we encode only the minimal stretch of text that we are certain is quoted. The rationale is that readers who want to find all instances of quoted speech, in order to compare them with verbal patterns in non-quoted material, need to be sure that everything tagged with <q> is in fact quoted. Readers who want to locate quoted speech in order to examine it by eye can still find these minimal-extent quotations and then decide whether additional material should be considered. Since the boundaries of most quotations are not in doubt, searching and similar functions will, on the whole, be well supported.
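
Applied to invented sentences, rules 1, 2, and 4 might yield encodings like the following. The sentences are illustrative only, and the rend= value follows the style of the examples at the end of this section.

```xml
<!-- Rule 1: indirect speech set off typographically (here, italics) is encoded. -->
<p>She answered that <q rend="slant(italic)">she would come no more</q>.</p>

<!-- Rule 2: first-person speech with no quotation marks or distinct type is encoded. -->
<p>He turned to the window and said <q>I am quite undone</q>, and sat down.</p>

<!-- Rule 4: "it is the only course left" could be the speaker's words or the
     narrator's comment, so only the portion certainly quoted is tagged. -->
<p>She said <q>I am resolved to go</q>, it is the only course left.</p>
```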

The WWP nests <q> and <quote> with <l> and <lg> for practical purposes only; the nesting does not express any analytical claim about how they relate (for instance, whether a speaker is quoting a poem or is being quoted in a poem).
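
Two invented passages showing the nesting in each direction. Which element ends up outermost simply follows whatever avoids overlap in the particular case and, as stated above, carries no analytical meaning.

```xml
<!-- Invented text. A whole stanza is quoted, so <quote> wraps the <lg>. -->
<p>She copied these lines into her letter:</p>
<quote>
  <lg>
    <l>The rose is fled, the lily fades,</l>
    <l>And winter walks the garden shades.</l>
  </lg>
</quote>

<!-- Invented text. Speech fills only part of one verse line, so <q> sits inside <l>. -->
<lg>
  <l>The maiden wept, and <q>Alas,</q> she cried,</l>
  <l>And turned her weary face aside.</l>
</lg>
```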


Example 1. Direct speech encoded with <q>:
<q rend="pre(&ldquo;)post(&rdquo;)">Bless me!</q> he said, looking about him, <q rend="pre(&ldquo;)post(&rdquo;)">I never did.</q>

Example 2. Quoted material encoded with <quote>:
If we reflect whether <quote rend="slant(italic)">to be, or not to be</quote>, we are surely lost.

Example 3. Quotations whose source is unknown should still be encoded with <quote>:
<p>I then spoke to him plainly, saying <q rend="pre(&ldquo;) post(&rdquo;)">If I were in your shoes, I would not <quote rend="pre(&lsquo;) post(&rsquo;)">taunt the chicken</quote> with such vainglory.</q></p>
Here <quote> is used even though we have no idea where the phrase “taunt the chicken” comes from. (This usage must be carefully distinguished from <gloss> or <term>, which would be appropriate if the phrase appeared to be a technical term rather than a quotation.)
