The WWP uses the elements specified in Chapter 18 of TEI to encode damaged, unclear, or illegible text. Most of these elements can be nested inside others, in cases where text moves through varying degrees of illegibility, or where deliberate deletions and other forms of obscurity overlap. The possibilities are too many to enumerate here, but when you come upon cases that seem to require them, you can use your own judgement about what is appropriate, based on the role of each of the following individual elements.
<sic> is used to indicate printing problems or errors in the original text, including uninked and omitted letters. To use <sic>, you need to be fairly certain that the problem lies in the original. This is quite likely if you see a missing letter and the letters around it are fairly dark (so you know it’s not just the xerox quality). If you are working with a very faint xerox and cannot tell whether the original or the reproduction is at fault, use one of the other elements below.
Example: He wh m you seek is not here.
He wh<sic corr="o">␣</sic>m you seek is not here.
Note the use of ␣ to mark explicitly the space left between the letters.
<del> is used to indicate deliberately deleted text that is still legible. Because this element implies human involvement, use it only when you are fairly certain that the deletion was deliberate and made by a person. If there’s just a big blot on the page, it would be better to use one of the elements below. If text has also been added to replace the deletion, use <add> for the added text. If parts of the deletion are unreadable, these can be additionally tagged with <gap>, with reason="deleted" (see below).
Example: She was such a nasty ungovernable prophet [“ungovernable” crossed out but legible]
She was such a nasty <del>ungovernable</del> prophet.
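The two further cases described for deletions (a replacement written in, and a deletion that is only partly readable) might be encoded roughly as follows. This is a sketch only: the replacement word "unruly" and the partly unreadable deletion are invented for illustration, and any attributes on <gap> beyond reason= are omitted.

```xml
<!-- Hypothetical: "ungovernable" is crossed out and "unruly" is written in to replace it -->
She was such a nasty <del>ungovernable</del><add>unruly</add> prophet.

<!-- Hypothetical: the end of the crossed-out word cannot be read -->
She was such a nasty <del>ungovern<gap reason="deleted"/></del> prophet.
```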
<unclear> is used to indicate that a passage of text is partially illegible and that the tagged text is conjectural; it signals the reader to regard the reading as somewhat uncertain. Use it where you can be reasonably sure your conjecture is accurate. If you really cannot be sure, use one of the elements below. For this element, we use the reason= attribute but not the cert= attribute.
Example (bracketed letter is unclear; not sure whether it’s “these” or “those”):
When she spoke to th[ ]se prophets...
When she spoke to th<unclear reason="flawed-reproduction">o</unclear>se prophets...
(The choice of reading would be a best guess based on context.)
Values for reason= on <unclear>:
"damaged": for cases where the original page has been damaged in some way (torn, folded, creased)
"obscured": for cases where the page is intact, but the text is obscured or unclear for some reason having to do with the original text (partial deletion, stain on the original page, poorly inked type)
"flawed-reproduction": for cases where the reproduction causes unclarity but we have reason to believe that the original is still fully legible (unclear gutters, edge cut off by xeroxing or filming, darkening which results in a black fog on the page [microfilm or xerox underexposure], an object superimposed on the original when filmed or xeroxed). We should assume that problems lie with the reproduction unless we are fairly sure they are problems with the original.
<supplied> is used to indicate that a passage of text is completely illegible, and that the tagged text is supplied by the editor or transcriber, either on pure supposition or from some other source. If the text comes from another source (e.g. another copy of the text), this can be indicated using the source= attribute. In our case, we would only indicate a separate source if we used evidence from a different copy of the source (i.e. a copy from a different library, not a different reproduction of the same copy). If we check a transcription against the source text, any readings from the source text can be added silently without using <supplied>.
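As a rough illustration of <supplied>, the sketch below uses two of the reason= values defined for this element in this section. The sentences and the nature of the damage are invented, and the value given for source= is only a placeholder, not a WWP convention for identifying another copy.

```xml
<!-- Hypothetical: the last word of the line is lost in the gutter of the photocopy -->
He whom you seek is not <supplied reason="flawed-reproduction">here</supplied>.

<!-- Hypothetical: a torn corner, with the reading taken from a different copy of the text -->
<!-- The source value is a placeholder only -->
He whom you seek is <supplied reason="damaged" source="other-copy">not here</supplied>.
```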
The values for reason= on <supplied> are more limited than those for <gap>, because we are supplying text whose accuracy we’re reasonably sure of, so the reader has less need for information about the problem. In the case of <gap>, our explanation takes the place of what the reader really wants, so it needs to be more informative. The values for <supplied> are thus designed to let readers know whether they can expect to check the reading by consulting the original or another reproduction of it (in the case of “illegible” and “flawed-reproduction”) or whether the original itself is compromised.
The value “illegible” is used instead of “obscured” (which is used for <unclear>) because in the case of <unclear> the text is not fully illegible; “obscured” is intended to indicate that there is a diminished degree of visibility and confidence in the reading, while “illegible” indicates that the text cannot be read at all (hence text is being supplied from elsewhere or upon supposition).
Values for reason= on <supplied>:
"damaged": use this value where the physical text is damaged (torn, folded, burned)
"illegible": use this value where the page is intact but the original text is illegible for some reason other than damage (e.g. page is illegibly stained, letter is uninked, a bug was squashed on the page)
"flawed-reproduction": use this value where the text is illegible because of some problem with the reproduction technology (blurring, gutter problems, edge of page not copied)
<gap> is used to indicate that a passage of text is completely illegible or omitted, and that no conjecture is being made about the omitted material. This is an empty element. It can be used where text is illegibly deleted or obscured, where pages are torn out or cut off by the xerox, or even where they are folded under or creased. In cases where our reproduction is at fault, we use <gap> to mark places that need to be checked, and try to get a better reproduction that eliminates the illegibility. See the entry on <gap> for the appropriate attribute values.
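A minimal sketch of the empty <gap> element, for a word that is cut off by the xerox and cannot be conjectured. The sentence is invented, and the reason= value shown is an assumption borrowed from the values listed for <unclear> and <supplied>; check it against the entry on <gap> before using it.

```xml
<!-- Hypothetical: the final word is cut off at the edge of the xerox and cannot be guessed -->
<!-- The reason value is an assumption; confirm it against the entry on gap -->
When she spoke to those <gap reason="flawed-reproduction"/>
```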