Punctuation: general
In general, punctuation may be transcribed as part of the text content, using standard keyboard characters. There are a few exceptions:
- special marks of punctuation, such as curly quotes or em-dashes, for which you need to use either an entity reference or a Unicode character reference. See example 1.
- punctuation which is used as a delimiter within the text, and which might be altered with changes in display (for instance, to set off a stage direction or a turned-down line of poetry); in these cases, it makes more sense to encode the punctuation using the rend attribute, to give more control over its representation. See example 2.
- punctuation which by its nature needs to be treated variably: for instance, end-of-line hyphens, which become irrelevant if the text is relineated. For these, we recommend using the standard ISO entity reference &shy; for soft hyphens. See for details.
- punctuation which serves a function as part of the markup system, and must be represented with an entity reference to avoid confusion; see Ordinary characters requiring special treatment for details.
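As a rough illustration of the first and third exceptions, the following Python sketch shows how entity references and numeric character references for special punctuation resolve to the same characters, and how a soft hyphen lets an end-of-line hyphen disappear when the text is relineated. It is only a sketch under stated assumptions: it borrows Python's built-in HTML entity table (which happens to include ldquo, rdquo, mdash, and shy) in place of a project's own entity declarations, and the function names resolve_references and relineate are hypothetical.

```python
import html

# The same sentence, once with named entity references and once with
# numeric (Unicode) character references.
named = "&ldquo;Stay&mdash;or go,&rdquo; she said."
numeric = "&#x201C;Stay&#x2014;or go,&#x201D; she said."

SOFT_HYPHEN = "\u00ad"  # the character that &shy; stands for


def resolve_references(text):
    """Expand entity and character references.

    Assumption: Python's HTML entity table stands in for the entity
    declarations a real transcription project would supply.
    """
    return html.unescape(text)


def relineate(lines):
    """Join transcribed lines into running text.

    A line ending in a soft hyphen is treated as a word broken across
    lines: the soft hyphen is dropped and the next line is joined
    directly. Other lines are joined with a single space.
    """
    out = ""
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            out += line[:-1]
        else:
            out += line + " "
    return out.rstrip()
```

For instance, resolve_references(named) and resolve_references(numeric) yield the same string, and relineate applied to the resolved lines "a punc&shy;" and "tuation mark" yields "a punctuation mark".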
Some projects in which punctuation is especially significant (for instance, manuscript transcription projects) find it useful to make fine-grained distinctions between the different functions of an individual punctuation mark: for instance, a period used to end a sentence, to mark an abbreviation, or to delimit a reference. If such distinctions are necessary, they can be made by establishing a set of entity references that map to the different functions being represented. These entity references can then be resolved as appropriate in given circumstances: either as an ordinary mark of punctuation, or as a special character that triggers a special behavior.
For ordinary transcription purposes, we do not recommend this level of detail.
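For projects that do need such distinctions, the entity-based approach described above might look something like the following Python sketch. The entity names perSent and perAbbr, the "|" sentence-boundary marker, and the resolve helper are all hypothetical, chosen only for illustration; the sketch relies on the standard library's XML parser expanding entities declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# A transcribed fragment in which each period is encoded by function:
# &perAbbr; marks an abbreviation, &perSent; ends a sentence.
# (Both entity names are hypothetical.)
BODY = "<p>Mr&perAbbr; Smith left&perSent;</p>"

# Resolution 1: every functional period becomes an ordinary period.
PLAIN = {"perAbbr": ".", "perSent": "."}

# Resolution 2: sentence-final periods also carry a marker ("|" here,
# an arbitrary choice) that later processing could act on.
MARKED = {"perAbbr": ".", "perSent": ".|"}


def resolve(body, entity_map):
    """Parse body with the given entity declarations and return its text."""
    decls = "".join(
        f'<!ENTITY {name} "{value}">' for name, value in entity_map.items()
    )
    doc = f"<!DOCTYPE p [{decls}]>{body}"
    root = ET.fromstring(doc)
    return "".join(root.itertext())
```

The same encoded text can then be resolved differently depending on the entity declarations supplied: resolve(BODY, PLAIN) gives ordinary punctuation, while resolve(BODY, MARKED) leaves a marker after each sentence-final period.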