Tables of contents [042]


Encoding of tables of contents with <list> inside <div type="contents">, with internal encoding to capture the functional parts of the table of contents information, such as page numbers and titles.


The WWP uses <div type="contents"> to encode tables of contents, with <list> inside to mark the table’s structure. Each entry is encoded with <item>. We record the original pagination while also setting up a system for generating new pagination upon printout, so as to make the table of contents useful under all formatting conditions.

There are several basic components to the table of contents, not all of which are present in every case:

The label for the contents item (e.g. a chapter number or other number): encoded with <label>, as the first child of <item>, if it appears. Often there is no label at all and this element will be unnecessary.

The name of the text chunk (e.g. a chapter name or poem title): encoded with <rs>. No internal encoding except for renditional markup is included; since the text chunk is simply a duplication for reference purposes of the words that “really” appear later in the actual text, encoding things like <persName> is redundant and misleading.

The page number on which this text chunk is to be found: encoded with <ref type="pageNum">, with a target= attribute pointing to the id= of the text chunk in question (not to the id= of the page on which it falls). If no page number is given, the pointer to the location in the main text is encoded with <ptr>, with a target= attribute just as for <ref>. No type= attribute is given for <ptr>. Note that the “pageNum” value of type= on <ref> indicates the type of reference, not the type of information to which the reference points.

The dots, dashes, spaces, or other leaders that lie between the text chunk and the page number: we ignore these entirely. The relative alignment of the elements (<rs>, <ref>, etc.) is indicated using the rend= attribute.

The headings for the columns (usually something like “Chapter” and “Page”): The first instance is encoded with <head>; any subsequent repetitions of the column headings caused by page breaks are encoded with <mw type="listhead">.
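
The components above might combine as in the following sketch. It is illustrative only: the id values (ch01, ch02, postscript), the titles, the page numbers, and the rend= value are invented for the example, and placing both column headings in a single <head> is an assumption rather than documented WWP practice. The target= values are written as bare ids, on the assumption that they match the id= of the chapter-level element in the main text.

```xml
<div type="contents">
  <list>
    <!-- Column headings, first occurrence only; later repetitions
         caused by page breaks would use mw type="listhead" -->
    <head>Chapter. Page.</head>
    <item>
      <label>I.</label>
      <rs>The Arrival</rs>
      <!-- target names the id of the chapter itself, not of the page;
           the rend value here is a placeholder, not confirmed syntax -->
      <ref type="pageNum" target="ch01" rend="align(right)">1</ref>
    </item>
    <item>
      <label>II.</label>
      <rs>A Letter from Town</rs>
      <ref type="pageNum" target="ch02" rend="align(right)">17</ref>
    </item>
    <!-- Entry with no label and no printed page number:
         ptr takes the place of ref and carries no type attribute -->
    <item>
      <rs>Postscript</rs>
      <ptr target="postscript"/>
    </item>
  </list>
</div>
```

Note that the leader between each title and its page number is simply omitted, and that <rs> carries no internal encoding such as <persName>.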

If there are internal subgroupings within the table of contents (for example, if the volume is a series of novels each of which has chapters), we encode these as nested lists: the outermost <item>s would be the novels, and each novel-level <item> would contain a list of chapters.
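
A nested table of contents of this kind might look roughly like the sketch below. The novel titles, chapter names, ids, and page numbers are hypothetical, and whether a novel-level <item> also carries its own <ref> or <ptr> is left open here.

```xml
<div type="contents">
  <list>
    <!-- Outer items: one per novel -->
    <item>
      <rs>The First Novel</rs>
      <!-- Inner list: the chapters of that novel -->
      <list>
        <item>
          <label>Chap. I.</label>
          <rs>An Introduction</rs>
          <ref type="pageNum" target="n1ch01">3</ref>
        </item>
        <item>
          <label>Chap. II.</label>
          <rs>A Journey</rs>
          <ref type="pageNum" target="n1ch02">21</ref>
        </item>
      </list>
    </item>
    <item>
      <rs>The Second Novel</rs>
      <list>
        <item>
          <label>Chap. I.</label>
          <rs>A Return</rs>
          <ref type="pageNum" target="n2ch01">145</ref>
        </item>
      </list>
    </item>
  </list>
</div>
```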

The WWP distinguishes between tables of contents and indexes as follows: a table of contents is ordered by page number, while an index is ordered according to topic (usually alphabetically). Either may appear at the beginning or end of the book or section. For more information on indexes and a comparison with tables of contents, see 077.
