Centrum voor Teksteditie en Bronnenstudie
Centre for Scholarly Editing and Document Studies
a research centre of the Royal Academy of Dutch Language and Literature
5. Letter-specific textual features
A searchable electronic textbase of correspondence material requires specific provisions not only for the encoding of letter-specific metadata, but also for the transcription proper of the letters. Although many of the features encountered in letters can be covered with standard TEI elements (in particular, those described for the transcription of primary sources in chapter 18. Transcription of Primary Sources of the TEI P4 Guidelines), there are some very specific ones for which the semantics of TEI elements would have to be stretched to an intolerable degree. Some structural elements are unique to letters, like the envelope and postscripts. Others belong more generally to primary manuscript materials, and thus occur very frequently in letters, such as calculations, pre- and post-printed materials, and decorative elements. One element, which is not specific to manuscript material but refines the TEI provisions for marking dimensions, is taken over from the Master DTD.
The typical letter is delivered in an envelope. Often, when letters are stored, their accompanying envelopes are stored with them. For an encoder, there are good reasons to accompany the transcription of a letter with a transcription of its envelope. One is that envelopes may contain valuable information for the identification of letters. When the letter lacks some of the indicators of the communicative context that are needed to identify it unambiguously (communicative participants, time and place of writing; see 4.2. The letter heading: <letHeading>), there is a good chance that they can still be deduced from the postal information on the envelope. Furthermore, the envelope may contain significant information apart from the postal data. Some authors turn their envelopes into pieces of art in their own right, which may relate closely and contribute to the letter content. Some receivers use the envelope to write quick notes about the letter content or other contextual circumstances. Envelopes may even contain drafts of subsequent letters.
From a technical point of view, envelopes have certain typical formal and semantic features that justify a dedicated tag, e.g. the occurrence of a postmark, addresses on the front and/or back, possibly additional plain text and/or graphics, or no text at all, and even the containment of another envelope. Therefore, the <envelope> tag is introduced as a direct child of the TEI <text> element, at the same structural level as the TEI elements <front>, <body> and <back>. It may occur anywhere before, between or after those.
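The placement of <envelope> alongside <body> can be sketched as follows. This is a minimal, hypothetical skeleton, not an example taken from the guidelines: the text inside <note> and <p> is placeholder content, and the <front> and <back> elements are simply left out.

```xml
<!-- Hypothetical skeleton: all text content is a placeholder -->
<text>
  <!-- the envelope is a direct child of <text>, here placed before <body> -->
  <envelope>
    <note>transcription of the envelope goes here</note>
  </envelope>
  <body>
    <div>
      <p>transcription of the letter goes here</p>
    </div>
  </body>
</text>
```

The same <envelope> element could equally be placed after <body>, which may suit cases where the envelope is considered secondary to the letter text.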
However desirable a special <envelope> element may be, care has to be taken with regard to its semantics. After all, what holds for a letter also holds for its envelope: both come in many forms and flavours, which confronts the encoder with the question of what an envelope is. Is it the separate piece of paper that holds the postal information of a typical letter? Should the box containing a parcel then be regarded as an envelope, too? What to do with postcards, which are usually not packaged in a separate envelope but still manage to reach the right destination? Perhaps the best option is not to apply a strictly physical criterion, but to abstract it into a functional view, thus defining an envelope as ‘the part/room/space reserved for postal information’. This definition includes the postal information, as well as all other possible contents of that functional part of the letter. In order to specify the kind or format of envelope concerned, the <envelope> element has one special attribute apart from the global ones (see 3.3. Global attributes): type.
Note that the functional approach to the encoding of an envelope can be illustrated with the ‘envelope part’ of a postcard, which is physically less distinct than that of a normal letter.
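The address side of a postcard could be encoded along these lines. This is a hedged sketch rather than an example from the guidelines: it assumes the value "postcard" for both the type and side attributes (both values occur in the DTD fragment and envelope example of this section), and the postcard itself, its address and its postmark are hypothetical.

```xml
<!-- Hypothetical postcard: names and places are illustrative only -->
<envelope type="postcard">
  <!-- the functional 'envelope part' of the postcard, not a separate object -->
  <envPart side="postcard">
    <address type="receiver">
      <addrLine>Stijn Streuvels</addrLine>
      <addrLine>Lijsternest</addrLine>
      <addrLine>Ingooigem</addrLine>
    </address>
    <postmark>
      <placeName>Tielt</placeName>
    </postmark>
  </envPart>
</envelope>
```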
The envelope can contain the following elements: <envPart>, <envelope> and <note>.
These elements may occur in any order and frequency, provided that at least one of them is present. The recursive <envelope> element enables the encoding of envelopes that are enclosed within another envelope. Note that the <note> element may well be the only element of an envelope (in contrast to the additional <note> elements in the header elements described thus far): this makes it possible to state explicitly that an envelope has no informational content.
The <envPart> element contains all data that appear on one side of the envelope. To indicate the part concerned, one special attribute is declared, apart from the global ones (see 3.3. Global attributes): side, with the possible values front, back and postcard.
An envelope part can hold postal information, like the addresses of sender and receiver and a postmark, that may be important for the identification of the letter. Additional textual information can also be encoded within an envelope part. The following elements are used: <address>, <postmark> and <div>.
These elements may occur in any order and frequency, provided that at least one is present.
The <postmark> element contains a transcription of the postmark on the letter. This may be useful for the identification of the date and place of writing, when they are not stated anywhere else. Graphical aspects of the postmark can also be described. This can be done with the following elements: <figure>, <placeName>, <date> and <note>.
The <figure>, <placeName> and <date> elements may occur in any order and frequency, provided that at least one of them is present. The <note> element may only be used when at least one of the other elements is present and, according to the DTD declaration, must follow them.
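A postmark that combines a graphical description with a transcription and a closing note might look as follows. This is only a sketch: the description and note texts are hypothetical, and the use of <figDesc> inside <figure> assumes the standard TEI P4 content of that element.

```xml
<!-- Hypothetical postmark: description and note texts are invented -->
<postmark>
  <figure>
    <figDesc>round stamp with a double border</figDesc>
  </figure>
  <placeName>Tielt</placeName>
  <date value="1924-10-24">24.10.'24</date>
  <!-- notes come last, after at least one of the other elements -->
  <note>the year is only partly legible</note>
</postmark>
```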
The following example illustrates how an envelope can be encoded that contains an address of the receiver, a postmark and the sentence "Dank u, postbode!" on the front side, and the address of the sender on the back side. Furthermore, it holds a postcard that has no postal or other information on its address part.
<envelope type="220x110">
  <envPart side="front">
    <address type="receiver">
      <addrLine>Stijn Streuvels</addrLine>
      <addrLine>Lijsternest</addrLine>
      <addrLine>Ingooigem</addrLine>
    </address>
    <postmark>
      <placeName>Tielt</placeName>
      <date value="1924-10-24">24.10.'24</date>
    </postmark>
    <div>
      <p>Dank u, postbode!</p>
    </div>
  </envPart>
  <envPart side="back">
    <address type="sender">
      <addrLine>Lannoo uitgeverij</addrLine>
      <addrLine>Meulebeekschesteenweg 641</addrLine>
      <addrLine>Tielt</addrLine>
    </address>
  </envPart>
  <envelope type="postcard">
    <note>this envelope contains no data</note>
  </envelope>
</envelope>
<!ELEMENT text %om.RR; ((envelope | %m.Incl;)*,
                        (front, (envelope | %m.Incl;)*)?,
                        (body | group), (envelope | %m.Incl;)*,
                        (back, (envelope | %m.Incl;)*)?)>
<!ATTLIST text
    %a.global;
    %a.declaring;>
<!ELEMENT envelope %om.RR; (envPart | envelope | note)+>
<!ATTLIST envelope
    %a.global;
    type CDATA #IMPLIED>
<!ELEMENT envPart %om.RR; ((address | postmark | div | %m.Incl;)+)>
<!ATTLIST envPart
    %a.global;
    side (front | back | postcard) #IMPLIED>
<!ELEMENT address %om.RR; ((%m.Incl;)*,
                           ((addrLine, (%m.Incl;)*)+ |
                            ((%m.addrPart;), (%m.Incl;)*)* ))>
<!ATTLIST address
    %a.global;
    type CDATA #IMPLIED>
<!ELEMENT postmark %om.RR; ((figure | placeName | date)+, note*)>
<!ATTLIST postmark
    %a.global;>
Postscripts are typical of letters. Occurring after the closing formulae and salutation, they form a last addition to the contents of the letter. Moreover, the author often explicitly signals this additional status with the abbreviation ‘P.S.’.
Their formulaic use and meaning justify a dedicated tag. Therefore, the <ps> element is adopted in the DALF DTD. Because it can appear only at the end of letters, or of letter parts written by separate authors, it is specified as a member of the TEI element class divbot (see http://www.tei-c.org/P4X/ref-DIVBOT.html).
The <ps> element has only the global attributes. It can contain all elements from the TEI specialPara element class (see http://www.tei-c.org/P4X/ref-PESPECP.html). This is the same content model as that of, for example, the <note> and <add> elements, which should be broad enough to cater for all elements that can occur within postscripts.
<closer>
  <salute>Met vriendelijken groet</salute>
  <signed>(Styn Streuvels)</signed>
</closer>
<ps>
  <p id="xr2"><add id="add1"><abbr expan="postscriptum">P.S.</abbr> Ze jubileeren bij de firma Veen (60 jaar bestaan)<ref target="n8">8</ref> en er wordt me daarom gevraagd, door het comité: hoeveel geld ik daarvoor als feestgave wensch te geven! Zonderlinge zeden? Als ik nu eens vroeg: hoeveel ze voor mij beschikken als 75-jarige jubilaris!</add></p>
</ps>
<!ELEMENT ps %om.RR; %specialPara;>
<!ATTLIST ps
    %a.global;>
Modern correspondence, especially businesslike letters, may contain a lot of numerical data. The contents of such calculations may be very valuable for research. Formally, calculations are often set apart from running text, and it may be desirable to mark them with explicit encoding features. This provides researchers with greater control over the textual features they want to study.
Calculations have an internal structure whose semantics cannot be captured sufficiently with the standard TEI <num> element (see http://www.tei-c.org/P4X/ref-NUM.html). We considered incorporating MathML into the DALF DTD; MathML is an existing W3C standard that provides a specialised tagset for mathematical formulae (see http://www.w3.org/TR/MathML2/). Chapter 22.2 of the TEI P4 Guidelines gives directions for specifying external tagsets as XML notations that can be used in the <formula> tag. However, testing that mechanism with the literal examples given in that chapter proved unsuccessful. Postings on the TEI public mailing list (see http://listserv.brown.edu) showed that other TEI users had encountered the same problem, and that the incorporation mechanism itself does not provide the inclusion functionality we had in mind. These difficulties, the unwieldy workarounds suggested for incorporating external tagsets like MathML [note3], and the complexity of the MathML standard itself made this option less attractive than devising a specialised element that can encode at least some of the semantic structure of calculations in a rudimentary way.
The specialised element for calculations adopted in the DALF DTD is <calc>. Because calculations may occur at virtually any physical location on a letter or envelope, the <calc> element is declared as a member of the common, inter and tpParts element classes [note4], defined respectively at http://www.tei-c.org/P4X/ref-COMMON.html, http://www.tei-c.org/P4X/ref-INTER.html and http://www.tei-c.org/P4X/ref-TPPARTS.html.
The <calc> element has only the global attributes (see 3.3. Global attributes). Calculations may partly or entirely consist of plain prose (thus possibly needing some phrase-level TEI elements), and can contain embedded calculations. The basic structure, however, is made up of one or more arguments, an operator, and a result. This structure is captured in the following elements: <arg>, <oper> and <result>.
These elements may occur in any order and frequency. Arguments and results have a similar inner structure: they can contain plain prose text and possibly phrase-level TEI elements, or another calculation, and possibly some more or less free-standing arguments. Operators are predominantly in PCDATA form, but allow phrase-level TEI elements to appear within.
The following example shows how a calculation in the original can be encoded with the <calc> element. Note the occurrence of a mixture of PCDATA and phrase-level elements within the <calc> (sub)structure(s), and the encoding of the embedded <calc>, as a child of the second argument:
<calc>
  <arg>969 <abbr expan="exemplaren">ex.</abbr> (zie afrekening van 30.8.41)</arg>
  <oper>-</oper>
  <arg>138<abbr expan="exemplaren">ex</abbr> (
    <calc>
      <arg>133 <abbr expan="exemplaren">ex.</abbr> verkocht</arg>
      <oper>+</oper>
      <arg>5<abbr expan="persexemplaren">persex.</abbr></arg>
    </calc>)
  </arg>
  <result><hi rend="double_underlined">831</hi><abbr expan="exemplaren">ex.</abbr></result>
</calc>
<!ELEMENT calc %om.RR; (%phrase; | calc | arg | oper | result)*>
<!ATTLIST calc
    %a.global;>
<!ELEMENT arg %om.RR; (%phrase; | calc | arg)*>
<!ATTLIST arg
    %a.global;>
<!ELEMENT oper %om.RR; (%phrase;)*>
<!ATTLIST oper
    %a.global;>
<!ELEMENT result %om.RR; (%phrase; | calc | arg)*>
<!ATTLIST result
    %a.global;>
Letters may be written (or printed) on paper (or other support material) containing pre-printed text like letterheads, form data, newspaper articles, ads and so on. There are also similar formal text elements, like stamps, that may be added after the composition of the letter. Such text fragments can be seen as part of the letter, but may need to be distinguished from more "authorial" parts of the letter, as they mostly have an impersonal character. Such material might, for instance, be excluded from a linguistic study of a writer's language, or selected in a study of stamps in letters.
The TEI tagset does not contain any element that can accurately indicate pre-printed text material. Post-printed material, like stamps, could possibly be tagged with the TEI <add> element. However, as that element is reserved for ‘letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector’, it is questionable whether mostly impersonal stamps can be regarded as genuine additions in that sense. Therefore, in order to provide consistent treatment for all pre- and post-printed material, a special element is introduced in the DALF DTD: <print>. Since pre- and post-printed material may occur at virtually any physical location in a letter or on an envelope, the <print> element is declared as a member of the Incl element class, as described at 7. Modifications to TEI element classes.
The <print> element has a number of attributes, apart from the global ones (see 3.3. Global attributes), that are borrowed from some TEI elements. It has the same attributes style, character and ink as the TEI <handShift /> element, to indicate particularities regarding the writing style, the font and the ink used. Further, the hand attribute is copied from the TEI element <add>, to enable reference to a hand the encoder has defined earlier in the header under <hand>. The last attribute, type, enables the characterisation of the type of printed material, like ‘letterhead’, ‘form’, ‘newspaper’, ‘ad’, ‘stamp’ and so on.
The <print> element has no specific inner structure: it is used to encode plain PCDATA text. In order to allow normal textual subelements, the same content model as for the TEI elements <note> and <add> is used, viz. specialPara (see http://www.tei-c.org/P4X/ref-PESPECP.html).
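As an illustration, a pre-printed letterhead and a stamp added afterwards might be marked up as follows. This is a hypothetical sketch: the letter text, the stamp wording and the ink value are invented, and only the type values ‘letterhead’ and ‘stamp’ come from the description of the type attribute.

```xml
<!-- Hypothetical letter fragment: all text content is invented -->
<div>
  <!-- pre-printed letterhead, containing paragraph-level content -->
  <print type="letterhead">
    <p>Lannoo uitgeverij</p>
    <p>Tielt</p>
  </print>
  <!-- post-printed stamp, used inline within a paragraph -->
  <p>Geachte heer, <print type="stamp" ink="blue">ONTVANGEN</print> hierbij
    zend ik u de afrekening.</p>
</div>
```

Because <print> uses the specialPara content model, it can wrap either whole paragraphs, as in the letterhead, or a phrase inside a paragraph, as in the stamp.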
<!ELEMENT print %om.RR; %specialPara;>
<!ATTLIST print
    %a.global;
    style CDATA #IMPLIED
    ink CDATA #IMPLIED
    character CDATA #IMPLIED
    hand IDREF #IMPLIED
    type CDATA #IMPLIED>
Letters can contain all sorts of material which is strictly speaking non-textual. Authors can include illustrative or plainly decorative sketches in their letters, or even include real objects like dried flowers, hair curls etc.
The TEI scheme provides the possibility to mark decorative elements with the <figure> element (see http://www.tei-c.org/P4X/ref-FIGURE.html). However, in view of the archival requirements for DALF letters, we opted for a more comprehensive approach to marking up these text elements. As was explained in a previous section (4.3.6. Decorative elements: <decoration> and <paraphernalia>), the DALF header contains a <decoration> and a <paraphernalia> element where the different decorative elements in the letter can be described and identified. Where those elements are encountered in the text, the encoder can indicate this and refer back to those definitions by means of the special DALF elements <deco /> and <paraph />. These differentiate between decorations, i.e. less-textual materials that can be regarded as having originated within the writing act, and paraphernalia, materials that did not originate within the writing act. Decorations can include drawings, schemes etc., whereas paraphernalia can include hair curls, dried flowers etc. Since these phenomena can occur very freely within their textual context, they have been defined as members of the TEI Incl class (see 7. Modifications to TEI element classes).
They are empty elements that merely indicate the occurrence of decorative elements in the text. They must refer to the definition of the element concerned, and may also include a reference to a digitally scanned facsimile of the decorative item. Therefore, the following attributes are specified, apart from the global ones (see 3.3. Global attributes): a required reference to the definition in the header (decoRef for <deco />, paraphRef for <paraph />) and an optional entity attribute that points to the facsimile.
Note that it is up to the encoder to decide whether decorative material that was already present on the paper before the letter was written, or that was added afterwards, is encoded as a decorative element (<paraph />) or as pre- or post-printed material (<print>). An important factor in this choice, however, is the function of the decorative material within the text. If the author of the letter clearly intended some relation between the writing and such a decorative element, it could be considered a real decoration (possibly within a <print> element). If on the other hand there does not seem to be a real relation, it can just be tagged as printed material without further specifications.
The following example illustrates how the occurrence of two decorations and one item of paraphernalia can be indicated and linked both to their definitions in the header (which are assumed to have been provided in the manner specified in 4.3.6. Decorative elements: <decoration> and <paraphernalia>) and to their visual representations:
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "//CTB//DTD Dalf 1.0 (based on TEI)//NL" "DALF.dtd" [
  <!NOTATION jpeg PUBLIC 'ISO DIS 10918//NOTATION JPEG Graphics Format//EN' >
  <!NOTATION png PUBLIC '-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
  <!ENTITY fig1 SYSTEM "fig1.jpg" NDATA jpeg>
  <!ENTITY fig2 SYSTEM "fig2.jpg" NDATA jpeg>
  <!ENTITY obj1 SYSTEM "obj1.png" NDATA png>
]>
<TEI.2>
  <text>
    ...
    <p>Een grote boom stond daar <deco decoRef="fig1" entity="fig1" />, met aan zijn voet een mooie papaver <paraph paraphRef="obj1" entity="obj1" />.</p>
    ...
    <p>Ik stel mij de pagina zo voor: <deco decoRef="fig2" entity="fig2" /></p>
    ...
  </text>
</TEI.2>
<!ELEMENT deco %om.RR; EMPTY>
<!ATTLIST deco
    %a.global;
    decoRef IDREF #REQUIRED
    entity ENTITY #IMPLIED>
<!ELEMENT paraph %om.RR; EMPTY>
<!ATTLIST paraph
    %a.global;
    paraphRef IDREF #REQUIRED
    entity ENTITY #IMPLIED>
The <dimensions> element is a specialised means to express the size of 3-dimensional objects, taken over from the Master DTD (with slight adaptations). Since descriptions of dimensions can occur at various places inside normal running text, it is declared as a member of the TEI class phrase (see http://www.tei-c.org/P4X/ref-PHRASE.html).
The dimensions expressed in the <dimensions> element can be specified with some special attributes, apart from the global ones (see 3.3. Global attributes): a type attribute and the measurement attributes declared in the a.measured entity.
The <dimensions> element can contain PCDATA and necessary phrase-level elements, or can contain specific elements that identify the height, width or depth: <height>, <width> and <depth>.
These elements may occur freely in any combination and frequency. They all have a special attribute (apart from the global ones defined at 3.3. Global attributes) to express the measurement unit used: units.
They can contain PCDATA and necessary phrase-level elements as their contents:
... het kaft is <dimensions> <height units="cm">40 cm</height> hoog en <width units="cm">25 cm</width> breed </dimensions>
<!ELEMENT dimensions %om.RR; (%phrase; | height | width | depth)*>
<!ATTLIST dimensions
    %a.global;
    %a.measured;
    type CDATA #IMPLIED>
<!ELEMENT height %om.RR; (%phrase;)*>
<!ATTLIST height
    %a.global;
    %a.measured;>
<!ELEMENT depth %om.RR; (%phrase;)*>
<!ATTLIST depth
    %a.global;
    %a.measured;>
<!ELEMENT width %om.RR; (%phrase;)*>
<!ATTLIST width
    %a.global;
    %a.measured;>
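For a genuinely 3-dimensional object, all three child elements could be combined. The following is a hypothetical sketch: the object, the measurements and the type value "box" are invented for illustration, and units is assumed to be available through the a.measured entity, as in the example above.

```xml
<!-- Hypothetical measurements of a parcel box -->
<dimensions type="box">
  <height units="cm">10</height>
  <width units="cm">20</width>
  <depth units="cm">5</depth>
</dimensions>
```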