Next: The table Element, Previous: The text Element (Inside container), Up: Structure Member Format   [Contents]
html Element

html :lang=(en) => TEXT
The element contains an HTML document as text (or, in practice, as
CDATA).  In some cases, the document starts with <html> and
ends with </html>; in others the html element is
implied.  Generally the HTML includes a head element with a CSS
stylesheet.  The HTML body often begins with <BR>.
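As a rough illustration of the layout just described, the following sketch separates the stylesheet from the body text using Python's standard html.parser module. It assumes that the stylesheet sits inside a <style> element, which the description above does not state explicitly; the class and function names (HtmlMemberParser, split_html_member) are hypothetical and not part of any existing reader.

```python
from html.parser import HTMLParser


class HtmlMemberParser(HTMLParser):
    """Collects stylesheet text and body text (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_style = False
        self.css = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BR> arrives as "br".
        if tag == "style":
            self.in_style = True  # assumption: CSS lives in <style>
        elif tag == "br":
            self.text.append("\n")

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        (self.css if self.in_style else self.text).append(data)


def split_html_member(source):
    """Return (css, text); works with or without an explicit <html> wrapper."""
    parser = HtmlMemberParser()
    parser.feed(source)
    parser.close()
    css = "".join(parser.css)
    # Drop the line break produced by a leading <BR>.
    text = "".join(parser.text).lstrip("\n")
    return css, text
```

Because the parser does not require <html> to be present, the same function handles both the explicit and the implied form of the document.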
The HTML document uses only the following elements:
html
    Sometimes, the document is enclosed with <html>…</html>.

br
    The HTML body often begins with <BR> and may contain further <BR>
    elements as well.

b, i, u
    Styling.

font
    The attributes face, color, and size are observed.  The value of
    color takes one of the forms #rrggbb or rgb (r, g, b).  The value
    of size is a number between 1 and 7, inclusive.
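The font attribute forms above could be decoded along the following lines. This is only a sketch: the function names are hypothetical, and it assumes that r, g, and b are decimal integers in the range 0 to 255 and that the rgb keyword is lowercase, neither of which is stated above.

```python
import re

# Hypothetical helpers; the patterns follow the two color forms listed above.
_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6})")
_RGB_COLOR = re.compile(r"rgb\s*\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)")


def parse_font_color(value):
    """Return (r, g, b) for '#rrggbb' or 'rgb (r, g, b)', else None."""
    value = value.strip()
    match = _HEX_COLOR.fullmatch(value)
    if match:
        digits = match.group(1)
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    match = _RGB_COLOR.fullmatch(value)
    if match:
        rgb = tuple(int(part) for part in match.groups())
        # Assumption: components above 255 are invalid.
        if all(component <= 255 for component in rgb):
            return rgb
    return None


def parse_font_size(value):
    """Return the size as an int if it is between 1 and 7, else None."""
    try:
        size = int(value.strip())
    except ValueError:
        return None
    return size if 1 <= size <= 7 else None
```

Returning None for anything unrecognized lets a caller fall back to the style from the stylesheet instead of failing on unexpected input.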
The CSS in the corpus is simple.  To understand it, a parser only
needs to be able to skip white space, <!--, and -->, and
parse style only for p elements.  Only the following properties
matter:
color
    In the form rrggbb, e.g. 000000, with no leading ‘#’.

font-weight
    Either bold or normal.

font-style
    Either italic or normal.

text-decoration
    Either underline or normal.

font-family
    A font name, commonly Monospaced or SansSerif.

font-size
    Values claim to be in points, e.g. 14pt, but the values are
    actually in “device-independent pixels” (px), at 96/inch.
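A parser of the limited kind described above might look like the following sketch. The function names and the keys of the returned dictionary are hypothetical. The font-size handling follows the note above: the number is treated as pixels at 96 per inch, so the size in true points is the value times 72/96.

```python
import re


def parse_p_style(css):
    """Return the declarations of the first `p { ... }` rule as a dict."""
    # Skip the HTML comment markers that may wrap the stylesheet.
    css = css.replace("<!--", " ").replace("-->", " ")
    match = re.search(r"(?:^|[\s}])p\s*\{([^}]*)\}", css)
    if not match:
        return {}
    props = {}
    for declaration in match.group(1).split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


def interpret_p_style(props):
    """Map the properties that matter to a simple style dict (hypothetical keys)."""
    style = {}
    color = props.get("color", "")
    if re.fullmatch(r"[0-9a-fA-F]{6}", color):  # rrggbb, no leading '#'
        style["color"] = tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))
    if "font-weight" in props:
        style["bold"] = props["font-weight"] == "bold"
    if "font-style" in props:
        style["italic"] = props["font-style"] == "italic"
    if "text-decoration" in props:
        style["underline"] = props["text-decoration"] == "underline"
    if "font-family" in props:
        style["font_family"] = props["font-family"]
    size = re.fullmatch(r"(\d+(?:\.\d+)?)pt", props.get("font-size", ""))
    if size:
        pixels = float(size.group(1))  # labeled pt, actually px at 96/inch
        style["size_px"] = pixels
        style["size_pt"] = pixels * 72.0 / 96.0
    return style
```

For example, a declared font-size of 14pt is read as 14 pixels, which corresponds to 10.5 points.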
This element has the following attributes.
lang
    This always contains en in the corpus.