html Element

html :lang=(en) => TEXT
The element contains an HTML document as text (or, in practice, as
CDATA). In some cases, the document starts with <html> and
ends with </html>; in others the html element is
implied. Generally the HTML includes a head element with a CSS
stylesheet. The HTML body often begins with <BR>.
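Because the enclosing <html> tag is sometimes present and sometimes implied, a reader may find it convenient to normalize the text before handing it to an HTML parser. The following Python sketch shows one way to do that. The function name ensure_html_wrapper is hypothetical, and the code assumes that a document that starts with <html> also ends with </html>, as described above.

```python
import re

# Matches an opening <html> tag (any case, optional attributes) at the
# start of the text, allowing leading white space.
_HTML_OPEN = re.compile(r"\s*<html\b[^>]*>", re.IGNORECASE)


def ensure_html_wrapper(doc: str) -> str:
    """Return doc with an explicit <html>...</html> wrapper.

    Assumption: if the opening tag is present, the closing tag is too,
    so the text is returned unchanged in that case.
    """
    if _HTML_OPEN.match(doc):
        return doc
    return "<html>" + doc + "</html>"
```

With this helper, both variants of the element content can be treated the same way downstream.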
The HTML document uses only the following elements:
html
    Sometimes, the document is enclosed with <html>…</html>.
br
    The HTML body often begins with <BR> and may contain further <BR> elements as well.
b, i, u
    Styling.
font
    The attributes face, color, and size are observed. The value of color takes one of the forms #rrggbb or rgb (r, g, b). The value of size is a number between 1 and 7, inclusive.
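As an illustration of the font attribute forms listed above, the following Python sketch parses a color value in either form and validates a size value. The function names are hypothetical. The code also assumes that the components of the rgb (r, g, b) form are decimal integers from 0 to 255, which the description above does not state.

```python
import re
from typing import Optional, Tuple

_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6})$")
# White space is allowed after "rgb" and around each component.
_RGB_COLOR = re.compile(r"rgb\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$")


def parse_font_color(value: str) -> Optional[Tuple[int, int, int]]:
    """Parse '#rrggbb' or 'rgb (r, g, b)' into an (r, g, b) tuple."""
    value = value.strip()
    m = _HEX_COLOR.match(value)
    if m:
        digits = m.group(1)
        return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))
    m = _RGB_COLOR.match(value)
    if m:
        r, g, b = (int(part) for part in m.groups())
        # Assumption: components are decimal values in 0..255.
        if max(r, g, b) <= 255:
            return (r, g, b)
    return None


def parse_font_size(value: str) -> Optional[int]:
    """Return the size as an integer if it lies between 1 and 7, inclusive."""
    try:
        size = int(value.strip())
    except ValueError:
        return None
    return size if 1 <= size <= 7 else None
```

Both functions return None for values outside the documented forms, which leaves the choice of fallback to the caller.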
The CSS in the corpus is simple. To understand it, a parser only
needs to be able to skip white space, <!--, and -->, and
parse style only for p elements. Only the following properties
matter:
color
    In the form rrggbb, e.g. 000000, with no leading ‘#’.
font-weight
    Either bold or normal.
font-style
    Either italic or normal.
text-decoration
    Either underline or normal.
font-family
    A font name, commonly Monospaced or SansSerif.
font-size
    Values claim to be in points, e.g. 14pt, but the values are actually in “device-independent pixels” (px), at 96/inch.
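The CSS handling described above can be sketched in Python as follows. This is a minimal illustration, not a general CSS parser: it discards the <!-- and --> markers, looks only at rules whose selector is p, and keeps only the six properties listed. All function names are hypothetical, and the sketch assumes that rules have the simple form "selector { name: value; ... }" with no nesting.

```python
import re
from typing import Dict, Optional, Tuple

_COMMENT_MARKERS = re.compile(r"<!--|-->")
_RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
_PROPERTIES = {
    "color", "font-weight", "font-style",
    "text-decoration", "font-family", "font-size",
}


def parse_p_style(css: str) -> Dict[str, str]:
    """Collect the relevant properties from rules that apply to p."""
    css = _COMMENT_MARKERS.sub(" ", css)
    style: Dict[str, str] = {}
    for selector, body in _RULE.findall(css):
        if selector.strip().lower() != "p":
            continue  # Only p rules matter.
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            name = name.strip().lower()
            if sep and name in _PROPERTIES:
                style[name] = value.strip()
    return style


def css_color_to_rgb(value: str) -> Optional[Tuple[int, int, int]]:
    """Parse a color in the form rrggbb, with no leading '#'."""
    value = value.strip()
    if not re.fullmatch(r"[0-9a-fA-F]{6}", value):
        return None
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def font_size_to_points(value: str) -> Optional[float]:
    """Convert a nominal 'Npt' value, really in px at 96/inch, to true points."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)pt", value.strip())
    if not m:
        return None
    pixels = float(m.group(1))
    return pixels * 72.0 / 96.0  # 72 points per inch, 96 px per inch.
```

For example, a font-size of 14pt is really 14 px, which corresponds to 14 × 72/96 = 10.5 true points.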
This element has the following attributes.
lang
    This always contains en in the corpus.