SPV Structure Member Format (PSPP)

D.1 Structure Member Format

A structure member lays out the high-level structure for a group of output items such as headings, tables, and charts. Structure members do not include the details of tables and charts but instead refer to them by their member names.

Structure members’ XML files claim conformance with a collection of XML Schemas. These schemas are distributed, under a nonfree license, with SPSS binaries. Fortunately, the schemas are not necessary to understand the structure members. The schemas can even be deceptive because they document elements and attributes that are not in the corpus and do not document elements and attributes that are commonly found in the corpus.

Structure members use a different XML namespace for each schema, but these namespaces are not entirely consistent. In some SPV files, for example, the viewer-tree schema is associated with namespace ‘http://xml.spss.com/spss/viewer-tree’ and in others with ‘http://xml.spss.com/spss/viewer/viewer-tree’ (note the additional viewer/). Under either name, the schema URIs are not resolvable to obtain the schemas themselves.

One may ignore all of the above in interpreting a structure member. The actual XML has a simple and straightforward form that does not require a reader to take schemas or namespaces into account. A structure member’s root is a heading element, which contains heading or container elements (or a mix), forming a tree. In turn, each container holds a label and one other child, usually text or table.
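
As a minimal sketch of reading a structure member without regard to namespaces, the following Python fragment rewrites every element tag to its local name after parsing, so that either namespace variant yields the same tree. The helper names local_name and strip_namespaces and the SAMPLE document are illustrative assumptions, not part of PSPP. Only element tags are rewritten; attribute names are left alone.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_namespaces(root):
    """Rewrite every element tag in the tree to its local name, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str):
            elem.tag = local_name(elem.tag)
    return root


# Hypothetical, heavily abbreviated structure member using one of the two
# namespace variants mentioned above.
SAMPLE = (
    '<heading xmlns="http://xml.spss.com/spss/viewer/viewer-tree">'
    "<label>Output</label>"
    '<container visibility="visible"><label>Title</label></container>'
    "</heading>"
)

root = strip_namespaces(ET.fromstring(SAMPLE))
tags = [elem.tag for elem in root.iter()]
print(tags)
```

After this step, code that walks the tree can compare tags against plain names such as heading and container.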

The following sections document the elements found in structure members in a context-free grammar-like fashion. Consider the following example, which specifies the attributes and content for the container element:

container
   :visibility=(visible | hidden)
   :page-break-before=(always)?
   :text-align=(left | center)?
   :width=dimension
=> label (table | container_text | graph | model | object | image | tree)

Each attribute specification begins with ‘:’ followed by the attribute’s name. If the attribute’s value has an easily specified form, then ‘=’ and its description follow the name. Finally, if the attribute is optional, the specification ends with ‘?’. The following value specifications are defined:

(a | b | …)

One of the listed literal strings. If only one string is listed, it is the only acceptable value. If OTHER is listed, then any string not explicitly listed is also accepted.

bool

Either true or false.

dimension

A floating-point number followed by a unit, e.g. 10pt. Units in the corpus include in (inch), pt (points, 72/inch), px (“device-independent pixels”, 96/inch), and cm. If the unit is omitted then points should be assumed. The number and unit may be separated by white space.

The corpus also includes localized names for units. A reader must understand these to properly interpret the dimension:

inch: 인치, pol., cala, cali
point: пт
centimeter: см

real

A floating-point number.

int

An integer.

color

A color in one of the forms #rrggbb or rrggbb, or the string transparent, or one of the standard Web color names.

ref

ref element

ref(elem1 | elem2 | …)

The name from the id attribute in some element. If one or more elements are named, the name must refer to one of those elements; otherwise, any element is acceptable.
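
The dimension and color value forms described above might be decoded along the following lines. This is a sketch under stated assumptions, not PSPP's implementation: the function names parse_dimension and parse_color are hypothetical, dimensions are normalized to points, transparent is represented as an all-zero RGBA tuple, and only three Web color names are included for illustration.

```python
import re

# Points per unit; px is 96 per inch, so one px is 0.75pt.
POINTS_PER_UNIT = {"in": 72.0, "pt": 1.0, "px": 72.0 / 96.0, "cm": 72.0 / 2.54}

# Localized unit names from the list above, mapped to canonical units.
LOCALIZED_UNITS = {
    "인치": "in", "pol.": "in", "cala": "in", "cali": "in",
    "пт": "pt",
    "см": "cm",
}

# A number, optional white space, then an optional unit.
_DIMENSION = re.compile(
    r"\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*(\S*)\s*$"
)


def parse_dimension(text):
    """Parse a dimension such as '10pt' or '1 in' and return points."""
    match = _DIMENSION.match(text)
    if match is None:
        raise ValueError(f"not a dimension: {text!r}")
    number, unit = match.groups()
    unit = LOCALIZED_UNITS.get(unit, unit) or "pt"  # omitted unit: points
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unknown unit in dimension: {text!r}")
    return float(number) * POINTS_PER_UNIT[unit]


# Assumption: a tiny subset of Web color names, for illustration only.
WEB_COLORS = {"black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0)}

_HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})$")


def parse_color(text):
    """Return (r, g, b, alpha) with each component in 0..255."""
    text = text.strip()
    if text == "transparent":
        return (0, 0, 0, 0)  # assumed representation of transparency
    match = _HEX_COLOR.match(text)
    if match:
        value = int(match.group(1), 16)
        return (value >> 16, (value >> 8) & 0xFF, value & 0xFF, 255)
    if text.lower() in WEB_COLORS:
        return WEB_COLORS[text.lower()] + (255,)
    raise ValueError(f"unrecognized color: {text!r}")


print(parse_dimension("1 인치"), parse_color("#ff0080"))
```

Normalizing every dimension to one unit keeps later layout code independent of how the file spelled the unit.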

All elements have an optional id attribute. If present, its value must be unique. In practice many elements are assigned id attributes that are never referenced.
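
One way to check id uniqueness and resolve ref values is to index all id attributes once and then look names up in that index. The sketch below assumes hypothetical helper names (index_ids, resolve_ref) and a made-up document with elements a and b; it is not taken from the PSPP sources.

```python
import xml.etree.ElementTree as ET


def index_ids(root):
    """Map each id value to its element, rejecting duplicate ids."""
    by_id = {}
    for elem in root.iter():
        elem_id = elem.get("id")
        if elem_id is None:
            continue  # id is optional
        if elem_id in by_id:
            raise ValueError(f"duplicate id: {elem_id!r}")
        by_id[elem_id] = elem
    return by_id


def resolve_ref(by_id, name, allowed=None):
    """Resolve a ref value; 'allowed' optionally restricts element names."""
    elem = by_id.get(name)
    if elem is None:
        raise KeyError(f"no element with id {name!r}")
    if allowed is not None and elem.tag not in allowed:
        raise ValueError(f"id {name!r} names a {elem.tag} element")
    return elem


# Hypothetical document: the element names carry no meaning.
DOC = '<root><a id="x1"/><b id="x2"/><b/></root>'

by_id = index_ids(ET.fromstring(DOC))
print(resolve_ref(by_id, "x1").tag)
print(resolve_ref(by_id, "x2", allowed={"a", "b"}).tag)
```

Passing allowed=None corresponds to a plain ref, and passing a set of names corresponds to the ref(elem1 | elem2 | …) form.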

The content specification for an element supports the following syntax:

element: An element.
a b: a followed by b.
a | b | c: One of a or b or c.
a?: Zero or one instances of a.
a*: Zero or more instances of a.
a+: One or more instances of a.
(subexpression): Grouping for a subexpression.
EMPTY: No content.
TEXT: Text and CDATA.
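
To illustrate how such a specification reads in practice, the following sketch checks one element against the container specification shown earlier. It is a hand-written illustration rather than the generated parser PSPP uses: check_container is a hypothetical name, the width attribute is not checked because doing so requires a dimension parser, and namespaces are assumed to be absent or already stripped.

```python
import xml.etree.ElementTree as ET

# Enumerated attributes of container: name -> (allowed values, required).
CONTAINER_ATTRS = {
    "visibility": ({"visible", "hidden"}, True),
    "page-break-before": ({"always"}, False),
    "text-align": ({"left", "center"}, False),
}

# Allowed second child; container_text in the grammar is a text element.
CONTAINER_CHILDREN = {"table", "text", "graph", "model", "object", "image", "tree"}


def check_container(elem):
    """Return a list of problems found in one container element."""
    problems = []
    for name, (values, required) in CONTAINER_ATTRS.items():
        value = elem.get(name)
        if value is None:
            if required:
                problems.append(f"missing attribute {name}")
        elif value not in values:
            problems.append(f"bad value {value!r} for attribute {name}")
    # Content model "label (table | container_text | ...)": a label
    # followed by exactly one of the listed elements.
    children = [child.tag for child in elem]
    if (len(children) != 2 or children[0] != "label"
            or children[1] not in CONTAINER_CHILDREN):
        problems.append(f"unexpected content: {children}")
    return problems


good = ET.fromstring(
    '<container visibility="hidden"><label>Notes</label><table/></container>'
)
bad = ET.fromstring(
    '<container text-align="right"><label>Title</label></container>'
)
print(check_container(good))
print(check_container(bad))
```

A reader that must accept files outside the corpus may prefer to log such problems rather than reject the file.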

Element and attribute names are sometimes suffixed by another name in square brackets to distinguish different uses of the same name. For example, structure XML has two text elements, one inside container, the other inside pageParagraph. The former is defined as text[container_text] and referenced as container_text, the latter defined as text[pageParagraph_text] and referenced as pageParagraph_text.

This language is used in the PSPP source code for parsing structure and detail XML members. Refer to src/output/spv/structure-xml.grammar and src/output/spv/detail-xml.grammar for the full grammars.

The following example shows the contents of a typical structure member for a DESCRIPTIVES procedure. A real structure member is not indented. This example also omits most attributes, all XML namespace information, and the CSS from the embedded HTML:

<?xml version="1.0" encoding="utf-8"?>
<heading>
  <label>Output</label>
  <heading commandName="Descriptives">
    <label>Descriptives</label>
    <container>
      <label>Title</label>
      <text commandName="Descriptives" type="title">
        <html lang="en">
<![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
        </html>
      </text>
    </container>
    <container visibility="hidden">
      <label>Notes</label>
      <table commandName="Descriptives" subType="Notes" type="note">
        <tableStructure>
          <dataPath>00000000001_lightNotesData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
    <container>
      <label>Descriptive Statistics</label>
      <table commandName="Descriptives" subType="Descriptive Statistics"
             type="table">
        <tableStructure>
          <dataPath>00000000002_lightTableData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
  </heading>
</heading>
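
A reader could turn a structure member like this one into an outline by walking the heading tree recursively. The sketch below uses an abbreviated copy of the example, without namespaces, and a hypothetical outline function; for each container it reports the kind of item, its label, and the detail member named by dataPath, if any.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example above, without namespaces.
EXAMPLE = """\
<heading>
  <label>Output</label>
  <heading commandName="Descriptives">
    <label>Descriptives</label>
    <container>
      <label>Title</label>
      <text commandName="Descriptives" type="title">
        <html lang="en"><![CDATA[<BR>Descriptives]]></html>
      </text>
    </container>
    <container visibility="hidden">
      <label>Notes</label>
      <table commandName="Descriptives" subType="Notes" type="note">
        <tableStructure>
          <dataPath>00000000001_lightNotesData.bin</dataPath>
        </tableStructure>
      </table>
    </container>
  </heading>
</heading>
"""


def outline(elem, depth=0):
    """Yield (depth, kind, label, data_path) for nested headings and containers."""
    for child in elem:
        if child.tag == "heading":
            yield depth, "heading", child.findtext("label"), None
            yield from outline(child, depth + 1)
        elif child.tag == "container":
            # The container's payload is its one child other than label.
            for item in child:
                if item.tag != "label":
                    data_path = item.findtext("tableStructure/dataPath")
                    yield depth, item.tag, child.findtext("label"), data_path


rows = list(outline(ET.fromstring(EXAMPLE)))
for depth, kind, label, data_path in rows:
    print("  " * depth + f"{kind}: {label}" + (f" -> {data_path}" if data_path else ""))
```

The root heading itself is not reported; the walk starts with its children.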