Next: Dimensions, Previous: Table Settings, Up: Light Detail Member Format
Formats =>
    int32[n-widths] int32*[n-widths]
    string[locale]
    int32[current-layer]
    bool[x7] bool[x8] bool[x9]
    Y0
    CustomCurrency
    count(
        v1(X0?)
        v3(count(X1 count(X2)) count(X3)))
Y0 => int32[epoch] byte[decimal] byte[grouping]
CustomCurrency => int32[n-ccs] string*[n-ccs]
If n-widths is nonzero, then the accompanying integers are column widths as manually adjusted by the user.
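A minimal Python sketch of reading the leading int32[n-widths] int32*[n-widths] pair follows. It assumes little-endian 32-bit integers; read_int32_array is a hypothetical helper, not part of PSPP.

```python
import struct


def read_int32_array(data: bytes, offset: int = 0) -> tuple[list[int], int]:
    """Read int32[n] followed by n int32 values (hypothetical helper).

    Assumes little-endian 32-bit integers. Returns the values and the
    offset just past them.
    """
    (n,) = struct.unpack_from("<i", data, offset)
    offset += 4
    values = list(struct.unpack_from(f"<{n}i", data, offset))
    return values, offset + 4 * n


# Two manually adjusted column widths (made-up values for illustration).
blob = struct.pack("<3i", 2, 50, 72)
widths, end = read_int32_array(blob)
```

With n-widths equal to zero the function returns an empty list and consumes only the 4-byte count.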
locale is a locale including an encoding, such as en_US.windows-1252 or it_IT.windows-1252. (locale is often duplicated in Y1, described below.)
epoch is the year that starts the epoch. A 2-digit year is interpreted as belonging to the 100 years beginning at the epoch. The default epoch year is 69 years prior to the current year; thus, in 2017 this field by default contains 1948. In the corpus, epoch ranges from 1943 to 1948, and some members contain -1.
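The default-epoch rule and the 2-digit-year window described above can be sketched in Python as follows. The function names are hypothetical, and the sketch deliberately assigns no meaning to the -1 value seen in the corpus.

```python
def default_epoch(current_year: int) -> int:
    """Default epoch: 69 years before the current year (1948 in 2017)."""
    return current_year - 69


def expand_two_digit_year(yy: int, epoch: int) -> int:
    """Place a 2-digit year in the 100-year window [epoch, epoch + 99].

    Not meant for the -1 epoch value, whose meaning is unknown.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2-digit year")
    year = epoch - epoch % 100 + yy  # same century as the epoch
    if year < epoch:
        year += 100  # wrap into the following century
    return year
```

For example, with an epoch of 1948, "47" maps to 2047 while "48" maps to 1948.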
decimal is the decimal point character. The observed values are ‘.’ and ‘,’.
grouping is the grouping character. Usually, it is ‘,’ if decimal is ‘.’, and vice versa. Other observed values are ‘'’ (apostrophe), ‘ ’ (space), and zero (presumably indicating that digits should not be grouped).
n-ccs is observed as either 0 or 5. When it is 5, the following strings are the CCA through CCE format strings (see Custom Currency Formats in PSPP). Most commonly these are all ‘-,,,’, but other strings occur.
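The CustomCurrency production can be parsed with a short Python sketch like the one below. It assumes little-endian int32 values and that a string is an int32 byte count followed by that many bytes; the bytes are returned undecoded because the encoding is determined separately. read_string and read_custom_currency are hypothetical names.

```python
import struct

CC_NAMES = ("CCA", "CCB", "CCC", "CCD", "CCE")


def read_string(data: bytes, offset: int) -> tuple[bytes, int]:
    """Read a string: int32 byte count followed by that many bytes (assumed)."""
    (n,) = struct.unpack_from("<i", data, offset)
    start = offset + 4
    end = start + n
    if n < 0 or end > len(data):
        raise ValueError("string runs past end of data")
    return data[start:end], end


def read_custom_currency(data: bytes, offset: int = 0) -> tuple[dict[str, bytes], int]:
    """Parse CustomCurrency => int32[n-ccs] string*[n-ccs]."""
    (n_ccs,) = struct.unpack_from("<i", data, offset)
    offset += 4
    strings = []
    for _ in range(n_ccs):
        s, offset = read_string(data, offset)
        strings.append(s)
    # When n-ccs is 5, the strings are CCA through CCE in order.
    return dict(zip(CC_NAMES, strings)), offset
```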
A writer may safely use false for x7, x8, and x9.
X0 only appears, optionally, in version 1 members.
X0 => byte*14 Y1 Y2
Y1 =>
    string[command] string[command-local]
    string[language] string[charset] string[locale]
    bool bool bool bool
    Y0
Y2 => CustomCurrency byte[missing] bool[x17]
command describes the statistical procedure that generated the output, in English. It is not necessarily the literal syntax name of the procedure: for example, NPAR TESTS becomes “Nonparametric Tests.”
command-local is the procedure’s name, translated into the output language; it is often empty and, when it is not, sometimes the same as command.
missing is the character used to indicate that a cell contains a missing value. It is always observed as ‘.’.
A writer may safely use false for x17.
X1 only appears in version 3 members.
X1 =>
    bool[x14]
    byte[show-title]
    bool[x16]
    byte[lang]
    byte[show-variables]
    byte[show-values]
    int32[x18] int32[x19]
    00*17
    bool[x20]
    bool[show-caption]
lang may indicate the language in use. Some values seem to be 0: en, 1: de, 2: es, 3: it, 5: ko, 6: pl, 8: zh-tw, 10: pt_BR, 11: fr.
show-variables determines how variables are displayed by default. A value of 1 means to display variable names, 2 to display variable labels when available, and 3 to display both (name followed by label, separated by a space). The most common value is 0, which probably means to use a global default.
show-values is a similar setting for values. A value of 1 means to display the value, 2 to display the value label when available, and 3 to display both. Again, the most common value is 0, which probably means to use a global default.
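The show-variables and show-values codes can be interpreted with one Python function, sketched below. The function name is hypothetical; global_default stands in for the global setting that 0 probably refers to, and showing only the name for setting 3 when no label exists is an assumption.

```python
from typing import Optional


def render_name_or_label(name: str, label: Optional[str], setting: int,
                         global_default: int = 1) -> str:
    """Render a variable (or value) according to show-variables/show-values.

    1 = name (or value) only, 2 = label when available, 3 = both,
    separated by a space. 0 is treated as "use global_default", a
    hypothetical stand-in for the global setting.
    """
    if setting == 0:
        setting = global_default
    if setting == 1:
        return name
    if setting == 2:
        return label if label else name
    if setting == 3:
        # Assumption: with no label available, show only the name.
        return f"{name} {label}" if label else name
    raise ValueError(f"unexpected setting {setting}")
```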
show-title is 1 to show the title, 10 to hide it.
show-caption is true to show the caption, false to hide it.
A writer may safely use false for x14, false for x16, 0 for lang, -1 for x18 and x19, and false for x20.
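Because X1 has a fixed size, it maps onto a single struct format string. The Python sketch below assumes little-endian int32 fields and one-byte bool and byte fields (33 bytes in total), and writes the safe values listed above. The defaults chosen for show-title, show-variables, show-values, and show-caption are assumptions, as are the helper names.

```python
import struct

# Hypothetical layout for X1: x14 show-title x16 lang show-variables
# show-values x18 x19, then 17 zero bytes, then x20 show-caption.
# Assumes little-endian int32 and single-byte bool/byte fields.
X1_FORMAT = "<?B?BBBii17x??"
X1_FIELDS = ("x14", "show_title", "x16", "lang", "show_variables",
             "show_values", "x18", "x19", "x20", "show_caption")


def unpack_x1(data: bytes, offset: int = 0) -> dict:
    """Decode X1 into a dict keyed by field name (padding is skipped)."""
    values = struct.unpack_from(X1_FORMAT, data, offset)
    return dict(zip(X1_FIELDS, values))


def pack_x1_defaults(show_title: int = 1, show_variables: int = 0,
                     show_values: int = 0, show_caption: bool = True) -> bytes:
    """Encode X1 with x14, x16, x20 false, lang 0, and x18 = x19 = -1."""
    return struct.pack(X1_FORMAT, False, show_title, False, 0,
                       show_variables, show_values, -1, -1,
                       False, show_caption)
```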
X2 only appears in version 3 members.
X2 =>
    int32[n-row-heights] int32*[n-row-heights]
    int32[n-style-map] StyleMap*[n-style-map]
    int32[n-styles] StylePair*[n-styles]
    count((i0 i0)?)
StyleMap => int64[cell-index] int16[style-index]
If present, n-row-heights and the accompanying integers are row heights as manually adjusted by the user.
The rest of X2 specifies styles for data cells. At first glance this is odd, because each data cell can have its own style embedded as part of the data, but in practice X2 specifies a style for a cell only if that cell is empty (and thus does not appear in the data at all). Each StyleMap specifies the index of a blank cell, calculated the same way as in Cells (see Cells), along with a 0-based index into the accompanying StylePair array.
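A Python sketch of reading the StyleMap array into a lookup table from blank-cell index to StylePair index follows. It assumes little-endian, signed integers with no padding, so each StyleMap occupies 10 bytes; read_style_maps is a hypothetical name.

```python
import struct

STYLE_MAP = "<qh"  # int64[cell-index] int16[style-index], no padding


def read_style_maps(data: bytes, offset: int = 0) -> tuple[dict[int, int], int]:
    """Parse int32[n-style-map] StyleMap*[n-style-map].

    Returns {cell-index: style-index} and the offset after the array.
    The style-index is a 0-based index into the StylePair array.
    """
    (n,) = struct.unpack_from("<i", data, offset)
    offset += 4
    style_of_cell = {}
    for _ in range(n):
        cell_index, style_index = struct.unpack_from(STYLE_MAP, data, offset)
        style_of_cell[cell_index] = style_index
        offset += struct.calcsize(STYLE_MAP)
    return style_of_cell, offset
```

A reader could then look up an empty cell's style as style_pairs[style_of_cell[i]], where style_pairs is whatever list the reader built from the StylePair array.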
A writer may safely omit the optional i0 i0 inside the count(…).
X3 only appears in version 3 members.
X3 =>
    01 00 byte[x21] 00 00 00
    Y1
    double[small]
    01
    (string[dataset] string[datafile] i0 int32[date] i0)?
    Y2
    (int32[x22] i0)?
small is a small real number. In the corpus, it overwhelmingly takes the value 0.0001, with zero occasionally seen. Nonzero numbers with format 40 (see Value) whose magnitudes are smaller than small are displayed in scientific notation. (Thus, a small of zero prevents scientific notation from being chosen.)
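The rule for small reduces to a one-line predicate. The Python sketch below (hypothetical name) checks only the magnitude condition; the caller is assumed to have already established that the value uses format 40.

```python
def uses_scientific_notation(value: float, small: float) -> bool:
    """True if a format-40 value would be shown in scientific notation.

    Nonzero values whose magnitude is below small qualify; with
    small == 0, no value does.
    """
    return value != 0 and abs(value) < small
```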
dataset is the name of the dataset analyzed to produce the output, e.g. DataSet1, and datafile is the name of the file it was read from, e.g. C:\Users\foo\bar.sav. The latter is sometimes the empty string.
date is a date, as seconds since the Unix epoch, i.e. since January 1, 1970 (unrelated to epoch in Y0). Pivot tables within an SPV file often have dates a few minutes apart, so this is probably a creation date for the table rather than for the file.
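Converting date to a timestamp is a single standard-library call. The sketch assumes the count is relative to midnight UTC, since the text does not mention a time zone; table_creation_time is a hypothetical name.

```python
from datetime import datetime, timezone


def table_creation_time(seconds: int) -> datetime:
    """Interpret date as seconds since 1970-01-01 00:00:00 UTC (assumed UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```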
Sometimes dataset, datafile, and date are present and other times they are absent. A reader can distinguish the two cases by assuming that they are present and then checking whether the presumptive dataset contains a null byte (a valid string never will).
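The presence check can be sketched in Python as below. It assumes strings are a little-endian int32 byte count followed by the bytes; the bounds checks that treat a truncated or negative-length string as "absent" are an addition of this sketch, not something the text specifies. has_dataset_fields is a hypothetical name.

```python
import struct


def has_dataset_fields(data: bytes, offset: int) -> bool:
    """Guess whether (dataset datafile i0 date i0) is present at offset.

    Reads a presumptive string[dataset] and rejects it if it contains a
    null byte. Treating truncated or negative-length strings as absent
    is an assumption of this sketch.
    """
    if offset + 4 > len(data):
        return False
    (n,) = struct.unpack_from("<i", data, offset)
    start = offset + 4
    if n < 0 or start + n > len(data):
        return False
    return b"\x00" not in data[start:start + n]
```

For instance, if the optional group is absent and Y2 begins with n-ccs = 5 followed by the string ‘-,,,’, the presumptive dataset is the five bytes 04 00 00 00 2D, which contain a null byte.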
x22 is usually 0 or 2000000.
A writer may safely use 4 for x21 and omit x22 and the other optional bytes at the end.
Formats contains several indications of character encoding:
- locale in Formats itself.
- locale in Y1 (in version 1, Y1 is optionally nested inside X0; in version 3, Y1 is nested inside X3).
- charset in version 3, in Y1.
- lang in X1, in version 3.
charset, if present, is a good indication of character encoding, and in its absence the encoding suffix on locale in Formats will work.
locale in Y1 can be disregarded: it is normally the same as locale in Formats, and it is only present if charset is also.
lang is not helpful and should be ignored for character encoding purposes.
However, the corpus contains many examples of light members whose strings are encoded in UTF-8 despite declaring some other character set. Furthermore, the corpus contains several examples of light members in which some strings are encoded in UTF-8 (and contain multibyte characters) and other strings are encoded in another character set (and contain non-ASCII characters). PSPP treats any valid UTF-8 string as UTF-8 and only falls back to the declared encoding for strings that are not valid UTF-8.
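The encoding strategy above (prefer charset, else the locale suffix in Formats, and try UTF-8 before the declared encoding) can be sketched in Python as follows. The function names are hypothetical; the Latin-1 last resort when nothing is declared is an assumption of this sketch, and an encoding name Python does not recognize raises LookupError.

```python
from typing import Optional


def declared_encoding(charset: Optional[str], formats_locale: str) -> Optional[str]:
    """Pick the declared encoding: charset if present, else the locale suffix."""
    if charset:
        return charset
    # e.g. "en_US.windows-1252" -> "windows-1252"; None if there is no suffix.
    _, _, suffix = formats_locale.partition(".")
    return suffix or None


def decode_light_string(raw: bytes, declared: Optional[str]) -> str:
    """Try UTF-8 first, then fall back to the declared encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: with nothing declared, fall back to Latin-1,
        # which can decode any byte sequence.
        return raw.decode(declared or "latin-1", errors="replace")
```

Decoding each string independently in this way copes with members that mix UTF-8 strings and strings in the declared character set.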
The pspp-output program’s strings command can help analyze the encoding in an SPV light member. Use pspp-output --help-dev to see its usage.