Next: Long String Value Labels Record, Previous: Very Long String Record, Up: System File Format [Contents][Index]
This record, if present, indicates the character encoding for string data, long variable names, variable labels, value labels and other strings in the file.
/* Header. */
int32 rec_type;
int32 subtype;
int32 size;
int32 count;
/* Exactly count bytes of data. */
char encoding[];
int32 rec_type;
Record type. Always set to 7.
int32 subtype;
Record subtype. Always set to 20.
int32 size;
The size of each element in the encoding member. Always set to 1.
int32 count;
The total number of bytes in encoding.
char encoding[];
The name of the character encoding. Normally this will be an official IANA character set name or alias. See http://www.iana.org/assignments/character-sets. Character set names are not case-sensitive, but SPSS appears to write them in all-uppercase.
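The field layout above can be read with a few lines of C. The following is a minimal sketch, not the reader used by any particular program. It assumes that the record is already in memory starting at its rec_type field and that integers are stored little-endian (a real reader would determine the integer format from elsewhere in the file). The names read_le32 and parse_encoding_record are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reads a 32-bit integer stored little-endian at P.
   (Assumption: the file uses little-endian integers.) */
static int32_t
read_le32 (const unsigned char *p)
{
  uint32_t v = ((uint32_t) p[0] | ((uint32_t) p[1] << 8)
                | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24));
  return (int32_t) v;
}

/* Parses a character encoding record that starts at BUF, of which LEN
   bytes are available.  Returns a malloc()'d, null-terminated copy of
   the encoding name, or NULL if the header does not match the expected
   values or the data is truncated.  The caller frees the result. */
static char *
parse_encoding_record (const unsigned char *buf, size_t len)
{
  if (len < 16)
    return NULL;

  int32_t rec_type = read_le32 (buf);
  int32_t subtype = read_le32 (buf + 4);
  int32_t size = read_le32 (buf + 8);
  int32_t count = read_le32 (buf + 12);

  /* rec_type is always 7, subtype always 20, size always 1. */
  if (rec_type != 7 || subtype != 20 || size != 1)
    return NULL;

  /* count is the number of bytes in encoding; it must fit in the buffer. */
  if (count < 0 || (size_t) count > len - 16)
    return NULL;

  char *name = malloc ((size_t) count + 1);
  if (name == NULL)
    return NULL;
  memcpy (name, buf + 16, (size_t) count);
  name[count] = '\0';
  return name;
}
```

The name is copied into a separately allocated buffer because encoding is not null-terminated in the file; its length comes only from count.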
This record is not present in files generated by older software. See also the character_code field in the machine integer info record (see character-code).
When the character encoding record and the machine integer info record are both present, all system files observed in practice indicate the same character encoding, e.g. 1252 as character_code and windows-1252 as encoding, 65001 and UTF-8, etc.
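A reader that wants to detect files where the two records disagree could compare them against a small table. The sketch below is illustrative only: the table holds just the two pairs mentioned above, it does not recognize IANA aliases, and the names known_pairs, names_equal, and encodings_agree are hypothetical. The comparison ignores case because character set names are not case-sensitive.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Compares two character set names, ignoring ASCII case. */
static bool
names_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
  return *a == *b;
}

/* One character_code value and the encoding name that goes with it. */
struct codepage_name
  {
    int character_code;
    const char *encoding;
  };

/* Only the pairs named in the text; a real table would be much longer. */
static const struct codepage_name known_pairs[] =
  {
    { 1252, "windows-1252" },
    { 65001, "UTF-8" },
  };

/* Returns 1 if CHARACTER_CODE and ENCODING name the same encoding,
   0 if they differ, and -1 if CHARACTER_CODE is not in the table. */
static int
encodings_agree (int character_code, const char *encoding)
{
  size_t n = sizeof known_pairs / sizeof known_pairs[0];
  for (size_t i = 0; i < n; i++)
    if (known_pairs[i].character_code == character_code)
      return names_equal (known_pairs[i].encoding, encoding) ? 1 : 0;
  return -1;
}
```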
If, for testing purposes, a file is crafted with different character_code and encoding, it seems that character_code controls the encoding for all strings in the system file before the dictionary termination record, including strings in data (e.g. string missing values), and encoding controls the encoding for strings following the dictionary termination record.
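The apparent behavior for such crafted files can be expressed as a small selection function. This is a sketch of the observation above, not a statement of how any program is implemented. The struct file_encodings type and the encoding_for_string function are hypothetical, and falling back to the other record when one is absent is an added assumption that the text does not describe.

```c
#include <stdbool.h>
#include <stddef.h>

/* Encoding names obtained from the two records (hypothetical type).
   Either member is NULL if the corresponding record is absent. */
struct file_encodings
  {
    const char *from_character_code;   /* Derived from character_code. */
    const char *from_encoding_record;  /* The encoding member. */
  };

/* Returns the encoding name that seems to apply to a string, depending
   on whether the string follows the dictionary termination record.
   Assumption: if the preferred source is absent, use the other one. */
static const char *
encoding_for_string (const struct file_encodings *e,
                     bool after_dict_termination)
{
  const char *preferred = (after_dict_termination
                           ? e->from_encoding_record
                           : e->from_character_code);
  const char *fallback = (after_dict_termination
                          ? e->from_character_code
                          : e->from_encoding_record);
  return preferred != NULL ? preferred : fallback;
}
```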