Metadata is the term used for data that describes other data: information essential for a full understanding of the data's
meaning and use. This typically includes a description of the type of data that each field is designed to hold, such as numbers,
text, or dates, and of the binary format used to store it.
In NDAD, extensive metadata is collected about data tables and data fields. Data type
descriptions are based on a common range of possible data types. Field attributes describe other
characteristics of the data, including lists of valid values or ranges of values.
Data type descriptions
The following terms are used to describe the data types of fields in the piece-level descriptions in the finding aids.
Missing value
In many datasets, a special value is needed to represent data which is, for one reason or another, not known or not
recorded. This is usually referred to as a 'missing value'. The missing value is sometimes a particular number (such as
-99), a special character (such as a question mark or asterisk), or simply a blank space. If the same missing value is used
for every field in a table, this is indicated at the head of the field descriptions.
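As a minimal sketch, assuming records have already been read into Python lists, the following replaces a table-wide missing-value marker with `None` so that it cannot be mistaken for a real observation. The marker -99 is taken from the example above; the function name `mark_missing` is hypothetical and not part of any NDAD tooling.

```python
from typing import Any, List, Optional

def mark_missing(record: List[Any], missing: Any) -> List[Optional[Any]]:
    """Return a copy of record with every occurrence of the missing-value marker replaced by None."""
    return [None if value == missing else value for value in record]

# Hypothetical record in which -99 marks a value that was not recorded.
print(mark_missing([12, -99, 7], missing=-99))  # [12, None, 7]
```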
Integer (Byte)
A field described as Integer (Byte) is a whole number which can be stored in a single 8-bit byte. This means that it can
take values from -128 to +127.
Positive Integer (Byte)
A field described as Positive Integer (Byte) is a whole, non-negative number which can be stored in a single 8-bit byte.
This means that it can take values from 0 to 255.
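The ranges quoted for the integer types all follow from the number of bits available. An unsigned field of n bits holds 0 to 2^n - 1; a signed field, assuming the usual two's-complement representation, holds -2^(n-1) to 2^(n-1) - 1. A short Python sketch (the function name `integer_range` is illustrative):

```python
from typing import Tuple

def integer_range(num_bytes: int, signed: bool) -> Tuple[int, int]:
    """Smallest and largest values of an integer stored in num_bytes 8-bit bytes."""
    bits = 8 * num_bytes
    if signed:
        # Two's complement: half the bit patterns are negative, half zero or positive.
        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (0, 2 ** bits - 1)

for num_bytes in (1, 2, 4):
    print(num_bytes, integer_range(num_bytes, signed=True), integer_range(num_bytes, signed=False))
# 1 (-128, 127) (0, 255)
# 2 (-32768, 32767) (0, 65535)
# 4 (-2147483648, 2147483647) (0, 4294967295)
```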
Integer (Short)
A field described as Integer (Short) is a whole number which can be stored in two 8-bit bytes. This means that it can take
values from -32,768 to +32,767.
Positive Integer (Short)
A field described as Positive Integer (Short) is a whole, non-negative number which can be stored in two 8-bit bytes. This
means that it can take values from 0 to 65,535.
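The difference between Integer (Short) and Positive Integer (Short) lies only in how the same two bytes are interpreted. A sketch using Python's standard `struct` module, assuming little-endian byte order; the byte order of any particular dataset has to be taken from its own documentation.

```python
import struct

raw = bytes([0xFF, 0xFF])  # two bytes with every bit set

(as_short,) = struct.unpack("<h", raw)   # signed 16-bit, as for Integer (Short)
(as_ushort,) = struct.unpack("<H", raw)  # unsigned 16-bit, as for Positive Integer (Short)
print(as_short, as_ushort)  # -1 65535
```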
Integer
A field described as Integer is a whole number which can be stored in four 8-bit bytes. This is the most common format in
which integers are stored on present-day computers. Such a field can take values from -2,147,483,648 to +2,147,483,647.
Positive Integer
A field described as Positive Integer is a whole, non-negative number which can be stored in four 8-bit bytes. This is the
most common format in which non-negative integers are stored on present-day computers. Such a field can take values from 0
to 4,294,967,295.
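A value outside these limits cannot be stored in a four-byte field at all. Python's `struct` module refuses to pack such values, which gives a simple way to check whether a number fits an Integer (format `<i`) or Positive Integer (format `<I`) field. The helper name `fits` is hypothetical.

```python
import struct

def fits(value: int, fmt: str) -> bool:
    """True if value can be packed with the given struct format, e.g. '<i' or '<I'."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:  # raised when the value is out of range for the format
        return False

print(fits(2_147_483_647, "<i"), fits(2_147_483_648, "<i"))  # True False
print(fits(4_294_967_295, "<I"), fits(-1, "<I"))             # True False
```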
Single-Precision Floating Point
A field described as single-precision floating point is a floating-point or real number: that is, one which consists of a
whole number and a possible fractional quantity, such as 1.283 or -1000.95. 'Single-precision' means that the number is
stored in four bytes. In the IEEE 754 format used by almost all present-day computers, this gives only about 7 digits of
precision, or significant places. In other words, numbers that differ only in their later digits, such as 1.00000001 and
1.00000002, will have the same representation within the computer. It is not possible to specify the precision exactly in
decimal terms, as the binary format used to represent the numbers internally does not translate to and from decimal notation exactly.
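The loss of precision can be observed by passing a Python float (which is double precision) through a four-byte single-precision encoding with the standard `struct` module. This assumes the IEEE 754 format, which is typical of present-day computers but not guaranteed for data from older systems; the helper name `to_single` is illustrative.

```python
import struct

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_single(1.00000001) == to_single(1.00000002))  # True: both round to 1.0
print(to_single(0.1))  # 0.10000000149011612
```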
Double-Precision Floating Point
A field described as double-precision floating point is a floating-point or real number: that is, one which consists of a
whole number and a possible fractional quantity, such as 1.283 or -1000.95. 'Double-precision' means that the number is
stored in eight bytes. In the IEEE 754 format, this gives approximately 15 to 16 digits of precision, or significant places.
Contrast this with single-precision floating point, where only about seven digits can be stored accurately. Floating-point
numbers are used most often in scientific applications, and double-precision formats are by far the most widely used.
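On IEEE 754 platforms a Python float is itself a double, so `sys.float_info` reports the double-precision limits directly. Comparing a four-byte and an eight-byte round trip of the same value shows the difference in retained digits; the value used is arbitrary.

```python
import struct
import sys

# 53 significand bits; any decimal number of up to 15 significant digits survives a round trip.
print(sys.float_info.mant_dig, sys.float_info.dig)  # 53 15

value = 1.2345678901234
as_double = struct.unpack("<d", struct.pack("<d", value))[0]  # eight-byte encoding
as_single = struct.unpack("<f", struct.pack("<f", value))[0]  # four-byte encoding
print(as_double == value)  # True: nothing is lost in eight bytes
print(as_single == value)  # False: four bytes keep only about 7 significant digits
```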
Variable-Length and Fixed-Length Strings
Variable-length and fixed-length strings are both types of field whose value is a simple string of characters. These types
describe all field formats except numeric fields (described above) and logical fields (described below). Fixed-length
strings will always have the same number of characters in them. Variable-length strings may be empty (i.e. contain no
characters, not even a blank) or contain any number of characters.
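A minimal sketch of the difference, assuming (hypothetically) that a fixed-length field is padded on the right with blanks to its declared width; the padding convention of a particular dataset should be checked in its own documentation. The function name `to_fixed` is illustrative.

```python
def to_fixed(text: str, width: int) -> str:
    """Pad text with trailing blanks so that it occupies exactly width characters."""
    if len(text) > width:
        raise ValueError(f"{text!r} is longer than {width} characters")
    return text.ljust(width)

fixed = to_fixed("ABC", 6)
print(repr(fixed), len(fixed))  # 'ABC   ' 6

variable = ""  # a variable-length string may be empty
print(len(variable))  # 0
```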
Logical
Logical fields represent simple truth values: Yes or No, True or False. A logical field can only ever take one of two
values. These may be 1 and 0 (where 1 typically represents Yes or True, and 0 represents No or False), Y and N, T and F,
or even Purple and Orange. If values other than 0 and 1 are used in the field, the attributes column
describes the values used to represent true and false for the field.
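Because the codes for true and false can differ from field to field, each logical field has to be decoded with its own pair of values. A sketch in which the `TruthCodes` type, the `decode_logical` function, and the assignment of Purple to true are all hypothetical:

```python
from typing import NamedTuple

class TruthCodes(NamedTuple):
    """The pair of stored values used for true and false in one logical field."""
    true_code: str
    false_code: str

def decode_logical(raw: str, codes: TruthCodes) -> bool:
    """Convert a stored logical value to a bool, rejecting any third value."""
    if raw == codes.true_code:
        return True
    if raw == codes.false_code:
        return False
    raise ValueError(f"{raw!r} is neither {codes.true_code!r} nor {codes.false_code!r}")

DEFAULT_CODES = TruthCodes("1", "0")  # the usual convention: 1 for true, 0 for false
# Hypothetical field whose attributes column says Purple means true and Orange false.
COLOUR_CODES = TruthCodes("Purple", "Orange")
print(decode_logical("1", DEFAULT_CODES), decode_logical("Orange", COLOUR_CODES))  # True False
```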
Length
If a field is known to be a fixed number of characters or bytes, the 'length' describes the number of characters or bytes
used to store the field.
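Where every field has a known length, a record can be split into fields by position alone. A sketch with a hypothetical three-field layout; the field names, lengths, and the `split_record` function are illustrative only.

```python
from typing import Dict, List, Tuple

def split_record(line: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Cut a fixed-width record into named fields using each field's length, in order."""
    fields: Dict[str, str] = {}
    pos = 0
    for name, length in layout:
        fields[name] = line[pos:pos + length]
        pos += length
    return fields

LAYOUT = [("REGION", 3), ("YEAR", 4), ("COUNT", 5)]  # hypothetical names and lengths
print(split_record("NW 1987  312", LAYOUT))
# {'REGION': 'NW ', 'YEAR': '1987', 'COUNT': '  312'}
```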
Field attributes
The piece-level descriptions of fields contain a column marked 'attributes' that describes constraints that may apply to
a field. The terms and constraints that may appear there are described here.
Repeat counts
Sometimes a field, or a set of related fields, that occurs more than once in a record is not given a separate name for each
occurrence. Instead, the field is named once and described as occurring more than one time. This form of description is
more common in older databases (those from the 1960s and 1970s) than in more modern, relational systems. If a field is
repeated in this way, the repeat count will be shown in the attributes column. It may be either a constant value
("Repeated 5 times per record") or a variable value ("Repeated NUMDAYS times per record"). In the latter case, the number
of times the field is repeated depends on the value of another field, in this case the NUMDAYS field.
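A record with a variable repeat count has to be read in order, because the count must be known before the repeated values can be located. A sketch in which the record has already been split into a list of values and, hypothetically, NUMDAYS comes first, followed by that many RAINFALL readings and then a STATION field; the field names other than NUMDAYS are invented for illustration.

```python
from typing import Any, Dict, List

def parse_repeated(values: List[Any]) -> Dict[str, Any]:
    """Read a record whose RAINFALL field is repeated NUMDAYS times (hypothetical layout)."""
    numdays = int(values[0])
    readings = values[1:1 + numdays]  # the repeated field
    station = values[1 + numdays]     # the field following the repeats
    return {"NUMDAYS": numdays, "RAINFALL": readings, "STATION": station}

print(parse_repeated([3, 0.2, 1.4, 0.0, "KEW"]))
# {'NUMDAYS': 3, 'RAINFALL': [0.2, 1.4, 0.0], 'STATION': 'KEW'}
```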
Ranges
If the data model provided to us puts constraints on the range of values a numeric field may take, we describe them here.
As an example, if one field records the month in which an event took place, its range will typically be 1 to 12. As part
of our work in preparing the data for access and preservation, we will have checked that field values fall within the
range specified for them. Any exceptions we discovered will be noted in the table descriptions (at what ISAD(G) calls
'item level') in the dataset catalogue. Note that if a field also has a special value for missing observations, the
missing value will usually not be part of the valid range.
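A range check of the kind described above might look like this, using the month example and assuming (hypothetically) that -99 is the field's missing value, which lies outside the valid range and so is skipped rather than reported. The function name `out_of_range` is illustrative.

```python
from typing import Iterable, List, Optional, Tuple

def out_of_range(values: Iterable[int], low: int, high: int,
                 missing: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (position, value) pairs that are neither missing nor within low..high inclusive."""
    return [(i, v) for i, v in enumerate(values)
            if v != missing and not (low <= v <= high)]

months = [1, 12, -99, 13, 0]
print(out_of_range(months, 1, 12, missing=-99))  # [(3, 13), (4, 0)]
```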
Truth values
For logical fields, if the values used for true and false are anything other than 0 for false and 1 for true, the specific
values used will be described.
Choice sets
Some fields are constrained so that they may only take one of a fixed set of values. If this is the case, the values
will be listed here (if there are not too many of them) or the fact that the value is constrained to be one of a list will
be noted.
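Checking a field against a choice set is a simple membership test. A sketch with a hypothetical set of region codes; the names `ALLOWED_REGIONS` and `invalid_choices` are illustrative.

```python
from typing import FrozenSet, Iterable, List

ALLOWED_REGIONS: FrozenSet[str] = frozenset({"N", "S", "E", "W"})  # hypothetical choice set

def invalid_choices(values: Iterable[str], allowed: FrozenSet[str]) -> List[str]:
    """Return, sorted, the distinct values that are not members of the choice set."""
    return sorted(set(values) - allowed)

print(invalid_choices(["N", "S", "Q", "N"], ALLOWED_REGIONS))  # ['Q']
```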
Missing values
If a field uses a special value to represent a missing or unknown observation, the special value will be described here.
If a single special value applies to the whole table, this is not noted as part of each field description, but as a
prelude to the description of the fields.
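One way to apply this convention when reading a table is a lookup with a fallback: use the field's own missing value if its description gives one, otherwise the table-wide value stated before the field descriptions. This is a sketch under that assumption; the dictionaries and the `missing_value_for` function are hypothetical.

```python
from typing import Any, Dict, Optional

def missing_value_for(field: str, field_missing: Dict[str, Any],
                      table_missing: Optional[Any]) -> Optional[Any]:
    """Field-specific missing value if one is described, otherwise the table-wide one (or None)."""
    return field_missing.get(field, table_missing)

TABLE_MISSING = -99               # stated once, before the field descriptions
FIELD_MISSING = {"SURNAME": "?"}  # described in one field's attributes
print(missing_value_for("AGE", FIELD_MISSING, TABLE_MISSING))      # -99
print(missing_value_for("SURNAME", FIELD_MISSING, TABLE_MISSING))  # ?
```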
Compulsory values
If a field is not allowed to take a missing value (but other fields in the table are), then the words "Must have a value"
will appear in the attributes column.
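A check for the "Must have a value" attribute, sketched under the assumption that missing values have already been converted to `None`; the field names and the `missing_compulsory` function are hypothetical.

```python
from typing import Any, Dict, Iterable, List

def missing_compulsory(rows: Iterable[Dict[str, Any]], compulsory: Iterable[str]) -> List[int]:
    """Return the positions of rows in which any compulsory field has no value."""
    required = list(compulsory)
    return [i for i, row in enumerate(rows)
            if any(row.get(name) is None for name in required)]

rows = [{"ID": 1, "AGE": None}, {"ID": None, "AGE": 40}]
print(missing_compulsory(rows, ["ID"]))  # [1]
```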
Anonymisation
If a field is anonymised when the table is viewed, then the words "Anonymised for access" appear in the attributes column
of the field descriptions. Such fields will not appear when the data is being viewed, and will not be present on copies of
the data that we supply.
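Withholding anonymised fields from a copy of the data amounts to dropping those fields from every row before it is supplied. A sketch with hypothetical field names; `for_access` is an illustrative name and leaves the original rows unchanged.

```python
from typing import Any, Dict, Iterable, List, Set

def for_access(rows: Iterable[Dict[str, Any]], anonymised: Set[str]) -> List[Dict[str, Any]]:
    """Copy each row without the fields marked 'Anonymised for access'."""
    return [{k: v for k, v in row.items() if k not in anonymised} for row in rows]

rows = [{"ID": 7, "NAME": "J. Smith", "AGE": 40}]
print(for_access(rows, {"NAME"}))  # [{'ID': 7, 'AGE': 40}]
```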