The National Archives NDAD
 

Help

Metadata

 

Metadata is data that describes other data: the information essential for gaining a full understanding of the data's meaning and use. It typically includes a description of the type of data that each field is designed to hold, such as numbers, text or dates, and the binary format used to store the data.

In NDAD, extensive metadata is collected about data tables and data fields. Data type descriptions are based on a common range of possible data types. Field attributes describe other characteristics of the data, including lists of valid values or ranges of values.

Data type descriptions

The following terms are used to describe the data types of fields in the piece-level descriptions in the finding aids.

Missing value

In many datasets, a special value is needed to represent data which is, for one reason or another, not known or not recorded. This is usually referred to as a 'missing value'. The missing value is sometimes a particular number (such as -99), a special character (such as a question mark or asterisk) or simply a blank space. If the same missing value is used for every field in a table, this is indicated at the head of the field descriptions.
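As an illustration only, the sketch below shows how a reader of the data might treat a missing value when processing a numeric field. The value -99 and the function names are assumptions made for the example, not part of any particular dataset.

```python
from typing import Iterable, Optional

MISSING = -99  # assumed table-wide missing value, for illustration only


def decode(raw: int, missing: int = MISSING) -> Optional[int]:
    """Return None for the missing value, otherwise the value itself."""
    return None if raw == missing else raw


def mean_ignoring_missing(values: Iterable[int]) -> Optional[float]:
    """Average only the observations that were actually recorded."""
    known = [v for v in map(decode, values) if v is not None]
    return sum(known) / len(known) if known else None
```

Treating the missing value as an ordinary number would distort any calculation, which is why it is decoded before use.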

Integer (Byte)

A field described as Integer (Byte) is a whole number which can be stored in a single 8-bit byte. This means that it can take values from -128 to +127.

Positive Integer (Byte)

A field described as Positive Integer (Byte) is a whole, non-negative number which can be stored in a single 8-bit byte. This means that it can take values from 0 to 255.
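The difference between the two byte types lies only in how the same eight bits are interpreted. A minimal sketch using Python's standard `struct` module (the variable names are illustrative):

```python
import struct

raw = bytes([0xFF])  # one 8-bit byte with every bit set

# The same byte read as Integer (Byte) and as Positive Integer (Byte).
as_signed = struct.unpack("b", raw)[0]    # signed interpretation
as_unsigned = struct.unpack("B", raw)[0]  # unsigned interpretation
```

Here `as_signed` is -1 and `as_unsigned` is 255, so the data type description is needed to know which reading is intended.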

Integer (Short)

A field described as Integer (Short) is a whole number which can be stored in two 8-bit bytes. This means that it can take values from -32,768 to +32,767.

Positive Integer (Short)

A field described as Positive Integer (Short) is a whole, non-negative number which can be stored in two 8-bit bytes. This means that it can take values from 0 to 65,535.

Integer

A field described as Integer is a whole number which can be stored in four 8-bit bytes. This is the most common format in which we store integers with present-day computers. Such a field can take values from -2,147,483,648 to +2,147,483,647.

Positive Integer

A field described as Positive Integer is a whole, non-negative number which can be stored in four 8-bit bytes. This is the most common format in which we store non-negative integers with present-day computers. Such a field can take values from 0 to 4,294,967,295.
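The ranges quoted for the six integer types all follow from the number of bytes used. The sketch below derives them, assuming the usual two's-complement representation for signed values; the function name is hypothetical.

```python
from typing import Tuple


def integer_range(num_bytes: int, signed: bool) -> Tuple[int, int]:
    """Return (lowest, highest) value that fits in num_bytes 8-bit bytes."""
    bits = 8 * num_bytes
    if signed:
        # One bit is given over to the sign (two's complement assumed).
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1
```

For example, `integer_range(2, signed=True)` gives (-32768, 32767), the range of Integer (Short).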

Single-Precision Floating Point

A field described as single-precision floating point is a floating-point or real number - that is, one which consists of a whole number and a possible fractional quantity, such as 1.283 or -1000.95. 'Single-precision' means that the number is stored in four bytes. In the IEEE 754 format used by most present-day computers, this means that the number can only have approximately 7 digits of precision, or significant places. In other words, the numbers 1.28356001 and 1.28356002 will probably have the same representation within the computer. It is not possible to specify the precision exactly in decimal terms, as the binary format which is used to represent the numbers internally to the computer does not translate to and from decimal notation exactly.

Double-Precision Floating Point

A field described as double-precision floating point is a floating-point or real number - that is, one which consists of a whole number and a possible fractional quantity, such as 1.283 or -1000.95. 'Double-precision' means that the number is stored in eight bytes. In the IEEE 754 format, this means that the number can have approximately 15 digits of precision, or significant places. Contrast this with single-precision floating point, where only about seven digits can be stored accurately. Floating-point numbers are used most often in scientific applications, and double-precision formats are by far the most widely used.
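The loss of precision in four bytes can be observed directly. The following sketch assumes the IEEE 754 formats, which is what Python's `struct` module uses for its `f` (four-byte) code and what Python's own `float` uses for eight bytes. The helper name `to_single` is made up for the example.

```python
import struct


def to_single(x: float) -> float:
    """Round a double-precision value to the nearest four-byte (single) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Two values that differ only in the ninth significant digit.
a, b = 1.28356001, 1.28356002

distinct_in_double = (a != b)                   # eight bytes keep them apart
same_in_single = (to_single(a) == to_single(b)) # four bytes cannot

# 2**24 + 1 is the first whole number that four bytes cannot hold exactly.
rounded = to_single(16777217.0)
```

Other storage formats, particularly those of older mainframe systems, may have somewhat different limits, so this is an indication rather than a guarantee for any given dataset.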

Variable and Fixed Length strings

Variable and fixed length strings are both types of field whose value is a simple string of characters. We use these terms to describe all field formats except numeric fields (described above) and logical fields (described below). Fixed-length strings will always have the same number of characters in them. Variable length strings may be empty (i.e. contain no characters, not even a blank) or contain any number of characters.
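To illustrate the distinction, the sketch below pads a value with blanks to fill a fixed-length field and slices a fixed-length field back out of a record. Padding with trailing blanks is a common convention but is an assumption here, as are the function names.

```python
def write_fixed(value: str, length: int) -> str:
    """Pad with trailing blanks (or truncate) to exactly `length` characters."""
    return value[:length].ljust(length)


def read_fixed(record: str, start: int, length: int) -> str:
    """Slice `length` characters from `record`, beginning at offset `start`."""
    return record[start:start + length]


fixed = write_fixed("Kew", 6)  # always six characters, blanks included
variable = ""                  # a variable length string may hold nothing at all
```

Note that an empty variable length string (no characters) is not the same as a fixed-length field filled with blanks.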

Logical

Logical fields represent simple truth values - Yes or No, True or False. Logical fields can only ever take one of two values. This may be 1 or 0 (where 1 is typically used to represent Yes or True, and 0 represents No or False), it may be Y and N, T and F, or even Purple and Orange. If values other than 0 and 1 are used in the field, the attributes column describes the values used to represent true and false for the field.
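A reader of the data therefore needs the pair of values in use before a logical field can be interpreted. A minimal sketch, with a hypothetical function name and the 1/0 convention as the default:

```python
def decode_logical(raw: str, true_value: str = "1", false_value: str = "0") -> bool:
    """Map a stored value to True or False using the field's declared truth values."""
    if raw == true_value:
        return True
    if raw == false_value:
        return False
    # Anything else is not a legal value for this logical field.
    raise ValueError(f"{raw!r} is neither {true_value!r} nor {false_value!r}")
```

For a field recorded as Y and N, the call would be `decode_logical(raw, "Y", "N")`.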

Length

If a field is known to be a fixed number of characters or bytes, the 'length' describes the number of characters or bytes used to store the field.

Field attributes

The piece-level descriptions of fields contain a column marked 'attributes' which describes certain constraints which may apply to a field. The possible terms and constraints which may appear are described here.

Repeat counts

Sometimes a set of related fields which occurs more than once in a record is not given individual names. Instead, the field is named once and described as occurring more than once. This form of description is more common in older databases (those from the 1960s and 1970s) than in more modern, relational systems. If a field is repeated in this way, the repeat count will be shown in the attributes section. It may either be a constant value ("Repeated 5 times per record") or a variable value ("Repeated NUMDAYS times per record"). In the latter case, the number of times the field is repeated is itself dependent upon the value of some other field - in this case, the NUMDAYS field.
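The variable case can be sketched as follows. The record layout is invented for the example: NUMDAYS comes first, followed by a field called RAINFALL repeated NUMDAYS times, followed by a single STATION field. Only NUMDAYS is taken from the description above; the other names are hypothetical.

```python
def parse_record(values: list) -> dict:
    """Parse a flat record laid out as [NUMDAYS, RAINFALL x NUMDAYS, STATION]."""
    numdays = values[0]
    # The repeat count is read from the record itself.
    rainfall = values[1:1 + numdays]
    if len(rainfall) != numdays:
        raise ValueError("record is shorter than NUMDAYS implies")
    station = values[1 + numdays]
    return {"NUMDAYS": numdays, "RAINFALL": rainfall, "STATION": station}
```

Because the position of every later field depends on NUMDAYS, such records cannot be read correctly without the repeat count.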

Ranges

If the data model provided to us puts constraints on the range of values a numeric field may take, we describe them here. As an example, if one field records the month in which an event took place, its range will typically be 1 to 12. As part of our work in preparing the data for access and preservation, we will have checked that field values fall within the range specified for them. Any exceptions we discovered will be noted in the table descriptions (at what ISAD(G) calls 'item level') in the dataset catalogue. Note that if a field also has a special value for missing observations, the missing value will usually not be part of the valid range.
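A range check of this kind might look like the sketch below, which uses the month example with an assumed missing value of -99. It is an illustration of the idea, not a description of the checking procedure actually used.

```python
def find_exceptions(values, low, high, missing=None):
    """Return (position, value) pairs lying outside low..high inclusive.

    Positions count from 0. The missing value, if given, is not an exception
    even though it falls outside the valid range.
    """
    exceptions = []
    for position, value in enumerate(values):
        if missing is not None and value == missing:
            continue
        if not (low <= value <= high):
            exceptions.append((position, value))
    return exceptions
```

For the months `[1, 12, 13, -99, 0]` with a range of 1 to 12, the values 13 and 0 are reported but -99 is not.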

Truth values

For logical fields, if the values used for true and false are anything other than 0 for false and 1 for true, the specific values used will be described.

Choice sets

Some variables are constrained so that they may only take one of a fixed set of values. If this is the case, the values will be listed here (if there are not too many of them) or the fact that the value is constrained to be one of a list will be noted.

Missing values

If a field uses a special value to represent a missing or unknown observation, the special value will be described here. If a single special value applies to the whole table, this is not noted as part of each field description, but as a prelude to the description of the fields.

Compulsory values

If a field is not allowed to take a missing value (but other fields in the table are), then the words "Must have a value" will appear in the attributes column.
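The way choice sets, missing values and compulsory values interact can be sketched in a few lines. The `FieldAttributes` class and `check` function are hypothetical and exist only to make the rules concrete.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldAttributes:
    """Hypothetical record of the constraints shown in an 'attributes' column."""
    name: str
    missing_value: Any = None                 # special value meaning "not recorded"
    must_have_value: bool = False             # "Must have a value"
    choices: Optional[FrozenSet[Any]] = None  # choice set, if the field has one


def check(attrs: FieldAttributes, value: Any) -> List[str]:
    """Return the constraint violations for one value (an empty list if none)."""
    is_missing = attrs.missing_value is not None and value == attrs.missing_value
    if is_missing:
        # A missing value is only a problem for a compulsory field.
        return [f"{attrs.name}: must have a value"] if attrs.must_have_value else []
    if attrs.choices is not None and value not in attrs.choices:
        return [f"{attrs.name}: {value!r} is not in the choice set"]
    return []
```

The missing value is tested first because it is normally not a member of the choice set.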

Anonymisation

If a field is anonymised when the table is viewed, then the words "Anonymised for access" appear in the attributes column of the field descriptions. Such fields will not appear when the data is being viewed, and will not be present on copies of the data that we supply.
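In effect, an anonymised field is dropped from every record before it is shown or copied. A minimal sketch of that behaviour, with invented field names and a hypothetical function name:

```python
def view_record(record: dict, anonymised: set) -> dict:
    """Return a copy of the record without the fields anonymised for access."""
    return {name: value for name, value in record.items() if name not in anonymised}


original = {"NAME": "A. Smith", "AGE": 42}
visible = view_record(original, {"NAME"})  # only AGE remains
```

The stored record itself is left unchanged; only the view of it omits the field.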

 
 

NDAD v3.0

 
 