The NDAD glossary offers succinct explanations of the many technical terms used on the web site. It is intended to help users understand
(1) the terminology of computing and (2) the terminology of archival practices used in The National Archives and central government
record-keeping.
30-year rule
This was the rule relating to the standard period for the closure of public records under the Public Records Acts 1958 and 1967.
Its effect was that public records were open to public access after 30 years unless steps were taken to open them earlier, or to
close them for longer periods. Since the Freedom of Information Act came into force on 01 January 2005, the 30-year rule is now
redundant.
Accelerated opening
This was the process whereby public records were made available for public access in advance of the usual 30-year closure period
by means of a Lord Chancellor's Instrument. Since the Freedom of Information Act came into force on 01 January 2005, accelerated
opening is now redundant.
Accession
A collection of records constituting a whole dataset or part of a dataset transferred to NDAD at any one time.
Accompanying documentation
Documentation supplied by the transferring government department with the dataset to assist NDAD specialists in understanding and
documenting the system/dataset; or documentation of secondary importance to the dataset. Accompanying documentation is non-archive
material and therefore is not catalogued, nor made available to the public. It may not be preserved permanently within the NDAD.
See also Dataset Documentation, finding aids.
Administrative history
An account of the origin, progress, development and work of an organisation, very often a Government Department or agency. See also
finding aid.
Aggregation
One of the processes which may be carried out to prevent viewing of data which the transferring government department and/or TNA
has designated as in any way sensitive and/or confidential. Summary forms of the data are produced which contain fewer records
and/or less detail than the raw data. The data may, for instance, be averaged over a geographical area (such as a parish or census
district) or over a time period. From 01 January 2005, this is done by invoking relevant FOI Exemptions.
American Standard Code for Information Interchange (ASCII)
An internationally-agreed character set; widely used in the computer industry. ASCII is a 7-bit code; the Standard ASCII Character
Set consists of 128 decimal numbers ranging from zero through 127 assigned to letters, numbers, punctuation marks, and the most
common special characters. A computer stores each character in a single byte, using the 7-bit code assigned to that character by
the ASCII standard. See also binary encoded data, extended ASCII, EBCDIC.
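As a minimal sketch of the mapping described above, the following Python fragment looks up the ASCII codes of a few characters and checks that ASCII-encoded text occupies one byte per character. The helper name is_standard_ascii is an illustrative assumption, not something used by NDAD.

```python
# Illustrative only: inspect the 7-bit ASCII codes of a few characters.
for ch in ["A", "9", ","]:
    code = ord(ch)                            # decimal code, 0 through 127
    print(ch, code, format(code, "07b"))      # the same code as 7 binary digits


def is_standard_ascii(text: str) -> bool:
    """Return True if every character falls in the standard range 0-127."""
    return all(ord(c) < 128 for c in text)


# Each ASCII character is stored in a single byte.
encoded = "NDAD".encode("ascii")
print(len(encoded))
```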
Anonymisation
Process carried out to prevent viewing of any data which the transferring government department and/or TNA has designated as in any
way sensitive and/or confidential. Anonymisation is carried out either by blocking display of certain fields (such as names,
addresses or telephone numbers), or by producing summary forms of data. A summary form will contain fewer records and less detail
than the raw data. The data may have been averaged over a small geographical area (such as a parish or census district) or over a
small time period. From 01 January 2005, this is done by invoking relevant FOI Exemptions. See also redaction.
Application software
Software to perform functions for the users (such as Word Processing or a Payroll System) as distinct from systems software (such
as the Operating System).
Binary encoded data
Numeric data held in binary format (as opposed to, for example, ASCII format). See also ASCII, extended ASCII, EBCDIC.
Binary Large Object (BLOB)
Binary large object. A term used in more modern databases which are able to store information such as images, sound or video as
well as simple values which consist of numbers or short character strings. The data in the image, sound or video clip is referred
to as a BLOB.
Bit
Binary digit: the smallest unit of data recognisable by a computer; it is either a 0 or a 1. Eight bits equal one byte (or
1 character).
Boolean searching
A method of searching text based on boolean operators: an "and" operator between two words or other values (for example, "pear AND
apple") means one is searching for documents containing both of the words or values. An "or" operator between two words or other
values (eg "pear OR apple") means one is searching for documents containing either of the words.
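The two operators can be sketched in a few lines of Python. The sample documents and the function names search_and and search_or are hypothetical, chosen only to illustrate the idea; a real search system would index the text rather than scan it.

```python
# Hypothetical sample documents, keyed by an invented identifier.
documents = {
    "doc1": "pear and apple tart",
    "doc2": "apple crumble",
    "doc3": "plum jam",
}


def words(text: str) -> set:
    """Split a document into a set of lower-case words."""
    return set(text.lower().split())


def search_and(docs: dict, a: str, b: str) -> list:
    """Documents containing BOTH words."""
    return sorted(n for n, t in docs.items() if a in words(t) and b in words(t))


def search_or(docs: dict, a: str, b: str) -> list:
    """Documents containing EITHER word (or both)."""
    return sorted(n for n, t in docs.items() if a in words(t) or b in words(t))


print(search_and(documents, "pear", "apple"))
print(search_or(documents, "pear", "apple"))
```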
Byte
A computing storage unit. It consists of eight bits; a byte is the amount of storage space required to hold one alphanumeric
character such as the letter A, the numerical digit 9, or a single standard punctuation mark eg a comma.
Catalogue descriptions
The description of record series, datasets and related documentation in Finding Aids prepared by NDAD.
CD-ROM
Compact Disk, Read-Only Memory. A form of data storage that uses laser optics for reading data. Write-once CD-ROM is one of the
formats on which data can be supplied to NDAD users.
Coded data
Data which are held in code form, where short sequences of letters or numbers are used to represent information in a database. An
example is the use of single-letter codes to represent different types or classes of vehicle. NDAD needs explanations of the codes
either in a separate computer file or on paper.
COM/Fiche Output to fiche (COM)
COM is the direct recording of computer output on to microfilm or microfiche.
Checksum
A computed value (of a file). If re-calculated after the data have, for instance, been transferred from one computer to another,
and the two values are the same, there is a high degree of confidence that the data were transferred correctly.
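A minimal sketch of the comparison, assuming SHA-256 as the checksum algorithm (the glossary does not say which algorithm NDAD uses). The byte strings stand in for a file before and after transfer.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Compute a SHA-256 checksum (an assumed algorithm) as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"record 1\nrecord 2\n"     # data before transfer
received = bytes(original)              # an intact copy after transfer
corrupted = b"record 1\nrecord 3\n"    # a copy damaged in transit

# Matching checksums give high confidence that the transfer was correct.
print(checksum(original) == checksum(received))
print(checksum(original) == checksum(corrupted))
```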
Client Manager
A Client Manager is employed by The National Archives to liaise with central government departments concerning all matters to do
with record-keeping and archives. A Client Manager supervises and guides the records work done by several government departments,
and ensures that they keep documents worthy of permanent preservation and dispose of those that are not once they are no
longer of current use. The Client Manager role is outward-looking and involves engaging with government departments to advise them
on their current electronic and paper records management.
Command Line Interface (CLI)
A form of user interface where the user types in commands, usually one line at a time. In many modern computing applications,
command line interfaces have generally been superseded by graphical user interfaces (GUIs).
Comma Separated Variables (CSV) file
A format of file used to facilitate transfer of data between applications. The data are in the form of a table, with each field
separated by a comma; text may be enclosed in double quotes.
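The following sketch reads a small CSV table with Python's standard csv module. The sample data are invented; note how the double quotes allow a text field to contain a comma of its own.

```python
import csv
import io

# Invented sample data: the quoted field contains a comma.
raw = 'name,town\n"Smith, John",London\nJones,Leeds\n'

# csv.reader splits each line into fields, honouring the double quotes.
rows = list(csv.reader(io.StringIO(raw)))
for row in rows:
    print(row)
```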
Computer Readable Data Archive (CRDA)
A working title for the project which became the UK National Digital Archive (Datasets). Much of the early publicity material
produced both by the Public Record Office (now The National Archives) and the University of London refers to the project by this
name. See also NDAD.
Database
A collection of information, usually covering subject areas which are related in some way, structured to enable effective retrieval
of the information. Databases are organised into a hierarchy of files, records, and fields. A file is a group of related
information, such as names and addresses of members of a sports club. All the information about a particular member (name,
address, etc.) is stored in a record. A record is a collection of related data items called fields (an example of a field would be
a member's name).
Data dictionary
Details (often held on-line as part of, for instance, a database management system) of components of a system, including file and
field names, characteristics, relationships and structure.
Data model/structure
A data model is a graphical representation of the structure and relationships of the items of data (files, fields etc) within a
system.
Dataset
A computer file or related set of computer files, and where applicable associated metadata (e.g. data dictionary) in digital form,
which are organised under a single descriptive title and are capable of being described as a coherent unit in the Archive's
finding aids. A dataset may comprise one or more accessions and may form part of a series of related datasets transferred to NDAD
over time. As an example, in the case of an annual survey which is transferred to NDAD annually, a dataset would comprise all the
data for one year's survey. For the purposes of transfer to NDAD, datasets are distinguished from digital documents which are
provided by Departments to assist in interpreting, or providing context to, a dataset.
Dataset Catalogue
A finding aid listing the contents of an assembly of documents or electronically held information usually including a brief list of
the organisation and functions of the organising body or individual. See also finding aid.
Dataset Documentation
That part of the Archive which consists of documents supplied with datasets (e.g. printouts, survey forms, user manuals, reports
produced using the data), which are deemed to be worthy of permanent preservation as archives. These documents could originate on
paper and/or in electronic form. See also Accompanying Documentation, finding aid.
Departmental Record Officer (DRO)
The officer in a government department charged with responsibility for the care of all public records of that department while they
are in its custody, and for the transfer to The National Archives of those selected for permanent retention. See also Client Manager, the TNA representative who assists them.
Digital documents
Digital Documents are documents that are stored on a computer. The documents may have been created on a computer, as with
word-processing files and spreadsheets, or they may have been converted into digital documents by means of document imaging.
Digital documents are also referred to (somewhat inaccurately) as electronic documents. The term 'Electronic documents' however is
widely used by TNA and others, and is now the preferred term for describing them in Dataset Documentation Catalogues. Within the
NDAD Transfer process, references to digital documents cover material (which could just as easily be in paper form) sent with the
dataset to assist in interpreting, or providing context to, the dataset; i.e. the term excludes the dataset itself and any
accompanying metadata in digital form which was an integral part of the original system (such as a data dictionary). Digital
documents would not have been part of (i.e. held as data within) the original computer system although they could include the
system documentation (e.g. system specification, user manual etc). [In the case of a document management system, digital documents
do constitute the data within the system but such systems are not normally to be transferred to NDAD; they fall within the remit
of The National Archives' electronic records management programme].
Disk Operating System (DOS)
The part of the Operating System which deals with access to and management of files and programs stored on disk. Also the name of
the operating system used on IBM-compatible PCs; DOS translates the user's commands and allows application programs to interact
with the computer's hardware and supplies the file management system for disk input and output. In the past (and before the
invention of the PC), almost every computer supplier produced an operating system called DOS, or some derivation of that name.
Examples include IBM's DOS (which ran on 1960s IBM 360 computers), Data General's RDOS (which ran on their Nova minicomputers)
and Digital's DOS-11 and DOS-8, designed for their PDP-11 and PDP-8 computers. The different DOS systems bore no relation to each
other.
Encryption
Encryption is the manipulation of data in order to prevent any but the intended recipient from reading that data. The inverse of
encryption is decryption.
Extended ASCII
A somewhat imprecise term, also referred to as 8-bit ASCII. The Extended ASCII Character Set consists of 128 decimal numbers and
ranges from 128 through 255 representing additional (ie over standard ASCII) special, mathematical, graphic, and foreign
characters. Different computer suppliers have at different times used the phrase 'extended ASCII' to denote different, and
incompatible, extended character sets. In the NDAD, the only 8-bit character set used is the International Standard ISO Latin-1
character set. This is supported by all web browsers. See also binary encoded data, ASCII, EBCDIC.
Extended Binary Coded Decimal Interchange Code (EBCDIC)
A character-to-number encoding invented by IBM and used primarily by their large computer systems, eg IBM mainframes. It was also
adopted by some other manufacturers, such as ICL, at various times in their history. EBCDIC never became a formal international or
national standard, and suffers from the problem that many variants of the character set were in use at different times and in
different countries. See also binary encoded data, extended ASCII, ASCII.
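To illustrate how EBCDIC differs from ASCII, and how EBCDIC variants differ from one another, the sketch below uses two EBCDIC code pages that ship with Python (cp037 and cp500). These two are chosen purely as examples; the glossary does not say which variants appear in NDAD holdings.

```python
# The letter A has a different numeric code in ASCII and in EBCDIC.
ascii_a = "A".encode("ascii")[0]
ebcdic_a = "A".encode("cp037")[0]
print(hex(ascii_a), hex(ebcdic_a))

# Two EBCDIC variants agree on the letters but not on every symbol.
same_letter = "A".encode("cp037") == "A".encode("cp500")
same_symbol = "!".encode("cp037") == "!".encode("cp500")
print(same_letter, same_symbol)

# Decoding with the right code page recovers the original text.
print(b"\xc1".decode("cp037"))
```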
Extended closure
The extension of the closure period of a public record beyond 30 years, in accordance with a Lord Chancellor's Instrument. Since
the Freedom of Information Act came into force on 01 January 2005, the extended closure process no longer exists.
Field
Holds a single data item of a specified type. It is part of a record. A field has a field name which identifies the field and
should give some idea of the data it will hold eg a field containing the name of a member of a sports club may be called
Member_Name. See also database.
File
In computer terms, a file is a collection of information treated as a unit by the computer. A file will usually contain a related
collection of records (eg customer file would contain information on all your customers. Each record, which would hold data about
a particular customer, would consist of fields for individual data items, such as customer name, customer number, customer
address). See also database.
In archival terms, a file is a level of description used by NDAD in cataloguing a dataset, or an item of dataset documentation. See
also ISAD(G).
File Transfer Protocol (FTP)
A protocol which allows a user on one computer to access and transfer files to and from another computer over a network. FTP is the
specific standard for file transmission between computers using a TCP connection. Programs which carry out the transfer are called
FTP programs.
Finding aid
Information (including but not limited to guides, catalogues and indexes) about the contents, context and structure of archives in
conjunction with the means of retrieving this information. The elements of description in many of NDAD's finding aids conform to
the International Standard for Archival Description (General) or ISAD(G), published by the International Council on Archives. See
also Administrative History, Dataset Documentation, Dataset
Catalogue, ISAD(G).
Floating-point
A form of notation and data storage in which numbers are expressed as a fractional value together with an integer exponent eg 123.45
would be expressed as 1.2345 x 10^2. Floating-point numbers are used to store numeric values which cannot be represented as Integers
(whole numbers).
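As a rough illustration, Python can display the example value in fraction-and-exponent form. Note one assumption beyond the glossary text: most computers store the exponent as a power of 2 rather than 10, which the second half of the sketch shows using math.frexp.

```python
import math

value = 123.45

# Decimal view: a fractional part and a power-of-ten exponent.
print(format(value, ".4e"))

# Binary view: value == mantissa * 2 ** exponent, with 0.5 <= mantissa < 1.
mantissa, exponent = math.frexp(value)
print(mantissa, exponent)
print(mantissa * 2 ** exponent == value)
```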
Fonds
An archival term referring to the whole of the documents, regardless of form or medium, organically created and/or accumulated and
used by a particular person, family, or corporate body in the course of that creator's activities and functions.
Freedom of Information Act
After 01 January 2005, all datasets transferred to NDAD are assumed to be open to the public when transferred. Closure or redaction
of a dataset will only take place if relevant FOI Exemptions are invoked. This replaces the 30 year closure rule, and the system
of accelerated opening and/or extended closure by means of Lord Chancellor's Instruments.
Gigabyte (GB)
Giga signifies one thousand million. A gigabyte is properly 2 to the 30th power = 1,073,741,824 bytes. However, when used by most
computer disk and tape suppliers it denotes the slightly smaller value of 10^9 bytes = 1,000,000,000 bytes.
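The arithmetic behind the two meanings of a gigabyte can be checked directly. The variable names below are illustrative only.

```python
binary_kilobyte = 2 ** 10    # 1,024 bytes
binary_megabyte = 2 ** 20    # 1,048,576 bytes
binary_gigabyte = 2 ** 30    # 1,073,741,824 bytes
decimal_gigabyte = 10 ** 9   # 1,000,000,000 bytes, as used by many suppliers

# The gap between the two definitions of a gigabyte.
difference = binary_gigabyte - decimal_gigabyte
print(difference)
print(round(100 * difference / decimal_gigabyte, 2), "per cent larger")
```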
Graphical User Interface (GUI)
A GUI allows the user to point at a list of command options or click on an icon instead of typing a character-based command. An
example of a GUI is Microsoft Windows.
Graphics Interchange Format (GIF)
One of a number of standard formats for display of images on the World Wide Web. This protocol is used as a standard for exchanging
graphical raster-based images between computers. GIF can handle up to 256 simultaneous colours, and uses a data compression
mechanism to reduce the file size, thus saving download time. GIF employs a compression mechanism (Lempel-Ziv) which is protected
by a patent held by Unisys. For this reason, use of GIF is often deprecated in open systems in favour of other image encoding
schemes such as PNG which are not subject to patent protection or proprietary licences.
Hierarchical Storage Management (HSM)
A means of storing data in a computer system in which frequently used data is stored on more expensive, faster disks and less
frequently used data migrates to slower but cheaper forms of storage such as tape.
Hypertext
Text that contains links to other documents. HTML documents are examples of hypertext.
HyperText Markup Language (HTML)
The set of markup symbols or codes inserted in a file intended for display on a World Wide Web browser. The markup tells the Web
browser how to display a Web page's words and images for the user. HTML is the usual language for documents that are 'published'
on the Web. HTML is an application of SGML.
HyperText Transfer Protocol (HTTP)
The protocol describing how a web browser requests documents from a web server and how documents (in any format) are to be
transmitted from a Web server to a web browser.
Image scan
Scanned image produced by NDAD, usually derived from a paper document. A scanned (or digitised) image is only a picture, and
although it contains characters, they cannot be recognised by a computer. Conversely, OCR documents can be recognised by a
computer, and can be read into a word processing package. See also OCR.
International Standard of Archival Description (General) (ISAD(G))
An agreed set of general rules for archival description. These rules ensure the creation of consistent, appropriate, and self
explanatory descriptions; and facilitate the retrieval and exchange of information about archival material. ISAD(G) comprises 26
descriptive elements, arranged in a hierarchical structure. It is now common practice among archivists to use subsets of these
elements (rather than all 26 of them) in preparing catalogues.
Internet Protocol (IP) address
The standard way of identifying a computer that is connected to the Internet, similar to the way a telephone number identifies a
telephone on a telephone network. It may be expressed either as a 4-part number (e.g., 123.124.12.13) or in words: ndad.ulcc.ac.uk.
JPEG
A graphic image format (for still picture compression) defined by the Joint Photographic Experts Group; it is commonly supported by
Web browsers. JPEG is designed to provide a compact means of storing photographic images. It is not as well suited to representing
graphical images (i.e. those drawn by hand) or images with a very small number of colours.
Key field
The field within a record which uniquely identifies a record eg a national insurance number could be the unique key for a file of
social security claimants. Also referred to as primary key.
Kilobyte (KB)
A unit of measure for computer memory or storage equivalent to approximately one thousand (1,024) bytes.
Local Area Network (LAN)
A data network (ie a network connecting a number of computers together allowing them to share information and/or peripheral
devices) covering a restricted area (usually a few square miles or less).
Logical field
A field that can have only two values - true or false (although these may be held as 1,0 and represented as Yes, No).
Logical operator
An operator, such as AND, that combines logical values (true, false) to produce a logical result.
Lord Chancellor's Instrument
A legal instrument whereby the Lord Chancellor (exercising powers under the Public Records Acts 1958 and 1967) reduces or extends
the statutory 30-year closure period of a public record. Since the Freedom of Information Act came into force on 01 January 2005,
this is now redundant.
Mark sense forms/reader
Optical mark recognition: method of data capture used where a user can choose from a finite and predictable set of responses, for
instance multiple choice examination questions, by making a mark in a particular position on the form. A current application is
the National Lottery tickets. An OM (Optical Mark) reader reads and interprets the marks on the page.
Megabyte (MB)
A unit of measure for computer memory or storage equivalent to approximately one million (1,048,576) bytes.
Multipurpose Internet Mail Extensions (MIME)
An extension to Internet email which provides the ability to transfer non-textual data, such as graphics, audio, video and fax ie
an encoding scheme for allowing non-ASCII data to be included in an e-mail message. It is also used by web browsers and web
servers to describe data being transferred between them - this is how a browser knows to display one file as an image, and another
as text, sound or video.
Multimedia
The presentation of information by a computer system using a combination of still graphics, animation, sound and text.
National Archives, The (TNA)
The National Archives, which covers England, Wales and the United Kingdom, was formed in April 2003 by bringing together the Public
Record Office and the Historical Manuscripts Commission. It is responsible for looking after the records of central government and
the courts of law, and making sure everyone can look at them.
National Digital Archive of Datasets (NDAD)
The National Digital Archive of Datasets - a TNA-sponsored initiative to conserve and where possible provide access to many
computer datasets from central government departments and agencies. The data will remain in the legal custody of The National
Archives, but will be managed by ULCC and the University of London Library (ULL). See also CRDA.
NDAD reference
As part of the finding aid, NDAD allocate a unique alphanumeric code to every single dataset, and to every single dataset document.
Series Catalogues and Administrative Histories are also identified by unique NDAD references. Do not confuse this element with the
TNA Series Number.
Operating system
The set of programs which tell the machine how to perform actions, enabling it to run applications and to interface with
peripherals and users. Examples of operating systems are DOS, Windows, UNIX, VMS, and VME.
Optical Character Recognition (OCR)
A process which takes an image and turns it into editable text. OCR scanning differs from image scanning in that although both
accept a printed document as input, OCR identifies each character and creates an output file which can be used by, for instance, a
word processing package. Image scanning results only in a 'picture' of the document.
Pen plots [or] Plotters
Graphical output of data items using computer-controlled pens. Output of this form is typically limited to line drawings, graphs,
maps etc.
Piece number
The reference assigned by The National Archives to a document within a TNA Series.
Portable Network Graphics (PNG)
A file format for compressed graphic images. It provides a number of improvements over the GIF format and, unlike GIF, is
patent-free.
PostScript
A page description language used to pass instructions to printers for setting up the page to be printed ie describing to the
printer the appearance of the whole page, including graphics.
Public Record Office (PRO)
The former name of The National Archives.
Protocol
Agreed-upon standard. A communications protocol is a set of rules describing the transfer of data between devices or programs.
Punched card
A cardboard rectangle used for entering data into a computer or other machine and for storing data. A standard Punched Card held
rows of 80 characters of data coded as a series of holes punched in columns on the card. These holes were read by a card reader
which sensed which holes had been punched out in each column and translated the column dots into machine-readable character codes.
Random Access Memory (RAM)
Refers to types of memory devices whereby any location in memory can be found, on average, as quickly as any other location.
Computer internal memories and disk memories are random access memories.
Record
(1) In archive terms, a record is a document, regardless of form or medium, organically created and/or accumulated and used by a
particular person, family, or corporate body in the course of that creator's activities and functions.
(2) In computer terminology, a record is a collection of data items (fields), for example the various items of information about a
customer. Multiple computer records can be contained in a computer file.
Redaction
Process which may be carried out to prevent viewing of data/parts of documentation which the transferring Government Department
and/or TNA has designated as in any way sensitive and/or confidential. It includes anonymisation, i.e. blocking the display of
certain fields such as names, addresses or telephone numbers. From 01 January 2005, this is done by invoking relevant FOI
Exemptions.
Registered file
A collection of documents, relating to a particular subject or having some other common characteristic, which are created or stored
by a government department in the course of its business and are therefore public records. The file is controlled by the Registry
responsible for the papers produced by the department; the DRO has overall responsibility for registered files from the time they
are created through review until their destruction or transfer to TNA.
Relational database
A database where the data are structured as a number of tables and the database management system allows the tables to be linked
together for data to be searched, displayed etc. A relational database management system allows there to be a number of views of
the data.
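A small sketch of linked tables, using Python's built-in sqlite3 module and the sports club example from the Database entry. The table names, column names and sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # a temporary in-memory database

# Two hypothetical tables, linked by the member_id key field.
conn.execute("CREATE TABLE members (member_id INTEGER PRIMARY KEY, member_name TEXT)")
conn.execute("CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, member_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO members VALUES (?, ?)", [(1, "A. Smith"), (2, "B. Jones")])
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 12.5)])

# Link the tables to produce one view of the data: total paid per member.
rows = conn.execute(
    "SELECT m.member_name, SUM(p.amount) "
    "FROM members m JOIN payments p ON p.member_id = m.member_id "
    "GROUP BY m.member_id ORDER BY m.member_id"
).fetchall()
print(rows)
```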
Rich Text Format (RTF)
Format of a file used to store data from a word processor, including information on fonts, styles etc. It is most often used as a
platform-independent format for sharing documents among different word processing packages, though word processors differ in their
levels of support for the RTF standard.
Scanned images
The output from a scanner which is a device for capturing graphic images from a page and converting the data into a binary code.
The image can then be displayed, edited with a painting program, or pasted into another document. Unlike an OCR'd document it
cannot be read as text into a word processor.
Server
A provider of resources. The term Server is used to refer both to a computer program that provides services to other computer
programs in the same or other computers and to the computer that a server program runs on (although it may contain a number of
server and client programs). Specific to the Web, a Web server is the computer program (housed in a computer) that serves
requested HTML pages or files.
Software
A general term for all types of programs which can be run on a computer system.
Source code
The code written in a high-level computer language by programmers or code generators, to be subsequently translated by the
computer.
Standard Generalized Markup Language (SGML)
An international standard for describing the markup of structured documents. The basic idea behind SGML is that information can be
made independent of particular hardware and software, but more particularly that markup allows one to describe the structure of a
document (such as where chapter headings, footnotes, etc occur) without saying exactly how that structure should be represented on
the printed page or screen.
Table
A file in a relational database is often referred to as a table (because the data is held in the form of a table - records being
the rows and fields the columns).
Tagged Image File Format (TIFF)
An image file. TIFF provides a way of storing and exchanging digital image data.
Terminal emulator
A program that allows one computer workstation to act as a terminal for accessing a remote computer over a network.
Thesaurus
A dictionary arranged by meaning rather than spelling and including pointers to wider, narrower and related terms.
TNA Series number
A reference assigned by The National Archives to a series of records. Series numbers are assigned by TNA to datasets and
related documentation which are transferred to NDAD. TNA Series numbers are included in NDAD catalogues but are not used by NDAD
for reference purposes; NDAD has a distinct system of references for datasets and related documentation in its holdings (see NDAD reference).
Transfer
This term is used to cover the various steps involved in transferring a dataset to NDAD.
Transfer form
A brief inventory and description of materials (data, documents, etc.) being transferred in a batch to NDAD. This would usually
include the title of the material and inclusive dates of materials held therein.
Twos-complement
A method of storing integers (whole numbers) in a computer system. Every computer in common use today uses twos-complement to store
whole numbers. The name refers to the means by which negative numbers (such as -3) are distinguished from positive numbers.
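The following sketch shows how -3 and 3 would look as 8-bit twos-complement patterns. The 8-bit width and both function names are assumptions made for the example; real systems commonly use 16, 32 or 64 bits.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the twos-complement bit pattern of value as a string."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps a negative number onto its twos-complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Convert a bit pattern back to a signed integer."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative number.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(3))
print(to_twos_complement(-3))
print(from_twos_complement("11111101"))
```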
Uniform Resource Locator (URL)
The unique address of a single HTML page or file on the Web. The address includes a unique Internet server address and a
hierarchical description of a file location on the server. The address of the file is in a format that can be interpreted by a Web
server, which then retrieves the file.
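As an illustration, Python's standard urllib.parse module can split a URL into the server address and the file location described above. The server name is the one quoted in the IP address entry; the path /help/glossary.htm is a hypothetical example, not a real NDAD address.

```python
from urllib.parse import urlsplit

# The path below is invented for illustration.
url = "http://ndad.ulcc.ac.uk/help/glossary.htm"
parts = urlsplit(url)

print(parts.scheme)   # the protocol used to fetch the file
print(parts.netloc)   # the Internet server address
print(parts.path)     # the hierarchical file location on the server
```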
University of London Computer Centre (ULCC)
The University of London Computer Centre has been providing IT services for nearly 30 years. ULCC was established as a computing
centre for the University of London and now provides national networking and data archiving services, including the National
Digital Archive of Datasets (NDAD).
UNIX
An operating system typically used on workstations and computers. Some Internet servers run on UNIX systems.
User interface
Term used to describe the part of a computer system that enables humans to issue commands to the computer, and see the results. The
two most common types of user interface are the command line interface (CLI) and the graphical user interface (GUI), which use a
screen and a keyboard and/or mouse; however, other user interfaces exist for specialised tasks, or for people with disabilities.
Web Server
A networked program that responds to requests from web browsers for documents available via the World Wide Web.
Wide Area Network (WAN)
A network, usually constructed with serial lines, which covers a large geographic area.
Write Once, Read Many (WORM)
This term refers to a type of computer data storage which can be written to only once, but which can then be read many times.
Optical disks (also known at some times as laser discs) are an example of WORM storage.
eXtensible Markup Language (XML)
XML is the eXtensible Markup Language, endorsed and developed by the World-Wide Web Consortium (W3C). Like SGML, XML is a
meta-markup language, defining rules and structures for use in creating new markup applications.