The National Archives

 NDAD: The National Digital Archive of Datasets

Dataset details: CRDA/20/DS/2

1901-1995 dataset

 
 

Context

Historic Mortality Data Files

Identity statement

Title 1901-1995 dataset
NDAD reference CRDA/20/DS/2
Dates of creation of datasets 1997
Dates of contents of datasets 1901-1995
Date of last input to datasets 1997?
Date of last access to datasets
Extent of datasets 1 dataset: 28.2 MB after processing by NDAD; 19 tables comprising 1,044,630 records
ISAD(G) level of description File

Administrative context

Aim and purpose
Statement of responsibility

Source of acquisition

Source of acquisition

This dataset was transferred from the Office for National Statistics on a CD-ROM which was received by NDAD on 16 January 2004.


Nature and content

Scope and content

This dataset represents the version of the Historic Mortality Data Files database which was published by the Office for National Statistics in 1997 under the title "Twentieth Century Mortality Files". It includes population and mortality data for England and Wales covering the years 1901-1995. It replaced an earlier version of the database which is held by NDAD as CRDA/20/DS/1 (see Links to related datasets). For further information about this dataset and the other Historic Mortality Data Files dataset, see the Series Catalogue.

Digital processing and conversion

The data was transferred to NDAD in four different formats: Microsoft Access v2.0 files, Microsoft Access 95 files, dBase III files and comma-separated (CSV) text files. All the files contained exactly the same data. NDAD used the CSV files for preservation, since these required less processing than the other formats.

All files had their end-of-line characters converted from the DOS convention to the UNIX convention. In the CSV files, the date fields and the integer count fields (YR and NDTHS in each of the Historic Deaths tables, YR and POP in the Population table) were all stored as real numbers with two unused decimal places (e.g. 1977.00 or 3.00). NDAD removed these unused decimal places using Perl.
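
As an illustration of the clean-up described above, the following is a minimal sketch in Python rather than the Perl which NDAD actually used. The function name, the column positions and the sample rows are hypothetical; only the field names YR, NDTHS and POP and the example values 1977.00 and 3.00 come from this catalogue entry.

```python
import csv
import io


def strip_unused_decimals(csv_text, columns):
    """Rewrite values such as '1977.00' as '1977' in the given columns.

    `columns` holds zero-based column positions. These are an assumption
    made for this sketch: the real field order is not given here.
    """
    # newline="" passes the raw DOS line endings through to the csv module.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")  # UNIX end-of-line
    for row in reader:
        for i in columns:
            whole, dot, fraction = row[i].partition(".")
            # Drop the fraction only when it really is unused (all zeros).
            if dot and fraction.strip("0") == "":
                row[i] = whole
        writer.writerow(row)
    return out.getvalue()


# Invented sample rows with DOS line endings; columns 0 and 3 stand in
# for fields such as YR and NDTHS.
sample = "1977.00,M,0010,3.00\r\n2.50,F,0020,7.00\r\n"
cleaned = strip_unused_decimals(sample, [0, 3])
```

A value such as 2.50 is left untouched, so a genuine fraction would not be truncated silently. Note that rewriting through `csv.writer` may also change how fields are quoted.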

One of the files, ICD5DTHS.CSV, was meant to be a comma-separated file but was in fact tab-delimited. A program written by NDAD, tab2csv, was used to convert the file to CSV format. Because tab2csv assumed that tab-delimited data did not contain quotes, which was not the case for this file, it inserted extra quotes around some fields; these then had to be removed using the Unix editor vi. ICD5DTHS.CSV also contained the field names as the first line of data: this line was removed.
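
The conversion described above can be sketched with Python's standard csv module. This is not NDAD's tab2csv program, whose source is not reproduced here; the function name and the sample data (including the field name TEXT) are hypothetical. Parsing the input with a quote-aware reader avoids the doubled quotes which had to be removed by hand.

```python
import csv
import io


def tab_to_csv(tab_text, drop_header=True):
    """Convert tab-delimited text to CSV, optionally dropping the first line."""
    # The reader treats double quotes in the input as field quoting.
    reader = csv.reader(io.StringIO(tab_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if drop_header and index == 0:
            continue  # skip the line of field names
        writer.writerow(row)
    return out.getvalue()


# Invented sample: a header line followed by one record with a quoted field.
sample = 'YR\tTEXT\tNDTHS\n1940\t"Other, unspecified"\t3\n'
converted = tab_to_csv(sample)
```

Fields are quoted in the output only where needed, for instance when a value itself contains a comma.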


Conditions of access and use

Access conditions

Allied materials

Related units of description

A number of items of documentation relating specifically to this dataset are available via the Dataset Documentation Catalogue. These include the explanatory notes which were issued to purchasers of the dataset; definitions of the Access 2.0 and Access 95 versions of the database, output by NDAD using Microsoft Access's Documenter utility; and an image of the Relationships window in the Access 95 version of the database, showing the relationships which had been defined between the tables.

Associated material
Publications produced by the originating department
Publications produced by researchers working on the datasets

Structure

Logical structure and schema

This dataset consists of three types of table: a Population table (POPLNS) covering the period 1901-1995; nine Historic Deaths tables (ICD1DTHS-ICD9DTHS), which cover the period 1901-1910 and the periods corresponding to the successive revisions of the ICD implemented in England and Wales in 1911-1995; and nine ICD dictionary tables (ICD1DESC-ICD9DESC), which explain the codes used for causes of death in the Historic Deaths tables.

A Relationships diagram, extracted from the Access 95 version of the database and showing links between key fields, can be found in the Dataset Documentation Catalogue. These relationships have been maintained by NDAD.

The dataset comprises the following table(s):

Table number  NDAD reference    Name      Title
1             CRDA/20/DS/2/1    ICD1DTHS  Historic Deaths, 1901-1910
2             CRDA/20/DS/2/2    ICD2DTHS  Historic Deaths, 1911-1920
3             CRDA/20/DS/2/3    ICD3DTHS  Historic Deaths, 1921-1930
4             CRDA/20/DS/2/4    ICD4DTHS  Historic Deaths, 1931-1939
5             CRDA/20/DS/2/5    ICD5DTHS  Historic Deaths, 1940-1949
6             CRDA/20/DS/2/6    ICD6DTHS  Historic Deaths, 1950-1957
7             CRDA/20/DS/2/7    ICD7DTHS  Historic Deaths, 1958-1967
8             CRDA/20/DS/2/8    ICD8DTHS  Historic Deaths, 1968-1978
9             CRDA/20/DS/2/9    ICD9DTHS  Historic Deaths, 1979-1995
10            CRDA/20/DS/2/10   POPLNS    Population, 1901-1995
11            CRDA/20/DS/2/11   ICD1DESC  ICD dictionary, 1901-1910
12            CRDA/20/DS/2/12   ICD2DESC  ICD dictionary, 1911-1920
13            CRDA/20/DS/2/13   ICD3DESC  ICD dictionary, 1921-1930
14            CRDA/20/DS/2/14   ICD4DESC  ICD dictionary, 1931-1939
15            CRDA/20/DS/2/15   ICD5DESC  ICD dictionary, 1940-1949
16            CRDA/20/DS/2/16   ICD6DESC  ICD dictionary, 1950-1957
17            CRDA/20/DS/2/17   ICD7DESC  ICD dictionary, 1958-1967
18            CRDA/20/DS/2/18   ICD8DESC  ICD dictionary, 1968-1978
19            CRDA/20/DS/2/19   ICD9DESC  ICD dictionary, 1979-1995

How data was originally captured and validated

Details of the sources which were used to produce the dataset and how the data was checked are given in the Series Catalogue. This section outlines certain considerations relating to the codes for causes of death which are particular to this dataset. These codes are used in the Historic Deaths tables and are explained in the ICD Dictionary tables. In most cases, the codes and explanations should match the contemporary version of the International Classification of Diseases. The issue of translating between ICD codes and "computer codes", which affects the first Historic Mortality Data Files dataset, no longer applies to the same extent. However, the following discrepancies need to be noted:

(1) In the period 1901-1910 an unnumbered list of causes of death was used in England and Wales. In the dataset codes ranging from 0010 to 1910 have been assigned to causes in this list, with 1741 being reserved for the category of "other specified diseases". This matches the coding of 1901-1910 causes in the first dataset.

(2) The sixth to ninth revisions of the ICD employed four-digit numeric codes, with the first three digits representing the major cause grouping and the fourth digit being used for any subdivisions. Where there were no subdivisions, the convention in the published sources was to replace the final digit with a hyphen. These hyphens have been replaced by zeros in the codes in the dataset. Codes in the range 8000-9999 refer to causes in the ICD corresponding to "external causes of injury", where the death was not due to natural causes. The codes which the ICD used for the nature of the injury (in the case of unnatural deaths) are not included in the Historic Deaths tables for these revisions, to avoid the possible double counting of deaths (see note 1).

(3) It is clear from the ICD Dictionary tables that in some cases, two or more ICD codes have been conflated to produce a single description of cause of death: e.g. the ICD_2 field (recording ICD codes) in table ICD2DESC contains entries for '89&90A' and '89&90B'; the same field in ICD3DESC contains entries for '79,80*', '99c,99d', '113,114(3)', '113,114(2)', '113,114(1)' and '165, 166*'. This is not described in the notes accompanying the 1901-1995 dataset, but it may be related to the issue of "cause code discrepancies" which is described in the documentation accompanying the 1901-1992 dataset. See the Dataset Catalogue for that dataset (Links to dataset catalogues) for further details.
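
The coding conventions in point (2) can be illustrated with a short Python sketch. Both function names are hypothetical, and the sample codes used in the usage lines are invented for illustration rather than taken from the dataset. The sketch assumes a four-character code as described for the sixth to ninth revisions.

```python
def normalise_icd_code(published_code):
    """Replace the trailing hyphen of an undivided four-character code with a zero."""
    code = published_code.strip()
    if len(code) == 4 and code.endswith("-"):
        code = code[:3] + "0"
    return code


def is_external_cause(code):
    """Return True for four-digit codes in the range 8000-9999."""
    return len(code) == 4 and code.isdecimal() and 8000 <= int(code) <= 9999


# Invented examples: an undivided code and a subdivided code.
undivided = normalise_icd_code("410-")
subdivided = normalise_icd_code("4101")
```

Conflated entries of the kind listed in point (3), such as '89&90A', are not purely numeric, so `is_external_cause` simply returns False for them rather than raising an error.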

Constraints on the reliability of the data

Validation

Content validation

No discrepancies were found in the data.

Transformation validation

Checks were carried out on the converted data files to ensure that the transformation process did not introduce any errors. The numbers of records and fields in the converted files were counted and found to be the same as in the original data files. Each of the Historic Deaths tables was checked against its corresponding dictionary table to ensure that every ICD code was explained. Each of the Historic Deaths tables was also checked to ensure that it contained data only for its particular period. No discrepancies were detected.
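
Checks of the kind described above could be expressed as follows. This is a sketch only: NDAD's actual validation procedure is not documented here, the function names are hypothetical, and the sample values are invented. The periods in `PERIODS` are taken from the table list in this catalogue entry.

```python
# Period covered by each Historic Deaths table, from the table list above.
PERIODS = {
    "ICD1DTHS": (1901, 1910), "ICD2DTHS": (1911, 1920),
    "ICD3DTHS": (1921, 1930), "ICD4DTHS": (1931, 1939),
    "ICD5DTHS": (1940, 1949), "ICD6DTHS": (1950, 1957),
    "ICD7DTHS": (1958, 1967), "ICD8DTHS": (1968, 1978),
    "ICD9DTHS": (1979, 1995),
}


def same_shape(original_rows, converted_rows):
    """True if both files have the same record count and fields per record."""
    return (len(original_rows) == len(converted_rows)
            and all(len(a) == len(b)
                    for a, b in zip(original_rows, converted_rows)))


def unexplained_codes(death_codes, dictionary_codes):
    """Codes used in a deaths table but absent from its dictionary table."""
    return sorted(set(death_codes) - set(dictionary_codes))


def years_outside_period(table_name, years):
    """Years in a deaths table that fall outside that table's period."""
    first, last = PERIODS[table_name]
    return sorted({year for year in years if not first <= year <= last})


# Invented sample values for illustration.
missing = unexplained_codes(["0010", "0020", "0030"], ["0010", "0020"])
stray = years_outside_period("ICD9DTHS", [1979, 1995, 1978])
```

An empty list from either of the last two functions corresponds to the "no discrepancies" outcome reported above.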


Links to related datasets

Related datasets
NDAD reference Title (link leads to Dataset Catalogue)
CRDA/20/DS/1 1901-1992 dataset


Notes

 

1. Dataset Documentation Catalogue, reference CRDA/20/DD/1/2, p. 3.


Last updated 2004-04-02 11:20:27

 
 

NDAD v3.0