|
Identity statement
|
| Title |
1901-1995 dataset
|
| NDAD reference |
CRDA/20/DS/2 |
| Dates of creation of datasets |
1997 |
| Dates of contents of datasets |
1901-1995 |
| Date of last input to datasets |
1997? |
| Date of last access to datasets |
|
| Extent of datasets |
1 dataset: 28.2 MB after processing by NDAD; 19 tables comprising 1,044,630 records |
| ISAD(G) level of description |
File |
|
|
Administrative context
|
| Aim and purpose |
|
| Statement of responsibility |
|
|
|
Source of acquisition
|
| Source of acquisition |
This dataset was transferred from the Office for National Statistics on a CD-ROM which was received by NDAD on 16 January 2004.
|
|
|
Nature and content
|
| Scope and content |
This dataset represents the version of the Historic Mortality Data Files database which was published by the Office for National Statistics in 1997 under the title "Twentieth Century Mortality Files". It includes population and mortality data for England and Wales covering the years 1901-1995. It replaced an earlier version of the database which is held by NDAD as CRDA/20/DS/1 (see Links to related datasets). For further information about this dataset and the other Historic Mortality Data Files dataset, see the Series Catalogue.
|
| Digital processing and conversion |
The data was transferred to NDAD in four formats: Microsoft Access v2.0 files, Microsoft Access 95 files, dBase III files and comma-separated (CSV) text files. All four versions contained exactly the same data. NDAD used the CSV files for preservation, since these required the least processing.
All files had their end-of-line characters translated from the DOS to the UNIX convention. In the CSV files, the date fields and the integer count fields (YR and NDTHS in each of the Historic Deaths tables, YR and POP in the Population table) were all stored as real numbers with two unused decimal places (e.g. 1977.00 or 3.00). NDAD removed these unused decimal places using Perl.
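The clean-up described above can be illustrated in Python. This is a minimal sketch only, not the Perl which NDAD actually used: the function names, the example line and the column positions are assumptions made for illustration, and the sketch assumes that no field contains an embedded comma.

```python
import re

# Matches a whole field holding an integer written with two zero decimals,
# e.g. "1977.00" or "3.00".
_UNUSED_DECIMALS = re.compile(r"^(-?\d+)\.00$")


def strip_unused_decimals(field: str) -> str:
    """Return '1977' for '1977.00'; leave any other value untouched."""
    match = _UNUSED_DECIMALS.match(field.strip())
    return match.group(1) if match else field


def clean_line(line: str, numeric_columns: set) -> str:
    """Convert a DOS line ending to UNIX and tidy the chosen columns.

    numeric_columns holds zero-based column positions (hypothetical here,
    since the field order is not stated in this catalogue entry).
    """
    fields = line.rstrip("\r\n").split(",")
    cleaned = [
        strip_unused_decimals(value) if index in numeric_columns else value
        for index, value in enumerate(fields)
    ]
    return ",".join(cleaned) + "\n"
```

For example, `clean_line("1977.00,0010,3.00\r\n", {0, 2})` yields a line ending in a single line feed with the first and third fields reduced to `1977` and `3`.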
One of the files, ICD5DTHS.CSV, was meant to be comma-separated but was actually tab-delimited. A program written by NDAD, tab2csv, was used to convert the file to CSV format. Because tab2csv assumed that tab-delimited data did not contain quotes, which was not the case for this file, extra quotes were inserted around some fields; these were then removed using the Unix editor vi. ICD5DTHS.CSV also contained the field names as the first line of data: this line was removed.
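A conversion of this kind can be sketched with Python's standard csv module. The sketch below is not NDAD's tab2csv program, whose source is not reproduced here; the function names are hypothetical. It reads quote characters as ordinary data, strips one surrounding pair of quotes from each field, drops the field-name line, and lets the writer add quotes only where a field needs them.

```python
import csv
import io


def strip_outer_quotes(field: str) -> str:
    """Remove one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def tab_to_csv(tab_text: str, has_header: bool = True) -> str:
    """Re-emit tab-delimited text as comma-separated text."""
    # QUOTE_NONE makes the reader treat quote characters as plain data.
    reader = csv.reader(
        io.StringIO(tab_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    if has_header and rows:
        rows = rows[1:]  # drop the line of field names

    out = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains a comma or a quote.
    writer = csv.writer(out, lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
    for row in rows:
        writer.writerow([strip_outer_quotes(field) for field in row])
    return out.getvalue()
```

Stripping the quotes before writing avoids the doubled quoting which otherwise has to be corrected by hand afterwards.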
|
|
|
Conditions of access and use
|
| Access conditions |
|
|
|
Allied materials
|
| Related units of description |
A number of items of documentation relating specifically to this dataset are available via the Dataset Documentation Catalogue. These include the explanatory notes which were issued to purchasers of the dataset; definitions of the Access 2.0 and Access 95 versions of the database, output by NDAD using Microsoft Access's Documenter utility; and an image of the Relationships window in the Access 95 version of the database, showing the relationships which had been defined between the tables.
|
| Associated material |
|
| Publications produced by the originating department |
|
| Publications produced by researchers working on the datasets |
|
|
|
Structure
|
| Logical structure and schema |
This dataset consists of three types of tables: a Population table (POPLNS) covering the period 1901-1995; nine Historic Deaths tables (ICD1DTHS-ICD9DTHS) which cover the period 1901-1910, and the periods corresponding to the different revisions of the ICD which were implemented in England and Wales in 1911-1995; and nine ICD dictionary tables (ICD1DESC-ICD9DESC), which explain the codes used for causes of death in the Historic Deaths tables.
A Relationships diagram, extracted from the Access 95 version of the database and showing links between key fields, can be found in the Dataset Documentation Catalogue. These relationships have been maintained by NDAD.
The dataset comprises the following table(s):
| Table number | NDAD reference | Name | Title |
| 1 | CRDA/20/DS/2/1 | ICD1DTHS | Historic Deaths, 1901-1910 |
| 2 | CRDA/20/DS/2/2 | ICD2DTHS | Historic Deaths, 1911-1920 |
| 3 | CRDA/20/DS/2/3 | ICD3DTHS | Historic Deaths, 1921-1930 |
| 4 | CRDA/20/DS/2/4 | ICD4DTHS | Historic Deaths, 1931-1939 |
| 5 | CRDA/20/DS/2/5 | ICD5DTHS | Historic Deaths, 1940-1949 |
| 6 | CRDA/20/DS/2/6 | ICD6DTHS | Historic Deaths, 1950-1957 |
| 7 | CRDA/20/DS/2/7 | ICD7DTHS | Historic Deaths, 1958-1967 |
| 8 | CRDA/20/DS/2/8 | ICD8DTHS | Historic Deaths, 1968-1978 |
| 9 | CRDA/20/DS/2/9 | ICD9DTHS | Historic Deaths, 1979-1995 |
| 10 | CRDA/20/DS/2/10 | POPLNS | Population, 1901-1995 |
| 11 | CRDA/20/DS/2/11 | ICD1DESC | ICD dictionary, 1901-1910 |
| 12 | CRDA/20/DS/2/12 | ICD2DESC | ICD dictionary, 1911-1920 |
| 13 | CRDA/20/DS/2/13 | ICD3DESC | ICD dictionary, 1921-1930 |
| 14 | CRDA/20/DS/2/14 | ICD4DESC | ICD dictionary, 1931-1939 |
| 15 | CRDA/20/DS/2/15 | ICD5DESC | ICD dictionary, 1940-1949 |
| 16 | CRDA/20/DS/2/16 | ICD6DESC | ICD dictionary, 1950-1957 |
| 17 | CRDA/20/DS/2/17 | ICD7DESC | ICD dictionary, 1958-1967 |
| 18 | CRDA/20/DS/2/18 | ICD8DESC | ICD dictionary, 1968-1978 |
| 19 | CRDA/20/DS/2/19 | ICD9DESC | ICD dictionary, 1979-1995 |
|
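Because each Historic Deaths table covers one period, a user of the dataset will often need to find the table which holds a given year. The following Python sketch encodes the periods listed in the table above; the function names are hypothetical, and the pairing of each Historic Deaths table with its dictionary table is inferred from the shared table names and periods.

```python
# Periods taken from the table titles in this catalogue entry.
DEATHS_TABLE_PERIODS = {
    "ICD1DTHS": (1901, 1910),
    "ICD2DTHS": (1911, 1920),
    "ICD3DTHS": (1921, 1930),
    "ICD4DTHS": (1931, 1939),
    "ICD5DTHS": (1940, 1949),
    "ICD6DTHS": (1950, 1957),
    "ICD7DTHS": (1958, 1967),
    "ICD8DTHS": (1968, 1978),
    "ICD9DTHS": (1979, 1995),
}


def deaths_table_for_year(year: int):
    """Return the Historic Deaths table covering a year, or None."""
    for name, (first, last) in DEATHS_TABLE_PERIODS.items():
        if first <= year <= last:
            return name
    return None


def dictionary_table_for(deaths_table: str) -> str:
    """Return the ICD dictionary table paired with a deaths table."""
    return deaths_table.replace("DTHS", "DESC")
```

For instance, a death registered in 1945 would be sought in ICD5DTHS, with its cause code explained in ICD5DESC.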
| How data was originally captured and validated |
Details of the sources which were used to produce the dataset and how the data was checked are given in the Series Catalogue. This section outlines certain considerations relating to the codes for causes of death which are particular to this dataset. These codes are used in the Historic Deaths tables and are explained in the ICD Dictionary tables. In most cases, the codes and explanations should match the contemporary version of the International Classification of Diseases. The issue of translating between ICD codes and "computer codes", which affects the first Historic Mortality Data Files dataset, no longer applies to the same extent. However, the following discrepancies need to be noted:
(1) In the period 1901-1910 an unnumbered list of causes of death was used in England and Wales. In the dataset, codes ranging from 0010 to 1910 have been assigned to causes in this list, with 1741 being reserved for the category of "other specified diseases". This matches the coding of 1901-1910 causes in the first dataset.
(2) The sixth through to the ninth revisions of the ICD employed four-digit numeric codes, with the first three digits representing the major cause grouping and the fourth digit being used for any subdivisions. Where there were no subdivisions, the convention in the published sources was to replace the final digit with a hyphen. These have been replaced by zeros in the codes in the dataset. Codes in the range 8000-9999 refer to causes in the ICD corresponding to "external causes of injury", where the death was not due to natural causes. The codes which the ICD used for the nature of the injury (in the case of unnatural deaths) are not included in the Historic Deaths tables for these revisions, to avoid the possible double counting of deaths.
(3) It is clear from the ICD Dictionary tables that in some cases, two or more ICD codes have been conflated to produce a single description of cause of death: e.g. the ICD_2 field (recording ICD codes) in table ICD2DESC contains entries for '89&90A' and '89&90B'; the same field in ICD3DESC contains entries for '79,80*', '99c,99d', '113,114(3)', '113,114(2)', '113,114(1)' and '165, 166*'. This is not described in the notes accompanying the 1901-1995 dataset, but it may be related to the issue of "cause code discrepancies" which is described in the documentation accompanying the 1901-1992 dataset. See the Dataset Catalogue for that dataset (Links to dataset catalogues) for further details. |
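The convention noted in (2) can be expressed as a short Python sketch, which may help when matching codes taken from the published sources against the dataset. It applies only to the sixth to ninth revisions; the function names and the example codes are illustrative assumptions, not part of the dataset documentation.

```python
def normalise_icd_code(published_code: str) -> str:
    """Replace a trailing hyphen (no subdivision) with a zero.

    A published code such as '410-' becomes '4100'; a code which
    already has a fourth digit is returned unchanged.
    """
    code = published_code.strip()
    if len(code) == 4 and code.endswith("-"):
        return code[:3] + "0"
    return code


def is_external_cause(code: str) -> bool:
    """True if a four-digit code falls in the range 8000-9999."""
    return len(code) == 4 and code.isdigit() and 8000 <= int(code) <= 9999
```

The conflated codes noted in (3), such as '89&90A', are not four-digit numeric codes and pass through `normalise_icd_code` unchanged.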
| Constraints on the reliability of the data |
|
|
|
Validation
|
| Content validation |
No discrepancies were found in the data.
|
| Transformation validation |
Checks were carried out on the converted data files to ensure that the transformation process did not introduce any errors. The numbers of records and fields in the converted files were counted, and were found to be the same as in the original data files. Each of the Historic Deaths tables was checked against its corresponding dictionary table to ensure that all the ICD codes had been explained. Each of the Historic Deaths tables was also checked to ensure that it only contained data for its particular period. No discrepancies were detected.
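Checks of this kind can be reproduced by users of the dataset. The Python sketch below shows one possible form for the three checks; it is not NDAD's validation procedure, and the function names are hypothetical. Each function takes values which have already been read from the relevant table.

```python
def count_records_and_fields(rows):
    """Return (number of records, number of fields in the first record)."""
    rows = list(rows)
    return len(rows), (len(rows[0]) if rows else 0)


def unexplained_codes(death_codes, dictionary_codes):
    """Return codes used in a deaths table but absent from its dictionary."""
    return set(death_codes) - set(dictionary_codes)


def years_outside_period(years, first_year, last_year):
    """Return the distinct years which fall outside a table's period."""
    return sorted(
        {year for year in years if not first_year <= year <= last_year}
    )
```

A table passes when the counts agree between the original and converted files and the last two functions return empty results.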
|
|
|
Links to related datasets
|
| Related datasets |
| NDAD reference | Title (link leads to Dataset Catalogue) |
| CRDA/20/DS/1 | 1901-1992 dataset |
|
|
|
Last updated 2004-04-02 11:20:27
|
|
|