 NDAD: The National Digital Archive of Datasets
Dataset details: CRDA/1/DS/1/2

Calendar year dataset for 1991

 
 

Context

Crime Statistics System (ME)

Identity statement

Title: Calendar year dataset for 1991
NDAD reference: CRDA/1/DS/1/2
Dates of creation of datasets: 1991
Dates of contents of datasets: 1980-1991
Date of last input to datasets: Dec 1991
Date of last access to datasets:
Extent of datasets: 1 dataset: 199 Mb after conversion by NDAD; 2 tables comprising YTDEXTRACT91 (1344784 records) and YTDNOCREXT91 (142815 records)
ISAD(G) level of description: File

Administrative context

Aim and purpose
Statement of responsibility

Source of acquisition

Source of acquisition

This dataset (CRDA/1/DS/1/2) was transferred to NDAD from the Metropolitan Police on nineteen 9-track 6250 bpi open-reel magnetic tapes, which were received on 23 February 1998. These tapes also contained:

  • Calendar year datasets for 1990 and 1992 (CRDA/1/DS/1/1, CRDA/1/DS/1/3).
  • Financial year datasets for 1992-1993, 1993-1994 and 1994 (CRDA/1/DS/1/4-CRDA/1/DS/1/6). Although based on a financial year rather than a calendar year, these datasets have essentially the same structure as CRDA/1/DS/1/1-3, the only exception being that CRDA/1/DS/1/6 lacks a "No Crimes" file.
  • A financial year dataset covering 1994-1995 (CRDA/1/DS/2/1) and another covering two years, 1995-1997 (CRDA/1/DS/2/2). These datasets reflect changes to the ME System which were made to facilitate interoperability with the CRIS System.
  • A dataset which is believed to correspond to the data in the Crime Statistics database when it was taken out of service (CRDA/1/DS/3/1).
  • Over 200 files containing COBOL, SCL and Data Dictionary source code.

For links to the catalogues of the other ME System datasets, see Links to related datasets. Details of the source code transferred with the datasets and paper documentation relating to the ME System can be found in the Dataset Documentation Catalogue.


Nature and content

Scope and content

This dataset comprises data which was input to the Metropolitan Police's Crime Statistics System between January and December 1991. It includes details of offences, arrests, victims of crime, property stolen and reports classified as "No Crime" (defined as "an allegation where the evidence is insufficient to establish that a crime has been committed"; see note 1). As in the previous dataset for 1990 (CRDA/1/DS/1/1) and the following datasets for 1992, 1992-1993 and 1993-1994 (CRDA/1/DS/1/3-CRDA/1/DS/1/5), the data is divided into a Year to Date Extract file (YTDEXTRACT91) containing data relating to offences, arrests, victims and stolen property, and a "No Crime" file (YTDNOCREXT91) containing more limited information about "No Crime" reports. The programs that produced these files were originally written in ICL 2900 COBOL using extensions to the language designed for handling an IDMSX database.

It should be noted that in some cases the entry in the Offence-Year-Input field in the YTDEXTRACT91 file predates 1991 if the record relates to an arrest which was input between January and December 1991 (i.e. the entry in the Arrest-Year-Input field should be '91'). Equally, many records in YTDEXTRACT91 and YTDNOCREXT91 relate to offences, arrests or "no crimes" which were originally reported at a much earlier period than the date when the record was input: e.g. the earliest offences recorded in the Offence-Date-YY field in YTDEXTRACT91 (excluding what are thought to be missing and invalid values) appear to date from 1980.
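The relationship between the two input-year fields can be checked mechanically. The following Python sketch assumes that the records have already been parsed into dictionaries keyed by field name (the parsing step and the function name are illustrative assumptions, not part of the ME System), and lists records in which neither input year is '91':

```python
from typing import Dict, Iterable, List

# Two-digit year of this dataset, as held in the input-year fields.
DATASET_YEAR = "91"


def unexplained_input_years(
    records: Iterable[Dict[str, str]], year: str = DATASET_YEAR
) -> List[Dict[str, str]]:
    """Return records whose offence AND arrest input years both differ from `year`.

    A record with an earlier Offence-Year-Input is expected when the related
    arrest was input during the dataset year, so only records where neither
    field carries the dataset year are reported.
    """
    return [
        r
        for r in records
        if r["Offence-Year-Input"] != year and r["Arrest-Year-Input"] != year
    ]
```

Records returned by such a check would need manual inspection; the sketch makes no assumption about how blank or invalid values are encoded.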

Further details on the administrative background and contents of this dataset and the other Crime Statistics System datasets are given in the Series Catalogue.

Digital processing and conversion

The data and associated source files were supplied to ULCC in plain text format by the Metropolitan Police Service's Department of Technology, using magnetic tapes created on an ICL Series 39 mainframe running VME.

At ULCC the tapes were copied to VME filestore using an ICL 3960 mainframe. From there they were transferred using VMX (Unix emulator for VME) and FTP to the Unix host machine for the archive control system.

The data files had been split into tape sections: some files were spread over multiple tape sections, and some tapes contained multiple files. Images of these sections were preserved in their original order. A printed schedule of the jobs that were run by the Metropolitan Police (or their IT service provider) in order to produce the tapes sent to NDAD was supplied with the dataset documentation, and provided the key to recreating the original files from the tape section images. Master copies of these files were preserved, and working copies were taken. Steps were also taken to validate the transformed files (see Transformation validation).
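The reassembly step can be pictured as concatenating section images in the order given by the schedule. The Python sketch below is illustrative only: the schedule is represented as a mapping from file name to an ordered list of section image names, and the function name, section names and directory layout are assumptions rather than details recorded in the documentation.

```python
from pathlib import Path
from typing import Dict, List


def rebuild_files(
    schedule: Dict[str, List[str]], section_dir: Path, out_dir: Path
) -> Dict[str, int]:
    """Concatenate tape section images, in schedule order, into whole files.

    `schedule` maps each original file name to the ordered list of section
    image names that make it up (hypothetical representation of the printed
    schedule). Returns the size in bytes of each rebuilt file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    sizes: Dict[str, int] = {}
    for file_name, sections in schedule.items():
        target = out_dir / file_name
        with target.open("wb") as out:
            for section in sections:
                # Section images are read untouched; order comes from the schedule.
                out.write((section_dir / section).read_bytes())
        sizes[file_name] = target.stat().st_size
    return sizes
```

The section images themselves are only read, never modified, which matches the practice of preserving master copies and working on copies.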

The data is in fixed length character-based records whose format varies slightly between different datasets, but is essentially the same for the first six ME System datasets (CRDA/1/DS/1/1-CRDA/1/DS/1/6). In 1991 the size of the Offence-Report-Serial-No and the Arrest-Report-Serial-No fields was increased from 5 digits to 7 digits, and the fields Offence-Occupied, Victim-Relationship and Victim-PC were introduced.
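Fixed-length character records of this kind are read by slicing each record at known offsets. The Python sketch below shows the general technique; the offsets, the record length and the choice of three fields are invented for illustration and do not reproduce the actual YTDEXTRACT91 layout (only the 7-digit width of the two serial number fields is taken from the description above).

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, start offset, width). Only the widths of
# the two serial number fields (7 digits from 1991) come from the catalogue;
# every offset and the record length are assumptions.
LAYOUT: List[Tuple[str, int, int]] = [
    ("Offence-Report-Serial-No", 0, 7),
    ("Arrest-Report-Serial-No", 7, 7),
    ("Offence-Year-Input", 14, 2),
]
RECORD_LENGTH = 16


def split_records(data: str, record_length: int = RECORD_LENGTH) -> List[str]:
    """Cut a file's text into fixed-length records (no record terminators)."""
    if len(data) % record_length != 0:
        raise ValueError("file length is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def parse_record(
    record: str, layout: List[Tuple[str, int, int]] = LAYOUT
) -> Dict[str, str]:
    """Slice one record into named fields, keeping values as raw text."""
    return {name: record[start:start + width] for name, start, width in layout}
```

Values are kept as text rather than converted to numbers, so that missing or invalid entries survive unchanged for later validation.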

In addition to the data files, the tapes also contained a library archive of over 200 files containing COBOL and SCL source code for programs associated with the system, and a large file (DBLOADLIB) containing metadata associated with the system, extracted from the Metropolitan Police's corporate Data Dictionary. This file is in ICL's proprietary DDCL (Data Dictionary Control Language). These files had been stored on tape using a VME archive function, Copy_Library_To_Tape (analogous to Unix tar); the reverse process had to be executed to extract them from tape LA0528. To facilitate the FTP file transfer from VME to Unix, the files extracted from the archive file were concatenated into a single file. On the NDAD host machine, a small Perl script was written to extract the constituent files. The program files can be consulted via the Dataset Documentation Catalogue.
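The original Perl script is not reproduced in the catalogue. As a rough illustration of the splitting step only, the Python sketch below assumes that each constituent file in the concatenated file is preceded by a marker line naming it; the marker text is entirely hypothetical, since the format actually used is not documented here.

```python
from typing import Dict, List, Optional

# Hypothetical marker line introducing each constituent file; the real
# delimiter used in the concatenated file is not recorded.
MARKER = "*** FILE: "


def split_concatenated(text: str, marker: str = MARKER) -> Dict[str, str]:
    """Split one concatenated text file back into its named constituent files."""
    files: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines(keepends=True):
        if line.startswith(marker):
            # A marker line starts a new constituent file.
            current = line[len(marker):].strip()
            files[current] = []
        elif current is None:
            raise ValueError("content found before the first marker line")
        else:
            files[current].append(line)
    return {name: "".join(lines) for name, lines in files.items()}
```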

Basic sanity checks (file size, record count) were performed at all stages of the transfer process. For further information, see Transformation validation.
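A check of this kind amounts to computing the same two figures before and after each transfer step and comparing them. The Python sketch below is a minimal illustration, assuming fixed-length records with no record terminators; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Size and record count of one fixed-length-record file."""
    size_bytes: int
    record_count: int


def file_stats(data: bytes, record_length: int) -> FileStats:
    """Compute file size and record count, assuming fixed-length records."""
    if record_length <= 0:
        raise ValueError("record length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    return FileStats(size_bytes=len(data), record_count=len(data) // record_length)


def transfer_ok(before: FileStats, after: FileStats) -> bool:
    """True when size and record count are unchanged by a transfer step."""
    return before == after
```

Matching figures show only that nothing was lost or added in bulk; they do not detect altered characters, which is why record-level comparisons were also made.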


Conditions of access and use

Access conditions

Allied materials

Related units of description
Associated material
Publications produced by the originating department
Publications produced by researchers working on the datasets

Structure

Logical structure and schema

This dataset consists of two flat files: YTDNOCREXT91 (containing data relating to allegations classed as "No Crime"), and YTDEXTRACT91 (containing all other data input to the database, i.e. details of offences, arrests, victims and stolen property). These files are entirely separate from each other and are not related by any key fields.

The dataset comprises the following table(s):

Table number   NDAD reference    Name           Title
1              CRDA/1/DS/1/2/1   ytdextract91   Year End Extract File, 1991
2              CRDA/1/DS/1/2/2   ytdnocrext91   No Crime file, 1991

How data was originally captured and validated
Constraints on the reliability of the data

Validation

Content validation

No explicit record description was supplied for the data files, but it was established that the YTDEXTRACT files in CRDA/1/DS/1/1-CRDA/1/DS/1/6 were produced by the program COBEXT and the YTDNOCREXT files in CRDA/1/DS/1/1-CRDA/1/DS/1/5 were produced by the program COBNOCRIMEXT (these programs and other programs transferred with the ME System datasets can be consulted via the Dataset Documentation Catalogue). COBEXT and COBNOCRIMEXT contained file descriptions which appeared to match the data, and after some trial and error, a description was produced which fitted the data. This was confirmed by comparing the fields identified in the record with likely patterns of values, and with lists of valid field values supplied by the Metropolitan Police. The source code supplied for COBEXT matches the record format for the YTDEXTRACT files in CRDA/1/DS/1/2-CRDA/1/DS/1/6.

Although we are confident that the record format has been accurately described, a number of validation matters remain unresolved. Values in the Division-Code and Subdivision-Code fields were compared with lists of valid division and subdivision codes provided by the Metropolitan Police (see the Dataset Documentation Catalogue, references CRDA/1/DD/3/3-5), and many records apparently had invalid codes in these fields. We conjectured, however, that these were codes that had been valid at the time the data was collected, though not still current in the Metropolitan Police: the 1991 Police and Constabulary Almanac confirmed that a number of the apparently invalid values represented police stations extant at that time, but which may since have been closed or changed their two-letter code (see note 2). Not all of the apparently invalid values have been resolved in this way.
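A comparison of this kind can be sketched in Python as a tally of field values that do not appear in the supplied list of valid codes. The sketch assumes records already parsed into dictionaries keyed by field name; the function name is an assumption, and any codes used to exercise it would be placeholders rather than real division codes.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def invalid_code_counts(
    records: Iterable[Dict[str, str]], field: str, valid_codes: Set[str]
) -> Counter:
    """Count how often each value of `field` falls outside `valid_codes`.

    Values are compared exactly as stored, so blank or padded entries are
    reported too rather than silently dropped.
    """
    return Counter(r[field] for r in records if r[field] not in valid_codes)
```

The resulting tally gives a short list of distinct unrecognised values, which can then be looked up in a contemporary external source, as was done with the Almanac.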

Validation of a number of other code and indicator fields has presented comparable difficulties, and these are flagged in the field descriptions. As all fields which could be thoroughly validated against external criteria (e.g. date fields) have tested successfully, it is clear that NDAD has not been provided with complete descriptions of valid code values for each year's dataset, and values valid in one year may, for a variety of reasons, not have been valid in preceding or succeeding years.

Transformation validation

Standard checks (file size, record count) were made on the data on transfer from the VME to Unix file systems, and when the data files were reconstituted from the tape sections (see Digital processing and conversion). When copies were taken, comparisons were made between records picked at random from points at the top, middle and bottom of the file.

In addition, to further validate the data transferred using VMX and FTP, images of the Metropolitan Police tape sections were made by mounting them onto a tape drive connected directly to the archive control machine (Unix). These images have been preserved, and translation utilities (EBCDIC to ASCII) were used to make readable copies on the Unix system. The process of reconstructing the original files from the sections was again followed, and it was thus possible to validate files produced by the other method against these.

The only caveat worth noting about this method is that it is necessary to remove a number of 'duplicate' records at the end of some tape sections. This is because the ICL hardware, when writing in blocks to tape, does not flush the blocks before writing new data, and therefore where there are not sufficient records in the final block to exactly fill that block, records from the end of the penultimate block will be written to tape again after the last record proper. These are easily identified and removed from the final file.
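One way to identify the stale records described above is to compare the final block with the penultimate block slot by slot, working backwards from the end. The Python sketch below illustrates that idea under stated assumptions: the number of records per block is known, the section image holds a whole number of blocks, and stale records sit in the same slot they occupied in the penultimate block. It is not a description of the procedure NDAD actually used.

```python
from typing import List


def strip_stale_tail(records: List[str], records_per_block: int) -> List[str]:
    """Drop trailing records of the final block that repeat the penultimate block.

    Works backwards through the final block, discarding each record that is
    identical to the record in the same slot of the penultimate block, and
    stops at the first slot that differs.
    """
    if records_per_block <= 0:
        raise ValueError("records_per_block must be positive")
    n = len(records)
    if n < 2 * records_per_block or n % records_per_block != 0:
        # Fewer than two whole blocks: nothing to compare, return a copy.
        return list(records)
    last = records[-records_per_block:]
    prev = records[-2 * records_per_block:-records_per_block]
    keep = records_per_block
    while keep > 0 and last[keep - 1] == prev[keep - 1]:
        keep -= 1
    return records[:n - records_per_block + keep]
```

A genuine record that happened to equal the record in the same slot of the previous block would be removed as well, so in practice the result would still be checked against the expected record count.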

Field and character-level validation was undertaken in tandem with the content validation (see Content validation). We are confident that the description of the record format is accurate, and that the values in the fields are their original values, unaffected by any aspect of the transfer process, even though values in some fields cannot be matched to any documentation provided by the Metropolitan Police.


Links to related datasets

Related datasets
NDAD reference   Title (link leads to Dataset Catalogue)
CRDA/1/DS/1/1    Calendar year dataset for 1990
CRDA/1/DS/1/3    Calendar year dataset for 1992
CRDA/1/DS/1/4    Financial year dataset for 1992-1993
CRDA/1/DS/1/5    Financial year dataset for 1993-1994
CRDA/1/DS/1/6    Financial year dataset for 1994
CRDA/1/DS/2/1    Financial year dataset for 1994-1995
CRDA/1/DS/2/2    Financial year dataset for 1995-1997
CRDA/1/DS/3/1    Final database (1992-1997)


Notes

 

1. Service-Level agreement between the Department of Computing Services and G10 Branch, 5 August 1991: Dataset Documentation Catalogue, reference CRDA/1/DD/1/5/2.

2. R. Hazell and Company, Police and Constabulary Almanac 1991: Official Register (Henley-on-Thames, Oxon: R. Hazell and Company, 1991).


Last updated 2003-04-07 12:12:51

 
 
