| Identity statement |
|---|
| Title | Calendar year dataset for 1990 |
|---|
| NDAD reference | CRDA/1/DS/1/1 |
|---|
| Dates of creation of datasets | 1990 |
|---|
| Dates of contents of datasets | 1979-1990 |
|---|
| Date of last input to datasets | Dec 1990 |
|---|
| Date of last access to datasets | |
|---|
| Extent of datasets | 1 dataset: 158.8Mb after conversion by NDAD; 2 tables comprising YTDEXTRACT90 (1183194 records) and YTDNOCREXT90 (121173 records) |
|---|
| ISAD(G) level of description | File |
|---|
| Administrative context |
|---|
| Aim and purpose | |
|---|
| Statement of responsibility | |
|---|
| Source of acquisition |
|---|
| Source of acquisition | This dataset (CRDA/1/DS/1/1) was transferred to NDAD from the
Metropolitan Police on 19 9-track 6250 bpi open reel magnetic tapes
which were received on 23 February 1998. These tapes also
contained:
- Calendar year datasets for 1991-1992
(CRDA/1/DS/1/2-CRDA/1/DS/1/3).
- Financial year datasets for 1992-1993, 1993-1994 and 1994
(CRDA/1/DS/1/4-CRDA/1/DS/1/6). Although based on a financial year
rather than a calendar year, these datasets have essentially the
same structure as CRDA/1/DS/1/1-3, the only exception being that
CRDA/1/DS/1/6 lacks a "No Crimes" file.
- A financial year dataset covering 1994-1995 (CRDA/1/DS/2/1) and
another covering two years, 1995-1997 (CRDA/1/DS/2/2). These
datasets reflect changes to the ME System which were made to
facilitate interoperability with the CRIS System.
- A dataset which is believed to correspond to the data in the
Crime Statistics database when it was taken out of service
(CRDA/1/DS/3/1).
- Over 200 files containing COBOL, SCL and Data Dictionary source
code.
For links to the catalogues of the other ME System datasets, see Links to related datasets.
Details of the source code transferred with the datasets and paper
documentation relating to the ME System can be found in the
Dataset Documentation Catalogue. |
|---|
| Nature and content |
|---|
| Scope and content | This dataset comprises data which was input to the Metropolitan
Police's Crime Statistics System between January and December 1990.
It includes details of offences, arrests, victims of crime,
property stolen and reports classified as "No Crime" (defined as
"an allegation where the evidence is insufficient to establish that
a crime has been committed"; see note 1). As in the following datasets for 1991,
1992, 1992-1993 and 1993-1994 (CRDA/1/DS/1/2-CRDA/1/DS/1/5), the
data is divided into a Year to Date Extract file (YTDEXTRACT90)
containing data relating to offences, arrests, victims and stolen
property, and a "No Crime" file (YTDNOCREXT90) containing more
limited information about "No Crime" reports. The programs that
produced these files were originally written in ICL 2900 COBOL
using extensions to the language designed for handling an IDMSX
database.
Note that in some records in the YTDEXTRACT90 file the
Offence-Year-Input field predates 1990. This occurs where the record
relates to an arrest which was input between January and December
1990 (i.e. the value in the Arrest-Year-Input field should be '90').
Equally, many records in YTDEXTRACT90 and YTDNOCREXT90
relate to offences, arrests or "no crimes" which were originally
reported at a much earlier period than the date when the record was
input: e.g. the earliest offences recorded in the Offence-Date-YY
field in YTDEXTRACT90 (excluding what are thought to be missing and
invalid values) appear to date from 1979.
Further details on the administrative background and contents of
this dataset and the other Crime Statistics System datasets are
given in the Series Catalogue. |
|---|
| Digital processing and conversion | The data and associated source files were supplied to ULCC in plain
text format by the Metropolitan Police Service's Department of
Technology, using magnetic tapes created on an ICL Series 39
mainframe running VME.
At ULCC the tapes were copied to VME filestore using an ICL 3960
mainframe. From there they were transferred using VMX (Unix
emulator for VME) and FTP to the Unix host machine for the archive
control system.
The data files had been split into tape sections. Some files
were spread over multiple tape sections, and some tapes contained
multiple files. Images of these sections were preserved in their
original order. A printed schedule of the jobs that were run by the
Metropolitan Police (or their IT service provider) in order to
produce the tapes sent to NDAD was supplied with the dataset
documentation, and provided the key to recreating the original
files from the tape section images. Master copies of these files
were preserved, and working copies taken. Steps were also taken to
validate the transformed files (see Transformation validation).
The data is in fixed length character-based records whose format
varies slightly between different datasets, but is essentially the
same for the first six ME System datasets
(CRDA/1/DS/1/1-CRDA/1/DS/1/6). The record format for this dataset
differs slightly from that in CRDA/1/DS/1/2-CRDA/1/DS/1/6: from the
1991 dataset onwards, the size of the Offence-Report-Serial-No and
the Arrest-Report-Serial-No fields increased from 5 digits to 7
digits, and the fields Offence-Occupied, Victim-Relationship and
Victim-PC were introduced.
In addition to the data files, the tapes also contained a
library archive of over 200 files containing COBOL and SCL source code
for programs associated with the system, and a large file
(DBLOADLIB) containing metadata associated with the system,
extracted from the Metropolitan Police's corporate Data Dictionary.
This file is in ICL's proprietary DDCL (Data Dictionary Control
Language). These files had been stored on tape using a VME archive
function, Copy_Library_To_Tape (analogous to Unix tar);
the reverse process had to be executed to extract them from tape
LA0528. To facilitate the FTP file transfer from VME to Unix, the
files extracted from the archive file were concatenated into a
single file. On the NDAD host machine, a small Perl script was
written to extract the constituent files. The program files can be
consulted via the Dataset Documentation Catalogue.
Basic sanity checks (file size, record count) were performed at
all stages of the transfer process. For further information, see Transformation validation. |
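
The file size and record count checks described above can be sketched as follows. This is a minimal illustration rather than the procedure NDAD actually used: the function name `check_fixed_length_file`, the 8-byte record length in the usage note, and the assumption that records are stored back to back with no separators are all hypothetical.

```python
def check_fixed_length_file(data: bytes, record_length: int, expected_records: int) -> list:
    """Return a list of problems found; an empty list means both checks passed."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    problems = []
    # A file of fixed-length records must be an exact multiple of the record length.
    whole_records, remainder = divmod(len(data), record_length)
    if remainder:
        problems.append(f"{remainder} trailing bytes do not form a whole record")
    # The number of whole records should match the count reported by the sender.
    if whole_records != expected_records:
        problems.append(f"expected {expected_records} records, found {whole_records}")
    return problems
```

For example, `check_fixed_length_file(b"A" * 24, 8, 3)` reports no problems, whereas a 25-byte input with the same parameters reports one trailing byte. Running the same check after each copy (tape to VME filestore, VME to Unix) would show at which stage any loss occurred.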
|---|
| Conditions of access and use |
|---|
| Access conditions | |
|---|
| Allied materials |
|---|
| Related units of description | |
|---|
| Associated material | |
|---|
| Publications produced by the originating department | |
|---|
| Publications produced by researchers working on the datasets | |
|---|
| Structure |
|---|
| Logical structure and schema | This dataset consists of two flat files: YTDNOCREXT90 (containing
data relating to allegations classed as "No Crime"), and
YTDEXTRACT90 (containing all other data input to the database, i.e.
details of offences, arrests, victims and stolen property). These
files are entirely separate from each other and are not related by
any key fields. The dataset comprises the following tables:

| Table number | NDAD reference | Name | Title |
|---|---|---|---|
| 1 | CRDA/1/DS/1/1/1 | ytdextract90 | Year End Extract File, 1990 |
| 2 | CRDA/1/DS/1/1/2 | ytdnocrext90 | No Crime file, 1990 |
|
|---|
| How data was originally captured and validated | |
|---|
| Constraints on the reliability of the data | |
|---|
| Validation |
|---|
| Content validation | No explicit record description was supplied for the data files, but
it was established that the YTDEXTRACT files in
CRDA/1/DS/1/1-CRDA/1/DS/1/6 were produced by the program COBEXT and
the YTDNOCREXT files in CRDA/1/DS/1/1-CRDA/1/DS/1/5 were produced
by the program COBNOCRIMEXT (these programs and other programs
transferred with the ME System datasets can be consulted via the
Dataset Documentation Catalogue).
COBEXT and COBNOCRIMEXT contained file descriptions which appeared
to match the data, and after some trial and error, a description
was produced which fitted the data. This was confirmed by comparing
the fields identified in the record with likely patterns of values,
and with lists of valid field values supplied by the Metropolitan
Police.
The source code supplied for COBEXT matches the record format
for the YTDEXTRACT files in CRDA/1/DS/1/2-CRDA/1/DS/1/6. Some
differences were found in the format of the YTDEXTRACT file in this
dataset, but these have been accounted for by assuming that the file
was produced by an earlier version of COBEXT which we do not have.
By comparing data patterns in the record it was possible to infer
what the format of the 1990 record should be. The most significant
difference was that, in the later datasets, the Report-Serial-No
fields had been lengthened from 5 to 7 digits.
Although we are confident that the record format has been accurately
described, a number of validation matters remain unresolved. Values
in the Division-Code and Subdivision-Code fields were compared with
lists of valid division and subdivision codes provided by the
Metropolitan Police (see the Dataset Documentation Catalogue,
references CRDA/1/DD/3/3-5), and many
records apparently had invalid codes in these fields. We
conjectured, however, that these were codes that had been valid at
the time the data was collected but are no longer current in the
Metropolitan Police: the 1990 Police and Constabulary Almanac
confirmed that a number of the apparently invalid values
represented police stations extant at that time, but which may
since have been closed or changed their two-letter code (see note 2). Not all of the
apparently invalid values have been resolved in this way.
Validation of a number of other code and indicator fields has
presented comparable difficulties, and these are flagged in the
field descriptions. As all fields which could be thoroughly
validated against external criteria (e.g. date fields) have tested
successfully, it is clear that NDAD has not been provided with
complete descriptions of valid code values for each year's dataset,
and values valid in one year may, for a variety of reasons, not
have been valid in preceding or succeeding years. |
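
The comparison of code fields against lists of valid values can be sketched as follows. This is an illustration only: the function name `unmatched_codes`, the field position, the sample records and the sample code list are all made up for the example and do not reflect the actual record layout or the lists supplied by the Metropolitan Police.

```python
from collections import Counter

def unmatched_codes(records, start, length, valid_codes):
    """Count the values of one fixed-position field that are absent from valid_codes."""
    counts = Counter()
    for record in records:
        value = record[start:start + length]
        if value not in valid_codes:
            counts[value] += 1
    return counts

# Hypothetical records with a two-character code in positions 0-1.
sample_records = ["AB00001", "ZZ00002", "AB00003", "ZZ00004", "QQ00005"]
sample_valid = {"AB", "CD"}
report = unmatched_codes(sample_records, 0, 2, sample_valid)
```

Tallying the unmatched values, rather than simply rejecting the records, makes it easier to see whether an unrecognised code recurs often enough to suggest a value that was valid in an earlier year.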
|---|
| Transformation validation | Standard checks (file size, record count) were made on the data on
transfer from the VME to Unix file systems, and when the data files
were reconstituted from the tape sections (see Digital processing and conversion).
When copies were taken, comparisons were made between records picked at random
from points at the top, middle and bottom of the file. In addition,
to further validate the data transferred using VMX and FTP, images
of the MP tape sections were made by mounting them onto a tape
drive connected directly to the archive control machine (Unix).
These images have been preserved, and translation utilities
(EBCDIC to ASCII) were used to make readable copies on the Unix
system. The process of reconstructing the original files from the
sections was again followed, and it was thus possible to validate
files produced by the other method against these. The only caveat
worth noting about this method is that it is necessary to remove a
number of 'duplicate' records at the end of some tape sections.
This is because the ICL hardware, when writing in blocks to tape,
does not flush the blocks before writing new data, and therefore
where there are not sufficient records in the final block to
exactly fill that block, records from the end of the penultimate
block will be written to tape again after the last record proper.
These are easily identified and removed from the final file.
Field and character-level validation was undertaken in tandem
with the content validation (see Content validation). We are confident that the
description of the record format is accurate, and that the values
in the fields are their original values, unaffected by any aspect
of the transfer process, even though values in some fields cannot
be matched to any documentation provided by the Metropolitan
Police. |
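
The removal of 'duplicate' records from the end of a tape section, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method NDAD used: the function name `strip_stale_tail` and the `block_size` parameter are hypothetical, every block is assumed to hold the same number of records, and each unfilled slot in the final block is assumed to still hold the record written to the same slot of the penultimate block.

```python
def strip_stale_tail(records, block_size):
    """Drop stale records that follow the last genuine record in the final block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Nothing to compare against unless there are at least two whole blocks.
    if len(records) < 2 * block_size or len(records) % block_size:
        return list(records)
    previous = records[-2 * block_size:-block_size]
    final = records[-block_size:]
    keep = block_size
    # Walk back from the end of the final block while each slot repeats
    # the same slot of the penultimate block.
    while keep > 0 and final[keep - 1] == previous[keep - 1]:
        keep -= 1
    return list(records[:-block_size]) + list(final[:keep])

# Two blocks of four records; the final block holds two genuine records
# followed by two leftovers from the penultimate block.
section = ["r1", "r2", "r3", "r4", "r5", "r6", "r3", "r4"]
cleaned = strip_stale_tail(section, 4)
```

One limitation of this sketch is that a genuine final record which happens to be identical to the record in the same slot of the preceding block would also be removed, so the result should still be checked against the expected record count.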
|---|
| Links to related datasets |
|---|
| Related datasets |
|
|---|
| Notes |
|---|
| | 1. Service-Level agreement between the Department of Computing Services and G10 Branch, 5 August 1991: Dataset Documentation Catalogue reference CRDA/1/DD/1/5/2.
2. R. Hazell and Company, Police and Constabulary Almanac 1990: Official Register (Henley-on-Thames, Oxon: R. Hazell and Company, 1990). |
|---|
| Last updated 2003-04-07 16:18:32 |