| Identity statement |
|---|
| Title | Financial year dataset for 1994 |
|---|
| NDAD reference | CRDA/1/DS/1/6 |
|---|
| Dates of creation of datasets | 1994 |
|---|
| Dates of contents of datasets | 1988-1994 |
|---|
| Date of last input to datasets | Sep 1994 |
|---|
| Date of last access to datasets | |
|---|
| Extent of datasets | 1 dataset: 78.5 Mb after conversion by NDAD; 1 table, YTDEXTRACT95 (571276 records) |
|---|
| ISAD(G) level of description | File |
|---|
| Administrative context |
|---|
| Aim and purpose | |
|---|
| Statement of responsibility | |
|---|
| Source of acquisition |
|---|
| Source of acquisition | This dataset (CRDA/1/DS/1/6) was transferred to NDAD from the
Metropolitan Police on 19 9-track 6250 bpi open reel magnetic tapes
which were received on 23 February 1998. These tapes also
contained:
- Calendar year datasets for 1990-1992
(CRDA/1/DS/1/1-CRDA/1/DS/1/3).
- Financial year datasets for 1992-1993 and 1993-1994
(CRDA/1/DS/1/4-CRDA/1/DS/1/5).
- A financial year dataset covering 1994-1995 (CRDA/1/DS/2/1) and
another covering two years, 1995-1997 (CRDA/1/DS/2/2). These
datasets reflect changes to the ME System which were made to
facilitate interoperability with the CRIS System.
- A dataset which is believed to correspond to the data in the
Crime Statistics database when it was taken out of service
(CRDA/1/DS/3/1).
- Over 200 files containing COBOL, SCL and Data Dictionary source
code.
For links to the catalogues of the other ME System datasets, see
Links to related datasets. Details
of the source code transferred with the datasets and paper
documentation relating to the ME System can be found in the Dataset Documentation Catalogue. |
|---|
| Nature and content |
|---|
| Scope and content | This dataset appears to contain records which were added to the
Metropolitan Police's Crime Statistics System between April and
September 1994. It is the last of the ME System datasets whose file
format predates the changes to the System which were made to
facilitate the conversion to CRIS. It includes details of offences,
arrests, victims of crime and property stolen. Unlike the preceding
five datasets (CRDA/1/DS/1/1-CRDA/1/DS/1/5), "No Crimes"
("allegations where the evidence is insufficient to establish that
a crime has been committed") are not
included in this dataset.1 There is thus only a single Extract file,
YTDEXTRACT95. The program that produced this file was originally
written in ICL 2900 COBOL using extensions to the language designed
for handling an IDMSX database.
It is thought that like the two preceding datasets for 1992-1993
and 1993-1994 (CRDA/1/DS/1/4-CRDA/1/DS/1/5), this dataset was
originally intended to cover a financial year: i.e. it would have
included records input to the ME System between April 1994 and
March 1995. For reasons which are unknown, the dataset does not
contain any records which were added to the system after September
1994. In some cases the entry in the Offence-Year-Input field
predates April 1994; this occurs where the record relates to an arrest which was
input after April 1994. Equally, many records relate to offences
and arrests which were originally reported at a much earlier period
than the date when the record was input: the earliest entries in
the Offence-Date-YY field relate to offences which were reported in
1988 (these relate to arrests which were added after April
1994).
The time period covered by this dataset suggests that the data
in it should be duplicated in the financial year dataset for
1994-1995 (CRDA/1/DS/2/1): see Links to
related datasets. Further details on the administrative background and contents of this dataset and the other Crime Statistics System datasets are given in the Series Catalogue. |
|---|
| Digital processing and conversion | The data and associated source files were supplied to ULCC in plain text
format by the Metropolitan Police Service Department of Technology, using
magnetic tapes created on an ICL Series 39 mainframe running VME.
At ULCC the tapes were copied to VME filestore using an ICL 3960 mainframe.
From there they were transferred using VMX (Unix emulator for VME) and
FTP to the Unix host machine for the archive control system.
The data files had been split into tape sections. Some files were spread
over multiple tape sections, some tapes contained multiple files; images
of these sections were preserved in their original order. A printed schedule
of the jobs that were run by the Metropolitan Police (or their IT service
provider) in order to produce the tapes sent to NDAD was supplied with
the dataset, and provided the key to recreating the original files from
the tape section images. Master copies of these files were preserved, and
working copies taken. Steps were also taken to validate the transformed files (see Transformation validation).
The data is in fixed-length character-based records whose format varies
slightly between different years' datasets, but is essentially the same.
In addition to the data files, the tapes also contained a library archive
of over 200 files containing COBOL and SCL source code for programs associated
with the system, and a large file (DBLOADLIB) containing metadata associated
with the system, extracted from the Metropolitan Police's corporate Data
Dictionary. This file is in ICL's proprietary DDCL (Data Dictionary Control
Language). These files had been stored on tape using a VME archive function,
Copy_Library_To_Tape (analogous to Unix tar); the reverse process
had to be executed to extract them from tape LA0528. To facilitate the
FTP file transfer from VME to Unix, the files extracted from the archive
file were concatenated into a single file. On the NDAD host machine, a
small Perl script was written to extract the constituent files. The program
files can be consulted via the Dataset
Documentation Catalogue.
Basic sanity checks (file size, record count) were performed at all
stages of the transfer process. For further validation, see Transformation validation. |
|---|
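
The basic sanity checks mentioned above (file size, record count) can be illustrated with a minimal Python sketch. It assumes fixed-length records with no line terminators. The function names `record_count` and `check_transfer` are hypothetical, and the record length is passed as a parameter because this catalogue does not state the actual value.

```python
import os

def record_count(path, record_length):
    """Return the number of fixed-length records in a file.

    Raises ValueError if the file size is not a whole multiple of
    record_length, which would suggest truncation or padding.
    """
    size = os.path.getsize(path)
    if size % record_length != 0:
        raise ValueError(
            f"{path}: size {size} is not a multiple of {record_length}"
        )
    return size // record_length

def check_transfer(source_path, copy_path, record_length):
    """Compare file size and record count between a file and its copy."""
    same_size = os.path.getsize(source_path) == os.path.getsize(copy_path)
    same_count = (record_count(source_path, record_length)
                  == record_count(copy_path, record_length))
    return same_size and same_count
```

A check of this kind only confirms that nothing was lost or added in bulk; it says nothing about whether the content of individual records survived the transfer, which is why the further comparisons described under Transformation validation were needed.
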
| Conditions of access and use |
|---|
| Access conditions | |
|---|
| Allied materials |
|---|
| Related units of description | |
|---|
| Associated material | |
|---|
| Publications produced by the
originating department | |
|---|
| Publications produced by
researchers working on the datasets | |
|---|
| Structure |
|---|
| Logical structure and schema | The dataset consists of one flat file, YTDEXTRACT95, containing details of offences, arrests, victims and stolen property. The dataset comprises the following table(s): | Table number | NDAD reference | Name | Title |
|---|
| 1 | CRDA/1/DS/1/6/1 | ytdextract95 | Year End Extract File, 1995 |
|
|---|
| How data was originally captured and validated | |
|---|
| Constraints on the reliability of
the data | |
|---|
| Validation |
|---|
| Content validation | No explicit record description was supplied for the data files, but it
was established that the YTDEXTRACT files in the datasets CRDA/1/DS/1/1-CRDA/1/DS/1/6 were produced by
the program COBEXT. This program and other programs transferred with the
ME System datasets can be consulted via the Dataset
Documentation Catalogue. COBEXT contained file descriptions which
appeared to match the data, and after some trial and error, a description
was produced which fitted the data. This was confirmed by comparing the
fields identified in the record with likely patterns of values, and with
lists of valid field values supplied by the Metropolitan Police. The source
code supplied for COBEXT matches the record format in CRDA/1/DS/1/2-CRDA/1/DS/1/6.
Although we are confident that the record format has been accurately described,
a number of validation matters remain unresolved. Values in the Division-Code
and Subdivision-Code fields were compared with lists of valid division
and subdivision codes provided by the Metropolitan Police (see the
Dataset Documentation Catalogue, references CRDA/1/DD/3/3-5). Many records
apparently had invalid codes in these fields. We conjectured, however,
that these were codes that had been valid at the time the data was collected,
though no longer current in the Metropolitan Police: the 1995 Police and
Constabulary Almanac confirmed that a number
of the apparently invalid values represented police stations extant at
that time, but which may since have been closed or changed their two-letter
code.2 Not all of the
missing values have been resolved in this way.
Validation of a number of other code and indicator fields has presented
comparable difficulties, and these are flagged in the field descriptions.
As all fields which could be thoroughly validated against external criteria
(e.g. date fields) have tested successfully, it is clear that NDAD has
not been provided with complete descriptions of valid code values for each
year's dataset, and values valid in one year may, for a variety of reasons,
not have been valid in preceding or succeeding years.
The following records were found to contain data that could not be reconciled
with any published description of the data, but this is unlikely to mean
more than that the program which produced the Extract files did not zeroise
fields in redundant parts of the record.
| Record | Field | Error |
|---|
| 191043 | Offence-Weapon-Used-Ind | Field value "." is not a valid choice |
| 424289 | Offence-Weapon-Used-Ind | Field value "." is not a valid choice |
| 424290 | Offence-Weapon-Used-Ind | Field value "." is not a valid choice |
| 561959 | Offence-Weapon-Used-Ind | Field value "." is not a valid choice |
| 561960 | Offence-Weapon-Used-Ind | Field value "." is not a valid choice |
|
|---|
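
The kind of check that produces entries like those in the error table above can be sketched in Python as follows. This is an illustration only: the field offset, field width and the set of valid values are hypothetical, since the actual record layout is documented in the field descriptions rather than here, and `find_invalid_values` is not a name taken from any NDAD tool.

```python
def find_invalid_values(records, start, length, valid_values):
    """Yield (record_number, value) for records whose field is not valid.

    records       -- iterable of fixed-length record strings
    start, length -- zero-based offset and width of the field (hypothetical)
    valid_values  -- set of permitted values for the field
    """
    # Record numbers are 1-based, matching the way errors are reported.
    for number, record in enumerate(records, start=1):
        value = record[start:start + length]
        if value not in valid_values:
            yield number, value

# Illustrative data only: a 1-character indicator at offset 5,
# with "Y" and "N" assumed to be the valid choices.
sample = ["AAAAAY", "BBBBB.", "CCCCCN"]
for number, value in find_invalid_values(sample, 5, 1, {"Y", "N"}):
    print(f'Record {number}: field value "{value}" is not a valid choice')
# prints: Record 2: field value "." is not a valid choice
```

Reporting the record number together with the offending value makes it possible to inspect the surrounding record and judge, as was done here, whether the value is a genuine error or residue in a redundant part of the record.
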
| Transformation validation | Standard checks (file size, record count) were made on the data on transfer
from the VME to Unix file systems, and when the data files were reconstituted
from the tape sections (see Digital processing and conversion). When copies were taken, comparisons were made between
records picked at random from points at the top, middle and bottom of the
file. In addition, to further validate the data transferred using VMX and FTP,
images of the MP tape sections were made by mounting them onto a tape
drive connected directly to the archive control machine (Unix). These
images have been preserved, and translation utilities (EBCDIC to ASCII)
were used to make readable copies on the Unix system. The process of
reconstructing the original files from the sections was again followed,
and it was thus possible to validate files produced by the other method
against these. The only caveat worth noting about this method is that it
is necessary to remove a number of 'duplicate' records at the end of some
tape sections. This is because the ICL hardware, when writing in blocks to
tape, does not flush the blocks before writing new data, and therefore
where there are not sufficient records in the final block to exactly fill
that block, records from the end of the penultimate block will be written
to tape again after the last record proper. These are easily identified
and removed from the final file.
Field and character-level validation was undertaken in tandem with the
content validation (see Content validation).
We are confident that the description of the record format is accurate,
and that the values in the fields are their original values, unaffected
by any aspect of the transfer process, even though values in some fields
cannot be matched to any documentation provided by the Metropolitan Police. |
|---|
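
The removal of 'duplicate' records from the end of a tape section, described above, can be sketched in Python. The sketch rests on an assumption that is not stated in the source: that the stale records in the final block occupy the same positions they held in the penultimate block, as would happen if the write buffer were reused without being cleared. The function name `strip_stale_records` and the representation of a section as a list of equally sized blocks are hypothetical.

```python
def strip_stale_records(blocks):
    """Remove stale trailing records from the final block of a tape section.

    blocks -- list of blocks, each block a list of records
    Returns a flat list of records with the stale tail removed.
    """
    flat = [record for block in blocks for record in block]
    if len(blocks) < 2:
        return flat
    penultimate, final = blocks[-2], blocks[-1]
    if len(penultimate) != len(final):
        # Blocks of unequal size: no evidence of buffer residue, keep all.
        return flat
    keep = len(final)
    # Walk back from the end while the final block repeats the
    # record held at the same position in the penultimate block.
    while keep > 0 and final[keep - 1] == penultimate[keep - 1]:
        keep -= 1
    records = [record for block in blocks[:-1] for record in block]
    records.extend(final[:keep])
    return records
```

A genuine final record that happened to be identical to the record at the same position in the preceding block would be removed wrongly by this approach, so in practice the result would still need to be checked against the expected record count.
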
| Links to related datasets |
|---|
| Related datasets |
|
|---|
| Notes |
|---|
| | 1. Service-level agreement between the Department of Computing Services and G10 Branch, 5 August 1991: Dataset Documentation Catalogue, reference CRDA/1/DD/1/5/2.
2. R. Hazell and Company, Police and Constabulary Almanac 1995: Official Register (Henley-on-Thames, Oxon: R. Hazell and Company, 1995). |
|---|
| Last updated 2003-04-09 16:30:02 |