| Identity statement |
|---|
| Title | Final database (1992-1997) |
|---|
| NDAD reference | CRDA/1/DS/3/1 |
|---|
| Dates of creation of datasets | 1997 |
|---|
| Dates of contents of datasets | 1950-1997 |
|---|
| Date of last input to datasets | Mar 1997 |
|---|
| Date of last access to datasets | |
|---|
| Extent of datasets | 1 dataset: 1.04 GB after conversion by NDAD; 1 table, DBEXTRACT (7748948 records) |
|---|
| ISAD(G) level of description | File |
|---|
| Administrative context |
|---|
| Aim and purpose | |
|---|
| Statement of responsibility | |
|---|
| Source of acquisition |
|---|
| Source of acquisition | This dataset - thought to represent the data in the Crime Statistics database when it was taken out of service - was transferred to NDAD from the Metropolitan Police on 19 9-track 6250 bpi open reel magnetic tapes which were received on 23 February
1998. These tapes also contained:
- Calendar year datasets for 1990-1992
(CRDA/1/DS/1/1-CRDA/1/DS/1/3).
- Financial year datasets for 1992-1993, 1993-1994 and 1994
(CRDA/1/DS/1/4-CRDA/1/DS/1/6). Although based on a financial year
rather than a calendar year, these datasets have essentially the
same structure as CRDA/1/DS/1/1-3, the only exception being that
CRDA/1/DS/1/6 lacks a "No Crimes" file.
- A financial year dataset covering 1994-1995 (CRDA/1/DS/2/1) and another covering two years, 1995-1997 (CRDA/1/DS/2/2). These datasets reflect changes to the ME System which were made to facilitate interoperability with the CRIS System.
- Over 200 files containing COBOL, SCL and Data Dictionary source
code.
For links to the catalogues of the other ME System datasets, see
Links to related datasets. Details
of the source code transferred with the datasets and paper
documentation relating to the ME System can be found in the Dataset Documentation Catalogue. |
|---|
| Nature and content |
|---|
| Scope and content | This dataset comprises an extract file, DBEXTRACT, which is thought to represent all of the data in the Metropolitan Police's
Crime Statistics System at the time when the database was taken out of service, apparently in 1997. It consists of data which was input to the system between 1992 and 1997. It has the same file format as its two immediate predecessors, the dataset covering the 1994-1995 financial year (CRDA/1/DS/2/1), and that covering the 1995-1996 and 1996-1997 financial years (CRDA/1/DS/2/2). In other words, it
reflects the changes to the ME System which are believed to have been
made to facilitate the phasing-in of CRIS at a time when the ME and
CRIS systems were running simultaneously (see the Series Catalogue for further details). The program
which produced DBEXTRACT was originally written in ICL 2900
COBOL using extensions to the language designed for handling an
IDMSX database.
Although this dataset contains data which was added to the
Crime Statistics System between February 1992 and March 1997 (based on the values
recorded in the INPUT-YEAR and INPUT-MONTH fields), many records
relate to offences which were originally reported at a much earlier
date. The earliest entries in the REPORTED-YEAR field date from
1950. The fact
that the dataset runs into 1997, like CRDA/1/DS/2/2, indicates that the ME System
continued to be used for a brief period after the CRIS System
became fully operational in October 1996. It can be assumed that most of the data will be duplicated in the other Crime Statistics System datasets, which represent exports made from the system at periodic intervals (usually annually) between 1990 and 1997 (see Links to related datasets).
Further details on the administrative background and contents of
this dataset and the other Crime Statistics System datasets are
given in the Series
Catalogue. |
|---|
| Digital processing and conversion | The data and associated source files were supplied to ULCC in plain text format by the Metropolitan Police Service Department of Technology, using magnetic tapes created on an ICL Series 39 mainframe running VME.
At ULCC the tapes were copied to VME filestore using an ICL 3960 mainframe. From there they were transferred using VMX (Unix emulator for VME) and FTP to the Unix host machine for the archive control system.
The data files had been split into tape sections. Some files were spread over multiple tape sections, some tapes contained multiple files; images of these sections were preserved in their original order. A printed schedule of the jobs that were run by the Metropolitan Police (or their IT service provider) in order to produce the tapes sent to NDAD was supplied with the dataset, and provided the key to recreating the original files from the tape section images. Master copies of these files were preserved, and working copies taken. Steps were also taken to validate the transformed files (see Transformation validation).
The data is in fixed length character-based records whose format varies slightly between different years' datasets, but is essentially the same for this dataset and the financial year datasets for 1994-1995 and 1995-1997 (CRDA/1/DS/2/1-2).
In addition to the data files, the tapes also contained a library archive of over 200 files containing COBOL and SCL source code for programs associated with the system, and a large file (DBLOADLIB) containing metadata associated with the system, extracted from the Metropolitan Police's corporate Data Dictionary. This file is in ICL's proprietary DDCL (Data Dictionary Control Language). These files had been stored on tape using a VME archive function, Copy_Library_To_Tape (analogous to Unix tar); the reverse process had to be executed to extract them from tape LA0528. To facilitate the FTP file transfer from VME to Unix, the files extracted from the archive file were concatenated into a single file. On the NDAD host machine, a small Perl script was written to extract the constituent files. The program files can be consulted via the Dataset Documentation Catalogue.
Basic sanity checks (file size, record count) were performed at all stages of the transfer process. For further information, see Transformation validation. |
|---|
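
The basic sanity checks described under Digital processing and conversion (file size and record count for fixed-length records rebuilt from tape sections) can be sketched as follows. This is a minimal illustration only, not the procedure NDAD actually ran: the function names and the `record_length` parameter are hypothetical, and the real record length of DBEXTRACT is not stated in this catalogue.

```python
def reassemble_sections(sections: list[bytes]) -> bytes:
    """Concatenate tape-section images in their original order."""
    return b"".join(sections)


def count_records(data: bytes, record_length: int) -> int:
    """Count fixed-length records; fail if the size is not a whole multiple."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    whole, remainder = divmod(len(data), record_length)
    if remainder:
        raise ValueError(f"{remainder} trailing bytes do not form a complete record")
    return whole


def check_transfer(source: bytes, copy: bytes, record_length: int) -> bool:
    """Basic sanity check: same file size and same record count on both sides."""
    if len(source) != len(copy):
        return False
    return count_records(source, record_length) == count_records(copy, record_length)
```

A check of this kind only shows that nothing was lost or added in bulk; it does not detect altered bytes within a record, which is why the record comparisons described under Transformation validation were also needed.
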
| Conditions of access and use |
|---|
| Access conditions | |
|---|
| Allied materials |
|---|
| Related units of description | |
|---|
| Associated material | |
|---|
| Publications produced by the originating department | |
|---|
| Publications produced by researchers working on the datasets | |
|---|
| Structure |
|---|
| Logical structure and schema | The dataset consists of one table, DBEXTRACT, containing details of offences, arrests, clear-ups, victims of crime and "no crimes". Because of the exceptionally large size of this table (1.04 GB), users are advised that searches and downloads may take substantially longer than usual. The dataset comprises the following table: |

| Table number | NDAD reference | Name | Title |
|---|---|---|---|
| 1 | CRDA/1/DS/3/1/1 | dbextract | Database Extract File 1992-1997 |

|---|
| How data was originally captured and validated | |
|---|
| Constraints on the reliability of the data | |
|---|
| Validation |
|---|
| Content validation | No explicit record description was supplied for the data files, but it was established that the files in the datasets CRDA/1/DS/2/1-CRDA/1/DS/2/2 and CRDA/1/DS/3/1 were produced by the program COBEXTNEW (this program and other programs transferred with the ME System datasets can be consulted via the
Dataset Documentation Catalogue). COBEXTNEW contained file descriptions which appeared to match the data, and after some trial and error, a description was produced which fitted the data. This was confirmed by comparing the fields identified in the record with likely patterns of values, and with lists of valid field values supplied by the Metropolitan Police. The field names in DBEXTRACT are taken from COBEXTNEW, except that:
- The HO-CLASS-CODE and HO-SUBCLASS-CODE fields are represented as a single field, HO-CODE.
- PROPERTY-1, PROPERTY-2, PROPERTY-3 and PROPERTY-4 are each represented as two separate fields, PROPERTY-1-MAJOR, PROPERTY-1-MINOR, etc.
- VENUE-CODE is also represented as two fields, VENUE-CODE-MAJOR and VENUE-CODE-MINOR.
- The three fields called VERSION have been renamed VERSION-OFFENCE, VERSION-ARREST and VERSION-VICTIM to avoid confusion.
Although we are confident that the record format has been accurately described, a number of validation matters remain unresolved. Values in the DIVN-CODE, OFFENCE-SUBDIVISION-CODE and ARREST-SUBDIVISION-CODE fields were compared with lists of valid division and subdivision codes provided by the Metropolitan Police (see the Dataset Documentation Catalogue, references CRDA/1/DD/3/3-5). Many records apparently had invalid codes in these fields. We conjectured, however, that these were codes that had been valid at the time the data was collected but are no longer current in the Metropolitan Police. This was confirmed by referring to the 1994 Police and Constabulary Almanac (see note 1), which listed several codes for police stations that were extant at that time but may since have closed or changed their two-letter code. However, not all of the apparently invalid codes could be resolved in this way.
The following discrepancies between the data descriptions and the data were noted:
- 263965 records contain invalid entries for DATE-OF-OFFENCE. (In many cases the month or day part of the field is set to zero.)
- 1269 records contain invalid entries for ARREST-DATE-ARRESTED (values start with "A" or a space, or have the month part set to "00")
- 91 records have "A" or "S" in the ARREST-AGE field.
- 8 records have non-integer values in the ARREST-REPORT-SERIAL-NO field
- 3 records have non-integer values in the ARREST-ETHNIC-CODE field
Whether these values are intended to be valid, or merely result from the extract program failing to reinitialise redundant parts of the record, is difficult to ascertain in a file of this size. (Users should note that this is a very large data file, containing 7,748,948 records, and queries on the data may take several minutes to return results.)
Validation of a number of other code and indicator fields has presented comparable difficulties, and these are flagged in the field descriptions. As all fields which could be thoroughly validated against external criteria have tested successfully, it is clear that NDAD has not been provided with complete descriptions of valid code values for each year's dataset, and values valid in one year may, for a variety of reasons, not have been valid in preceding or succeeding years. |
|---|
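
The kind of field-level check that produced the discrepancy counts under Content validation can be sketched as below. This is an illustration under stated assumptions, not NDAD's method: the catalogue does not give the layout of DATE-OF-OFFENCE, so the eight-character `YYYYMMDD` format, the function name and the category labels are all hypothetical.

```python
from collections import Counter
from datetime import date


def classify_date_field(value: str) -> str:
    """Classify a date value held in a hypothetical YYYYMMDD layout."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return "non_numeric"      # e.g. starts with "A" or a space
    year, month, day = int(value[:4]), int(value[4:6]), int(value[6:])
    if month == 0 or day == 0:
        return "zero_part"        # month or day part set to zero
    try:
        date(year, month, day)    # rejects impossible dates such as month 13
    except ValueError:
        return "out_of_range"
    return "valid"


def tally(values: list[str]) -> Counter:
    """Count how many values fall into each category."""
    return Counter(classify_date_field(v) for v in values)
```

Keeping zero-filled values in a category of their own reflects the observation above that many of the invalid DATE-OF-OFFENCE entries have the month or day set to zero, which may mean "not known" rather than a corrupt record.
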
| Transformation validation | Standard checks (file size, record count) were made on the data on transfer from the VME to Unix file systems, and when the data files were reconstituted from the tape sections (see
Digital processing and conversion). When copies were taken, comparisons were made between records picked at random from points at the top, middle and bottom of the file. In addition, to further validate the data transferred using VMX and FTP, images of the Metropolitan Police tape sections were made by mounting them onto a tape drive connected directly to the archive control machine (Unix). These images have been preserved, and translation utilities (EBCDIC to ASCII) were used to make readable copies on the Unix system. The process of reconstructing the original files from the sections was again followed, and it was thus possible to validate files produced by the other method against these. The only caveat worth noting about this method is that it is necessary to remove a number of 'duplicate' records at the end of some tape sections. This is because the ICL hardware, when writing in blocks to tape, does not flush the blocks before writing new data. Where there are not sufficient records in the final block to fill that block exactly, records from the end of the penultimate block are therefore written to tape again after the last record proper. These are easily identified and removed from the final file.
Field and character-level validation was undertaken in tandem with the content validation (see
Content validation). We are confident that the description of the record format is as accurate a reflection of the data as possible, given the limited original metadata available; the values in the fields are their original values, unaffected by any aspect of the transfer process, even though some values cannot be matched to any documentation provided by the Metropolitan Police. |
|---|
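
The two steps described under Transformation validation, translating EBCDIC to ASCII and removing the stale 'duplicate' records left at the end of a final tape block, can be sketched as follows. This is a hedged illustration only. It assumes that the unflushed tail of the final block holds the same records, at the same positions, as the penultimate block; that both blocks have the same size; and it uses Python's `cp037` codec as a stand-in, since the exact ICL EBCDIC variant is not stated here. All function names are hypothetical.

```python
EBCDIC_CODEC = "cp037"  # assumption: stand-in for the ICL EBCDIC variant


def to_readable(raw: bytes, codec: str = EBCDIC_CODEC) -> str:
    """Translate EBCDIC bytes into a readable string."""
    return raw.decode(codec)


def split_records(block: bytes, record_length: int) -> list[bytes]:
    """Cut a block into fixed-length records."""
    return [block[i:i + record_length] for i in range(0, len(block), record_length)]


def trim_stale_tail(penultimate: bytes, final: bytes, record_length: int) -> bytes:
    """Drop trailing records of the final block that merely repeat the
    records at the same positions in the penultimate block."""
    prev = split_records(penultimate, record_length)
    last = split_records(final, record_length)
    if len(prev) != len(last):
        return final  # block sizes differ: assumption does not hold, leave untouched
    keep = len(last)
    while keep > 0 and last[keep - 1] == prev[keep - 1]:
        keep -= 1
    return b"".join(last[:keep])
```

A genuine final record that happened to be identical to the record at the same position in the previous block would also be removed by this sketch, so the result would still need to be confirmed against the record counts of the files transferred by the other route.
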
| Links to related datasets |
|---|
| Related datasets | |
|---|
| Notes |
|---|
| 1. R. Hazell and Company, Police and Constabulary Almanac 1994: Official Register (Henley-on-Thames, Oxon: R. Hazell and Company, 1994). |
|---|
| Last updated 2004-06-24 14:45:09 |