The National Archives

Friday 9 January

   
 
 NDAD: The National Digital Archive of Datasets
 
 

Dataset details: CRDA/37/DS/1

2000 snapshot

 
 

Context

British Rail Electronically Archived Documents

Identity statement

Title: 2000 snapshot
NDAD reference: CRDA/37/DS/1
Dates of creation of datasets: 1997-1999
Dates of contents of datasets: 1990-1999
Date of last input to datasets: 1999
Date of last access to datasets:
Extent of datasets: 1 dataset (17.28 MB after conversion by NDAD); 8 tables comprising 147195 records
ISAD(G) level of description: File

Administrative context

Aim and purpose
Statement of responsibility

Source of acquisition

67 CD-ROMs were transferred to NDAD by the British Railways Board (BRB) between 23 October 2000 and 23 November 2000. A single CD-ROM containing the Microsoft Access database was received on 23 October 2000, and a further 66 CD-ROMs containing TIFF image files of documents from the BRB privatisation archive were transferred on 16 and 23 November 2000. The TIFF files were produced by Rank Xerox as part of its contract with the BRB to establish the imaging archive, and their purpose was to assist the integration of the images into the Optika FilePower system. They do not represent data exported from the BRB's databases.


Nature and content

Scope and content

The Access database preserved here is derived from five databases created to simplify management of the British Rail Privatisation Archive. Each record in the Documents table should contain a brief description of the material to which it relates, the department or subsidiary company that generated the material, the correspondence reference, the originator and the time frame.

The database was designed to facilitate searches for documents based on several key attributes, most notably by originating British Railways Board department; by subsidiary company; and by document category.

For background information on the British Railways Board privatisation archive and the BREAD system see the Series Catalogue.

Digital processing and conversion

The Access database converted and preserved here is 'breadtab.mdb', one of three Access database files supplied on CD by BRB. 'Privarc.mdb', on the same CD, contained the same tables, structure, objects and data as 'breadtab.mdb', and probably represents a backup or personal copy of the main database file. The other Access file on the CD was 'bread.mdb', a database with the same table structure but no data of its own (the Documents table was empty; the remaining tables were links to their namesakes in 'breadtab.mdb'), and with a number of forms, queries, macros and modules defined. This reflects the common practice of storing data separately from other database objects such as queries and forms. In addition, 'bread.mdb' includes a link to table Import1 in the external database file 'Import1.mdb'. This file was not provided, but examination of the queries and macros in 'bread.mdb' suggests that it held data being transferred between 'breadtab.mdb' and the Oracle/Optika system (see the Series Catalogue).

The object definitions of 'breadtab.mdb' and 'bread.mdb' have been exported using the Database Documenter tool supplied by Microsoft with Access 97, and are included among the Dataset Documentation (see the Dataset Documentation Catalogue, references CRDA/37/DD/4/2/1 and CRDA/37/DD/4/2/2).

The tables in 'bread.mdb' and 'breadtab.mdb' were exported from the database files in comma-separated values (CSV) format, using the standard conversion tools supplied by Microsoft with Access 97.

The three remaining tables have been created from data on the 64 CDs containing the scanned document images from the Oracle/Optika system. 'Images' links each document record in the Documents table to the file(s) containing its scanned image. In addition to the document images and related files, each CD also contained one file called MULTSECT.CSV and one called SEPSECT.CSV; these have been concatenated to create the derived tables 'MultiSection' and 'SeparateSection'. These appear to implement relationships between document images comparable to the use of the ParentID field in the Documents table.
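The concatenation step can be sketched in Python as follows. This is a minimal illustration, not NDAD's actual procedure: it assumes, hypothetically, that the contents of each image CD have been copied into their own subdirectory of a common root folder, and it copies rows verbatim, because the layout of MULTSECT.CSV and SEPSECT.CSV (including whether they carry a header row) is not described here. The function name `concatenate_per_cd_csv` is illustrative.

```python
import csv
from pathlib import Path


def concatenate_per_cd_csv(root: Path, filename: str, out_path: Path) -> int:
    """Append every <root>/<cd>/<filename> into a single CSV file.

    Hypothetical layout: one subdirectory of `root` per CD-ROM.
    Rows are copied unchanged (no header handling). Returns the number
    of rows written, for later comparison with the source files.
    """
    rows_written = 0
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        # Sort the CD directories so the output order is deterministic.
        for cd_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            src = cd_dir / filename
            if not src.exists():
                continue  # skip any CD that lacks this file
            with src.open(newline="") as f:
                for row in csv.reader(f):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling it once with `"MULTSECT.CSV"` and once with `"SEPSECT.CSV"` would produce the two derived tables; the returned row count is the figure a row-count check against the source files would use.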


Conditions of access and use

Access conditions

Allied materials

Related units of description
Associated material
Publications produced by the originating department
Publications produced by researchers working on the datasets

Structure

Logical structure and schema

The data from the Microsoft Access database is contained in the first five tables listed below. Tables 6, 7 and 8 are derived tables. For a more detailed explanation of the three derived tables see Digital Processing and Conversion. The Documents table contains the bulk of the data; the remaining tables from the Access database are lookup tables with expanded details of some of the acronyms and codes in the Documents records. The Documents table also has a logical relationship with itself, implemented in the field ParentID: a document record may be deemed to be one of many parts of a 'parent' document, whose UniqueID is entered in the child record's ParentID field.
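The self-referencing ParentID relationship can be navigated by grouping child records under their parent's UniqueID. The sketch below assumes the Documents table has been loaded into a dictionary mapping each UniqueID to its ParentID (None where no parent is given); the function name and the toy records in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def children_by_parent(parent_of: Dict[int, Optional[int]]) -> Dict[int, List[int]]:
    """Map each parent UniqueID to the sorted UniqueIDs of its child records."""
    children = defaultdict(list)
    for unique_id, parent_id in parent_of.items():
        if parent_id is not None:  # records without a ParentID are not parts of another document
            children[parent_id].append(unique_id)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Toy records (not from the dataset): documents 2 and 3 are parts of document 1.
print(children_by_parent({1: None, 2: 1, 3: 1, 4: None, 5: 4}))  # {1: [2, 3], 4: [5]}
```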

The dataset comprises the following table(s):

Table number | NDAD reference | Name | Title
1 | CRDA/37/DS/1/1 | Documents | Documents
2 | CRDA/37/DS/1/2 | Boxes | Boxes
3 | CRDA/37/DS/1/3 | CompanyID | Company ID
4 | CRDA/37/DS/1/4 | DeptCodes | Departmental Codes
5 | CRDA/37/DS/1/5 | DocGroupID | Document Group ID
6 | CRDA/37/DS/1/6 | Images | Images
7 | CRDA/37/DS/1/7 | MultiSection | Multiple Sections
8 | CRDA/37/DS/1/8 | SeparateSection | Separate Sections
How data was originally captured and validated
Constraints on the reliability of the data

Validation

Content validation

Although the Access database was clearly of great value to its users as a way of locating documents, the data suffers from a lack of validation of field values on entry, and from a lack of enforced referential integrity in its logical relationships. The following points were noted:

Unmatched codes

Many records in the Documents table contain either null values or unmatched codes in fields that should join to a corresponding lookup table, as summarised in the table below. The meanings of some of the unmatched codes have been established and are recorded in the Field Descriptions.

Field | Lookup table | Unmatched values | Total unmatched rows
DocGroupID | DocGroupID | MBlink, KOLD, MBlin | 304 (143 null)
CompanyID | CompanyID | AP, BL, BREL, BRI, BRM, BRPB, BRT, BYCN, DA, DC, Dee, Doc, FA, FE, GM, GU, HW, JA, JP, LA, LCC, LG, LI, LR, NCC, NU, OS, RCL, RDD, RES, RZ, SCC, SD, SM, TES, TF3, TRI, TT, WO | 27461 (26302 null)
BoxNumber | Boxes | none (nulls only) | 169 (169 null)
BROriginatorDepartment | DeptCodes | BRB, BRML, BT, FP, OF, PBA, VEN | 58503 (57381 null)
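Each row of this summary amounts to checking a field's values against the set of codes in its lookup table. The sketch below illustrates that check; the function name is hypothetical, and treating blank or space-only values as null is an assumption of this sketch. The comparison is exact and case-sensitive, which may differ from how Access itself compares text in joins.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple


def unmatched_codes(values: Iterable[Optional[str]],
                    lookup_codes: Set[str]) -> Tuple[Counter, int, int]:
    """Return (unmatched non-null codes with counts, null count, total unmatched rows)."""
    unmatched = Counter()
    nulls = 0
    for value in values:
        if value is None or value.strip() == "":
            nulls += 1  # assumption: blank/space-only treated as null
        elif value not in lookup_codes:
            unmatched[value] += 1  # exact, case-sensitive match
    return unmatched, nulls, nulls + sum(unmatched.values())
```

Applied to the CompanyID field, for example, the third element corresponds to the "Total unmatched rows" column and the second to the figure in brackets.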

Row numbers and Record IDs

The UniqueID field in the Documents table is a row number that was generated automatically by Access as each new record was created. However, the highest value in UniqueID (91270) exceeds the number of records in the Documents table (91259) by 11, because there are no records with UniqueID values 90331 through 90341 inclusive. This is the only break in the UniqueID sequence.
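The gap can be located by sorting the UniqueID values and looking for consecutive pairs more than 1 apart. In the sketch below the identifiers are reconstructed from the figures quoted above (assuming the sequence starts at 1, which is what 91270 - 11 = 91259 implies), not read from the data itself; `find_gaps` is a hypothetical helper.

```python
def find_gaps(ids):
    """Return inclusive (first, last) ranges of integers absent between min(ids) and max(ids)."""
    ordered = sorted(set(ids))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Reconstructed (not actual) UniqueID values: 1..90330 and 90342..91270.
unique_ids = list(range(1, 90331)) + list(range(90342, 91271))
print(len(unique_ids), max(unique_ids))  # 91259 91270
print(find_gaps(unique_ids))             # [(90331, 90341)]
```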

The last record in the Documents table (UniqueID = 91270) should be considered blank: all fields contain only the default values specified in the Access table definition. This is common in Access tables where the only field specified as mandatory is an automatically-generated primary key.

In three records in the Documents table where ParentID has been specified, miskeying by the person entering the data has resulted in clearly erroneous references to non-existent Document records. For these records, NDAD has identified with reasonable certainty what the real ParentID should be:

UniqueID | ParentID as entered | ParentID should be
76334 | 94239 | 74239
78969 | 744451 | 74445
83605 | 741484 | 74148
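A check for such references, and a way of substituting NDAD's suggested values when working with a copy of the data, might look like the sketch below. It assumes a hypothetical dictionary mapping each UniqueID to its ParentID; applying the corrections in a working copy is an illustration only and does not imply that the preserved data has been altered.

```python
def dangling_parent_refs(parent_of):
    """Return {UniqueID: ParentID} for records whose ParentID is not an existing UniqueID."""
    existing = set(parent_of)
    return {uid: pid for uid, pid in parent_of.items()
            if pid is not None and pid not in existing}


# UniqueID -> (ParentID as entered, ParentID NDAD believes was intended), from the table above.
PARENT_ID_CORRECTIONS = {
    76334: (94239, 74239),
    78969: (744451, 74445),
    83605: (741484, 74148),
}


def apply_parent_corrections(parent_of, corrections):
    """Return a corrected copy; a record changes only if it still holds the miskeyed value."""
    fixed = dict(parent_of)
    for uid, (entered, intended) in corrections.items():
        if fixed.get(uid) == entered:
            fixed[uid] = intended
    return fixed
```

Guarding on the entered value means the corrections are not applied twice, and are skipped if the record has already been changed by other means.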

Missing Values

The following summarises, for each field in the Documents table, the number of rows in which that field is blank, null or contains only spaces (zeros are not, for the purpose of this analysis, interpreted as missing values):

Field | Total missing
Barcode | 85955
OCR | 86
Image | 86
PartNo | 76273
ParentID | 924
Text1 | 22887
Text2 | 81003
Text3 | 87085
Text4 | 2487
CompanyID | 26302
BROriginatorDept | 57381
BROriginatorFunction | 89262
BROriginatorPerson | 83689
BROriginatorUserReference | 49980
StartDate | 89118
EndDate | 57396
DocumentGroupID | 143
BoxNumber | 169
LocationImage | 1952
LocationText | 1952
LinkedDocument | 14361
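The missing-value rule used in this summary (blank, null or space-only counts as missing; zero does not) can be expressed as a small predicate. This is a sketch assuming rows are dictionaries keyed by field name, for example as produced by Python's `csv.DictReader`; the function names are hypothetical. A CSV reader returns an empty field as `""`, so nulls and empty strings may not be distinguishable once the data is in CSV form.

```python
def is_missing(value):
    """Blank, null (None) or all-spaces counts as missing; zero (0 or "0") does not."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_missing(rows, fields):
    """Count, per field, the rows whose value is missing (an absent key also counts)."""
    return {field: sum(1 for row in rows if is_missing(row.get(field)))
            for field in fields}
```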

Derived tables

Of the records in the Documents table, 20075 have related entries in the Images table. The Images records with UniqueID 90331 through 90341 inclusive (11 in all) have no corresponding record in the Documents table; these records also occur in the MultiSection table.
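Both figures in this paragraph come from comparing the document identifiers held in the Images table with the UniqueID values in Documents: an intersection gives the count of documents with images, and a set difference gives the orphaned identifiers. The sketch assumes each Images record carries the UniqueID of its document; because that column's name is not given here, the hypothetical function takes plain lists of identifiers.

```python
def image_link_summary(document_ids, image_doc_ids):
    """Return (number of documents with at least one image, sorted orphaned image IDs)."""
    docs = set(document_ids)
    linked = set(image_doc_ids)  # a document may have several image records
    documents_with_images = len(docs & linked)
    orphan_ids = sorted(linked - docs)  # referenced by Images but absent from Documents
    return documents_with_images, orphan_ids


# Toy data (not from the dataset).
print(image_link_summary([1, 2, 3, 5], [1, 1, 3, 4, 6]))  # (2, [4, 6])
```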

Transformation validation

The checks described in Content validation above were run against the original Access database file, using Access's built-in data-query tools, and against the converted data, using NDAD's online data-query tools; both methods gave consistent results.

The derived tables have been checked against the files from which they were derived to confirm that they have the same number of rows, and a random sample of rows has been compared to confirm the transformation process.
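The two checks can be sketched as a single function: compare row counts, then compare a seeded random sample of rows position by position. This is an illustration under stated assumptions, not NDAD's tooling: it presumes the derived table keeps the source rows in their original order, and the function name and sample size are hypothetical.

```python
import random


def validate_transformation(source_rows, derived_rows, sample_size, seed=0):
    """Return (ok, mismatched_indices).

    Fails immediately (with no indices) if the row counts differ; otherwise
    compares a random sample of row positions between source and derived data.
    """
    if len(source_rows) != len(derived_rows):
        return False, []
    rng = random.Random(seed)  # seeded so the sample can be reproduced
    k = min(sample_size, len(source_rows))
    indices = sorted(rng.sample(range(len(source_rows)), k))
    mismatches = [i for i in indices if source_rows[i] != derived_rows[i]]
    return not mismatches, mismatches
```

A fixed seed makes the sampled rows reproducible, so the same spot check can be repeated and reported.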


Links to related datasets

Related datasets
NDAD reference | Title
CRDA/37/DS/2 | 2003 snapshot


Last updated 2007-07-05 17:16:01

 
 

NDAD v3.0