The National Archives

 NDAD: The National Digital Archive of Datasets
 
 

Dataset details: CRDA/37/DS/2

2003 snapshot

 
 

Context

British Rail Electronically Archived Documents

Identity statement

Title: 2003 snapshot
NDAD reference: CRDA/37/DS/2
Dates of creation of datasets: 1997-2001
Dates of contents of datasets: 1903-2001
Date of last input to datasets: 2001?
Date of last access to datasets
Extent of datasets: 1 dataset: 14.71MB after conversion by NDAD; 5 tables comprising 104355 records
ISAD(G) level of description: File

Administrative context

Aim and purpose
Statement of responsibility

Source of acquisition

Source of acquisition: The dataset was transferred to NDAD in the form of a Microsoft Access database (breadtab.mdb) on a single CD-ROM on 20 August 2003.

Nature and content

Scope and content

The Access database preserved here is derived from five databases created to simplify management of the British Rail Privatisation Archive. Each Document record should contain a brief description of the material to which the record relates, the department or subsidiary company that generated the material, correspondence reference, originator and time frame.

The dataset derived from the Access database is an updated version of that transferred to NDAD in 2000 (see Links to related datasets). It contains additional records added to the Documents table, though the other content remains the same. NDAD did not receive additional copies of the documents transferred in 2000, but this updated version of the database acts as a finding aid for the documents in the privatisation archive. No further transfers of the database are expected, and this dataset reflects the final version of the database.

The database was designed to facilitate searches for documents based on several key attributes, most notably by originating British Railways Board department; by subsidiary company; and by document category. For background information on the British Railways Board privatisation archive and the BREAD system see the Series Catalogue.

Digital processing and conversion

The Access database converted and preserved here is 'breadtab.mdb'. This is an updated version of the main database file supplied for the preceding dataset in this Series (CRDA/37/DS/1).

The object definitions of 'breadtab.mdb' have been exported using the Database Documenter tool supplied by Microsoft with Access 97, and are included among the Dataset Documentation (see the Dataset Documentation Catalogue, reference CRDA/37/DD/4/2/3).

The tables in 'breadtab.mdb' were exported from the database file in comma-separated values (CSV) format, using the standard conversion tools supplied by Microsoft with Access 97. A standard procedure was followed for removing DOS line-endings and for converting to the ISO 8859-1 character set.
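The line-ending and character-set step can be sketched as follows. This is a minimal illustration, not NDAD's recorded procedure: the function name `normalise_export` and the source encoding (Windows-1252) are assumptions, since this catalogue entry does not state how the exported files were encoded.

```python
def normalise_export(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Convert DOS (CRLF) line endings to LF and re-encode as ISO 8859-1.

    source_encoding is an assumption; the catalogue does not record it.
    """
    text = raw.decode(source_encoding)
    text = text.replace("\r\n", "\n")
    # Encoding is strict by default, so a character with no ISO 8859-1
    # equivalent raises UnicodeEncodeError instead of being silently altered.
    return text.encode("iso-8859-1")


# Illustrative input only, not taken from the dataset.
converted = normalise_export(b'1,"Mins"\r\n2,"caf\xe9"\r\n')
```

Failing loudly on unmappable characters is a deliberate choice in this sketch: for preservation work, a visible error is usually preferable to a silent substitution.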

The three additional tables in dataset CRDA/37/DS/1 (Images, MultiSection, SeparateSection), created from data on the 64 CDs containing the scanned document images from the Oracle/Optika system, have been preserved with that dataset. The Documents table in this dataset can be joined with those tables using the same keys as for the Documents table in CRDA/37/DS/1.

The forms and reports associated with this data are those in the file "bread.mdb" accessioned as part of CRDA/37/DS/1. Example screenshots and MS Access documenter output can be found in the Dataset Documentation Catalogue.


Conditions of access and use

Access conditions

Allied materials

Related units of description
Associated material
Publications produced by the originating department
Publications produced by researchers working on the datasets

Structure

Logical structure and schema

The tables in this dataset should be used in conjunction with the tables Images, MultiSection and SeparateSection that are included in the earlier dataset, CRDA/37/DS/1 (see also Digital processing and conversion).

The dataset comprises the following table(s):

Table number  NDAD reference  Name        Title
1             CRDA/37/DS/2/1  Documents   Documents
2             CRDA/37/DS/2/2  Boxes       Boxes
3             CRDA/37/DS/2/3  CompanyID   Company ID
4             CRDA/37/DS/2/4  DeptCodes   Departmental Codes
5             CRDA/37/DS/2/5  DocGroupID  Document Group ID

How data was originally captured and validated
Constraints on the reliability of the data

Validation

Content validation

Although the Access database was clearly of great value to its users as a way of locating documents, the data suffers from a lack of on-entry validation of field values and from a lack of enforced referential integrity among its logical relationships. Inconsistencies in case ("Blink", "BLINK"), punctuation ("VEN", "VEN.") and trailing spaces ("Mins", "Mins ") vitiate many links between tables. The following points were noted in analysing this dataset, for comparison with the previous dataset in this series:

Unmatched codes

Many records in the Documents table contain either null values or unmatched codes in fields which should join to a corresponding lookup table. Some of the missing codes have been established and are recorded in the Field Descriptions.

Field: DocGroupID
Lookup table: DocGroupID
Unmatched values: "BLINK", "FIN", "KOLD", "MBlin", "MBlink", "Sbib", "Scorr", "TSCH", "Wpap'", "Mins "
Total unmatched rows: 533 (210 null)

Field: CompanyID
Lookup table: CompanyID
Unmatched values: "AP", "BL", "BRB", "BREL", "BRI", "BRM", "BRPB", "BRT", "BYCN", "DA", "DC", "Dee", "Doc", "DS", "FA", "FE", "GM", "GU", "HW", "JA", "JP", "LA", "LCC", "LG", "LI", "LR", "NCC", "NU", "NTES", "OS", "RCL", "RDD", "RFD", "RES", "RZ", "SCC", "SF", "SM", "TES", "TF3", "TRI", "TT", "WO", "WS"
Total unmatched rows: 27186 (26360 null)

Field: BoxNumber
Lookup table: Boxes
Unmatched values: (none listed)
Total unmatched rows: 171 (171 null)

Field: BROriginatorDepartment
Lookup table: DeptCodes
Unmatched values: "BRB", "BRML", "BT", "Fin", "FP", "OF", "PBA", "Priv", "Sec", "Sol", "VEN."
Total unmatched rows: 59084 (57391 null)
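Figures of the kind given above are what an anti-join against each lookup table produces. The sketch below assumes rows are held as plain dictionaries. The function names `unmatched_report` and `normalise_code` and the sample rows are illustrative and do not reproduce NDAD's own analysis.

```python
from collections import Counter


def unmatched_report(rows, field, lookup_codes):
    """Return (Counter of unmatched values, null count, total unmatched rows).

    A row is unmatched if its value is null or empty, or if it is not an
    exact (case- and space-sensitive) match for a code in the lookup table.
    """
    valid = set(lookup_codes)
    unmatched = Counter()
    nulls = 0
    for row in rows:
        value = row.get(field)
        if value is None or value == "":
            nulls += 1
        elif value not in valid:
            unmatched[value] += 1
    total = nulls + sum(unmatched.values())
    return unmatched, nulls, total


def normalise_code(value):
    """Hypothetical clean-up: trim spaces, drop trailing full stops, fold case."""
    return value.strip().rstrip(".").upper()


# Illustrative sample only; these are not rows from the dataset.
sample = [
    {"DocGroupID": "Mins"},
    {"DocGroupID": "Mins "},
    {"DocGroupID": "BLINK"},
    {"DocGroupID": None},
]
unmatched, nulls, total = unmatched_report(sample, "DocGroupID", ["Mins", "Blink"])
```

Comparing both sides through `normalise_code` would reconnect pairs such as "Mins " and "Mins", but it changes the values as recorded, so a step like this belongs in analysis rather than in the preserved data.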

Row numbers and Record IDs

The UniqueID field in the Documents table is a row number that Access generated automatically as each new record was created. However, the highest value in UniqueID (93375) does not match the number of records in the Documents table (93134) because there are breaks in the sequence, including the gap from UniqueID 90331 through 90341 inclusive that was identified in the earlier dataset.
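Breaks of this kind can be located with a single pass over the sorted identifiers. If UniqueID values are unique and numbering starts at 1 (an assumption; the catalogue does not state the starting value), the two figures above imply 93375 - 93134 = 241 unused numbers. The function name `find_gaps` and the sample data are illustrative.

```python
def find_gaps(ids, start=1):
    """Return inclusive (first, last) ranges of integers missing from ids,
    counting from `start` up to the highest id present."""
    gaps = []
    expected = start
    for value in sorted(set(ids)):
        if value > expected:
            gaps.append((expected, value - 1))
        expected = max(expected, value + 1)
    return gaps


# Illustrative run of IDs with 90331 through 90341 absent.
ids = list(range(90325, 90331)) + list(range(90342, 90346))
gaps = find_gaps(ids, start=90325)
missing = sum(last - first + 1 for first, last in gaps)
```

Here `gaps` holds the single range (90331, 90341) and `missing` is 11, matching the size of the gap described above.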

As identified in the earlier dataset, for three records in the Documents table where ParentID has been specified, miskeying by the person entering the data has resulted in clearly erroneous references to non-existent Document records. For these records, NDAD has identified with reasonable certainty what the real ParentID should be:

UniqueID  ParentID (as entered)  ParentID should be
76334     94239                  74239
78969     744451                 74445
83605     741484                 74148
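A check of this kind can be sketched as follows: list the records whose ParentID matches no existing UniqueID, then apply a corrections map. The three corrections are those given above. The function names and the sample records are hypothetical.

```python
# UniqueID -> corrected ParentID, as identified by NDAD (see above).
CORRECTIONS = {76334: 74239, 78969: 74445, 83605: 74148}


def dangling_parents(records):
    """Return {UniqueID: ParentID} for records whose ParentID matches no UniqueID."""
    known = {r["UniqueID"] for r in records}
    return {
        r["UniqueID"]: r["ParentID"]
        for r in records
        if r["ParentID"] is not None and r["ParentID"] not in known
    }


def apply_corrections(records, corrections=CORRECTIONS):
    """Return corrected copies of the records; the input list is left unchanged."""
    return [
        dict(r, ParentID=corrections.get(r["UniqueID"], r["ParentID"]))
        for r in records
    ]


# Hypothetical sample: the parent record exists, but the child was miskeyed.
sample = [
    {"UniqueID": 74239, "ParentID": None},
    {"UniqueID": 76334, "ParentID": 94239},
]
before = dangling_parents(sample)
after = dangling_parents(apply_corrections(sample))
```

Returning corrected copies rather than editing in place keeps the values as originally entered available for comparison.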

Missing Values

The following is a summary for each field in the Documents table of the number of rows with blank, null or spaces entered in that field (zeros are not, for the purpose of this analysis, interpreted as missing values):

Field Total missing
Barcode 87830
OCR 86
Image 86
PartNo 78148
ParentID 923
Text1 20712
Text2 82786
Text3 88958
Text4 1556
CompanyID 26360
BROriginatorDept 57391
BROriginatorFunction 91136
BROriginatorPerson 85294
BROriginatorUserReference 50278
StartDate 89484
EndDate 59162
DocumentGroupID 210
BoxNumber 171
LocationImage 1953
LocationText 1951
LinkedDocument1 14360
LinkedDocument2 14360
WithdrawnFromArchiveBy 93134
WithdrawnFromArchiveDate 93134

Additionally, the record with UniqueID 92580 has a date in the DocumentGroupID field.
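The missing-value rule used for the summary above (blank, null or spaces count as missing; zeros do not) can be expressed as a small predicate. This is a sketch under the assumption that rows are plain dictionaries. The names `is_missing` and `count_missing` and the sample rows are illustrative.

```python
def is_missing(value):
    """True for null, empty or whitespace-only values; zero is not missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return False


def count_missing(rows, fields):
    """Return {field: number of rows in which that field is missing}."""
    return {f: sum(1 for row in rows if is_missing(row.get(f))) for f in fields}


# Illustrative rows only; these are not taken from the dataset.
rows = [
    {"PartNo": 0, "Text1": "  "},
    {"PartNo": None, "Text1": "Minutes"},
    {"PartNo": "", "Text1": "0"},
]
summary = count_missing(rows, ["PartNo", "Text1"])
```

In this sample `summary` is `{"PartNo": 2, "Text1": 1}`: the numeric zero and the string "0" are both treated as present.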

Derived tables

Of the records in the Documents table, 20076 have related entries in the Images table; 14693 have related records in the MultiSection table. The Image records with UniqueID 90331 through 90341 inclusive (11 in all) have no corresponding record in the Documents table. These records also occur in the MultiSection table.
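Relationship counts like these can be derived with set operations on the shared key. The sketch below assumes UniqueID is the joining key for the Images table, as the paragraph above suggests. The function name `relate` and the sample identifiers are illustrative.

```python
def relate(document_ids, image_ids):
    """Return (number of documents with at least one image,
    sorted image ids that have no corresponding document)."""
    documents = set(document_ids)
    images = set(image_ids)
    return len(documents & images), sorted(images - documents)


# Illustrative identifiers only.
documents = [90329, 90330, 90342]
images = [90330, 90331, 90332, 90342]
matched, orphans = relate(documents, images)

# The inclusive range 90331 through 90341 contains 11 identifiers.
orphan_range_size = len(range(90331, 90342))
```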

Transformation validation

Post-processing checks were carried out on the converted data file to ensure the number of records and fields was identical with the original database application, and anomalous field contents were verified against the original database.
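A record-and-field count check of this kind might look like the sketch below. It assumes the exported CSV carries a header row, which this catalogue entry does not confirm. The function names and the sample text are illustrative, and the expected counts would come from the original database.

```python
import csv
import io


def csv_shape(text):
    """Return (record count, header field count, set of per-record field counts).

    The first row is treated as a header (an assumption about the export).
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    widths = set()
    records = 0
    for row in reader:
        records += 1
        widths.add(len(row))
    return records, len(header), widths


def shape_matches(text, expected_records, expected_fields):
    """True if every record has the expected field count and the totals agree."""
    records, header_fields, widths = csv_shape(text)
    return (
        records == expected_records
        and header_fields == expected_fields
        and widths <= {expected_fields}
    )


# Illustrative CSV with an embedded comma and an embedded line break.
sample = 'UniqueID,Text1\n1,"Minutes, board"\n2,"Line one\nline two"\n'
ok = shape_matches(sample, expected_records=2, expected_fields=2)
```

Parsing with the `csv` module rather than counting lines matters here, because quoted fields can contain commas and line breaks that would distort a naive count.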


Links to related datasets

Related datasets
NDAD reference  Title (link leads to Dataset Catalogue)
CRDA/37/DS/1    2000 snapshot


Last updated 2007-07-05 17:19:48

 
 

NDAD v3.0