| Content validation | Although the Access database was clearly of great value to its users as a way of locating
documents, the data suffers from a lack of on-entry validation of field values, and enforcement
of referential integrity among its logical relationships. Inconsistencies in case ("Blink",
"BLINK"), punctuation ("VEN", "VEN.") and trailing spaces ("Mins", "Mins ") vitiate many links
between tables. The following points were noted in analysing this dataset, for comparison with
the previous dataset in this series):
Unmatched codes
Many records in the Documents table contain either null values or unmatched codes in fields
which should join to a corresponding lookup table. Some of the missing codes have been
established and are recorded in the Field Descriptions.
|
Field
|
Lookup table
|
Unmatched values
|
Total unmatched rows
|
| DocGroupID |
DocGroupID |
"BLINK", "FIN", "KOLD", "MBlin", "MBlink", "Sbib", "Scorr", "TSCH", "Wpap'",
"Mins " |
533 (210 null) |
| CompanyID |
CompanyID |
"AP", "BL", "BRB", "BREL", "BRI", "BRM", "BRPB", "BRT", "BYCN", "DA", "DC",
"Dee", "Doc", "DS", "FA", "FE", "GM", "GU", "HW", "JA", "JP", "LA", "LCC", "LG", "LI",
"LR", "NCC", "NU", "NTES", "OS", "RCL", "RDD", "RFD", "RES", "RZ", "SCC", "SF", "SM",
"TES", "TF3", "TRI", "TT", "WO", "WS" |
27186 (26360 null) |
| BoxNumber |
Boxes |
|
171 (171 null) |
| BROriginatorDepartment |
DeptCodes |
"BRB", "BRML", "BT", "Fin", "FP", "OF", "PBA", "Priv", "Sec", "Sol", "VEN." |
59084 (57391 null) |
Row numbers and Record IDs
The UniqueID field in the Documents table is a row number which was automatically generated by
Access as each new record was created. However, the highest value in UniqueID (93375) does not
match the number of records in the Documents table (93134), because of breaks in the sequence
(including the gap between UniqueID 90331 through 90341 (inclusive) identified in the earlier dataset).
As identified in the earlier dataset, for three records in the Documents table where ParentID
has been specified, miskeying by the person entering the data has resulted in clearly erroneous
references to non-existent Document records. For these records, NDAD has identified with
reasonable certainty what the real ParentID should be:
|
UniqueID
|
ParentID
|
ParentID should be
|
| 76334 |
94239 |
74239 |
| 78969 |
744451 |
74445 |
| 83605 |
741484 |
74148 |
Missing Values
The following is a summary for each field in the Documents table of the number of rows with
blank, null or spaces entered in that field (zeros are not, for the purpose of this analysis,
interpreted as missing values):
|
Field
|
Total missing
|
| Barcode |
87830 |
| OCR |
86 |
| Image |
86 |
| PartNo |
78148 |
| ParentID |
923 |
| Text1 |
20712 |
| Text2 |
82786 |
| Text3 |
88958 |
| Text4 |
1556 |
| CompanyID |
26360 |
| BROriginatorDept |
57391 |
| BROriginatorFunction |
91136 |
| BROriginatorPerson |
85294 |
| BROriginatorUserReference |
50278 |
| StartDate |
89484 |
| EndDate |
59162 |
| DocumentGroupID |
210 |
| BoxNumber |
171 |
| LocationImage |
1953 |
| LocationText |
1951 |
| LinkedDocument1 |
14360 |
| LinkedDocument2 |
14360 |
| WithdrawnFromArchiveBy |
93134 |
| WithdrawnFromArchiveDate |
93134 |
Additionally the record with UniqueID 92580 has a date in the DocumentGroupID field.
Derived tables
Of the records in the Documents table, 20076 have related entries in the Images table; 14693 have related records in the MultiSection table. The
Image records with UniqueID 90331 through 90341 inclusive (11 in all) have no corresponding
record in the Documents table. These records also occur in the MultiSection table. |
|---|