|
Identity statement
|
| Title |
1999 snapshot
|
| NDAD reference |
CRDA/17/DS/1/1 |
| Dates of creation of datasets |
1982 |
| Dates of contents of datasets |
1982-1994 |
| Date of last input to datasets |
1994 |
| Date of last access to datasets |
|
| Extent of datasets |
1 dataset: 6.5 MB, 25828 records |
| ISAD(G) level of description |
File |
|
Administrative context
|
| Aim and purpose |
|
| Statement of responsibility |
|
|
Source of acquisition
|
| Source of acquisition |
The main Bat Data file was transferred from English Nature on two floppy disks as a zipped CSV (comma-separated values) file. The disks were received by NDAD on 4 March 1999. The lookup files were transferred on 19 November and 16 December 1998.
|
|
Nature and content
|
| Scope and content |
The Bats dataset records information arising from enquiries about bats to the conservation agencies. Specifically, it is a summary of bat enquiries received by, or notified to, English Nature which contain some useful information about bats or their roosts. Although primarily concerned with data from enquiries, the system can also accept data from surveys etc., as long as the site can be identified. Its primary function is to gather information about bats, their roosts and their interactions with humans, and to provide easily accessible records of past enquiries involving English Nature/NCC under the Wildlife and Countryside Act. The database is not intended to log every phone call about bats or every enquiry or request for leaflets.
|
| Digital processing and conversion |
- All fields were enclosed in double inverted commas and were separated by commas.
- Any double inverted commas in text fields were replaced with single inverted commas.
- All dates were exported with a four-figure year in the form dd/mm/yyyy. (Within ARev, dates are actually held as the number of days since 31 December 1967, which is day 0.)
- Within multi-valued fields, values are separated with the character "|".
- Text fields, such as comments, may contain commas and also the "|" character, indicating where lines were originally split.
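As an illustration only, the export conventions listed above could be read back with a short Python sketch along the following lines. The sample record, its field order and the function names are hypothetical and are not taken from the BATDATA file; only the quoting, the "|" separator, the dd/mm/yyyy form and the ARev day 0 come from the notes above.

```python
import csv
import io
from datetime import date, timedelta

# ARev day 0, as stated in the export notes.
AREV_DAY_ZERO = date(1967, 12, 31)


def arev_day_to_date(day_number: int) -> date:
    """Convert an ARev internal day count to a calendar date."""
    return AREV_DAY_ZERO + timedelta(days=day_number)


def parse_export_date(text: str) -> date:
    """Parse a dd/mm/yyyy date string as written by the export."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day)


def split_multi_value(field: str) -> list[str]:
    """Split a multi-valued field on the '|' separator; blank means no values."""
    return field.split("|") if field else []


# Hypothetical record: key, date, multi-valued codes, free-text notes.
sample = '"000001","04/03/1989","1|4","Roost in loft, seen at dusk|second line"\n'

# The csv module handles the double-quoted fields and the embedded comma.
row = next(csv.reader(io.StringIO(sample)))
codes = split_multi_value(row[2])
recorded_on = parse_export_date(row[1])
```

Because "|" also appears in text fields as a line-split marker, a reader of the data would need to know from the data dictionary which fields are multi-valued before splitting them.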
The main data table was transferred as a CSV file, so the only processing required was to remove the field names from the first line and convert the line endings to Unix format. Similar minimal processing was required on the lookup tables; in addition, duplicate key fields were removed from the tables COUNTIES, ENQUIRY_TYPES, LPAS, ORGANISATION_CODES, SITE_TYPES and VICE_COUNTIES. (EN explained that the field was duplicated during export and advised that the first field should be removed from each of these tables.)
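The processing described above might be sketched as follows. This is not the procedure NDAD actually ran, which is not documented here; the function names are hypothetical, and the sketch assumes that no field contains an embedded line break (consistent with the use of "|" to mark original line splits).

```python
import csv
import io


def strip_header_and_normalise(raw: bytes) -> bytes:
    """Drop the first line (field names) and convert CRLF or CR endings to LF."""
    text = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    _header, _separator, body = text.partition(b"\n")
    return body


def drop_first_field(csv_text: str) -> str:
    """Remove the first (duplicated key) field from every row of a lookup table."""
    out = io.StringIO()
    # QUOTE_ALL keeps every field in double inverted commas, as in the export.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow(row[1:])
    return out.getvalue()
```

Under these assumptions a lookup table would first pass through `strip_header_and_normalise` and then, for the six tables named above, through `drop_first_field`.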
The order of fields in the SPECIES_CODES table was found to differ from the data dictionary. English Nature confirmed that the fields should be in the order given in the data dictionary, and that the order in the file transferred to NDAD was purely a result of the export process. The data in the field RCODE was therefore moved from position 3 to position 5 (ie to form the last field in the table).
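The reordering amounts to moving one field to the end of each row. A minimal sketch follows; the function name and the placeholder field names are hypothetical, and only the name RCODE and the positions 3 and 5 come from the text above.

```python
def move_field_to_end(row: list[str], position: int) -> list[str]:
    """Return a copy of row with the field at 1-based `position` moved to the end."""
    index = position - 1
    return row[:index] + row[index + 1:] + [row[index]]


# Hypothetical five-field row as exported, with RCODE in position 3.
exported = ["f1", "f2", "RCODE", "f4", "f5"]
reordered = move_field_to_end(exported, 3)
```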
During processing of the data in the BATDATA table, two apparently partially corrupted records were detected. Inspection of the first of these, BATKEY=190261, shows that its data is combined with data for a separate record, BATKEY=990955 (this conclusion was confirmed by EN). Presumably this occurred at some stage because of the odd characters in the LOCALITY field. There is a record for BATKEY 990955 (with exactly the same data) within the database. EN advised that record 190261 should be removed from the file, particularly as it is not present in their copy of the database; however, for completeness it has been retained within the NDAD archive, but with the duplicated data from record 990955 removed.
The second partially corrupted record, BATKEY=190366, has odd characters in the NOTES field, and this appears to have caused data from record 003547 to be appended at some stage to the data for BATKEY 190366. There is a full record in the database for BATKEY 003547, and therefore the duplicated data has been removed from the record for BATKEY 190366.
The final change made to the BATDATA file during processing was the addition of a field holding a partial grid reference, which allows access to the data at the 10 kilometre square level. This was agreed with EN, who wished to close the full grid reference in order not to reveal the exact location of the bat sighting or roost. The new, derived field was produced by taking the two letters at the beginning of GREF, followed by the first of the Eastings grid line numbers and the first of the Northings. |
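The derivation of the partial grid reference can be sketched as below. The function name is hypothetical, and the sketch assumes that GREF holds two letters followed by an even number of digits, with the Eastings digits first and the Northings digits second; the exact layout of GREF in the archive is not restated here.

```python
def ten_km_square(gref: str) -> str:
    """Derive the 10 km square from a full grid reference such as 'TQ123456'."""
    compact = gref.replace(" ", "").upper()
    letters, digits = compact[:2], compact[2:]
    if (
        not letters.isalpha()
        or not digits.isdigit()
        or len(digits) % 2 != 0
    ):
        raise ValueError(f"unrecognised grid reference: {gref!r}")
    half = len(digits) // 2
    # Two letters, then the first Eastings digit and the first Northings digit.
    return letters + digits[0] + digits[half]
```

For a six-figure reference such as the made-up 'TQ123456' this gives 'TQ14'. A reference containing a comma in place of a digit would be rejected with a `ValueError` rather than silently truncated.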
|
Conditions of access and use
|
| Access conditions |
Certain fields within the British Bats dataset are closed. These include SITE_NAME, GREF, LOCALITY, ADDRESS, VISITORS, NOTES and SURNAME. Although the data in the GREF field (grid reference for site) is closed, it has been made available at the 10 km square level.
|
|
Allied materials
|
| Related units of description |
|
| Associated material |
|
| Publications produced by the originating department |
|
| Publications produced by researchers working on the datasets |
|
|
Structure
|
| Logical structure and schema |
The Bats datasets consist of a main table (BATDATA), containing the details of the bat enquiry, together with a number of lookup tables which provide translation of the various encoded fields. Some of the tables contain 'multi-valued' fields (a feature of Pick-based database management systems), in which a single field in the database holds a number of values. For instance, the field SPECIES_CODE holds 0, 1 or many entries depending on how many bat species were recorded as present; the field BATCOUNT records the number present of each species; and the field RTYPE_CODE records how each of the species was identified. Relationships have been set up between the tables (within the NDAD archive, mirroring those within the original system) so that the meanings of encoded fields can be picked up from the lookup tables. However, the linking does not function within NDAD for the records and fields that contain multi-values, and therefore the meanings of all the codes have also been included in the catalogues. The relationship between LPA_CODES in the COUNTIES table and the LPAS table has not been implemented within NDAD because nearly all (63 of the 67) entries in LPA_CODES are multi-values. In the original system, both fields DEAL_CODE and INVOLVED linked to the table ORGANISATION_CODES; within NDAD only one link can be implemented between two tables, and therefore the latter relationship (ie INVOLVED to ORGANISATION_CODES) has not been implemented.
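For users working with an exported copy of the data, the multi-valued fields can be expanded and resolved against the lookup tables outside NDAD. The sketch below assumes that the nth value of SPECIES_CODE, BATCOUNT and RTYPE_CODE refer to the same species, which is one reading of the description above rather than a documented rule. The lookup contents and the function name are hypothetical; the real codes are held in SPECIES_CODES and RECORD_TYPES.

```python
from itertools import zip_longest

# Hypothetical lookup contents, for illustration only.
SPECIES = {"1": "species A", "4": "species B"}
RECORD_TYPES = {"2": "method X", "3": "method Y"}


def expand_sightings(species_code: str, batcount: str, rtype_code: str) -> list[dict]:
    """Pair the nth value of each multi-valued field into one dict per species."""
    columns = [
        field.split("|") if field else []
        for field in (species_code, batcount, rtype_code)
    ]
    sightings = []
    # zip_longest pads with "" where the fields hold differing numbers of values.
    for code, count, rtype in zip_longest(*columns, fillvalue=""):
        sightings.append({
            "species": SPECIES.get(code, code),  # fall back to the raw code
            "count": int(count) if count.isdigit() else None,
            "identified_by": RECORD_TYPES.get(rtype, rtype),
        })
    return sightings
```

Codes that are absent from a lookup table are passed through unchanged, so that unmatched entries remain visible instead of being dropped.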
The table names, field names and original field descriptions are taken from the data dictionaries (but with any full stops in the table and field names replaced by underscores), which were printed by English Nature from their system and transferred to NDAD as paper documents (see the Dataset Documentation Catalogue, reference CRDA/17/DD/4/5). Thus, for example, the lookup table for types of enquiries is named ENQUIRY_TYPES rather than the name of the file as transferred to NDAD (ENQTYPE). ARev allows a field to have more than one name (and definition); NDAD has chosen to use the name which appeared to be the clearest or the main name. ARev also allows 'symbolic' (calculated) fields to be defined; these have not been preserved other than within the aforementioned data dictionaries; see also documents CRDA/17/DD/4/6 and CRDA/17/DD/4/7. It is worth noting that in early versions of ARev the field type did not have to be defined, and therefore for many of the fields the 'Generic type' column in the data dictionary is blank. Fields are defined as type F (data field) or S (symbolic field) and as either S (Single-valued) or M (Multi-valued). (Field type G is a 'Group' field. EN have explained that within ARev a group of fields can be defined so that, for instance, all the fields in a group can be listed by just specifying the group field.)
The text in curly brackets in the 'Further descriptive information' part of the field descriptions, eg {Maximum length 40}, has been taken from the data dictionaries; for more information, see the Dataset Documentation Catalogue, reference CRDA/17/DD/4/5.
The dataset comprises the following table(s):
| Table number | NDAD reference | Name | Title |
|---|---|---|---|
| 1 | CRDA/17/DS/1/1/1 | BATDATA | Bat Data |
| 2 | CRDA/17/DS/1/1/2 | COUNTIES | Counties Lookup Table |
| 3 | CRDA/17/DS/1/1/3 | ENQUIRY_TYPES | Enquiry Types Lookup Table |
| 4 | CRDA/17/DS/1/1/4 | LPAS | LPAs (Local Planning Authorities) Lookup Table |
| 5 | CRDA/17/DS/1/1/5 | ORGANISATION_CODES | Organisations Lookup Table |
| 6 | CRDA/17/DS/1/1/6 | RECORD_TYPES | Record Types Lookup Table |
| 7 | CRDA/17/DS/1/1/7 | SITE_TYPES | Site Types Lookup Table |
| 8 | CRDA/17/DS/1/1/8 | SPECIES_CODES | Bat Species Lookup Table |
| 9 | CRDA/17/DS/1/1/9 | VCOUNTY | Grid reference (10km2) Lookup Table |
| 10 | CRDA/17/DS/1/1/10 | VICE_COUNTIES | Vice-Counties Lookup Table |
| 11 | CRDA/17/DS/1/1/11 | VISIT_CODES | Visit Codes Lookup Table |
|
| How data was originally captured and validated |
|
| Constraints on the reliability of the data |
|
|
Validation
|
| Content validation |
During the processing of the dataset, several anomalies were identified in the data:
| Table | Anomaly | Further details |
|---|---|---|
| BATDATA | Two partially corrupted records. | BATKEY=190261: odd characters in the LOCALITY field. BATKEY=190366: odd characters in the NOTES field after the text 'Owner likes b'. See 'Digital processing and conversion' for further information. |
| BATDATA | Two records where the entry (VOG) in DISTRICT_CODE is not present in table LPAS. | BATKEY=330399, 330402. Possibly this should be the code for 'Vale of Glamorgan', which is VDG in table LPAS. |
| BATDATA | One record with an incorrect grid reference (GREF): it contains a comma where there should be a digit. | BATKEY=170283. |
| BATDATA | Two records where the 10 km grid reference part of GREF does not match a record in the VCOUNTY table. | BATKEY=170950, 10KMGRID=TS35; BATKEY=170985, 10KMGRID=GR66. |
| BATDATA | One record with ENQTYPE blank. | BATKEY=190261. |
| BATDATA | 8 records with invalid entries in the SITE_TYPE field. | BATKEY=121060, 190000, 190263, 190264, 320569 (SITE_TYPE=HOUSE), BATKEY=310322 (SITE_TYPE=RESTAURANT), BATKEY=330349 (SITE_TYPE blank), BATKEY=990109 (SITE_TYPE=WATER MILL). |
| BATDATA | 25 records where the entry (5) in VISIT_CODE is not present in table VISIT_CODES. | BATKEY=003643, 001683, 130045, 003773, 003232, 004425, 004729, 004498, 002934, 002894, 001911, 005860, 004926, 003496, 004468, 004466, 001035, 004500, 002290, 004866, 003543, 005142, 001192, 001044, 001515. |
| BATDATA | 1 record with an invalid entry in SPECIES_CODE, ie an entry not present in table SPECIES_CODES. | BATKEY=151607, SPECIES_CODE=00. |
| BATDATA | 1 record with an invalid entry in RTYPE_CODE, ie an entry not present in table RECORD_TYPES. | BATKEY=140132, RTYPE_CODE=6. |
| BATDATA | 3 records with invalid dates. | BATKEY=170913, DATE & VISIT_DATE=04/06/0888. BATKEY=180837, VISIT_DATE=08/08/0909. BATKEY=325248, DATE=04/10/0936. |
| BATDATA | 7 records with 'unlikely' dates. | BATKEY=005033, DATE=01/11/1888. BATKEY=005275, DATE & VISIT_DATE=11/11/1888. BATKEY=005268, DATE=11/11/1888. BATKEY=004562, DATE & VISIT_DATE=20/06/2008. BATKEY=004782, DATE & VISIT_DATE=22/02/2008. BATKEY=171173, DATE & VISIT_DATE=01/01/2011. BATKEY=171566, DATE & VISIT_DATE=01/01/2011. |
| COUNTIES | One COUNTY for which the Local Authorities (LPA_CODES) are not listed. | CNUMB=60 |
| VCOUNTY | The Channel Isles (Vice county code 113) is present in VICE_COUNTIES but not in VCOUNTY. | - |
- All DATES in BATDATA are between 1980 and 1994 except those listed above and 3 dates in the 1960s and 1970s: BATKEY=330316, DATE & VISIT_DATE=06/10/1966. BATKEY=170848, DATE & VISIT_DATE=13/07/1977. BATKEY=310327, DATE=101/01/1978.
- Two records in BATDATA (BATKEY=140881, 130330) have no data in the SITE_NAME, DISTRICT, LOCALITY and ADDRESS fields.
- A number of records in BATDATA have differing numbers of values in the SPECIES_CODE, BATCOUNT and RTYPE_CODE fields.
Some do include the separator, eg BATKEY=000986 has 2 species listed along with 2 types of identification and the BATCOUNT field = '|2'. In other cases, where there is more than one type of bat recorded, the single figure in the BATCOUNT or RTYPE_CODE fields presumably indicates the number of, or type of identification used for, all the species of bats detected. Some entries have more than one RTYPE_CODE for one entry in SPECIES_CODE; presumably this indicates that the single species of bat detected was identified by more than one method.
- 3 records in VCOUNTY have no OS sheet numbers (field SHEET): TKMSQ=NL82, NM10, NO72. 1 record (TKMSQ=TF54) has a blank BRC_COUNTY.
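Checks of the kind reported above could be reproduced on an exported copy of the data roughly as follows. This is a sketch, not the validation NDAD performed: the function names and the dictionary-per-record representation are assumptions, and the default year range of 1980 to 1994 is taken from the note on DATES above.

```python
from datetime import date


def unknown_codes(records: list[dict], field: str, valid_codes: set[str],
                  allow_blank: bool = False) -> list[tuple[str, str]]:
    """Return (BATKEY, code) pairs where a code is absent from the lookup table."""
    problems = []
    for record in records:
        for code in record[field].split("|"):
            if code == "" and allow_blank:
                continue
            if code not in valid_codes:
                problems.append((record["BATKEY"], code))
    return problems


def mismatched_value_counts(
    record: dict,
    fields: tuple[str, ...] = ("SPECIES_CODE", "BATCOUNT", "RTYPE_CODE"),
) -> bool:
    """Return True if the multi-valued fields hold differing numbers of values."""
    counts = {len(record[f].split("|")) if record[f] else 0 for f in fields}
    return len(counts) > 1


def check_date(text: str, first_year: int = 1980, last_year: int = 1994) -> str:
    """Label a dd/mm/yyyy string as 'ok', 'out of range' or 'unparseable'."""
    try:
        day, month, year = (int(part) for part in text.split("/"))
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return "unparseable"
    return "ok" if first_year <= year <= last_year else "out of range"
```

A simple range check like this does not separate the 'invalid' dates from the 'unlikely' ones listed in the table; that distinction would need a further rule, which is not defined here.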
|
| Transformation validation |
Sample checks were carried out to compare the transformed data against the original data. These included carrying out plausibility checks on the data, comparing the value of specific fields and checking that the overall number of records and fields remained the same. No discrepancies were detected between the original and the transformed data. During the spot checking, it was confirmed that, as in the original data, there are no records for Species code 9, the Mouse-eared bat.
|
|
Links to related datasets
|
| Related datasets |
There are no related datasets in this series.
|
|
Last updated 2007-07-09 14:13:38