The National Archives

Friday 9 January

   
 
 NDAD: The National Digital Archive of Datasets
 
 

Dataset details: CRDA/33/DS/1

1975-1976

 
 
 


Context

Children's Difficulties on Starting Infant School

Identity statement

Title: 1975-1976
NDAD reference: CRDA/33/DS/1
Dates of creation of datasets: [1975-1976]
Dates of contents of datasets: January 1975-September 1976
Date of last input to datasets: 1976
Date of last access to datasets:
Extent of datasets: 1 dataset: 30.3 KB; 3 tables comprising: 'January_1975' (101 records), 'September_1975' (159 records), 'Follow-up' (114 records).
ISAD(G) level of description: File

Administrative context

Aim and purpose
Statement of responsibility

Source of acquisition

Source of acquisition

The Children's Difficulties on Starting Infant School dataset was transferred by the Essex Data Archive on a single CD, as files containing fixed-length records, and was received by NDAD on 29 February 2000.


Nature and content

Scope and content

This dataset contains data from a study conducted by the Thomas Coram Research Unit on behalf of the Department of Health and Social Security. The study provides information relating to difficulties experienced by children on starting infant school, and factors affecting these difficulties. It includes a follow-up study of some of the children, to assess the extent to which initial difficulties persisted. The assessments were carried out by the children's teachers, who selected appropriate responses from a set list provided. In most cases the responses took the form of a choice of 4 options (ranging from the child coping well with the activity in question, to the child experiencing difficulty). Several questions simply required a number within a defined range (for example, 'Age in Months').

The main issues dealt with in the dataset concern the following aspects of the child's behaviour: settling in, co-operation with others, relationship with the teacher, concentration, use of play materials, self-reliance, verbalisation, ability to follow instructions, ability to cope with personal needs, sociability, physical co-ordination, fine motor control, and general difficulties. Further details of the administrative background of this dataset are provided in the Series Catalogue.

Digital processing and conversion

The dataset was transferred to NDAD as three converted data files, three SPSS syntax files, and a WinZip file, 1514img.zip, which contained digital versions of the paper documentation. The data files were named e514a.dat, e514b.dat, and e514c.dat. They were already in fixed-length record format and so required no further conversion by NDAD. Each data file contains nearly the same number of fields: 20 fields containing useable data, plus either 9 or 10 extra fields containing unknown data or blanks. File e514a.dat therefore contains 29 fields in total, e514b.dat contains 30, and e514c.dat contains 29. In each file the key is a concatenation of two fields, V1 and V2 (AREA and CHILD NUMBER).
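As an illustration only, the following Python sketch shows how a fixed-length record might be sliced into named fields and how a key could be built by concatenating V1 and V2. The column positions and widths in LAYOUT and the sample record are hypothetical; the actual positions are defined in the SPSS syntax files and are not reproduced in this catalogue entry.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL layout: (field name, first column, last column), 1-based inclusive.
# The widths of V1 and V2 are assumed for illustration, not taken from the syntax files.
LAYOUT: List[Tuple[str, int, int]] = [
    ("V1", 1, 2),       # AREA (assumed width)
    ("V2", 3, 4),       # CHILD NUMBER (assumed width)
    ("Column5", 5, 5),  # filler field covering a gap in the record
]


def parse_record(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-length record into named fields."""
    return {name: line[start - 1:end] for name, start, end in layout}


def record_key(fields: Dict[str, str]) -> str:
    """Build the record key by concatenating V1 and V2."""
    return fields["V1"] + fields["V2"]


sample = "01070"  # invented record, not taken from the dataset
print(record_key(parse_record(sample)))
```

Keeping the layout as data rather than hard-coding the slices means that the same function could in principle serve all three files, each with its own layout list.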

The three SPSS syntax files were named e514a.sps, e514b.sps, and e514c.sps, and contained information on length of fields, 'variable labels' and 'value labels' for each field, and on 'missing values' where appropriate. This information has been used to catalogue the fields within each table. The 'variable labels' provided in SPSS have been used as descriptions within the table catalogues. Similarly the value labels are used in the encoding section for fields and the missing values in the missing value(s) section.

The names of the data files transferred by the Essex Data Archive were not the original names used within the Government Department, but relate to the Data Archive reference number. It was decided, therefore, to use alternative table names which would reflect the content. The Essex file named e514a.dat therefore became 'January_1975', the file e514b.dat became 'September_1975', and file e514c.dat became 'Follow-up'.


Conditions of access and use

Access conditions

Allied materials

Related units of description
Associated material
Publications produced by the originating department
Publications produced by researchers working on the datasets

Structure

Logical structure and schema

The dataset consists of three tables. Table 1 consists of records of 101 children, Table 2 has details of 159 children, and Table 3 has records of 114 children. The key fields 'V1' and 'V2' are the first two fields in each table. These fields uniquely identify individual children, as is clear from comparisons of Tables 2 and 3, which show that the same individuals are being studied. The follow-up data (Table 3) consists only of records of children mentioned previously in Table 2. Children mentioned in Table 1 have no follow-up data. Not every child from Table 2 is mentioned in Table 3, however (159 records in Table 2, as compared with 114 records in Table 3). Therefore, although the data in Tables 2 and 3 are considered to have a 1:1 relationship, not every record in Table 2 will have a corresponding record in Table 3.
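The relationship described above (unique keys in each table, every follow-up record matching an initial record, but not every initial record having a follow-up) could be checked along the following lines. This is a minimal Python sketch, not the procedure NDAD used; the function name and the representation of a key as a (V1, V2) tuple are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Key = Tuple[str, str]  # (V1, V2), i.e. (AREA, CHILD NUMBER)


def check_follow_up(initial: Iterable[Key], follow_up: Iterable[Key]) -> Set[Key]:
    """Return the initial keys that have no follow-up record.

    Raises ValueError if a key is duplicated within either table, or if a
    follow-up key does not appear in the initial table.
    """
    initial_list = list(initial)
    follow_list = list(follow_up)
    initial_keys = set(initial_list)
    follow_keys = set(follow_list)

    # Each key may appear only once per table for the relationship to be 1:1.
    if len(initial_keys) != len(initial_list) or len(follow_keys) != len(follow_list):
        raise ValueError("duplicate key found: relationship is not 1:1")

    # Every follow-up record must refer to a child in the initial table.
    orphans = follow_keys - initial_keys
    if orphans:
        raise ValueError(f"follow-up records without an initial record: {sorted(orphans)}")

    return initial_keys - follow_keys
```

Applied to Tables 2 and 3, a check of this kind would be expected to return 45 keys (159 minus 114), assuming the record counts given above.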

The dataset comprises the following table(s):

Table number  NDAD reference  Name            Title
1             CRDA/33/DS/1/1  January_1975    Initial data, January 1975 intake
2             CRDA/33/DS/1/2  September_1975  Initial data, September 1975 intake
3             CRDA/33/DS/1/3  Follow-up       Follow-up data, September 1975 intake

How data was originally captured and validated
Constraints on the reliability of the data

There are no known constraints on the data.


Validation

Content validation

In each table, there are several fields that are not defined in the original SPSS syntax file. For example, the field Column5 contains records with the value '0'. It is not listed in the original syntax file, but the dataset documentation lists it as '0 -Missing data'. Fields like this were not defined as fields in the original data, and so did not have field names. But because the data is in fixed-length record format, any blanks or missing values in the data need to be defined. It was therefore necessary to define a new field for every gap between the existing defined fields. The field names (e.g. Column18-21) explain where in the data the gap occurred, and a length and data type are specified for each. See Table Catalogue for further descriptions of each field.

The following fields correspond with spaces in the data files which are not explained in the syntax files. These fields are explained in the dataset documentation (see Dataset Documentation Catalogue, reference CRDA/33/DD/2):

Table 1: Column5, Column11-15, Column18-21, Column23, Column26, Column28-34, Column36, Column39-42, Column47-78

Table 2: Column5, Column11-15, Column18-21, Column23, Column26, Column28-34, Column36, Column39-43, Column48-49, Column50-78

Table 3: Column5, Column11-15, Column18-21, Column23, Column26, Column28-34, Column36, Column39-44, Column49-78
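The naming of these gap fields can be illustrated with a short Python sketch that derives one filler field for every run of columns not covered by a defined field. It is not the tool NDAD used. The occupied column ranges below are inferred as the complement of the Table 1 list above, and the record length of 78 columns is assumed from the last gap field; neither is stated directly in the documentation quoted here.

```python
from typing import List, Tuple


def gap_name(first: int, last: int) -> str:
    """Name a gap field after the columns it covers, e.g. Column5 or Column18-21."""
    return f"Column{first}" if first == last else f"Column{first}-{last}"


def gap_fields(occupied: List[Tuple[int, int]], record_length: int) -> List[str]:
    """List a gap field for every run of columns not covered by 'occupied'.

    'occupied' holds (first, last) column ranges, 1-based and inclusive.
    """
    gaps = []
    next_col = 1  # first column not yet accounted for
    for first, last in sorted(occupied):
        if first > next_col:
            gaps.append(gap_name(next_col, first - 1))
        next_col = max(next_col, last + 1)
    if next_col <= record_length:
        gaps.append(gap_name(next_col, record_length))
    return gaps


# Column ranges holding defined fields in Table 1 (INFERRED from the gap list).
TABLE_1_OCCUPIED = [(1, 4), (6, 10), (16, 17), (22, 22), (24, 25),
                    (27, 27), (35, 35), (37, 38), (43, 46)]

print(gap_fields(TABLE_1_OCCUPIED, record_length=78))
```

Under these assumptions the sketch yields the nine gap fields listed for Table 1, which agrees with the 9 extra fields reported for e514a.dat.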

Several checks were carried out on the data (see Transformation Validation). These checks revealed no inconsistencies in the data, and only one very small inconsistency in the documentation. This inconsistency concerns the code for missing values, which was designated as '9'. The 'Notes for disentangling the data' document (see Dataset Documentation Catalogue, reference CRDA/33/DD/3/1) states that for each table each question from Item 1 to Item 12 contains the missing value coding. The SPSS data dictionary file, however, states that only a few particular questions require the missing value code. The data itself corresponds most closely with the data dictionary file, and so this is the version that was used when the missing values were defined.

Transformation validation

Although no transformations were carried out on the data (it was transferred to NDAD in a format which required no alteration), several simple validation checks were carried out to ensure that the data was unchanged. No discrepancies were found, although several points concerning the original data arose during this process, which are discussed in section 7.1 Content Validation. The checks included counts of numbers of records per table, and number of fields. Several queries were performed on a selection of fields, checking aspects of the data such as area, gender, happiness of child at start. These were checked against the original data and no inconsistencies were found. Another check that was carried out was on the total number of responses for each choice given in a particular question. These totals were checked against Marginals information which was provided in the documentation (see Dataset Documentation Catalogue, references CRDA/33/DD/2/1, CRDA/33/DD/2/2, and CRDA/33/DD/2/3), and no discrepancies were found.
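Checks of the kind described above (record counts per table, and response totals compared against the documented marginals) might look like the following Python sketch. The function names are hypothetical and the marginals would have to be transcribed from the documentation; only the expected record counts are taken from this catalogue entry.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Record counts as stated in this catalogue entry.
EXPECTED_RECORDS = {"January_1975": 101, "September_1975": 159, "Follow-up": 114}


def record_count_ok(table: str, records: Sequence) -> bool:
    """Check that a table holds the number of records stated in the catalogue."""
    return len(records) == EXPECTED_RECORDS[table]


def marginal_mismatches(values: Iterable[str],
                        documented: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Compare observed response totals for one question with documented marginals.

    Returns {code: (observed, documented)} for every code whose totals differ;
    an empty dict means no discrepancy was found.
    """
    observed = Counter(values)
    codes = set(observed) | set(documented)
    return {
        code: (observed.get(code, 0), documented.get(code, 0))
        for code in codes
        if observed.get(code, 0) != documented.get(code, 0)
    }
```

Returning the mismatching codes, rather than a single pass or fail flag, makes it easier to trace a discrepancy back to a particular response option.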


Links to related datasets

Related datasets

There are no related datasets in this series.


Last updated 2003-04-10 16:56:40

 
 

NDAD v3.0