| Content validation | The data was checked for discrepancies and inconsistencies.
In the Elderly_People table (Table 1, reference CRDA/34/DS/1/1),
the following discrepancies were found:
| Field | Error |
|---|---|
| ConsultInter | There are two code values, 3 and 4, whose meaning is not known. These have been described as 'Unknown'. |
It was noticed that in several questions the coding, as
described in the original questionnaire, differed slightly from the
coding used in the data files. This occurred in several fields, for
example M1YearofBirth, M52/53TimesinHosp (where two questions were
merged together), M56/57AttOPOrDayH, M55/58 and IP27. Other
fields contained coding which was not clear, due to a lack of
explanation in the documentation or to abbreviated codes, for
example M65RegHelp3, M83ConfusionScore and IP3HomeVisits.
In the second part of this table (the fields described as
belonging to Card 2), there are several fields which do not fit any
questions in the existing questionnaires. Although field names and
codes are available for these fields, it is not possible to
discover what the data itself relates to, nor even what the codes
signify (for example the fields called 'Move', 'Pain', 'Sleep', etc.
all contain a code 'NHP not comp', which has no explanation in the
documentation, and so cannot be fully understood). It is possible
that these fields relate to the Health Profile (Dataset Documentation Catalogue
reference CRDA/34/DD/2/2), but there is no documentation explaining
this.
At the end of this table (in the section described as Card 6,
from variable 29 onwards), there are several fields for which
nothing is known. There is no documentation which provides field
names, or details of coding, or a context for the data. These
fields contain mostly blank records, and only a few coded
responses.
In the General_Practitioners table (Table 2, reference
CRDA/34/DS/1/2), there were no problems with the data itself.
However, the meaning of the second half of the table, which
consists of fields from Card 8, is not clear. There is no
documentation available which adequately explains these fields, and
although the details of coding and field names are available, it is
not known to which questionnaire they refer. It is probable that
the questions relate to general practitioners, and that they are
perhaps interpretations from other questions in the interviews,
rather than questions which were actually asked of the GP
directly.
There are also some unclear abbreviations in the coding for
several fields in this table, namely fields VP30ImpElderHelp2,
VP30ImpElderHelp3, and VP30ImpElderHelp4.
In the Medicines table (Table 3, reference CRDA/34/DS/1/3), the
following discrepancies were found:
| Field | Error |
|---|---|
| GM17HowLongTake | The code for the response 'Uncertain' is described in the documentation as having a value '7'. However, in the data file, the value is '9'. |
| ISSMCClassCode | There are 33 occurrences of ISSMC code '090', for which there is no documentation, or corresponding BNF code. This has therefore been described as 'Unknown code'. |
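A check of this kind can be sketched in Python as follows. This is a minimal illustration only, and not the procedure that was actually used on the dataset: the function name `undocumented_codes`, the sample values and the contents of the codebook are all hypothetical. Only the field name GM17HowLongTake and the code values '7' and '9' are taken from the table above.

```python
from collections import Counter


def undocumented_codes(values, documented):
    """Count occurrences of each code value that is absent from the codebook."""
    counts = Counter(values)
    return {code: n for code, n in counts.items() if code not in documented}


# Hypothetical codebook for GM17HowLongTake: the documentation gives '7'
# for 'Uncertain'; the other documented codes are invented for illustration.
documented_gm17 = {"1", "2", "3", "7"}

# Hypothetical sample of values as they might appear in the data file.
sample = ["1", "2", "9", "3", "9", "1"]

print(undocumented_codes(sample, documented_gm17))  # {'9': 2}
```

Any code value reported in this way would then need to be traced in the documentation, or described as unknown, as was done for the two fields above.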
The field ISSMCClassCode, at the beginning of this table, has a
length of 3 digits. It contains the Institute for Social Studies in
Medical Care (ISSMC) classification codes. These codes are
different from, but relate to, the standard drug codes published
twice yearly by the British National Formulary (BNF). In the
documentation accompanying the dataset, both the BNF and ISSMC
codes have been listed, and so both sets of codes are provided for
this field.
It is still not clear which volume of the BNF is being consulted
for the data on the code sheets. Since these codes are updated
every six months, the information is of limited value unless it is
known which volume of the BNF they are from. This information does
not appear in the documentation.
Several fields in this table were found to contain slightly
different coding to the originals in the questionnaires. These
fields included PM4Produced/5Container (which consists of two
questions merged together), MCategory, PM20TypeofGP and
PM24HowOftenTake. There are also several fields, such as
PM8QuantityonLabel, PM9Directions, PM11AvoidAlcohol,
PM12DateonLabel, M59/60WhyStoppedTaking, PM31ImpTakeAdvised, for
which the codes appear to have been calculated at a later date.
In this table, as in Table 1, there are several fields for which
there is little or no documentation. These include the fields
prefixed PH, which are questions asked of pharmacists about
medicines taken by elderly people. The fields prefixed HD, which
are questions asked of hospital doctors about medicines taken by
elderly people, also have no documentation. No questionnaires are
available which contain these questions.
There are other parts of the survey which are not clearly linked
to any part of the data files transferred to NDAD. For example, it
is not clear where the information on Consultants' Views and
Practices is recorded. This was one of the questionnaires included
in the survey, as is stated in the documentation. The questionnaire
itself, however, and any relevant data, are missing.
Similarly, the Helpers Questionnaire, which is mentioned in the
documentation, seems to be missing. There is no separate data for
questions answered by helpers rather than by the subjects
themselves, and it appears that this data has been simply
integrated into the main data files at the initial data-recording
stage.
Tables 1 and 3 are related to each other, via the key fields
AREA, 2ndDigit, 3rdDigit, and 4thDigit, which make up the serial
number used to identify each person. Table 3 records details of
medicines for each elderly person interviewed, and so each serial
number in Table 3 refers back to a record in Table 1. There are,
however, three anomalies in Table 3, where a serial number occurs
which has no corresponding record in Table 1. They are as
follows:
| Row # | AREA | 2ndDigit | 3rdDigit | 4thDigit |
|---|---|---|---|---|
| 1539 | 5 | 1 | 1 | 9 |
| 1540 | 5 | 1 | 1 | 9 |
| 1708 | 6 | 1 | 3 | 0 |
It is probable, in these cases, that the anomalies are due to a
simple error in the field 4thDigit, as these records are similar in
other respects to previous or subsequent records.
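The search for serial numbers in Table 3 with no matching record in Table 1 can be sketched in Python as below. This is an illustration under stated assumptions rather than the method actually used: the function name `orphan_rows` and the miniature tables are hypothetical, and only the four key field names are taken from the dataset.

```python
def orphan_rows(child_rows, parent_rows, key_fields):
    """Return (row_number, key) for child rows whose key has no parent record."""
    parent_keys = {tuple(row[f] for f in key_fields) for row in parent_rows}
    orphans = []
    for row_number, row in enumerate(child_rows, start=1):
        key = tuple(row[f] for f in key_fields)
        if key not in parent_keys:
            orphans.append((row_number, key))
    return orphans


# The four fields that together make up the serial number.
KEY_FIELDS = ("AREA", "2ndDigit", "3rdDigit", "4thDigit")

# Hypothetical miniature versions of Table 1 and Table 3 (invented values).
table1 = [
    {"AREA": 5, "2ndDigit": 1, "3rdDigit": 1, "4thDigit": 8},
    {"AREA": 6, "2ndDigit": 1, "3rdDigit": 3, "4thDigit": 1},
]
table3 = [
    {"AREA": 5, "2ndDigit": 1, "3rdDigit": 1, "4thDigit": 8},
    {"AREA": 5, "2ndDigit": 1, "3rdDigit": 1, "4thDigit": 9},
    {"AREA": 6, "2ndDigit": 1, "3rdDigit": 3, "4thDigit": 1},
]

print(orphan_rows(table3, table1, KEY_FIELDS))  # [(2, (5, 1, 1, 9))]
```

Comparing the whole four-part key, rather than each field separately, is what allows a single mistyped digit to show up as an unmatched serial number.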
Although Table 2 contains the same four key fields as Tables 1
and 3, it does not appear to have a direct relationship with either
of the other two tables, and so no link has been created. The data
in this table relates not to the elderly people, but to the doctors
themselves. |
|---|
| Transformation validation | The number of records in each file of the original transferred
dataset was compared to the number of records in each file that
was created after conversion. These corresponded exactly.
Similarly, the number of fields in each file of the original data
was compared to the number of fields in each file of the
transformed data, and found to correspond exactly. A number of
checks were carried out on the transformed data, and no
discrepancies were found (except those already listed under Content validation, which are anomalies
occurring in the original data). The original data files were
opened in SPSS format in order to carry out some tests: some simple
queries were performed on a random selection of fields in the 3
tables of the original data files. These queries were repeated on
the transformed data and in all cases gave results which were
consistent with the original queries.
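The comparison of record and field counts described above can be sketched in Python as follows. This is a minimal illustration and not the tool actually used: the function names `shape` and `shape_mismatches` and the miniature tables are hypothetical, and each table is assumed to be held as a list of records with one dictionary per record.

```python
def shape(rows):
    """Return (number of records, number of fields) for a list of dict rows."""
    n_fields = len(rows[0]) if rows else 0
    return len(rows), n_fields


def shape_mismatches(original, transformed):
    """List tables whose record or field counts differ after conversion."""
    mismatches = []
    for name in sorted(set(original) | set(transformed)):
        before = shape(original.get(name, []))
        after = shape(transformed.get(name, []))
        if before != after:
            mismatches.append((name, before, after))
    return mismatches


# Hypothetical miniature tables; the values are not taken from the dataset.
original = {"Table 1": [{"AREA": 5, "M78MaritalStatus": "1"}] * 3}
transformed = {"Table 1": [{"AREA": 5, "M78MaritalStatus": "1"}] * 3}

print(shape_mismatches(original, transformed))  # []
```

An empty result corresponds to the outcome reported here, where the counts matched exactly for every file.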
The document 'Full report of all variables' (Dataset Documentation Catalogue,
reference CRDA/34/DD/4/1) contains, for most fields in Table 1,
handwritten totals for occurrences of each code value in each
field. As an extra check, these totals were compared with the
values calculated in SPSS and, with a few exceptions, were found to
match exactly. It is thought likely that where the two values do
not match, this is the result of human error when adding up or
writing down the totals on the document. The anomalies are as
follows:
| Field | Code value | Total in documentation | Total in SPSS |
|---|---|---|---|
| M78MaritalStatus | 1 | 6 | 66 |
| M81AgreementFrom | 1 | 604 | 594 |
| M81AgreementFrom | 2 | 180 | 182 |
| M81AgreementFrom | x | 21 | 29 |
| M44DrugRecord | 4 | 0 | 1 |
| M44DrugRecord | 5 | 111 | 110 |
| Patient-GPTie | 2 | 29 | 28 |
| Patient-GPTie | 3 | 16 | 15 |
| Patient-GPTie | 5 | 17 | 18 |
| Patient-GPTie | 8 | 75 | 76 |
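A comparison of this kind can be sketched in Python as below. The original check was carried out in SPSS, so this is an illustrative equivalent only: the function name `total_mismatches` is hypothetical, and the example is restricted to the two M44DrugRecord code values listed in the table above, with the data values reconstructed from the SPSS totals shown there.

```python
from collections import Counter


def total_mismatches(values, documented_totals):
    """Return (code, documented, computed) where the two totals disagree."""
    computed = Counter(values)
    codes = sorted(set(computed) | set(documented_totals))
    return [
        (code, documented_totals.get(code, 0), computed.get(code, 0))
        for code in codes
        if documented_totals.get(code, 0) != computed.get(code, 0)
    ]


# Handwritten totals for M44DrugRecord, codes 4 and 5 only (from the table).
documented = {"4": 0, "5": 111}

# Values reconstructed to match the SPSS totals for the same two codes.
values = ["4"] * 1 + ["5"] * 110

print(total_mismatches(values, documented))  # [('4', 0, 1), ('5', 111, 110)]
```

Codes are held as strings so that non-numeric values, such as the 'x' recorded for M81AgreementFrom, can be compared in the same way.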
The document 'Full report of all variables' (Dataset Documentation Catalogue,
reference CRDA/34/DD/4/1), also contains handwritten totals for the
data in Table 2 (Cards 7 and 8). It is not possible to make any
comparisons between these figures and results obtained by our own
tests, however, as it appears that the handwritten totals add up to
805, whereas Table 2 only contains 401 records. It is not clear
from the documentation exactly what the totals represent.
Data from Table 3 is not described in 'Full report of all
variables', and so does not have handwritten totals for each field.
The documents 'Codes for pharmacist about each drug' and
'Medicines reported by doctors' do contain handwritten numbers, but
it is not clear what these represent. They do not appear to match
totals found in the data. |
|---|