Wednesday, 12 July 2017

A longer second look at the 2016 Census

Reader advisory!  
This post is mainly about pretty technical aspects of the Census.  I think it is however interesting background to the more overtly interesting stuff about the nature of folk in the area.


This post is mainly about the quality of the data in the 2016 Census.  Some issues are raised suggesting that there are a few problems with the data in our small area, particularly for number of dwellings and detailed characteristics of the population, but they are probably not such that the results are unusable.  It would however be useful to consider the impact of these findings on more detailed analysis and particularly any comparison with 2011 Census results.


A couple of weeks ago the ABS released the first batch of  'real' data from the 2016 Census.  (I am very cynical about the stuff they released earlier in the form of some strange averages.  That was for public relations purposes and not that useful for any analysis.)  The data was released through standard profiles and very helpfully went to quite small areas.

At that time I posted the outcome of my analysis, for the State Suburb of Carwoola, of the age distribution, person counts and dwelling counts.  My conclusion was that the first two were believable but the last seemed to be rather low considering the area seems to be growing rather than shrinking.  Of course, if there is an issue with number of dwellings this implies that there are consequent issues with the people in them!

Last week the ABS released the first tranche of data for Table Builder, an on-line system that allows users to download information to fit their own table designs rather than the standard profile tables.  I rate this as a fantastic product since, in addition to the added flexibility it offers, I hate having to download umpteen standard tables when all I need is a simple tabulation of a couple of variables.

There is however an additional benefit in that Table Builder includes some information not included in the Profiles.  I am particularly pleased to see that information appears, for 2016 only, about the imputation of age (and a few other key variables).   Here is the basic 2016 Table Builder menu showing the Imputation Flag fields.

Very few people don't give an age when they complete the form so in effect this indicates person non-response: the collector has identified an Occupied Dwelling but a completed form has not been received.  Although I didn't complete the form on-line (apart from the chaos of the Census night, I'm not sure that was an option for people in caravan parks) I would assume that the on-line data entry system would have been intelligent enough not to allow the form to be submitted without an age.  This should further reduce the (already limited) scope for imputation being required, except for form non-response.

Let's move on to some results.


The first chart shows the age-imputation rates (ie number of records for which age was imputed as a percentage of the total number of records) for a hierarchy of areas (the "selected suburbs" are explained below).
It is interesting that NSW performs slightly better than Australia as a whole but I will pass over that for now.  My set of selected rural-residential State Suburbs perform a little worse than Queanbeyan-Palerang LGA as a whole. 
It is interesting that if NSW is split into Sydney and the rest, the former has an imputation rate of 5.02% while the latter is at 5.89%.  For Victoria the contrast is even more evident: Melbourne 4.87%, Rest of the State 6.13%.  A more rigorous split into urban/rural is not possible until the full set of information is released later.
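The rate behind these charts is simple enough to sketch in a few lines. The function name below is my own invention rather than anything in Table Builder, and the only figures used are the four percentages quoted above.

```python
def imputation_rate(imputed_records: int, total_records: int) -> float:
    """Records with an imputed age as a percentage of all records."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * imputed_records / total_records


# Rates quoted in the post (per cent): capital city vs rest of the State.
quoted_rates = {
    "NSW": {"capital": 5.02, "rest": 5.89},
    "Victoria": {"capital": 4.87, "rest": 6.13},
}

# Gap, in percentage points, between the rest of the State and its capital.
gaps = {
    state: round(rates["rest"] - rates["capital"], 2)
    for state, rates in quoted_rates.items()
}

for state, gap in gaps.items():
    print(f"{state}: rest of State is {gap:.2f} points above the capital")
```

On these figures the gap is a little under one percentage point for NSW and a little over one for Victoria.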

The next chart shows the individual State Suburbs of interest to me.  The three components of the Gazette catchment area are shown first, then Captains Flat (closely linked to the Gazette area) and finally the two more northerly Suburbs.
At first glance the folk of Hoskinstown are due to visit the Naughty Corner, while the denizens of Primrose Valley/Urila and Bywong get a large bouquet.


It is important to realise that data is imputed where:
  • the Collector assesses that:
    • a dwelling is on a property and 
    • was occupied on Census Night; and
  • a form was not received for that dwelling.
If the Collector doesn't realise that a dwelling exists, or if the collector realises that a dwelling exists but considers that it was unoccupied on Census Night, then data will not be imputed for that dwelling.  This causes particular difficulties in cases such as the Widgiewa/Whiskers Creek Rds where it seems that the Collector didn't visit houses but simply left the material in letterboxes.  For example:
1.     if there is a dwelling but no letterbox no dwelling record will exist.  (That was the case for our place on Census Night.  I am told that a number of other houses don't have a letterbox as residents use PO Boxes in town close to their work.)
2.     if there is a letterbox a dwelling record will be created even if there isn't a dwelling and
       1.     if the property owner visits the area on (eg) the weekend and removes the census material from the box it will probably be recorded as an occupied dwelling (and thus records imputed) but
       2.     if the census material is not taken away from the box it will probably be recorded as an unoccupied dwelling (and person records not imputed) even if the dwelling is occupied but the occupier can't be bothered cleaning the crud out of the letterbox (because the useful stuff goes to their urban PO Box)
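The letterbox scenarios above can be sketched as a small decision function. This is only my reading of what probably happens, not a description of any documented ABS procedure; the class, field and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Property:
    has_letterbox: bool
    material_removed: bool  # census material taken out of the letterbox
    form_returned: bool     # a completed form was received


def likely_outcome(p: Property) -> str:
    """Hypothetical model of a letterbox-only collection round."""
    if not p.has_letterbox:
        # Nothing for the collector to see, so the dwelling is missed.
        return "no dwelling record"
    if p.form_returned:
        return "enumerated from form"
    if p.material_removed:
        # Someone emptied the box, so the dwelling looks occupied.
        return "deemed occupied, persons imputed"
    return "deemed unoccupied, no persons imputed"


# A lived-in house with no letterbox.
print(likely_outcome(Property(False, False, False)))
# A weekender whose owner cleared the box on the weekend.
print(likely_outcome(Property(True, True, False)))
# An occupied house whose mail goes to a PO Box, so the box is never cleared.
print(likely_outcome(Property(True, False, False)))
```

Note that the sketch has no field for whether the dwelling was really occupied on Census Night. That is the point: under this approach the recorded outcome depends on the state of the letterbox, not on who was home.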
The importance of imputation is made explicit in the report of the Census Independent Assurance Panel (CIAP) where it is shown (Table 3.2.2) that the final under-enumeration rate for the Census is 1.0% being a balance between a 4.3% gross undercount of people on Census forms, a 1.3% gross overcount of people on Census forms and a net overcount of 2.1% of persons imputed (there is obviously a rounding effect in that sum).

Back in the day (1996, 2001) when I worked on the Census the results of the PES gave something like 1.6% gross undercount and 0.1% gross overcount.  We used to contrast this with the USA (who ran a Census in those days) and had something of the order of 8.5% undercount and 6.9% overcount but still claimed a net underenumeration rate of 1.6%.  Obviously Australia has a way to go to get to the US situation, but it's all downhill.
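The way a net rate falls out of the gross components can be checked with a little arithmetic. The sketch below uses only the percentages quoted above (the historical ones are "something like" figures from memory, so treat them as approximate); the function name is my own.

```python
from decimal import Decimal


def net_undercount(gross_under: str, gross_over: str,
                   imputed_over: str = "0") -> Decimal:
    """Net under-enumeration (per cent) from its gross components."""
    return Decimal(gross_under) - Decimal(gross_over) - Decimal(imputed_over)


# 2016: CIAP Table 3.2.2 components as quoted in the post.
australia_2016 = net_undercount("4.3", "1.3", "2.1")
# 1996/2001: approximate PES figures recalled in the post, no imputation term.
australia_1990s = net_undercount("1.6", "0.1")
# USA: approximate figures recalled in the post.
usa = net_undercount("8.5", "6.9")

print("Australia 2016:", australia_2016)   # 0.9 (published as 1.0 after rounding)
print("Australia 1996/2001:", australia_1990s)  # 1.5
print("USA:", usa)                          # 1.6
```

Similar net rates can sit on top of very different amounts of gross error, which is why the gross figures are the more interesting ones.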

Table 3.2.2 also shows raw numbers as well as rates.  This shows that the net overcount of imputed (aka invented) people was 490,174.  Now the number of imputed age records for Australia given in Table Builder is 1,287,265.  Comparing those two values shows that 38% of imputed records were in error.  (A sensation-seeking journalist would add "an amazing" before the 38%!)  What is the problem?  
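For anyone wanting to check the 38%, here is the calculation, taking the comparison of the two published counts at face value. The variable names are mine.

```python
net_imputed_overcount = 490_174  # CIAP report, Table 3.2.2
imputed_age_records = 1_287_265  # Table Builder, Australia

# Share of imputed records that represent a net overcount, in per cent.
share = 100 * net_imputed_overcount / imputed_age_records
print(f"{share:.1f}% of imputed records")
```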

Again the CIAP Report is very helpful.  In the text of section 3.2.2 they give 4 situations which explain the over-imputation:
1.     non-responding private dwellings were incorrectly deemed to be occupied on Census night; or
2.     too many people were imputed into a (non-reporting) private dwelling that was correctly deemed to be occupied (the report notes this to be a small contribution);
3.     People were incorrectly imputed into non-private dwellings on Census night, due to either:
       1.     an overestimate of the Census night occupancy of non-private dwellings, or
       2.     because people were counted a second time on a form at their private dwelling residence.
Noting the views of the CIAP regarding case 2, and noting that there are no significant non-private dwellings in the Stoney Creek area (as far as I am aware) we are left with case 1 as being a possible cause of over-imputation in this area.   This would fit well with the idea of weekenders, where the property was actually unoccupied on Census night but the form was removed from the letterbox when the owner visited on the weekend and thus person records were incorrectly imputed. 
However, for persons, this may simply balance out cases in which the collector didn’t identify dwellings that were occupied (and in which the occupants gave up trying to get a form or log-in credentials through the help line).  Thus:
  • the number of persons may be not too bad an estimate but
  • a higher than expected number of records may have “not stated” for detailed characteristics which are not imputed.
  • As indicated in my discussion of the profile data in my earlier post it still looks as though the number of dwellings is somewhat understated.

At a more general level the observation that Capital Cities appear to have a lower imputation rate than the rest of their States is intriguing.  In terms of the reasons for over-imputation offered by CIAP I would have thought that non-private dwellings were generally more evident in the big cities than the rest of the State (and this will be checked).  It would thus seem that the impact of incorrectly identifying unoccupied private dwellings as occupied is largely a rural issue.  This suggests to me that the mail-out/internet-back approach worked well in major urban centres, but the traditional drop-off, collector follow-up approach has been less successful in the rural areas.
