Guide to the Census of Population, 2021
Appendix 1.7 – Use of administrative data to impute non‑responding households in areas with low response rates

Introduction

Several adaptations were made to the 2021 Census collection plan to mitigate the impact of the COVID-19 pandemic and other potential risks. Statistics Canada proactively developed a statistical contingency plan, based on the secure, responsible, and appropriate use of administrative data, to support the 2021 Census in the event of disruptions to census collection. Because census data are often the only source of information for some sub-populations and small areas, it was important to maximize data quality in locations with low response rates, in keeping with the census goal of producing high-quality data for small levels of geography. Linked administrative data from federal and provincial sources were used to improve the imputation of non-responding households after 2021 Census collection ended. This was done in areas where response rates were low and for dwellings where good quality administrative data were available. Despite a successful overall census enumeration in 2021, some areas of the country had lower-than-expected response rates. This imputation plan contributed to a quality improvement in population and dwelling counts.

Objectives for the imputation plan

Imputation is a statistical method aimed at reducing bias introduced by non-response. This is achieved by identifying people or households that have characteristics similar to the incomplete record and substituting their values to fill in the missing or erroneous responses. The imputation plan was designed to use linked administrative data in a secure, responsible, and appropriate way to impute the census data for non-responding households in areas with low response rates, and for dwellings where good quality administrative data were available. The objective was to ensure high-quality population and dwelling counts in areas where census collection was affected by COVID-19, a natural disaster, or low response rates, and to apply the strategy consistently and equitably across the regions of Canada.
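The donor-based idea described above can be illustrated with a minimal sketch. It is not Statistics Canada's production method: the matching variables (`dwelling_type`, `neighbourhood`), the similarity score, and all names are hypothetical and chosen only to show how a missing value is filled from the most similar responding record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Household:
    dwelling_type: str          # hypothetical matching variable
    neighbourhood: str          # hypothetical matching variable
    size: Optional[int] = None  # number of usual residents; None means non-response


def similarity(a: Household, b: Household) -> int:
    """Count how many matching variables two households share (assumed score)."""
    return int(a.dwelling_type == b.dwelling_type) + int(a.neighbourhood == b.neighbourhood)


def impute_from_donor(recipient: Household, donors: list[Household]) -> Household:
    """Fill the recipient's missing size with the value of the most similar respondent."""
    candidates = [d for d in donors if d.size is not None]
    if not candidates:
        raise ValueError("no responding household available as donor")
    donor = max(candidates, key=lambda d: similarity(recipient, d))
    return Household(recipient.dwelling_type, recipient.neighbourhood, donor.size)


respondents = [
    Household("apartment", "A", 2),
    Household("detached", "A", 4),
    Household("detached", "B", 5),
]
non_respondent = Household("detached", "A")
imputed = impute_from_donor(non_respondent, respondents)
print(imputed.size)  # 4: the detached dwelling in neighbourhood A is the closest donor
```

In this toy example the donor is the respondent that matches on both variables. A real donor imputation system would use many more matching variables and a more careful distance measure.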

Development of the imputation plan

Statistics Canada has many years of experience using administrative data to ensure the high quality that Canadians expect from census data. For example, Statistics Canada has been using administrative data from the Canada Revenue Agency (CRA) since 2006 to improve the accuracy of income data. More recently, Statistics Canada used administrative data to enumerate the Wood Buffalo census subdivision and its municipality of Fort McMurray, Alberta, during the 2016 Census despite residents being evacuated because of a wildfire. In the preparations leading up to the 2021 Census, Statistics Canada conducted a study to simulate a statistical contingency plan using 2016 Census data. The research evaluated using administrative data to impute some non-responding households, compared with using the usual donor imputation method for all non-responding households. For this research, statistical models were developed to derive a “household” from administrative data and to compare the quality of those households with true census households.

Results of the administrative data imputation tests

Results showed that using administrative data to impute some non-responding households improved the quality of the data for key population and demographic indicators, compared with the usual donor imputation method. For example, whole household imputation (WHI) using administrative data improved the quality of the population and dwelling counts by age, by sex at birth, and at various levels of geography compared with traditional WHI, which is based on donor imputation. The gains in data quality from administrative data use were more pronounced in geographic areas with response rates below 90%. Good quality administrative data, measured based on their predicted consistency with census responses, were available for about half of non‑responding households in the simulation.

Implementing the administrative data imputation plan

Based on evidence from research and tests of the imputation plan, Statistics Canada determined that linked administrative data should be used to support the traditional WHI method in some circumstances. To maximize data quality, linked administrative data were used for non-responding households in collection units with response rates below 90%, where good quality administrative data were available for the dwelling.

The calculations used to determine the number of usual residents were also based on distributions by household size from the Dwelling Classification Survey (DCS) in mail-out and list-leave areas. Otherwise, the number of usual residents was determined through donor imputation. As with households that responded to the census, imputed households were linked to tax data from the CRA to obtain income characteristics.

Information about the imputation plan and the use of administrative data to support the census was posted on the Statistics Canada website. The plan was also included in the Supplement to Statistics Canada’s Generic Privacy Impact Assessment related to the 2021 Census of Population in March 2021.

Administrative data

The administrative data used to impute non-responding households came from federal and provincial data sources already provided to Statistics Canada, such as data from the CRA; Immigration, Refugees, and Citizenship Canada; provincial vital statistics files (births and deaths); provincial driver’s licence files; and the Indian Register (see Note 1). When no direct response was received for a dwelling in an area with low response rates, good quality administrative data were used to impute variables such as date of birth, sex at birth, and the number of usual residents at the dwelling.

Reference date

For the 2021 Census, the reference date for data reporting is May 11, 2021. For non-responding households imputed using administrative data, various administrative data sources were used with a reference date as close as possible to May 11, 2021, to simulate a response on Census Day.
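As a rough illustration of choosing the source "as close as possible" to Census Day, the sketch below picks, from several candidate records, the one whose reference date has the smallest absolute distance in days from May 11, 2021. The source names and dates are invented for the example, and the selection rule is an assumption rather than the documented procedure.

```python
from datetime import date

CENSUS_DAY = date(2021, 5, 11)  # reference date stated in the text


def closest_to_census_day(records):
    """Return the (source_name, reference_date) pair nearest to Census Day.

    `records` is a list of (source_name, reference_date) tuples; the names
    used below are hypothetical placeholders, not real file names.
    """
    return min(records, key=lambda r: abs((r[1] - CENSUS_DAY).days))


records = [
    ("source_a", date(2020, 12, 31)),  # 131 days before Census Day
    ("source_b", date(2021, 4, 30)),   # 11 days before Census Day
    ("source_c", date(2021, 7, 1)),    # 51 days after Census Day
]
best = closest_to_census_day(records)
print(best[0])  # source_b
```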

Scope of the imputation plan

The 2021 Census had a successful enumeration, with 98% of Canadians responding to the census. However, because some localized areas of the country showed response rates well below the national rate, administrative data were used to support the imputation of non-responding households in these areas. About 1,045 collection units (out of about 49,000 in Canada) showed a response rate below 90%, had good quality administrative data, and were therefore in-scope for this imputation plan. Approximately 12,000 non-responding households were imputed using administrative data, representing less than 0.1% of occupied private dwellings in Canada. The imputation plan used data already provided to Statistics Canada, and meets the highest standards of privacy, confidentiality and data security.
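The two scope conditions stated above (response rate below 90% and good quality administrative data) can be sketched as a simple filter. The record layout, the identifiers, and the simplified response rate (responses divided by dwellings) are assumptions made for illustration. Only the 90% threshold and the counts of 1,045 and about 49,000 collection units come from the text.

```python
from dataclasses import dataclass

RESPONSE_RATE_THRESHOLD = 0.90  # threshold stated in the text


@dataclass
class CollectionUnit:
    cu_id: str                  # hypothetical identifier
    dwellings: int              # occupied dwellings in the unit (assumed > 0)
    responses: int              # dwellings with a census response
    has_good_admin_data: bool   # assumed flag for admin data quality

    @property
    def response_rate(self) -> float:
        # Simplified definition used only for this sketch.
        return self.responses / self.dwellings


def in_scope(cu: CollectionUnit) -> bool:
    """A unit is in scope when both conditions from the text hold."""
    return cu.response_rate < RESPONSE_RATE_THRESHOLD and cu.has_good_admin_data


units = [
    CollectionUnit("CU-1", 400, 388, True),   # 97% response: out of scope
    CollectionUnit("CU-2", 350, 280, True),   # 80% response, good data: in scope
    CollectionUnit("CU-3", 500, 400, False),  # 80% response, no good data: out of scope
]
selected = [cu.cu_id for cu in units if in_scope(cu)]
print(selected)  # ['CU-2']

# Share of collection units in scope, using the figures quoted in the text.
share = 1045 / 49000
print(f"{share:.1%}")  # 2.1%
```

Using the quoted figures, roughly 2% of collection units were in scope, which is consistent with the description of low response as a localized issue.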

Data quality

For population and dwelling counts in areas where the administrative data imputation plan was implemented, the data went through the same quality assessments, validation and certification as the overall census data. These steps were taken to ensure that the population and dwelling counts in areas where administrative data were used for imputation meet the same high standards of data quality expected of all census data. All census variables from both the short-form and long-form census were carefully validated. For each census question, the combined imputation rate from both administrative and traditional donor sources will be reported at various levels of geography (see Chapter 9 on data quality evaluation).

