Series title: Census of Canada
Documentation:
File type: Microdata | File size: 304,309 KB | Number of variables: 124 | Number of cases: 844,476
Access restrictions: DLI
Survey Date(s): 2006
Topics: Census / Education, training and learning / Labour / Ethnic diversity and immigration / Population and demography / Income, pensions, spending and wealth / Families, households and housing / Languages / Society and community
Smallest level of geographic coverage: Census Metropolitan Area
Geographic Coverage: Canada
This individuals file provides data on the characteristics of the population. The 2006 Census Public Use Microdata Files (PUMFs) contain samples of anonymous responses to the 2006 Census questionnaire. The files have been carefully scrutinized to ensure the complete confidentiality of the individual responses. The Individuals File will be available on March 4, 2010 and the hierarchical file will be available in the fall of 2010.
Microdata files are unique among census products in that they give users access to non-aggregated data. The PUMFs user can group and manipulate these variables to suit data and research requirements. Tabulations excluded from other census products can be created or relationships between variables can be analysed using different statistical tests. PUMFs provide quick access to a comprehensive social and economic database about Canada and its people.
Most of the subject matter covered by the census is included in the microdata files. To ensure the respondents' anonymity, geographic identifiers have been restricted to provinces/territories and large metropolitan areas.
With 123 variables, this comprehensive tool is excellent for policy analysts, pollsters, social researchers and anyone interested in modelling and performing statistical regression analysis using census data.
The 2006 Census Public Use Microdata Files (PUMFs) contain data based on a sample that represents approximately 2.7% of the population enumerated in the census. In order to protect the confidentiality of the information provided by individual respondents, special measures were taken.
Reduced level of detail: Data for small geographic areas are not available in this product. The user will find information only for the provinces, territories and selected census metropolitan areas. Further, the data have been aggregated to preserve confidentiality and to provide as much detail as possible in order to maintain the analytical value of the file.
Suppression of data: For selected variables, data were suppressed from some records. The code "Not available" was assigned to those records.
Rounding: Income data were rounded so as not to exceed pre-set upper and lower income limits.
Weighting: The microdata files contain a record for each selected unit in the sample. Each record contains a certain number of characteristics or variables. Thus, each of these units represents, on average, many other units that are not part of the sample. To represent all of these other units in the estimation process, a variable called "WEIGHT" (weighting factor) has been added to the files; it corresponds to the number of units (including the unit selected) represented by each record from the files. The weighting factor therefore indicates the number of times a record must be repeated to obtain population estimates.
The 2006 Census public use microdata file (PUMF) on individuals contains 844,476 records, representing 2.7% of the Canadian population. These records were drawn from a sample of one-fifth of the Canadian population (sample data from questionnaire 2B). The 2006 PUMF includes 123 variables. Of these, 102 variables, or 83%, come from the individual universe and 21 variables, or 17%, are drawn from the family, household and dwelling universes. The file does not include people living in institutions.
This user guide is divided into four chapters: Chapter 1 contains the record layout, an indispensable tool for using the file. Chapter 2 describes the variables contained in the file and indicates, for each variable, the number of the question from which it comes. Chapters 3 and 4 respectively deal with the sampling method and factors affecting data quality and reliability.
Since the 1971 Census, Statistics Canada has traditionally produced three public use microdata files: The Individual File, the Family File and the Household and Dwelling File. To meet users’ needs and allow international comparison of PUMFs, Statistics Canada has decided to produce two files for the 2006 Census: The Individual File and the Hierarchical File (summer 2010). The latter file will contain combined data from the family, household and dwelling universes.
Users wanting more details on the concepts and definitions of census variables can consult the 2006 Census Dictionary, online at http://www12.statcan.gc.ca/english/census06/reference/dictionary/index.cfm. Other information on the 2006 Census may also be obtained by contacting Statistics Canada’s regional reference centres, which are listed in the section entitled 'How to get help.'
It is important for Statistics Canada to protect the confidential information that it collects. Owing to the very nature of a microdata file, various actions are taken to fulfil this commitment.
The smallest geographic unit in the 2006 PUMF is the census metropolitan area (CMA). Data at the scale of geographic areas smaller than CMAs are not provided for this product. Also, the user will find that this product contains only information on the largest census metropolitan areas and the provinces. Yukon Territory, the Northwest Territories and Nunavut are grouped under the term 'Northern Canada.'
Furthermore, the data have been aggregated in such a way as to preserve confidentiality while, at the same time, providing as much detail as possible in order to maintain the analytical value of the file. For example, the data on occupation do not indicate 'physician', but rather the more general category 'occupations in medicine and health.' This category also includes other medical occupations, such as 'nurse.'
For a few records, the codes for certain variables were changed to indicate Not available, so as to guarantee data confidentiality. Users must make sure to exclude them from their calculations.
The PUMF contains lower and upper income limits. Thus, the data on total income and sources of income are adjusted proportionally.
The content of the 2006 PUMF is largely the same as that of the 2001 PUMF. However, various changes should be noted, resulting from new questions in the 2006 Census and more generally from improvement of the content of the file. Note that, unlike the 2001 PUMF, the 2006 PUMF does not contain variables with two levels of content: more detailed content for Quebec, Ontario and the West and less detailed content for the Atlantic provinces and the territories. Because the duplication of variables did not entail an increase in content, duplicate variables were eliminated from the 2006 PUMF and replaced by a single variable with content for all of Canada.
New variables were inserted to reflect the content of the 2006 Census questionnaire.
Change in the content of certain variables
Other new variables in the 2006 PUMF
This chapter provides notes on the sampling method and the quality of the data related to the file. It includes the following sections:
In Section A, the target population is defined, and the way in which the sample was selected is explained. Section B covers the concept of weighting and briefly describes the usual estimators. Finally, Section C explains how to estimate sampling error and provides the guidelines for disseminating estimates.
The target population in the file includes all Canadian citizens and landed immigrants who have a usual place of residence in Canada or who are abroad, either on a military base or on a diplomatic mission. The file also includes data on non-permanent residents of Canada, that is, persons who hold a student authorization or an employment authorization or a minister's permit, or who are refugee claimants, and members of their family living with them.
The file excludes the following population groups: institutional residents, residents of incompletely enumerated Indian reserves or Indian settlements, and foreign residents (foreign diplomats, members of the armed forces of another country stationed in Canada and residents of another country temporarily visiting Canada).
The microdata sample for individuals is selected using a three-phase sampling plan. The first sampling phase consists of the sample of one-fifth of the population (20% sample data). This is a cluster sample consisting of all households that completed the long questionnaire in the census. In the second phase, this sample was divided into two parts, each representative of Canada, to create the two sampling frames used to select the microdata samples. The first frame was used to select the microdata for the individuals file; the second frame was used to select the microdata for the hierarchical file. The third phase consisted of selecting records for the individuals file. The final sample represents 2.7% of the target universe.
In the 2006 Census, four out of five households were enumerated using a short questionnaire consisting of six questions of a demographic and linguistic nature. The remaining households received a questionnaire containing, in addition to the six questions on the short questionnaire, 45 other questions (some divided into sub-questions) covering a wide range of topics. These questions were supplemented by eight other questions on housing.
The first phase of sampling for the microdata file on individuals is the sampling of households that completed the long census questionnaire. This first phase of sampling is divided into two strata: the first (stratum consisting of canvasser areas) includes all households enumerated on Indian reserves and northern parts of Canada. (All households in these areas had to complete a long questionnaire by way of an in-home interview.) The second stratum consists of the sample of households (one household in five) selected systematically to respond to the long questionnaire. Each household is given a weighting factor by the census. This weighting factor ranges between 1 and 25, and is not necessarily a whole number.
Each household may thus represent a number of Canadian households. Only records that belong to the target population are included in the first-phase sample.
To create the sampling frame for the sample of individuals, the households in the first-phase sample were divided into two portions. To do so, the households were first sorted by province of residence, type of household (private or collective), number of usual residents in household and dissemination area. After this sorting, households were separated according to the parity (odd or even) of their rank in the sorted list.
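The sort-and-split step can be sketched in Python as follows. This is only an illustration: the household records and field names (`province`, `hh_type`, `size`, `da`) are hypothetical, and the guide does not say which parity feeds which file, so the two frames are simply called `frame_odd` and `frame_even`.

```python
# Hypothetical household records; the field names are illustrative only
# and are not PUMF variable names.
households = [
    {"id": "h1", "province": 35, "hh_type": "private", "size": 2, "da": 101},
    {"id": "h2", "province": 24, "hh_type": "private", "size": 4, "da": 55},
    {"id": "h3", "province": 35, "hh_type": "collective", "size": 9, "da": 7},
    {"id": "h4", "province": 24, "hh_type": "private", "size": 1, "da": 60},
    {"id": "h5", "province": 35, "hh_type": "private", "size": 2, "da": 99},
]


def split_by_rank_parity(records):
    """Sort households on the four keys named in the text, then split by rank parity."""
    ordered = sorted(
        records,
        key=lambda h: (h["province"], h["hh_type"], h["size"], h["da"]),
    )
    frame_odd = ordered[0::2]   # ranks 1, 3, 5, ...
    frame_even = ordered[1::2]  # ranks 2, 4, 6, ...
    return frame_odd, frame_even


frame_odd, frame_even = split_by_rank_parity(households)
print([h["id"] for h in frame_odd])
print([h["id"] for h in frame_even])
```

Because neighbouring households in the sorted list are similar on the sort keys, alternating between the two frames tends to give each frame a comparable mix of provinces, household types and sizes.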
The third phase of sampling is the selection of the sample of individuals. This sample was drawn from one of the portions created in the second phase. It was selected in proportion to the first-phase weighting factors, which were then doubled to take into account the division of the file into two portions.
Since the objective is to have a self-weighted sample making up 2.7% of the target universe, individuals are selected systematically, in proportion to twice their weighting factor, with a sampling interval of 37. It is important to note that the final result is not a self-weighted sample. This is explained in Section A. 2. (c) below.
Before the sample is selected, the records are sorted according to certain variables to ensure that the sample is properly representative. These variables are:
The sample is selected systematically using a sampling interval of 37 and a random start between 1 and 37. The probability of selecting a record is proportional to twice its selection weighting factor determined during the first phase of sampling. To be more precise, the weighting factor of the first individual in the database is doubled, and this figure is added to the random start. The sum obtained is compared to the sampling interval; if it is at least as large as the latter, the individual is selected; otherwise, we move on to the next individual, doubling his or her weighting factor and adding it to the previous sum. The result is again compared to the sampling interval. When an individual is selected, we subtract the sampling interval from the cumulative total before selecting another individual. The sample size is equal to 2.7% of the target population. The file contains 844,476 records. A slight adjustment was made so that the sum of the weighting factors of all selected records yields the published number of individuals in the target universe. As a result, each record has a weighting factor of 36.99457415.
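The cumulative-sum selection rule described above can be sketched as follows. This is a minimal illustration, not Statistics Canada's production code: the function name and the toy weights are hypothetical, and it assumes that a record is selected at most once even when the running total stays above the interval after the subtraction.

```python
import random


def systematic_pps_sample(first_phase_weights, interval=37, start=None, rng=None):
    """Return the indices of selected records using the cumulative-sum rule."""
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, interval)  # random start between 1 and 37 inclusive
    cumulative = start
    selected = []
    for index, weight in enumerate(first_phase_weights):
        cumulative += 2 * weight      # first-phase weight is doubled
        if cumulative >= interval:    # "at least as large as" the interval
            selected.append(index)
            cumulative -= interval    # subtract the interval before continuing
        # Assumption: at most one selection per record.
    return selected


# Toy run: eight records with a first-phase weight of 5 and a fixed start of 1.
print(systematic_pps_sample([5] * 8, start=1))  # [3, 7]
```

With a doubled weight of 10 per record and an interval of 37, roughly one record in 3.7 is selected, which is why records with larger first-phase weights are picked more often.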
The microdata file contains a record for each selected unit in the sample. Each record contains a certain number of characteristics or variables described in Chapter 2. Therefore, each of these units represents, on average, many other units that are not part of the sample. To represent all these other units in the estimation process, the file contains a variable called 'WEIGHT' (weighting factor for individuals), which corresponds to the number of units (including the selected unit) represented by each record in the file. WEIGHT always has the same value: 36.99457415.
The weighting factor therefore indicates the number of times a record must be repeated to obtain population estimates. For example, to estimate the number of persons who speak Chinese at home in Canada in the target universe, it is necessary to total the weighting factors of all records belonging to this category in the file.
Note: Users must refrain from publishing unweighted tables and from conducting analyses based on unweighted data from the microdata file. They must also make sure to exclude from their calculations all values that are unavailable or not applicable.
The microdata file contains two types of variables: numeric variables such as income, and nominal variables such as mother tongue. The estimators often used for the two types of variables are:
At the sample level, a total for one area is obtained by counting the 'units' that have the characteristics sought in the area.
The total at the population level is obtained by summing the weighting factors of all the records having the characteristic(s) sought in the area.
The objective is to estimate the total number of women aged 25 and over living in Edmonton whose highest level of schooling was a master's degree or a doctorate. We need to find the records in the file for which CMA = 835, SEX = 1, AGEGRP ≥ 9 and AGEGRP ^= 88, and HDGREE = 12 or 13, and total the WEIGHT variable over all these records. A total of 370 records meet all of these conditions. Consequently, the result is 370 × 36.99457415 ≈ 13,688.
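The calculation in this example can be sketched in Python. The records below are synthetic stand-ins built only to reproduce the count of 370 matching records; the variable names and codes come from the example, and the age condition is read as an AGEGRP code of 9 or higher, excluding 88, which is an assumption about the intended comparison.

```python
WEIGHT = 36.99457415  # constant weighting factor stated in the guide


def estimate_total(records, predicate):
    """Population total: sum of WEIGHT over the records that satisfy the predicate."""
    return sum(r["WEIGHT"] for r in records if predicate(r))


def is_example_1(r):
    # Women (SEX = 1) in Edmonton (CMA = 835), aged 25 and over,
    # with a master's degree or doctorate (HDGREE = 12 or 13).
    return (
        r["CMA"] == 835
        and r["SEX"] == 1
        and r["AGEGRP"] >= 9
        and r["AGEGRP"] != 88
        and r["HDGREE"] in (12, 13)
    )


# Synthetic file: 370 matching records plus a few that fail one condition each.
matching = [{"CMA": 835, "SEX": 1, "AGEGRP": 10, "HDGREE": 12, "WEIGHT": WEIGHT}] * 370
others = [
    {"CMA": 835, "SEX": 2, "AGEGRP": 10, "HDGREE": 12, "WEIGHT": WEIGHT},  # male
    {"CMA": 835, "SEX": 1, "AGEGRP": 88, "HDGREE": 13, "WEIGHT": WEIGHT},  # age not available
    {"CMA": 462, "SEX": 1, "AGEGRP": 11, "HDGREE": 13, "WEIGHT": WEIGHT},  # other CMA
]

total = estimate_total(matching + others, is_example_1)
print(round(total))  # 13688
```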
A proportion can be defined as the ratio of two totals. The estimate of a proportion is obtained by first calculating the total number of 'units' in the sample that have the characteristic(s) sought and then dividing it by the total number of sample units on which we want to base the estimate. Note that the denominator may represent all the individuals in a geographic area or a subset of individuals within a geographic area.
We want to estimate the proportion of individuals living in the Montréal census metropolitan area (CMA) who are immigrants. In this case, the total in the numerator is the sum of the weighting factors of records in the sample for which the immigrant status indicator is 'immigrant' in the Montréal CMA; in other words, WEIGHT is totalled for the records for which: IMMSTAT = 3 and CMA = 462. This number is then divided by the total in the denominator, which is the number of individuals in the Montréal CMA, that is, by the sum of WEIGHT for records such that CMA = 462. This yields the following proportion: 738,671 / 3,585,699 = 0.2060 meaning that just over 20% of the individuals in the Montréal CMA are immigrants. Thus, in this example, the total in the denominator is based on the total number of individuals in a geographic area.
We want to estimate the following proportion: out of all males aged 20 to 44 living in the Vancouver CMA, the proportion whose legal marital status is 'divorced.' In this case, the total in the numerator is the number of individuals living in the Vancouver CMA who are male, aged 20 to 44 and divorced, that is, the sum of the WEIGHT variable for records for which: CMA = 933, SEX = 2, 8 ≤ AGEGRP ≤ 12 and MARST = 1. This total is then divided by the denominator, which is the sum of WEIGHT for all individuals residing in the Vancouver CMA who are male and aged 20 to 44, that is, the sum of WEIGHT for records for which CMA = 933, SEX = 2 and 8 ≤ AGEGRP ≤ 12. From this we obtain: 12,800 / 380,859 = 0.0336, meaning that approximately 3.4% of males aged 20 to 44 in Vancouver are divorced. Thus, in this example, the total in the denominator is based on a subset of records in a geographic area.
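A proportion whose denominator is a subset of an area, as in this example, can be sketched as follows. The records are synthetic and do not reproduce the published figures; the age range is read as AGEGRP codes 8 through 12, and the MARST value 2 used for the non-divorced records is only a placeholder for any code other than 1.

```python
WEIGHT = 36.99457415  # constant weighting factor stated in the guide


def weighted_proportion(records, in_denominator, in_numerator):
    """Ratio of two weighted totals; the numerator is restricted to the denominator subset."""
    denominator = sum(r["WEIGHT"] for r in records if in_denominator(r))
    if denominator == 0:
        raise ValueError("no records fall in the denominator subset")
    numerator = sum(
        r["WEIGHT"] for r in records if in_denominator(r) and in_numerator(r)
    )
    return numerator / denominator


def vancouver_males_20_to_44(r):
    return r["CMA"] == 933 and r["SEX"] == 2 and 8 <= r["AGEGRP"] <= 12


def is_divorced(r):
    return r["MARST"] == 1


def person(sex, marst):
    return {"CMA": 933, "SEX": sex, "AGEGRP": 9, "MARST": marst, "WEIGHT": WEIGHT}


# Synthetic file: 3 divorced men, 37 other men, and 10 divorced women
# who must not enter either total.
records = [person(2, 1)] * 3 + [person(2, 2)] * 37 + [person(1, 1)] * 10

proportion = weighted_proportion(records, vancouver_males_20_to_44, is_divorced)
print(f"{proportion:.4f}")  # 0.0750
```

Because WEIGHT is constant in this file, the weighted proportion equals the unweighted one; summing WEIGHT anyway keeps the same code valid for files whose weights vary.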
The estimate of a ratio can be defined as the ratio of two totals or two proportions. To estimate the ratio of two totals, simply obtain the totals to appear respectively in the numerator and the denominator and divide one by the other. To estimate the ratio of two proportions, simply obtain the proportions to be used respectively in the numerator and the denominator and divide one by the other.
At the population level, a total for one area or for a subset of individuals within an area is obtained by first identifying the records targeted by the area or by the subset. WEIGHT is then multiplied by the value of the variable for each unit, and the results are totalled.
To estimate the average of a variable in a given geographic area, WEIGHT is multiplied by the given value of the variable for the sample records that belong to the area, the results are totalled, and the total is divided by the sum of the WEIGHT values for the sample units in the area. It is possible that we will want to estimate the average of a variable for a subset of individuals in a given area. In this case, it is necessary to multiply WEIGHT by the given value of the variable for the sample records that belong to the subset in question, total the results and divide this total by the sum of the WEIGHT values for the sample units that are in the same subset.
We want to estimate the average total income of women aged 15 years and over living in Ontario who have an income. In the numerator, WEIGHT is multiplied by the value of the 'total income' variable (TOTINC ^= 8,888,888, TOTINC ^= 9,999,999, TOTINC ^= 0) for each female individual (SEX = 1) aged 15 or over (AGEGRP ≥ 6, AGEGRP ^= 88) in the province of Ontario (PR = 35); the results are then totalled, and the total is divided by the sum of WEIGHT for female individuals 15 years of age and over in Ontario, that is, for all records in the file for which SEX = 1, AGEGRP ≥ 6, AGEGRP ^= 88 and PR = 35. The result obtained is: $146,041,760,309 / 4,789,688 = $30,490.87, meaning that the average total income of women aged 15 and over living in Ontario who have an income is around $30,490.
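The weighted average in this example can be sketched as follows. The records are synthetic and the incomes are invented for illustration. The sketch applies the same subset definition, including the income exclusions, to both the numerator and the denominator, in line with the general rule stated earlier that both are computed over the same subset; the age condition is read as an AGEGRP code of 6 or higher, excluding 88.

```python
WEIGHT = 36.99457415  # constant weighting factor stated in the guide
EXCLUDED_TOTINC = {0, 8888888, 9999999}  # codes excluded in the example


def weighted_mean(records, in_subset, value_of):
    """Sum of WEIGHT * value over the subset, divided by the sum of WEIGHT over the subset."""
    subset = [r for r in records if in_subset(r)]
    weight_sum = sum(r["WEIGHT"] for r in subset)
    if weight_sum == 0:
        raise ValueError("no records fall in the subset")
    return sum(r["WEIGHT"] * value_of(r) for r in subset) / weight_sum


def ontario_women_with_income(r):
    return (
        r["PR"] == 35
        and r["SEX"] == 1
        and r["AGEGRP"] >= 6
        and r["AGEGRP"] != 88
        and r["TOTINC"] not in EXCLUDED_TOTINC
    )


def person(totinc, sex=1, agegrp=10, pr=35):
    return {"PR": pr, "SEX": sex, "AGEGRP": agegrp, "TOTINC": totinc, "WEIGHT": WEIGHT}


# Synthetic file: three women with income, plus records that must be excluded.
records = [
    person(20000), person(30000), person(40000),
    person(0),              # no income
    person(8888888),        # excluded code
    person(50000, sex=2),   # male
    person(60000, pr=24),   # other province
]

mean_income = weighted_mean(records, ontario_women_with_income, lambda r: r["TOTINC"])
print(round(mean_income, 2))  # 30000.0
```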
The estimate of a ratio may be defined as the ratio of two totals or two averages. To estimate the ratio of two totals, simply obtain the totals to appear respectively in the numerator and the denominator and divide one by the other. To estimate the ratio of two averages, simply obtain the averages to be used respectively in the numerator and the denominator and divide one by the other.
As the microdata file covers a sample of 'units' in the census sample, there is not necessarily complete agreement between the estimates established from the file and the results based on the population as a whole. The observed difference is attributable to two types of intrinsic errors: sampling errors and non-sampling errors.
The sampling error is an error attributable to the fact that the study covers only a fraction of the population. Different samples would have yielded different estimates. In general, these differences are represented by the sampling variability. The procedure for estimating the sampling variability is described in the next section.
The 'coefficient of variation' is a measure frequently used to determine the degree of sampling variability. This is simply the ratio of the standard error of an estimate to the value of that estimate or, in other words, the standard error expressed as a percentage of the targeted estimate.
The sampling plan must be taken into account in computing the sampling error. The Individuals File does not contain all the necessary information. In order to estimate this sampling error, we propose an approximate method called the 'random groups method.' This method, which is described in detail in Chapter 2 of the book Introduction to Variance Estimation (Wolter, K. M., Introduction to Variance Estimation, Springer Series in Statistics, Springer-Verlag, New York, 1985), is easy to apply. One of its features is that it tends to overestimate the sampling error for small estimates. This results in a conservative procedure for testing significant differences.
The principle is as follows: the sample was divided into eight replicates, each representative of the sample. These replicates or portions are defined by their weighting factors, WT1, WT2, … , WT8. For example, the fourth replicate is the set of records for which WT4 is greater than 0. The value of a given replicate weighting factor is 0 if the record is not part of the replicate for that factor, and 8 * WEIGHT (eight times the value of the weighting factor) if it is.
After calculating the desired estimate with all records as in Section B.2, the following calculations are required:
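The guide's own list of calculation steps is not reproduced here. As a hedged illustration only, the sketch below uses a common textbook form of the random groups estimator: the estimate is recomputed once per replicate with WT1 to WT8, the squared deviations from the full-sample estimate are summed and divided by k(k - 1), where k = 8, and the square root gives the standard error. Whether this matches the guide's steps term for term is an assumption, and the replicate estimates in the example are invented.

```python
import math


def replicate_weight(weight, in_replicate, k=8):
    """Replicate weighting factor: k * WEIGHT inside the replicate, 0 outside."""
    return k * weight if in_replicate else 0.0


def random_groups_cv(full_estimate, replicate_estimates):
    """Return (standard error, coefficient of variation in %) under the assumed estimator."""
    k = len(replicate_estimates)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    sum_sq = sum((e - full_estimate) ** 2 for e in replicate_estimates)
    variance = sum_sq / (k * (k - 1))  # assumed textbook form, 8 * 7 = 56 here
    std_error = math.sqrt(variance)
    return std_error, 100.0 * std_error / full_estimate


# Invented figures: a full-sample estimate of 100 and eight replicate estimates.
std_error, cv = random_groups_cv(100.0, [96, 104, 98, 102, 100, 100, 97, 103])
print(round(std_error, 4), round(cv, 2))
```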
We want to find the coefficient of variation of the estimate obtained in example 1. We found that there were 13,688 women aged 25 years and over living in Edmonton, for whom the highest level of schooling attained is a master's degree or a doctorate. (see example online, page 97)
We want to find the coefficient of variation of the estimate obtained in example 2. We found that 20% of the individuals in the Montréal CMA are immigrants. (see example online, page 98)
We want to find the coefficient of variation of the estimate obtained in example 3. We found that 3.4% of males aged 20 to 44 in Vancouver are divorced. The different estimates by replicate are: (see example online, page 98)
We want to find the coefficient of variation of the estimate obtained in example 4. We found that the average total income of females aged 15 and over living in Ontario who have income is around $30,490. The different estimates by replicate are: (see example online, page 99)
We will give an example of SAS code for producing coefficients of variation. Assume that you want to create a multi-dimensional data table for which you wish to obtain a coefficient of variation for the estimates found in each cell. For example, you want to have a table giving the average total income of single persons whose income is not nil, broken down by visible minority status and sex.
(see example online, pages 100-103)
Category | Alphabetic code | Coefficient of variation (%) | Recommendation |
---|---|---|---|
Unrestricted | A, B, C, D, E | A: 0.0 – 1.0; B: 1.0 – 2.5; C: 2.5 – 5.0; D: 5.0 – 10.0; E: 10.0 – 16.5 | The estimates may be included in a general release without restriction. The letter A indicates that the estimate is very reliable. The letter B indicates that the estimate is reliable, but less so than one from category A, and so on. |
Restricted | F, G | F: 16.5 – 25.0; G: 25.0 – 33.3 | The estimates are sufficiently reliable for specific purposes, but must be used with caution. When these estimates are used, it is preferable to point out that their sampling variability is higher. |
Not to be released | | Over 33.3 | If the coefficient of variation obtained is higher than the upper limit shown for category G, it is preferable not to release these estimates. It is recommended that they be removed from the statistical tables. |
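The dissemination guidelines in this table can be applied mechanically, as in the sketch below. The function name is hypothetical. The table's ranges share their end points (1.0 ends category A and starts category B), so the sketch assumes that a value exactly on a boundary falls in the lower, more reliable category; the table itself does not settle this.

```python
from bisect import bisect_left

# Upper limits (in %) of categories A to G, taken from the table.
UPPER_LIMITS = [1.0, 2.5, 5.0, 10.0, 16.5, 25.0, 33.3]
LETTERS = ["A", "B", "C", "D", "E", "F", "G"]


def release_category(cv_percent):
    """Map a coefficient of variation (%) to (letter, category)."""
    if cv_percent < 0:
        raise ValueError("a coefficient of variation cannot be negative")
    # Assumption: boundary values go to the lower category (bisect_left).
    position = bisect_left(UPPER_LIMITS, cv_percent)
    if position == len(LETTERS):
        return None, "Not to be released"  # the table gives no letter here
    letter = LETTERS[position]
    category = "Unrestricted" if letter in "ABCDE" else "Restricted"
    return letter, category


for value in (0.4, 1.0, 7.2, 20.0, 40.0):
    print(value, release_category(value))
```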
We will give an example of Stata code for producing coefficients of variation. Assume that you want to create a multi-dimensional data table for which you wish to obtain a coefficient of variation for the estimates found in each cell. For example, you want to have a table giving the average total income of single persons whose income is not nil, broken down by visible minority status and sex.
(see example online, pages 105-106)
Sampling error is only one of the components of a survey’s total error. Non-sampling error may also contribute to the total error. This type of error is introduced, for example, when imputing data referring to cases of non-response or of obvious reporting errors (response error), when a person is missed or counted more than once (coverage error), or at the time of coding or data capture (processing error). Furthermore, some measures, such as changing the codes of a few variables to 'Not available' for certain records are necessary to comply with the confidentiality criteria. Measurements of sampling variability studied in the preceding sections take into account only observed variability in census data. Therefore, they do not reflect inaccuracies introduced into the census data and the sample by non-sampling error, and by measures taken to meet the confidentiality criteria.
Users should be aware that the limits of census geographic areas are subject to change from one census to the next. Therefore, when using data from two or more censuses, users must be aware of, and take into consideration, any changes to the geographic boundaries and/or the conceptual definition of the areas being compared. Users wishing to obtain additional information in this regard should refer to the following electronic reference tool: GeoSuite, 2006 Census, Catalogue no. 92-150-XCB.
The population counts shown here for a particular area represent the number of Canadians whose usual place of residence is in that area, regardless of where they happened to be on Census Day. Also included are any Canadians staying in a dwelling in that area on Census Day and having no usual place of residence elsewhere in Canada, as well as persons considered as 'non-permanent residents' (see Section C below). In most areas, there is little difference between the number of usual residents and the number of people staying in the area on Census Day. For certain places, however, such as tourist or vacation areas, or areas including large work camps, the number of people staying in the area at any particular time could significantly exceed the number of usual residents shown here.
Data on the population of non-permanent residents in Canada are derived from the answers given to the questions on citizenship and landed immigrant status. Non-permanent residents are persons who are not Canadian citizens by birth (Question 10) and who answered 'No' to the question on landed immigrant status (Question 11).
In all population censuses since 1991, both permanent and non-permanent residents were enumerated. Non-permanent residents are persons who held an employment authorization or a student authorization or were refugee claimants at the time of the census. Family members living with them were also included in the non-permanent resident category.
In the 1991, 1996 and 2001 censuses, non-permanent residents also included persons having a ministerial permit; this permit was eliminated by Citizenship and Immigration Canada before the 2006 Census.
Before 1991, only permanent residents of Canada were included in the census. Non-permanent residents were considered foreign residents and were not enumerated. (The 1941 Census is the only exception.)
Today in Canada, non-permanent residents make up a significant segment of the population, especially in several census metropolitan areas. Their presence can affect the demand for such government services as health care, education, employment programs and language training. The inclusion of non-permanent residents in the census facilitates comparisons with provincial and territorial statistics (marriages, divorces, births and deaths) which include this population. Furthermore, enumerating non-permanent residents enables Canada to better reflect the United Nations (UN) recommendation that long-term residents (persons living in a country for more than one year) should be enumerated in the census.
According to the 1996 Census, there were 166,715 non-permanent residents in Canada, representing 0.6% of the total population. There were more non-permanent residents in Canada at the time of the 2001 Census: 198,640 non-permanent residents or 0.7% of the total population. The 2006 Census enumerated 265,356 non-permanent residents, constituting 0.8% of the total population. The number of non-permanent residents has grown steadily from one census to another.
It should be noted, however, that while every attempt has been made to enumerate non-permanent residents, factors such as language barriers, reluctance to complete a government form or difficulty understanding the need to participate may have affected the enumeration of this population and resulted in undercounting.
In the 1991 Census and previous censuses, the Aboriginal population was determined using the ethnic origin question, based primarily on the ancestry dimension. Again in 1996, respondents could report their Aboriginal ethnic origin or ancestry. However, a new question was included in the questionnaire for the 1996 Census. That question, which concerned self-reporting of Aboriginal ancestry, enabled respondents who identified with at least one Aboriginal group (North American Indian, Métis or Inuit) to define themselves as 'Aboriginal.' The same question was asked in the 2001 and 2006 censuses.
It is important to note that the 2001 and 2006 data on the self-reported Aboriginal population are not comparable with the 1991, 1996, 2001 or 2006 ethnic origin or ancestry figures. The concepts underlying these figures are very different. For example, some persons who have Aboriginal ancestors do not see themselves as Aboriginal (and vice versa).
In order to protect the confidentiality of data in the 2006 Public Use Microdata File (PUMF), the 'Rented' and 'Band housing' categories have been combined as in the 1996 and 1991 PUMFs. Furthermore, gross rent data for individuals living in Band housing have been imputed to prevent inadvertent disclosure of individual information.
Users should use caution when using housing and shelter cost data for analyses focused entirely or largely on the Aboriginal population.
The NAICS (North American Industry Classification System) 2002 is a revision of the NAICS 1997. Industry data in the 2001 Census were produced using the NAICS 1997. To compare the industry coded to the NAICS 2002 with data coded to the NAICS 1997, use the Industry (historical) variable. See the 2006 Census Dictionary, Catalogue no. 92-566-XWE.
The 2006 data on industry can be totalled for various populations, of which the most often used are:
The other members of the labour force, namely unemployed persons having worked before January 1, 2005 or who have never worked, are classified in the category 'Industry – Not applicable.'
Insofar as possible, responses given to questions on industry were coded using a precoded list of establishments, to ensure uniformity with the NAICS codes assigned to the same establishments in other Statistics Canada surveys.
A comparison of industry data according to the NAICS 2002 is also available from the Labour Force Survey. For more information about the LFS, please consult the Guide to the Labour Force Survey, catalogue No. 71-543-G. For further information about census data on labour force activity, please contact the census labour market analysts.
The National Occupational Classification for Statistics 2006 (NOC–S 2006) is a minor update to the NOC–S 2001 used in classifying data from the 2001 Census. The purpose of this update was to add new occupation titles that had come into use in the intervening years. No structural change was made. Data from the NOC–S 2006 are directly comparable with 2001 Census data drawn from the NOC–S 2001.
Occupational data from the 1991 and 1996 censuses were produced using the Standard Occupational Classification (SOC) 1991. To compare the occupational data coded to the NOC–S 2006 with data coded to the SOC 1991, it is necessary to use the Occupation (historical) variable.
Occupational data drawn from the 2006 Census can be totalled for various populations, of which the most often used are:
The other members of the labour force, namely unemployed persons who worked before January 1, 2005 or who have never worked, are classified in the category 'Occupation – Not applicable.'
If the respondent did not indicate his or her occupation or provide enough details to allow coding, a computer-generated NOC–S 2006 code was assigned based on other economic and demographic information provided by the respondent.
Human Resources and Social Development Canada classifies occupational data according to the National Occupational Classification 2006 (NOC 2006). The structure of this classification is similar to that of the National Occupational Classification for Statistics 2006 (NOC–S 2006). These two classifications share 520 unit groups, 140 sub-groups and 10 major categories. The sub-groups make up respectively 47 major groups in the NOC–S 2006 and 26 major groups in the NOC 2006. Occupational data from the 2006 Census can be obtained coded both to the NOC–S 2006 and to the NOC 2006.
The 2006 Census collected income information from all individuals 15 years and over in private households and from non-institutional residents of collective households. The family and household income statistics shown for individuals in this file are for those in private households only.
Census income statistics are subject to sampling variability. Although such sampling variability may be quite small for large population groups, its effects cannot be ignored in the case of very small subgroups of the population in an area or in a particular category. This is because, all other things being equal, the larger the sample size, the smaller the error. For this reason, published income data for areas below the provincial level, where the non-institutional population was less than 250 or the number of households was less than 40, have been suppressed. Users of this microdata file are strongly advised to exercise caution in the interpretation of statistics based on relatively small totals.
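The relationship between sample size and error can be illustrated with a short sketch. It uses the textbook standard error of a proportion under simple random sampling, which is an assumption made here for illustration only and is not the variance method used for the census or the PUMF. The function name and the sample sizes are hypothetical.

```python
import math


def srs_standard_error(p, n):
    """Standard error of an estimated proportion p from a simple
    random sample of n records (textbook approximation, assumed here)."""
    return math.sqrt(p * (1 - p) / n)


# Arbitrary illustrative sample sizes; the error shrinks as n grows.
for n in (50, 500, 5_000):
    se = srs_standard_error(0.5, n)
    print(f"n = {n:>5,}: standard error = {se:.4f}")
```

Under this approximation, a sample 100 times larger gives an error 10 times smaller, which is why estimates for very small subgroups deserve extra caution.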
In 2006, for the first time, Canadians had the option of granting permission to retrieve income information directly from their tax records. This reduced respondent burden and improved the quality of the income data. Those who did not select this option were required to provide the income information on the paper form or via the Internet. This change, together with the modified privacy protection methods described in the next section, substantially reduces the direct comparability of some estimates derived from the 2001 and 2006 PUMFs.
All users should be aware of the rounding and replacement of extreme values described in the following section. Users interested in comparisons between censuses are advised to consult the section on Data quality in the Income and Earnings Reference Guide, 2006 Census Catalogue no. 97-563-GWE2006003 (http://www12.statcan.gc.ca/census-recensement/2006/ref/rp-guides/income-revenu-eng.cfm).
In planning this microdata file, it was deemed essential to utilize procedures to guard against the possibility of associating a particular income with an identifiable individual, family or household. To accomplish this, the incomes of individuals selected for this microdata file were subjected to the following rounding and adjustment procedure.
Income and shelter cost values were rounded and top coded to reinforce the confidentiality of the data. The method was, however, designed to minimize the impact on data quality.
First, since a large portion of all income sources come from taxation files, it was necessary to round all values. Some variables were randomly rounded to base 100: INVST, RETIR, CHDBN, CQPPB, GOVTI, GTRFS, OASGI and EICBN. The others were randomly rounded to base 1,000: TOTINC, WAGES, SEMPI, OTINC, TOTINC_AT, EMPIN, INCTAX and MRKINC. Moreover, if the value of any source was higher than 100,000, the rounding base used was 10,000. If a value was rounded to 0, the value 1 was assigned in order to maintain the applicability condition for income sources. Since the rounding was random, some relations within income sources are no longer valid. However, this rounding technique maintains the statistical nature of the data. The rounding base was set to 10,000 for the variable VALUE, and to 100 for the variables OMP and RENT.
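The rounding rules above can be sketched as follows. This is a minimal illustration, not the procedure actually run by Statistics Canada: the text does not state the rounding probabilities, so the sketch assumes unbiased random rounding (round up with probability equal to the remainder divided by the base). The function names are hypothetical, and treating only non-zero values as eligible for the replacement by 1 is an interpretation.

```python
import random

# Variable groups as listed in the text.
BASE_100 = {"INVST", "RETIR", "CHDBN", "CQPPB", "GOVTI", "GTRFS", "OASGI", "EICBN"}
BASE_1000 = {"TOTINC", "WAGES", "SEMPI", "OTINC", "TOTINC_AT", "EMPIN", "INCTAX", "MRKINC"}
SHELTER_BASES = {"VALUE": 10_000, "OMP": 100, "RENT": 100}


def rounding_base(variable, value):
    """Return the rounding base for a variable, following the text."""
    if variable in SHELTER_BASES:
        return SHELTER_BASES[variable]
    if variable not in BASE_100 and variable not in BASE_1000:
        raise ValueError(f"no rounding rule listed for {variable}")
    if value > 100_000:  # any income source above 100,000
        return 10_000
    return 100 if variable in BASE_100 else 1_000


def random_round(value, base, rng):
    """Round to a multiple of base, up or down at random.
    ASSUMPTION: probability of rounding up = remainder / base (unbiased)."""
    lower = (value // base) * base
    remainder = value - lower
    if remainder == 0:
        return value
    return lower + base if rng.random() < remainder / base else lower


def round_pumf_value(variable, value, rng):
    """Apply the base selection, the random rounding and the 0 -> 1 rule."""
    rounded = random_round(value, rounding_base(variable, value), rng)
    # ASSUMPTION: only a non-zero original value that rounds to 0 becomes 1.
    if rounded == 0 and value != 0:
        return 1
    return rounded


rng = random.Random(2006)  # fixed seed so the sketch is repeatable
print(round_pumf_value("WAGES", 45_678, rng))    # a multiple of 1,000
print(round_pumf_value("WAGES", 123_456, rng))   # a multiple of 10,000
print(round_pumf_value("INVST", 40, rng))        # 100, or 1 if rounded to 0
```

Because each component is rounded independently, a rounded total such as TOTINC will generally not equal the sum of its rounded components, which is the loss of "relations within income sources" noted above.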
Second, large income sources and shelter costs were top coded to eliminate all possibility of disclosure. For the shelter cost variables (VALUE, OMP and RENT), values greater than the 90th percentile in each geographic region were top coded. The top code was set to the average of the top coded values within each geographic region. Thus, if one sums all values of a variable in a geographic area, one obtains the same sum as if no top coding had been done. For income sources, the same technique was used, but only for values exceeding the 99th percentile and taking the gender of the person into account. Some supplementary top coding was necessary to eliminate the possibility of residual disclosure. Also, some negative values were down coded using the standard method: negative values lower than a threshold were replaced by that threshold.
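A minimal sketch of this top coding and down coding is given below. It is illustrative only: the percentile definition (nearest rank), the function names, the sample values and the down-coding threshold are all assumptions, and the supplementary top coding and the gender-specific treatment of income sources are not reproduced.

```python
from collections import defaultdict
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (ASSUMPTION: the text does not say
    which percentile definition was used)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def top_code(records, pct):
    """records is a list of (region, value) pairs. Within each region,
    values above the pct-th percentile are replaced by their average."""
    by_region = defaultdict(list)
    for region, value in records:
        by_region[region].append(value)

    cutoff, top_value = {}, {}
    for region, values in by_region.items():
        cutoff[region] = nearest_rank_percentile(values, pct)
        above = [v for v in values if v > cutoff[region]]
        top_value[region] = sum(above) / len(above) if above else None

    return [
        (region, top_value[region] if value > cutoff[region] else value)
        for region, value in records
    ]


def down_code(values, threshold):
    """Replace values lower than the (hypothetical) threshold by the threshold."""
    return [max(v, threshold) for v in values]


# Hypothetical rents for one region: 100, 200, ..., 2,000.
rents = [("Region A", 100 * i) for i in range(1, 21)]
coded = top_code(rents, 90)

print(max(v for _, v in coded))                             # average of the top coded values
print(sum(v for _, v in rents) == sum(v for _, v in coded))  # regional total is preserved
print(down_code([-50_000, -10, 5], -1_000))
```

Replacing the top values by their own average is what keeps the regional total unchanged, although the spread of the upper tail is lost.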
The number of records affected by this procedure and its impact on individual income are summarized in the following Tables 1A-K, 2 and 3.
Table 1 provides a description of the limits imposed by confidentiality considerations.
Tables 2 and 3 provide comparative assessments of estimates from the Census master file and the Public Use Microdata File.
Table 2 provides the number of recipients and aggregate income received by source and Table 3 provides employment income distributional statistics by all geographies available on the Public Use Microdata File.
Table 1 Percentage distribution of individuals 15 years of age and over, with income, by 2005 income size groups, Canada, Census and PUMF (Individuals), 2006 Census
Table 2 Comparison of estimates by income source, Canada, census and PUMF (individuals), 2005
Table 3 Comparison of employment income estimates, by PUMF geographies, census and PUMF (individuals), 2005
(see online, pages 128-135)
Changes have been made to the language classification used in our products. In this appendix, the 2006, 2001 and 1996 classifications are compared.
Please note that in the second part of the questions on home language and language of work, the respondent had the option of marking the 'No' circle to indicate that there was no other language used on a regular basis.
The individual categories used in 2006 do not always match those used in 2001 and 1996. In most cases, however, the corresponding number can be obtained by adding all members of the language family.
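Adding the members of a language family can be sketched as below. The language names, family names and counts are placeholders invented for illustration; they are not taken from the census classification or from any census table, and the function name is hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL mapping of individual language categories to families.
FAMILY_OF = {
    "Language A1": "Family A",
    "Language A2": "Family A",
    "Language B1": "Family B",
}

# HYPOTHETICAL counts for one census year.
COUNTS = {"Language A1": 120, "Language A2": 80, "Language B1": 45}


def totals_by_family(counts, family_of):
    """Sum the counts of individual languages up to the family level."""
    totals = defaultdict(int)
    for language, count in counts.items():
        totals[family_of[language]] += count
    return dict(totals)


print(totals_by_family(COUNTS, FAMILY_OF))
```

Family-level totals built this way can be compared across censuses even where the individual categories within a family were split or merged.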
(see online, pages 136-139)
(see online, pages 140-147)
For further information on census definitions, concepts and questions, PUMF users are asked to consult the reference guides and technical reports on the 2006 Census at http://www12.statcan.gc.ca/census-recensement/2006/ref/rp-guides/index-eng.cfm.
Statistics Canada. Families Reference Guide, 2006 Census, Catalogue no. 97-553-GWE2006003.
Statistics Canada. Place of Birth, Generation Status, Citizenship and Immigration Reference Guide, 2006 Census, Catalogue no. 97-557-GWE2006003.
Statistics Canada. Languages Reference Guide, 2006 Census, Catalogue no. 97-555-GWE2006003.
Statistics Canada. Journey to Work Reference Guide, 2006 Census, Catalogue no. 97-561-GWE2006003.
Statistics Canada. Housing and Dwelling Characteristics Reference Guide, 2006 Census, Catalogue no. 97-554-GWE2006003.
Statistics Canada. Visible Minority Population and Population Group Reference Guide, 2006 Census, Catalogue no. 97-562-GWE2006003.
Statistics Canada. Mobility and Migration Reference Guide, 2006 Census, Catalogue no. 97-556-GWE2006003.
Statistics Canada. Ethnic Origin Reference Guide, 2006 Census, Catalogue no. 97-562-GWE2006025.
Statistics Canada. Income and Earnings Reference Guide, 2006 Census, Catalogue no. 97-563-GWE2006003.
Statistics Canada. Education Reference Guide, 2006 Census, Catalogue no. 97-560-GWE2006003.
Statistics Canada. Labour Market Activity and Unpaid Work Reference Guide, 2006 Census, Catalogue no. 97-559-GWE2006003.
Statistics Canada. Aboriginal Peoples Technical Report, 2006 Census, Catalogue no. 92-569-XWE.
The Advisory Services Division of Statistics Canada provides an information dissemination network across the country through eight regional reference centres.
Advisory services can help you identify your informational needs, identify sources of available data, consolidate and integrate data from different sources, develop profiles, provide analysis of highlights or tendencies and, finally, provide training on products, services, Statistics Canada concepts and the use of statistical data.
For more information, call the toll-free line listed below or send an e-mail to infostats@statcan.gc.ca.
E-mail: infostats@statcan.gc.ca
Telephone (Canada and the United States only):
Telephone (outside Canada and the United States):
Statistical Reference Centre (National Capital Region)
Rm. 1500, Main Building
Holland Avenue
OTTAWA, Ontario
K1A 0T6
Atlantic advisory services: Serving the provinces of Newfoundland and Labrador, Nova Scotia, Prince Edward Island and New Brunswick.
Atlantic Advisory Services
Statistics Canada
2nd Floor, Box 11
1741 Brunswick Street
Halifax, Nova Scotia B3J 3X8
Toll-free number: 1-800-263-1136