AUBER Webinar: Differential Privacy and the U.S. Census
By James Donahue
In January, AUBER hosted its first webinar of 2020, “Census Differential Privacy: Is Your State and Your Economic Research Center Ready,” with 40+ participants signing in from locations across the U.S. The webinar featured presentations from Michael Hawes, Senior Advisor for Data Access and Privacy at the U.S. Census Bureau, and Elizabeth Garner, the State Demographer from the Colorado Department of Local Affairs. The following article discusses the issues presented in the webinar.
In an age of modern supercomputers that can re-identify individuals based on certain census responses, the U.S. Census Bureau faces a technological and mathematical dilemma. The risk of re-identification, though present to some degree in preceding decennial censuses, now poses a novel and complex problem: the available solutions tend either to compromise respondents’ identities or to render research nonviable. Title 13 of the U.S. Code makes the Census Bureau’s obligation to protect the privacy of its respondents a legal matter and not only an ethical one.
Privacy advocates argue that the Census Bureau is legally obligated to adopt more rigorous protections given the rise of adversarial supercomputing. Title 13 stipulates: “Private information cannot be published. It is against the law to disclose or publish any private information that identifies an individual or business.” In the webinar, Michael Hawes discussed Title 13 of the U.S. Code on behalf of the Census Bureau, emphasizing that “the protection of confidentiality is a core integral component of our organization … We take [it] very seriously.” Hawes revealed that, despite the stringent precautions taken for the 2010 Census, internal Census Bureau experiments were able to re-identify participants using only some, not all, of the responses found in publicly available data.
The modern application of Title 13 has given rise to an approach to data privacy known as differential privacy. Differential privacy allows the Census Bureau to be selective in what it releases about respondents and also gives it the freedom to reorganize responses altogether. Moreover, differential privacy allows the Bureau to quantify how much privacy is lost in the statistics it releases. The parameter that measures this loss of privacy is denoted by the Greek letter epsilon, or ε. Generally, ε is a real value between 1/1000 and 1, with a higher value indicating greater privacy loss and therefore easier re-identification of individuals. Thus, the Census Bureau’s solution is to reevaluate which statistics are released for specific geographies based on an ε that it will decide upon in the coming months as Census response data are collected, analyzed, and released.
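To make the role of ε more concrete, the following is a minimal sketch of the textbook Laplace mechanism applied to a single population count. It is an illustration only and not the Census Bureau's actual method, which the webinar did not describe at this level of detail. The function names, the count of 250, and the assumption that one person changes a count by at most 1 are all hypothetical choices made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered at zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return a count with Laplace noise (textbook mechanism, assumed here)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Assumption: adding or removing one person changes the count by at
    # most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    """Average absolute gap between the noisy and the true count."""
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


# Hypothetical count of 250 people, released under several epsilon values.
for eps in (0.001, 0.01, 0.1, 1.0):
    print(eps, round(mean_abs_error(250, eps), 1))
```

In this sketch the typical error is roughly 1/ε, so an ε near 1/1000 buries a count of 250 in noise, while an ε near 1 leaves it almost exact. That is the trade-off described above: a lower ε protects respondents more, and a higher ε preserves accuracy at the cost of privacy.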
One example of differential privacy in practice is the relocation of households to different locations in order to fool potential data adversaries: individuals who live in a specific city or county could be moved to an entirely different city or county to make re-identification more challenging. In the AUBER webinar, Elizabeth Garner, the State Demographer with the Colorado Department of Local Affairs, compared the 2010 Census release to a version of the 2010 data with differential privacy applied, which was provided by the U.S. Census Bureau. Garner observed a lack of consistency in the differential privacy file among population, age, housing units, and households. She stated that less populated counties in Colorado gain an advantage when differential privacy is applied: “Larger counties are losing population, while smaller, rural counties are gaining population.” Because U.S. Census population numbers are used to distribute funding and apportion representation in the State House of Representatives, larger counties could see their funding and voting power diluted by lower population counts.
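The pattern Garner describes can be reproduced in a toy model. The sketch below rests on an assumption that was not stated in the webinar: that noisy counts are clamped at zero (a population cannot be negative) and then rescaled so they still sum to the true total. It is not the Bureau's algorithm, and the area names, the deliberately extreme populations, and ε = 0.1 are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def protect_and_rescale(true_counts, epsilon, rng):
    """Add noise, clamp negatives to zero, rescale to the true total."""
    total = sum(true_counts.values())
    # Assumed post-processing step 1: counts may not be negative.
    clamped = {
        name: max(0.0, count + laplace_noise(1.0 / epsilon, rng))
        for name, count in true_counts.items()
    }
    clamped_total = sum(clamped.values())
    if clamped_total == 0:
        raise ValueError("all noisy counts were clamped to zero")
    # Assumed post-processing step 2: keep the overall total fixed.
    return {name: v * total / clamped_total for name, v in clamped.items()}


def average_release(true_counts, epsilon, trials=2000, seed=0):
    """Average the released counts over many simulated releases."""
    rng = random.Random(seed)
    sums = {name: 0.0 for name in true_counts}
    for _ in range(trials):
        released = protect_and_rescale(true_counts, epsilon, rng)
        for name, value in released.items():
            sums[name] += value
    return {name: s / trials for name, s in sums.items()}


# Hypothetical areas; the small ones are tiny on purpose to expose the effect.
AREAS = {"Large area": 500_000, "Small area A": 5, "Small area B": 3}

for name, value in average_release(AREAS, epsilon=0.1).items():
    print(name, AREAS[name], round(value, 1))
```

Under these assumptions the small areas come out above their true counts on average, because clamping removes only the negative side of the noise, and the large area comes out below its true count, because the fixed total has to absorb that surplus. Real counties are far larger than the toy areas here, so the size of the effect in practice would depend on details this sketch does not model.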
The main drawbacks of this growing selectivity in what is released are twofold. First, economic researchers could be restricted in their study of population data over time. With the 2020 Census possibly altering population counts on purpose, researchers will have more difficulty evaluating population and demographic characteristics over time. Moreover, the statistical integrity of the data sets is called into question if demographers and economists alike are unable to examine the data effectively. The second issue relates to policy. As mentioned above, Colorado uses census data to allot representation in the State House of Representatives, so counts that are skewed toward rural counties would skew representation as well. However, representation is not the only issue that arises from deliberate miscounts. State aid dollars could be disproportionately allocated to less populated areas as a result of the upward skew in their population figures.
James Donahue is a student research assistant at the Business Research Division, University of Colorado Boulder.