rss_2.0Proceedings on Privacy Enhancing Technologies FeedSciendo RSS Feed for Proceedings on Privacy Enhancing Technologies on Privacy Enhancing Technologies 's Cover Utility and Privacy of Demographic Data in Education Technology by Causal Analysis and Adversarial-Censoring<abstract> <title style='display:none'>Abstract</title> <p>Education technologies (EdTech) are becoming pervasive due to their cost-effectiveness, accessibility, and scalability. They also experienced accelerated market growth during the recent pandemic. EdTech collects massive amounts of students’ behavioral and (sensitive) demographic data, often justified by the potential to help students by personalizing education. Researchers voiced concerns regarding privacy and data abuses (e.g., targeted advertising) in the absence of clearly defined data collection and sharing policies. However, technical contributions to alleviating students’ privacy risks have been scarce. In this paper, we argue against collecting demographic data by showing that gender—a widely used demographic feature—does not <italic>causally</italic> affect students’ course performance: arguably the most popular target of predictive models. Then, we show that gender can be inferred from behavioral data; thus, simply leaving them out does not protect students’ privacy. Combining a feature selection mechanism with an adversarial censoring technique, we propose a novel approach to create a ‘private’ version of a dataset comprising of fewer features that predict the target without revealing the gender, and are interpretive. We conduct comprehensive experiments on a public dataset to demonstrate the robustness and generalizability of our mechanism.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Revisiting Identification Issues in GDPR ‘Right Of Access’ Policies: A Technical and Longitudinal Analysis<abstract> <title style='display:none'>Abstract</title> <p>Several data protection regulations permit individuals to request all personal information that an organization holds about them by utilizing Subject Access Requests (SARs). Prior work has observed the identification process of such requests, demonstrating weak policies that are vulnerable to potential data breaches. In this paper, we analyze and compare prior work in terms of methodologies, requested identification credentials and threat models in the context of privacy and cybersecurity. Furthermore, we have devised a longitudinal study in which we examine the impact of responsible disclosures by re-evaluating the SAR authentication processes of 40 organizations after they had two years to improve their policies. Here, we demonstrate that 53% of the previously vulnerable organizations have not corrected their policy and an additional 27% of previously non-vulnerable organizations have potentially weakened their policies instead of improving them, thus leaking sensitive personal information to potential adversaries. To better understand state-of-the-art SAR policies, we interviewed several Data Protection Officers and explored the reasoning behind their processes from a viewpoint in the industry and gained insights about potential criminal abuse of weak SAR policies. Finally, we propose several technical modifications to SAR policies that reduce privacy and security risks of data controllers.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Privacy-preserving training of tree ensembles over continuous data<abstract> <title style='display:none'>Abstract</title> <p>Most existing Secure Multi-Party Computation (MPC) protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical. In real-life applications, features are often numerical. The standard “in the clear” algorithm to grow decision trees on data with continuous values requires sorting of training examples for each feature in the quest for an optimal cut-point in the range of feature values in each node. Sorting is an expensive operation in MPC, hence finding secure protocols that avoid such an expensive step is a relevant problem in privacy-preserving machine learning. In this paper we propose three more efficient alternatives for secure training of decision tree based models on data with continuous features, namely: (1) secure discretization of the data, followed by secure training of a decision tree over the discretized data; (2) secure discretization of the data, followed by secure training of a random forest over the discretized data; and (3) secure training of extremely randomized trees (“extra-trees”) on the original data. Approaches (2) and (3) both involve randomizing feature choices. In addition, in approach (3) cut-points are chosen randomly as well, thereby alleviating the need to sort or to discretize the data up front. We implemented all proposed solutions in the semi-honest setting with additive secret sharing based MPC. In addition to mathematically proving that all proposed approaches are correct and secure, we experimentally evaluated and compared them in terms of classification accuracy and runtime. We privately train tree ensembles over data sets with thousands of instances or features in a few minutes, with accuracies that are at par with those obtained in the clear. This makes our solution more efficient than the existing approaches, which are based on oblivious sorting.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Understanding Privacy-Related Advice on Stack Overflow<abstract> <title style='display:none'>Abstract</title> <p>Privacy tasks can be challenging for developers, resulting in privacy frameworks and guidelines from the research community which are designed to assist developers in considering privacy features and applying privacy enhancing technologies in early stages of software development. However, how developers engage with privacy design strategies is not yet well understood. In this work, we look at the types of privacy-related advice developers give each other and how that advice maps to Hoepman’s privacy design strategies.</p> <p>We qualitatively analyzed 119 privacy-related accepted <italic>answers</italic> on <italic>Stack Overflow</italic> from the past five years and extracted 148 pieces of advice from these answers. We find that the advice is mostly around compliance with regulations and ensuring confidentiality with a focus on the <monospace>inform</monospace>, <monospace>hide</monospace>, <monospace>control</monospace>, and <monospace>minimize </monospace>of the Hoepman’s privacy design strategies. Other strategies, <monospace>abstract</monospace>, <monospace>separate</monospace>, <monospace>enforce</monospace>, and <monospace>demonstrate</monospace>, are rarely advised. Answers often include links to official documentation and online articles, highlighting the value of both official documentation and other informal materials such as blog posts. We make recommendations for promoting the under-stated strategies through tools, and detail the importance of providing better developer support to handle third-party data practices.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Comprehensive Analysis of Privacy Leakage in Vertical Federated Learning During Prediction<abstract> <title style='display:none'>Abstract</title> <p>Vertical federated learning (VFL), a variant of federated learning, has recently attracted increasing attention. An <italic>active party</italic> having the true labels jointly trains a model with other parties (referred to as <italic>passive parties</italic>) in order to use more features to achieve higher model accuracy. During the prediction phase, all the parties collaboratively compute the predicted confidence scores of each target record and the results will be finally returned to the active party. However, a recent study by Luo <italic>et al</italic>. [28] pointed out that the active party can use these confidence scores to reconstruct passive-party features and cause severe privacy leakage.</p> <p>In this paper, we conduct a comprehensive analysis of privacy leakage in VFL frameworks during the prediction phase. Our study improves on previous work [28] regarding two aspects. We first design a general gradient-based reconstruction attack framework that can be flexibly applied to simple logistic regression models as well as multi-layer neural networks. Moreover, besides performing the attack under the white-box setting, we give the first attempt to conduct the attack under the black-box setting. Extensive experiments on a number of real-world datasets show that our proposed attack is effective under different settings and can achieve at best twice or thrice of a reduction of attack error compared to previous work [28]. We further analyze a list of potential mitigation approaches and compare their privacy-utility performances. Experimental results demonstrate that privacy leakage from the confidence scores is a <italic>substantial</italic> privacy risk in VFL frameworks during the prediction phase, which cannot be simply solved by crypto-based confidentiality approaches. On the other hand, processing the confidence scores with information compression and randomization approaches can provide strengthened privacy protection.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00FP-Radar: Longitudinal Measurement and Early Detection of Browser Fingerprinting<abstract> <title style='display:none'>Abstract</title> <p>Browser fingerprinting is a stateless tracking technique that aims to combine information exposed by multiple different web APIs to create a unique identifier for tracking users across the web. Over the last decade, trackers have abused several existing and newly proposed web APIs to further enhance the browser fingerprint. Existing approaches are limited to detecting a specific fingerprinting technique(s) at a particular point in time. Thus, they are unable to systematically detect novel fingerprinting techniques that abuse different web APIs. In this paper, we propose FP-R<sc>adar</sc>, a machine learning approach that leverages longitudinal measurements of web API usage on top-100K websites over the last decade for early detection of new and evolving browser fingerprinting techniques. The results show that FP-R<sc>adar</sc> is able to early detect the abuse of newly introduced properties of already known (e.g., <monospace>WebGL</monospace>, <monospace>Sensor</monospace>) and as well as previously unknown (e.g., <monospace>Gamepad</monospace>, <monospace>Clipboard</monospace>) APIs for browser fingerprinting. To the best of our knowledge, FP-R<sc>adar</sc> is the first to detect the abuse of the <monospace>Visibility </monospace>API for ephemeral fingerprinting in the wild.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00PUBA: Privacy-Preserving User-Data Bookkeeping and Analytics<abstract> <title style='display:none'>Abstract</title> <p>In this paper we propose Privacy-preserving User-data Bookkeeping &amp; Analytics (PUBA), a building block destined to enable the implementation of business models (e.g., targeted advertising) and regulations (e.g., fraud detection) requiring user-data analysis in a privacy-preserving way. In PUBA, users keep an unlinkable but authenticated cryptographic logbook containing their historic data on their device. This logbook can only be updated by the operator while its content is not revealed. Users can take part in a privacy-preserving analytics computation, where it is ensured that their logbook is up-to-date and authentic while the potentially secret analytics function is verified to be privacy-friendly. Taking constrained devices into account, users may also outsource analytic computations (to a potentially malicious proxy not colluding with the operator).We model our novel building block in the Universal Composability framework and provide a practical protocol instantiation. To demonstrate the flexibility of PUBA, we sketch instantiations of privacy-preserving fraud detection and targeted advertising, although it could be used in many more scenarios, e.g. data analytics for multi-modal transportation systems. We implemented our bookkeeping protocols and an exemplary outsourced analytics computation based on logistic regression using the MP-SPDZ MPC framework. Performance evaluations using a smartphone as user device and more powerful hardware for operator and proxy suggest that PUBA for smaller logbooks can indeed be practical.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00How to prove any NP statement jointly? Efficient Distributed-prover Zero-Knowledge Protocols<abstract> <title style='display:none'>Abstract</title> <p>Traditional zero-knowledge protocols have been studied and optimized for the setting where a single prover holds the complete witness and tries to convince a verifier about a predicate on the witness, without revealing any additional information to the verifier. In this work, we study the notion of distributed-prover zero knowledge (DPZK) for arbitrary predicates where the witness is shared among multiple mutually distrusting provers and they want to convince a verifier that their shares together satisfy the predicate. We make the following contributions to the notion of distributed proof generation: (i) we propose a new MPC-style security definition to capture the adversarial settings possible for different collusion models between the provers and the verifier, (ii) we discuss new efficiency parameters for distributed proof generation such as the number of rounds of interaction and the amount of communication among the provers, and (iii) we propose a compiler that realizes distributed proof generation from the zero-knowledge protocols in the Interactive Oracle Proofs (IOP) paradigm. Our compiler can be used to obtain DPZK from arbitrary IOP protocols, but the concrete efficiency overheads are substantial in general. To this end, we contribute (iv) a new zero-knowledge IOP Graphene which can be compiled into an efficient DPZK protocol. The (D + 1)-DPZK protocol D-Graphene, with D provers and one verifier, admits <italic>O</italic>(<italic>N</italic><sup>1</sup><italic><sup>/c</sup></italic>) proof size with a communication complexity of <italic>O</italic>(D<sup>2</sup> ·(<italic>N</italic><sup>1−2</sup><italic><sup>/c</sup></italic>+<italic>N<sub>s</sub></italic>)), where <italic>N</italic> is the number of gates in the arithmetic circuit representing the predicate and <italic>N<sub>s</sub></italic> is the number of wires that depends on inputs from two or more parties. Significantly, only the distributed proof generation in D-Graphene requires interaction among the provers. D-Graphene compares favourably with the DPZK protocols obtained from the state-of-art zero-knowledge protocols, even those not modelled as IOPs.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases<abstract> <title style='display:none'>Abstract</title> <p>Organizations often collect private data and release aggregate statistics for the public’s benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms—and in turn the disclosure risk for people described in the dataset—requires a data curator to choose a value for a privacy budget parameter, <italic>ɛ</italic>. However, there is little formal guidance for choosing <italic>ɛ</italic>, a task that requires reasoning about the probabilistic privacy–utility tradeoff. Furthermore, choosing <italic>ɛ</italic> in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise.</p> <p>We present <bold>Vi</bold>sualizing <bold>P</bold>rivacy (ViP), an interactive interface that visualizes relationships between <italic>ɛ</italic>, accuracy, and disclosure risk to support setting and splitting <italic>ɛ</italic> among queries. As a user adjusts <italic>ɛ</italic>, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting <italic>ɛ</italic> using both ViP and a control. We find that ViP helps participants more correctly answer questions related to judging the probability of where a DP-noised release is likely to fall and comparing between DP-noised and non-private confidence intervals.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Who Knows I Like Jelly Beans? An Investigation Into Search Privacy<abstract> <title style='display:none'>Abstract</title> <p>Internal site search is an integral part of how users navigate modern sites, from restaurant reservations to house hunting to searching for medical solutions. Search terms on these sites may contain sensitive information such as location, medical information, or sexual preferences; when further coupled with a user’s IP address or a browser’s user agent string, this information can become very specific, and in some cases possibly identifying.</p> <p>In this paper, we measure the various ways by which search terms are sent to third parties when a user submits a search query. We developed a methodology for identifying and interacting with search components, which we implemented on top of an instrumented headless browser. We used this crawler to visit the Tranco top one million websites and analyzed search term leakage across three vectors: URL query parameters, payloads, and the <italic>Referer</italic> HTTP header. Our crawler found that 512,701 of the top 1 million sites had internal site search. We found that 81.3% of websites containing internal site search sent (or <italic>leaked</italic> from a user’s perspective) our search terms to third parties in some form. We then compared our results to the expected results based on a natural language analysis of the privacy policies of those leaking websites (where available) and found that about 87% of those privacy policies do not mention search terms explicitly. However, about 75% of these privacy policies seem to mention the sharing of some information with third-parties in a generic manner. We then present a few countermeasures, including a browser extension to warn users about imminent search term leakage to third parties. We conclude this paper by making recommendations on clarifying the privacy implications of internal site search to end users.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00CoverDrop: Blowing the Whistle Through A News App<abstract> <title style='display:none'>Abstract</title> <p>Whistleblowing is hazardous in a world of pervasive surveillance, yet many leading newspapers expect sources to contact them with methods that are either insecure or barely usable. In an attempt to do better, we conducted two workshops with British news organisations and surveyed whistleblowing options and guidelines at major media outlets. We concluded that the soft spot is a system for initial contact and trust establishment between sources and reporters. <italic>CoverDrop</italic> is a two-way, secure system to do this. We support secure messaging within a news app, so that all its other users provide cover traffic, which we channel through a threshold mix instantiated in a Trusted Execution Environment within the news organisation. CoverDrop is designed to resist a powerful global adversary with the ability to issue warrants against infrastructure providers, yet it can easily be integrated into existing infrastructure. We present the results from our workshops, describe CoverDrop’s design and demonstrate its security and performance.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Updatable Private Set Intersection<abstract> <title style='display:none'>Abstract</title> <p>Private set intersection (PSI) allows two mutually distrusting parties each with a set as input, to learn the intersection of both their sets without revealing anything more about their respective input sets. Traditionally, PSI studies the <italic>static</italic> setting where the computation is performed only once on both parties’ input sets. We initiate the study of updatable private set intersection (UPSI), which allows parties to compute the intersection of their private sets on a regular basis with sets that also constantly get updated. We consider two specific settings. In the first setting called <italic>UPSI with addition</italic>, parties can add new elements to their old sets. We construct two protocols in this setting, one allowing both parties to learn the output and the other only allowing one party to learn the output. In the second setting called <italic>UPSI with weak deletion</italic>, parties can additionally delete their old elements every <italic>t</italic> days. We present a protocol for this setting allowing both parties to learn the output. All our protocols are secure against semi-honest adversaries and have the guarantee that both the computational and communication complexity only grow with the set updates instead of the entire sets. Finally, we implement our UPSI with addition protocols and compare with the state-of-the-art PSI protocols. Our protocols compare favorably when the total set size is sufficiently large, the new updates are sufficiently small, or in networks with low bandwidth.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00RegulaTor: A Straightforward Website Fingerprinting Defense<abstract> <title style='display:none'>Abstract</title> <p>Website Fingerprinting (WF) attacks are used by local passive attackers to determine the destination of encrypted internet traffic by comparing the sequences of packets sent to and received by the user to a previously recorded data set. As a result, WF attacks are of particular concern to privacy-enhancing technologies such as Tor. In response, a variety of WF defenses have been developed, though they tend to incur high bandwidth and latency overhead or require additional infrastructure, thus making them difficult to implement in practice. Some lighter-weight defenses have been presented as well; still, they attain only moderate effectiveness against recently published WF attacks. In this paper, we aim to present a realistic and novel defense, RegulaTor, which takes advantage of common patterns in web browsing traffic to reduce both defense overhead and the accuracy of current WF attacks. In the closed-world setting, RegulaTor reduces the accuracy of the state-of-the-art attack, Tik-Tok, against comparable defenses from 66% to 25.4%. To achieve this performance, it requires 6.6% latency overhead and a bandwidth overhead 39.3% less than the leading moderate-overhead defense. In the open-world setting, RegulaTor limits a precision-tuned Tik-Tok attack to an <italic>F</italic><sub>1</sub>-score of. 135, compared to .625 for the best comparable defense.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Privacy-Preserving Positioning in Wi-Fi Fine Timing Measurement<abstract> <title style='display:none'>Abstract</title> <p>With the standardization of Wi-Fi Fine Timing Measurement (Wi-Fi FTM; IEEE 802.11mc), the IEEE introduced indoor positioning for Wi-Fi networks. To date, Wi-Fi FTM is the most widely supported Wi-Fi distance measurement and positioning system. In this paper, we perform the first privacy analysis of Wi-Fi FTM and evaluate devices from a wide variety of vendors. We find the protocol inherently leaks location-sensitive information. Most notably, we present techniques that allow any client to be localized and tracked by a solely passive adversary. We identify flaws inWi-Fi FTM MAC address randomization and present techniques to fingerprint stations with firmware-specific granularity further leaking client identity. We address these shortcomings and present a privacy-preserving passive positioning system that leverages existing Wi-Fi FTM infrastructure and requires no hardware changes. Due to the absence of any client-side transmission, our design hides the very existence of a client and as a side-effect improves overall scalability without compromising on accuracy. Finally, we present privacy-enhancing recommendations for the current and next-generation protocols such as Wi-Fi Next Generation Positioning (Wi-Fi NGP; IEEE 802.11az).</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Efficient Set Membership Proofs using MPC-in-the-Head<abstract> <title style='display:none'>Abstract</title> <p>Set membership proofs are an invaluable part of privacy preserving systems. These proofs allow a prover to demonstrate knowledge of a witness <italic>w</italic> corresponding to a secret element <italic>x</italic> of a public set, such that they jointly satisfy a given NP relation, <italic>i.e.</italic> ℛ(<italic>w, x</italic>) = 1 and <italic>x</italic> is a member of a public set {<italic>x</italic><sub>1</sub>, . . . , x<sub>𝓁</sub>}. This allows the identity of the prover to remain hidden, eg. ring signatures and confidential transactions in cryptocurrencies.</p> <p>In this work, we develop a new technique for efficiently adding logarithmic-sized set membership proofs to any MPC-in-the-head based zero-knowledge protocol (Ishai et al. [STOC’07]). We integrate our technique into an open source implementation of the state-of-the-art, post quantum secure zero-knowledge protocol of Katz et al. [CCS’18].We find that using our techniques to construct ring signatures results in signatures (based only on symmetric key primitives) that are between 5 and 10 times smaller than state-of-the-art techniques based on the same assumptions. We also show that our techniques can be used to efficiently construct post-quantum secure RingCT from only symmetric key primitives.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00User-Level Label Leakage from Gradients in Federated Learning<abstract> <title style='display:none'>Abstract</title> <p>Federated learning enables multiple users to build a joint model by sharing their model updates (gradients), while their raw data remains local on their devices. In contrast to the common belief that this provides privacy benefits, we here add to the very recent results on privacy risks when sharing gradients. Specifically, we investigate Label Leakage from Gradients (LLG), a novel attack to extract the labels of the users’ training data from their shared gradients. The attack exploits the direction and magnitude of gradients to determine the presence or absence of any label. LLG is simple yet effective, capable of leaking potential sensitive information represented by labels, and scales well to arbitrary batch sizes and multiple classes. We mathematically and empirically demonstrate the validity of the attack under different settings. Moreover, empirical results show that LLG successfully extracts labels with high accuracy at the early stages of model training. We also discuss different defense mechanisms against such leakage. Our findings suggest that gradient compression is a practical technique to mitigate the attack.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Increasing Adoption of Tor Browser Using Informational and Planning Nudges<abstract> <title style='display:none'>Abstract</title> <p>Browsing privacy tools can help people protect their digital privacy. However, tools which provide the strongest protections—such as Tor Browser—have struggled to achieve widespread adoption. This may be due to usability challenges, misconceptions, behavioral biases, or mere lack of awareness. In this study, we test the effectiveness of nudging interventions that encourage the adoption of Tor Browser. First, we test an informational nudge based on <italic>protection motivation theory</italic> (PMT), designed to raise awareness of Tor Browser and help participants form accurate perceptions of it. Next, we add an <italic>action planning</italic> implementation intention, designed to help participants identify opportunities for using Tor Browser. Finally, we add a <italic>coping planning</italic> implementation intention, designed to help participants overcome challenges to using Tor Browser, such as extreme website slowness. We test these nudges in a longitudinal field experiment with 537 participants. We find that our PMT-based intervention increased use of Tor Browser in both the short- and long-term. Our coping planning nudge also increased use of Tor Browser, but only in the week following our intervention. We did not find statistically significant evidence of our action planning nudge increasing use of Tor Browser. Our study contributes to a greater understanding of factors influencing the adoption of Tor Browser, and how nudges might be used to encourage the adoption of Tor Browser and similar privacy enhancing technologies.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Editors’ Introduction the Feasibility and Generalizability of Fingerprinting Internet of Things Devices<abstract> <title style='display:none'>Abstract</title> <p>In recent years, we have seen rapid growth in the use and adoption of Internet of Things (IoT) devices. However, some loT devices are sensitive in nature, and simply knowing what devices a user owns can have security and privacy implications. Researchers have, therefore, looked at fingerprinting loT devices and their activities from encrypted network traffic. In this paper, we analyze the feasibility of fingerprinting IoT devices and evaluate the robustness of such fingerprinting approach across multiple independent datasets — collected under different settings. We show that not only is it possible to effectively fingerprint 188 loT devices (with over 97% accuracy), but also to do so even with multiple instances of the same make-and-model device. We also analyze the extent to which temporal, spatial and data-collection-methodology differences impact fingerprinting accuracy. Our analysis sheds light on features that are more robust against varying conditions. Lastly, we comprehensively analyze the performance of our approach under an open-world setting and propose ways in which an adversary can enhance their odds of inferring additional information about unseen devices (e.g., similar devices manufactured by the same company).</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00Differentially Private Simple Linear Regression<abstract> <title style='display:none'>Abstract</title> <p>Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy <italic>differential privacy</italic>, a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the <italic>Opportunity Atlas</italic>, a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records—a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators—in particular, the median-based estimator of Theil and Sen—perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil–Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression.</p> </abstract>ARTICLE2022-03-03T00:00:00.000+00:00en-us-1