Journal of Official Statistics Feed

Book Review: . . 2021 Wiley, ISBN: 978-1-119-37168-7, 624 pp.

Sun, 10 Dec 2023 00:00:00 GMT

Temporally Consistent Present Population from Mobile Network Signaling Data for Official Statistics

Sun, 10 Dec 2023 00:00:00 GMT

Mobile network data records are promising for measuring temporal changes in present populations. This promise has been boosted since high-frequency passively-collected signaling data became available. Its temporal event rate is considerably higher than that of Call Detail Records – on which most of the previous literature is based. Yet, we show it remains a challenge to produce statistics consistent over time, robust to changes in the “measuring instruments” and conveying spatial uncertainty to the end user. In this article, we propose a methodology to estimate – consistently over several months – hourly population presence over France based on signaling data spatially merged with fine-grained official population counts. We draw particular attention to consistency at several spatial scales and over time and to spatial mapping reflecting spatial accuracy. We compare the results with external references and discuss the challenges which remain. We argue data fusion approaches between fine-grained official statistics data sets and mobile network data, spatially merged to preserve privacy, are promising for future methodologies.

Small Area Estimates of Poverty Incidence in Costa Rica under a Structure Preserving Estimation (SPREE) Approach

Sun, 10 Dec 2023 00:00:00 GMT

Obtaining reliable estimates in small areas is a challenge because of the coverage and periodicity of data collection. Several techniques of small area estimation have been proposed to produce quality measures in small areas, but few of them are focused on updating these estimates. By combining the attributes of the most recent versions of the structure-preserving estimation methods, this article proposes a new alternative to estimate and update cross-classified counts for small domains, when the variable of interest is not available in the census. The proposed methodology is used to obtain and update estimates of the incidence of poverty in 81 Costa Rican cantons for six postcensal years (2012–2017). As uncertainty measures, mean squared errors are estimated via parametric bootstrap, and the adequacy of the proposed method is assessed with a design-based simulation.
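As background for the abstract above: classical structure-preserving estimation is usually carried out with iterative proportional fitting (IPF), which rescales an old cross-classified table to new margins while keeping its interaction structure. The sketch below illustrates only that classical step, not the newer variant the article proposes. The function name `ipf` and all numbers are hypothetical.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale `seed` so its row and column sums match the target margins."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row to its target total, then each column to its target total.
        table *= (row_targets / table.sum(axis=1))[:, None]
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns match exactly after the last step; stop once rows match too.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return table


# Hypothetical census table: 3 areas x 2 categories (poor, non-poor).
census = np.array([[20.0, 80.0], [30.0, 70.0], [10.0, 90.0]])
# Hypothetical postcensal margins; both sets of totals sum to 305.
area_totals = np.array([110.0, 100.0, 95.0])
category_totals = np.array([70.0, 235.0])

updated = ipf(census, area_totals, category_totals)
```

The updated table reproduces the new margins, and the cross-product ratios of the census table are unchanged. That invariance is the sense in which the structure is preserved.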

Editorial Collaborators

Sun, 10 Dec 2023 00:00:00 GMT

Application of Sampling Variance Smoothing Methods for Small Area Proportion Estimation

Sun, 10 Dec 2023 00:00:00 GMT

Sampling variance smoothing is an important topic in small area estimation. In this article, we propose sampling variance smoothing methods for small area proportion estimation. In particular, we consider the generalized variance function and design effect methods for sampling variance smoothing. We evaluate and compare the smoothed sampling variances and small area estimates based on the smoothed variance estimates through analysis of survey data from Statistics Canada. The results from real data analysis and simulation study indicate that the proposed sampling variance smoothing methods perform very well for small area estimation.
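One common form of design-effect smoothing can be sketched as follows; the article's exact estimators may differ. Each area's direct variance estimate is divided by the simple-random-sampling variance p(1 - p)/n to give an area-level design effect. The design effects are averaged, and the average is multiplied back onto each area's simple-random-sampling variance. The function name and the input values are hypothetical.

```python
import numpy as np


def smooth_variances_deff(p_hat, n, v_direct):
    """Smooth direct variance estimates of proportions with an average design effect."""
    p_hat = np.asarray(p_hat, dtype=float)
    n = np.asarray(n, dtype=float)
    v_direct = np.asarray(v_direct, dtype=float)
    # Variance each proportion would have under simple random sampling.
    srs_var = p_hat * (1.0 - p_hat) / n
    # Area-level design effects, then their average across areas.
    deff = v_direct / srs_var
    deff_bar = deff.mean()
    # Smoothed variance: average design effect times the SRS variance.
    return deff_bar * srs_var


# Hypothetical direct estimates for two small areas.
smoothed = smooth_variances_deff(
    p_hat=[0.2, 0.5], n=[100, 50], v_direct=[0.0032, 0.005]
)
```

In this toy input the area-level design effects are 2 and 1, so both areas are smoothed with an average design effect of 1.5.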

Small Area with Multiply Imputed Survey Data

Sun, 10 Dec 2023 00:00:00 GMT

In this article, we propose a framework for small area estimation with multiply imputed survey data. Many statistical surveys suffer from (a) high nonresponse rates due to sensitive questions and response burden and (b) too small sample sizes to allow for reliable estimates on (unplanned) disaggregated levels due to budget constraints. One way to deal with missing values is to replace them by several plausible/imputed values based on a model. Small area estimation, such as the model by Fay and Herriot, is applied to estimate regionally disaggregated indicators when direct estimates are imprecise. The framework presented tackles simultaneously multiply imputed values and imprecise direct estimates. In particular, we extend the general class of transformed Fay-Herriot models to account for the additional uncertainty from multiple imputation. We derive three special cases of the Fay-Herriot model with particular transformations and provide point and mean squared error estimators. Depending on the case, the mean squared error is estimated by analytic solutions or resampling methods. Comprehensive simulations in a controlled environment show that the proposed methodology leads to reliable and precise results in terms of bias and mean squared error. The methodology is illustrated by a real data example using European wealth data.
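The additional uncertainty from multiple imputation is conventionally captured with Rubin's combining rules. The sketch below applies those standard rules to direct estimates computed on each of M imputed data sets. It does not reproduce the article's extension of the transformed Fay-Herriot model. The function name and the example values are hypothetical.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool M imputation-specific estimates and variances with Rubin's rules.

    Both inputs have the imputation index on axis 0.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    q_bar = q.mean(axis=0)           # pooled point estimate
    within = u.mean(axis=0)          # average within-imputation variance
    between = q.var(axis=0, ddof=1)  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical direct estimates for one area from M = 3 imputed data sets.
pooled_estimate, pooled_variance = rubin_pool([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

Here the within-imputation variance is 1 and the between-imputation variance is 4, so the pooled variance is 1 + (4/3) × 4. This shows how imputation uncertainty inflates the sampling variance that would feed an area-level model.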

Answering Current Challenges of and Changes in Producing Official Time Use Statistics Using the Data Collection Platform MOTUS

Sun, 10 Dec 2023 00:00:00 GMT

The modernization of the production of official statistics faces challenges related to technological developments, budget cuts, and growing privacy concerns. At the same time, there is a need for shareable and scalable platforms to support comparable data, leading to several online data collection strategies being rolled out. Time Use Surveys (TUS) are particularly affected by these challenges and needs as they (while producing rich data) are complex, time-intensive studies (because they include multiple tasks and are administered at the household level). This article introduces the Modular Online Time Use Survey (MOTUS) data collection platform and explains how it accommodates the challenges of and changes in the production of a TUS that is carried out in line with the Harmonized European Time Use Survey guidelines. It argues that MOTUS supports a shift in the methodological paradigm of conducting TUS by being timelier and more cost efficient, by lowering respondent burden, and by improving the reliability of the data collected. Importantly, the modular structure allows MOTUS to be easily deployed for various TUS configurations. Moreover, this versatile structure allows comparable, complex diary surveys (such as the household budget survey) to be performed on the same platform and with the same applications.

Block Weighted Least Squares Estimation for Nonlinear Cost-based Split Questionnaire Design

Sun, 10 Dec 2023 00:00:00 GMT

In this study, we advocate a two-stage framework to deal with the issues encountered in surveys with long questionnaires. In Stage I, we propose a split questionnaire design (SQD) developed by minimizing a quadratic cost function while achieving reliability constraints on estimates of means, which effectively reduces the survey cost, alleviates the burden on the respondents, and potentially improves data quality. In Stage II, we develop a block weighted least squares (BWLS) estimator of linear regression coefficients that can be used with data obtained from the SQD obtained in Stage I. Numerical studies comparing existing methods strongly favor the proposed estimator in terms of prediction and estimation accuracy. Using the European Social Survey (ESS) data, we demonstrate that the proposed SQD can substantially reduce the survey cost and the number of questions answered by each respondent, and the proposed estimator is much more interpretable and efficient than present alternatives for the SQD data.

Index to Volume 39, 2023

Sun, 10 Dec 2023 00:00:00 GMT

A Rejoinder to Garfinkel (2023) – Legacy Statistical Disclosure Limitation Techniques for Protecting 2020 Decennial US Census: Still a Viable Option

Thu, 07 Sep 2023 00:00:00 GMT

In our article “Database Reconstruction Is Not So Easy and Is Different from Reidentification”, we show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques, also sometimes called legacy statistical disclosure limitation (SDL) techniques. Furthermore, we also point out that, even if reconstruction can be performed, it does not imply reidentification. Hence, the risk of reconstruction does not seem to warrant replacing traditional SDC techniques with differential privacy (DP) based protection. In “Legacy Statistical Disclosure Limitation Techniques Were Not an Option for the 2020 US Census of Population and Housing”, by Simson Garfinkel, the author insists that the 2020 Census move to DP was justified. In our view, this latter article contains some misconceptions that we identify and discuss in some detail below. Consequently, we stand by the arguments given in “Database Reconstruction Is Not So Easy…”.

Letter to Editor Quality of 2017 Population Census of Pakistan by Age and Sex

Thu, 07 Sep 2023 00:00:00 GMT

This Letter to Editor is a supplement to the previously published article in the Journal of Official Statistics (Wazir and Goujon 2021).

In 2021, a reconstruction method using demographic analysis was applied to assess the quality and validity of the 2017 census data by critically investigating the demographic changes in the intercensal period at national and provincial levels. However, at the time the article was written, the age and sex structure of the population from the 2017 census had not yet been published, making it hard to fully appreciate the reconstruction of the national and subnational level populations.

In the meantime, detailed data have become available and offer the possibility to assess the reconstruction’s outcome in more detail. Therefore, the aim of this letter is twofold: (1) to analyze the quality of the age and sex distribution in the 2017 Population Census of Pakistan, and (2) to compare the reconstruction by age and sex to the results of the 2017 population census. Our results reveal that the age and sex structure of the population as estimated by the 2017 census suffers from some irregularities. Our analysis by age and sex reinforces the main conclusion of the previous article that the next census in Pakistan should increase in quality with an inbuilt post-enumeration survey along with post-census demographic analysis.

Comment to Muralidhar and Domingo-Ferrer (2023) – Legacy Statistical Disclosure Limitation Techniques Were Not An Option for the 2020 US Census of Population And Housing

Thu, 07 Sep 2023 00:00:00 GMT

The article “Database Reconstruction Is Not So Easy and Is Different from Reidentification”, by Krish Muralidhar and Josep Domingo-Ferrer, is an extended attack on the decision of the U.S. Census Bureau to turn its back on legacy statistical disclosure limitation techniques and instead use a bespoke algorithm based on differential privacy to protect the published data products of the Census Bureau’s 2020 Census of Population and Housing (henceforth referred to as the 2020 Census). This response explains why differential privacy was the only realistic choice for protecting sensitive data collected for the 2020 Census. However, differential privacy has a social cost: it requires that practitioners admit that there is inherently a trade-off between the utility of published official statistics and the privacy loss of those whose data are collected under a pledge of confidentiality.

Towards Demand-Driven On-The-Fly Statistics

Thu, 07 Sep 2023 00:00:00 GMT

A prototype of a question answering (QA) system, called Farseer, for the real-time calculation and dissemination of aggregate statistics is introduced. Using techniques from natural language processing (NLP), machine learning (ML), artificial intelligence (AI) and formal semantics, this framework is capable of correctly interpreting a written request for (aggregate) statistics and subsequently generating appropriate results. It is shown that the framework operates in a way that is independent of a specific statistical domain under consideration, by capturing domain specific information in a knowledge graph that is input to the framework. However, it is also shown that the prototype still has its limitations, lacking statistical disclosure control. Also, searching the knowledge graph is still time-consuming.

Database Reconstruction Is Not So Easy and Is Different from Reidentification

Thu, 07 Sep 2023 00:00:00 GMT

In recent years, it has been claimed that releasing accurate statistical information on a database is likely to allow its complete reconstruction. Differential privacy has been suggested as the appropriate methodology to prevent these attacks. These claims have recently been taken very seriously by the U.S. Census Bureau and led them to adopt differential privacy for releasing U.S. Census data. This in turn has caused consternation among users of the Census data due to the lack of accuracy of the protected outputs. It has also brought legal action against the U.S. Department of Commerce. In this article, we trace the origins of the claim that releasing information on a database automatically makes it vulnerable to being exposed by reconstruction attacks and we show that this claim is, in fact, incorrect. We also show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques. We further show that the geographic level at which exact counts are released is even more relevant to protection than the actual SDC method employed. Finally, we caution against confusing reconstruction and reidentification: using the quality of reconstruction as a metric of reidentification results in exaggerated reidentification risk figures.

Looking for a New Approach to Measuring the Spatial Concentration of the Human Population

Thu, 07 Sep 2023 00:00:00 GMT

In this article, a new approach for measuring the spatial concentration of the human population is presented and tested. The new procedure is based on the concept of concentration introduced by Gini and, at the same time, on its spatial extension (i.e., taking into account the concept of spatial autocorrelation, polarization). The proposed indicator, the Spatial Gini Index, is then computed by using two different kinds of territorial partitioning methods: MaxMin (MM) and the Constant Step (CS) distance. In this framework, an ad hoc extension of the Rey and Smith decomposition method is then introduced. We apply this new approach to the Italian and foreign population resident in almost 7,900 statistical units (Italian municipalities) in 2002, 2010 and 2018. All elaborations are based on a new ad hoc library developed and implemented in Python.
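For orientation, the classical (non-spatial) Gini concentration index on which the approach builds can be computed as in the sketch below. The Spatial Gini Index, the MaxMin and Constant Step partitions, and the extended Rey and Smith decomposition are not reproduced here. The function is not part of the authors' library, and its name and inputs are hypothetical.

```python
import numpy as np


def gini(values):
    """Classical Gini concentration index of non-negative values (0 = equal shares)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n
    return 2.0 * np.sum(ranks * x) / (n * total) - (n + 1.0) / n


# Hypothetical population counts in four territorial units.
evenly_spread = gini([1000, 1000, 1000, 1000])
fully_concentrated = gini([0, 0, 0, 4000])
```

With this formula an even spread gives 0, and a population located entirely in one of n units gives (n - 1)/n.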

A Note on the Optimum Allocation of Resources to Follow up Unit Nonrespondents in Probability Surveys

Thu, 07 Sep 2023 00:00:00 GMT

Common practice for addressing nonresponse in probability surveys at National Statistical Offices is to follow up every nonrespondent with a view to lifting response rates. As response rate is an insufficient indicator of data quality, it is argued that one should follow up nonrespondents with a view to reducing the mean squared error (MSE) of the estimator of the variable of interest. In this article, we propose a method to allocate the nonresponse follow-up resources in such a way as to minimise the MSE under a quasi-randomisation framework. An example to illustrate the method using the 2018/19 Rural Environment and Agricultural Commodities Survey from the Australian Bureau of Statistics is provided.

Predicting Days to Respondent Contact in Cross-Sectional Surveys Using a Bayesian Approach

Thu, 07 Sep 2023 00:00:00 GMT

Surveys estimate and monitor a variety of data collection parameters, including response propensity, number of contacts, and data collection costs. These parameters can be used as inputs to a responsive/adaptive design or to monitor the progression of a data collection period against predefined expectations. Recently, Bayesian methods have emerged as a way of combining historical information or external data with data from the in-progress data collection period to improve prediction. We develop a Bayesian method for predicting a measure of case-level progress or productivity, the estimated time lag, in days, between first contact attempt and first respondent contact. We compare the quality of predictions from the Bayesian method to predictions generated from more commonly used predictive methods that leverage data from only historical data collection periods or the in-progress round of data collection. Using prediction error and misclassification as short- or long-day lags, we demonstrate that the Bayesian method results in improved predictions close to the day of the first contact attempt, when these predictions may be most informative for interventions or interviewer feedback. This application adds to evidence that combining historical and current information about data collection, in a Bayesian framework, can improve predictions of data collection parameters.

Constructing Building Price Index Using Administrative Data

Fri, 09 Jun 2023 00:00:00 GMT

Improving the accuracy of deflators is crucial for measuring real GDP and growth rates. However, construction prices are often difficult to measure. This study uses the stratification and hedonic methods to estimate price indices. The estimated indices are based on the actual transaction prices of buildings (contract prices) obtained from the Statistics on Building Starts survey information from the administrative sector in Japan. Compared with the construction cost deflator (CCD), calculated by compounding input costs, the estimated output price indices show higher rates of increase during the economic expansion phase after 2013. This suggests that the profit surge in the construction sector observed in that period is not fully reflected in the CCD. Furthermore, the difference between the two “output-type” indices obtained by stratification and hedonic methods shrinks when the estimation methods are precisely configured.

Design and Sample Size Determination for Experiments on Nonresponse Followup using a Sequential Regression Model

Fri, 09 Jun 2023 00:00:00 GMT

Statistical agencies depend on responses to inquiries made to the public, and occasionally conduct experiments to improve contact procedures. Agencies may wish to assess whether there is significant change in response rates due to an operational refinement. This work considers the assessment of response rates when up to L attempts are made to contact each subject, and subjects receive one of J possible variations of the operation under experimentation. In particular, the continuation-ratio logit (CRL) model facilitates inference on the probability of success at each step of the sequence, given that failures occurred at previous attempts. The CRL model is investigated as a basis for sample size determination, one of the major decisions faced by an experimenter, to attain a desired power under a Wald test of a general linear hypothesis. An experiment that was conducted for nonresponse followup in the United States 2020 decennial census provides a motivating illustration.
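The continuation-ratio structure described above can be illustrated with a small sketch. Given a linear predictor for each attempt, the inverse logit gives the probability of success at that attempt conditional on failure at all earlier ones. Multiplying by the probability of having failed so far gives the unconditional probability of first success at each attempt. The linear predictor values below are hypothetical, and the article's parameterization across the J treatment variations is not reproduced.

```python
import numpy as np


def crl_probabilities(eta):
    """Turn per-attempt linear predictors into continuation-ratio probabilities.

    Returns (conditional success probabilities, unconditional probabilities of
    first success at each attempt, probability of no success in any attempt).
    """
    eta = np.asarray(eta, dtype=float)
    # P(success at attempt l | failure at attempts 1..l-1) via the inverse logit.
    conditional = 1.0 / (1.0 + np.exp(-eta))
    # Probability of having failed at every attempt before attempt l.
    failed_before = np.concatenate(([1.0], np.cumprod(1.0 - conditional)[:-1]))
    first_success = conditional * failed_before
    never = float(np.prod(1.0 - conditional))
    return conditional, first_success, never


# Hypothetical linear predictors for L = 3 contact attempts under one treatment.
conditional, first_success, never = crl_probabilities([0.0, 0.0, 0.0])
overall_response_rate = 1.0 - never
```

With all linear predictors at zero, each conditional probability is 0.5. The probabilities of first success are then 0.5, 0.25 and 0.125, and 0.125 is left for subjects never reached.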

Effects of Changing Modes on Item Nonresponse in Panel Surveys

Fri, 09 Jun 2023 00:00:00 GMT

To investigate the effect of a change from the telephone to the web mode on item nonresponse in panel surveys, we use experimental data from a two-wave panel survey. The treatment group changed from the telephone to the web mode after the first wave, while the control group continued in the telephone mode. We find that when changing to the web, “don’t know” answers increase moderately from a low level, while item refusal increases substantially from a very low level. This is the case for all person groups, although socio-demographic characteristics have some additional effects on giving a don’t know or a refusal when changing mode.