Regrouped design in privacy analysis for multinomial microdata

Shu Mei Wan, Wen Yaw Chung, Monica Mayeni Manurung, Kwang Hwa Chang, Chien Hua Wu

Research output: Contribution to journal › Article › peer-review

Abstract

This paper addresses the dual goals of protecting privacy and making statistical inferences from disseminated data by means of the regrouped design. Protecting patient privacy by perturbing data is not difficult in itself; the challenge is to perturb the data so that privacy is protected while the released data remain useful for research. Under the regrouped design, the dataset is released with dummy groups that are associated with the actual groups through a pre-specified transition probability matrix. Small stagnation probabilities are recommended for the regrouped design, as they yield both a small disclosure risk and a higher power in hypothesis testing. The power of the test statistic in the released data increases as the stagnation probabilities depart from 0.5. The disclosure risk can be reduced further if more quasi-identifiers are relocated. An example using the National Health Insurance Research Database illustrates how the regrouped design protects privacy while supporting statistical inference.
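
The mechanism described in the abstract, releasing a dummy group drawn from a pre-specified transition probability matrix, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the stagnation probability is read as the diagonal entry of the matrix (the chance that a record keeps its actual group), the remaining probability is spread evenly over the other groups, and the function names `make_transition_matrix`, `regroup`, and `estimate_actual_proportions` are hypothetical. The last function uses the standard randomized-response moment estimator, which may differ from the authors' inference procedure.

```python
import numpy as np

def make_transition_matrix(k, stagnation):
    """Build a k x k row-stochastic matrix (illustrative assumption).

    Diagonal entries equal the stagnation probability; the remaining
    mass in each row is split evenly across the other k - 1 groups.
    """
    if k < 2:
        raise ValueError("need at least two groups")
    off_diagonal = (1.0 - stagnation) / (k - 1)
    P = np.full((k, k), off_diagonal)
    np.fill_diagonal(P, stagnation)
    return P

def regroup(actual, P, rng):
    """Draw one dummy group per record from the row P[actual group]."""
    k = P.shape[0]
    cumulative = P.cumsum(axis=1)              # row-wise CDFs
    u = rng.random(len(actual))
    # Count how many CDF thresholds each uniform draw exceeds.
    dummy = (u[:, None] > cumulative[actual]).sum(axis=1)
    return np.minimum(dummy, k - 1)            # guard against rounding in the CDF

def estimate_actual_proportions(dummy, P):
    """Moment estimator: released proportions satisfy lam = P^T pi."""
    k = P.shape[0]
    lam = np.bincount(dummy, minlength=k) / len(dummy)
    return np.linalg.solve(P.T, lam)           # requires P to be invertible

# Synthetic demonstration (made-up proportions, not data from the paper).
rng = np.random.default_rng(0)
true_pi = np.array([0.5, 0.3, 0.2])
P = make_transition_matrix(k=3, stagnation=0.1)
actual = rng.choice(3, size=200_000, p=true_pi)
dummy = regroup(actual, P, rng)
pi_hat = estimate_actual_proportions(dummy, P)
print("share of records keeping their actual group:", (dummy == actual).mean())
print("estimated actual-group proportions:", pi_hat)
```

Under this uniform off-diagonal assumption with two groups, a stagnation probability of 0.5 makes the dummy group independent of the actual group, so the matrix is singular and the released labels carry no information about group membership. That is consistent with the abstract's statement that power increases as the stagnation probabilities depart from 0.5.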

Original language: English
Pages (from-to): 179-192
Number of pages: 14
Journal: Statistical Analysis and Data Mining
Volume: 15
Issue number: 2
Publication status: Published - Apr 2022

Keywords

  • disclosure risk
  • regrouped design
  • transition probability matrix

ASJC Scopus subject areas

  • Analysis
  • Information Systems
  • Computer Science Applications
