Abstract
This paper addresses the dual goals of protecting privacy and supporting statistical inference from disseminated data by using the regrouped design. Protecting patient privacy by perturbing data is not difficult in itself; the challenge is to perturb the data so that privacy is protected while the released data remain useful for research. Under the regrouped design, the dataset is released with dummy groups that are associated with the actual groups through a pre-specified transition probability matrix. Small stagnation probabilities are recommended for the regrouped design, to achieve a small disclosure risk and higher power in hypothesis testing. The power of the test statistic in the released data increases as the stagnation probabilities depart from 0.5. The disclosure risk can be reduced further if more quasi-identifiers are relocated. An example from the National Health Insurance Research Database illustrates how the regrouped design can protect privacy while supporting statistical inference.
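The relabeling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `regroup`, the two-group matrix `P`, and the reading of "stagnation probability" as the diagonal entry of the matrix (the chance that a record keeps its actual group label) are all assumptions made for this example.

```python
import random

def regroup(actual_groups, transition, seed=None):
    """Draw a dummy group for each record from the row of `transition`
    indexed by the record's actual group (hypothetical helper)."""
    k = len(transition)
    for row in transition:
        # Each row must be a probability distribution over the k dummy groups.
        if len(row) != k or min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition must be a square row-stochastic matrix")
    rng = random.Random(seed)
    labels = range(k)
    return [rng.choices(labels, weights=transition[g])[0] for g in actual_groups]

# Assumed two-group example: the diagonal entries (0.2) play the role of
# stagnation probabilities, so most records are released under the other label.
P = [[0.2, 0.8],
     [0.8, 0.2]]
actual = [0] * 5000 + [1] * 5000
released = regroup(actual, P, seed=1)

# Empirical share of records whose released label equals the actual label.
stay = sum(a == d for a, d in zip(actual, released)) / len(actual)
print(f"share of records keeping their actual group: {stay:.3f}")
```

Because the matrix is pre-specified, an analyst who knows `P` can in principle account for the relabeling when drawing inferences, while a single released label by itself does not reveal a record's actual group with certainty.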
Original language | English |
---|---|
Pages (from-to) | 179-192 |
Number of pages | 14 |
Journal | Statistical Analysis and Data Mining |
Volume | 15 |
Issue number | 2 |
Publication status | Published - Apr 2022 |
Keywords
- disclosure risk
- regrouped design
- transition probability matrix
ASJC Scopus subject areas
- Analysis
- Information Systems
- Computer Science Applications