
Privacy-preserving explainable AI: a survey

  • Review
  • Open access
  • Published: 07 November 2024
  • Volume 68, article number 111101 (2025)

  • Thanh Tam Nguyen1
  • Thanh Trung Huynh2
  • Zhao Ren3
  • Thanh Toan Nguyen4
  • Phi Le Nguyen5
  • Hongzhi Yin6
  • Quoc Viet Hung Nguyen1

Abstract

As the adoption of explainable AI (XAI) continues to expand, the urgency to address its privacy implications intensifies. Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations. This article presents the first thorough survey of privacy attacks on model explanations and their countermeasures. Our contribution comprises a systematic analysis of research papers, organized by a connected taxonomy that categorizes privacy attacks and countermeasures according to the explanations they target. This work also includes an initial investigation into the causes of privacy leaks. Finally, we discuss unresolved issues and prospective research directions uncovered in our analysis. This survey aims to be a valuable resource for the research community and offers clear insights for those new to the domain. To support ongoing research, we have established an online resource repository that will be continuously updated with new and relevant findings.
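
For readers new to the area, the survey's two central objects can be made concrete with a small sketch. One countermeasure family covered here perturbs explanations with differential privacy before release, limiting what an attacker (for example, a membership-inference adversary) can learn about individual training records from a published explanation. The sketch below is ours, not the authors': the function name, sensitivity bound, and parameter values are illustrative assumptions, and it simply applies the Laplace mechanism to a feature-attribution vector.

```python
import numpy as np

def dp_attribution(attribution, l1_sensitivity, epsilon, rng=None):
    """Release a feature-attribution vector under epsilon-differential privacy
    by adding Laplace noise scaled to the vector's L1 sensitivity."""
    rng = rng or np.random.default_rng()
    attribution = np.asarray(attribution, dtype=float)
    scale = l1_sensitivity / epsilon  # Laplace-mechanism noise scale
    return attribution + rng.laplace(loc=0.0, scale=scale, size=attribution.shape)

# Hypothetical saliency scores for one prediction; the noise masks the
# per-record signal that explanation-based privacy attacks exploit.
saliency = [0.42, -0.13, 0.08, 0.31]
print(dp_attribution(saliency, l1_sensitivity=1.0, epsilon=0.5))
```

Note that the epsilon-DP guarantee only holds if the stated L1 sensitivity is actually enforced, for example by clipping attribution scores before release; tighter constructions are among the defences the survey reviews.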


Change history

  • 07 July 2025

    In this article, the missing funding note has been added.


Acknowledgements

This work was supported by ARC Discovery Early Career Researcher Award (Grant No. DE200101465) and ARC DP Project (Grant No. DP240101108).

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

  1. School of Information and Communication Technology, Griffith University, Gold Coast, QLD, 4215, Australia

    Thanh Tam Nguyen & Quoc Viet Hung Nguyen

  2. School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne, Lausanne, 1015, Switzerland

    Thanh Trung Huynh

  3. Faculty of Mathematics and Computer Science, University of Bremen, Bremen, 28359, Germany

    Zhao Ren

  4. Faculty of Information Technology, HUTECH University, Ho Chi Minh City, 70000, Vietnam

    Thanh Toan Nguyen

  5. Department of Computer Science, Hanoi University of Science and Technology, Hanoi, 10000, Vietnam

    Phi Le Nguyen

  6. School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, QLD, 4072, Australia

    Hongzhi Yin


Corresponding authors

Correspondence to Thanh Toan Nguyen, Hongzhi Yin or Quoc Viet Hung Nguyen.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Nguyen, T.T., Huynh, T.T., Ren, Z. et al. Privacy-preserving explainable AI: a survey. Sci. China Inf. Sci. 68, 111101 (2025). https://doi.org/10.1007/s11432-024-4123-4


  • Received: 04 April 2024

  • Revised: 26 June 2024

  • Accepted: 07 August 2024

  • Published: 07 November 2024

  • Version of record: 07 November 2024

  • DOI: https://doi.org/10.1007/s11432-024-4123-4


Keywords

  • privacy-preserving explainable AI
  • privacy attacks
  • privacy defences
  • PrivEx
  • PPXAI
