Generative AI (GenAI) models, such as ChatGPT, have become part of modern life, but they also pose a significant privacy risk. User prompts, which often contain sensitive
information necessary for a useful response, are processed on untrusted cloud services. Even if GenAI service providers are not malicious, this sensitive information can be learned and, in some cases, even indexed by search engines and made public, which is undesirable.
To address this problem, this project proposes a suite of lightweight sanitization mechanisms that minimize the risk of data exposure while preserving the usefulness of the prompt to the GenAI model. The suite of sanitization algorithms, including Hong Kong-centric format-preserving encryption, Large-Language-Model-utility-aware metric local differential privacy, and in-context-learning-based sanitization, will transform the user's prompt into a sanitized version within a local, trusted environment, which is typically equipped with limited computational resources. This approach offers multiple strategies to optimize the balance between privacy and response usefulness. Our work will contribute to a safer digital environment, empowering users and organizations to use GenAI technologies without compromising personal or proprietary data. The proposed solution will benefit anyone using GenAI, from individuals to large corporations, by mitigating the serious privacy risks associated with these services.
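The abstract does not specify the project's actual algorithms, but the idea of locally rewriting sensitive spans while keeping the prompt useful can be sketched with a minimal, illustrative example: detect a sensitive pattern (here, a hypothetical 8-digit Hong Kong phone number) and replace its digits with keyed pseudo-digits so the format survives. This is format-preserving masking under assumed patterns and a made-up keying scheme, not the project's encryption or differential-privacy mechanisms.

```python
import hashlib
import re


def _pseudo_digit(secret: str, position: str, digit: str) -> str:
    # Deterministically map one digit to another, keyed by a local secret
    # that never leaves the trusted environment.
    h = hashlib.sha256(f"{secret}:{position}:{digit}".encode()).digest()
    return str(h[0] % 10)


def sanitize(prompt: str, secret: str) -> str:
    """Replace digits in phone-number-like spans with keyed pseudo-digits,
    preserving length and punctuation. Illustrative only: real
    format-preserving encryption (e.g., NIST FF1) would be invertible
    and cryptographically sound."""
    def mask(match: re.Match) -> str:
        span = match.group(0)
        return "".join(
            _pseudo_digit(secret, str(i), c) if c.isdigit() else c
            for i, c in enumerate(span)
        )

    # Hypothetical pattern: 8-digit HK phone numbers, optionally hyphenated.
    return re.sub(r"\b\d{4}-?\d{4}\b", mask, prompt)
```

Because the masking is deterministic under one local secret, repeated mentions of the same number sanitize identically, so the GenAI model can still treat them as referring to the same entity.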
R&D Project Database
A Smart Sanitization Process for Enhancing Personal Data Privacy
| Overview |
| Project Reference | ITP/072/25LP |
| Project Coordinator | Dr CD Shum |
| Approved Funding Amount | HK$ 2.79M |
| Project Period | 01 Jan 2026 - 31 Dec 2026 |