Generative AI (GenAI) models, such as ChatGPT, have become part of modern life, but they also pose a significant privacy risk. User prompts, which often contain sensitive
information necessary for a useful response, are processed on untrusted cloud services. Even if GenAI service providers are not malicious, this sensitive information can be learned and, in some cases, even indexed by search engines and made public, which is undesirable.
To address this problem, this project proposes a suite of lightweight sanitization mechanisms that minimize the risk of data exposure while preserving the usefulness of the prompt to the GenAI model. The suite of sanitization algorithms, including Hong Kong-centric format-preserving encryption, Large-Language-Model-utility-aware metric local differential privacy, and in-context-learning-based sanitization, will transform the user's prompt into a sanitized version within a local, trusted environment, which is typically equipped with limited computational resources. This approach offers multiple strategies to optimize the balance between privacy and response usefulness. Our work will contribute to a safer digital environment, empowering users and organizations to use GenAI technologies without compromising personal or proprietary data. The proposed solution will benefit anyone using GenAI, from individuals to large corporations, by mitigating the serious privacy risks associated with these services.
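The abstract does not specify the project's actual algorithms, but the idea of locally rewriting sensitive spans while keeping the prompt useful can be sketched with a minimal, illustrative example: detect a sensitive pattern (here, a hypothetical 8-digit Hong Kong phone number) and replace its digits with keyed pseudo-digits so the format survives. This is format-preserving masking under assumed patterns and a made-up keying scheme, not the project's encryption or differential-privacy mechanisms.

```python
import hashlib
import re


def _pseudo_digit(secret: str, position: str, digit: str) -> str:
    # Deterministically map one digit to another, keyed by a local secret
    # that never leaves the trusted environment.
    h = hashlib.sha256(f"{secret}:{position}:{digit}".encode()).digest()
    return str(h[0] % 10)


def sanitize(prompt: str, secret: str) -> str:
    """Replace digits in phone-number-like spans with keyed pseudo-digits,
    preserving length and punctuation. Illustrative only: real
    format-preserving encryption (e.g., NIST FF1) would be invertible
    and cryptographically sound."""
    def mask(match: re.Match) -> str:
        span = match.group(0)
        return "".join(
            _pseudo_digit(secret, str(i), c) if c.isdigit() else c
            for i, c in enumerate(span)
        )

    # Hypothetical pattern: 8-digit HK phone numbers, optionally hyphenated.
    return re.sub(r"\b\d{4}-?\d{4}\b", mask, prompt)
```

Because the masking is deterministic under one local secret, repeated mentions of the same number sanitize identically, so the GenAI model can still treat them as referring to the same entity.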
R&D Project Database
A Smart Sanitization Process for Enhancing Personal Data Privacy
| Overview |
| Project Reference | ITP/072/25LP |
| Project Coordinator | Dr CD Shum |
| Approved Funding Amount | HK$ 2.79M |
| Project Period | 01 Jan 2026 - 31 Dec 2026 |