🔒🔑 Privacy and LLMs (maybe a wacky idea)

As I've been tinkering with my journaling chat app, I've been thinking a lot about privacy and how to keep user data safe when using external AI services like OpenAI's API. One solution I've been considering is using Named Entity Recognition (NER) to mask sensitive data before sending it to the API. By using NER to identify and categorize important information in the text, we can replace personal details with more generic placeholders like "NAME:1" or "ADDRESS:1", keeping the data secure while still maintaining context.


Example: NER Case

Original Input: "My name is Alice, and I work at TechCorp. I live in New York City."

Input After Anonymization: "My name is NAME:1, and I work at ORG:1. I live in CITY:1."

AI Response: "Hello, NAME:1! It must be exciting to work at ORG:1 and live in CITY:1. I hope you're enjoying your time there."

Response After Filling in the Mask: "Hello, Alice! It must be exciting to work at TechCorp and live in New York City. I hope you're enjoying your time there."
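
Here's a minimal sketch of that round trip in Python. It is only an illustration: the entity list is hard-coded and stands in for the output of a real NER model, and the function names mask and unmask are my own, not part of any library. The call to the external API is left out, so the AI response is just a string.

```python
import re

# Matches placeholders such as NAME:1 or CITY:12.
PLACEHOLDER = re.compile(r"\b[A-Z]+:\d+\b")


def mask(text, entities):
    """Replace each detected entity with a numbered placeholder.

    `entities` is a list of (surface_text, label) pairs. In a real app this
    would come from an NER model; here it is supplied by hand (assumption).
    Returns the masked text and a placeholder -> original mapping.
    """
    mapping = {}
    counters = {}
    assigned = {}
    # Number placeholders in the order the entities were given.
    for surface, label in entities:
        if surface in assigned:
            continue
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"{label}:{counters[label]}"
        assigned[surface] = placeholder
        mapping[placeholder] = surface
    # Replace longer strings first so a short entity cannot split a longer one.
    for surface in sorted(assigned, key=len, reverse=True):
        text = text.replace(surface, assigned[surface])
    return text, mapping


def unmask(text, mapping):
    """Put the original values back; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


original = "My name is Alice, and I work at TechCorp. I live in New York City."
entities = [("Alice", "NAME"), ("TechCorp", "ORG"), ("New York City", "CITY")]

masked, mapping = mask(original, entities)
# Only `masked` would be sent to the external API; `mapping` stays local.
ai_response = "Hello, NAME:1! It must be exciting to work at ORG:1 and live in CITY:1."
restored = unmask(ai_response, mapping)

print(masked)
print(restored)
```

The important design point is that the mapping never leaves the user's side: the API only ever sees the placeholders.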


Looking ahead, I think there could be even more opportunities to ensure privacy in AI applications. For example, one could imagine a world where smaller language models running locally anonymize and de-anonymize data on the user's device, while we still benefit from larger foundation models through an LLM provider. This could create a more secure environment where sensitive data never has to leave a user's device, while still taking advantage of the power of AI.


Example: LLM Case

Original Input: "My name is Alice, and I work at TechCorp. I live in New York City."

Input After Anonymization: "My name is NAME:1, and I work at ORG:1<tags of similar organizations and descriptions>. I live in CITY:1<…>"

AI Response: "Hello, NAME:1! It must be exciting to work at ORG:1 and live in CITY:1. I hope you're enjoying your time there."

Response After Filling in the Mask: "Hello, Alice! It must be exciting to work at TechCorp and live in New York City. I hope you're enjoying your time there."
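
To make the tagging step a bit more concrete, here's a rough sketch of how placeholders could be annotated with generic hints before the text goes to the provider. Everything here is an assumption for illustration: the helper names annotate and strip_hints are made up, and the hint strings are hand-written examples of what a local model might produce, not the output of any real model.

```python
import re

# Matches placeholders such as ORG:1 or CITY:3.
PLACEHOLDER = re.compile(r"\b[A-Z]+:\d+\b")
# Matches a placeholder followed directly by a <...> annotation.
ANNOTATED = re.compile(r"(\b[A-Z]+:\d+)<[^<>]*>")


def annotate(masked_text, hints):
    """Append <hint> after every placeholder that has an entry in `hints`.

    `hints` maps a placeholder to a generic description. In the idea above
    a local model would generate these; here they are supplied by hand.
    """
    def add_hint(match):
        key = match.group(0)
        hint = hints.get(key)
        return f"{key}<{hint}>" if hint else key

    return PLACEHOLDER.sub(add_hint, masked_text)


def strip_hints(text):
    """Drop <...> annotations, in case the remote model echoes them back."""
    return ANNOTATED.sub(r"\1", text)


masked = "My name is NAME:1, and I work at ORG:1. I live in CITY:1."
# Illustrative hints only (assumption): deliberately vague descriptions.
hints = {"ORG:1": "technology company", "CITY:1": "large US city"}

annotated = annotate(masked, hints)
print(annotated)
print(strip_hints(annotated))
```

One trade-off to keep in mind: the more specific the hints, the more useful the context for the large model, but the easier it may become to guess the original value.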


These are just some small ideas, but I believe that privacy in AI applications is a crucial issue that needs more attention. If you have any thoughts or ideas on this topic, I'd love to chat and explore more solutions together. Ultimately, by prioritizing privacy and implementing techniques like NER-based masking, we can better protect users' data while still enjoying the benefits of AI services.