Key considerations for AI in finance: Data security & privacy

May 14, 2024

MindBridge

As finance professionals, it is an important part of our mandate to deploy technology to constantly drive quality, speed, and impact of what we do. At the same time, we must take a sober look at what technology can deliver. ROI, time-to-value and risk mitigations need to be quantifiable and convincing. AI is no exception.

AI is confronting us with several challenges as we learn to deploy and safely use AI-based tools in the CFO office. Specifically:

Data privacy & security: Finance teams are working with a lot of confidential data. How do we deploy AI tools in finance while not compromising data security and privacy?
Accuracy: We need to ensure minimal to no error in our processes and work product. How do we manage the risk of errors generated by AI tools?
Teams: AI holds the promise to vastly increase productivity of individuals but also to automate what today are human tasks. How do we manage this transition while keeping team morale and motivation high?
Productivity: Finance today plays an important part in operational execution, strategy, and decision-making. Adopting AI-based tools increases productivity and competitiveness while also introducing risks. How do we balance running a modern and highly productive finance function while not taking undue risks?

Explore key considerations for using AI in finance with a focus on data security and privacy. Learn how MindBridge AI navigates data privacy challenges and enhances ROI through innovative AI applications.

In this article, I want to give a practical overview of how to think about the specific challenges regarding data security and privacy when deploying AI in finance.

Key data privacy and security considerations when using AI tools

As is always the case when considering data privacy, it is helpful to keep in mind that perfect data privacy and security does not exist. Risks, commercial impact, and costs need to be assessed and weighted.

In the case of generative AI, and particularly Large Language Models (LLMs) such as OpenAI’s ChatGPT, Google’s Gemini or Anthropic’c Claude, several specific data privacy risks need to be considered.

1) AI data privacy concern 1: LLM privacy policies

When diving deeper into the details of specific vendor’s product features, privacy policies and public statements it becomes clear that:

The privacy policy is the best place to look for guidance: The privacy policy as published on the vendors site is the most relevant evidence of how they collect, use, store, and protect user data. It is considered legally binding. However, users should keep in mind that these policies can be changed by vendors at their own discretion at any time.
The situation remains fluid: As vendor’s products are evolving rapidly, so are their products, features, policies, and stance on essential considerations impacting data privacy.
Vendors can be ambiguous: A vendor’s communication, features and policies do not always seem to be aligned. As an example, ChatGPT at the time of writing is providing settings that suggest the ability for the user to opt for prompt deletion (Features X and Y), however the privacy policy gives ChatGPT the right to store prompt data for as long as they choose.

It is prudent to always take a conservative approach when interacting with LLMs.

2) AI data privacy concern 2: Data storage.

LLMs are by design the biggest data ingestion engines the world has ever seen. At the core, these models need to be “trained” on a huge amount of real-world data, and the amount and quality of the training data is a key determinant of the model’s performance.

As a result, it is in most LLM’s DNA to store and use for training whatever data is available. Most vendors therefore by default store “prompt data” indefinitely. This obviously does not sit well with many users, especially in a professional context. Some vendors therefore provide opt-outs that give the user more control over their input data. It is important to observe the nuance of these features:

Some LLMs do not only store chat prompts (meaning all the information the user inputs into the chat) but also a derivative of that information that allows the LLM to build a profile of the user to improve future answers to prompts. In the case of ChatGPT, both prompt deletion and the storage of “meta” information on the user need to be turned off separately from each other.
The opt-outs are sometimes hard to find. ChatPGTs opt out from training the model on the user’s prompt data is for example labelled “Improve the model for everyone”
The manual deletion of prompts does not necessarily result in an immediate deletion. Data may be present on the providers’ systems for several more weeks.

A look into the privacy policy of the specific LLM provides clarity on how the vendor stores, uses, and monetizes the data. I link below to the privacy policies of the 5 largest models.

3) AI data privacy concern 3: Data usage.

LLM (Large Language Models) providers are taking a “progressive” stance on intellectual property when it comes to training their models. The data hunger of these models is so massive that even the internet is deemed to have been exhausted by the largest and more advanced models by now. Most vendors by default use prompt data, i.e., the data provided by users when interacting with the model, to train their model.

Some vendors provide the option to stop the model from using the prompt data for training purposes by deactivating a feature

As stated above, most privacy policies currently in force do not mention opt-outs and provide the vendor instead with the right to use all the provided data for training purposes.
This does not mean that the prompt data is immediately deleted.

4) AI data privacy concern 4: Data breaches and leakage.

The risk of data breaches is not specific to LLMs but rather a risk most if not all software applications deal with. In the case of LLMs there is an additional, more subtle risk: Data Leakage. LLMs are trained on vast datasets that can include sensitive or personally identifiable information that has been input into the model as a prompt. There is a risk that the model could inadvertently generate outputs containing data it was trained on, potentially exposing confidential information.

How to safely use LLMs in finance

The data and privacy risks outlined above should be taken seriously, and a cautious approach seems in order. That said, there are a lot of applications for AI in finance that create a lot of value while mitigating these risks effectively.

The following are suggestions on how to safely use LLMs in finance:

1) Keep a critical eye on your LLM model’s privacy policy

As mentioned above, users of LLMs should not rely on the appearance of certain features or the declarations of management made to the press.

Rather:

Bookmark and monitor the privacy policy of the LLM you are using. See links to the privacy policy of all major LLMs at the end of this article.
Take a very conservative stance and plan for the uncertain future development of these policies. The vendors are often still in the initial stages of explosive growth and the recent history shows that product, features, practices, and policies do still evolve.

As a matter of principle, confidential information should not be shared with any LLM, under any circumstances. Most LLM’s privacy policies give the vendor the right to store and use the provided data indefinitely.

3) Use LLMs for tasks that do not require the disclosure of proprietary data

LLMs can create a lot of value in finance even when no proprietary data is disclosed.

Examples of applications in finance where LLMs are strong include:

Investment and market research & benchmarking
Tax, legal and Gaap support
Understanding, correcting, and writing Excel formulas

These are a few high-level examples, however training finance teams in use cases where LLMs can speed up, increase the quality, and/or automate certain tasks that do not require the disclosure of proprietary information can be very valuable.

4) Anonymize confidential data before use with an LLM

In most enterprises financial processes are seldom fully automated and manual data transformations are required. LLMs are powerful tools for the transformation of data. For example,

financial statements can be easily transformed to Excel
bank statements can be transformed to CSV
ledger entries can be cleansed and grouped for easy tracking and analysis.

While the data is confidential in nature, the analysis will work without any reference to the company, legal entity, or other qualifier that would allow a 3^rd party to link it back to the specific company or entity.

Once this information is removed, the risk of the data being linked back to the specific entity is low while the analysis is not compromised.

This approach should not be taken with highly confidential data such as core IP, personal data, and client data, while anonymized financial transaction data might be appropriate to use in this manner. This requires judgement. It is therefore vital to train the team well so they are equipped to navigate this safely.

5) Use sophisticated techniques and tools to obfuscate your data

Data obfuscation is a technique that modifies sensitive data to prevent unauthorized access by masking or altering its original form. Common methods such as masking, encryption, and tokenization help maintain data usability for legitimate processes while ensuring compliance with privacy laws. While this technique has been around for a long time, the rise of LLMs has sparked the development of new tools that act as a layer in between your data and the LLM, effectively obfuscating any data before it is transferred to the LLM. “Opaque” for example is a tool that aims at providing this service.

6) Create a policy and train your team on how to safely work with AI tools

Companies should develop and publish a policy and train their teams on the safe interaction with large language models (LLMs) primarily to safeguard sensitive information and ensure compliance with ethical standards. Training can help employees understand the capabilities and limitations of LLMs, preventing the misuse of the technology and mitigating risks associated with data privacy breaches. A well-defined policy guides employees on appropriate interactions and scenarios for LLM use, fostering responsible usage.

Furthermore, training empowers employees to effectively leverage LLMs for productivity without compromising the company’s integrity or the security of its data. The teams need to develop an intuition on where LLMs are strong and create value as well as what the risks are, and an initial training can help kickstart this process.

Look ahead: New privacy challenges on the horizon as generative AI tools integrate more tightly into our current tools

AI brings significant quality, productivity gains, and important security and privacy challenges. It is wise to take a conservative approach and never share non-anonymized confidential data with a Large Language Model. Today, generative AI is mostly accessed via a chat interface or API, which gives the user a lot of control over the provided data. As outlined above, several simple techniques exist to safely use and draw value from AI tools in finance today. Finance today.

A new challenge arises from the emergence of tools that integrate tightly into our current toolset and by doing so gain much deeper access to the data we work with.

Appendix: Links to privacy policies of major LLMs

Portions of this article have been informed by or had early-stage drafts generated by AI tool(s) and have been reviewed, edited, and clarified by my team and myself.

About the author: Matthias Steinberg is CFO of MindBridge.ai, a leader in AI driven financial risk discovery. Previously he was CFO of IONOS.com, Europe’s largest mass market hosting provider where he oversaw a multi-year effort to ready the company for its IPO. In the process, he developed an interest in AI technologies to transform risk management and assurance in the Enterprise.