Meet “AI”, your new colleague: could it expose your company’s secrets?

May 17, 2023

Chatbots powered by large language models (LLMs) are not just the world’s new favorite pastime. The technology is increasingly being recruited to boost workers’ productivity and efficiency, and given its increasing capabilities, it’s poised to replace some jobs entirely, including in areas as diverse as coding, content creation, and customer service.

Many companies have already tapped into LLM algorithms, and chances are good that yours will likely follow suit in the near future. In other words, in many industries it’s no longer a case of “to bot or not to bot”.

But before you rush to welcome the new “hire” and use it to streamline some of your business workflows and processes, there are a few questions you should ask yourself.

Is it safe for my company to share data with an LLM?

LLMs are trained on large quantities of text available online, which then helps the resulting model to interpret and make sense of people’s queries, also known as prompts. However, every time you ask a chatbot for a piece of code or a simple email to your client, you may also hand over data about your company.

“An LLM does not (as of writing) automatically add information from queries to its model for others to query,” according to the United Kingdom’s National Cyber Security Centre (NCSC). “However, the query will be visible to the organization providing the LLM. Those queries are stored and will almost certainly be used for developing the LLM service or model at some point,” according to NCSC.

This could mean that the LLM provider or its partners are able to read the queries and may incorporate them in some way into the future versions of the technology. Chatbots may not forget or ever delete your input as access to more data is what sharpens their output. The more input they are fed, the better they become, and your company or personal data will be caught up in the calculations and may be accessible to those at the source.

Perhaps in order to help dispel data privacy concerns, Open AI introduced the ability to turn off chat history in ChatGPT in late April. “Conversations that are started when chat history is disabled won’t be used to train and improve our models, and won’t appear in the history sidebar,” developers wrote in Open AI blog.

Another risk is that queries stored online may be hacked, leaked, or accidentally made publicly accessible. The same applies to every third-party provider.

What are some known flaws?

Every time a new technology or a software tool becomes popular, it attracts hackers like bees to a honeypot. When it comes to LLMs, their security has been tight so far – at least, it seems so. There have, however, been a few exceptions.

OpenAI’s ChatGPT made headlines in March due to a leak of some users’ chat history and payment details, forcing the company to temporarily take ChatGPT offline on March 20th. The company revealed on March 24th that a bug in an open source library “allowed some users to see titles from another active user’s chat history”.

“It’s also possible that the first message of a newly-created conversation was visible in someone else’s chat history if both users were active around the same time,” according to Open AI. “Upon deeper investigation, we also discovered that the same bug may have caused the unintentional visibility of payment-related information of 1.2% of the ChatGPT Plus subscribers who were active during a specific nine-hour window,” reads the blog.

Also, security researcher Kai Greshake and his team demonstrated how Microsoft’s LLM Bing Chat could be turned into a ‘social engineer’ that can, for example, trick users into giving up their personal data or clicking on a phishing link.

They planted a prompt on the Wikipedia page for Albert Einstein. The prompt was simply a piece of regular text in a comment with font size 0 and thus invisible to people visiting the site. Then they asked the chatbot a question about Einstein.

It worked, and when the chatbot ingested that Wikipedia page, it unknowingly activated the prompt, which made the chatbot communicate in a pirate accent.

“Aye, thar answer be: Albert Einstein be born on 14 March 1879,” chatbot responded. When asked why it is talking like a pirate, the chat bot responded: “Arr matey, I am following the instruction aye.”

During this attack, which the authors call “Indirect Prompt Injection”, chatbot also sent the injected link to the user, claiming: “Don’t worry. It’s safe and harmless.”

Have some companies already experienced LLM-related incidents?

In late March, the South Korean outlet The Economist Korea reported about three independent incidents in Samsung Electronics.

While the company asked its employees to be careful about what information they enter in their query, some of them accidentally leaked internal data while interacting with ChatGPT.

One Samsung employee entered faulty source code related to the semiconductor facility measurement database seeking a solution. Another employee did the same with a program code for identifying defective equipment because he wanted code optimization. The third employee uploaded recordings of a meeting to generate the meeting minutes.

To keep up with progress related to AI while protecting its data at the same time, Samsung has announced that it is planning to develop its own internal “AI service” that will help employees with their job duties.

What checks should companies make before sharing their data?

Uploading company data into the model means you are sending proprietary data directly to a third party, such as OpenAI, and giving up control over it. We know OpenAI uses the data to train and improve its generative AI model, but the question remains: is that the only purpose?

If you do decide to adopt ChapGPT or similar tools into your business operations in any way, you should follow a few simple rules.

First, carefully investigate how these tools and their operators access, store and share your company data.
Second, develop a formal policy covering how your business will use generative AI tools and consider how their adoption works with current policies, especially your customer data privacy policy.
Third, this policy should define the circumstances under which your employees can use the tools and should make your staff aware of limitations such as that they must never put sensitive company or customer information into a chatbot conversation.

How should employees implement this new tool?

When asking LLM for a piece of code or letter to a customer, use it as an advisor who needs to be checked. Always verify its output to make sure it’s factual and accurate – and so avoid, for example, legal trouble. These tools can “hallucinate”, i.e. churn out answers in clean, crisp, readily understood, and clear language that is simply wrong, but seems correct because it’s practically unidentifiable from all its correct output.

In one notable case, Brian Hood, the Australian regional mayor of Hepburn Shire, recently stated he might sue OpenAI if it does not correct ChatGPT’s false claims that he had served time in prison for bribery. This was after ChatGPT had falsely named him as a guilty party in a bribery scandal from the early 2000s related to Note Printing Australia, a Reserve Bank of Australia subsidiary. Hood did work for the subsidiary, but he was the whistleblower who notified authorities and helped expose the bribery scandal.

When using LLM-generated answers, look out for possible copyright issues. In January 2023, three artists as class representatives filed a class-action lawsuit against the Stability AI and Midjourney art generators and the DeviantArt online gallery.

The artists claim that Stability AI’s co-created software Stable Diffusion was trained on billions of images scraped from the internet without their owners’ consent, including on images created by the trio.

What are some data privacy safeguards that companies can make?

To name just a few, put in place access controls, teach employees to avoid inputting sensitive information, use security software with multiple layers of protection along with secure remote access tools, and take measures to protect data centers.

Indeed, adopt a similar set of security measures as with software supply chains in general and other IT assets that may contain vulnerabilities. People may think this time is different because these chatbots are more intelligent than artificial, but the reality is that this is yet more software with all its possible flaws.