The rise of generative AI has led to a plethora of publicly accessible artificial intelligence tools, but what are the risks when external AI tools are used with corporate data?
Since OpenAI launched ChatGPT in November 2022, interest in generative artificial intelligence (GenAI) tools has increased dramatically. The ability of these tools to generate a response to a question or request has seen them used for a variety of purposes, from writing emails to underpinning chatbots.
The recent Work trend index report by Microsoft, based on a survey of more than 31,000 professional employees, shows that 75% of knowledge workers are now using some form of GenAI in their jobs, and nearly half of those surveyed started using it within the past six months. However, nearly 80% of those using GenAI are bringing their own AI to work, and the percentage increases slightly when focusing on small businesses. It is worth noting that this adoption is not just by younger users, who are typically more likely to embrace new technology, but by users of all ages.
As more information is generated and needs to be processed, we increasingly struggle with what is known as digital debt. An example of this is email overload. The Microsoft report notes that approximately 85% of emails are read in less than 15 seconds – this shows why people are keen to move towards tools that help streamline the mundane tasks in their working lives.
“There is this digital debt that has built up over decades, but it has been accelerated during the pandemic,” says Nick Hedderman, senior director of the modern work business group for Microsoft. “68% of the people we spoke to said they’re struggling with the volume and pace of work. Nearly 50% said they feel burnt out.”
The generative AI tools that professionals typically use are those found on smartphones (such as Galaxy AI) or on the internet (such as ChatGPT). Unfortunately, because these tools are publicly available rather than provided by the organisation, they are outside of corporate oversight. Furthermore, when an online tool is free, the user is frequently the product, as their information is usable by others.
“If it’s free, you need to think about it in the same way as any social media site. What data is it being trained on? In essence, are you now the commodity?” says Sarah Armstrong-Smith, chief of security for Microsoft. “Whatever you put in, is that going into training models? How are you verifying that data is held securely and not being utilised for other purposes?”
More than anything else, the use of external generative tools is a data governance challenge, rather than a GenAI problem, as it relies on shadow IT – hardware or software used in an organisation that is not overseen by the IT department.
“You’ve always had sanctioned versus unsanctioned applications. You’ve always had challenges with data sharing across the cloud platforms,” says Armstrong-Smith. “If it’s that easy to cut and paste something out of any corporate system and put it into a cloud application, irrespective if it’s a generative AI app or any other app, you have a problem with data governance and data leakage. The fundamental issues of data control, data governance and all of those things don’t go away. In fact, what it’s highlighted is the lack of governance and control.”
Data governance
The data governance problem of using external generative AI tools is twofold.
First, there is data leakage, where users are copying potentially confidential information and pasting it into an online tool that they have no control over. This data could be accessed by others and used in the training of AI tools.
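As a rough illustration of how this kind of leakage might be caught before text leaves the organisation, the sketch below scans a piece of text for a couple of tell-tale patterns. It is a minimal sketch, not a data loss prevention product: the pattern list and the `find_sensitive` function are hypothetical and are not described in the Microsoft report or by the interviewees.

```python
import re

# Hypothetical patterns for illustration only; a real policy would be
# defined by the organisation's own data governance rules.
SENSITIVE_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "confidentiality marking": re.compile(
        r"\b(?:confidential|internal only)\b", re.IGNORECASE
    ),
}


def find_sensitive(text: str) -> list[str]:
    """Return the labels of every pattern that appears in the text."""
    return [
        label
        for label, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    ]


if __name__ == "__main__":
    draft = "Summarise this CONFIDENTIAL note from jane.doe@example.com"
    hits = find_sensitive(draft)
    if hits:
        print("Review before pasting into an external tool:", ", ".join(hits))
```

A check like this only flags obvious markers. It does not replace the governance controls discussed in the rest of this article.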
Second, there is leakage into an organisation, when unverified and uncorroborated information is added to its knowledge base. Users all too often assume that the information provided by an external GenAI tool is correct and appropriate – they do not corroborate the data to ensure it is factually accurate, which they would be more likely to do when searching for information on the internet.
“The danger is, if you take a random dataset that you have not verified and don’t know what it’s trained on, and then bring that dataset into a corporate environment or vice versa, you can even poison the actual model or the algorithm because you’re introducing non-verified data into the corporate dataset,” says Armstrong-Smith.
The latter is the more serious problem, as potentially incorrect or misleading data is incorporated into a knowledge base and used to inform decision-making processes. It could also poison datasets that are used to train in-house AI, thereby causing the AI to give misleading or incorrect information.
We have already seen instances of improperly used GenAI tools leading to poor results. Generative AI is being trialled within the legal profession as a possible tool to assist in writing legal documents. In one instance, a lawyer used ChatGPT to prepare a filing, but the generative AI hallucinated fake cases, which were presented to the court.
“In a corporate environment, you have to be mindful of the fact that it is business data,” says Armstrong-Smith. “It is a business context, so what tools do you have available today that are going to have all the governance in place? It’s going to have security; it’s going to have resilience. It’s going to have all of those things built in by design.”
If a significant proportion of employees are routinely relying on external applications, then there is demonstrably a need for that digital tool. To ascertain the most appropriate generative AI solution, it is best to identify the use cases. That way, the most appropriate tool can be deployed to meet the needs of employees and to seamlessly fit into their existing workflow.
The key advantage of using a corporate generative AI tool rather than an open platform, such as ChatGPT, is that data management is maintained throughout the development process. As the tool is kept within the network boundaries, corporate data can be protected. This mitigates possible leakages from using external tools.
With a corporate AI tool, the back-end system is secured by the AI provider. However, it is worth noting that protection for the front end – as in the use cases and deployment models – remains the responsibility of the user organisation. It is here that data governance remains key and should be considered an essential element of any development process when deploying generative AI tools.
“We’ve always referred to it as a shared responsibility model,” says Armstrong-Smith. “The platform providers are responsible for the infrastructure and the platform, but what you do with it in terms of your data and your users is the responsibility of the customer. They have to have the right governance in place. A lot of these controls are already built-in by default; they just have to take advantage of them.”
Awareness among users
Once generative AI tools are available in-house, employees need to be aware of their presence for them to be used. Encouraging their adoption can be challenging if employees have developed a way of working that relies on using external GenAI platforms.
As such, an awareness programme promoting the generative AI tool would educate users on the tool’s accessibility and functionality. Internet moderation systems could also redirect users from external platforms to the in-house GenAI tool.
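The redirection idea can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than a description of any particular web filtering product: the domain list, the in-house URL and the `route_request` function are all hypothetical, and a real deployment would apply the rule in whatever proxy or filtering system the organisation already runs.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration; neither the domain list nor the
# in-house address comes from the article.
EXTERNAL_GENAI_DOMAINS = {"chatgpt.com", "chat.openai.com"}
IN_HOUSE_TOOL_URL = "https://genai.intranet.example/"


def route_request(url: str) -> str:
    """Return the URL the user should be sent to.

    Requests to a listed external GenAI domain (or any subdomain of it)
    are redirected to the in-house tool; everything else passes through.
    """
    host = (urlsplit(url).hostname or "").lower()
    for domain in EXTERNAL_GENAI_DOMAINS:
        if host == domain or host.endswith("." + domain):
            return IN_HOUSE_TOOL_URL
    return url


if __name__ == "__main__":
    print(route_request("https://chatgpt.com/c/123"))
    print(route_request("https://example.org/page"))
```

Matching on the hostname rather than the whole URL avoids redirecting unrelated pages that merely mention a listed domain in their path.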
Generative AI is here to stay, and while expectations may have peaked, its uses are likely to grow and become ubiquitous.
“I think for a lot of companies, and where you will certainly see Microsoft focusing, is on this concept of agentic generative AI,” says Hedderman. “This is where you take a business process and figure out how an agent might serve an organisation internally.” An agent could operate within an organisation’s network and carry out specific functions, such as scheduling meetings or sending invoices.
Although generative AI is a new technology that could relieve employees of mundane and time-consuming tasks, data protection remains a key concern. It is therefore incumbent upon organisations to make employees aware of the risks posed by using external tools and to have the appropriate generative AI tools within their own network to protect the integrity of their data.
“As we know with technology, as it gets more commoditised, the price is going to come down, which means AI is going to be more mainstream across the board and you’ve got more choice about what model to use,” concludes Armstrong-Smith.