The term “Artificial Intelligence” (AI) has many possible meanings, although a simple one defines it as the ability of a computer to perform tasks that have traditionally required human intelligence. AI has been part of our lives for many years: it powers everyday consumer products such as email spam filters, targeted advertising, and facial recognition on our phones, as well as more complex applications such as employee recruitment and supply chain management. However, in recent months there has been a surge in media coverage and public interest in AI technology, including the release of an open letter, signed by big names like Elon Musk and Steve Wozniak, calling on AI labs to pause the training of AI systems. In this article, we will explain what has changed in the world of AI and why everyone is talking about it.

Generative AI

Typically, human interaction with AI has been limited to predictive models, which identify patterns and present results. For instance, a spam filter algorithm can determine whether an email is spam or not by comparing it to previous examples. Similarly, ad targeting software evaluates past interactions to decide which ads are relevant to a user.
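To make the idea concrete, below is a minimal sketch of such a predictive model: a naive Bayes spam classifier built with scikit-learn. The example emails and labels are invented purely for illustration; a real spam filter would be trained on far more data.

# Minimal sketch of a predictive model: a spam classifier that learns
# from previously labelled emails and classifies new ones.
# The example emails and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win a free prize now, click here",   # spam
    "Limited offer, claim your reward",   # spam
    "Meeting rescheduled to Thursday",    # not spam
    "Please review the attached report",  # not spam
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each email into word counts, then fit a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# The model labels unseen emails based on the patterns it has learned.
print(model.predict(["Claim your free reward now"]))   # likely 'spam'
print(model.predict(["The report is attached"]))       # likely 'ham'

The key point is that the model only classifies new inputs against patterns seen in past examples; it does not create anything new, which is precisely what distinguishes it from the generative systems discussed next.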

By contrast, in recent months we have been increasingly exposed to generative AI. Unlike predictive models, generative AI is not limited to classifying information: it creates new content in response to user prompts. Among the best-known generative AI models are DALL-E 2 (capable of producing almost any form of visual art within seconds of a user’s request), Amper Music (a music composition tool), and the famous ChatGPT, whose workings we have previously described.

Generative AI models are capable of producing human-like content and offer a glimpse of what the future may hold. But they also raise many legal, social, and ethical concerns, including in the field of data protection. This explains the regulatory and media attention the technology has been receiving.

Natural Language Processing and Large Language Models

Natural Language Processing (NLP) is another key factor that has brought artificial intelligence into the limelight.

For most of the history of computing, we have had to communicate with software using its own language rather than ours. For example, to perform a sum in a spreadsheet, we have to write a formula such as “=SUM(A1:A2)” for the software to recognize it. NLP refers to the ability of algorithms to comprehend human language. The idea is that, instead of writing a formula, we can request the spreadsheet to “please sum the first and second cells,” and the machine will understand and execute the task.
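As a toy illustration of that idea (not how any real spreadsheet implements it), the sketch below maps a narrow set of English requests onto spreadsheet formulas using simple pattern matching. A genuine NLP system would rely on a trained language model rather than hand-written rules.

import re

# Toy sketch: turn a plain-English request into a spreadsheet formula.
# Real NLP systems use trained language models, not hand-written rules;
# this only illustrates "our language in, machine action out".
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}

def request_to_formula(request: str, column: str = "A") -> str:
    match = re.search(r"sum the (\w+) and (\w+) cells", request.lower())
    if not match:
        raise ValueError(f"Request not understood: {request!r}")
    row1, row2 = (ORDINALS[word] for word in match.groups())
    return f"=SUM({column}{row1}:{column}{row2})"

print(request_to_formula("Please sum the first and second cells"))
# -> =SUM(A1:A2)

The hard part, of course, is that human language does not fit into a handful of patterns, which is exactly the gap that large language models have started to close.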

Natural Language Processing is not a new concept; it was first developed in the 1950s, and consumers have been exposed to the technology for many years, particularly through chatbots and voice assistants such as Siri and Alexa. Until recently, interactions with these systems were clunky and imprecise. However, recent progress in the field has led to the development of Large Language Models (LLMs), a type of AI system trained on massive amounts of text data. Since their emergence in 2018, and thanks to improvements in hardware, there have been significant advances in the scale, accuracy, multimodality, and performance of LLMs.

ChatGPT is an example of how an LLM can achieve human-like interactions. This conversational AI can understand prompts even when the user makes typing mistakes or uses slang, much as a human would. This ability has many practical applications that tech companies are building into their products, such as the way Microsoft’s Copilot will allow users to “talk” to Office applications.
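As a rough sketch of how an application might plug such a model into its own interface, the snippet below forwards a deliberately sloppy user request to a conversational LLM via the OpenAI Python client. The model name and the system instruction are placeholders chosen for illustration; any comparable chat API would serve the same purpose.

# Sketch only: forwarding a user's informal, typo-ridden request to a
# conversational LLM. Assumes the OpenAI Python client with an API key
# in the OPENAI_API_KEY environment variable; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

user_request = "pls sum up teh sales figures 4 Q1 an tell me if their trending up"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an assistant embedded in a spreadsheet app."},
        {"role": "user", "content": user_request},
    ],
)

# The model interprets the intent despite the typos and slang.
print(response.choices[0].message.content)

Note that in this kind of integration the user’s prompt, which may well contain personal data, is sent to an external provider, a point we return to in the section on international transfers below.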

Legal outlook

At the moment, there is no comprehensive AI regulation in force to manage the opportunities and implications of this technology. The United States recently began seeking public input to shape policies addressing AI audits, risks, and other measures. China, in turn, has released draft rules for public comment that aim to ensure that AI-generated content adheres to core socialist values, data security, and personal data protection.

The European Union is expected to vote on the Artificial Intelligence Act on April 26. This legislation aims to improve data quality, transparency, human supervision, and accountability in the field of AI, as well as to address ethical dilemmas and implementation hurdles.

The core of the AI Act is to classify different types of AI systems depending on the risk they pose. The scale starts with “minimal risk”, such as spam filters, which will be allowed without additional obligations. At the other end of the spectrum is “unacceptable risk”, which will be banned and includes, for instance, technology that uses subliminal techniques to distort a person’s behavior.

The original text of the draft AI Act did not pay particular attention to general-purpose AI, i.e. AI systems capable of performing a wide range of tasks, such as most LLMs. However, with the popularity of ChatGPT and other chatbots skyrocketing, new provisions have been proposed stating that general-purpose systems will sometimes have to comply with the requirements established for high-risk applications and requiring member states to adopt further implementing acts on the topic.

Data privacy problems of LLMs

Regardless of the regulatory status of artificial intelligence, many use cases of large language models involve the processing of personal data, making them subject to applicable data protection legislation. From this perspective, there are still many challenges to be addressed regarding LLM applications.

In the European Union, complying with the principle of data accuracy set out in Article 5(1)(d) of the GDPR is a significant challenge. LLMs are trained on massive volumes of information, often billions or even trillions of words, frequently sourced from public text corpora or Wikipedia. There is typically no guarantee that the original information is accurate, which raises the possibility of the LLM learning and reproducing incorrect facts. Furthermore, even if the original data was correct, the model’s knowledge may become outdated if circumstances change after its training. This is problematic when it comes to personal data because the GDPR requires data controllers to ensure the accuracy of the personal information they process.

Article 17 of the GDPR, which grants the right to erasure, is also relevant to this discussion. Current LLMs are technically incapable of fully forgetting information that was previously fed into their systems, making it almost impossible for data subjects to exercise this right. This means that if personal data was present in the database used to train the algorithm, the information (or information derived from it) could potentially remain there indefinitely, which contradicts the GDPR’s right to be forgotten. We have explained the issue with the inability to forget in more detail in a previous article.

Finally, another problem that organizations governed by the GDPR face when dealing with LLMs relates to international transfers. Most of the LLM applications currently available on the market are based in the US, and users’ personal information, including data about their interactions, is transferred there. Most of the tech companies offering these services do not provide self-hosting, the option to enter into Standard Contractual Clauses, or any other solution to help with GDPR compliance, which makes the use of these tools problematic at best.

In the UK, the Information Commissioner’s Office (ICO) has introduced guidelines on AI and data protection to assist companies in complying with privacy regulations while still being able to utilize AI for industrial and economic purposes. Although these guidelines are highly beneficial and have significant practical implications, they are intended for AI systems in general and do not provide solutions for the specific issues associated with LLMs.

As can be seen, more work needs to be done by EU and UK data protection authorities to address the specific challenges posed by LLMs in relation to people’s privacy. While both jurisdictions are taking steps in the right direction, neither yet provides a comprehensive solution to these issues, and further guidance is needed to ensure that organizations developing and using LLMs can comply with individuals’ rights to privacy.

What the future holds

As we move deeper into the era of artificial intelligence, it becomes increasingly crucial to uphold a robust regulatory framework that balances technological advancement with individuals’ rights to privacy and the security of their information. With the rise of natural language processing, large language models, and general-purpose AI, a whole new set of ethical and regulatory concerns emerges. As we navigate this landscape, it is vital to prioritize the protection of individuals and implement strong safeguards against potential misuse.

The year 2023 is poised to be a turning point for AI technology, and it is imperative that regulators keep pace with its evolution. Developers, data protection experts, and society at large must engage in thoughtful debate and decision-making to ensure that these powerful tools are used ethically and responsibly.