Security tests for AI systems
AI systems can be tricked into revealing protected data through clever prompts. Where the points of attack lie and what OWASP recommends.

Security testing for AI-based software covers the entire lifecycle of an AI model: from the training phase, in which manipulated input data can influence the model, to the usage phase, in which attackers extract protected data through targeted queries. The OWASP Top 10 for AI and the OWASP AI Security & Privacy Guide serve as a framework for orientation.
Key Takeaways
- AI models can be tricked into revealing protected data through indirect queries, even if direct queries are blocked because the model does not recognize role labels as a circumvention attempt.
- Security risks for AI-based systems span the entire lifecycle: from manipulated training data to theft of the model to misused queries during operation.
- Output filters are more effective than input filters because they directly check whether content worthy of protection, such as salaries or personal data, appears in the generated response.
- The OWASP Top 10 for AI-based software offers a structured introduction to security requirements, but requires specialist knowledge for practical implementation, either internally or through external experts.
What makes security different in AI systems
With AI-based software, you don’t know in advance exactly how it will behave. That’s exactly the point. A machine learning model learns for itself instead of being programmed down in the traditional way and, in the best case, should solve the task better than code ever could.
This property brings with it the first security problem. If the behavior is not known in advance, it is difficult to determine whether the system is secure against attackers. Security is one aspect of what Jan Jürjens calls trustworthy AI.
Most companies do not develop AI themselves. They buy in a model or integrate it as a component into their own infrastructure. The question is then: how do you make an IT infrastructure that uses AI secure against attacks and trustworthy overall?
Points of attack are distributed over the entire life cycle
Security risks with AI extend from training to use. There is not just one weak point, but a chain of phases, each with its own requirements.
In the training phase, there is a risk that an attacker could infiltrate data that influences the model in an unintended way. Those who train themselves must secure this phase. Manipulated training data can hardly be clearly identified later on.
As soon as the model exists, classic security requirements apply, only in relation to a new asset. Nobody should be able to copy and copy the model, nobody should be able to manipulate it. An AI model is often a company asset and needs the same protection as other critical data.
The third area is added during use. This is about ensuring that requests to the model do not undermine the security rules and that no data that is protected flows out.
Why a chatbot can disclose personal data
A filter on the direct question is not enough because the same information can be requested via detours. This is the core problem with securing queries.
Jan Jürjens describes the pattern using a salary example. If a model has been trained on employee data, the direct question about a person’s salary should be blocked. If you instead ask for the salary of a specific role, such as the compliance officer, the system can readily provide the answer. If you know which person has this role, you also know their salary.
The lack of explainability exacerbates this. You get an answer, but no explanation as to how the model got there.
It’s not normally the case that AI works like a search engine. It aggregates the information from its knowledge store so that in the end nobody can tell what the answer consists of.
Jan Jürjens
Because the answer is created from aggregated information, it is almost impossible to prove at a later stage whether any inadmissible information has been included. This is precisely why validation must begin during training.
Penetration testing also applies to AI
The most effective approach against manipulated queries is intensive testing from the perspective of an attacker. You think about how someone would proceed and submit these queries to the system yourself.
The procedure corresponds to penetration testing of classic software. You take on the role of the attacker and try to elicit a response or information that the system should not actually give out. If you find such a gap, you then stop this type of request.
You can never be sure that you have found everything. It’s a race, but this point also applies to security testing of classic software.
Filters on the output are often more effective than on the input
Checking the output often beats trying to catch every nested input. The reason for this is practical: before the output, it is possible to check directly whether a content appears in it, for example.
Both places make sense. At the input you block problematic queries, at the output you check the result before it leaves the system. Indirectly created queries are difficult to catch completely, whereas checking the output is effective regardless of the trickiness of the question.
Restricting queries makes sense anyway. Even with models on a public database, you don’t want someone to suck out the entire model through mass queries and rebuild it. A quantity limit is therefore standard.
Reputation is another reason for filters. An application should not be able to be made to issue foul remarks because this will fall back on the model manufacturer and the operating company.
OWASP also supplies the catalogs for AI
There are already established tools for AI security, supported by the OWASP consortium. The organization began as the Open Web Application Security Project and renamed itself the Open Worldwide Application Security Project because it has long been working beyond web applications.
Specific resources exist for AI, including an OWASP guide to security and privacy and an OWASP Top 10 for AI. The Top 10 approach is known from the web sector and has been transferred to AI-based software.
This documentation is easy to understand and a useful starting point. The real hurdle lies not in reading them, but in implementing them.
The comprehensibility belies the effort involved
As a developer, you can understand the OWASP materials, but testing your infrastructure requires someone who can assess it. This is the honest classification if your job is to quickly implement a chatbot.
To assess whether the infrastructure is actually secure, you need either trained in-house staff or external support. Here too, AI is nothing fundamentally new, but corresponds to the situation in the classic security sector.
Whoever operates AI bears responsibility for the surrounding area
As soon as you use AI professionally, you are an operator within the meaning of the AI Regulation, and this gives rise to obligations. The regulation distinguishes between developers, operators and users.
In most cases, you build AI into your architecture as a black box. You cannot look into the model and its internal processes. However, you are responsible for everything that happens around the model.
This includes several questions at once:
- What data comes in, and is it protected?
- How does the data come out, and is it protected when it is connected?
- What happens to the results, and is this even permitted?
The AI Regulation also restricts the applications for which AI may be used. Compliance with these rules is part of the operator’s duty. Also make sure, as far as possible, that the provider also complies with the regulations.
The biggest weak point remains the human in front of the computer
AI is often used carelessly, and this is precisely where an underestimated risk lies. Documents are uploaded and analyzed without anyone checking what data is flowing out.
Many users blindly rely on the response of a chatbot. This is dangerous because something depends on the accuracy of the answer, whether privately or in the company. AI is not one hundred percent reliable, and hallucinations occur as soon as you probe a little.
Jan Jürjens describes an example from everyday life. When asked about a trampoline park in Koblenz, a chatbot very confidently mentioned a name that does not exist there. The system had presumably found the name for another city and simply transferred it to Koblenz. For one museum, it even gave an address from Aachen, a street that doesn’t even exist in Koblenz.
If you check whether the answer is correct, the system initially sticks to it and only admits the error after repeated probing.
AI gives you hypotheses, not truths
Use AI to give you hypotheses and carry out the verification step yourself if the answer is important. This separation is the practical consequence of the lack of reliability.
Some tasks are fine, such as quickly summarizing a long document. A statement may also appear there that was not in the document, but this can be checked against the original. You should always have this fallback option.
Caution is advised with answers that you cannot verify yourself. A false, self-confident answer weighs heavily, especially when it comes to security issues. Raising user awareness is therefore just as important as the technical filters.
Related Posts

Richard Seidl
•Jun 2, 2026
Patient agility: Is agile working dying?

Richard Seidl
•May 26, 2026