What counts as Personal Data in AI Models

April 1, 2026

The way we look at personal data is changing.  

Personal data is something businesses have been dealing with for a long time. There’s the checklist: names, addresses, phone numbers and so on. Remove those fields and the data is safe. It’s anonymous, so you don’t need to worry about it. 

Enter AI. 

AI broadens the idea of what personal data is, and sometimes it can be very difficult to spot. The assumption that anonymising data puts you outside the scope of the GDPR is muddied by the adoption of AI. 

So, what is considered personal data in the age of AI technology? 

Direct and Indirect Identifiers 

Yes, direct identifiers are straightforward. If data includes names or email addresses, you don’t need a legal expert to tell you that it contains personal data. 

Indirect identifiers are where it starts to get difficult. 

At face value, a combination of seemingly random information might not look like anything sensitive: age ranges, postcodes, employers and job titles. Each is harmless on its own, but together they can narrow someone down quickly. If you add behavioural data or historical transaction patterns into the mix, you’re often a lot closer to identifying someone than you might think. 

It’s one thing for a human to view this information, but AI systems thrive on patterns. AI doesn’t need names or email addresses. It can identify somebody from the patterns in their data alone, and if your model can single someone out, you’re in personal data territory. 
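To make this concrete, here is a minimal sketch using entirely hypothetical data. It computes the k-anonymity of a small "anonymised" dataset: the size of the smallest group of records sharing the same indirect identifiers. If k is 1, at least one person can be singled out even though no name or email address appears anywhere.

```python
# Toy illustration with made-up records: even without direct identifiers,
# a few indirect ones can isolate a single individual.
from collections import Counter

# Hypothetical records: (age range, postcode area, job title)
records = [
    ("30-39", "SW1", "Accountant"),
    ("30-39", "SW1", "Accountant"),
    ("30-39", "SW1", "Engineer"),
    ("40-49", "SW1", "Accountant"),
    ("30-39", "N1",  "Engineer"),
]

def k_anonymity(rows):
    """Size of the smallest group sharing identical quasi-identifiers.
    k == 1 means at least one record is uniquely identifiable."""
    return min(Counter(rows).values())

print(k_anonymity(records))  # prints 1: three attributes single someone out
```

In this toy set, the combination ("30-39", "SW1", "Engineer") matches exactly one record, so the dataset offers no real anonymity despite containing no names.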

Inferred Data Still Belongs to Someone 

AI systems don’t just process data. They use that data to generate new information about people, like financial risk scores, or health indicators. It might not seem like it, but those outputs can still point to an identifiable person. 

Output is often treated as separate from input data, but it shouldn’t be. If a prediction is made about a person based on their input data, that prediction can fall into the category of personal data. You might not have somebody’s name on file, but if your AI’s output has changed the way they’re treated, that matters, and it can be a huge gap in modern governance. 

You Can't Hide Behind Public Data 

There’s a common misconception that if data is publicly available, it’s fair game. But there’s a big difference between data being publicly available and publicly available data being used to train AI. Public access doesn’t erase data protection obligations; context and purpose still matter. 

While it can be acceptable to gather publicly available data and train your AI with it, you need to be mindful of how that data is used. AI systems scale data use. They combine, amplify, and retain information in ways that go far beyond the original context in which it was shared. 

Data privacy laws put scrutiny on the justification for using data, and AI models can make that justification difficult to provide. 

AI Muddies the Water

Traditional privacy compliance is usually a technical exercise: remove the identifiers from your spreadsheet and you’re good to go. But AI doesn’t work like that. It combines and infers across data pools and generates outputs from them. Those outputs relate directly to individuals, and business decisions are shaped by them. 

Underestimating the scope of personal data can have huge governance-related consequences.  

If you can’t accurately see what counts, your risk assessments are likely to be too narrow. That leads to incomplete documentation and inadequate transparency with regulators. If scrutiny comes your way, you’ll find it very hard to stand up to it without a full picture.

Conclusion

Understanding personal data should shape the way you design and monitor your AI systems, but you can’t govern AI properly if your definition of personal data is outdated.  

AI has expanded the scope of responsibility surrounding personal data. You need to be aware of how data is being used and interpreted at every step of the process. One harmless set of data might be fine on its own, but when it’s fed into an AI it could have a huge knock-on effect.  

In the world of AI, what counts as personal data is often much wider than you think. Understanding that fact can go a long way to boosting your governance efforts and avoiding hefty legal penalties. 
