Just How Big is a Petabyte? The Myth of Cheap Storage
We all store too much data—but at enterprise scale, how much is too much? And what’s the real cost?
3 min read
Nick Pollard
Mar 17, 2025
A few months ago, we were speaking to a government agency about their data estate.
"How big is it?" we asked.
There was a pause. Some nervous shuffling. Then:
"Hmmm. We don’t really know... we think around 9 petabytes with around 40+ Applications with access to it? Maybe more. But we have at least 12 more data silos, we don’t actually know the number."
Nine petabytes. That’s roughly 4.5 trillion pages of documents. Imagine printing all of that, stacking it up, and realising you have no idea what’s in there, whether you need it, or how much of it should have been deleted years ago.
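To sanity-check that figure, here’s a back-of-the-envelope sketch, assuming roughly 2 KB of text per printed page (a common rule of thumb, nothing more):

```python
# Rough conversion: 9 petabytes into printed pages.
# Assumption: ~2 KB of text per page. Real documents, scans,
# and email vary enormously, so treat this as an order-of-magnitude check.
BYTES_PER_PETABYTE = 10**15
BYTES_PER_PAGE = 2_000  # ~2 KB per page (assumption)

pages = 9 * BYTES_PER_PETABYTE / BYTES_PER_PAGE
print(f"{pages:,.0f} pages")  # 4,500,000,000,000 -> roughly 4.5 trillion
```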
This is not an isolated case. Most large organisations have no real grasp of their total data footprint. And now, with GDPR reaching its 7-year mark in May 2025, that’s about to become a serious problem.
If your company operates on a 7-year data retention policy, then from June 2025 you’ll officially have vast amounts of outdated, unnecessary, and potentially non-compliant data on your hands.
And if someone (an ex-employee, a customer, a regulator) submits a Subject Access Request (SAR) asking for all the data you have on them, they could in practice be entitled to 10 years’ worth of information, because a SAR covers whatever you actually hold, not what your policy says you hold.
Your response?
"We only keep data for 7 years."
Sounds great in theory, but do you actually have a way of proving it?
The real challenge
Even if companies think they’ve got a handle on retention, almost none of them have a system to continuously track and remove ageing data. And when you actually start looking, things get messy.
Now imagine trying to run a search across that entire estate to find what’s hit the 7-year mark. Most organisations don’t have the infrastructure, tools, or time to deal with this at scale. So they don’t. They leave it. They hope no one asks. Or they just buy more storage.
IT teams know this problem is spiralling. But here’s the catch: many organisations don’t even know how big their data estate actually is. Think back to that government agency with 9PB+ of data spread across its known silos, plus at least 12 more no one has mapped. How much of that do they still need? How much of it could be removed? No one really knows.
And if you don’t know what you have, how do you know what to delete?
That’s the real challenge. GDPR doesn’t just require that you keep data for a set period; it also requires that data be deleted once it is no longer necessary. That means businesses need a rolling process to continuously identify and remove ageing data, not just a one-time cleanup.
Most companies? They don’t have one.
For those who get ahead of this, it’s not just a compliance exercise; it’s a chance to clean house: to cut storage spend, shrink legal and breach risk, and make the next SAR or investigation far faster to answer.
The reality is that this isn’t a problem that will fix itself. If you don’t tackle it now, you’ll be forced to do it later, probably in a high-pressure, high-risk situation, when a regulator, a lawsuit, or a massive data breach forces you to scramble.
Far better to deal with it before it becomes a headline.
If you really want to know where your company stands, ask your IT team one simple question:
"If we had to find and remove everything over seven years old tomorrow, how would we do it?"
If the answer is "We wouldn’t know where to start," it’s probably time to start looking.
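For what it’s worth, here is a minimal sketch of what a first pass over a single file share might look like. The mount point is hypothetical, file modification time is only a crude proxy for record age, and a real programme needs record-level retention metadata, legal-hold checks, and a defensible deletion workflow; this is an illustration, not a method:

```python
# Minimal sketch: flag files on one share older than a retention cutoff.
# Uses modification time as a crude age proxy; real retention work needs
# record-level metadata and legal-hold checks before anything is deleted.
import time
from pathlib import Path

RETENTION_YEARS = 7
CUTOFF = time.time() - RETENTION_YEARS * 365.25 * 24 * 3600
ROOT = Path("/mnt/fileshare")  # hypothetical mount point

stale = []
for path in ROOT.rglob("*"):
    try:
        if path.is_file() and path.stat().st_mtime < CUTOFF:
            stale.append(path)
    except OSError:
        pass  # unreadable entries are exactly the ones worth investigating

print(f"{len(stale)} files past the {RETENTION_YEARS}-year mark")
```

Even a crude pass like this tends to reveal the scale of the problem; the defensible-deletion process that follows is the hard part.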
Nick Pollard leads EMEA consulting for One Discovery. He is a seasoned leader with more than 20 years of experience in real-time investigation, legal, and compliance workflows across highly regulated environments such as banking, energy, and healthcare, as well as national security organisations. You can contact him at nick.pollard AT onediscovery.com.