DORA and the Data Lake Problem

What Happens When Regulators Ask, “What’s in the Lake?”


A few months ago, I found myself talking to someone in financial services about DORA—the Digital Operational Resilience Act. They seemed calm. Suspiciously calm.

"Yeah, we’re across it," they said.
I nodded.
"So… you know what’s in your data lake?" I asked.
A long pause.
"Well… no. Not exactly. But we log everything."

Ah. Logging everything—the corporate equivalent of hoarding every receipt you've ever received just in case one of them matters someday.

And now, thanks to DORA, someone (most likely from compliance) is going to come along, rifle through all of it, and ask if you even know what you’ve got, why you’re keeping it, and what you’d do if something went wrong.
At this point, most companies stare blankly into the middle distance.

DORA: The Compliance Tsunami That’s About to Hit

The Digital Operational Resilience Act (DORA) is the EU’s latest attempt to stop financial institutions from collapsing the moment something goes wrong.

It’s about digital risk—cyberattacks, data breaches, third-party failures—and how banks, insurance companies, and investment firms manage and recover from digital disruptions.

On the surface, it sounds like standard regulatory fare:

✅ Risk management – Be aware of the risks in your IT estate.
✅ Incident reporting – Spot problems quickly, and report them.
✅ Resilience testing – Regularly test that your systems can handle shocks.
✅ Third-party risk management – Ensure vendors don’t compromise security.

It all seems reasonable until you realise something deeply inconvenient: you cannot manage, report on, or test something if you don’t know what’s in it.

And this is where the data lake problem begins.

The Myth of the Organised Data Lake

At some point in the last 15 years, someone in IT had a bright idea.

"Let’s build a data lake!" they said.

"It’ll be structured! Governed! Full of valuable insights!"

Fast-forward to today, and that data lake has become a swamp—a place where data goes to disappear, never to be seen again. It’s an unsearchable, ungoverned mass of logs, transactions, emails, documents, and customer records, scattered across multiple storage systems, with no real classification or lifecycle management.

Which brings us to the central problem: DORA doesn’t just ask whether you store data securely—it asks if you can prove what’s in the lake.

And suddenly, financial institutions have to answer some very uncomfortable questions:

  1. What’s in the lake? (Nobody really knows.)
  2. Which of it is actually important? (Unclear.)
  3. What happens if a regulator asks you to find something specific? (Panic.)

It turns out that "we log everything" is not a strategy—it’s just a slow-motion disaster waiting to unfold.

Lake Michigan, but Digital

To illustrate just how absurdly big these data lakes are, let’s talk numbers.

We recently spoke to a financial institution with an estimated 70 petabytes of data spread across multiple silos. They weren’t exactly sure of the number.

70 petabytes. That’s 35 trillion pages of documents. If you printed them all out and stacked them up, they’d reach the moon and back, multiple times.
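For anyone who wants to check the comparison, here is a back-of-the-envelope sketch. The 70 petabytes and 35 trillion pages come from the figures above; the decimal petabyte, the 0.1 mm sheet thickness, and the 384,400 km average Earth-Moon distance are assumptions added purely for illustration.

```python
# Rough check of the "70 PB is 35 trillion pages" comparison.
# From the article: 70 PB of data, 35 trillion pages.
# Assumptions (not from the article): decimal petabytes, 0.1 mm per sheet,
# and an average Earth-Moon distance of 384,400 km.

PETABYTE_BYTES = 10**15
LAKE_BYTES = 70 * PETABYTE_BYTES
PAGES = 35 * 10**12

SHEET_THICKNESS_MM = 0.1   # assumed: ordinary office paper
EARTH_MOON_KM = 384_400    # assumed: average distance


def bytes_per_page(total_bytes: int, pages: int) -> float:
    """Average page size implied by the two headline numbers."""
    return total_bytes / pages


def stack_height_km(pages: int, sheet_mm: float) -> float:
    """Height of the printed stack, converted from millimetres to kilometres."""
    return pages * sheet_mm / 1_000_000


def moon_round_trips(height_km: float, moon_km: float) -> float:
    """How many there-and-back trips to the Moon the stack would cover."""
    return height_km / (2 * moon_km)


if __name__ == "__main__":
    height = stack_height_km(PAGES, SHEET_THICKNESS_MM)
    print(f"Implied page size: {bytes_per_page(LAKE_BYTES, PAGES):,.0f} bytes")
    print(f"Stack height: {height:,.0f} km")
    print(f"Moon round trips: {moon_round_trips(height, EARTH_MOON_KM):.1f}")
```

Under those assumptions the headline numbers imply about 2,000 bytes per page, and the stack covers roughly four and a half round trips to the Moon, so "multiple times" holds up.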

If someone in compliance asks you to search through all of that for a specific transaction from 2018, you have two options:

  1. Try to find it and watch your infrastructure melt in real time.
  2. Spend a small fortune on cloud services to brute-force the search.

Neither is a great outcome.

The Cost of Keeping Everything "Just in Case"

The problem isn’t that financial services firms don’t want to be compliant. It’s that compliance at this scale is horrifically expensive. Let’s break it down.

1PB of Storage = £1M+ Over Five Years

Storing 1PB of data for five years costs upwards of £1 million in infrastructure, cloud costs, energy, and security.

Now multiply that by 70. A 70PB data lake is not cheap.
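Spelled out, the multiplication looks like the sketch below. The £1 million per petabyte over five years and the 70PB estimate are the figures quoted above; treating cost as scaling linearly with volume, and spreading it evenly across the five years, are simplifying assumptions.

```python
# Lower-bound storage cost for the lake, using the article's figures.
# Assumptions (not from the article): cost scales linearly with volume
# and is spread evenly across the five years.

COST_PER_PB_FIVE_YEARS_GBP = 1_000_000   # "upwards of £1 million" per PB
LAKE_PB = 70                             # estimated size of the lake
YEARS = 5


def five_year_cost_floor_gbp(petabytes: int) -> int:
    """Minimum five-year cost if every petabyte costs at least £1 million."""
    return petabytes * COST_PER_PB_FIVE_YEARS_GBP


def annual_cost_floor_gbp(petabytes: int) -> float:
    """The same floor expressed per year."""
    return five_year_cost_floor_gbp(petabytes) / YEARS


if __name__ == "__main__":
    print(f"Five-year floor: £{five_year_cost_floor_gbp(LAKE_PB):,}")
    print(f"Per-year floor:  £{annual_cost_floor_gbp(LAKE_PB):,.0f}")
```

On those figures the floor is £70 million over five years, or £14 million a year, before anyone has searched, classified, or deleted a single file.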

And yet, companies persist with the belief that they should store everything forever, just in case it turns out to be important later. It’s a bit like keeping every email you’ve ever received, including that one from 2011 about the office Christmas party that you never attended.

Why DORA is a Wake-Up Call

Under DORA, financial institutions need to demonstrate:

🔍 What’s in their data lake (good luck).
🛠 How they manage risk in real time (very difficult if you don’t know what’s there).
⚠️ How they’ll respond to an incident (see above).

DORA isn’t just a cybersecurity regulation—it’s about proving that your entire digital estate is functional, recoverable, and actually understood.

Which means the strategy of “just storing everything” no longer works.

How to Fix the Mess (Before a Regulator Forces You To)

The solution isn’t deleting everything (although that would be satisfying). It’s about actually understanding what’s in your data lake, classifying it properly, and ensuring that when something bad happens, you can act fast. That means:

✅ Data Classification: If you don’t know what you’ve got, you can’t manage it.
✅ Smart Search & Automation: Because no human is manually reviewing 70PB of data.
✅ Resilience Testing That Doesn’t Just Tick a Box: It has to actually work under pressure.
✅ Regulator-Ready Reporting: If you don’t want to be the next headline, you need real-time insights.

This is what financial institutions should have been doing all along—but now, thanks to DORA, it’s no longer optional.
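To make the classification step a little more concrete, here is a minimal sketch of rule-based tagging. Everything in it is hypothetical: the category names, the regular expressions, and the `classify` and `inventory` helpers are illustrations, not the method of any particular product, and real rules would come from a firm's own governance policy.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical rules: category name -> pattern that suggests the category.
# These are deliberately crude and will produce false positives.
RULES = {
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


@dataclass
class Classification:
    path: str
    labels: list[str]


def classify(path: str, sample: str) -> Classification:
    """Tag one object from a text sample of its contents."""
    labels = sorted(name for name, rule in RULES.items() if rule.search(sample))
    return Classification(path, labels or ["unclassified"])


def inventory(samples: dict[str, str]) -> Counter:
    """Count how many objects carry each label across the whole sample set."""
    counts: Counter = Counter()
    for path, sample in samples.items():
        counts.update(classify(path, sample).labels)
    return counts


if __name__ == "__main__":
    lake_sample = {
        "logs/app.log": "user jane@example.com logged in",
        "exports/payments.csv": "IBAN GB82WEST12345698765432",
        "misc/readme.txt": "nothing sensitive here",
    }
    for label, count in inventory(lake_sample).items():
        print(f"{label}: {count}")
```

The point of the sketch is the shape rather than the rules: sample each object, tag it, and keep a running inventory, so that "what's in the lake?" has an answer that doesn't require reading the lake.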

Final Thought: The Lake is Rising

If you really want to get a sense of where your organisation stands, ask your IT team:

"If a regulator asked us to retrieve specific data from our lake within 24 hours, how would we do it?"

If the answer involves sighing, staring at the floor, or pretending not to hear the question, it might be time to start sorting things out.
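For a sense of why brute force can't answer that question, here is a quick calculation of the sustained read rate needed to scan the whole lake inside the deadline. The 70PB and the 24 hours are the figures used in this article; the decimal units and the idea of a single uninterrupted full scan are assumptions made to keep the sum simple.

```python
# Sustained throughput needed to read an entire lake before a deadline.
# From the article: a 70 PB lake and a 24-hour window.
# Assumptions (not from the article): decimal units and one full,
# uninterrupted pass over the data with no index to narrow the search.

PETABYTE_BYTES = 10**15
GIGABYTE_BYTES = 10**9

LAKE_BYTES = 70 * PETABYTE_BYTES
DEADLINE_SECONDS = 24 * 60 * 60


def required_throughput_gb_per_s(total_bytes: int, seconds: int) -> float:
    """Read rate in GB/s needed to touch every byte within the deadline."""
    return total_bytes / seconds / GIGABYTE_BYTES


if __name__ == "__main__":
    rate = required_throughput_gb_per_s(LAKE_BYTES, DEADLINE_SECONDS)
    print(f"Required sustained read rate: {rate:,.0f} GB/s")
```

That works out to about 810 GB/s, held for a full day, which is why the honest answer to the question has to involve knowing where to look rather than reading everything.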

And if you’d rather not drown in the rising tide of data compliance, take a look at Lightning IQ—because the only thing worse than having a 70PB data lake is having to explain to regulators why you don’t know what’s in it.

 

Nick Pollard leads EMEA consulting for One Discovery. He is a seasoned leader with more than 20 years of experience working in real-time investigation, legal, and compliance workflows across highly regulated environments such as banking, energy, and healthcare, as well as national security organizations. You can contact him at nick.pollard AT onediscovery.com

 
