What a Data Audit Actually Includes

Ask three agencies what a data audit is and you will get three different answers. One sends you a 12-page PDF of generic observations. Another books a 90-minute call, takes notes, and emails you a list of things to consider. The third quotes you £5,000 and never actually looks at your tracking layer.

When we say data audit at BaP Data, we mean something specific. A forensic pass through four layers of your data infrastructure. Forty checkpoints. Each one a documented test against a known standard. Every finding ranked by estimated revenue impact, biggest leak first. A written report you can hand to your developer or your agency the same day you read it.

This piece is the inside of that audit. What we look at. What we typically find. What comes out the other end. And what it costs.

If you would rather skip the explanation and run the framework yourself, the e-commerce data audit checklist is at the bottom of this page. Free, three pages, no email gate. The exact framework we use on paid engagements.

PART 01 Why this matters now

According to Gartner's research on data quality, the average organisation loses $12.9 million per year to poor data quality. The figure is enterprise-weighted, so it overstates the picture for a £2M Shopify store. But the same Gartner research surfaced something more interesting: nearly 60% of organisations do not measure the cost at all.

They suspect something is off. They have felt the friction. The Meta dashboard says one number, Shopify says another, the accountant says a third. Every month somebody manually reconciles the gap in a spreadsheet. Nobody has ever put a price on what that drift is actually costing.

A separate study from MIT Sloan Management Review put the broader figure at 15 to 25% of revenue lost to bad data annually. On a £2M operation, that is £300,000 to £500,000 a year disappearing into reporting noise.
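
The arithmetic behind that figure is simple enough to run against your own revenue. The sketch below applies the 15 to 25% range quoted above to an annual revenue number. The function name `revenue_at_risk` is ours, for illustration only, and the output is an industry-wide range, not a measurement of any particular store.

```python
def revenue_at_risk(
    annual_revenue_gbp: int, low_pct: int = 15, high_pct: int = 25
) -> tuple[float, float]:
    """Return the (low, high) annual revenue lost to bad data, in pounds."""
    if annual_revenue_gbp < 0:
        raise ValueError("revenue must be non-negative")
    # Integer percentages keep the arithmetic exact for round revenue figures.
    return (
        annual_revenue_gbp * low_pct / 100,
        annual_revenue_gbp * high_pct / 100,
    )


low, high = revenue_at_risk(2_000_000)
print(f"£{low:,.0f} to £{high:,.0f} per year")  # £300,000 to £500,000 per year
```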

The audit finds where it is going. Not by guessing. Not by best-practice talk. By going through every checkpoint, line by line, and pricing the leak.

PART 02 The four layers we audit

A real data audit is not one thing. It is four things examined separately, because each layer can fail independently and each requires a different diagnostic skill. The four layers are tracking, attribution, reporting and data quality. Forty checkpoints in total — fifteen in tracking, ten in attribution, eight in reporting, seven in data quality.

The layers compound. Broken tracking poisons attribution. Broken attribution poisons reporting. Broken reporting poisons the decisions you make from the dashboard. You cannot fix the dashboard until you have fixed the layers underneath it.

PART 03 Layer 1 · Tracking (15 checkpoints)

Tracking is the foundation. If the tracking is wrong, nothing downstream can be right. We start here every time.

The fifteen checkpoints we run:

GA4 base tag firing correctly across every page template, including soft 404s and post-purchase pages.
Purchase event triggering on order confirmation, not on thank-you page load. This single misconfiguration is the silent killer: it counts the order again every time the thank-you page is refreshed.
Add-to-cart event firing with the correct item_id, value and currency parameters.
Begin-checkout event present, accurate, and firing only on first checkout entry.
View-item event firing on product pages with the full ecommerce payload.
Refund events configured and sending back to GA4. Most setups skip this entirely, which means your reported revenue never adjusts for returns.
GA4 and Google Tag Manager properly connected, with no duplicate measurement IDs.
Duplicate tag detection across the page. Sites running both gtag.js and GTM frequently double-count every event.
Data layer implementation quality — whether the data layer pushes structured objects or a mess of strings stitched together at runtime.
Cross-domain tracking configured for any subdomain, checkout redirect, or payment provider hosted page.
Server-side tagging status: running, partial, or not implemented.
Consent Mode v2 signal flow: granted, denied, default state, and whether the modelled conversions look plausible.
Consent management platform firing order verified. The cookie banner must initialise before GTM, not after.
GA4 data stream settings correct, including timezone, currency, and session timeout configuration.
Enhanced measurement settings appropriate. Auto-events can create noise as easily as signal.
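
Several of the event-level checks above can be scripted against captured data layer pushes. The sketch below is a minimal illustration, not our audit tooling. It assumes pushes have been exported as plain dictionaries in the GA4 ecommerce shape (`event`, plus an `ecommerce` object holding `currency`, `value` and an `items` list), and it is meant for item-level events such as view_item, add_to_cart, begin_checkout and purchase. The helper name `check_ecommerce_event` is hypothetical.

```python
def check_ecommerce_event(push: dict) -> list[str]:
    """Return human-readable problems found in one data layer push."""
    name = push.get("event")
    ecommerce = push.get("ecommerce")
    if not isinstance(ecommerce, dict):
        return [f"{name}: 'ecommerce' is missing or not a structured object"]

    problems = []
    currency = ecommerce.get("currency")
    if not (isinstance(currency, str) and len(currency) == 3 and currency.isupper()):
        problems.append(f"{name}: currency should be a 3-letter code, got {currency!r}")

    value = ecommerce.get("value")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append(f"{name}: value should be a number, got {value!r}")

    items = ecommerce.get("items") or []
    if not items:
        problems.append(f"{name}: items list is empty")
    for i, item in enumerate(items):
        if not item.get("item_id"):
            problems.append(f"{name}: items[{i}] has no item_id")

    if name == "purchase" and not ecommerce.get("transaction_id"):
        problems.append("purchase: transaction_id is missing")
    return problems


good = {
    "event": "add_to_cart",
    "ecommerce": {"currency": "GBP", "value": 24.0,
                  "items": [{"item_id": "SKU-1", "quantity": 1}]},
}
bad = {
    "event": "add_to_cart",
    "ecommerce": {"currency": "gbp", "value": "24.00",
                  "items": [{"item_name": "Seed mix"}]},
}
print(check_ecommerce_event(good))  # []
for problem in check_ecommerce_event(bad):
    print(problem)
```

The second push fails three ways at once: lower-case currency, a value sent as a string, and an item with no item_id. That is the "strings stitched together at runtime" pattern from the data layer checkpoint.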

What we typically find: about 80% of Shopify stores we audit have at least one purchase event misfiring. The two most common patterns are refreshes inflating revenue, and the event firing on the thank-you page load rather than the confirmed order. Both are fixable in a day. Both have been distorting your numbers for months or years.
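
One way to see the refresh pattern in your own data is to count purchase events per order. The sketch below assumes an export of events as dictionaries with `event_name`, `transaction_id` and `value` keys. Those field names are an assumption; adjust them to whatever your export actually uses.

```python
from collections import Counter


def duplicated_purchases(events: list[dict]) -> dict[str, int]:
    """Map transaction_id -> purchase event count, for ids seen more than once."""
    counts = Counter(
        e["transaction_id"]
        for e in events
        if e.get("event_name") == "purchase" and e.get("transaction_id")
    )
    return {tid: n for tid, n in counts.items() if n > 1}


def inflated_revenue(events: list[dict]) -> float:
    """Revenue counted beyond the first purchase event of each order."""
    seen: set[str] = set()
    extra = 0.0
    for e in events:
        if e.get("event_name") != "purchase" or not e.get("transaction_id"):
            continue
        if e["transaction_id"] in seen:
            extra += float(e.get("value", 0))
        seen.add(e["transaction_id"])
    return extra


events = [
    {"event_name": "purchase", "transaction_id": "1001", "value": 40.0},
    {"event_name": "purchase", "transaction_id": "1001", "value": 40.0},  # refresh
    {"event_name": "purchase", "transaction_id": "1002", "value": 25.0},
    {"event_name": "add_to_cart"},
]
print(duplicated_purchases(events))  # {'1001': 2}
print(inflated_revenue(events))      # 40.0
```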

On Consent Mode v2 specifically, Privado.ai's State of Website Privacy Report 2024 found that 74% of the most visited websites in Europe do not honour GDPR opt-in consent as required. If you sell into the EEA and you have not had a CMP firing order check, the odds are not in your favour.

PART 04 Layer 2 · Attribution (10 checkpoints)

If tracking answers "are events firing correctly", attribution answers "is the right channel getting the credit". This is the layer where budget decisions live. Get attribution wrong, and you will spend the next quarter cutting the channel that was actually working.

The ten checkpoints:

Default attribution model in GA4 — data-driven, last-click, or first-click — and whether the chosen model matches the purchase cycle of the business.
Attribution window settings across GA4, Meta and Google Ads. These should be consistent. They rarely are.
Meta Conversions API (CAPI) implementation and event match quality score. Anything below 7.0 means the API is sending events Meta cannot reliably match.
Google Ads conversion import status, and the gap between GA4 and Google Ads reported conversions for the same campaigns.
UTM parameter consistency across email, paid social, paid search, and any affiliate or influencer activity. Inconsistency here is the most common cause of inflated direct traffic.
Direct traffic volume sanity check. Anything over 20% of total traffic usually hides misattribution somewhere upstream.
Organic vs paid split accuracy — whether paid clicks are bleeding into organic because of UTM stripping or referrer issues.
Cross-channel overlap — how many conversions are credited to multiple channels, and what that says about the customer journey you cannot currently see.
Last-click vs data-driven divergence. We frequently see gaps of more than 30% between the two models. That gap is the budget you are about to misallocate.
iOS 14.5+ attribution impact on reported ROAS since App Tracking Transparency (ATT): whether the platform is delivering modelled conversions, and how much of your reported ROAS is now an estimate rather than a measurement.
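
The UTM consistency checkpoint is one of the easier ones to script. The sketch below is a minimal example using only the Python standard library. It takes a list of landing-page URLs (how you export them is up to you) and reports UTM values that differ only by case or stray whitespace, such as `Facebook` and `facebook`. The function name `utm_variants` is ours.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def utm_variants(urls: list[str]) -> dict[str, dict[str, set[str]]]:
    """For each UTM key, group raw values that differ only by case or whitespace."""
    groups: dict[str, dict[str, set[str]]] = {k: defaultdict(set) for k in UTM_KEYS}
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        for key in UTM_KEYS:
            for raw in params.get(key, []):
                groups[key][raw.strip().casefold()].add(raw)
    # Keep only normalised values that appear under more than one spelling.
    return {
        key: {norm: raws for norm, raws in by_norm.items() if len(raws) > 1}
        for key, by_norm in groups.items()
    }


urls = [
    "https://example.com/?utm_source=Facebook&utm_medium=paid_social",
    "https://example.com/?utm_source=facebook&utm_medium=paid_social",
    "https://example.com/?utm_source=newsletter&utm_medium=Email",
    "https://example.com/?utm_source=newsletter&utm_medium=email",
]
report = utm_variants(urls)
print(report["utm_source"])
print(report["utm_medium"])
```

Each group in the report is a channel that GA4 will split into two or more rows, which is why the totals per channel look smaller than they should.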

What we typically find: Meta CAPI implementations with sub-7 event match quality scores. UTM chaos across the email tool. A 30% gap between platform-reported ROAS and what GA4 is willing to credit. The cumulative effect is that the founder is making £20,000-per-month budget decisions on numbers that have a 25% margin of error.
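
The divergence check can be expressed as a small comparison of conversions per channel under each model. The sketch below is illustrative only. The channel names and conversion counts are invented, the 30% threshold is the rule of thumb from the checkpoint list, and `model_divergence` and `flag_channels` are hypothetical helper names.

```python
def model_divergence(
    last_click: dict[str, float], data_driven: dict[str, float]
) -> dict[str, float]:
    """Relative gap per channel: (last_click - data_driven) / data_driven."""
    gaps = {}
    for channel in sorted(set(last_click) | set(data_driven)):
        lc = last_click.get(channel, 0.0)
        dd = data_driven.get(channel, 0.0)
        if dd == 0:
            # No data-driven credit at all: treat any last-click credit as infinite gap.
            gaps[channel] = float("inf") if lc > 0 else 0.0
        else:
            gaps[channel] = (lc - dd) / dd
    return gaps


def flag_channels(gaps: dict[str, float], threshold: float = 0.30) -> list[str]:
    """Channels whose absolute gap between the two models exceeds the threshold."""
    return [channel for channel, gap in gaps.items() if abs(gap) > threshold]


# Invented conversion counts for the same period under each model.
last_click = {"paid_social": 60, "email": 140, "paid_search": 100}
data_driven = {"paid_social": 100, "email": 100, "paid_search": 95}

gaps = model_divergence(last_click, data_driven)
print(flag_channels(gaps))  # ['email', 'paid_social']
```

In this invented example, last-click over-credits email and under-credits paid social by 40% each. Cutting paid social on the last-click numbers would be cutting the channel that feeds the email conversions.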

The five revenue leaks piece covers the most common patterns we see in this layer. Worth reading after this one.

PART 05 Layer 3 · Reporting (8 checkpoints)

Tracking is the data. Attribution is the credit. Reporting is whether anyone can actually use it. This is the layer most agencies skip entirely. We do not.

The eight checkpoints:

Who the intended audience is for each report: founder, marketing lead, or board. A dashboard built for an operator does not work for a board, and a dashboard built for a board does not get opened by an operator.
Whether the right metrics are front and centre, or buried below decorative widgets that no one acts on.
Decision-action mapping — whether each metric on the dashboard connects to a decision somebody will actually make.
Report freshness and automation. Manual refresh equals nobody opens it. The dashboard goes stale within two weeks.
Alert and anomaly setup. Will the dashboard tell you when something breaks, or are you waiting for the monthly review to catch a four-week problem?
Dashboard access rights and permissions. Three founders we audited had no working dashboard because permissions had silently been revoked during a Google Workspace migration.
Data-ink ratio — whether decorative elements are obscuring the signal that matters.
Mobile readability for the dashboards founders check on the go. Most are illegible on a phone.

What we typically find: dashboards with 14 metrics nobody looks at. Three KPIs the founder actually cares about, all buried below the fold. No automation, so the dashboard is 11 days stale by the time anyone notices.
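
Staleness is easy to test once you know when each report last refreshed. The sketch below assumes you can obtain a last-refresh timestamp per report from your BI tool. How you get it varies by tool and is not shown. The one-day tolerance is an assumption, and the report names are invented.

```python
from datetime import datetime, timedelta, timezone


def stale_reports(
    last_refreshed: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> dict[str, int]:
    """Return report name -> whole days since refresh, for reports older than max_age."""
    return {
        name: (now - refreshed).days
        for name, refreshed in last_refreshed.items()
        if now - refreshed > max_age
    }


now = datetime(2025, 3, 12, 9, 0, tzinfo=timezone.utc)
last_refreshed = {
    "Founder weekly": datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc),
    "Paid media daily": datetime(2025, 3, 11, 18, 0, tzinfo=timezone.utc),
}
print(stale_reports(last_refreshed, now))  # {'Founder weekly': 11}
```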

A dashboard that does not connect to a decision is decoration. Audit it ruthlessly.

We covered the deeper version of this in our piece on why your dashboard isn't making you money. The short version: if a metric doesn't connect to a decision you'll actually make this quarter, it doesn't belong on the dashboard.

PART 06 Layer 4 · Data Quality (7 checkpoints)

This is the layer clients are always surprised matters. Then they see what we find, and they understand.

The seven checkpoints:

Spam and bot traffic filtering. Most GA4 properties have not configured this since the migration from Universal Analytics, so they are still inflated by referral spam that was filtered out of Universal Analytics three years ago.
Internal IP exclusions. Your own team's traffic is inflating engagement metrics, especially for stores with a heavy internal QA cycle.
Test orders excluded from reporting. Test transactions skewing AOV by 15% or more is common, particularly on stores that run launch campaigns through the live storefront.
Currency consistency across Shopify, GA4, Meta and Google Ads. Multi-currency stores routinely have one or more platforms reporting in the wrong base currency.
Timezone alignment between platforms. A six-hour mismatch invalidates any time-of-day analysis and quietly corrupts day-of-week patterns.
Negative quantity events. Returns coded incorrectly in the data layer create phantom negative revenue that compounds month over month.
Historical data anomalies and spikes — the £40K day that was actually a duplicated import, the £2K dip that was a timezone bug, not a campaign failure.
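
For the last checkpoint, a simple robust outlier test is often enough to surface days worth investigating. The sketch below uses the median absolute deviation (MAD), which is less distorted by the spike itself than a mean and standard deviation would be. It is one reasonable technique, not a description of a fixed method. The threshold of 5 and the daily figures are assumptions for illustration.

```python
from statistics import median


def flag_revenue_anomalies(
    daily_revenue: dict[str, float], threshold: float = 5.0
) -> list[str]:
    """Return dates whose revenue is more than `threshold` scaled MADs from the median."""
    values = list(daily_revenue.values())
    if len(values) < 3:
        return []  # too little history to say anything
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the days are identical; flag anything that differs.
        return [d for d, v in daily_revenue.items() if v != med]
    # 1.4826 scales the MAD to be comparable with a standard deviation
    # when the underlying data is roughly normal.
    scale = 1.4826 * mad
    return [d for d, v in daily_revenue.items() if abs(v - med) / scale > threshold]


daily = {
    "2025-03-01": 5200, "2025-03-02": 4900, "2025-03-03": 5100,
    "2025-03-04": 40000, "2025-03-05": 5300, "2025-03-06": 5000,
    "2025-03-07": 4800,
}
print(flag_revenue_anomalies(daily))  # ['2025-03-04']
```

A flag is a prompt to check the import log and the order table for that date, not proof of a problem. A genuine campaign spike will be flagged in exactly the same way as a duplicated import.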

What we typically find: at least one timezone misalignment in roughly 90% of audits. Shopify in London time. Meta in Pacific time. Google Ads in Eastern. The "weekly performance review" is comparing apples to oranges to mangoes. This layer is where the trust gets rebuilt — once you can verify the numbers are clean, the layers above start to make sense again.
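
To make the mismatch concrete, the sketch below shows which calendar date each platform would assign to the same order. It uses fixed UTC offsets (London at UTC+0, Pacific at UTC-8, Eastern at UTC-5) purely for illustration. These ignore daylight saving time, so a real check should use proper IANA time zones instead.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; they ignore daylight saving time.
PLATFORM_OFFSETS = {
    "Shopify (London)": timezone(timedelta(hours=0)),
    "Meta (Pacific)": timezone(timedelta(hours=-8)),
    "Google Ads (Eastern)": timezone(timedelta(hours=-5)),
}


def reporting_dates(order_time_utc: datetime) -> dict[str, str]:
    """Calendar date each platform would assign to the same order."""
    return {
        platform: order_time_utc.astimezone(tz).date().isoformat()
        for platform, tz in PLATFORM_OFFSETS.items()
    }


# An order placed at 03:30 UTC on Monday 13 January 2025.
order = datetime(2025, 1, 13, 3, 30, tzinfo=timezone.utc)
for platform, date in reporting_dates(order).items():
    print(f"{platform}: {date}")
# Shopify (London): 2025-01-13
# Meta (Pacific): 2025-01-12
# Google Ads (Eastern): 2025-01-12
```

The same order is a Monday sale in Shopify and a Sunday sale in Meta and Google Ads. If your reporting week starts on Monday, it lands in different weeks depending on which platform you ask.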

[Figure: annotated GA4 DebugView showing a correctly firing purchase event with key fields highlighted. Caption: What we look for. This is what passing looks like.]

PART 07 What you get at the end

The deliverable is concrete. Not "a deck of recommendations". Not "a set of insights". What lands in your inbox five to seven working days after we start:

A 10 to 20 page written report. Every issue documented, every finding ranked by estimated revenue impact, biggest leak first. The report is structured so you can read it in order or jump to the section that matters to you.
A one-hour debrief call. We walk you through the findings live, answer questions, and prioritise the fix list based on your team's capacity and the urgency of each issue.
A prioritised fix list with time estimates per item. Most issues take a few hours each. Some take a day. A few require dev resource. The list tells you which is which, in order.
A shareable findings document. You can forward it to your developer, your agency, or your in-house data lead. They will see exactly what needs to be done and why, without you having to translate.
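
The ranking logic behind the fix list is straightforward: biggest estimated leak first, with the quicker fix winning a tie. The sketch below shows that ordering with a hypothetical `Finding` record. The issues are of the kind described in this piece, but every figure is an invented placeholder, not a result from a real audit.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    layer: str
    issue: str
    est_annual_impact_gbp: float  # an estimate, not a measurement
    fix_hours: float


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Biggest estimated leak first; ties broken by the quicker fix."""
    return sorted(findings, key=lambda f: (-f.est_annual_impact_gbp, f.fix_hours))


# Placeholder figures, for illustration only.
findings = [
    Finding("Data quality", "Timezone mismatch between platforms", 4_000, 2),
    Finding("Tracking", "Purchase event fires on page load", 18_000, 6),
    Finding("Attribution", "CAPI events sent without value", 18_000, 3),
    Finding("Tracking", "No refund events", 9_000, 4),
]

for rank, f in enumerate(prioritise(findings), start=1):
    print(f"{rank}. [{f.layer}] {f.issue}: ~£{f.est_annual_impact_gbp:,.0f}/yr, {f.fix_hours}h")
```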

That is the deliverable. Not slides. Not a Loom video. A written, citable document and a structured handover.

PART 08 Pricing & scope

Five to seven working days from brief to delivered report. The fee is fixed at £750 to £1,800 depending on stack complexity — how many platforms, how much custom development, whether server-side tagging is involved, and how many subdomains we are crossing.

For a typical Shopify store on standard apps, you are looking at £900 to £1,200. For a complex stack with sGTM, multiple regional storefronts and a custom checkout, it is closer to the top of the range. Payment is 50/50. Half on engagement, half on delivery. Scope is agreed in writing before we start. No surprise additions, no scope creep dressed up as extra findings.

What a data audit is not

Not a strategy consultation. We will tell you what is broken and what it is costing you. We will not tell you how to grow the business. That is a different engagement.
Not a GTM implementation. If we find issues that need new tags built or a server-side container set up from scratch, that is a separate scope.
Not a dashboard redesign. We will tell you which dashboards are not earning their keep. Building new ones is separate work.
A diagnostic. What is broken, what it is costing you, what to fix first. Audits work best when they are focused — the moment you blend audit into implementation, the audit becomes a sales pitch for the bigger engagement. Ours is not.

PART 09 The NutriSeed example

The closest public example is the audit we ran on NutriSeed. We pulled the publicly available data, ran the framework, and documented what we found. Six issues in total.

Three in tracking: the purchase event firing on page load rather than confirmed order, no refund events configured, and a duplicate measurement ID inherited from a legacy gtag.js install. Two in attribution: UTM inconsistency in the email tool, and the Meta CAPI sending events without a value parameter. One in data quality: timezone misalignment between Shopify and GA4 distorting the weekly performance review.

None of them catastrophic on their own. All of them quietly distorting the numbers the team was using to make budget decisions. The combined estimated revenue impact across the six issues was in the mid five figures annually. Each fix took less than a working day.

Not a single dramatic finding. A handful of structural issues that have been quietly bleeding revenue for months — each one fixable individually, each one priced in real numbers.

The PawCart case study is another worked example. That one came back with £280,000 of recoverable revenue across three attribution failure points. Different stack, different findings, same framework.

[Figure: sample BaP Data audit deliverables. Caption: The three deliverables: written report, prioritised fix list, 40-checkpoint checklist.]

Lower barrier

Download the 40-checkpoint checklist

Three pages. Free. No email gate. The exact framework we use on every paid audit, in checklist form you can print or work through in a doc.

→ Download the checklist

Higher intent

Book a 30-minute discovery call

In the first ten minutes we will tell you whether an audit makes sense for your current setup. No pitch if it does not apply.

→ Book a call

The audit pays for itself if it surfaces a single meaningful issue. Most engagements surface five or six.

Sources. Gartner data quality research; MIT Sloan Management Review, Thomas Redman; Privado.ai State of Website Privacy Report 2024; author's field notes from e-commerce data audits 2023–2026.

Jimmy Okoth
