What a CMS Audit Actually Reveals Before You Migrate a Single Page
Most CMS migrations do not fail because of bad code. They fail because nobody looked at what was on the website before the project started.
The team picks a new platform. They scope the work based on what they think exists. They write a statement of work, set a budget, hire developers, and begin building. Three weeks later, someone discovers the site has four times as many content types as the documentation said. Modules nobody knew existed. A localization structure that completely changes the architecture requirements.
At that point, the budget is wrong. The timeline is wrong. The content model decisions made in week one are wrong. Everything built so far needs rework.
This is not a rare scenario. It is the most common way CMS migrations go sideways. And the fix is straightforward: audit before you touch a single file.
What "auditing first" actually means
A CMS audit before migration is not a content quality review. You are not checking whether your blog posts are well-written or whether your landing page copy converts.
You are answering four structural questions:
- What content types actually exist on the site?
- What components and modules does the site use to display content?
- How does the site handle multiple languages, regions, or variations?
- What percentage of the content can be migrated programmatically versus what requires manual decisions?
The answers to these questions determine the project scope, the architecture of the destination platform, the migration strategy, and the actual cost. Without them, you are quoting a project you do not understand.
The gap between what teams think they have and what is actually there
We ran a full audit on a 32,000-page HubSpot site before writing a single line of migration code. The team told us they had 12 content types and roughly 40 modules.
The actual count: 57 content types and 166 modules.
11 languages. 48 HubDB tables powering dynamic content. 5 different banner modules doing the same job. 4 hero sections. Three CTA components that were functionally identical.
The team was not wrong because they were careless. They were wrong because nobody had ever looked at the full picture. Content was built by different developers over different years with no central inventory. The CMS became a reflection of every contractor, agency, and sprint that touched it, without anyone tracking what accumulated.
That gap (12 content types in the documentation versus 57 in reality) is not unusual. It is the norm on any site older than three years with more than one person touching the CMS.
Why the audit changes everything downstream
Once you have an accurate inventory, several things become possible that were not before.
First, you can consolidate before you migrate. Those 166 modules became 40 after consolidation. Same functionality. Cleaner architecture. The destination platform was not burdened with years of accumulated technical debt from the source.
Second, you can quantify the automation opportunity. Of the 31,000+ content entries on that site, 42% were fully automatable. No manual review needed. No decisions to make. They could be pulled from HubSpot, transformed to fit the new content model, and loaded into ContentStack programmatically.
Without the audit, nobody knows that number. Teams either over-engineer everything (treating all content as if it needs manual attention) or miss the scope of what actually requires human decisions. The 42% automation finding cut the migration timeline significantly. That finding was only possible because of the audit.
Third, the architecture decision becomes data-driven instead of opinionated. The question of whether to lift-and-shift or redesign the content model is not a philosophical one. The audit answers it. If you have 166 modules, 57 content types, and 11 languages, a lift-and-shift carries years of technical debt directly into your new platform. The data tells you to consolidate first.
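Once the duplicates are known, consolidating first does not need heavy tooling. Here is a minimal sketch. It assumes the team has already agreed on a mapping from each legacy module to a canonical one (the module names are hypothetical). The code rewrites page module lists and reports the distinct module count before and after, which is how a reduction like 166 to 40 can be checked.

```python
# Hypothetical consolidation map: legacy module name -> canonical module name.
# Modules that are not listed are kept as they are.
CONSOLIDATION_MAP = {
    "banner_v2": "banner",
    "feature_banner": "banner",
    "promo_banner": "banner",
    "hero_home": "hero",
    "hero_landing": "hero",
}


def canonical(module: str) -> str:
    """Return the canonical module that a legacy module consolidates into."""
    return CONSOLIDATION_MAP.get(module, module)


def consolidate_page(modules: list[str]) -> list[str]:
    """Rewrite one page's module list to canonical names, preserving order."""
    return [canonical(m) for m in modules]


def module_counts(pages: dict[str, list[str]]) -> tuple[int, int]:
    """Return (distinct modules before, distinct modules after) across all pages."""
    before = {m for modules in pages.values() for m in modules}
    after = {canonical(m) for m in before}
    return len(before), len(after)


if __name__ == "__main__":
    pages = {
        "/": ["hero_home", "feature_banner", "cta"],
        "/pricing": ["hero_landing", "banner_v2", "cta"],
    }
    print(module_counts(pages))          # (5, 3)
    print(consolidate_page(pages["/"]))  # ['hero', 'banner', 'cta']
```

Keeping the map as plain data means the consolidation decisions stay reviewable by people who do not read code. The same map can drive the migration transforms.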
This is the same discipline that content teams need when managing websites at scale day-to-day, not just during migrations. If you have ever tried to update metadata across hundreds of pages or find where a specific word appears across your entire HubSpot site, you understand the value of having full visibility into your content layer. We wrote about this in our piece on what bulk editing actually requires across a large HubSpot site, and the same principle applies: you cannot manage what you cannot see.
The four steps of a pre-migration audit
Here is the framework we use before any migration project starts.
Step 1: Content inventory - Export every page, post, template, and content entry in the source CMS. Count them. Categorize them. Map them to content types. The number will surprise you.
On HubSpot specifically, this means exporting pages, blog posts, landing pages, HubDB tables, and any custom objects. Do not assume the platform's own reporting gives you an accurate count. It often does not surface the full picture when content spans multiple hubs or was imported from external systems.
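Counting across several listings is mostly a pagination problem. The sketch below assumes each source (site pages, landing pages, blog posts, HubDB tables) is wrapped in a `fetch_page(after)` function that returns one page of results. The `results` / `paging.next.after` response shape is modelled on HubSpot's v3 list endpoints. Treat it as an assumption to verify against the endpoints you actually call. `category_of` is a hypothetical hook for mapping a record to a content type.

```python
from collections import Counter
from typing import Callable, Iterator, Optional

# fetch_page(after) returns one page of a listing. The response shape
# {"results": [...], "paging": {"next": {"after": "<cursor>"}}} is an
# assumption modelled on HubSpot's v3 list endpoints; verify before use.
Fetcher = Callable[[Optional[str]], dict]


def iter_all(fetch_page: Fetcher) -> Iterator[dict]:
    """Yield every record from a cursor-paginated listing."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page.get("results", [])
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:  # no next cursor means this was the last page
            return


def build_inventory(
    sources: dict[str, Fetcher],
    category_of: Callable[[dict], str],
) -> Counter:
    """Count records per (source, category) pair across every listing."""
    counts: Counter = Counter()
    for source_name, fetch_page in sources.items():
        for record in iter_all(fetch_page):
            counts[(source_name, category_of(record))] += 1
    return counts
```

In a live run, each `fetch_page` would wrap an authenticated request that passes the cursor as the `after` query parameter. Keeping the HTTP call outside the counting logic means the same inventory code can also be run against saved exports.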
Step 2: Module and component mapping - Document every module and component used to render content on the site. Note which ones are duplicates or near-duplicates. A banner module and a "feature banner" module that produce identical output are one component, not two.
This step is the one teams most commonly skip because it feels like designer work rather than migration work. It is not. The component inventory determines how many content types you need in the destination platform and how complex the migration transforms will be.
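A first pass at spotting duplicates can be automated by fingerprinting each module's field schema. This sketch assumes each module has been exported as a list of fields with `name` and `type` keys, which is a simplified, hypothetical shape. It ignores field order and display labels, so a "banner" and a "feature banner" with the same fields land in one group. It only catches exact schema matches. Near-duplicates still need a person to compare the rendered output.

```python
import hashlib
import json
from collections import defaultdict


def schema_fingerprint(fields: list[dict]) -> str:
    """Hash a module's field schema, ignoring field order and display labels."""
    normalized = sorted((f["name"], f["type"]) for f in fields)
    return hashlib.sha256(json.dumps(normalized).encode("utf-8")).hexdigest()


def find_duplicate_groups(modules: dict[str, list[dict]]) -> list[list[str]]:
    """Return sorted groups of module names whose field schemas are identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, fields in modules.items():
        groups[schema_fingerprint(fields)].append(name)
    # Only groups with more than one member are duplicates.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


if __name__ == "__main__":
    modules = {
        "banner": [
            {"name": "headline", "type": "text", "label": "Headline"},
            {"name": "image", "type": "image", "label": "Image"},
        ],
        "feature_banner": [
            {"name": "image", "type": "image", "label": "Banner image"},
            {"name": "headline", "type": "text", "label": "Title"},
        ],
        "cta": [{"name": "button", "type": "cta", "label": "Button"}],
    }
    print(find_duplicate_groups(modules))  # [['banner', 'feature_banner']]
```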
Step 3: Localization audit - If the site serves multiple languages or regions, map how the source CMS handles them versus how the destination CMS handles them. HubSpot, for example, treats each language variant as a separate page. ContentStack uses locale fallbacks. That is not a minor implementation detail. It is a fundamental architecture difference that changes the content model.
Missing this in discovery means discovering it after you have started building. That is an expensive time to find an architecture problem.
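One way to size this difference early is to collapse per-language pages into logical entries. You can then list which locales each entry lacks, because those are the entries that will rely on fallback in the destination. The field names `language` and `translated_from_id` are assumptions for illustration. The sketch also assumes every translation points directly at its primary-language page rather than at another translation.

```python
from collections import defaultdict


def group_translations(pages: list[dict]) -> dict[str, dict[str, str]]:
    """Collapse per-language pages into logical entries keyed by the primary page id.

    Assumes each page has "id", "language", and "translated_from_id" (None on
    the primary-language page), and that translations point straight at it.
    """
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for page in pages:
        primary = page["translated_from_id"] or page["id"]
        groups[primary][page["language"]] = page["id"]
    return dict(groups)


def missing_locales(
    groups: dict[str, dict[str, str]], required: list[str]
) -> dict[str, list[str]]:
    """Map each logical entry to the required locales it has no variant for."""
    gaps = {}
    for primary, variants in groups.items():
        missing = sorted(set(required) - set(variants))
        if missing:
            gaps[primary] = missing
    return gaps


if __name__ == "__main__":
    pages = [
        {"id": "1", "language": "en", "translated_from_id": None},
        {"id": "2", "language": "de", "translated_from_id": "1"},
        {"id": "3", "language": "en", "translated_from_id": None},
    ]
    groups = group_translations(pages)
    print(groups)                                 # {'1': {'en': '1', 'de': '2'}, '3': {'en': '3'}}
    print(missing_locales(groups, ["en", "de"]))  # {'3': ['de']}
```

Run across 11 languages, the second function shows how much of the site will depend on fallback behaviour and how much needs a real translated entry. That is the number the content model has to be designed around.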
Step 4: Automation assessment - Once the inventory is complete, classify entries by migration complexity. Fully automatable: structured content, no manual review needed, can be scripted. Requires human review: content that does not map cleanly to the new model, pages with custom layouts, anything that requires a judgment call.
The ratio between these two buckets determines the human hours required for the migration. Without this classification, every hour estimate is a guess.
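The classification and the resulting ratio can be expressed as a small rule set. Everything here is hypothetical: the `Entry` fields, the set of mapped content types, and the minutes of review per entry. They are placeholders a team would replace with its own audit data and estimates.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str
    content_type: str
    custom_layout: bool
    unmapped_fields: int  # fields with no target in the new content model


# Hypothetical: source content types that already have a target in the new model.
MAPPED_TYPES = {"blog_post", "case_study", "press_release"}


def is_automatable(entry: Entry) -> bool:
    """Fully automatable = mapped type, standard layout, every field has a target."""
    return (
        entry.content_type in MAPPED_TYPES
        and not entry.custom_layout
        and entry.unmapped_fields == 0
    )


def assess(entries: list[Entry], review_minutes_per_entry: float) -> dict:
    """Split entries into the two buckets and estimate manual review hours."""
    automatable = sum(1 for e in entries if is_automatable(e))
    review = len(entries) - automatable
    pct = round(100 * automatable / len(entries), 1) if entries else 0.0
    return {
        "automatable": automatable,
        "needs_review": review,
        "automatable_pct": pct,
        "review_hours": round(review * review_minutes_per_entry / 60, 1),
    }


if __name__ == "__main__":
    sample = [
        Entry("a", "blog_post", False, 0),
        Entry("b", "blog_post", True, 0),      # custom layout
        Entry("c", "landing_page", False, 0),  # type not mapped yet
        Entry("d", "case_study", False, 2),    # two fields with no target
        Entry("e", "press_release", False, 0),
    ]
    print(assess(sample, review_minutes_per_entry=20))
    # {'automatable': 2, 'needs_review': 3, 'automatable_pct': 40.0, 'review_hours': 1.0}
```

The rules are deliberately strict. An entry only counts as automatable when nothing about it needs a judgment call, so anything borderline falls into the review bucket rather than inflating the automation percentage.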
The cost of skipping the audit
Skipping the audit feels like saving time at the start of a project. It is not. It is borrowing time from the middle of the project, at a much higher interest rate.
Discovery gaps found in week three cost more to fix than discovery gaps found in week zero. The architecture decisions made without complete information need to be reversed. The content that was scoped incorrectly needs to be rescoped. The developers who built for 12 content types need to rebuild for 57.
None of that is recoverable from a budget or timeline perspective without pain. The audit is not a nice-to-have step at the front of the project. It is the project. Everything else depends on what it reveals.
Website hygiene, content visibility, and knowing what is actually on your site matter beyond migration projects too. We cover the ongoing side of this in our website hygiene playbook for HubSpot marketers, which gets into the day-to-day equivalent of this kind of structured audit work.
One note on tooling
Running a full content audit manually on a large HubSpot site is technically possible but extremely slow. You are exporting CSVs, cross-referencing template reports, pulling HubDB table lists from the API, and manually categorizing thousands of entries in a spreadsheet.
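Most of that cross-referencing is mechanical once the exports exist. This is a hypothetical sketch. It assumes each exported page has been loaded as a dict with `modules`, `body`, and `meta_description` keys (names invented for illustration). With that in place, a module-to-page index, a sitewide string search, and a list of metadata gaps each take a few lines:

```python
from collections import defaultdict


def module_usage(pages: dict[str, dict]) -> dict[str, list[str]]:
    """Invert page -> modules into module -> sorted list of page URLs."""
    usage: dict[str, set[str]] = defaultdict(set)
    for url, page in pages.items():
        for module in page.get("modules", []):
            usage[module].add(url)
    return {module: sorted(urls) for module, urls in usage.items()}


def find_string(pages: dict[str, dict], needle: str, case_sensitive: bool = False) -> list[str]:
    """Return URLs whose body or meta description contains the needle."""
    if not case_sensitive:
        needle = needle.lower()
    hits = []
    for url, page in pages.items():
        # Newline separator keeps a match from spanning the two fields.
        text = page.get("body", "") + "\n" + page.get("meta_description", "")
        if not case_sensitive:
            text = text.lower()
        if needle in text:
            hits.append(url)
    return sorted(hits)


def missing_meta_descriptions(pages: dict[str, dict]) -> list[str]:
    """Return URLs with an empty or whitespace-only meta description."""
    return sorted(
        url for url, page in pages.items()
        if not page.get("meta_description", "").strip()
    )
```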
The same problem shows up in day-to-day website management: HubSpot does not have bulk visibility tools built in. You cannot see all your metadata gaps at once. You cannot search across the full site for a specific string. You cannot pull a structured view of all your modules and which pages they appear on.
This is a known gap. The HubSpot community has had an open feature request for sitewide find-and-replace since 2017. The bulk editing challenges that make content teams slow are the same ones that make audits hard.
That is the gap we are working on at Smuves. Start with the free tier if you want to see what your content inventory actually looks like before committing to anything.