How "Substantial" is Ranking Content?
- Chris Green

- Aug 7
- 5 min read
Content Substance and Rankings - TL;DR
The data suggests that more than 50% of ranking results in the top 15 organic positions are “fluffy” according to the substance model.
There’s no clear difference by keyword intent, but much more data (10x the current sample) is needed to confirm this.
There might be a negative correlation between substance and rankings (i.e., “substantial” content ranks lower), but:
We need to check if the model’s definition of “substance” really aligns with user intent.
The model’s chunk size/training set may need fine-tuning.
The sample size is too small to draw strong conclusions.
Hypothesis: “SEO is not substantial enough, and therefore will rank worse”
The SEO industry spans lots of functions. If someone is “an SEO” or “works in SEO”, their actual job could include a huge array of distinct and potentially very different skill sets. One area where I’ve generally thought we underperformed was content production.
Not content strategy or utilising content to progress a business’s needs - that’s a general strength for SEO - but knowing what content is actually good, worthwhile, or genuinely beneficial to users. That often hasn’t been a particularly strong point.
The last 2-3 years have seen a lot of this play out, as certain content formulas, guidelines, and production practices have been shown to be ineffective or even detrimental for the actual end user. A lot of people have realised that their content may actually be poor, or at best so mediocre that it might as well not exist.
An element of this is that content has been too “fluffy” or not substantial enough. The knowledge and research were surface-level, so the resulting content added no additional value for users and was unlikely to be ranked above other, similar content.
Maybe this is my latent guilt setting in - maybe I know that some previously recommended content just wasn’t good enough. Maybe - some of you may argue - I should just speak for myself and stop painting the industry with the “not great at great content” brush.
When Dejan released their Content Substance Classifier, I felt inspired to investigate further. Whilst the post is long, let me paraphrase a quote from “Foundation” that Dan used as an example - this was the element that really stood out for me:
“He said nothing at all… and took [a 1,000 word blog] to say it!”
To summarise the Content Substance Classifier itself:
Dan introduces a new method for detecting low-quality “cyberfluff” content online.
They trained a transformer-based AI model to tell the difference between “fluff” (empty) and “substance” (meaningful) writing.
They then fine-tuned the model to make yes/no judgments on individual samples, still across different difficulty levels.
This results in a final model that can reliably spot meaningful content across many topics and isn’t fooled by shallow writing tricks.
Sounds super interesting to me. When I reached out to Dan, he mentioned that the model was available on HuggingFace and was happy for me to test it out. So I did.
Does “Substantial” Content Correlate with Ranking Results?
This is the question we all want to know the answer to - or at least I wanted to check. How much does writing “fluff” impact the content’s ability to rank?
It’s a simple question, but the longer you think about it, the more related questions emerge. Does “substance” mean “accessible”? Does “substantial” content better meet user needs or the intent of the query?
For this particular investigation, we’re only looking at “substance” as defined by the classifier - so just one dimension of what is really a multi-dimensional question.
The Definition of Substance/Fluff
Many of you may be asking what the definition of “substance” actually is - an area I’ll return to later on - but if you want to demo this process yourself, you can via the Streamlit App, here.

To get really “meta”, the above example shows the demo running the analysis on the opening sentence of this post. Broadly, it came out as 50% fluff, 50% substance. As a counter-example, I took the abstract/example from Dan’s blog, and you can see that it is all “substantial”.
Right away, it appears that “substance” may lean more towards academic-style text and less towards typical “blogging”. That may give you an idea of what to expect in the results!
Methodology
Here’s how I’ve gone about getting the data to investigate this:
Scrape the first 2 pages of organic results for 600 keywords from 3 different niches and 4 intent stages
Use readability.js to extract page content from each (where it is accessible)
Break the content into 132-token “chunks”
Send each chunk to the classifier API
Average the chunk classifications for the page to get a final page-level classification
Conveniently, I’ve been working on another study (coming soon!) where I developed a pipeline to source, scrape, and extract web content across roughly 600 keywords. That scraped output has been reused for this study.
Scraping web content reliably at scale is not easy, so there are a number of pages whose content could not be successfully extracted for various reasons. Where this happened for more than 60% of the results for a keyword, that keyword (and its successful extractions) was excluded.
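To make that exclusion rule concrete, here is a rough sketch of how it could be applied - assuming a table with one row per (keyword, URL) result and a boolean `extracted` flag; the DataFrame and column names are hypothetical, not the actual pipeline code.

```python
import pandas as pd

def filter_keywords(results: pd.DataFrame, max_fail_rate: float = 0.6) -> pd.DataFrame:
    """Drop any keyword whose SERP had more than 60% failed extractions,
    then keep only the successfully extracted rows for the keywords that remain."""
    fail_rate = results.groupby("keyword")["extracted"].agg(lambda s: 1 - s.mean())
    keep = fail_rate[fail_rate <= max_fail_rate].index
    return results[results["keyword"].isin(keep) & results["extracted"]]
```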
During testing, I found that quite a lot of the variability in fluff classifications was driven by the length of the content being classified. The demo splits text on paragraph breaks, and I found that shorter passages were often less likely to be marked as “fluff”, so I chunked all text into 132-token segments to remove length as a variable.
Any pages that did not have enough content to significantly “fill” one of the “chunks” were automatically classified as “fluff” (auto-fluff).
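If you want to replicate the chunk-and-average step, a minimal local-inference sketch using the transformers library is below. I sent chunks to the classifier API, so treat this as an approximation: the model ID is a placeholder, and the label order (index 0 = “fluff”) is an assumption you should verify against the actual checkpoint on HuggingFace.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "your-org/content-substance-classifier"  # placeholder, not the real ID
CHUNK_TOKENS = 132                                   # chunk size used in this study

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def page_fluff_score(text: str) -> float:
    """Chunk extracted page text into 132-token segments, classify each chunk,
    and average the chunk-level fluff probabilities into a page-level score."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    if len(token_ids) < CHUNK_TOKENS:
        return 1.0  # "auto-fluff": not enough content to fill a single chunk
    chunks = [token_ids[i:i + CHUNK_TOKENS]
              for i in range(0, len(token_ids), CHUNK_TOKENS)]
    scores = []
    for chunk in chunks:
        inputs = tokenizer(tokenizer.decode(chunk), return_tensors="pt",
                           truncation=True)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
        scores.append(probs[0].item())  # assumes label index 0 == "fluff"
    return sum(scores) / len(scores)
```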
Results
Fluff % Overall
Across the entire dataset (507 keywords after excluding SERPs with a low fetch/extraction rate), the result was pretty stark: SERPs contain a lot of fluff.

I’d suggest there is a slight correlation between position and fluff - possibly a case of “substance” negatively correlating with rankings. I stress that I am being tentative here.
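For anyone who wants to test that hunch on their own data, the relationship can be checked with a rank correlation. This is a hedged sketch, assuming a hypothetical `pages` DataFrame holding each result’s organic `position` and its averaged `fluff_score`:

```python
import pandas as pd
from scipy.stats import spearmanr

def position_fluff_correlation(pages: pd.DataFrame) -> tuple[float, float]:
    """Spearman rank correlation between organic position and page fluff score.
    A positive rho would mean lower-ranked (higher-numbered) positions are fluffier."""
    rho, p_value = spearmanr(pages["position"], pages["fluff_score"])
    return rho, p_value
```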
Fluff By Intent
If we break it down by intent, it doesn’t change the result in many interesting ways - especially when thinking about the original expectations.

Ignoring ranking positions, Transactional was the most “fluffy” and Navigational was the least overall.
Does the occurrence of “fluff” in SERPs change with intent? In small ways, yes, but the broad trends are similar. The dataset is possibly too small to interrogate these results deeply, so again, conclusions should be tentative.
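The breakdown itself is just a grouped average over the same hypothetical `pages` DataFrame, this time with an `intent` column - again a sketch, not the exact analysis code:

```python
import pandas as pd

def fluff_by_intent(pages: pd.DataFrame):
    """Mean fluff score per intent stage, plus a position-by-intent pivot
    to see whether the trend across positions differs by intent."""
    overall = pages.groupby("intent")["fluff_score"].mean().sort_values()
    by_position = pages.pivot_table(index="position", columns="intent",
                                    values="fluff_score", aggfunc="mean")
    return overall, by_position
```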
Conclusions
As far as my original hypothesis - “SEO is not substantial enough, and therefore will rank worse” - goes, I think that if we treat “substance” as classified by the model as the sole definition, I have totally contradicted my original assumptions.
If anything, this makes the question of “what IS substance” more important: if you disagree with the model’s definition, then the findings are less significant. Many would also say, “Have you seen Google’s results? Of course the results are fluffy!”
“Have you seen Google’s results? Of course the results are fluffy!”
User intent has to be considered within the context of what kind of content best matches the user’s needs. It is of course likely that substance is not the most important element here. Keyword intent will likely have a part to play, but again, we’d need more data to tease out the nuances and determine whether the difference is significant.
I am super grateful to Dan and Dejan for making this model available for this kind of research. This activity as a thought experiment is super valuable and will certainly influence my thinking in this area, even if I don’t feel like I have something definitive to add—at least for now.





