Technical SEO isn't rocket science. With a good understanding of how search engines work, your website's ranking can climb significantly.
The first blog in our series on technical SEO covers one of the most important functions of a search engine: crawling.
In addition to covering its process, we'll discuss the factors that affect
your SEO, along with some tips for speeding up the process.
Note: Since about 92.4% of internet users have Google as their default search engine, this series (crawling, indexing, and ranking) is going to focus on this search engine.
What Is Web Crawling and Why Is It Important for SEO?
I knew nothing about crawling at the beginning of my SEO journey. I just couldn’t wrap my head around it, probably because most of the content I found was filled with complicated technical terms that I shouldn’t even have to know.
However, when I started my blog, I came to understand what turned out to be quite a simple concept (at least for us content writers).
So what is web crawling?
It is the process search engines use to discover pages on the internet so they can later be indexed.
If a URL (website/page/blog post/...) is not crawled, it cannot be indexed, and thus no SEO tactic will help you.
The Crawling Process
According to Siteefy, 252,000 new websites are created every day: 10,500 every hour and 175 every minute. And that's just websites, not counting individual blog posts and pages. The question is: how does Google discover them?

Obviously, URLs don't appear in search results just by themselves. Google must first discover them. To do that, Google sends spiders (bots) to large, well-known sites (those that normally appear on the first search results page). Once they have collected the links on these pages, the spiders proceed to crawl them. There they collect linked pages and crawl those too. This cycle goes on and on ...
The web crawling process
Once Google collects and stores millions of URLs, it then indexes them.
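The discover-collect-crawl cycle described above is essentially a breadth-first traversal of the web's link graph. Here's a minimal sketch in Python, using a made-up link graph (the URLs and graph are hypothetical; a real crawler like Googlebot is vastly more complex):

```python
from collections import deque

# Hypothetical link graph: each URL maps to the links found on that page.
LINK_GRAPH = {
    "bigsite.com": ["bigsite.com/a", "smallblog.com"],
    "bigsite.com/a": ["smallblog.com/post"],
    "smallblog.com": ["smallblog.com/post"],
    "smallblog.com/post": [],
}

def crawl(seeds):
    """Discover URLs the way the article describes: start from large,
    well-known sites, collect their links, then crawl those in turn."""
    discovered = set(seeds)
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        for link in LINK_GRAPH.get(url, []):
            if link not in discovered:
                discovered.add(link)
                frontier.append(link)
    return discovered

print(sorted(crawl(["bigsite.com"])))
```

Starting from the single seed `bigsite.com`, the bot ends up discovering every page in this toy graph, which is exactly why links from already-crawled sites matter so much for discovery.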
Before we go on, you might wonder how search engines decide what websites are to be “trusted”... Does Google trust every website appearing on the first results page?
It doesn't really work like that. A brand-new website ranking for a ridiculously easy keyword doesn't make it trustworthy.
Consider the following metrics when determining a website's trustworthiness:
Domain Authority (DA)
Domain authority is a metric on a scale of 1 to 100 used to predict how well a website will rank. The scale is based on the quality and quantity of links to that website.

If websites were people, DA would be their fame: how many people follow them on social media, and who are these people?
Here are some examples using Mozbar:
- Amazon
With a domain authority of 96/100, Amazon is one of the most powerful
websites on the internet.
Page Authority (PA)
Just like DA, page authority is a metric on a scale of 1 to 100 used to predict how well a webpage will rank. The scale is based on the quality and quantity of both internal and external links to that webpage.

Since both metrics are logarithmic, the higher a website gets on the scale, the harder it is to progress. So it would be much easier to take your DA from 10 to 20 than from 60 to 62.

There are 4 major factors that influence the crawling process:
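To see what a logarithmic scale means in practice, here's a toy calculation. Moz's actual formula is proprietary, so the model below (DA growing with the logarithm of the number of linking domains, with an arbitrary scale factor) is purely illustrative:

```python
def links_needed(da, scale=20):
    """Toy model (NOT Moz's real, proprietary formula): assume DA grows
    logarithmically with the number of linking domains, so the number of
    links needed grows exponentially with DA."""
    return 10 ** (da / scale)

# Going from DA 10 to 20 takes far fewer new linking domains than 60 to 62:
jump_low = links_needed(20) - links_needed(10)    # ≈ 6.8 new links
jump_high = links_needed(62) - links_needed(60)   # ≈ 258.9 new links
```

Under this (hypothetical) model, a 10-point jump at the bottom of the scale costs a handful of links, while a mere 2-point jump near the top costs hundreds.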
1. Crawl Budget
Crawling costs Google money, which is why they have a budget for it. This leads us to an important conclusion: there is a crawl limit.

Although the crawl budget usually concerns huge websites with thousands of pages or blog posts, understanding it also gives you insight into what happens as your website grows.
As Google has confirmed, the crawl limit is determined by many factors; the two main ones are:
- Crawl health: the performance of a website (page speed, mobile-friendliness, UX, …)
- Crawl demand: the popularity of web pages and how often they’re searched for
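A rough way to picture these two factors together: a crawler with a fixed budget will favor pages that score well on both health and demand. The scoring and page data below are a made-up illustration, not Google's actual algorithm:

```python
def plan_crawl(pages, budget):
    """Toy crawl scheduler: spend a limited crawl budget on the pages with
    the best combination of crawl health and crawl demand.
    The multiplicative scoring is illustrative, not Google's real method."""
    ranked = sorted(pages, key=lambda p: p["health"] * p["demand"], reverse=True)
    return [p["url"] for p in ranked[:budget]]

# Hypothetical pages with health/demand scores between 0 and 1.
pages = [
    {"url": "/popular-post", "health": 0.9, "demand": 0.8},
    {"url": "/slow-page",    "health": 0.3, "demand": 0.7},
    {"url": "/old-archive",  "health": 0.8, "demand": 0.1},
]
print(plan_crawl(pages, budget=2))  # ['/popular-post', '/slow-page']
```

Notice that `/old-archive` is perfectly healthy but loses its slot to a less healthy page that people actually search for: demand matters as much as performance.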
2. Site Structure
A site's structure is its architecture: how its pages are organized and linked together. Good websites have well-designed architecture, whereas “inferior” ones either have a terrible structure or none at all.

Whatever structure you choose, it must meet the following requirements:

- Logical
- Clear
- Easy to navigate
- Based on assigned priorities
So if your important pages are buried “somewhere” on your website, with no internal links pointing to them, and your categories are almost absent from the blog section, that’s a problem that you should fix as soon as possible.
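You can spot buried and orphaned pages yourself with a breadth-first walk of your internal links. This sketch uses a hypothetical mini-site; in practice you'd feed it link data exported from a crawler:

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to internally.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1"],
    "/about": [],
    "/blog/post-1": [],
}
# The full page inventory; note /important-page has no inbound links.
all_pages = ["/", "/blog", "/about", "/blog/post-1", "/important-page"]

def click_depths(links, home="/"):
    """Breadth-first walk from the homepage: how many clicks away is each page?"""
    depths = {home: 0}
    frontier = deque([home])
    while frontier:
        page = frontier.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                frontier.append(target)
    return depths

depths = click_depths(site)
# Pages never reached have no internal links pointing to them: orphans.
orphans = [p for p in all_pages if p not in depths]
print(depths, orphans)
```

Here `/important-page` never shows up in the walk, which is precisely the "buried somewhere with no internal links" problem described above.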
Here are a few structures of websites you probably know:
- Amazon
- Netflix
How to assess your site’s structure?

Several tools can help you assess your site’s architecture, like:
- Screaming Frog
- Ahrefs
- SEMrush
- More free SEO tools
3. Broken Links
Also known as 404 errors, broken links are SEO’s enemy, and thus our enemy.

First, how do broken links crop up on our sites?
It’s not a coincidence; these errors appear for various reasons, such as:
- Changing permalinks without setting up a redirect
- An external site being moved or taken down
- Moving or deleting linked content (PDFs, videos, …)
- Broken elements within the page’s code
- Firewall restrictions blocking outside access
If your site has 404 errors, they won’t only hurt your SEO score; they’ll also annoy your most loyal visitors.
SEOPressor did a Google poll on what annoys people the most when visiting a website, and here’s the result:
How to assess broken links?
You can find broken links by using one of my favorite free SEO tools: Semrush
First, go to the management section, open Projects, and choose your project (in this example, I’ll use The Marketing Recipe’s dashboard).
Then open the Site Audit.
Click on Issues.
And if you have any internal broken links, they’ll appear. In our case, we have none :)
NOTE: you need to upgrade to see external broken links.
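If you'd rather check programmatically, the core of a broken-internal-link audit is simple: compare every internal link against the set of pages that actually exist. The page list and links below are hypothetical stand-ins for a real crawl of your site:

```python
def find_broken_links(existing_pages, outgoing_links):
    """Report internal links that point at pages which no longer exist.
    Returns (source_page, broken_target) pairs."""
    existing = set(existing_pages)
    broken = []
    for page, links in outgoing_links.items():
        for link in links:
            if link not in existing:
                broken.append((page, link))
    return broken

# Hypothetical crawl data for a small site.
pages = ["/", "/blog", "/blog/new-post"]
links = {
    "/": ["/blog"],
    "/blog": ["/blog/new-post", "/blog/old-post"],  # old-post was deleted
}
print(find_broken_links(pages, links))  # [('/blog', '/blog/old-post')]
```

This is exactly what tools like Semrush do at scale, with the added step of actually fetching each URL and checking its HTTP status code.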
How to fix broken links?
There are mainly 3 solutions for 404 errors:
- Removing the broken link
- Redirecting users automatically to a relevant page (a 301 redirect)
- Creating a custom 404 page: a page that points users in the right direction gives them a good user experience and allows bots to keep crawling your site.
Here’s an example of a 404 page on Disney’s site
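The last two fixes can be sketched as a tiny redirect map, the kind you'd wire into your server or CMS. The paths here are hypothetical, and a real server would of course serve existing pages before falling back to a 404:

```python
# Hypothetical redirect map: old permalinks -> their new homes.
REDIRECTS = {
    "/old-post": "/blog/new-post",
    "/2020/recipe": "/blog/recipe",
}

def resolve(path):
    """Return (status, location): a 301 redirect if the path was moved,
    otherwise a 404 that serves a custom error page."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 404, "/custom-404-page"

print(resolve("/old-post"))   # (301, '/blog/new-post')
print(resolve("/gone"))       # (404, '/custom-404-page')
```

A 301 tells both visitors and bots the move is permanent, so link equity from the old URL is passed along instead of being lost.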
4. Non-Crawlable Content
There have been rumors about JavaScript circulating in the SEO community for the last decade. Although Google has denied that using JavaScript affects your site's crawlability, it seems as if it could.
The majority of SEO specialists, as well as most tools, recommend avoiding
JavaScript as much as possible. You can use ScreamingFrog to identify
pages that contain it and determine if it's possible to reduce it.
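One rough heuristic you can script yourself (an assumption on my part, not an official Google rule): if a page's HTML contains far more script than visible text, its content is probably rendered client-side and may crawl poorly.

```python
from html.parser import HTMLParser

class TextVsScript(HTMLParser):
    """Count characters of visible text vs. characters inside <script> tags."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text_chars = 0
        self.script_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data.strip())
        else:
            self.text_chars += len(data.strip())

def looks_js_rendered(html):
    """Heuristic flag: more script than text suggests JS-rendered content."""
    parser = TextVsScript()
    parser.feed(html)
    return parser.text_chars < parser.script_chars

static_page = "<html><body><p>Lots of readable article text here.</p></body></html>"
js_page = "<html><body><div id='app'></div><script>renderApp(bigBundle)</script></body></html>"
```

Pages flagged this way are the ones worth double-checking in Google Search Console's URL Inspection tool to see what Googlebot actually renders.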
How to Get Google to Crawl Your Website Faster
Waiting for Google bots to randomly stumble upon a dofollow link to your brand-new website can take forever (unless you’ve got your own trusted sites). To speed up the crawling process, you can do 2 things:

1. Submit Your Sitemap to Google Search Console
A sitemap is a list of all your website pages. It comes in various formats:

- XML
- RSS / Atom feed
- Plain text file

If you happen to be on a Blogspot subdomain, you can use an XML sitemap generator.

Once you have your sitemap ready, you need to submit it to Google Search Console. Once set up, each time you add another page, the search engine will automatically detect new URLs and crawl them.
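Generating a minimal XML sitemap yourself takes only a few lines; the format is defined by the sitemaps.org protocol. The URLs below are placeholders:

```python
from xml.etree import ElementTree as ET

def build_sitemap(urls):
    """Build a minimal XML sitemap following the sitemaps.org protocol:
    a <urlset> root containing one <url><loc>...</loc></url> per page."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# example.com is a placeholder domain.
sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/blog",
])
print(sitemap)
```

Save the output as `sitemap.xml` at your site's root and submit that URL in Google Search Console under Sitemaps.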
2. Get Backlinks from Trusted Websites
Another way to speed up the crawling process (and those awaited SEO results) is to get quality backlinks. For a new site, this method is quite difficult, but if you manage to earn a few quality backlinks, you will certainly appear on those first results pages!

Takeaways: Website Crawling & SEO
Let’s sum up the most important points:

- Technical SEO is not rocket science (I had to say it again)
- Google does not automatically detect new content
- There are 4 factors to mind when speaking about web crawling: crawl budget, site structure, broken links, and non-crawlable content.
- To speed up the process, you can submit your sitemap to Google Search Console (which you should) and get as many quality backlinks as you can.