Can my URLs use non-English words? John Mueller answers this question on Google’s SEO Snippets video series.
For sites that target users outside of English-speaking regions, it’s sometimes unclear if they can really use their own language for URLs, and if so, what about non-English characters?
Google search uses URLs primarily as a way to address a piece of content. We use URLs to crawl a page, which is when Googlebot goes to check the page and to use the page's content for our search results.
As long as URLs are valid and unique, that’s fine.
For domain names and top-level domains, non-Latin characters are represented with Punycode encoding. This can look a little bit weird at first. For example, if you take Mueller, my last name, with the dots on the u, that would be represented slightly differently as a domain name. For browsers and for Google search, both versions of the domain name are equivalent; we treat them as one and the same. The rest of the URL can use UTF-8 encoding for non-Latin characters. You can use either the escaped version or the Unicode version within your website; they're also equivalent to Google.
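To see what these two encodings look like in practice, here's a minimal sketch using Python's standard library. The domain and path below are illustrative examples, not URLs from the video:

```python
# Non-Latin characters in URLs: two different encodings apply,
# depending on which part of the URL they appear in.
from urllib.parse import quote, unquote

# Domain names and top-level domains use Punycode (IDNA).
domain = "müller.example"  # example domain with a non-Latin character
ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--mller-kva.example

# The rest of the URL uses UTF-8 percent-encoding.
path = "/zürich"  # example path
escaped = quote(path)
print(escaped)  # /z%C3%BCrich

# The escaped version and the Unicode version decode to the same
# path, which is why they're treated as equivalent.
assert unquote(escaped) == path
```

Both printed forms address the same resource; browsers and Google treat each pair as one and the same.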
Regardless of what you place within your URLs, make it easy for folks to link to your pages. For example, avoid using spaces, commas, and other special characters in the URL. They work for Google, but they make linking a little bit harder. Use dashes to separate words in your URLs. Some prefer using underscores; that's fine, too. Dashes are usually a little bit easier to recognize. And if your site is available in multiple languages, use the appropriate language in URLs for content in that language.
So to sum it up: yes, non-English words in URLs are fine, and we recommend using them for non-English websites.
What should I do with old 404 errors? John Mueller answers this question on Google’s SEO Snippets video series.
Today’s question comes from San Francisco.
What should I do with 404s in Search Console that are from ancient versions of my site?
So sites evolve over time, URLs change, you add redirects, redirects get dropped over the years, sometimes URLs are just no longer needed. These URLs end up returning 404. So they show up in Search Console as crawl errors. But what does that mean?
When an invalid URL is opened it’s the right thing for a server to return a 404 page not found error. When doing a restructuring of your website, we recommend redirecting from old URLs to the new ones and updating the links that go to the old URLs to point to the new ones directly.
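As a sketch of what those two behaviors look like on the server side, here's a minimal example using Python's standard library. The paths and the redirect mapping are placeholders, not a recommended implementation:

```python
# A server that permanently redirects old URLs to new ones after a
# restructuring, and returns 404 for URLs that no longer exist.
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from old URLs to their new locations.
REDIRECTS = {"/old-page": "/new-page"}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in REDIRECTS:
            # 301: permanent redirect from the old URL to the new one.
            self.send_response(301)
            self.send_header("Location", REDIRECTS[self.path])
            self.end_headers()
        else:
            # 404: the right response for an invalid URL.
            self.send_response(404)
            self.end_headers()
```

A request for `/old-page` gets a 301 pointing at `/new-page`; any other path gets a 404, which, as noted above, is the right thing for a server to return.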
However, over time you might decide to drop those redirects, maybe because of the maintenance overhead, or maybe you just forgot about them. These URLs now show up as 404s in Search Console.
In your server logs or analytics, check for traffic to those URLs; if there's no traffic, that's great. In Search Console, check for links to those URLs; are there no relevant links? That's great too. If you see nothing special in either the links or the traffic, having those pages return 404 is perfectly fine.
If you do see traffic to those URLs or see links pointing at those URLs, check where they're coming from and have those links point at the new URLs instead. Or if it looks like a lot of traffic or links are going to those URLs, perhaps putting a redirect back in place would be more efficient.
That works for a few crawl errors, but what if you have a ton of 404 errors? Search Console makes this easy: it prioritizes crawl errors for you. If the top errors in the report are all irrelevant, you can rest assured that there's nothing more important further down the list.
Crawl errors for 404s that you don't want to have indexed don't negatively affect the rest of your site in search. So take your time to find a good approach that works for you.
Why does the indexed pages count vary? John Mueller answers this question on Google’s SEO Snippets video series.
Today's question comes all the way from Brazil. Macau asks, why is the number of pages indexed in Search Console different from what appears on google.com?
Depending on where you look, you might see different numbers for your site's count of indexed pages. Which one is the right number, and which one should you use?
The actual number of pages on a website is surprisingly hard to determine. At first one might assume that it's just a matter of counting through the pages, starting with the home page and following the links from there. However, on most websites there are many, many ways to reach a specific page. There might be different URL parameters (everything after a question mark in the URL) that lead to the same page. Sometimes upper- and lowercase URLs also lead to the same page, or perhaps you can add a slash to the end and still get the same page. Some websites have a calendar or something similar that leads to an endless number of new and valid pages. Assuming that most websites have an infinite number of possible URLs, should Google just show an infinite count? That probably isn't that useful.
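Those variants can be sketched in a few lines of Python. The URLs below are placeholders, and the normalization shown is a simplified illustration, not how Google actually deduplicates:

```python
# Several distinct URL strings can all address the same page, which is
# why "number of pages" is hard to pin down.
from urllib.parse import urlsplit

variants = [
    "http://example.com/Page",                 # different case
    "http://example.com/page",
    "http://example.com/page/",                # trailing slash
    "http://example.com/page?sessionid=123",   # URL parameter
]

def normalize(url):
    # Illustrative normalization only: lowercase the path, drop the
    # query string and any trailing slash.
    parts = urlsplit(url)
    return parts.netloc + parts.path.lower().rstrip("/")

# All four URL strings collapse to a single page.
print({normalize(u) for u in variants})
```

Four valid URLs, one page; multiply that by parameters and calendars, and counting pages quickly becomes an approximation.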
So which numbers can you see and where do they come from?
There are three main places to get counts for the number of indexed URLs. First, you can check a site: query in Google search. Second, you can use the Index Status report in Search Console. Or third, you can look at the indexed count per sitemap file.
Let’s take a look at these options.
In Google search, you can just enter site, then a colon, then your domain name. Google will show you a sample of the pages indexed from your website, together with an approximate count of the URLs on your website. This number is generally a very, very rough approximation based on what we've seen from your website over time. We try to show search results as quickly as possible, so that count is optimized more for speed than for accuracy. It's useful to look at this as a very rough order of magnitude, but we don't recommend using that count as a metric.
The second and third methods require that you use Google Search Console. Search Console is a free tool you can sign up for and verify your website with. In Search Console, there's a report that shows the number of indexed URLs from your website. This count is much more accurate and includes the actual indexed URLs from your website. It's mostly based on pages that have unique content, so it would usually exclude URLs with irrelevant URL parameters, for example. However, it can still include many URLs that you don't necessarily care about.
The third place to check is the sitemap's indexed URL count. For each sitemap file that you submit, you can see how many of those URLs were actually indexed. One thing to keep in mind is that this count is based on the exact URL as specified in your sitemap file. So if your page's content is indexed with a slightly different URL, then it wouldn't be counted there.
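For example, a minimal sitemap file lists exact URLs like this (the domain and path are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- The indexed count matches against this exact URL string.
         If the page is actually indexed as
         http://example.com/page/ (with a trailing slash), it would
         not be counted for this sitemap entry. -->
    <loc>http://example.com/page</loc>
  </url>
</urlset>
```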
In practice, we recommend either the sitemap file count or the indexed URL report in Search Console as a basis for your website's metrics. These are currently the most accurate indexing numbers available to site owners.
How often does Google re-index websites? John Mueller answers this question on Google’s SEO Snippets video series.
Today's webmaster question was submitted by Jim from Vancouver, Washington. He asks, how often does Google reindex a website? It seems like it's much less often than it used to be. We add or remove pages from our site, and it's weeks before those changes are reflected in Google search.
You too might be wondering how long it takes for Google to recognize bigger changes on a website, and from there, what you can do to speed that up.
Looking at the whole website all at once, or even within a short period of time, can cause a significant load on a website. Googlebot tries to be polite and is limited to a certain number of pages every day. This number is automatically adjusted as we better recognize the limits of a website. Looking at portions of a website means that we have to prioritize how we crawl.
So how does this work? In general, Googlebot tries to crawl important pages more frequently to make sure that the most critical pages are covered. Often this will be a website's home page, or maybe higher-level category pages. New content is often mentioned and linked from there, so it's a great place for us to start. We'll recrawl these pages frequently, maybe every few days, maybe even much more frequently, depending on the website.
Do fixed penalties affect SEO? John Mueller answers this question on Google’s SEO Snippets video series.
Today’s question is from Switzerland. Michael asks, I had a penalty, and I fixed it. Will Google hold a grudge?
First off, at Google we call them manual actions, not penalties, since they're generally applied manually by a team here, and they don't always have a negative effect on a site overall. In Search Console, we inform sites about any manual actions their site might have. If you receive such a notification, you can take action on that, resolve the issue, and submit a reconsideration request. The web spam team processes these, and if they can confirm that the issue is fixed, they'll lift the manual action.
It might take a bit of time for everything to be reprocessed, but Google’s algorithms won’t hold that issue against the site in the long-term. However, it’s possible that a site temporarily had an unnatural advantage before. By fixing this issue, your site will return to its natural location in our search results.
Additionally, things change on the web and in our search results all the time. A site's visibility in search can change over time, even if nothing on the website changes. So with that in mind, it can be normal that a site doesn't return to exactly the same place as before the manual action.
So in short, no, Google’s algorithms don’t hold a grudge. However, visibility in search can change over time, regardless of any manual action.
My site’s template has multiple H1 tags. Is this a problem? John Mueller answers this question on Google’s SEO Snippets video series.
An H1 element is commonly used to mark up a heading on a page. There’s something to be said for having a single, clear topic of a page, right? So how critical is it to have just one of these on a page? The answer is short and easy.
It’s not a problem.
With HTML5, it’s common to have separate H1 elements for different parts of a page. If you use an HTML5 template, there’s a chance your pages will correctly use multiple H1 headings automatically.
That said, regardless of whether you use HTML5 or not, having multiple H1 elements on a page is fine. Semantically marking up your page’s content to let search engines know how it fits together is always a good idea.
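For example, an HTML5 page might mark up each sectioning element with its own h1; the headings below are placeholder text:

```html
<!-- Multiple h1 elements, one per sectioning element: fine for
     Google, and common in HTML5 templates. -->
<article>
  <h1>Main article title</h1>
  <section>
    <h1>First section heading</h1>
    <p>Content of the first section.</p>
  </section>
  <section>
    <h1>Second section heading</h1>
    <p>Content of the second section.</p>
  </section>
</article>
```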
If you end up using multiple headings on a page, that’s fine.
Subdomain or subfolder, which is better for SEO? John Mueller answers this question on Google’s SEO Snippets video series.
Deepak from India asks us: subdomain or subfolder, which one is the most beneficial for SEO? Google web search is fine with using either subdomains or subdirectories. Making changes to a site's URL structure tends to take a bit of time to settle down in search, so I recommend picking a setup that you can keep for longer. Some servers make it easier to set up different parts of a website as subdirectories; that's fine for us. This helps us with crawling, since we understand that everything's on the same server and can crawl it in a similar way. Sometimes this also makes it easier for users to recognize that these sections are all part of the same bigger website. On other servers, using subdirectories for different sections, like a blog and a shop, can be trickier, and it's easier to put them on separate subdomains. That also works for us.
You'll need to verify subdomains separately in Search Console, make any changes to settings, and track overall performance per subdomain. We do have to learn how to crawl them separately, but for the most part that's just a formality for the first few days.
So in short, use what works best for your setup, and think about your longer-term plans when picking one or the other.
Is a crawl-delay rule ignored by Googlebot? John Mueller answers this question on Google’s SEO Snippets video series.
The crawl-delay directive for robots.txt files was introduced by other search engines in the early days. The idea was that webmasters could specify how many seconds a crawler would wait between requests to help limit the load on a web server. That’s not a bad idea overall.
However, it turns out that servers are really quite dynamic, and sticking to a single period between requests doesn’t really make sense.
The value given there is the number of seconds between requests, which is not that useful now that most servers are able to handle so much more traffic per second. Instead of the crawl-delay directive, we decided to automatically adjust our crawling based on how your server reacts. So if we see a server error, or we see that the server is getting slower, we’ll back off on our crawling.
Additionally, there's a way to give us feedback on our crawling directly in Search Console, so site owners can let us know about their preferred changes in crawling.
With that, if we see this directive in your robots.txt file, we’ll try to let you know that this is something that we don’t support.
Of course, if there are parts of your website that you don’t want to have crawled at all, letting us know about that in the robots.txt file is fine.
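Putting those two points together, a robots.txt file might look like this (the path is a placeholder): Googlebot would honor the Disallow rule but treat the crawl-delay line as unsupported.

```
User-agent: *
# Not supported by Googlebot; crawl rate adjusts automatically
# based on how the server responds.
Crawl-delay: 10
# Supported: this section won't be crawled at all.
Disallow: /private/
```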