HTTP and HTTPS are Different in Sitemap XML File
Providing a website’s sitemap to a search engine is important. However, providing incorrect
sitemap.xml might lead to crawling failure, excluding the page from being searched. One common error is using the wrong protocol in the URL.
In this article’s context, sitemap is a file that lists pages, videos, and other files of a site. It is not necessary - most search engines can crawl websites without it. However, if the full list is provided, search engines can benefit from it. For example, in Google’s Search Console, there is a menu where the owner of the website can submit URLs of sitemaps.
Following is part of
sitemap.xml retrieved from this site.
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <url> <loc>https://parosa.net/visual-studio-code-keyboard-shortcuts-favorites/</loc> <lastmod>2020-09-16T12:00:00+00:00</lastmod> <changefreq>weekly</changefreq> <priority>0.5</priority> </url> <url> <loc>https://parosa.net/html-label-without-for-attribute-and-id/</loc> <lastmod>2020-08-17T11:30:00+00:00</lastmod> <changefreq>weekly</changefreq> <priority>0.5</priority> </url> </urlset>
<url>, there are tags such as
<loc>. Focusing on
<loc>(URL of the page), note that it starts with
HTTP and HTTPS
A page can be accessed without entering the protocol(e.g.
http://) portion of the URL. Moreover, even if HTTPS is enforced, a page can usually be accessed with
http:// in the URL. However, this is not the case in sitemaps. In fact, Google considers migrating from HTTP to HTTPS as moving of a site, and clearly states the following.
Be sure to add the HTTPS property to Search Console.
Search Console treats HTTP and HTTPS separately; data for these properties is not shared in Search Console. So if you have pages in both protocols, you must have a separate Search Console property for each one.
from Overview: Site moves with URL changes, Google Support
Possible Error Messages
Providing a wrong
sitemap.xml to search engines results in index errors. These are the error messages received when a sitemap with a wrong URL protocol is provided to Google. (
<loc> starting with
http:// on a HTTPS enforced site)
Discovered - currently not indexed
Typically, Google tried to crawl the URL but the site was overloaded; therefore Google had to reschedule the crawl.
These pages are typically not indexed, and we think that is appropriate. These pages are either duplicates of indexed pages, or blocked from indexing by some mechanism on your site, or otherwise not indexed for a reason that we think is not an error.
발견됨 - 현재 색인이 생성되지 않음 (상태: 제외됨)
from Index Coverage report, Google Support
All pages in
sitemap.xml are shown in the Search Console, but all of them are constantly excluded from being indexed. Unlike Google’s explanation, the site is not overloaded (if URL with a wrong protocol is provided).
Note that there are other causes of Excluded, including (1) Blocked by robots.txt and (2) Excluded by ‘noindex’ tag. Read the documentation thoroughly before asking questions on forums.
Github Pages and Jekyll
In Github Pages, Enforce HTTPS option is enabled by default. If enabled,
<loc> should use HTTPS protocol, not HTTP.
sitemap.xml is automatically generated if a setting file is added in its root directory. In the following example,
<loc> tag uses a variable
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>https://parosa.net/http-and-https-are-different-in-sitemap-xml-file/</loc> </url> <url> <loc>https://parosa.net/draggable-components-in-react-native-with-touch-support/</loc> </url> <url> <loc>https://parosa.net/exporting-image-asset-for-different-screen-resolutions/</loc> </url> <url> <loc>https://parosa.net/full-screen-react-native-android-build/</loc> </url> <url> <loc>https://parosa.net/visual-studio-code-keyboard-shortcuts-favorites/</loc> </url> <url> <loc>https://parosa.net/html-label-without-for-attribute-and-id/</loc> </url> </urlset>
This variable is from
_config.yml file, and it (1) should specify the protocol and (2) should not end with a slash(/). If a slash is added in the end, there will be two slashes(/) in the generated
For example, this site is set as