- Introduction
Understanding how search engines work is essential if you want your website to rank on search engines like Google, Microsoft Bing, and Yahoo.
In this guide, you will learn how search engines work, including crawling, indexing, ranking, sitemaps, and robots.txt.
- How Do Search Engines Work?
A search engine works through a complete process of discovering, storing, and ranking web pages so that the most relevant results can be displayed for a query.
- The Three Main Steps

Search engines process the web in three main steps:
1. Crawling
Crawling is the first step, where search engine bots (crawlers) visit web pages and follow links to discover content.
2. Indexing
Indexing is the second step, where the crawled content is analyzed and stored in the search engine's index (its database of pages).
3. Ranking
Ranking is the final step, where algorithms order the indexed pages by relevance and quality and show the best results for a query.
The flow: Crawling → Indexing → Ranking
| Step | Role |
|---|---|
| Crawling | Finds pages |
| Indexing | Stores pages |
| Ranking | Shows the best results |
Simple Example
- Crawling = Finding books
- Indexing = Saving books in a library
- Ranking = Showing best books first
- What is a Crawler?
A crawler (also called a web crawler or spider) is a software program used by search engines like Google and Microsoft Bing to automatically browse websites and collect information.
How a Crawler Works
- Starts with a list of known URLs (seed URLs)
- Visits each page like a browser
- Reads content (text, images, links)
- Follows internal and external links
- Sends collected data for indexing
- Repeats continuously
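The core of the loop above is fetching a page and extracting its links. Here is a minimal sketch of that link-extraction step using only Python's standard library; the HTML snippet and URLs are made-up examples, and a real crawler would fetch pages over HTTP and repeat this over a queue of discovered URLs.

```python
# Minimal sketch of a crawler's link-extraction step (standard library only).
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL
                    self.links.append(urljoin(self.base_url, value))

# Example page content; in a real crawler this comes from an HTTP fetch
html = '<a href="/blog">Blog</a> <a href="https://other.com/">Other</a>'
parser = LinkExtractor("https://example.com/")
parser.feed(html)
print(parser.links)  # internal and external links the crawler would follow next
```

A real crawler would add each extracted link to its queue, skip URLs it has already visited, and respect robots.txt before fetching (covered below).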
Importance of Crawlers
- Discover new web pages
- Update existing content
- Help websites appear in search results
Crawling vs Indexing vs Ranking
| Factor | Crawling | Indexing | Ranking |
|---|---|---|---|
| Purpose | Discover pages | Store data | Show results |
| Step | First | Second | Final |
| Visible to users | ❌ | ❌ | ✅ |
- What is a Sitemap?
A sitemap is a file that lists the important pages of your website, helping search engines like Google find and crawl them more easily.
How Sitemap Works
- Create a sitemap (sitemap.xml)
- Upload it to your website
- Submit it via Google Search Console
- Crawlers use it to discover pages faster
Example Sitemap

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
  </url>
</urlset>
```

Benefits of a Sitemap
- Helps crawlers discover pages faster
- Ensures deep or poorly linked pages are not missed
- Especially useful for new or large websites
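A sitemap like the one above can also be generated programmatically. Here is a minimal sketch using Python's standard library; the two URLs are placeholders, and you would substitute your site's important pages (or use your CMS's sitemap plugin instead).

```python
# Sketch: generate a minimal sitemap.xml with the standard library.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # serialize without a namespace prefix

# Placeholder pages; replace with your site's important URLs
pages = ["https://example.com/", "https://example.com/blog"]

urlset = ET.Element(f"{{{NS}}}urlset")
for page in pages:
    url = ET.SubElement(urlset, f"{{{NS}}}url")
    ET.SubElement(url, f"{{{NS}}}loc").text = page

sitemap = ET.tostring(urlset, encoding="unicode", xml_declaration=True)
print(sitemap)  # write this string to sitemap.xml at your site root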
- What is robots.txt?
The robots.txt file tells crawlers which parts of your website they may or may not access. Note that it is a set of directives that well-behaved crawlers follow, not an enforcement mechanism.
How robots.txt Works
- Crawler visits your site
- Checks /robots.txt first
- Reads instructions
- Follows allow/disallow rules
Example robots.txt

```
User-agent: *
Disallow: /admin/
Allow: /blog/
```

Basic robots.txt (Recommended)

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Allow important assets
Allow: /wp-content/uploads/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

# Block unnecessary files
Disallow: /readme.html
Disallow: /license.txt

# Sitemap
Sitemap: https://yourdomain.com/sitemap_index.xml
```

How This Works
- User-agent: * → Applies to all crawlers (like Googlebot from Google)
- Disallow: /wp-admin/ → Blocks admin area (not useful for SEO)
- Allow: /wp-admin/admin-ajax.php → Needed for front-end functionality
- Allow uploads/themes/plugins → Ensures CSS, JS, images are crawlable
- Sitemap line → Helps crawlers quickly find all pages
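You can check how rules like these will be interpreted with Python's built-in robots.txt parser. One caveat, reflected in this sketch: Python's parser applies rules in order (first match wins), unlike Google's crawler, which picks the most specific matching rule, so the Allow line is listed before the broader Disallow here. The domain is a placeholder.

```python
# Check robots.txt rules with Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Allow listed first: Python's parser uses first-match-wins ordering
robots_txt = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Blocked: the admin area matches the Disallow rule
print(rp.can_fetch("*", "https://yourdomain.com/wp-admin/"))
# Allowed: the more specific Allow rule matches first
print(rp.can_fetch("*", "https://yourdomain.com/wp-admin/admin-ajax.php"))
# Allowed: no rule matches, so crawling defaults to allowed
print(rp.can_fetch("*", "https://yourdomain.com/blog/"))
```

This is also a handy way to sanity-check a robots.txt change before deploying it, so you don't accidentally block pages you want indexed.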
Advanced Version (More Optimized)

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Block search & query URLs
Disallow: /?s=
Disallow: /search/

# Block author pages (optional)
Disallow: /author/

# Allow core assets
Allow: /wp-content/uploads/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

# Sitemap
Sitemap: https://yourdomain.com/sitemap_index.xml
```

Sitemap vs robots.txt
| Feature | Sitemap | robots.txt |
|---|---|---|
| Purpose | What to crawl | What NOT to crawl |
| Type | XML file | Text file |
| Role | Discovery | Control |

