You'll configure and run the Site Search web crawler to index a public website. You'll start with a simple crawl and then learn where to monitor results.
By the end, you'll be able to:
- Add start URLs for your crawl
- Apply basic include and exclude rules
- Run and monitor your first crawl
Note: Crawlers are only available in certain subscriptions. Daily crawl frequency and maximum pages per crawl depend on subscription limits.
Prerequisites
Before you start, you'll need:
- A SearchStax Site Search account
- A Search App created and ready for data ingestion
- A public website URL that doesn't require authentication
- A basic understanding of your site's structure (sitemap, sections, or domains)
1. Create a Crawler
- In SearchStax, open your Search App.
- Go to Site Search > App Settings > Data Management > Crawler.
- Click Create a Crawler and enter a unique Crawler name.
- Enter the Start URL for the page where the crawl should begin. You can also enter a sitemap.xml or sitemap_index.xml URL.
- Click Save changes.
Confirm the new crawler appears in the Crawler List.
Tip: For your first run, use your sitemap.xml as the Start URL to discover pages quickly without deep crawling. Note that sitemap discovery doesn't automatically index URLs that appear only in the sitemap, and inclusion and exclusion rules still apply.
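Before pointing the crawler at a sitemap, it can help to see which URLs the sitemap actually lists. The following is a minimal sketch, not part of SearchStax: it assumes a standard sitemaps.org document, and the function name `list_sitemap_urls` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap.xml or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; replace with the contents of your own sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/admin/login</loc></url>
</urlset>"""

for url in list_sitemap_urls(SAMPLE):
    print(url)
```

Because `<loc>` appears in both `urlset` and `sitemapindex` documents, the same function lists child sitemaps when given a sitemap_index.xml.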
2. Set Crawl Depth and Schedule
- In Settings, find Crawl depth and choose a value that matches the scope you want. Leave the default for an unlimited crawl unless you need to limit how far the crawler reaches.
- Under Schedule, enable the daily schedule and choose a Target Start Time.
- Click Save changes.
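To get a feel for how a depth limit narrows a crawl, here is a small sketch. It assumes that depth means the number of link hops from the Start URL, which is a common convention but is not confirmed here as SearchStax's exact definition. The link graph `SITE` and the function `urls_within_depth` are hypothetical.

```python
from collections import deque
from typing import Optional


def urls_within_depth(links, start, max_depth: Optional[int]):
    """Breadth-first walk of a link graph.

    Returns a dict mapping each reachable URL to its hop count from `start`.
    `max_depth=None` means no limit.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Do not follow links from pages already at the depth limit.
        if max_depth is not None and depths[url] >= max_depth:
            continue
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "https://example.com/": ["https://example.com/blog/", "https://example.com/about"],
    "https://example.com/blog/": ["https://example.com/blog/post-1"],
    "https://example.com/blog/post-1": ["https://example.com/blog/post-1/comments"],
}

print(len(urls_within_depth(SITE, "https://example.com/", 1)))     # pages within one hop
print(len(urls_within_depth(SITE, "https://example.com/", None)))  # unlimited
```

Under this assumption, a depth of 1 reaches only the start page and the pages it links to directly, while an unlimited crawl reaches every page in the graph.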
3. Define Inclusion and Exclusion Rules
- In Settings, open Inclusions.
- Add a rule that targets a section of your site (for example, Beginning with https://example.com/blog/). Click + to add it.
- Open Exclusions and add a rule to omit unwanted paths (for example, Contains /admin/). Click + to add it.
- Click Save changes.
Note: Inclusion and exclusion URL patterns are case-sensitive. If rules conflict, exclusions take precedence.
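The rule behavior described in this step can be sketched in a few lines so you can test URLs against your rules before running a crawl. This is an illustration only, not SearchStax's implementation: the `Rule` class, the kind labels, and `should_crawl` are hypothetical, and the sketch assumes that an empty inclusion list leaves every non-excluded URL in scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str   # "beginning_with" or "contains" (hypothetical labels)
    value: str

    def matches(self, url):
        # Plain string comparison, so matching is case-sensitive.
        if self.kind == "beginning_with":
            return url.startswith(self.value)
        if self.kind == "contains":
            return self.value in url
        raise ValueError(f"unknown rule kind: {self.kind}")


def should_crawl(url, inclusions, exclusions):
    """Exclusions are checked first, so they win when rules conflict."""
    if any(rule.matches(url) for rule in exclusions):
        return False
    # Assumption: with no inclusion rules, every non-excluded URL is in scope.
    return not inclusions or any(rule.matches(url) for rule in inclusions)


inclusions = [Rule("beginning_with", "https://example.com/blog/")]
exclusions = [Rule("contains", "/admin/")]

for candidate in (
    "https://example.com/blog/post-1",
    "https://example.com/blog/admin/settings",
    "https://example.com/Blog/post-1",
):
    print(candidate, should_crawl(candidate, inclusions, exclusions))
```

The second URL matches both rules and is skipped because the exclusion takes precedence. The third is skipped because `/Blog/` does not match `/blog/` under case-sensitive comparison.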
4. Run Your First Crawl
- Return to the Crawler list.
- Select your crawler and click Crawl Now.
After the crawl completes, go to Preview to validate the results.
5. Verify Indexing with Search Preview
- In the top navigation bar, click Preview.
- Run a broad search, such as *, or search for a known page title.
Confirm documents from your site appear in Preview results. See Previewing your first search in SearchStax for more information.
6. View Crawl History
- Open your crawler and go to the History tab.
- Review summary statistics for recent runs, including items indexed and URLs crawled.
Confirm the most recent run shows expected counts and status.
Note: Because daily crawl frequency and maximum pages per crawl depend on your plan, large sites may require scoped rules or multiple crawlers.
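If you suspect your site is too large for a single crawler, counting pages per top-level section can suggest where to scope rules or split crawlers. The sketch below is one possible approach under stated assumptions: the URL list would come from your own sitemap or analytics, the functions `pages_per_section` and `sections_over_limit` are hypothetical, and the limit of 1 is a placeholder, not an actual plan limit.

```python
from collections import Counter
from urllib.parse import urlsplit


def pages_per_section(urls):
    """Count URLs by their first path segment; top-level pages count as '(root)'."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts


def sections_over_limit(counts, max_pages):
    """Return the sections whose page count exceeds max_pages, sorted by name."""
    return sorted(name for name, n in counts.items() if n > max_pages)


# Hypothetical URL list; in practice, take this from your sitemap.
URLS = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/docs/x",
]

counts = pages_per_section(URLS)
print(dict(counts))
print(sections_over_limit(counts, 1))  # placeholder limit, not a real plan value
```

A section that exceeds the limit on its own is a candidate for a dedicated crawler with a Beginning with inclusion rule for that path.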
What's Next?
Now that your crawl has run, preview your search in SearchStax to confirm that the data was ingested successfully.