Changing crawler inclusion or exclusion rules changes what the SearchStax Site Search Web Crawler can discover on future crawls. The change might not immediately remove content that was already crawled and indexed before the rule change.
Use this article if you changed crawler rules and still see older pages, excluded URLs, or outdated content in search results.
For general crawler setup, see Crawler. For rule configuration details, see Crawler Exclusions.
Why This Happens
Crawler rules control what the crawler can discover and process during a crawl. When you add or change an inclusion or exclusion rule, the new rule applies to crawler activity after the change is saved.
Content that was crawled before the rule change may still appear until a later crawl runs successfully and processes the updated configuration.
For example, stale content may appear after you:
- Add an exclusion rule for a URL pattern.
- Narrow crawler scope with inclusion rules.
- Change a Start URL, sitemap, or crawler scope.
- Remove a section of your site from crawler discovery.
- Fix URL patterns that previously allowed duplicate or unwanted pages.
What Changes After You Update Rules
After you save an inclusion or exclusion rule change, the crawler uses the updated settings for future crawls.
The updated rules can affect which URLs the crawler discovers, crawls, and indexes during the next run.
What May Not Change Immediately
Older crawled content may not disappear from search results immediately after you save the rule change.
This can happen because the existing indexed content came from an earlier crawl. Until a later crawl runs successfully, the index may still contain content from the previous crawler configuration.
Crawler history can also show older crawl runs. These are historical records of previous crawler activity. An older crawl run may include URLs that matched the crawler rules at that time, even if those URLs don't match the current rules.
What to Check First
After changing inclusion or exclusion rules, check the following:
- The rule was saved.
- The rule matches the URL pattern you intended.
- The rule uses the correct match type, such as Beginning with, Contains, Ending with, or Matching regex.
- The URL capitalization matches the rule, because crawler rules can be case-sensitive.
- The page isn't still discoverable through another allowed URL pattern.
- A crawl completed successfully after the rule change.
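If you want to check a few URLs against a rule before the next crawl runs, the following Python sketch approximates the four match types. It assumes that Beginning with, Contains, and Ending with are plain, case-sensitive string comparisons, and that Matching regex matches anywhere in the URL unless the pattern is anchored with ^ or $. These are assumptions for illustration, not documented SearchStax matching behavior; the rule_matches function and the example.com URLs are hypothetical.

```python
import re


def rule_matches(url: str, pattern: str, match_type: str) -> bool:
    """Hypothetical helper: approximate how a crawler rule might match a URL.

    Assumption: all comparisons are case-sensitive, and this is not
    SearchStax's actual implementation.
    """
    if match_type == "Beginning with":
        return url.startswith(pattern)
    if match_type == "Contains":
        return pattern in url
    if match_type == "Ending with":
        return url.endswith(pattern)
    if match_type == "Matching regex":
        # Assumption: re.search matches anywhere in the URL; anchor the
        # pattern with ^ and $ if you need a full-URL match.
        return re.search(pattern, url) is not None
    raise ValueError(f"Unknown match type: {match_type}")


if __name__ == "__main__":
    # Hypothetical exclusion rule and test URLs.
    pattern, match_type = "https://www.example.com/archive/", "Beginning with"
    for url in [
        "https://www.example.com/archive/2019/post",
        "https://www.example.com/Archive/2019/post",  # different capitalization
        "https://example.com/archive/2019/post",      # non-www host
    ]:
        print(url, rule_matches(url, pattern, match_type))
```

Under these assumptions, only the first URL matches, which shows how a capitalization or host difference can leave a page outside an exclusion rule.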
Run a New Crawl
After you save the rule change, use the following steps to confirm that a new crawl has run with the updated rules:
- In Site Search, go to Site Search > App Settings > Data Management > Crawler.
- Open the crawler.
- Confirm that the inclusion or exclusion rule is saved.
- Run a crawl when one is available, or wait for the next scheduled crawl.
- Open the crawler's History tab.
- Confirm that the crawl completed successfully after the rule change.
Use the latest successful crawl to verify how the crawler behaved after the rule change. Older crawl history can help you compare past behavior, but it may reflect settings that aren't active anymore.
Check the Content Again
After the new crawl completes, test the affected content again.
Check whether:
- The excluded page still appears in search results.
- The page appears under a different URL.
- The same content is still available through another allowed URL version, such as a trailing-slash URL, non-trailing-slash URL, www URL, non-www URL, /index URL, or parameterized URL.
- The sitemap or site links still point to a URL that shouldn't be crawled.
- A successful crawl has run since the rule change.
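To find other URL versions that might still expose the same content, you can list common variants of an affected URL and test each one against your rule. The following Python sketch builds www and non-www, trailing-slash and non-trailing-slash, /index.html, and with-and-without-query versions of a URL. The url_variants function, the index.html file name, and the example.com rule are hypothetical, your site may use other variants, and the simple prefix check stands in for whatever match type your rule actually uses.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url: str, index_name: str = "index.html") -> set[str]:
    """Hypothetical helper: return common alternate forms of a URL.

    Illustrative only; this is not a list of every variant a crawler
    may discover. The fragment (#...) is dropped from all variants.
    """
    scheme, host, path, query, _fragment = urlsplit(url)

    # www and non-www forms of the host.
    alt_host = host[4:] if host.startswith("www.") else "www." + host
    hosts = {host, alt_host}

    # Trailing-slash, non-trailing-slash, and /index forms of the path.
    stem = path.rstrip("/")
    paths = {stem + "/", stem + "/" + index_name}
    if stem:
        paths.add(stem)

    # Parameterized and clean versions (with and without the query string).
    queries = {query, ""}

    return {
        urlunsplit((scheme, h, p, q, ""))
        for h in hosts
        for p in paths
        for q in queries
    }


if __name__ == "__main__":
    # Hypothetical exclusion rule: "Beginning with" this prefix.
    excluded_prefix = "https://www.example.com/old-docs/"
    for variant in sorted(url_variants("https://www.example.com/old-docs/setup?ref=nav")):
        status = "excluded" if variant.startswith(excluded_prefix) else "NOT excluded"
        print(f"{status}: {variant}")
```

In this example, the non-www variants are reported as NOT excluded because the hypothetical rule only covers the www host.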
If the page still appears, save examples before contacting Support.
When to Contact Support
Contact SearchStax Support if stale content still appears after a successful crawl using the updated rules.
Include:
- Search App name.
- Crawler name.
- Start URL.
- The inclusion or exclusion rule you changed.
- The approximate time the rule was changed, including time zone.
- The time of the successful crawl after the rule change.
- Example URLs that shouldn't appear anymore.
- Screenshots or search queries showing where the stale content still appears.
- Any recent changes to your sitemap, redirects, canonical URLs, or site navigation.
Support can help review the crawler configuration, crawl history, and affected URLs.