Choosing a Data Ingestion Method

Choose the right data ingestion method for your SearchStax Site Search implementation.

The method you choose depends on your content management system, technical resources, and how much control you need over the indexing process. Making the right choice upfront will streamline your setup and get your content indexed efficiently.

Data Ingestion Methods Comparison

SearchStax Site Search offers three methods to get your content indexed:

CMS/DXP Connector

Use a connector if you're running Drupal or Sitecore. The corresponding modules integrate directly with your content management system to index published content automatically.

Advantages

  • Indexes content as you publish it
  • Respects your site's content permissions and workflows
  • Maps your content fields directly to search fields
  • Handles content updates and deletions automatically

Best For

  • Drupal 8–11 sites
  • Sitecore 9.1–10.4 implementations
  • Teams migrating from Acquia Search
  • Sites with frequently updated content

Web Crawler

The web crawler indexes content from your public website. You provide start URLs and rules, and the crawler ingests pages as visitors would see them.
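You can model start URLs and rules locally to see which pages a crawl would reach. The sketch below makes several assumptions:

- The rule format is hypothetical. Rules are regular expressions matched against the URL path, and exclude rules take precedence over include rules.
- The names START_URLS, INCLUDE_PATTERNS, and EXCLUDE_PATTERNS are also hypothetical.

None of this reflects the Site Search crawler's actual configuration syntax.

```python
import re
from urllib.parse import urlparse

# Hypothetical crawl configuration for illustration only; this is NOT the
# Site Search crawler's rule syntax.
START_URLS = ["https://www.example.com/"]
INCLUDE_PATTERNS = [r"^/docs/", r"^/blog/"]
EXCLUDE_PATTERNS = [r"/drafts/", r"\.zip$"]


def in_scope(url: str) -> bool:
    """Return True if a discovered URL would be crawled under these sketch rules."""
    parsed = urlparse(url)
    allowed_hosts = {urlparse(start).netloc for start in START_URLS}
    # Stay on the hosts named by the start URLs.
    if parsed.netloc not in allowed_hosts:
        return False
    path = parsed.path or "/"
    # In this sketch, exclude rules win over include rules.
    if any(re.search(pattern, path) for pattern in EXCLUDE_PATTERNS):
        return False
    return any(re.search(pattern, path) for pattern in INCLUDE_PATTERNS)
```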

Crawlers are available only on certain subscription plans.

Advantages

  • Works with any website platform
  • No development work required
  • Indexes content exactly as visitors see it
  • Can handle sites with thousands of pages
  • Supports multiple file formats (HTML, PDF, Office documents)

Considerations

  • Only indexes publicly accessible pages
  • May need configuration for complex site structures
  • Requires periodic re-crawling for content updates (daily schedule)
  • Respects robots.txt directives
  • Index updates appear after the crawl completes
  • Daily crawl frequency and max pages per crawl depend on your subscription
  • Maximum file size: HTML up to 1 MB, RTF up to 1 GB; only the first 100 KB of extracted text is indexed
  • Limited to websites in English
  • Can’t parse dynamic content
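You can spot-check two of the constraints above locally: whether robots.txt allows a page, and whether the page's extracted text exceeds the 100 KB indexing limit. This minimal sketch uses Python's standard urllib.robotparser and makes two assumptions:

- The crawler's user-agent string isn't given here, so the sketch assumes the crawler obeys the wildcard (*) group.
- The sketch reads 100 KB as 100 * 1024 bytes of UTF-8 text. Site Search may measure the limit differently.

```python
from urllib.robotparser import RobotFileParser

# Assumptions: the crawler's user-agent is unknown, so the wildcard group is
# checked; the 100 KB limit is interpreted as 100 * 1024 bytes of UTF-8 text.
CRAWLER_AGENT = "*"
TEXT_LIMIT_BYTES = 100 * 1024


def precheck_page(url: str, robots_txt: str, extracted_text: str) -> dict:
    """Predict whether robots.txt allows a page and whether its text would be truncated."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(CRAWLER_AGENT, url)
    text_bytes = len(extracted_text.encode("utf-8"))
    return {
        "allowed_by_robots": allowed,
        "text_bytes": text_bytes,
        "text_truncated": text_bytes > TEXT_LIMIT_BYTES,
    }


robots = "User-agent: *\nDisallow: /private/\n"
print(precheck_page("https://www.example.com/private/report", robots, "Quarterly report"))
```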

Best For

  • Static websites
  • Custom-built sites without CMS integration options
  • Marketing sites with relatively stable content
  • Proof-of-concept implementations

Ingest API

The Ingest API gives you complete control over what content gets indexed and how it's structured. You send content directly to SearchStax using REST API calls.
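At its core, an Ingest API call is an authenticated HTTP POST that carries a batch of JSON documents. The minimal sketch below builds such a request with Python's standard library. Several details are placeholders assumed for illustration:

- the endpoint URL and the token
- the Authorization header format
- the document fields (id, title, url, content)

Take the real values and schema from the SearchStax Ingest API documentation for your app.

```python
import json
import urllib.request

# Placeholders only: the endpoint, token, header format, and field names are
# assumptions, not the documented SearchStax Ingest API contract.
INGEST_ENDPOINT = "https://ingest.example.invalid/your-app/update"
API_TOKEN = "YOUR_INGEST_TOKEN"


def build_ingest_request(documents: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying a batch of documents."""
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        INGEST_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {API_TOKEN}",  # assumed header format
        },
    )


docs = [{
    "id": "page-42",
    "title": "Pricing",
    "url": "https://www.example.com/pricing",
    "content": "Plans and pricing details...",
}]
request = build_ingest_request(docs)
```

The sketch builds the request without sending it, so you can inspect it. To send it, call urllib.request.urlopen(request) against your real endpoint, then check the response status before marking the batch as indexed.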

Advantages

  • Full control over content structure and timing
  • Can combine data from multiple sources
  • Handles complex content relationships
  • Near real-time indexing capabilities

Considerations

  • Requires custom development
  • You manage the indexing workflow
  • You must handle content updates and deletions yourself
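When you own the indexing workflow, you also have to work out which documents changed and which disappeared since the last sync. One common approach is to store a content hash for each document ID and compare it against the current source on each run. SearchStax does not prescribe this method; it is an assumption of this sketch. The function names are hypothetical, and so is the idea of keeping indexed_hashes in your own datastore.

```python
import hashlib
import json


def fingerprint(doc: dict) -> str:
    """Stable SHA-256 hash of a document's fields so unchanged content can be skipped."""
    canonical = json.dumps(doc, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(source_docs: list[dict], indexed_hashes: dict[str, str]):
    """Compare the current source against what was last sent to the index.

    indexed_hashes maps document id -> fingerprint recorded at the last sync
    (a hypothetical record you maintain yourself).
    Returns (docs_to_upsert, ids_to_delete, new_hashes).
    """
    new_hashes = {doc["id"]: fingerprint(doc) for doc in source_docs}
    # New documents and documents whose content changed both need an upsert.
    to_upsert = [doc for doc in source_docs
                 if indexed_hashes.get(doc["id"]) != new_hashes[doc["id"]]]
    # IDs that were indexed before but are gone from the source need a delete.
    to_delete = sorted(set(indexed_hashes) - set(new_hashes))
    return to_upsert, to_delete, new_hashes
```

Each run then has three steps:

1. Send the documents in to_upsert through your ingest calls.
2. Remove the IDs in to_delete using whatever delete operation the Ingest API provides.
3. Save new_hashes for the next run.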

Best For

  • Custom applications
  • Headless CMS implementations
  • Multi-source content aggregation
  • Complex data transformation requirements
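Multi-source aggregation usually comes down to mapping each source's records onto one document shape before you send them. This minimal sketch assumes two hypothetical sources, a product catalog and a help desk. It also assumes a hypothetical target schema with id, title, url, content, and source fields. Prefixing each id with its source name is one way to keep IDs from colliding across systems.

```python
# Hypothetical sources and target schema, for illustration only; use the
# fields defined for your own Site Search app.

def from_catalog(row: dict) -> dict:
    """Map a product-catalog row onto the shared document shape."""
    return {
        "id": f"catalog:{row['sku']}",  # source prefix keeps ids unique
        "title": row["name"],
        "url": f"https://www.example.com/products/{row['sku']}",
        "content": row.get("description", ""),
        "source": "catalog",
    }


def from_helpdesk(article: dict) -> dict:
    """Map a help-desk article onto the shared document shape."""
    return {
        "id": f"helpdesk:{article['article_id']}",
        "title": article["subject"],
        "url": article["link"],
        # Flatten the article body into a single text field.
        "content": " ".join(article.get("paragraphs", [])),
        "source": "helpdesk",
    }


def aggregate(catalog_rows: list[dict], helpdesk_articles: list[dict]) -> list[dict]:
    """Combine both sources into one list of documents ready to ingest."""
    docs = [from_catalog(row) for row in catalog_rows]
    docs += [from_helpdesk(article) for article in helpdesk_articles]
    return docs
```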

Choosing Your Method

If You Use Drupal or Sitecore

Use the corresponding CMS/DXP connector. This is the most efficient path for supported platforms. The connector handles the complexity of content mapping and keeps your search index synchronized with your content updates.

If You Have a Public Website

Start with the web crawler if it’s included in your subscription. It's the fastest way to index content without any development work. You can switch to the API later if you need more control or faster updates.

If You Have a Custom Application or Complex Requirements

Use the Ingest API. It gives you full control over what content is indexed and when, and it supports combining data from multiple sources. While it requires more development effort, it offers the most flexibility and can provide near real-time updates to your index.

If You're Not Sure

Check whether your subscription includes crawlers. If it does, try a crawler first as a proof of concept. You’ll see how your content appears in Site Search without writing code.

If crawlers aren’t available, start with the Ingest API. Even with a small content sample, you can validate that Site Search works with your data.
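The guidance above comes down to a few ordered checks. The function below restates it as code purely as an illustration, and its parameter names are hypothetical. It assumes that complex requirements outweigh the convenience of the crawler. It also doesn't check connector version support (Drupal 8–11, Sitecore 9.1–10.4), which you should confirm separately.

```python
def choose_ingestion_method(cms: str, has_public_site: bool,
                            crawler_in_plan: bool, custom_or_complex: bool) -> str:
    """Illustrative restatement of the method-selection guidance."""
    # Supported CMS/DXP platforms go straight to their connector
    # (version support is not checked here).
    if cms.lower() in {"drupal", "sitecore"}:
        return "CMS/DXP connector"
    # Assumption: custom applications or complex requirements favor the API.
    if custom_or_complex:
        return "Ingest API"
    # A public site with crawlers in the subscription can start with the crawler.
    if has_public_site and crawler_in_plan:
        return "Web crawler"
    # Otherwise, fall back to the Ingest API.
    return "Ingest API"
```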

Combine Ingestion Methods

You can use multiple ingestion methods for the same search app. For example, use a connector for your main content and the API for supplementary data from other systems. Depending on your plan, you can also add multiple crawlers to the same app to combine several sites into one index.

What's Next

Once you've identified your ingestion method, follow the tutorial for your chosen approach:
