CastleCMS Feature of the Week: Site Crawler

Site Crawler: Making Search More Powerful

CastleCMS includes a sophisticated search engine that scales to millions of records. This built-in search responds near-instantaneously to queries typed in the search box that appears at the top of all pages, and on other optional search tiles that can be added anywhere else.

Search on site

Filtering lets you narrow down the results according to content type. For example, a visitor to your site might be interested in videos related to "security".

Search with options

The search engine automatically indexes all of your CastleCMS site as content is added, modified, and deleted. This real-time indexing is what allows search results to be returned so quickly. Long-running indexing operations, resulting from mass uploads of PDF documents for example, are automatically queued and run in the background so as not to interrupt or slow down your content editors.
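To make the idea of queued background indexing concrete, here is a minimal sketch in Python. It is not CastleCMS's actual implementation: the names `submit`, `index_now`, and `INLINE_LIMIT`, and the size threshold itself, are assumptions made up for illustration. Small items are indexed immediately, while large ones are handed to a worker thread so that the caller never waits.

```python
import queue
import threading

# Hypothetical in-memory "index": document id -> list of lowercase words.
index = {}
# Jobs too large to index inline wait here for the background worker.
pending = queue.Queue()
# Hypothetical threshold (in characters) above which indexing is deferred.
INLINE_LIMIT = 1000


def index_now(doc_id, text):
    """Index one document immediately."""
    index[doc_id] = text.lower().split()


def submit(doc_id, text):
    """Index small documents inline; queue large ones for the worker."""
    if len(text) <= INLINE_LIMIT:
        index_now(doc_id, text)
        return "indexed"
    pending.put((doc_id, text))
    return "queued"


def worker():
    """Drain the queue forever, one job at a time."""
    while True:
        doc_id, text = pending.get()
        try:
            index_now(doc_id, text)
        finally:
            pending.task_done()


# A daemon thread does the slow work without blocking editors.
threading.Thread(target=worker, daemon=True).start()
```

Calling `submit("note", "short text")` returns `"indexed"` right away, while a very large document returns `"queued"` and appears in `index` once the worker has processed it; `pending.join()` blocks until the queue is empty.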

CastleCMS' search engine indexes all text on your site: the title, summary, and body of every content item, including the text content of PDF files and office documents such as Word and Excel. Because CastleCMS automatically extracts the text from PDFs you upload, and can be configured to use optical character recognition (OCR) to extract text from scanned images, the search engine truly covers all of your site content.

Example PDF

CastleCMS' search engine is also security-aware: it knows whether the user is logged in and, if so, which site content that user is authorized to see, and it shows search results only for that content. This means that content marked as private really stays that way.

If you use CastleCMS as a collaborative workspace tool, with different user groups able to view, edit, and search their own workspace content, you need not worry that they are searching for and finding content they aren't supposed to see.
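The principle behind security-aware search can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CastleCMS's own code: each hit is assumed to carry a set of principals (users, groups, or roles) allowed to view it, and the function name `visible_results` and the field name `allowed` are hypothetical.

```python
def visible_results(hits, user_principals):
    """Keep only hits that share at least one principal with the user.

    `hits` is a list of dicts with a hypothetical "allowed" key holding
    the set of principals permitted to view that item.
    """
    return [hit for hit in hits if hit["allowed"] & user_principals]


hits = [
    {"title": "Public news", "allowed": {"Anonymous"}},
    {"title": "Team A notes", "allowed": {"group:team-a"}},
]

# A visitor who is not logged in holds only the "Anonymous" principal.
anonymous_view = visible_results(hits, {"Anonymous"})

# A logged-in member of team A holds additional principals.
member_view = visible_results(hits, {"Anonymous", "user:ana", "group:team-a"})
```

Here `anonymous_view` contains only the public item, while `member_view` contains both. Filtering at query time like this means private content never reaches the results page for someone who cannot view it.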

Site Crawler

Imagine you'd like your CastleCMS search engine to look for content not just on your site but also on other sites. For example, your university may have several separate sites that host information for students, and you'd like your main student site to return search results from all of them.

CastleCMS Site Crawler does exactly that: it lets you configure the search engine to index ("crawl") other sites and include results from those sites alongside the results from yours.

A site administrator uses the Site Crawler configuration panel to activate the feature and specify where to find the site maps of the other sites to be crawled.

Site Crawler configuration panel

After the configuration changes have been saved, a scheduled process on the CastleCMS server reads the provided site maps and indexes the content they list.
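For readers unfamiliar with site maps, the following Python sketch shows what reading one involves. It assumes the standard sitemaps.org XML format and uses only the Python standard library; the function name `parse_sitemap` and the sample URLs are made up for illustration, and this is not the crawler's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (url, lastmod) pairs from a sitemap document.

    `lastmod` is None when the entry does not provide one.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc, lastmod))
    return entries


# Hypothetical sitemap for an example site.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.edu/admissions</loc>"
    "<lastmod>2024-01-15</lastmod></url>"
    "<url><loc>https://example.edu/housing</loc></url>"
    "</urlset>"
)

pages = parse_sitemap(SAMPLE)
```

Each pair in `pages` is a page a crawler could then fetch and index; the optional `lastmod` value lets it skip pages that have not changed since the previous run.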

Once this crawling process has run, your site users will see search results from those sites as well as yours when they open the site menu in the search results and select one of the configured sites:

Search Different Site

Search results different site

Search results different site 2

The configuration panel also shows Site Crawler statistics: when the external sites were last crawled and how many external content items were indexed. A site administrator can use this information to confirm that the crawler is configured correctly and is running on schedule.

Site Crawler Config

CastleCMS Site Crawler: Making Search More Powerful

Site Crawler makes CastleCMS' search a "one stop shop" for all information relevant to your users, without the need for expensive third-party solutions.