XML Search
Author: Martin Robinson
Implementing Live Site XML Search
1. Introduction
This guide walks you through installing and configuring a custom search engine on your site. It assumes that a recent version of PHP is installed on your web server and that you are comfortable with Cascade. Knowledge of PHP is useful but not required.
2. Components
The search engine has several components: the index blocks, the search frame, the search script, the configuration page, and the logging mechanism. The index blocks gather content from inside the CMS and publish it to XML files on your web server. The search script reads its settings from a configuration file that you publish (by means of a data definition), searches the XML data files, displays the results to the user, and does any necessary logging. The search frame is an HTML template that surrounds the results from the search script.
To begin, create a new folder to contain all the search-related assets. We chose "search" for this example.
3. The Index Blocks
For every section of your site that you want to search, create an index block that indexes that section. If you simply want to index your entire site at once, make an index block for the top-level folder. Breaking the site up into sections is recommended, though, especially if the site is large.
When creating your index blocks, be sure to check "Render page XML inline." This ensures that the search script looks through page contents as well as metadata. Be sure to index only pages and to give your index block a meaningful name. In our case we named it "search-index-block." Finally, make sure that "Regular content" and "System Metadata" are enabled under "Indexed Asset Content." For each block, make sure that the depth of the index is large enough to capture all of the sub-pages.
For each index block, create a page (preferably one that will publish to XML) that uses that block in an XML configuration. Be sure to turn off indexing when you create this page. Next, assign a stylesheet to each search index page to rewrite the links and clean up the XML for the script. Import the stylesheet index-block-search-filter.xsl, included below, and assign it to the index blocks on your new pages.
Now your index blocks should be ready.
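To illustrate what the published index files are used for, here is a minimal Python sketch (not the included search.php) that scans one index file for a term. The element names index, system-page, path, display-name, summary and content are assumptions made for this example; check the XML that your filtered index block actually publishes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a published index file. The real element names
# depend on your index block and on index-block-search-filter.xsl.
SAMPLE_INDEX = """
<index>
  <system-page>
    <path>/about/markets</path>
    <display-name>Our Markets</display-name>
    <summary>Where we operate.</summary>
    <content>We serve markets in Europe and Asia.</content>
  </system-page>
  <system-page>
    <path>/contact</path>
    <display-name>Contact</display-name>
    <summary>How to reach us.</summary>
    <content>Call or email the office.</content>
  </system-page>
</index>
"""


def search_index(xml_text, query):
    """Return the paths of pages whose metadata or content mentions query."""
    root = ET.fromstring(xml_text)
    needle = query.lower()
    hits = []
    for page in root.iter("system-page"):
        # itertext() walks every descendant element, so both the metadata
        # and the inline page content are searched.
        haystack = " ".join(page.itertext()).lower()
        if needle in haystack:
            hits.append(page.findtext("path"))
    return hits


print(search_index(SAMPLE_INDEX, "markets"))
```

Because the whole page element is searched, a page only matches on its body text if "Render page XML inline" was checked when the index block was created.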
4. The Search Frame
Now we are going to create an HTML frame to display the search results in. This should probably be derived from your global template. In any case, this page's content should look like this:
<form action="search.php" method="get">
<input maxlength="50" name="s" size="15" value="{$QUERY}" type="text"/>
<input name="search" value="search" type="submit" />
<select name="d">
<option selected="selected" value="-1">Entire Site</option>
<option value="0">Section 0</option>
<option value="1">Section 1</option>
<option value="2">Section 2</option>
<option value="3">Section 3</option>
</select>
</form>
{$SEARCH}
Note that the option values here correspond to the index blocks you created above. Order must be consistent throughout this process, so they must be listed in the same order when you create the configuration file. Passing a value of "-1" for the 'd' parameter will always search all data files.
When the script runs it will replace {$QUERY} with the search query and {$SEARCH} with the search results.
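The substitution described above can be sketched in a few lines of Python. This is an illustration only, not the code in search.php; the function name fill_frame is made up for the example, and escaping the query is a choice made in this sketch, so verify what your copy of the script does.

```python
import html
import re

# A cut-down frame containing the two placeholders from the listing above.
FRAME = '<input name="s" value="{$QUERY}" type="text"/>\n{$SEARCH}'


def fill_frame(frame, query, results_html):
    """Substitute the two frame placeholders in a single pass."""
    values = {
        # The query lands inside an HTML attribute, so it is escaped here.
        "{$QUERY}": html.escape(query, quote=True),
        "{$SEARCH}": results_html,
    }
    # One regex pass means a query that happens to contain "{$SEARCH}"
    # is not substituted a second time.
    return re.sub(r"\{\$(?:QUERY|SEARCH)\}", lambda m: values[m.group(0)], frame)


print(fill_frame(FRAME, 'a "b"', "<p>hit</p>"))
```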
5. The Script
Now create a file in your search directory, turn off indexing on the file, and replace its contents with those of the search.php file included below. This is the script that does the searching. If your deployment environment uses PHP 4, use the script search.php4 instead and place the unzipped minixml folder (from minixml.zip) in the same directory. Notice that the line $config_file = "search-config.xml" appears at the top of this file. We will create this configuration file next.
6. The Configuration Editor
Instead of simply uploading a configuration file, we will create a data definition, so that we can edit the configuration file inside of the CMS. Create a data definition (we called ours "Search Config") using the included data definition search-config-data-definition.xml.
Now create a page named search-config with the data definition that you just created. If possible, configure the page so that it publishes XML to a .xml file. If not, you will have to change the $config_file variable in search.php to reflect the file extension that this page publishes to. In any case, it must output XML.
This configuration must be filtered as well, so load the stylesheet "search-config-filter.xsl" and assign it to the page you just created. When you edit the page, you will be presented with some configuration options. Most are self-explanatory. Be sure to point "Frame file" and "Index blocks" to the appropriate assets. The "Result style" field holds a template that is applied to each result. To display a link to each page using the display-name and summary from the page metadata, use:
<p><a xhref="%%%path%%%"><b>%%%display-name%%%</b></a><br/> %%%summary%%%</p>
There are several other template fields as well. The No Results Header is the text that the search engine displays when it cannot find any matching pages for a query. This can be used to offer other search options to the user. The Results Header is the text shown when results are found. It is often helpful to show the user some information about the total number of results. The Next Page Link Text and Previous Page Link Text are the texts displayed in the links to the next and previous pages of results. All of these templates allow the use of the variables listed in the table below.
| Variable | Description |
| --- | --- |
| {$QUERY} | The search terms that the user entered |
| {$START_RESULT} | Number of the first result shown on this page |
| {$END_RESULT} | Number of the last result shown on this page |
| {$TOTAL_RESULTS} | Number of total results this query found |
| {$RESULTS_ON_NEXT_PAGE} | Number of results that will be shown on the next page |
| {$PER_PAGE} | Maximum number of results shown per page |
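As a worked example of how the numeric variables in the table relate to each other, here is a small Python sketch. It is not taken from the search script; the function names and the 1-based page argument are assumptions made for illustration.

```python
import re


def header_variables(query, total_results, per_page, page):
    """Compute the header template variables; page is 1-based (assumed)."""
    start = (page - 1) * per_page + 1
    end = min(page * per_page, total_results)
    remaining = max(total_results - end, 0)
    return {
        "{$QUERY}": query,
        "{$START_RESULT}": start,
        "{$END_RESULT}": end,
        "{$TOTAL_RESULTS}": total_results,
        "{$RESULTS_ON_NEXT_PAGE}": min(per_page, remaining),
        "{$PER_PAGE}": per_page,
    }


def render(template, variables):
    """Replace every known {$NAME} in one pass; unknown names are kept."""
    return re.sub(
        r"\{\$[A-Z_]+\}",
        lambda m: str(variables.get(m.group(0), m.group(0))),
        template,
    )


values = header_variables("markets", total_results=23, per_page=10, page=2)
print(render("Results {$START_RESULT}-{$END_RESULT} of {$TOTAL_RESULTS} for {$QUERY}", values))
```

With 23 results and 10 per page, page 2 shows results 11 to 20 and leaves 3 for the next page. The sketch does not cover the zero-result case, since the No Results Header is used there instead.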
Generally, pieces of metadata from the page structure can be accessed by enclosing the name in "%%%". For an idea of what is available, take a look at the XML output from the index blocks.
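The "%%%" convention can be illustrated with a short Python sketch. It is an approximation for illustration, not the script's own code: the template is a simplified one, the page dictionary stands in for one page's metadata from the index XML, and the handling of unknown names (replaced by an empty string) is an assumption.

```python
import html
import re

# A simplified result template using the %%%name%%% convention.
RESULT_STYLE = "<p><b>%%%display-name%%%</b> (%%%path%%%)<br/> %%%summary%%%</p>"


def render_result(template, page):
    """Fill %%%name%%% placeholders from a dict of page metadata."""
    def lookup(match):
        # Unknown names become empty strings in this sketch, and values
        # are HTML-escaped before they are placed in the output.
        return html.escape(str(page.get(match.group(1), "")))

    return re.sub(r"%%%([\w-]+)%%%", lookup, template)


page = {
    "display-name": "Our Markets",
    "path": "/about/markets",
    "summary": "Where we operate & why.",
}
print(render_result(RESULT_STYLE, page))
```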
7. Logging
To enable logging, enter a log file name on the search-config page. The path must be relative to the location of the search script on the server, and the file must be writable by the web server. You should also enter the maximum number of log entries that you wish to keep.
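The effect of the maximum-entries setting can be sketched as follows. This Python example is only an illustration: it uses a plain-text log with one query per line to stay short, whereas the included example log is an XML file, and the function name log_query is made up.

```python
import os
import tempfile


def log_query(log_path, query, max_entries):
    """Append query to the log and keep only the newest max_entries lines."""
    entries = []
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            entries = f.read().splitlines()
    # Collapse whitespace so that one query is always one log line.
    entries.append(" ".join(query.split()))
    # Guard against max_entries == 0, where entries[-0:] would keep all.
    entries = entries[-max_entries:] if max_entries > 0 else []
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("\n".join(entries) + "\n")
    return entries


log_file = os.path.join(tempfile.mkdtemp(), "search-terms.log")
for term in ["markets", "contact", "careers", "news"]:
    kept = log_query(log_file, term, max_entries=3)
print(kept)
```

If the web server cannot write to the log file, the write step is where a script like this would fail, which is why the file permissions matter.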
8. Using
To use the script, publish the page and file assets and access search.php with the GET parameters shown in the form listing above. For example, to search the entire web site for the term "markets", go to http://mywebsite.com/search/search.php?s=markets&d=-1
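To make the meaning of the two GET parameters concrete, here is a Python sketch of how a request like the one above could be interpreted. It is not the logic of search.php itself; the data file names are hypothetical, and treating a missing 'd' parameter as -1 is an assumption.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical data file names, listed in the same order as the index
# blocks in the configuration and the <option> values in the search frame.
DATA_FILES = ["section0.xml", "section1.xml", "section2.xml", "section3.xml"]


def parse_request(url):
    """Extract the query string 's' and the section number 'd' from a URL."""
    params = parse_qs(urlparse(url).query)
    query = params.get("s", [""])[0]
    section = int(params.get("d", ["-1"])[0])  # default of -1 is assumed
    return query, section


def files_to_search(section, data_files):
    """Return all data files for -1, otherwise the one at that position."""
    if section == -1:
        return list(data_files)
    if 0 <= section < len(data_files):
        return [data_files[section]]
    raise ValueError(f"unknown section: {section}")


query, section = parse_request("http://mywebsite.com/search/search.php?s=markets&d=-1")
print(query, files_to_search(section, DATA_FILES))
```

Because a section is chosen by position, the order of the options in the search frame has to match the order of the index blocks in the configuration.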
9. Troubleshooting
You may have to manually specify the targets of your assets in the search-config stylesheet. To do this, edit the stylesheet and change lines that look like this:
<xsl:text>[system-asset:page]</xsl:text>
to this:
<xsl:text>[system-asset:page:target=/intranet]</xsl:text>
Be sure to change "/intranet" to the appropriate target.
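If the stylesheet contains many such lines, the edit can be applied in one go. The following Python sketch is only a convenience for doing the same replacement on a local copy of the stylesheet text; the function name add_target is made up, and it handles only the exact [system-asset:page] form shown above.

```python
def add_target(stylesheet_text, target):
    """Add :target=<target> to every [system-asset:page] tag."""
    # Tags that already carry a target do not contain the exact string
    # "[system-asset:page]", so running this twice changes nothing more.
    return stylesheet_text.replace(
        "[system-asset:page]",
        f"[system-asset:page:target={target}]",
    )


line = "<xsl:text>[system-asset:page]</xsl:text>"
print(add_target(line, "/intranet"))
```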
XML Search Files
- search-java.zip
This is the Java version of the XML search. Full source code is included.
- search-php4.txt
This search script is for use with PHP 4.
- minixml.zip
This PHP code is only necessary when using the search script with PHP 4.
- search.zip
This zip file contains all the scripts and XML for the search engine.
- index-block-search-filter.xsl
This XSL is used to filter a standard index block inside the CMS before it is published to a live server.
- search-config-data-definition.xml
This data definition XML has all the necessary fields for populating the config file.
- search-config-filter.xsl
This XSL is used to transform the data definition XML into the config file.
- search-config.xml
This is an example config file for the search script.
- search-terms-log-xml-php.txt
This is an example search terms log file. This file needs to be writable by the script on the production server.
- search-php.txt
This is the script that performs the search on the production server.
- sync_files-php.txt
This script is used to synchronize the search terms log file with Cascade via web services on a regularly scheduled interval.
- search.aspx
This is the .NET file that includes the search control.
- SearchResult.cs
This is the back-end .NET search code.
- search-aspx.cs
This is the front-end .NET search code.