XML Content Search

XML Search

Author: Martin Robinson

1. Introduction

This guide will quickly walk you through installing and configuring a custom search engine on your site. It assumes that you have a recent version of PHP installed on your web server and that you are comfortable with Cascade. Knowledge of PHP is also useful, but not necessary to use this guide.

2. Components

The search engine has several components: the index blocks, the search frame, the search script itself, the configuration page, and the logging mechanism. The script reads its configuration from a configuration file that you publish (by means of a data definition), searches the XML data files, displays the results to the user, and does any necessary logging. The index blocks gather content from inside the CMS and publish it to XML files on your web server. The search frame is an HTML template that surrounds the results from the search script.

To begin, create a new folder to contain all the search-related assets. We chose "search" for this example.

3. The Index Blocks

For every section of your site that you want to search, create an index block which indexes that section. If you simply want to index your entire site at once, make an index block for the top-level folder. Breaking the site up into sections is recommended, however, especially if the site is large.

When creating your index blocks, be sure to check "Include default page content inline in rendered XML." This will ensure that the search script looks through page contents as well as metadata. Be sure to only index pages and to give your index block a meaningful name. In our case we named it "search-index-block." Finally, make sure that "Regular content" and "System Metadata" are enabled under "Indexed Asset Content." For each block, make sure that the depth of the index is large enough to capture all of the sub-pages.

For each index block, create a page (preferably one that will publish to XML) which uses that block in an XML configuration. Be sure that you turn off indexing when you create this page. Next, assign a stylesheet to each search index page to rewrite the links and clean up the output for the script. Import the stylesheet index-block-search-filter.xsl (listed under XML Search Files below) and assign it to the index blocks on your new pages.

Now your index blocks should be ready.
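
To make the role of the published index files concrete, here is a minimal Python sketch of a search over one such file. It is not the search.php implementation. The element names (index, page, path, display-name, summary, content) are assumptions made for illustration; check the real output of your index blocks for the actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical published index. The element names are assumptions,
# not the actual output of index-block-search-filter.xsl.
SAMPLE_INDEX = """
<index>
  <page>
    <path>/about/markets.html</path>
    <display-name>Markets</display-name>
    <summary>Where we sell</summary>
    <content>We serve markets in Europe and Asia.</content>
  </page>
  <page>
    <path>/contact.html</path>
    <display-name>Contact</display-name>
    <summary>How to reach us</summary>
    <content>Phone and email details.</content>
  </page>
</index>
"""


def search_index(xml_text, query):
    """Return a dict of fields for every page whose text contains the query."""
    needle = query.lower()
    hits = []
    for page in ET.fromstring(xml_text).iter("page"):
        # Collect the text of each direct child element of the page.
        fields = {child.tag: (child.text or "") for child in page}
        # Match against metadata and page content alike, ignoring case.
        haystack = " ".join(fields.values()).lower()
        if needle in haystack:
            hits.append(fields)
    return hits


results = search_index(SAMPLE_INDEX, "markets")
print([r["path"] for r in results])
```

Because page content and metadata are searched together, a page matches if the term appears in either, which is why both "Regular content" and "System Metadata" need to be indexed.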

4. The Search Frame

Now we are going to create an HTML frame to display the search results in. This should probably be derived from your global template. In any case, this page's content should look like this:

    <form action="search.php" method="get">
      <input maxlength="50" name="s" size="15" value="{$QUERY}" type="text"/>
      <input name="search" value="search" type="submit"/>
      <select name="d">
        <option selected="selected" value="-1">Entire Site</option>
        <option value="0">Section 0</option>
        <option value="1">Section 1</option>
        <option value="2">Section 2</option>
        <option value="3">Section 3</option>
      </select>
    </form>
    {$SEARCH}

Note that the option values here correspond to the index blocks you created above. Order must be consistent throughout this process, so they must be listed in the same order when you create the configuration file. Passing a value of "-1" for the 'd' parameter will always search all data files.
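
The mapping from the 'd' parameter to data files can be sketched in Python as follows. This is an illustration of the rule described here, not code taken from search.php; the function name and the file names are hypothetical.

```python
def select_data_files(d, data_files):
    """Pick the index files to search for a given value of the 'd' parameter.

    data_files must be in the same order as the index blocks in the
    configuration file, because 'd' is simply a position in that list.
    """
    if d == -1:
        # -1 always means "search every data file".
        return list(data_files)
    if 0 <= d < len(data_files):
        return [data_files[d]]
    raise ValueError("unknown section: %d" % d)


# Hypothetical file names, one per index block, in configuration order.
files = ["section0.xml", "section1.xml", "section2.xml", "section3.xml"]
print(select_data_files(-1, files))
print(select_data_files(2, files))
```

Since the option values are plain positions, reordering the index blocks in the configuration without updating the form would silently point each option at the wrong section.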

When the script runs it will replace {$QUERY} with the search query and {$SEARCH} with the search results.
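
A minimal Python sketch of this substitution step is shown below. It is an illustration only. Escaping the query before it is placed in the value attribute is an assumption about good practice, not a statement about what search.php does.

```python
import re
from html import escape


def fill_frame(frame, query, results_html):
    """Replace {$QUERY} and {$SEARCH} in the frame in a single pass."""
    values = {
        # Escape the query so quotes or angle brackets cannot break the form.
        "QUERY": escape(query, quote=True),
        "SEARCH": results_html,
    }
    # One pass means a query that happens to contain "{$SEARCH}" is not expanded.
    return re.sub(r"\{\$(QUERY|SEARCH)\}", lambda m: values[m.group(1)], frame)


frame = '<input name="s" value="{$QUERY}" type="text"/>{$SEARCH}'
print(fill_frame(frame, "markets", "<p>1 result</p>"))
```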

5. The Script

Now create a file named search.php in your search directory, turn off indexing on the file, and replace its content with that of the search script (search-php.txt in the file list below). This is the script that does the searching. Notice that at the top of this file is the line $config_file = "search-config.xml". We will create this configuration file next.

6. The Configuration Editor

Instead of simply uploading a configuration file, we will create a data definition, so that we can edit the configuration file inside of the CMS.  Create a data definition (we called ours "Search Config") using the included data definition search-config-data-definition.xml.

Now create a page named search-config with the data definition that you just created. If possible, configure the page so that it publishes XML to a .xml file. If not, you will have to change the $config_file variable in search.php to match the file extension that this page publishes to. In any case, it must output XML.

This configuration must be filtered as well, so load the stylesheet "search-config-filter.xsl" and assign it to the page you just created. If you edit the page you just created, you will be presented with some configuration options. Most are self-explanatory. Be sure to point the "Frame file" and "Index blocks" fields to the appropriate assets. The "Result style" field holds a template that is applied to each result. To display a link to each page using the display-name and summary from the page metadata, use:

    <p><a href="%%%path%%%"><b>%%%display-name%%%</b></a><br/>
    %%%summary%%%</p>


There are several other template fields as well. The No Results Header is the text that the search engine displays when it cannot find any matching pages for a query. This can be used to offer other search options to the user. The Results Header is the text shown when results are found. It is often helpful to show the user some information about the total number of results. The Next Page Link Text and Previous Page Link Text are the texts displayed in the links to the next and previous pages of results. All of these templates allow the use of the variables listed in the table below.

Variable                  Description
{$QUERY}                  The search terms that the user entered
{$START_RESULT}           Number of the first result shown on this page
{$END_RESULT}             Number of the last result shown on this page
{$TOTAL_RESULTS}          Number of total results this query found
{$RESULTS_ON_NEXT_PAGE}   Number of results that will be shown on the next page
{$PER_PAGE}               Maximum number of results shown per page
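
As a rough illustration of how the paging variables in the table relate to each other, the Python sketch below computes them for one page of results and fills a header template. The 0-based page argument and the function names are assumptions for this example; the source does not say how search.php tracks the current page.

```python
import re


def page_variables(query, total, page, per_page):
    """Compute the template variables for one page of results (page is 0-based)."""
    start = page * per_page + 1 if total else 0
    end = min((page + 1) * per_page, total)
    remaining = max(total - end, 0)
    return {
        "QUERY": query,
        "START_RESULT": start,
        "END_RESULT": end,
        "TOTAL_RESULTS": total,
        "RESULTS_ON_NEXT_PAGE": min(per_page, remaining),
        "PER_PAGE": per_page,
    }


def fill(template, variables):
    """Replace each {$NAME} with its value; unknown names are left untouched."""
    return re.sub(
        r"\{\$([A-Z_]+)\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )


header = "Results {$START_RESULT} to {$END_RESULT} of {$TOTAL_RESULTS} for {$QUERY}"
print(fill(header, page_variables("markets", 23, 1, 10)))
```

With 23 results and 10 per page, the second page covers results 11 to 20 and leaves 3 for the next page.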

Generally, pieces of metadata from the page structure can be accessed by enclosing the name in "%%%". For an idea of what is available, take a look at the XML output from the index blocks.
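
The "%%%" substitution can be pictured with the short Python sketch below. It is not the script's actual code. Replacing unknown names with an empty string and HTML-escaping the values are assumptions made for this example.

```python
import re
from html import escape


def render_result(template, page_fields):
    """Fill %%%name%%% placeholders in a result template from one page's fields."""

    def lookup(match):
        # Unknown names become empty strings (an assumption for this sketch).
        return escape(page_fields.get(match.group(1), ""))

    return re.sub(r"%%%([A-Za-z0-9_-]+)%%%", lookup, template)


template = '<p><a href="%%%path%%%"><b>%%%display-name%%%</b></a><br/>%%%summary%%%</p>'
page = {
    "path": "/about/markets.html",
    "display-name": "Markets",
    "summary": "Where we sell",
}
print(render_result(template, page))
```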

7. Logging

To enable logging, enter the name of a file that the web server can write to in the search-config page (the path must be relative to the search script's location on the server). You should also enter the maximum number of log entries you wish to keep.
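
The idea of a capped log can be sketched in Python as below. This is only an illustration of keeping the most recent entries up to a maximum. The real log is an XML file whose format is not described here, so this sketch uses a plain text file with one query per line, and the function name is hypothetical.

```python
import os


def log_search(log_path, query, max_entries):
    """Append a query to the log and keep only the newest max_entries lines."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    lines = []
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    # Keep each entry on one line so the line count equals the entry count.
    lines.append(query.replace("\n", " "))
    lines = lines[-max_entries:]
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

The file must be writable by the account the web server runs as, otherwise the write step fails.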

8. Using

To use the script, publish the page and file assets and access search.php using the GET method shown in the listing above. For example, to search the entire web site for the term "markets", go to http://mywebsite.com/search/search.php?s=markets&d=-1
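
If you build such links from another program, the query should be URL-encoded. The Python sketch below shows one way to do that with the standard library; the helper name search_url is hypothetical, and the 's' and 'd' parameters are the ones used by the form above.

```python
from urllib.parse import urlencode


def search_url(base, query, section=-1):
    """Build a search link; section -1 searches every data file."""
    return base + "?" + urlencode({"s": query, "d": section})


print(search_url("http://mywebsite.com/search/search.php", "markets"))
print(search_url("http://mywebsite.com/search/search.php", "stock markets", 0))
```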

9. Troubleshooting

You may have to manually specify the targets of your assets in the search-config stylesheet. To do this simply edit the stylesheet and change lines looking like this:

    <xsl:text>[system-asset:page]</xsl:text>

to this:

    <xsl:text>[system-asset:page:target=/intranet]</xsl:text>

Be sure to change "/intranet" to the appropriate target.

XML Search Files 

  1. index-block-search-filter.xsl
    This XSL is used to filter a standard index block inside the CMS before it is published to a live server.
  2. search-config-data-definition.xml
    This data definition XML has all the necessary fields for populating the config file.
  3. search-config-filter.xsl
    This XSL is used to transform the data definition XML into the config file.
  4. search-config.xml
    This is an example config file for the search script.
  5. search-terms-log-xml-php.txt
    This is an example search terms log file. This file needs to be writeable by the script on the production server.
  6. search-php.txt
    This is the script that performs the search on the production server.
  7. sync_files-php.txt
    This script is used to synchronize the search terms log file with Cascade via web services on a regularly scheduled interval.
  8. search.zip
    This zip file contains all the scripts and XML for the search engine.
  9. minixml.zip
    This PHP code is only necessary when using the search script with PHP 4.
  10. search-php4.txt
    This search script is for use with PHP 4.
  11. search.aspx
    This is the .NET file that includes the search control.
  12. SearchResult.cs
    This is the back-end .NET search code.
  13. search-aspx.cs
    This is the front-end .NET search code.
  14. search-java.zip
    This is the Java version of the XML search. Full source code is included.
Last modified on Tue, 19 Dec 2006 15:50:20 -0500
