Implementation

The script is divided into two parts. The first part parses the XML and constructs two data structures, one containing all the web pages and another containing all the quotes. The second part handles the actual import of web pages and quotes into the system. Our script processes HTML web page content with PHP's Tidy interface. Tidy ensures that our data is well-formed XHTML encoded in UTF-8, which is suitable for importing into Cascade. Finally, the PHP SOAP API takes the data as an associative array and makes the appropriate Web Services calls to create page assets for each web page and quote.

Parsing the XML

PHP includes a built-in XML parser based on expat, which means that PHP's XML parsing functions can parse XML but cannot validate it. Ensure that the XML you pass to PHP is valid. Our parser will be an event-based parser, so we will register some functions to run when the parser encounters certain XML events (like character data or an opening tag). We will have to keep track of the state of the XML between events. For our purposes, a simple stack will be sufficient (we simply want to know if the current tag is part of the quotes or web_pages structure). Here is the code for setting up the parser:
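A minimal sketch of such a setup follows. The handler names (on_open, on_close, on_text), the $events log, and the sample XML are assumptions made for illustration; each handler only records the event it receives so the order of callbacks is visible. Case folding is switched off so tag names arrive as written rather than uppercased.

```php
<?php
// Hypothetical handlers: each one only logs the event it was given.
$events = [];

function on_open($parser, string $tag, array $attrs): void {
    global $events;
    $events[] = "open:$tag";
}

function on_close($parser, string $tag): void {
    global $events;
    $events[] = "close:$tag";
}

function on_text($parser, string $data): void {
    global $events;
    if (trim($data) !== '') {          // skip whitespace between tags
        $events[] = "text:$data";
    }
}

$parser = xml_parser_create('UTF-8');
// By default tag names are folded to upper case; keep them as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'on_open', 'on_close');
xml_set_character_data_handler($parser, 'on_text');

// Assumed sample input; the real export file has its own layout.
$xml = '<export><quotes><body>Hello</body></quotes></export>';
if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}
```

The parser reports only well-formedness errors through xml_get_error_code(); it does not check the document against a DTD or schema.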

Notice that we've initialized some arrays to hold all the web pages and quotes. We've also initialized the tag stack, which will record the current tag structure. For instance, when we are looking at an id tag inside of a web_pages tag, the stack will look like this:
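As an illustration, assuming case folding is disabled and ignoring any enclosing root element (which would sit below these entries), the stack and the two lookups used later might be sketched as:

```php
<?php
// Hypothetical stack contents while the parser is inside <web_pages><id>.
$tag_stack = ['web_pages', 'id'];

$current = $tag_stack[count($tag_stack) - 1]; // innermost tag
$parent  = $tag_stack[count($tag_stack) - 2]; // one level down the stack
```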

This allows us to have a look at the current context of an event. Let's take a look at the event callbacks that we registered in the previous code.

When the parser finds an opening tag, we want to push that tag onto the tag stack. If the opening tag is one of the ones we are interested in (either quotes or web_pages), we should prepare the appropriate array by adding a new empty entry. When we encounter a closing tag, we should pop one tag off of the tag stack.

When we find character data between tags, we want to figure out what context we are in (by looking one level down on the tag stack) and record the data if we need to. We place the data in an associative array inside of $pages or $quotes. This way a particular quote body will be $quotes[3]["body"], which makes the data easier to access later. Notice that we always append data to fields rather than overwrite them. This is because the XML parser handles all character content, including other XML that it parses from inside of the content tags of web_pages, and it may deliver the text of a single field across several calls to the handler.
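The three callbacks described above can be sketched together as follows. The function names, the sample XML, and the decision to look exactly one level down the stack are assumptions; markup nested more deeply inside a content field would need extra handling that this sketch leaves out.

```php
<?php
// Sketch only: handler names and the sample XML layout are assumptions.
$pages = [];
$quotes = [];
$tag_stack = [];

function start_tag($parser, string $tag, array $attrs): void {
    global $pages, $quotes, $tag_stack;
    $tag_stack[] = $tag;                 // push the opening tag
    if ($tag === 'web_pages') {
        $pages[] = [];                   // new empty page entry
    } elseif ($tag === 'quotes') {
        $quotes[] = [];                  // new empty quote entry
    }
}

function end_tag($parser, string $tag): void {
    global $tag_stack;
    array_pop($tag_stack);               // pop the closing tag
}

function char_data($parser, string $data): void {
    global $pages, $quotes, $tag_stack;
    $depth = count($tag_stack);
    if ($depth < 2) {
        return;                          // no enclosing record to write to
    }
    $field = $tag_stack[$depth - 1];     // tag the text belongs to
    $context = $tag_stack[$depth - 2];   // one level down the stack
    if ($context === 'web_pages') {
        $i = count($pages) - 1;
        $pages[$i][$field] = ($pages[$i][$field] ?? '') . $data;   // append
    } elseif ($context === 'quotes') {
        $i = count($quotes) - 1;
        $quotes[$i][$field] = ($quotes[$i][$field] ?? '') . $data; // append
    }
}

$xml = <<<XML
<export>
  <web_pages><id>1</id><content>Tom &amp; Jerry</content></web_pages>
  <quotes><source>Ada</source><body>Hello</body></quotes>
</export>
XML;

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'start_tag', 'end_tag');
xml_set_character_data_handler($parser, 'char_data');
if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}
```

Appending matters even in this small input: the entity in the content field can cause the text to arrive in more than one piece, and overwriting would keep only the last piece.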

When the parser finishes its work, we should have two data structures, $pages and $quotes, holding the information from the XML file. Now let's take a look at the functions that will process and import our data.

First, the import_web_pages() function configures Tidy to output the type of XHTML we need. Next, it initializes an array to hold paths. This is an associative array that acts like a hash table, keeping track of each unique path.

The script will now iterate over every page pulled from the XML file. Notice that it replaces the old HTML in the content field with tidied XHTML. Tidying ensures that the content is still valid, well-formed XHTML after we wrap it in Cascade's system-xml tag.
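A minimal sketch of the tidy-and-wrap step, assuming the tidy extension is installed. The helper name tidy_to_system_xml() and the particular option values are illustrative choices, not necessarily the ones the original script used.

```php
<?php
// Hypothetical helper: repairs an HTML fragment and wraps it for Cascade.
function tidy_to_system_xml(string $html): string {
    $config = [
        'output-xhtml'   => true, // emit XHTML rather than HTML
        'show-body-only' => true, // drop the html/head/body wrappers
        'wrap'           => 0,    // do not re-wrap long lines
    ];
    $xhtml = tidy_repair_string($html, $config, 'utf8');
    if ($xhtml === false) {
        $xhtml = '';              // repair failed; fall back to empty content
    }
    return '<system-xml>' . $xhtml . '</system-xml>';
}

// Only call the helper when the extension is actually available.
if (extension_loaded('tidy')) {
    echo tidy_to_system_xml('<p>Unclosed paragraph<br>second line'), "\n";
}
```

Inside the loop over $pages, the content field of each page would then be replaced by the return value of this helper.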

Here we split the old CMS-style path field into a Cascade path and a system name for our new page. We know that path elements are separated by '/' so we look for the last one in the path. The string after that is the system name (once we remove the extension) and everything before it is the Cascade parent directory.
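The split described above can be sketched with strrpos() and substr(). The helper name split_cms_path() and the sample path are hypothetical.

```php
<?php
// Hypothetical helper: splits an old CMS path into the Cascade parent
// folder and the system name (the file name without its extension).
function split_cms_path(string $path): array {
    $slash = strrpos($path, '/');            // position of the last '/'
    if ($slash === false) {
        $parent = '';
        $file = $path;
    } else {
        $parent = substr($path, 0, $slash);  // everything before it
        $file = substr($path, $slash + 1);   // everything after it
    }
    $dot = strrpos($file, '.');
    // Keep the whole name when there is no extension or only a leading dot.
    $name = ($dot === false || $dot === 0) ? $file : substr($file, 0, $dot);
    return ['parent' => $parent, 'name' => $name];
}

$parts = split_cms_path('about/staff/index.html');
// $parts['parent'] is 'about/staff' and $parts['name'] is 'index'
```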

Finally, we store the path in our previously-initialized array and end the foreach loop.

The rest of import_web_pages() simply sorts the path array, creates a SOAP client, and calls functions to add each folder (add_folder()) and each web page (add_page()). These functions are described below.
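A short sketch of the bookkeeping, using made-up parent paths. Storing each path as an array key removes duplicates, and a plain string sort places a folder before anything nested under it, because a path is always a prefix of its children.

```php
<?php
// Assumed sample data: parent folders collected while looping over pages.
$parents = ['news/2008', 'about', 'news', 'about', 'news/2008'];

$paths = [];
foreach ($parents as $parent) {
    $paths[$parent] = true;      // keys are unique, so duplicates collapse
}

$folders = array_keys($paths);
sort($folders, SORT_STRING);     // parents sort ahead of their children
// $folders is now ['about', 'news', 'news/2008']
```

Creating folders in this order means a parent folder already exists by the time one of its children is added.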

Adding a Single Folder

The add_folder() function adds a single folder to Cascade. It takes in the path of the folder to add and splits off the last path element to use as the folder name. Then it constructs an associative array that the SOAP API will turn into a SOAP message. When using SOAP, be warned that a message that conforms to the WSDL has not necessarily succeeded. Be sure to check the response that the server returns with __getLastResponse() as shown below.
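The sketch below shows one way such a function could be structured. The operation name (create) and the field names (authentication, asset, folder, name, parentFolderPath) are assumptions about the Cascade WSDL rather than details confirmed here, and folder_params() is a hypothetical helper. Note that __getLastResponse() only returns data when the SoapClient was constructed with the trace option enabled.

```php
<?php
// Sketch only: operation and field names are assumptions about the WSDL.
function folder_params(array $auth, string $path): array {
    $path = trim($path, '/');
    $slash = strrpos($path, '/');
    // The last path element becomes the folder name.
    $name = ($slash === false) ? $path : substr($path, $slash + 1);
    $parent = ($slash === false) ? '/' : substr($path, 0, $slash);
    return [
        'authentication' => $auth,
        'asset' => [
            'folder' => [
                'name' => $name,
                'parentFolderPath' => $parent,
            ],
        ],
    ];
}

function add_folder(object $client, array $auth, string $path): string {
    $client->create(folder_params($auth, $path));
    // A call that raises no fault may still have failed on the server,
    // so hand back the raw reply for the caller to inspect.
    return (string) $client->__getLastResponse();
}

// Usage, with $wsdl_url as a placeholder for the server's WSDL address:
// $client = new SoapClient($wsdl_url, ['trace' => true]);
// echo add_folder($client, ['username' => 'u', 'password' => 'p'], 'news/2008');
```

Keeping the array construction in its own function makes it possible to inspect the request structure without contacting the server.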

In PHP, the associative array that the SOAP functions accept is structured in a 'tag name' => "tag value" fashion. To pass in multiple tags of a particular tag name, simply associate the 'tag name' key with an array of values. This is demonstrated in the add_page() function. Tags may be nested by associating particular tags with other associative arrays.
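The three shapes can be seen side by side in a small sketch. The tag names here are hypothetical and chosen only to show the array structure; whether a list of values is sent as repeated tags depends on the WSDL declaring that element as repeatable.

```php
<?php
// Hypothetical tag names, used only to illustrate the array shapes.
$params = [
    'title' => 'Annual report',         // single tag with a text value
    'owner' => [                        // nested tag: owner contains name
        'name' => 'Web team',
    ],
    'keyword' => ['finance', 'report'], // same tag name passed twice
];
```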

Adding a Single Web Page

The add_page() function is very similar to the add_folder() function, although the request structure is more complex and we must ensure that the metadata we pass to Cascade is valid UTF-8 text.
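One way to guard the metadata is sketched below. The helper name to_utf8() is hypothetical, and the sketch assumes that any legacy text which is not already valid UTF-8 is encoded as ISO-8859-1, which is the same assumption utf8_encode() makes. Since utf8_encode() is deprecated as of PHP 8.2, the sketch uses iconv() instead.

```php
<?php
// Hypothetical helper: assumes non-UTF-8 input is ISO-8859-1.
function to_utf8(string $text): string {
    // An empty pattern with the /u modifier matches only valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;                    // already valid, leave untouched
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

// Assumed sample metadata; the title holds an ISO-8859-1 e-acute byte.
$metadata = ['title' => "Caf\xE9 menu", 'author' => 'Ada'];
$metadata = array_map('to_utf8', $metadata);
```

Checking validity first avoids double-encoding fields that were already UTF-8 in the source system.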

Importing Quotes

Importing the quotes is very similar to importing the web pages with one major difference: instead of simply passing XHTML content to Cascade, we want to place values in a data definition. Before running this script, we created a data definition in Cascade named Quote which contained fields for the source, type, and body of each quote. Then we created a configuration set which would format this data definition. Here we see the data definition we used for each quote.

Next, you can see the code for importing the quotes. Notice the structure of the parameter array this time. Structured data definition nodes are packaged in an array inside of the structuredDataNodes tag. Each element is an entry of the structuredDataNode array. To pass more than one node, simply add another level of arrays inside of structuredDataNode.
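The nesting described above might look like the following sketch. The field names inside each node (type, identifier, text), the definitionPath key, and the helper quote_nodes() are assumptions about the Cascade WSDL and are used here only to show the array levels; the sample quote is made up.

```php
<?php
// Sketch only: node field names are assumptions about the Cascade WSDL.
function quote_nodes(array $quote): array {
    $nodes = [];
    foreach (['source', 'type', 'body'] as $field) {
        $nodes[] = [
            'type' => 'text',               // a plain text node
            'identifier' => $field,         // field name in the data definition
            'text' => $quote[$field] ?? '', // value taken from the parsed XML
        ];
    }
    return $nodes;
}

// Assumed sample quote, shaped like one entry of $quotes.
$quote = ['source' => 'Ada', 'type' => 'staff', 'body' => 'Hello'];

$structured_data = [
    'definitionPath' => 'Quote',            // the data definition named Quote
    'structuredDataNodes' => [
        // One inner array per node: the extra level is what allows
        // several structuredDataNode entries to be passed at once.
        'structuredDataNode' => quote_nodes($quote),
    ],
];
```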

Summary

The Web Services interface to Cascade Server eases automation of asset creation. PHP's built-in SOAP and XML support makes it useful for migrating to Cascade Server from another product. Content passed to Cascade may have to be converted to XHTML with PHP Tidy and to UTF-8 with the PHP function utf8_encode(). The PHP SOAP API requires that parameters be formed using associative arrays whose keys describe the tag names. After each call, the server's response can be inspected through the __getLastResponse() method of the SoapClient object to confirm that the call succeeded. Migration becomes far less tedious and time-consuming with the use of Web Services.

See Also