Migrate Existing XML As Structured Content
Digest
Cascade Server's Web Services interface exposes to SOAP clients an interface for adding and editing assets. This allows repetitive or difficult tasks to be accomplished easily with any SOAP-compatible programming language. As a demonstration of these features, this guide walks users through the task of importing many database entries from an old CMS (in the form of an XML dump) into Cascade Server with PHP and SOAP calls. Web Services makes this process much quicker than a manual import.
Technical
Migrating Existing XML as Structured Content
Preparation for Migrating Existing XML as Structured Content:
To access Cascade Web Services with PHP, a working PHP installation (as well as an installation of Cascade Server) is necessary. To use the SOAP and Tidy interfaces in PHP, ensure that the following lines are in the php.ini file:
On Unix-like systems, ensure that the following lines are in the php.conf file:
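One way to confirm that the extensions are actually loaded after editing the configuration is a short check like the following sketch. The helper name missing_extensions() is hypothetical and not part of the original script; the "xml" extension is included on the assumption that the expat-based parser used later in this guide comes from it.

```php
<?php
// Return the names from $required that are not loaded in this PHP build.
function missing_extensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// "soap" and "tidy" are the interfaces named above; "xml" provides the parser functions.
$missing = missing_extensions(['soap', 'tidy', 'xml']);
if ($missing) {
    echo 'Missing PHP extensions: ' . implode(', ', $missing) . PHP_EOL;
} else {
    echo 'All required PHP extensions are loaded.' . PHP_EOL;
}
```

Running this from the command line and from the web server can give different results if the two use different configuration files.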
The XML
The data that the import application will import is stored in an XML dump from a defunct CMS application. In this case, there are two types of data to be retrieved from the XML. Web pages consist of the metadata for a web page and a block of HTML content. Quotes are simply four fields which describe the quote: an id, a quote type, a quote source, and the body of the quote. The structure of the XML is:
Notice that the XML may contain other second-level tags, so the parser must be flexible enough to work with all of the data. The script will pull out the data for each web page and import it into an existing template in the system, while preserving the metadata and path. This will require a folder structure to hold the pages. Finally, the script will pull the fields of each quote from the XML and create a page using a data definition for quotes.
Implementation
The script will be divided into two parts. The first part will parse the XML and construct two data structures, one containing all the web pages and another containing all the quotes. The second part will handle the actual import of web pages and quotes into the system. The script will process HTML web page content with PHP's Tidy interface. Tidy will ensure that the data is well-formed XHTML encoded in UTF-8, which is suitable for importing into Cascade. Finally, the PHP SOAP API will take the data as an associative array and make the appropriate Web Services calls to create page assets for each web page and quote.
Parsing the XML
PHP includes a built-in XML parser based on expat, which means that PHP's XML parsing functions check that a document is well-formed but do not validate it. Ensure that the XML passed to PHP is valid before running the import. The parser will be an event-based parser, so some functions will be registered to run when the parser encounters certain XML elements (like character data or an opening tag). The script must keep track of the state of the XML between events. For these purposes, a simple stack will be sufficient (all that is necessary to know is whether the current tag is part of a quotes or web_pages structure). Here is the code for setting up the parser:
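A minimal sketch of such a setup, assuming PHP's standard xml_* functions, might look like the following. The function name parse_dump(), the callback names, and the root tag used in the usage note are hypothetical; the callbacks here are placeholder stubs that only maintain the stack so that the setup can run on its own.

```php
<?php
// Containers filled while parsing, plus the stack of currently open tags.
$pages = [];
$quotes = [];
$tag_stack = [];

// Placeholder callback: push each opening tag onto the stack.
function start_element($parser, string $name, array $attrs): void
{
    global $tag_stack;
    array_push($tag_stack, $name);
}

// Placeholder callback: pop the stack on each closing tag.
function end_element($parser, string $name): void
{
    global $tag_stack;
    array_pop($tag_stack);
}

// Placeholder callback: a full version would record $data by context.
function character_data($parser, string $data): void
{
}

function parse_dump(string $xml): bool
{
    $parser = xml_parser_create('UTF-8');
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, 'start_element', 'end_element');
    xml_set_character_data_handler($parser, 'character_data');
    // The final argument marks this as the last (here, the only) chunk of input.
    $ok = xml_parse($parser, $xml, true) === 1;
    if (!$ok) {
        fprintf(STDERR, "XML error: %s at line %d\n",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    }
    return $ok;
}
```

For example, parse_dump('<dump><web_pages><id>1</id></web_pages></dump>') should return true and leave $tag_stack empty, since every opened tag is closed again. Case folding is switched off because expat otherwise reports tag names in upper case, which would not match lower-case names such as web_pages.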
Notice that some arrays have been initialized to hold all the web pages and quotes. The tag stack, which records the current tag structure, was also initialized. For instance, when looking at an id tag inside of a web_pages tag, the stack will look like this:
This allows for looking at the current context of an event. Take a look at the event callbacks that were registered in the previous code.
When the parser finds an opening tag, we want to put that tag onto the tag stack. If the opening tag is one of the ones we are interested in (either quotes or web_pages), we should prepare the appropriate array by adding a new empty entry. When we encounter a closing tag, we should pop one tag off of the tag stack.
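The opening and closing tag behavior described above can be sketched as follows. This is an illustration rather than the original script: the callback names are hypothetical, and the events are simulated by calling the handlers directly, passing null for the parser argument they do not use.

```php
<?php
$pages = [];
$quotes = [];
$tag_stack = [];

function start_element($parser, string $name, array $attrs): void
{
    global $pages, $quotes, $tag_stack;
    array_push($tag_stack, $name);
    // A new record starts: append an empty entry for later character data to fill.
    if ($name === 'web_pages') {
        $pages[] = [];
    } elseif ($name === 'quotes') {
        $quotes[] = [];
    }
}

function end_element($parser, string $name): void
{
    global $tag_stack;
    array_pop($tag_stack);
}

// Simulate the events for <web_pages><id>...</id></web_pages>.
start_element(null, 'web_pages', []);
start_element(null, 'id', []);
print_r($tag_stack); // the innermost tag is the last element
end_element(null, 'id');
end_element(null, 'web_pages');
```

After the simulated events, $pages holds one empty entry, $quotes is still empty, and the stack is empty again.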
When character data is found between tags, it is necessary to figure out what context it is in (by looking one level down on the tag stack) and record the data. Data should be placed in an associative array inside of $pages or $quotes. This way, a particular quote body will be $quotes[3]["body"], which will make it easier to access the data later. Notice that data is always appended to fields. This is because the XML parser handles all character content, including other XML that it parses from inside of the content tags of web_pages.
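A sketch of a character data callback along these lines is shown below, with hypothetical names and a stack that is set up by hand to stand in for a parser positioned inside a content tag. Appending also matters because expat may deliver the text of a single element in several separate calls.

```php
<?php
$pages = [[]];                          // one open web_pages record
$quotes = [];
$tag_stack = ['web_pages', 'content'];  // as if the parser were inside <content>

function character_data($parser, string $data): void
{
    global $pages, $quotes, $tag_stack;
    $depth = count($tag_stack);
    if ($depth < 2) {
        return; // not inside a field of a record
    }
    $field = $tag_stack[$depth - 1];    // innermost tag, e.g. "content"
    $record = $tag_stack[$depth - 2];   // one level down, e.g. "web_pages"
    if ($record === 'web_pages') {
        $target = &$pages[count($pages) - 1];
    } elseif ($record === 'quotes') {
        $target = &$quotes[count($quotes) - 1];
    } else {
        return; // some other second-level tag that is not imported
    }
    // Append rather than assign, so text delivered in pieces is not lost.
    $target[$field] = ($target[$field] ?? '') . $data;
}

// Two calls for the same field accumulate into one string.
character_data(null, '<p>Hello');
character_data(null, ' world</p>');
echo $pages[0]['content'] . PHP_EOL;
```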
When the parser finishes its work, there should be two data structures, $pages and $quotes, holding the information from the XML file. Next, take a look at the functions that will process and import the data.
Initially, the import_web_pages() function will configure Tidy to output the type of XHTML needed. Next, it will initialize some arrays to hold paths. The paths will be an associative array that acts like a hash table to keep track of each unique path.
The script will now iterate over every page pulled from the XML file. Notice that it replaces the old HTML in the content field with tidied XHTML. This ensures that the content will be valid and well-formed XHTML after wrapping it in Cascade's system-xml tag.
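The Tidy step might be sketched as follows. The option set and the helper name to_system_xml() are assumptions, not the original configuration, and the pass-through branch exists only so the sketch still runs where the tidy extension is not installed.

```php
<?php
// Assumed Tidy options: XHTML output, body content only, no line wrapping.
$tidy_config = [
    'output-xhtml'   => true,
    'show-body-only' => true,
    'wrap'           => 0,
];

function to_system_xml(string $html, array $config): string
{
    // Fallback for this sketch only: without tidy, the markup passes through unchanged.
    $clean = function_exists('tidy_repair_string')
        ? (string) tidy_repair_string($html, $config, 'utf8')
        : $html;
    return '<system-xml>' . trim($clean) . '</system-xml>';
}

// A stand-in for one parsed page with sloppy legacy markup.
$page = ['content' => '<p>Unclosed paragraph<br>with a bare break'];
$page['content'] = to_system_xml($page['content'], $tidy_config);
echo $page['content'] . PHP_EOL;
```

Requesting body-only output keeps Tidy from adding a full html and head wrapper around each fragment before it is placed inside the system-xml tag.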
Here we split the old CMS-style path field into a Cascade path and a system name for our new page. We know that path elements are separated by '/' so we look for the last one in the path. The string after that is the system name (once we remove the extension) and everything before it is the Cascade parent directory.
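The split described above can be sketched as a small helper. The function name split_cms_path(), the returned keys, and the handling of paths without a separator are assumptions made for illustration.

```php
<?php
function split_cms_path(string $path): array
{
    $slash = strrpos($path, '/');
    if ($slash === false) {
        // No separator at all: assume the page sits at the root.
        $parent = '/';
        $file = $path;
    } else {
        // Everything before the last '/' is the parent directory.
        $parent = $slash === 0 ? '/' : substr($path, 0, $slash);
        $file = substr($path, $slash + 1);
    }
    // Drop the extension (the text after the last dot) to get the system name.
    $dot = strrpos($file, '.');
    $name = ($dot === false || $dot === 0) ? $file : substr($file, 0, $dot);
    return ['parent' => $parent, 'name' => $name];
}

print_r(split_cms_path('/about/staff/index.html'));
```

For the example path, the parent directory is /about/staff and the system name is index.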
Finally, we store the path in our previously-initialized array and end the foreach loop.
The rest of import_web_pages() simply sorts the path array, creates a SOAP client, and calls functions to add each folder (add_folder()) and each web page (add_page()). These functions are described below.
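The role of the path array can be sketched as follows: using each path as an array key removes duplicates, and sorting the keys places a parent folder ahead of its children, so folders can be created in order. The function name unique_sorted_paths() is hypothetical.

```php
<?php
function unique_sorted_paths(array $parent_paths): array
{
    $paths = [];
    foreach ($parent_paths as $path) {
        // Using the path as the key keeps each unique path only once.
        $paths[$path] = true;
    }
    // A string sort places "/about" before "/about/staff".
    ksort($paths, SORT_STRING);
    return array_keys($paths);
}

print_r(unique_sorted_paths(['/news', '/about/staff', '/about', '/news']));
```

This sketch assumes every ancestor folder appears in the input; a parent that holds no pages of its own would have to be added separately.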
Adding a Single Folder
The add_folder() function adds a single folder to Cascade. It takes in the path of the folder to add and removes the last path element to use as the folder name. Then it constructs an associative array that the SOAP API will turn into a SOAP message. When using SOAP, be warned that even though a message conforms to the WSDL, this does not mean that the operation succeeded. Be sure to check the response that the server returns with __getLastResponse(), which requires the SOAP client to be created with the trace option enabled, as shown below.
In PHP, the associative array that the SOAP functions accept is structured in a 'tag name' => "tag value" fashion. To pass in multiple tags of a particular tag name, simply associate the 'tag name' key with an array of values. This is demonstrated in the add_page() function. Tags may be nested by associating particular tags with other associative arrays.
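The nesting described here might look like the following sketch of a folder request. The element names (authentication, asset, folder, name, parentFolderPath) and the create operation mentioned in the comments are assumptions about the Cascade WSDL and should be checked against the WSDL of the installation in use; build_folder_request() is a hypothetical helper.

```php
<?php
function build_folder_request(string $path, string $user, string $pass): array
{
    $path = rtrim($path, '/');
    $slash = strrpos($path, '/');
    // The last path element becomes the folder name.
    $name = $slash === false ? $path : substr($path, $slash + 1);
    $parent = ($slash === false || $slash === 0) ? '/' : substr($path, 0, $slash);
    // Nested associative arrays become nested tags in the SOAP message.
    return [
        'authentication' => ['username' => $user, 'password' => $pass],
        'asset' => [
            'folder' => [
                'name' => $name,
                'parentFolderPath' => $parent,
            ],
        ],
    ];
}

$params = build_folder_request('/about/staff', 'user', 'secret');
print_r($params);

// Sending the request (left as comments because it needs a live server):
// $client = new SoapClient($wsdl_url, ['trace' => 1]); // trace enables __getLastResponse()
// $client->create($params);                            // assumed operation name
// echo $client->__getLastResponse();                   // inspect what the server returned
```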
Adding a Single Web Page
The add_page() function is very similar to the add_folder() function, although the request structure is more complex and we must ensure that the metadata we pass to Cascade is valid UTF-8 text.
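One way to make sure metadata is valid UTF-8 is sketched below, assuming the mbstring extension is available and that any text which is not already valid UTF-8 was stored as ISO-8859-1 in the old CMS. Both the assumed source encoding and the helper names are hypothetical.

```php
<?php
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // Leave text alone when it is already valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise convert from the assumed legacy encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

function clean_metadata(array $metadata): array
{
    // Apply the conversion to every field while keeping the keys.
    return array_map('to_utf8', $metadata);
}

// "\xE9" is an e-acute in ISO-8859-1 and is not valid UTF-8 on its own.
$clean = clean_metadata(['title' => "R\xE9sum\xE9"]);
echo bin2hex($clean['title']) . PHP_EOL;
```

Checking validity first avoids converting text twice, which would corrupt fields that were already UTF-8.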
Importing Quotes
Importing the quotes is very similar to importing the web pages, with one major difference: instead of simply passing XHTML content to Cascade, we want to place values in a data definition. Before running this script, we created a data definition in Cascade named Quote, which contained fields for the source, type, and body of each quote. Then we created a configuration set which would format this data definition. Here we see the data definition we used for each quote.
Next, you can see the code for importing the quotes. Notice the structure of the parameter array this time. Structured data definition nodes are packaged in an array inside of the structuredDataNodes tag. Each element is an entry of the structuredDataNode array. To pass more than one node, simply add another level of arrays inside of structuredDataNode.
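The packaging of nodes can be sketched as follows. The keys inside each node (type, identifier, text) and the helper name build_quote_nodes() are assumptions and should be verified against the WSDL; the three field identifiers follow the Quote data definition described above.

```php
<?php
function build_quote_nodes(array $quote): array
{
    $nodes = [];
    foreach (['source', 'type', 'body'] as $field) {
        // One node per data definition field.
        $nodes[] = [
            'type' => 'text',
            'identifier' => $field,
            'text' => $quote[$field] ?? '',
        ];
    }
    // Several nodes are passed as a list under the single structuredDataNode key.
    return ['structuredDataNodes' => ['structuredDataNode' => $nodes]];
}

// A made-up quote used only to show the resulting structure.
$quote = ['source' => 'A. Student', 'type' => 'testimonial', 'body' => 'Great program.'];
print_r(build_quote_nodes($quote));
```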

