The XML

The data to import is stored in an XML dump from a defunct CMS application. We want to retrieve two types of data from it: web pages and quotes. A web page consists of its metadata and a block of HTML content. A quote is simply four fields: an id, a quote type, a quote source, and the body of the quote. The structure of the XML is:

Notice that the XML may contain other second-level tags, so we want a parser flexible enough to work with all of the data, not only the two types we need today. Our script will pull out the data for each web page and import it into an existing template in the system, preserving the page's metadata and path. Preserving the path requires a folder structure to hold the pages, so the script will create that as well. Finally, the script will pull the fields of each quote from the XML and create a page for each one using a data definition for quotes.
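To make the idea of a flexible parser concrete, here is a minimal sketch in Python using the standard library's ElementTree. It is only an illustration under stated assumptions: the tag names (`export`, `page`, `quote`, `banner`, `path`, `content`, and so on), the sample data, and the helper names `parse_dump` and `folders_for` are all hypothetical and are not taken from the real dump or from the import application. The sketch walks the second-level tags, keeps the ones it is asked for, ignores the rest, and derives the folders that the page paths would need.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical sample; the real dump's tag names and layout may differ.
SAMPLE = """<export>
  <page>
    <path>/about/history</path>
    <title>Our History</title>
    <content><![CDATA[<p>Founded long ago.</p>]]></content>
  </page>
  <quote>
    <id>17</id>
    <type>student</type>
    <source>Jane Doe</source>
    <body>A great place to learn.</body>
  </quote>
  <banner><id>3</id></banner>
</export>"""


def record_to_dict(element):
    """Flatten one second-level element into {child tag: text}."""
    return {child.tag: (child.text or "").strip() for child in element}


def parse_dump(xml_text, wanted=("page", "quote")):
    """Group second-level elements by tag, skipping tags not in `wanted`."""
    root = ET.fromstring(xml_text)
    records = {tag: [] for tag in wanted}
    for element in root:
        if element.tag in records:
            records[element.tag].append(record_to_dict(element))
    return records


def folders_for(paths):
    """Return the sorted set of parent folders needed to hold the given paths."""
    folders = set()
    for path in paths:
        for parent in PurePosixPath(path).parents:
            if str(parent) != "/":
                folders.add(str(parent))
    return sorted(folders)


records = parse_dump(SAMPLE)
page_folders = folders_for(page["path"] for page in records["page"])
```

Because the wanted tags are passed in as a parameter rather than hard-coded, picking up another second-level type later would only mean adding its tag name to `wanted`. Unknown tags such as the hypothetical `banner` are skipped without error.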