XML for Java Developers
G22.3033-002
Dr. Jean-Claude Franchitti
New York University
Computer Science Department
Courant Institute of Mathematical Sciences
Session 2: XSL Transformations (XSLT)
Course Title: XML for Java Developers
Course Number: g22.3033-002
Instructor: Jean-Claude Franchitti
Session: 2
XSLT Overview
The W3C's XSL Working Group released an update to
the XSL Working Draft Specification. The new XSL draft specification dumps
virtually every feature (with the exception of formatting objects) into a new
working draft called XSL Transformations, or simply XSLT.
XSL now refers specifically to XSL formatting
objects, which are to XML what Cascading Style Sheets are to HTML.
XSLT contains various features including tree
processing, patterns, and templates, and adds a plethora of new features.
The pattern syntax has been expanded and a new
syntax called location paths has been introduced.
The most dramatic change, however, is the addition
of a complete expression language, which looks much like a small programming
language.
The following covers the salient points of XSLT
expressions and gives you an idea of how you can use them. Also covered are
iteration and conditional processing in XSLT.
To illustrate the technology, the following shows
how you can take any number of database records in XML format, sort them in
either ascending or descending order, and transform them to HTML for
presentation.
Tree Processing in XSLT
An XML document can be broken down into a collection
of objects that are ordered in a hierarchical fashion. This hierarchical
representation is called a tree.
Tree structures are useful because they express
relationships between your XML elements in a very simple way. In fact,
hierarchical data structures are excellent for organizing data because they
have a singular, unambiguous point of view, making their semantics very
powerful.
To process your XML documents, including
transforming XML into HTML, you need a way to go through this tree and select
particular elements. Once you have an element in hand, you can do all kinds of
things to it in preparation for output:
· Add formatting to a headline style
· Generate text for a text element
· Create an entirely new element
· Process the same element in different ways, based on a set of constraints
· Process a set of elements with a common structure (as with database records)
To quickly review, an XML processor takes a
marked-up document and produces a tree-like structure containing the elements,
attributes, entities, and so on, of your document. At this point, you can
access the "objects" in this tree through any Document Object Model
(DOM) API. You can also invoke an XSL processor to apply formatting to these
objects and output them in just about any manner you can think of. The XSL processor
takes the tree generated in XML, called the "source tree," and
creates a new "result tree" that
includes all of the objects to be output along with pertinent formatting
information.
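To make the source tree concrete, here is a minimal sketch in Java that parses a small document with the JDK's built-in DOM parser and prints its element hierarchy. The class name TreeWalkDemo and the sample markup are illustrative assumptions, not part of the course material.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalkDemo {
    // Hypothetical sample document; any well-formed XML would do.
    static final String XML =
        "<chapter><title>Trees</title><para>First</para><para>Second</para></chapter>";

    // Parses XML text into a DOM tree (the "source tree" an XSL processor starts from).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Appends one line per element, indented two spaces per level of nesting.
    static void walk(Node node, int depth, StringBuilder out) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            childDepth = depth + 1;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, childDepth, out);
        }
    }

    // Returns the indented outline of all elements in the document.
    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        walk(parse(xml), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(XML));
    }
}
```

Text nodes are skipped here only to keep the outline short; they are part of the tree as well.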
As a style-sheet developer, you can control the
creation of the result tree process through template rules. You define a
template rule in a style sheet using the <xsl:template> element. Your rule
generally consists of two parts: a pattern that's used to match with elements,
attributes, and other nodes in the source tree; and the template that generates
part of the result tree. For example, the template rule in the following
example looks for paragraph elements in the source tree.
Example 1:
<xsl:template match="para">
  <fo:block font-size="10pt" space-before="12pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
When the processor finds such an element, the
formatting portion of the template rule is applied to the paragraph content.
Your template rules are placed inside an <xsl:stylesheet> element, as shown in the
following example.
Example 2:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/TR/WD-xsl"
    xmlns:fo="http://www.w3.org/TR/WD-xsl/FO"
    result-ns="fo">
  <xsl:template match="para">
    <fo:block font-size="10pt" space-before="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
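A template rule like this can be tried out from Java with the JDK's javax.xml.transform API. The sketch below is only an approximation of the example: the JDK processor implements the later XSLT 1.0 Recommendation, so it assumes the final namespace URIs (http://www.w3.org/1999/XSL/Transform and http://www.w3.org/1999/XSL/Format) instead of the working-draft URIs, and it drops result-ns, which the Recommendation no longer uses. The class name and sample input are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateRuleDemo {
    // The para rule from the example, rewritten against the final XSLT 1.0 namespaces.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='para'>"
      + "<fo:block font-size='10pt' space-before='12pt'>"
      + "<xsl:apply-templates/>"
      + "</fo:block>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical source document with a single paragraph.
    static final String SOURCE = "<doc><para>Hello world</para></doc>";

    // Applies a stylesheet to a source document and returns the serialized result tree.
    static String transform(String stylesheet, String source) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(source)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The para element is replaced by an fo:block wrapping the paragraph text.
        System.out.println(transform(STYLESHEET, SOURCE));
    }
}
```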
The style-sheet element has several optional
attributes.
CHARACTER   DESCRIPTION
|           The or operator (|) expresses alternatives; for example, (emph|b)
/           Used to compose a longer pattern
//          Matches descendants instead of children
.           Selects the current node
..          Selects the parent of the current node
*           Wildcard character; matches with all elements
Table 1: XSL Pattern Syntax
However, Example 2 simply defines the namespace, xmlns:xsl, which is a required step.
Note that the namespace must point to the URI shown in the example. Next, a
namespace for the result object is defined. Example 2 creates a namespace, fo, for formatting objects and
assigns it to the result namespace.
There are many unresolved questions related to the
implementation of formatting objects in a device-independent manner. So, the
formatting-objects portion of the XSL proposal is still in limbo.
Asked whether Microsoft supported them, the
company's XML evangelist, Adam Bosworth, responded that Microsoft does not
support the formatting-objects portion of the W3C proposal, nor does it have
plans to support them in the future.
Marie Wieck, director of IBM's technology network
computing software division, concedes that there are still ambiguities in
the proposal and that more work needs to be done. However, IBM will support
the XSL standard when it solidifies, assuming customer demand warrants it.
Fortunately, the result namespace in Example 2 can point at
things other than fo. For example, later in this session
we will transform an XML document and output HTML. The first step in
setting up such a transformation is defining the following namespace:
xmlns="http://www.w3.org/TR/REC-html40"
result-ns=""
In this case, the result-ns is optional and is included
for illustration purposes. As you develop your style sheets, keep in mind that
XSL-defined elements are recognized only in the style sheet, not in the source
document.
Patterns in XSL
Patterns can be used to locate objects within an XML
document tree. From there, you can specify template rules that let you format
these objects.
Patterns
Returning to our source tree, it would be nice to be
able to locate any node and then apply specific formatting to that node. That's
where patterns come in.
You use a pattern to select a node or set of nodes
in the source document. In this way, you can control how the XSL processor
processes your document.
The syntax for creating patterns is straightforward
and resembles the paths used in directory structures. As with directory paths,
it's helpful to remember that the patterns you specify are always in relation to your
current position in the tree. The simplest pattern is an element type. For
example, the pattern in <xsl:template match = "chapter"> matches any child element
that's a chapter.
There are a number of operators that let you control
how to search for patterns within the tree. As mentioned earlier, pattern
syntax resembles the syntax used for traversing directory structures. For
instance, a period represents the current node in the tree, just as it would
represent the current directory in a directory structure. Likewise, two periods
(..) refer to the parent of the
current node. The slash character (/) lets you compose longer patterns one
step at a time. For example, "chapter/title/paragraph" would start at the current
node, look for a chapter child, then a title child of that chapter, and ultimately
match any paragraph children of that title. Again, this is
very much like using directory paths, so it should feel intuitive.
You can also use the wildcard character (*) to match all elements. For
example, */subhead selects all subhead grandchildren of the
current node. On the other hand, chapter/* matches any element that
has a chapter parent. Another operator, //, matches descendants
instead of children. For example, chapter//subhead matches all subhead elements with a chapter ancestor.
You can create alternative paths through the tree
using the or operator (|). For instance, you could
select either a chapter or an appendix using chapter|appendix. You can also string longer
patterns together. As you construct your patterns, however, keep in mind that / binds more tightly than |. For example, */chapter/title | ../preface/title would select either
a chapter/title path beneath any child of the current node,
or a preface/title path beneath the parent of the current node. Note the white
space between the selectors on either side of the or operator. White space is
not significant, so you can break things as you like for better readability.
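The operators above can be tried out with the JDK's javax.xml.xpath API, which implements the later XPath 1.0 syntax; the /, //, *, .. and | operators behave there as described here. The following is a minimal sketch: the book document and the class name are made up for illustration, and the book element plays the role of the current node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PatternDemo {
    // Hypothetical document used only to exercise the pattern operators.
    static final String XML =
        "<book>"
      + "<preface><title>Preface</title></preface>"
      + "<chapter><title>One</title>"
      + "<section><subhead>A</subhead></section>"
      + "<subhead>B</subhead></chapter>"
      + "<appendix><title>Extra</title></appendix>"
      + "</book>";

    // Counts the nodes a pattern selects, with the <book> element as the current node.
    static int count(String pattern) {
        try {
            Element book = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML))).getDocumentElement();
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(pattern, book, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + pattern, e);
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            "chapter/title",                 // title children of chapter children
            "*/subhead",                     // subhead grandchildren only
            "chapter/*",                     // every element with a chapter parent
            "chapter//subhead",              // subheads at any depth below chapter
            "chapter|appendix",              // either alternative
            "chapter/title | preface/title", // / binds more tightly than |
            "chapter/.."                     // parent of chapter, i.e. book itself
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + count(p));
        }
    }
}
```

Note that */subhead finds only the subhead directly under chapter, while chapter//subhead also finds the one nested inside section.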
Other Node Types
So far, we have described the syntax for accessing
element nodes within the source tree. But a node can contain other objects as
well. To distinguish these other objects, you'll have to identify them for the
XSL processor.
For example, to identify an object as an attribute,
you must prefix the attribute name with the @ symbol. The pattern syntax is pretty much
the same, though. The figure/@caption pattern selects the caption attribute of the figure
element, which is a child of the current node. The @* pattern selects all attributes.
You can select comments in the source tree using the
comment() pattern. Used without any arguments, comment() selects
all comment nodes. Similarly, the pi() pattern matches all
processing-instruction child nodes. In addition, you can specify an argument
that indicates a target for the processing instruction, such as pi("xml-stylesheet").
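As a rough illustration, the sketch below selects an attribute, all attributes, a comment, and a processing instruction using the JDK's XPath engine. One assumption to flag: the JDK implements the final XPath 1.0 syntax, in which the draft's pi() test is spelled processing-instruction(). The sample document and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NodeTypeDemo {
    // Hypothetical document with a processing instruction, a comment, and attributes.
    static final String XML =
        "<?xml-stylesheet type='text/xsl' href='news.xsl'?>"
      + "<doc><!-- draft --><figure caption='Figure1' width='200'/></doc>";

    // Evaluates an expression from the document root and returns its string value.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("doc/figure/@caption"));      // the caption attribute
        System.out.println(eval("count(doc/figure/@*)"));     // number of attributes
        System.out.println(eval("doc/comment()"));            // text of the comment
        System.out.println(
            eval("count(processing-instruction('xml-stylesheet'))"));
    }
}
```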
Tests, Comparisons, And Refinement
You can refine the result returned by a pattern by
specifying a test within square brackets ([ ]) after the pattern. For
example, list[@type] matches list elements with
a type attribute, and book[editor] matches child book elements that have at least
one editor child element.
Another thing XSL lets you do is compare patterns to
strings. For example, list[@type="ordered"] matches list elements whose type
attribute has the value ordered, and figure[@caption="Figure1"]
matches figure elements whose caption is "Figure1". Finally, contact[name="Joe Butler"] selects contact child
elements that have a name child with the value "Joe Butler".
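The bracketed tests can be checked quickly with the JDK's XPath engine, where this syntax carried over unchanged. This is a small sketch over a made-up document; the class name and sample data are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateDemo {
    // Hypothetical document with lists, books, and contacts.
    static final String XML =
        "<doc>"
      + "<list type='ordered'/><list type='bullet'/><list/>"
      + "<book><editor>Ann</editor></book><book/>"
      + "<contact><name>Joe Butler</name></contact>"
      + "<contact><name>Someone Else</name></contact>"
      + "</doc>";

    // Counts the nodes selected by a pattern, starting from the document root.
    static int count(String pattern) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                    "count(" + pattern + ")",
                    new InputSource(new StringReader(XML)),
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("doc/list[@type]"));                 // lists that have a type
        System.out.println(count("doc/list[@type='ordered']"));       // lists typed "ordered"
        System.out.println(count("doc/book[editor]"));                // books with an editor child
        System.out.println(count("doc/contact[name='Joe Butler']"));  // contacts named Joe Butler
    }
}
```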
XSL also lets you test for positions relative to a
sibling. In particular, you can select the first and last child elements in a
branch, as well as the first and last elements of their respective types.
The options are listed in the following table:
TEST              DESCRIPTION
first-of-any()    Succeeds if the node is the first element child
last-of-any()     Succeeds if the node is the last element child
first-of-type()   Succeeds if the node is the first element child of its type
last-of-type()    Succeeds if the node is the last element child of its type
Table 2: Selecting the position of a node
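The four tests in Table 2 belong to the working draft. The final XPath 1.0 syntax, which the JDK implements, does not include them and expresses the same ideas with positional predicates such as [1] and [last()]. The sketch below shows those equivalents under that assumption; the sample section and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PositionDemo {
    // Hypothetical section: a title, two paragraphs, and a note.
    static final String XML =
        "<sec><title>T</title><para>a</para><para>b</para><note>n</note></sec>";

    // Evaluates an expression from the document root and returns its string value.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("name(sec/*[1])"));        // first element child (first-of-any)
        System.out.println(eval("name(sec/*[last()])"));   // last element child (last-of-any)
        System.out.println(eval("sec/para[1]"));           // first para (first-of-type)
        System.out.println(eval("sec/para[last()]"));      // last para (last-of-type)
        // A para that is also the first element child: none in this sample.
        System.out.println(eval("count(sec/*[1][self::para])"));
    }
}
```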
Putting Patterns to Work
Some additional syntactical details have been
omitted. However, you should have the foundation to create some very powerful
patterns. This will let you access and ultimately format and output virtually
any object within your source tree.
To demonstrate in a real-world sense, let us create
an XML document and an accompanying XSL style sheet that will transform our
document into HTML.
The XSL style sheet shows how you can combine CSS
style rules with XSL to format HTML. This is, in fact, how most Web developers
will handle XML.
Listing One below presents news.xml, an XML document
containing a news story.
Listing One:
<?xml version="1.0"?>
<Story>
  <SectionTitle>News&amp;Views</SectionTitle>
  <Headline>New Web Graphics Standard Emerges</Headline>
  <Deck>Vector graphics allows images to be resized,
    cropped and printed at different resolutions</Deck>
  <Dateline>March 1, 1999</Dateline>
  <Byline Email="jubutler@xyz.com">Joe Butler</Byline>
  <BodyText ID="P1">
    <DropCap>W</DropCap>hile XML has primarily been used for text, the
    World Wide Web Consortium (W3C) released the first public working draft
    of the <bold>Scalable Vector Graphics (SVG)</bold> format, which is
    defined in XML. SVG is intended to be a vendor-neutral, cross-platform
    format for XML vector graphics over the Web. The working draft status
    indicates that the W3C is making the proposal public and openly
    soliciting feedback.
  </BodyText>
  <BodyText ID="P2">
    The use of vector graphics means that Web designers will be able to
    reuse images more effectively and that images can be easily resized,
    cropped and printed at different resolutions. <Pullquote>Because it is
    defined in XML</Pullquote>, the SVG format can be read by
    <italic>any</italic> existing XML parser, and programmers and script
    developers will be able to access SVG documents through any DOM API to,
    for example, create animations. Text within images, such as figure
    captions, will be maintained as text, so it can easily be searched by
    search engines. And Webmasters will be able to apply style sheets
    equally well to XML text and SVG.
  </BodyText>
  <BodyText ID="P3">
    Members of the W3C's SVG Working Group include Adobe, IBM, Apple,
    Microsoft, Sun, HP, Corel, Macromedia, Netscape, and Quark. For those
    interested, a public mailing list, www-svg@w3.org, has been started.
    You can get more information on SVG at
    <Anchor myURL="http://www.w3.org/Graphics/SVG/">www.w3.org/Graphics/SVG/</Anchor>.
  </BodyText>
</Story>
The root element, Story, contains the other
elements for this document. Elements were created for the section in which the
story runs, along with elements for the title, dek (subtitle), byline,
dateline, and so on. The BodyText element contains the
content for the news story, and includes additional markup to create a dropcap
for the leading character in the first paragraph, and some bold and italics.
Since we are not interested in validating this document, no document type
definition (DTD) is specified.
The style sheet for this document, news.xsl in
Listing Two below contains the template rules to process Listing One. When a
rule maps to a source element, the rule's template is instantiated. The
templates may contain literal "result" elements, character data, and instructions for
creating a portion of the result tree. So, after creating the namespaces for the
<xsl:stylesheet> element as detailed
earlier, Listing Two creates a template rule to process the root element, Story. The root node is a special
case, so Listing Two uses the / pattern to get the root element. (If you need to access the document
element, you can use the pattern: /*.)
Next, the template includes some "literal"
HTML that will be passed directly to the result tree. Note that this includes
the CSS style rules for formatting the document.
After the style rules, we find an HTML <TITLE> element, which specifies a
title for the document. The title comes from the SectionTitle of the XML document, so we
need to process a portion of it to get this title. We will use <xsl:apply-templates>, which processes descendant
nodes.
You specify the nodes to be processed by using a
pattern in a select attribute, in this case, select="Story/SectionTitle". If no select attribute is included, then
apply-templates would process the immediate
children of the current node. Now, the "News&Views" title will
appear in the title bar of the browser.
Next, Listing Two creates the Body of the HTML document, using
apply-templates and a select pattern to similarly
process the story title, dek (subtitle), byline, and text elements of the
document. HTML SPAN
and DIV elements are used to apply
the CSS styles. We have included additional templates to handle specific
elements within the document. There are template rules to handle paragraph
formatting for the BodyText, format the dropcap, handle
bold and italics, and to create a mailto anchor that's included in
the author's byline.
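The approach described above can be sketched in Java as follows. This is not the original news.xsl: it is a much-reduced stand-in written against the final XSLT 1.0 namespace that the JDK processor expects, it omits the CSS rules and most of the templates, and it runs against an abridged copy of the story. The class name, the abridged body text, and the choice of HTML elements are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NewsStoryDemo {
    // Abridged, hypothetical version of the news story.
    static final String STORY =
        "<Story>"
      + "<SectionTitle>News&amp;Views</SectionTitle>"
      + "<Headline>New Web Graphics Standard Emerges</Headline>"
      + "<Byline Email='jubutler@xyz.com'>Joe Butler</Byline>"
      + "<BodyText ID='P1'>The W3C released the "
      + "<bold>Scalable Vector Graphics (SVG)</bold> draft.</BodyText>"
      + "</Story>";

    // Reduced stylesheet: a root rule emits literal HTML and delegates to other rules.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<HTML><HEAD><TITLE>"
      + "<xsl:apply-templates select='Story/SectionTitle'/>"
      + "</TITLE></HEAD><BODY>"
      + "<xsl:apply-templates select='Story/Headline'/>"
      + "<xsl:apply-templates select='Story/Byline'/>"
      + "<xsl:apply-templates select='Story/BodyText'/>"
      + "</BODY></HTML>"
      + "</xsl:template>"
      + "<xsl:template match='Headline'><H1><xsl:apply-templates/></H1></xsl:template>"
      + "<xsl:template match='Byline'>"
      + "<A href='mailto:{@Email}'><xsl:apply-templates/></A>"
      + "</xsl:template>"
      + "<xsl:template match='BodyText'><P><xsl:apply-templates/></P></xsl:template>"
      + "<xsl:template match='bold'><B><xsl:apply-templates/></B></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the story and returns the serialized result.
    static String render() {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Serialize as XML so the markup is emitted exactly as the templates build it.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(STORY)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

The curly braces in href='mailto:{@Email}' are an attribute value template from the final Recommendation; they copy the Email attribute of the current Byline into the generated anchor.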
You can test this yourself using tools from your
Server Side XML workbench. You'll need to install the XML4J parser and LotusXSL
(see http://www.alphaWorks.ibm.com for both, along with
installation instructions).
From there you can run this example from the command
line.
Summarizing the use of patterns
The key to applying templates to document elements
is patterns. Patterns let you access any object in your document tree. Without
question, there's a lot more to template rules than covered here. For example,
template rules let you create new elements, attributes and attribute sets,
comments, and more.
Ultimately, though, XSL's contribution to Web
developers is its tremendous ability to transform XML documents. By adding new
style sheets for different output formats, you can create transformations for
virtually any medium you desire. You can even generate custom transformations
for specific browsers. That means you can create Web sites that push the
envelope with new features without leaving behind users with older browsers. It
also means you can render XML in any browser, even Lynx.
Expressions in XSLT
As mentioned earlier, you may want to process a set
of elements with a common structure (as with database records) using XSLT. This
is where expressions come in.
The XSLT expression language lets you select one or
more elements, specify conditions for processing nodes, and generate new
elements that can be inserted into the result tree.
The expression language provides some general
purpose functions that let you, for example, determine the number of nodes in a
tree fragment, get the position of the current node, and so on.
There are functions that support Boolean operations,
and functions to manipulate strings and numbers. When an expression is
evaluated, you get back an object whose type is either a string, number, boolean, node-set, or a result tree fragment. The string type refers to a Unicode
character string. A boolean is represented as either a
true or false value. The number type represents a
floating-point real number. A node-set refers to nodes in the source tree, and result tree fragment refers to elements in the
result tree.
Of course, an expression can simply be a pattern, as
described earlier. In that case, the expression returns the set of nodes
selected by the pattern. However, XSLT provides various functions that let you
manipulate these different object types. For example, Table 3 presents a list
of the proposed functions for handling strings.
Function             Description
string()             Converts an object to a Unicode character string.
concat()             Concatenates two strings and returns the result.
contains()           Determines whether a substring is contained within a string.
starts-with()        Takes two string arguments: if the first string starts with the
                     characters in the second string, this function returns true.
substring-before()   Returns the substring preceding a specified substring.
substring-after()    Returns the substring following a specified substring.
normalize()          Removes leading/trailing white space and reduces extra white
                     space to a single space.
translate()          Translates a string of characters.
format-number()      Formats number strings.
Table 3: String Functions
The basic function, string(), converts an object of
another type to a string. For instance, if the object was originally a number type, string() performs a conversion and
returns a string in the form of a real number. If the number is negative, a
negative sign (-) precedes the string.
Boolean values are converted to the strings true and false. If the object is a node-set, the first node (in
document order) is selected and that value is converted to a string. An empty
string is returned if the set is empty. A result tree fragment is converted to a string by
treating it as a single document fragment node. In all cases, the argument
defaults to the current node if the argument is omitted.
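These conversions can be observed with the JDK's XPath engine, which implements the later XPath 1.0 Recommendation; for the cases shown it follows the rules described above. The sketch is illustrative only, and the sample document and class name are assumptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class StringConversionDemo {
    // Hypothetical document with two item elements.
    static final String XML = "<doc><item>first</item><item>second</item></doc>";

    // Evaluates an expression from the document root and returns its string value.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("string(42)"));            // number to string
        System.out.println(eval("string(-1.5)"));          // negative numbers keep the sign
        System.out.println(eval("string(true())"));        // boolean to string
        System.out.println(eval("string(/doc/item)"));     // first node in document order
        System.out.println(eval("string(/doc/missing)"));  // empty node-set gives ""
    }
}
```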
A complementary function, number(), does the opposite of string(): it takes a string that
represents a numeric value and converts it to that value. Table 4 lists number()
together with the other numeric functions and operators.
Expression   Description
div          Operator that divides two numbers and returns a floating-point value.
quo          Operator that divides two numbers and returns an integer.
mod          Operator that returns the remainder of a floating-point division operation.
number()     Function that converts the value specified in its argument to a number.
sum()        Function that returns the sum of the values of the nodes in the argument node-set.
round()      Function that returns the round of a number as an integer.
floor()      Function that returns the largest integer not greater than the argument.
ceiling()    Function that returns the smallest integer not smaller than the argument.
Table 4: Floating-point functions and operators
If the string does not represent a number, then the
function returns a value of 0. The input string may contain white space, and Boolean values are
converted to 1 (true), or 0 (false). If the argument
contains a node-set or a result tree fragment, it's converted to a string
and then evaluated as just described. Finally, if you don't supply an input
string in the argument, the current node is used.
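A quick way to experiment with number() is the JDK's XPath engine. One difference is worth flagging: the JDK follows the final XPath 1.0 Recommendation, in which a string that does not represent a number converts to NaN instead of the 0 described for the draft. The class name below is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NumberConversionDemo {
    // Evaluates an expression against an empty placeholder document and returns a number.
    static double num(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    expr, new InputSource(new StringReader("<doc/>")),
                    XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("number(' 12.5 ')"));  // surrounding white space is allowed
        System.out.println(num("number(true())"));    // true converts to 1
        System.out.println(num("number(false())"));   // false converts to 0
        System.out.println(num("number('abc')"));     // NaN under XPath 1.0, not 0
    }
}
```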
Another useful function in Table 3 is concat(), which takes two strings
and concatenates them. For example, let's say your application performs a
database lookup and needs to add a label to one field in a record. You might
use concat("Name: ", "Joe Butler"). The result would be Name:
Joe Butler.
Another function, contains(), could be useful in
searching for substrings. For example, contains("AfterHTML.com",
"ML") will return with a value of true. What's unclear from the
draft specification is whether case-sensitive comparisons are allowed. A
case-sensitive comparison would mean, for example, that contains("AfterHTML.com",
"ml") would return false. Two related functions, substring-before and substring-after, are illustrated in Example
3.
Example 3:
(a) substring-before("AfterHTML.com", "HTML")
(b) substring-after("AfterHTML.com", "HTML")
XSLT provides several other functions and operators
for handling numbers. The div operator divides two
numbers and returns a floating-point number as specified by the IEEE 754
specification. The quo operator also divides two
numbers, but truncates the result and returns an integer. The mod operator similarly divides
two numbers, but returns the remainder as an integer. For example, 10 quo 3 returns the value 3, while 10 mod 3 returns the value 1. The sum() function takes a node-set and returns the sum of the
values of the nodes in the set. round() returns an integer after the value has been rounded
off. The floor() function returns an integer
representing the largest number not greater than the argument value. And the ceiling() function returns the
smallest integer that is not less than the argument value.
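The numeric operators and functions can be tried the same way. Note one assumption: the quo operator is specific to the draft and is not part of the final XPath 1.0 syntax that the JDK implements, so the sketch uses floor(10 div 3) as a stand-in for 10 quo 3 (the two agree for positive operands). The prices document and class name are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NumericDemo {
    // Hypothetical node-set for sum().
    static final String XML = "<prices><price>1.5</price><price>2.5</price></prices>";

    // Evaluates an expression from the document root and returns a number.
    static double num(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("10 div 4"));            // floating-point division
        System.out.println(num("floor(10 div 3)"));     // stand-in for the draft's quo
        System.out.println(num("10 mod 3"));            // remainder
        System.out.println(num("sum(/prices/price)"));  // sum over a node-set
        System.out.println(num("round(2.6)"));
        System.out.println(num("floor(2.7)"));
        System.out.println(num("ceiling(2.1)"));
    }
}
```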
Booleans
Booleans are particularly useful when comparing two
values. XSLT provides five functions, five comparison operators, and the logical
operators or and and, as indicated in Table 5.
Expression         Description
boolean()          Evaluates its argument value and returns a Boolean value.
not()              Returns the opposite Boolean value.
true()             Returns a value of true.
false()            Returns a value of false.
lang()             Compares the language of the context node.
=, <, >, <=, >=    Converts each operand to a number and compares the two numbers.
or                 Returns true if either operand is true.
and                Returns true only when both operands are true.
Table 5: Boolean Functions and Operators
The boolean() function simply evaluates its argument and converts
it to a Boolean. The argument can be a number, node-set, result tree fragment, or a string. The next function, not(), negates whatever Boolean
value the argument would normally return. Thus, not() returns a value of true when its argument is false.
The true() function forces a true value to be returned, and
likewise, false() always returns false.
The Boolean operators listed in Table 5 directly
test the values on either side of the operator. For <, >, <=, or >=, each operand is converted
to a number and then the two numbers are compared. For example, 1 < 2 returns a value of true and 2 <= 1 returns false. The = operator is treated
differently depending on the operand type. Number types are treated as just
described for the other operators. However, if the operands are not of type number, they are converted
to strings and the string values are compared. The or operator evaluates each operand and converts
it to a Boolean, then compares the two Boolean values. The result of the
operation is true if either value is true.
The and operator is evaluated
similarly. However, both operands must be true for the result to be true.
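The Boolean functions and operators can be checked with a few one-line expressions. The sketch below uses the JDK's XPath 1.0 engine and asks for a Boolean result; the class name is a hypothetical choice.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class BooleanDemo {
    // Evaluates an expression against an empty placeholder document
    // and returns the result converted to a Boolean.
    static boolean test(String expr) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    expr, new InputSource(new StringReader("<doc/>")),
                    XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(test("1 < 2"));               // numeric comparison
        System.out.println(test("2 <= 1"));
        System.out.println(test("'a' = 'a'"));           // string comparison
        System.out.println(test("not(false())"));        // negation
        System.out.println(test("boolean(0)"));          // zero converts to false
        System.out.println(test("boolean('x')"));        // non-empty string converts to true
        System.out.println(test("true() or false()"));   // either operand true
        System.out.println(test("true() and false()"));  // both operands required
    }
}
```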
XML lets you specify the language for elements using
an xml:lang attribute. The lang() function examines this
value for the current node and compares it to the language specified in its
argument. If the xml:lang attribute was not specified
for the current node, the lang() function looks up the tree
for ancestors that have specified the xml:lang attribute and uses that
value. If no attribute was specified, the lookup fails and the lang() function returns false.
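The inheritance behavior of lang() can be illustrated with a short sketch, again using the JDK's XPath 1.0 engine. The two sample documents and the class name are assumptions; the parser is made namespace-aware so that the xml:lang attribute is recognized as belonging to the XML namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LangDemo {
    // Hypothetical document: the first para inherits "en", the second overrides it.
    static final String LABELED =
        "<doc xml:lang='en'><para>Hello</para><para xml:lang='fr'>Bonjour</para></doc>";

    // Hypothetical document with no xml:lang anywhere.
    static final String UNLABELED = "<doc><para>Hello</para></doc>";

    // Counts the nodes in the given document that a pattern selects.
    static int count(String xml, String pattern) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + pattern + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(LABELED, "//para[lang('en')]"));    // inherited from doc
        System.out.println(count(LABELED, "//para[lang('fr')]"));    // set on the para itself
        System.out.println(count(UNLABELED, "//para[lang('en')]"));  // lookup fails
    }
}
```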
Extension Functions
While the expression language contains features
found in a programming language, it's not intended to be one. Instead, XSLT
provides an extension mechanism that lets you access languages such as
JavaScript, VBScript, and Java. The specification doesn't require an XSLT
processor to support extensions for any particular language, so you'll want to
check the documentation for your specific processor for this support.
Putting XSLT to Work
XSLT provides a number of additional features that
make it easier to process elements. For example, XSLT provides a for-each element that instructs the
processor to perform iterative processing. This is particularly useful when you
need to process a large number of elements that have the same structure. A
typical example is when you have a collection of elements that represent
records in a database. Consider the XML document in Listing Three, which
represents some of the tools in an XML tools database.
Listing Three:
<?xml version="1.0"?>
<productDB>
  <product>
    <name>XML Toolbox</name>
    <company>AfterHTML</company>
    <version>1.0</version>
    <price>99.95</price>
    <sys-requirements>Any Java Platform</sys-requirements>
    <description>
      XML Toolbox is a collection of tools for creating
      and processing XML documents
    </description>
  </product>
  <product>
    <name>xml4j</name>
    <company>IBM</company>
    <version>1.1.1.4</version>
    <price>Freely available</price>
    <sys-requirements>Any Java platform</sys-requirements>
    <description>
      xml4j is an XML processor that is compliant
      with the XML 1.0 working draft specification
    </description>
  </product>
  <product>
    <name>MSXML</name>
    <company>Microsoft/DataChannel</company>
    <version>N/A</version>
    <price>Freely available</price>
    <sys-requirements>Any Java platform</sys-requirements>
    <description>
      MSXML is an XML processor that is compliant
      with the XML 1.0 working draft specification
    </description>
  </product>
</productDB>
The document represents a database table called productDB. Each record is referenced
as product. The rest of the elements
represent field names. For simplicity, Listing Three presents just three
records.
The goal of this example is to publish some of the
fields from each record as a summary within an HTML table. The summary could be
a hit list resulting from searching the database. In any case, we would like to
transform some (but not all) of the XML record elements in Listing Three into
HTML and publish each summary in a row of the table. The columns represent the
product's name, version, and price, respectively. Let's further stipulate that
we'd like to sort each entry in ascending order based on the product's name.
Listing Four presents the XSL style sheet to execute
our transformation. The style sheet contains a single template rule that uses a
step pattern to select the document element, productDB. Next, the template
generates some preliminary HTML elements including the page TITLE, appropriate labels, and
the start of the HTML table. The first row of the table contains the headings
for each column.
Listing Four:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/productDB">
    <HTML>
      <HEAD>
        <TITLE>
          XML Tools Database Search
        </TITLE>
      </HEAD>
      <BODY>
        <H1>XML Tools Database Search Results</H1>
        <TABLE Border="1" width="100%">
          <TR>
            <TD>Product Name</TD>
            <TD>Product Version</TD>
            <TD>Product Price</TD>
          </TR>
          <xsl:for-each select="product">
            <xsl:sort select="name"/>
            <TR style="color:green">
              <xsl:for-each select="name">
                <TD>
                  <xsl:apply-templates/>
                </TD>
                <xsl:for-each select="../version">
                  <TD><xsl:apply-templates/></TD>
                </xsl:for-each>
                <xsl:for-each select="../price">
                  <TD>
                    <!-- This test only works with a processor that
                         implements XSLT
                    <xsl:if test='number()'>
                      $
                    </xsl:if>
                    -->
                    <xsl:apply-templates/></TD>
                </xsl:for-each>
              </xsl:for-each>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
Next, the style sheet uses a for-each element to process each product record. Without such a
construct, we'd have to write a separate transformation for every record in the
database. Not only would this be tedious, but we have no way of knowing in
advance how many records the search will return. The XSL processor, with the
help of the for-each element, will figure that
out for us.
Prior to doing anything else, Listing Four
immediately calls <xsl:sort> to sort the product nodes
in ascending order. The select attribute identifies the
sort key. The sort element takes some additional attributes including order (specifies the sort order),
lang (identifies the language of
the sort keys), and data-type (determines the data type
of the sort keys). By default, the sort order is ascending and the data
type is text, so we are leaving these out.
There are three additional <xsl:for-each> nested elements within the
first -- one for each field we want to process. After selecting the appropriate
field element, we create a column in the table and call <xsl:apply-templates> to process the element.
We have thrown in a curve on the last element. Some
products in the database are priced at a specific dollar value, but others are
available for free. We could simply enter $0 for freeware items, but we decided
to write out "Freely available" in the record. The style sheet deals
with this using <xsl:if> to test whether this is a
numeric value. If so, it adds a dollar sign in front of the value. Otherwise,
it is text and the template leaves it alone.
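The iteration, sorting, and conditional test can be run together from Java. The sketch below is a simplified stand-in for Listing Four, not a copy of it: it targets the final XSLT 1.0 namespace that the JDK processor expects, writes plain text instead of an HTML table, separates records with "; ", and uses an abridged copy of the product data. The class name, the output layout, and the order parameter are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ProductTableDemo {
    // Abridged product records: name, version, and price only.
    static final String PRODUCTS =
        "<productDB>"
      + "<product><name>XML Toolbox</name><version>1.0</version>"
      + "<price>99.95</price></product>"
      + "<product><name>xml4j</name><version>1.1.1.4</version>"
      + "<price>Freely available</price></product>"
      + "<product><name>MSXML</name><version>N/A</version>"
      + "<price>Freely available</price></product>"
      + "</productDB>";

    // Builds a stylesheet that sorts by name in the given order
    // ("ascending" or "descending") and prefixes numeric prices with "$".
    static String stylesheet(String order) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/productDB'>"
             + "<xsl:for-each select='product'>"
             + "<xsl:sort select='name' order='" + order + "'/>"
             + "<xsl:value-of select='name'/><xsl:text> | </xsl:text>"
             + "<xsl:value-of select='version'/><xsl:text> | </xsl:text>"
             + "<xsl:if test='number(price)'>$</xsl:if>"
             + "<xsl:value-of select='price'/><xsl:text>; </xsl:text>"
             + "</xsl:for-each>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Transforms the product records and returns the text output.
    static String run(String order) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(stylesheet(order))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(PRODUCTS)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("ascending"));
        System.out.println(run("descending"));
    }
}
```

The relative order of "XML Toolbox" and "xml4j" depends on the processor's text collation rules, so the sketch makes no claim about it; MSXML sorts first in ascending order and last in descending order.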
Summarizing the Use of Expressions in XSLT
There is, of course, a great deal more to XSLT,
including location paths and extensions to external languages. However, the
other part of the equation, XSL, has yet to be addressed. While we have a
bright and shiny new XSL working-draft specification, there's not a processor
on the planet that currently supports it.