Reading XML with XmlReader

Summary

Overview
Reading Elements
Reading Attributes
Reading Node by Node
Reading Inner and Outer Content
Skipping Content
Namespaces and Prefixes
Obsolete Classes

Overview

The System.Xml namespace contains the following namespaces and core classes:

Namespace	Core Classes
System.Xml.*	XmlReader and XmlWriter: Forward only, non-cached reader and writer. XmlDocument: Represents an XML document in a W3C-style DOM.
System.Xml.XPath	Infrastructure and API for XPath.
System.Xml.Schema	Infrastructure and API for XSD schemas.
System.Xml.Xsl	Infrastructure and API for XSLT transformations.
System.Xml.Serialization	Serializing objects to XML and deserializing XML to objects.
System.Xml.Linq	LINQ-centric version of XmlDocument

XmlReader is an abstract base class used to provide non-cached, forward-only, read-only access. XmlReader has methods and properties to:

[Method] Read XML content when content is available in a text file.
[Property] Get the depth of the current node in the XML element stack (Depth).
[Method] Read and navigate attributes.
[Property] Determine whether an element is empty (IsEmptyElement) or whether the current node can have a value (HasValue).
[Method] Skip over elements and their content.
[Property] Get name of the current node.
[Property] Get content of the current node.

When reading XML, XmlReader checks that the XML is well-formed and throws XmlException if an error is encountered. Because XmlReader allows you to read from potentially slow resources (streams and URIs), it offers asynchronous versions of most of its methods so that you can easily write non-blocking code.
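
As a rough illustration of both points, the sketch below first drains a reader to see whether an XmlException is thrown, and then counts elements using ReadAsync. The class and method names are made up for this example. Note that the asynchronous methods can only be used when XmlReaderSettings.Async is set to true.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class WellFormednessSketch
{
    // Returns true if the whole input can be read without an XmlException.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Errors surface lazily, as each node is read.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Counts element nodes without blocking the calling thread.
    public static async Task<int> CountElementsAsync(string xml)
    {
        // Async must be enabled before any *Async method may be called.
        var settings = new XmlReaderSettings { Async = true };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Element) count++;
            }
        }
        return count;
    }
}
```

Because errors are reported only when the offending node is reached, a reader can return several valid nodes before it throws.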

The typical usage is to instantiate an XmlReader using XmlReader.Create, passing in:

A stream, a TextReader or a URI string.
An XmlReaderSettings instance to control parsing and validation options. The following three properties are useful for skipping over superfluous content:
- IgnoreComments
- IgnoreProcessingInstructions
- IgnoreWhitespace

Another useful property on XmlReaderSettings is ConformanceLevel. Its default value of Document instructs the reader to expect a well-formed XML document with a single root node. To read just an inner portion of a file that contains multiple top-level nodes, set ConformanceLevel to Fragment. For example, the following fragment has two top-level elements:

<Book Name="ABC">
        <Author>Yazan Diranieh</Author>
        <ISBN>12345</ISBN>
        <Publisher>Publisher 1</Publisher>
</Book>
<Book Name="XYZ">
        <Author>Yazan Diranieh</Author>
        <ISBN>6789</ISBN>
        <Publisher>Publisher 2</Publisher>
</Book>
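
A minimal sketch of reading such a fragment follows. It reads from a string rather than a file to stay self-contained, and the class and method names are invented for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentSketch
{
    // Collects the Name attribute of every top-level or nested <Book> element.
    public static List<string> ReadBookNames(string fragment)
    {
        var settings = new XmlReaderSettings
        {
            // Allow more than one top-level node.
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreWhitespace = true
        };

        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Book")
                    names.Add(reader.GetAttribute("Name"));
            }
        }
        return names;
    }
}
```

With the default ConformanceLevel of Document, the same input would cause an XmlException when the reader reaches the second top-level element.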

Another useful property on XmlReaderSettings is CloseInput, which indicates whether to close the underlying stream when the reader is closed. The default value is false.
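
The effect of CloseInput can be observed with a small experiment, sketched below under the assumption that the input is a MemoryStream (whose CanRead property becomes false once the stream is closed). The helper name is hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class CloseInputSketch
{
    // Reads the stream to the end, disposes the reader, and reports whether
    // the underlying stream is still usable afterwards.
    public static bool StreamStillOpenAfterRead(bool closeInput)
    {
        var stream = new MemoryStream(Encoding.UTF8.GetBytes("<Books />"));
        var settings = new XmlReaderSettings { CloseInput = closeInput };

        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            while (reader.Read()) { }
        }

        // A closed MemoryStream reports CanRead == false.
        return stream.CanRead;
    }
}
```

Leaving CloseInput at false is convenient when the caller owns the stream and wants to rewind or reuse it after parsing.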

The units of an XML stream are XML nodes. The following are all possible nodes that you may encounter with XmlReader:

public enum XmlNodeType
{
    None = 0,                   // Read method has not been called.
    Element = 1,
    Attribute = 2,
    Text = 3,                   // The text content of a node.
    CDATA = 4,                  // For example, <![CDATA[my escaped text]]>
    EntityReference = 5,        // A reference to an entity 
    Entity = 6,                 // For example, <!ENTITY...>
    ProcessingInstruction = 7,   // For example, <?pi test?>
    Comment = 8,
    Document = 9,               // A document object that, as the root of the document tree,
                                // provides access to the entire XML document.        
    DocumentType = 10,          // For example, <!DOCTYPE...>
    DocumentFragment = 11,
    Notation = 12,              // For example, <!NOTATION...>        
    Whitespace = 13,            // White space between markup.        
    SignificantWhitespace = 14, // White space between markup in a mixed content model 
                                // or white space within the xml:space="preserve" scope.
    EndElement = 15,            // An end element tag (for example, </item> ).
    EndEntity = 16,             // Returned when XmlReader gets to the end of the entity 
                                // replacement as a result of a call to XmlReader.ResolveEntity()
    XmlDeclaration = 17,        // for example, <?xml version='1.0'?>
}

The reader traverses the stream in textual (depth-first) order. The Depth property returns the current depth of the cursor. The most primitive way to read from an XmlReader is to call Read(), which advances to the next node in the stream (much like MoveNext in IEnumerator). The first call to Read positions the cursor at the first node. Successive calls advance the cursor to the next node (where a node can be any of the types in the XmlNodeType enum) until the cursor advances past the last node, at which point Read returns false and the reader should be closed and abandoned.

The following shows how to read each node and output its type (note that attributes are not included in read-based traversal):

static public void BasicUsage3()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            Trace.Write(new string(' ', reader.Depth * 2));
            Trace.WriteLine(reader.NodeType);
        }
    }
}

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
<Books>
    <Book Name="ABC">
        <Author>Yazan Diranieh</Author>
        <ISBN>12345</ISBN>
        <Publisher>Publisher 1</Publisher>
    </Book>
    <Book Name="XYZ">
        <Author>Yazan Diranieh</Author>
        <ISBN>6789</ISBN>
        <Publisher>Publisher 2</Publisher>
    </Book>
</Books>

XmlDeclaration
Comment
Element
  Element
    Element
      Text
    EndElement
    Element
      Text
    EndElement
    Element
      Text
    EndElement
  EndElement
  Element
    Element
      Text
    EndElement
    Element
      Text
    EndElement
    Element
      Text
    EndElement
  EndElement
EndElement

When writing and debugging XmlReader code, two important string properties on XmlReader are Name and Value. Depending on the node type, either Name or Value or both are populated.

Quick overview of CDATA, ENTITY, and DOCTYPE:

CDATA: All text in an XML document is parsed except for text inside a CDATA section. A CDATA section starts with <![CDATA[ and ends with ]]>. Characters like "<" and "&" are illegal in XML element content because "<" is interpreted as the start of an element while "&" is interpreted as the start of an entity reference. JavaScript, for example, uses "<", "&" and many other such characters as operators; a CDATA section can be used to avoid XML errors when the XML document contains JavaScript code:
```
<script>
  <![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]>
</script>
```

DOCTYPE: A doctype declaration can also be used to define special characters and character strings as entities. An entity is like a macro. For example, the following DOCTYPE defines three entities (macros) named nbsp, writer and copyright. Note the syntax used to refer to an entity in the <footer> element:

<?xml version="1.0" encoding="UTF-8"?>
 
<!DOCTYPE note [
  <!ENTITY nbsp "&#xA0;">
  <!ENTITY writer "Writer: Donald Duck.">
  <!ENTITY copyright "Copyright: W3Schools.">
]>
 
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
  <footer>&writer;&nbsp;&copyright;</footer>
</note>

When this XML document is parsed, the output will be (note the contents of <footer>):

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
  <footer>Writer: Donald Duck. Copyright: W3Schools.</footer>
</note>

Back to XmlReader. The following XML includes a document type, entities, CDATA, and a comment:

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
 
<!DOCTYPE Book [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
]>
 
<Book Name="ABC">
    <Author>Yazan Diranieh</Author>
    <ISBN>12345</ISBN>
    <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
    <Publisher>Publisher 1</Publisher>
    <Footer>&writer;&nbsp;(&web;)</Footer>
</Book>

static public void NameAndValue()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
    {
        while (reader.Read())
        {
            Trace.Write(reader.NodeType.ToString().PadRight(20, '-'));
            Trace.Write("> ".PadRight(reader.Depth*5));
            Trace.WriteLine(string.Format("Name = {0}, Value = {1}", reader.Name, reader.Value));
        }
    }
}

XmlDeclaration------> Name = xml, Value = version="1.0" encoding="utf-8"
Comment-------------> Name = , Value =  This is a comment 
DocumentType--------> Name = Book, Value = 
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
 
Element-------------> Name = Book, Value = 
Element------------->    Name = Author, Value = 
Text---------------->         Name = , Value = Yazan Diranieh
EndElement---------->    Name = Author, Value = 
Element------------->    Name = ISBN, Value = 
Text---------------->         Name = , Value = 12345
EndElement---------->    Name = ISBN, Value = 
Element------------->    Name = Quote, Value = 
CDATA--------------->         Name = , Value =  C# operators include <, >, & and others
EndElement---------->    Name = Quote, Value = 
Element------------->    Name = Publisher, Value = 
Text---------------->         Name = , Value = Publisher 1
EndElement---------->    Name = Publisher, Value = 
Element------------->    Name = Footer, Value = 
Text---------------->         Name = , Value = Writer: Yazan Diranieh. (www.diranieh.com)
EndElement---------->    Name = Footer, Value = 
EndElement----------> Name = Book, Value =

Note how XmlReader automatically resolved entities.
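
Entity resolution depends on the DtdProcessing setting used in the example above. The default value of DtdProcessing is Prohibit, in which case the reader throws an XmlException as soon as it encounters a DOCTYPE. The sketch below contrasts the two settings on a shortened, in-memory version of the sample file; the class, method, and constant names are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class DtdSketch
{
    // Shortened stand-in for TestFile2.xml: one entity, one element that uses it.
    private const string Sample =
        "<!DOCTYPE Book [<!ENTITY web \"www.diranieh.com\">]>" +
        "<Book><Footer>(&web;)</Footer></Book>";

    // Returns the text of <Footer>, or null if the reader rejects the input.
    public static string ReadFooter(DtdProcessing dtdProcessing)
    {
        var settings = new XmlReaderSettings { DtdProcessing = dtdProcessing };
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(Sample), settings))
            {
                reader.MoveToContent();            // skips the DOCTYPE, lands on <Book>
                reader.ReadStartElement("Book");   // now on <Footer>
                return reader.ReadElementContentAsString("Footer", "");
            }
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

Calling ReadFooter(DtdProcessing.Parse) yields the footer text with the entity expanded, while ReadFooter(DtdProcessing.Prohibit) yields null because the DOCTYPE is rejected.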

Reading Elements

When the structure of the XML is known, XmlReader helps by providing a range of methods that read while presuming a specific structure.

ReadStartElement verifies that the current NodeType is Element and then calls Read (which advances the cursor to the next node). ReadEndElement verifies that the current NodeType is EndElement and then calls Read (which likewise advances the cursor to the next node). For example, you can read this

<ISBN>12345</ISBN>

with

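reader.ReadStartElement("ISBN");   // Verify <ISBN>, then move to the text node
string isbn = reader.Value;        // "12345"
reader.Read();                     // Move from the text node to </ISBN>
reader.ReadEndElement();           // Verify </ISBN>, then move past it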

ReadElementContentAsString does all of the above in one call: it reads a start element, a text node, and an end element, returning the content as a string:

string isbn = reader.ReadElementContentAsString("ISBN", "");
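
XmlReader also offers typed variants such as ReadElementContentAsInt. The sketch below compares a step-by-step read with the typed one-call read on an in-memory <ISBN> element; the class and method names are invented for this example.

```csharp
using System.IO;
using System.Xml;

public static class ElementReadSketch
{
    // Step-by-step: start tag, text content, end tag.
    public static string ReadManually(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            reader.ReadStartElement("ISBN");              // now on the text node
            string text = reader.ReadContentAsString();   // leaves the reader on </ISBN>
            reader.ReadEndElement();                      // moves past </ISBN>
            return text;
        }
    }

    // One call that also converts the content to an int.
    public static int ReadTyped(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                       // must be on the start tag
            return reader.ReadElementContentAsInt("ISBN", "");
        }
    }
}
```

The name and namespace arguments act as a check: if the current element is not named ISBN, the typed call throws an XmlException instead of silently reading the wrong element.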

The following shows how to read this file

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
 
<!DOCTYPE Book [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
]>
 
<Book Name="ABC">
    <Author>Yazan Diranieh</Author>
    <ISBN>12345</ISBN>
    <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
    <Publisher>Publisher 1</Publisher>
    <Footer>&writer;&nbsp;(&web;)</Footer>
</Book>

static public void ReadTestFile2()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;
    settings.IgnoreComments = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
    {
        // MoveToContent is very useful. It allows you to skip XML Declarations, whitespace,
        // comments and processing instructions
        reader.MoveToContent();
        reader.Read();      // Read <Book>
        var author = reader.ReadElementContentAsString("Author", "");
        var isbn = reader.ReadElementContentAsString("ISBN", "");
        var quote = reader.ReadElementContentAsString("Quote", "");
        var publisher = reader.ReadElementContentAsString("Publisher", "");

        // Assume that <Footer> is optional. This is how to code around that
        var footer = (reader.Name == "Footer") ? reader.ReadElementContentAsString("Footer", "") : null;
 
        reader.Read();  // Read </Book>
    }
}

Reading Attributes

XmlReader must be positioned on a start element in order to read attributes. Recall that ReadStartElement verifies that the current NodeType is Element and then calls Read (which advances the cursor to the next node). This means that once ReadStartElement returns, the attributes of that element are gone forever! To read attributes, XmlReader provides an indexer that gives direct (random) access to any attribute on the current element, either by name or by position. Note that using the indexer is the same as calling GetAttribute:

static public void ReadAttributes()
{
    string filename = @"Xml\TestFile1.xml";
    XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
    using (var reader = XmlReader.Create(filename, settings))
    {
        // Skip xml declarations and comments and go to the root element
        reader.MoveToContent();
 
        // Read attributes on all elements
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
 
            // Read attributes on current element
            Trace.WriteLine("reading attributes on element " + reader.Name);
            for (int i = 0; i < reader.AttributeCount; i++)
                Trace.WriteLine(string.Format("attribute {0} = {1}", i, reader[i]));
        }
    }
}

reading attributes on element Book
attribute 0 = ABC
reading attributes on element Author
reading attributes on element ISBN
reading attributes on element Publisher
reading attributes on element Book
attribute 0 = XYZ
reading attributes on element Author
reading attributes on element ISBN
reading attributes on element Publisher

Attribute Nodes

An attribute is an XML node as defined by the XmlNodeType enum. To explicitly traverse attribute nodes (just as you would traverse element nodes), usually in order to parse attribute values into other types, you must make a special diversion from calling Read. The diversion must begin from a start element, and the forward-only rule is relaxed during attribute traversal: you can jump to any attribute (forward or backward) using MoveToAttribute. Note that MoveToElement returns you to the start element from anywhere within the attribute node diversion. The following example illustrates:

static public void ReadAttributeNodes()
{
    string filename = @"Xml\TestFile3.xml";
    XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
    using (var reader = XmlReader.Create(filename, settings))
    {
        // Skip xml declarations and comments and go to the root element
        reader.MoveToContent();
 
        // Skip over all nodes until we reach an element with attributes
        while (reader.Read())
        {
            if (!reader.HasAttributes) continue;
 
            // Approach 1 to reading attribute nodes
            // Note that attribute "Author" comes after attribute "Title". Order is not important
            string author = null;
            string title = null;
            if (reader.MoveToAttribute("Author"))
                author = reader.ReadContentAsString();
 
            if (reader.MoveToAttribute("Title"))
                title = reader.ReadContentAsString();
            Trace.WriteLine(string.Format("Author = {0}. Book Title = {1}", author, title));
 
            // Approach 2 to reading attribute nodes. Here you can traverse each attribute
            // in sequence
            if (reader.MoveToFirstAttribute())
                do
                {
                    string attributeName = reader.Name;
                    string attributeValue = reader.Value;
                    Trace.WriteLine(string.Format("Attribute name: {0}, Attribute value: {1}",
                                        attributeName, attributeValue));
 
                } while (reader.MoveToNextAttribute());
        }
    }            
}

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
<Books>
  <Book Title="ABC" Author="Yazan Diranieh" ISBN="12345" />
  <Book Title="XYZ" Author="Yazan Diranieh" ISBN="67890" />
</Books>

Author = Yazan Diranieh. Book Title = ABC
Attribute name: Title, Attribute value: ABC
Attribute name: Author, Attribute value: Yazan Diranieh
Attribute name: ISBN, Attribute value: 12345
Author = Yazan Diranieh. Book Title = XYZ
Attribute name: Title, Attribute value: XYZ
Attribute name: Author, Attribute value: Yazan Diranieh
Attribute name: ISBN, Attribute value: 67890
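
Since the usual reason for visiting attribute nodes is to parse their values into other types, here is a minimal sketch that reads a numeric attribute with ReadContentAsInt. The Pages attribute and the helper name are assumptions made for this example and do not appear in the test files above.

```csharp
using System.IO;
using System.Xml;

public static class TypedAttributeSketch
{
    // Returns the value of the (hypothetical) Pages attribute on the root
    // element as an int, or -1 if the attribute is absent.
    public static int ReadPageCount(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                 // root start element

            if (!reader.MoveToAttribute("Pages"))   // begin the attribute diversion
                return -1;

            int pages = reader.ReadContentAsInt();  // typed read of the attribute value
            reader.MoveToElement();                 // end the diversion
            return pages;
        }
    }
}
```

If the attribute value cannot be converted to an int, ReadContentAsInt throws rather than returning a default, so malformed data is not silently accepted.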

Reading Node by Node

Recall that XmlReader provides forward-only (and read-only, non-cached) access to an XML stream or file. The usual scenario for reading XML with XmlReader classes is:

Instantiate an XmlReader as discussed above
Load the reader with an XML stream or file.
Use the XmlReader.Read method within a while loop to loop over all nodes in the document.
Within the while loop, access various methods and properties of the current node. The current node is the node on which the reader is currently positioned. All methods and properties are with respect to the current node. Note that XmlReader properties and methods may not be available on every node type.

The following example illustrates how to read and process the XML document above (TestFile1.xml) using XmlTextReader, a class derived from XmlReader:

static public void BasicUsage()
{
    // XmlTextReader is disposable
    using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            DisplayNodeInfo(reader);
        }                
    }            
}

private static void DisplayNodeInfo(XmlReader reader)
{
    Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
            reader.Name, reader.NodeType,
            string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : 
					        reader.Value));
	 
    // Are there attributes on the current node?
    if (reader.HasAttributes)
    {
        // Current node has attributes
        Trace.WriteLine( string.Format("Current node has {0} attributes", reader.AttributeCount));
 
        // Loop over all attributes for current node
        for (int i = 0; i < reader.AttributeCount; i++)
        {
            // When positioned on an element, use MoveToAttribute to move to a specific attribute
            reader.MoveToAttribute(i);
            Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType, 
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : 
					            reader.Value));
        }
 
        // Move the reader back to the element node that owned the last attribute
        reader.MoveToElement();
        Trace.WriteLine(string.Format("Moved reader back to Node Name: {0}. Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType, 
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) :  
						   reader.Value));
    }
}

private static string GetEmptyStringDisplay(string data)
{
    if (data == string.Empty)
        return "(Empty)";
 
    string display = string.Empty;
    foreach (char c in data)
    {
        switch (c)
        {
            case '\r':
                display += "\\r";
                break;
            case '\n':
                display += "\\n";
                break;
            case ' ':
            default:
                display += "_";
                break;
        }
    }
    return display;
}

Output (indented for clarity)

Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Current node has 2 attributes
Node Name: version, Node Type: Attribute, Node Value: 1.0
Node Name: encoding, Node Type: Attribute, Node Value: utf-8
Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: , Node Type: Comment, Node Value:  This is a comment 
Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: Books, Node Type: Element, Node Value: (Empty)
Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: ABC
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 12345
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 1
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: XYZ
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 6789
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 2
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: Books, Node Type: EndElement, Node Value: (Empty)

Using XmlReaderSettings we can ignore Whitespace nodes to give the following output:

static public void BasicUsage2()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            DisplayNodeInfo(reader);
        }
    }
}

Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Current node has 2 attributes
Node Name: version, Node Type: Attribute, Node Value: 1.0
Node Name: encoding, Node Type: Attribute, Node Value: utf-8
Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Node Name: , Node Type: Comment, Node Value:  This is a comment 
Node Name: Books, Node Type: Element, Node Value: (Empty)
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: ABC
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 12345
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 1
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: XYZ
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 6789
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 2
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
Node Name: Books, Node Type: EndElement, Node Value: (Empty)

Note the usage of MoveToAttribute to go through the attribute list of the current element. After this method has been called, certain properties such as Name, NamespaceURI, and Prefix reflect the properties of the attribute and not of its containing element.

Also note that an attribute does not always need to be specified on an element. In many cases, a DTD or a schema defines default values for attributes on elements. When using methods that move among attributes, attributes that receive values from a DTD or a schema are acted upon just like attributes that are given values in the XML stream. You can programmatically determine if an attribute has received its value from the DTD or schema using the IsDefault property which returns true if the current node is an attribute that was generated from the default value defined in the DTD or schema.

Note: IsDefault does not apply to all XmlReader-derived classes. For example, XmlTextReader.IsDefault always returns false because XmlTextReader does not expand default attributes. XmlValidatingReader.IsDefault, on the other hand, returns true if the current node is an attribute whose value was generated from the default value defined in the DTD or schema, and false if the value was explicitly specified in the XML stream.

Reading Inner and Outer Content

Another way to read XML with XmlReader-derived classes is to use ReadInnerXml and/or ReadOuterXml methods. These methods work as follows:

ReadInnerXml: When positioned on an element, this method returns all of the element's content as a string, including child markup but excluding the element's own start and end tags. After this call, the reader is positioned after the element's end tag.
ReadOuterXml: Similar to ReadInnerXml, except that the return value also includes the start and end tags of the current node.

The following table illustrates:

Node Type	Content	Return Value	Position
Element	<Book> <Author>Yazan</Author> </Book>	ReadInnerXml: <Author>Yazan</Author> ReadOuterXml: <Book> <Author>Yazan</Author> </Book>	ReadInnerXml and ReadOuterXml: After the ending element </Book>
Attribute	<Books Publisher="Somebody"/>	ReadInnerXml: Somebody ReadOuterXml: Publisher="Somebody"	ReadInnerXml and ReadOuterXml: Remains on the attribute node Publisher
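
The element row of the table can be reproduced with a few lines of code. The sketch below is a minimal illustration working on in-memory strings without whitespace between tags; the class and method names are invented.

```csharp
using System.IO;
using System.Xml;

public static class InnerOuterSketch
{
    // Creates a reader and positions it on the root element.
    private static XmlReader OpenOnRoot(string xml)
    {
        XmlReader reader = XmlReader.Create(new StringReader(xml));
        reader.MoveToContent();
        return reader;
    }

    public static string Inner(string xml)
    {
        using (XmlReader reader = OpenOnRoot(xml))
            return reader.ReadInnerXml();
    }

    public static string Outer(string xml)
    {
        using (XmlReader reader = OpenOnRoot(xml))
            return reader.ReadOuterXml();
    }

    // Name of the node the reader lands on after ReadOuterXml has consumed
    // the first child of the root element.
    public static string NameAfterFirstChild(string xml)
    {
        using (XmlReader reader = OpenOnRoot(xml))
        {
            reader.ReadStartElement();   // step into the root, onto its first child
            reader.ReadOuterXml();       // consume that child entirely
            return reader.Name;
        }
    }
}
```

For the input <Book><Author>Yazan</Author></Book>, Inner returns <Author>Yazan</Author> and Outer returns the whole element. NameAfterFirstChild shows that the reader ends up on the sibling that follows the consumed element.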

Skipping Content

Content can be skipped in two ways:

Move directly to content using MoveToContent.
Skip over child nodes of the current node using Skip.
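
As a small illustration of the second option, the sketch below steps into the root element, calls Skip on its first child, and reports where the reader ends up. The helper name is hypothetical and the input is assumed to contain no whitespace between tags.

```csharp
using System.IO;
using System.Xml;

public static class SkipSketch
{
    // Skips the first child of the root element (with all of its descendants)
    // and returns the name of the node that follows it.
    public static string NameAfterSkip(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();      // root start element
            reader.ReadStartElement();   // now on the first child
            reader.Skip();               // jump past the child and everything inside it
            return reader.Name;
        }
    }
}
```

Skip still parses the skipped content, so well-formedness errors inside it are reported, but the application never has to visit those nodes one by one.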

Moving directly to content

A content node is defined to be any of the following:

Text
CDATA
Element
EndElement
EntityReference
EndEntity

You can move to any of these nodes using MoveToContent. This method checks whether the current node is a content node. If it is not, the reader skips ahead until it finds the next content node or reaches the end of the file. If the current node is already a content node, MoveToContent stays where it is and simply returns the current node type. This implies that MoveToContent skips over the following nodes:

XmlDeclaration
ProcessingInstruction
DocumentType
Comment
Attribute
Whitespace
SignificantWhitespace

When the application only needs to deal with content, this type of navigation is more efficient than calling Read to move to the next node, determining the node's type, and only then reading its content if that node type supports content.

If the reader is already positioned on an attribute, then calling MoveToContent moves the current node position to the owning element for that attribute. MoveToContent returns XmlNodeType which specifies the type of the current node (this information is typically used to skip over random XML). The following example illustrates how to use MoveToContent using the same Xml file above:

static public void ContentUsage()
{
    XmlNodeType nodeType;
    using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
    {                
        // Calling MoveToContent multiple times does not move the reader to the next content
        nodeType = reader.MoveToContent();
        DisplayNodeInfo(reader);
        nodeType = reader.MoveToContent();
        DisplayNodeInfo(reader);
 
        while (reader.Read())
        {
            Trace.WriteLine(string.Format("Before MoveToContent: Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType,
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));
 
            reader.MoveToContent();
 
            Trace.WriteLine(string.Format("After MoveToContent Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType,
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));
 
            Trace.WriteLine("");
        }
    }
}

Node Name: Books, Node Type: Element, Node Value: (Empty)
Node Name: Books, Node Type: Element, Node Value: (Empty)

Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n____
After MoveToContent   Node Name: Book,      Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Author,    Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Yazan Diranieh
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Yazan Diranieh
 
Before MoveToContent: Node Name: Author,    Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Author,    Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: ISBN,      Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: 12345
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: 12345
 
Before MoveToContent: Node Name: ISBN,      Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: ISBN,      Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Publisher, Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Publisher 1
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Publisher 1
 
Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n____
After MoveToContent   Node Name: Book,     Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n____
After MoveToContent   Node Name: Book,     Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n________
After MoveToContent   Node Name: Author,   Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Text,        Node Value: Yazan Diranieh
After MoveToContent   Node Name: ,         Node Type: Text,        Node Value: Yazan Diranieh
 
Before MoveToContent: Node Name: Author,   Node Type: EndElement,  Node Value: (Empty)
After MoveToContent   Node Name: Author,   Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n________
After MoveToContent   Node Name: ISBN,     Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Text,        Node Value: 6789
After MoveToContent   Node Name: ,         Node Type: Text,        Node Value: 6789
 
Before MoveToContent: Node Name: ISBN,     Node Type: EndElement,  Node Value: (Empty)
After MoveToContent   Node Name: ISBN,     Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Publisher, Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Publisher 2
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Publisher 2
 
Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n____
After MoveToContent   Node Name: Book,      Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n
After MoveToContent   Node Name: Books,     Node Type: EndElement, Node Value: (Empty)

Skipping Content

The Skip method is used to move over the current element. In other words, if the node type is XmlNodeType.Element, then calling Skip moves over all of the content of the current element and its end tag. For example, if you have the following XML:

<a name="MyName" ID="1">
    <x>123</x>
    <y>456</y> 
</a>
<b>
...
</b>

and you are positioned on the <a> node or any of its attributes, then calling Skip positions you on the next element which is <b>. And if you are positioned on the <x> node, then calling Skip positions you on the next element which is <y>.
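The behavior described above can be checked with a small sketch. It assumes the two sibling elements are wrapped in a hypothetical <root> element (so the input is a single-rooted document) and that the XML is written without insignificant white space. If white space nodes were present and not ignored, the reader could land on a white space node before reaching the next element. The class and method names are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // The fragment from the text, wrapped in a hypothetical <root> element.
    private const string Xml =
        "<root><a name=\"MyName\" ID=\"1\"><x>123</x><y>456</y></a><b>...</b></root>";

    // Positions the reader on the first element named startAt, calls Skip,
    // and returns the name of the node the reader lands on afterwards.
    public static string NameAfterSkip(string startAt)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml), settings))
        {
            reader.ReadToFollowing(startAt);
            reader.Skip();
            return reader.Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(NameAfterSkip("a")); // b
        Console.WriteLine(NameAfterSkip("x")); // y
    }
}
```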

Namespaces and Prefixes

XmlReader provides two parallel systems for referring to element and attribute names:

Name
NamespaceURI and LocalName

You're using the first system whenever you read an element's Name property or call a method that accepts a single name argument. This works well if no namespaces or prefixes are present. If namespaces and prefixes are present, namespaces are ignored and prefixes become part of the name:

XML Fragment	Name	Code
<book ...>	book	reader.ReadStartElement("book")
<book xmlns='blah' .../>	book	reader.ReadStartElement("book")
<x:book ... />	x:book	reader.ReadStartElement("x:book")

The second system works through two properties aware of namespaces: LocalName and NamespaceURI. These properties take into account prefixes and default namespaces defined by parent elements. Prefixes are automatically expanded meaning that NamespaceURI always reflects the correct namespace for the current element, and LocalName is always free of prefixes.

When you pass two name arguments into a method such as ReadStartElement, you're using this name system. For example, consider this XML:

<customer xmlns="MyNamespace" xmlns:other="MyOtherNamespace">
  <address>
    <other:city>SomeCity</other:city>    
    ...
  </address>    
  ...
</customer>

We could read this as follows. Note that prefixes are abstracted away:

reader.ReadStartElement("customer", "MyNamespace");
reader.ReadStartElement("address", "MyNamespace");
reader.ReadStartElement("city", "MyOtherNamespace");

If required, you can see which prefix was used by reading the Prefix property, and convert it into a namespace by calling LookupNamespace.
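As a minimal sketch of the two naming systems side by side, the following reads a trimmed version of the customer XML (the elided "..." parts are simply left out) and reports what each property returns once the reader is positioned on the city element. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NamespaceDemo
{
    // Returns Name, LocalName, Prefix, NamespaceURI and the result of
    // LookupNamespace(Prefix) for the <other:city> element.
    public static string[] DescribeCity()
    {
        const string xml =
            "<customer xmlns=\"MyNamespace\" xmlns:other=\"MyOtherNamespace\">" +
            "<address><other:city>SomeCity</other:city></address>" +
            "</customer>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Two-argument overloads use the LocalName/NamespaceURI system.
            reader.ReadStartElement("customer", "MyNamespace");
            reader.ReadStartElement("address", "MyNamespace");

            // The reader is now positioned on <other:city>.
            return new[]
            {
                reader.Name,
                reader.LocalName,
                reader.Prefix,
                reader.NamespaceURI,
                reader.LookupNamespace(reader.Prefix)
            };
        }
    }

    public static void Main()
    {
        // other:city | city | other | MyOtherNamespace | MyOtherNamespace
        Console.WriteLine(string.Join(" | ", DescribeCity()));
    }
}
```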

Obsolete Classes

The .NET Framework extends XmlReader in the following classes:

XmlTextReader
XmlNodeReader
XmlValidatingReader

The following table describes the implementation of each of these classes. Recall that they all derive from XmlReader and hence they all provide forward-only, read-only, non-cached access to XML:

XmlReader-derived class	Description
XmlTextReader	Reads text (also known as character streams). Has no support for DTD or schema.
XmlNodeReader	Reads DOM nodes. In other words, it takes an XmlNode object and reads whatever nodes it finds in it. Has no support for DTD or schema, but can resolve entities defined in DTD.
XmlValidatingReader	A reader that provides DTD, XSD, and XDR schema validation. Takes an XmlTextReader and adds validation services to it.

The table below describes which reader to use for each scenario:

Scenario	Reader to use	ValidationType property	XmlResolver	Normalization property
Do not need any DTD or schema support but require good performance.	XmlTextReader	Not available	Set to null reference	false
Require XML to be well-formed, including external entities	XmlTextReader	Not available	Set to non-null reference	true
Require XML to be well-formed and valid according to a DTD or schema	XmlValidatingReader	Auto or DTD	Set to non-null reference	Set to true on XmlTextReader before passing XmlTextReader to XmlValidatingReader
Require XML to be well-formed when streaming XML data from an XmlNode	XmlNodeReader	Not available	Set to non-null reference	Not available

Note that when the Normalization property is true, the reader normalizes white space in attribute values and performs character-range checking. As a result, setting Normalization = true lowers performance. XmlResolver, on the other hand, is used to resolve external DTD and schema location references, and also to handle any import or include elements found in an XSD. Setting the XmlResolver property to null means that the application will not resolve any external references: if an external DTD or entity is encountered, it is ignored.

XmlTextReader

The previous section showed how to use various features found in all XmlReader-derived classes. This section focuses on features specific to the XmlTextReader class. XmlTextReader is an XmlReader-derived class and provides a fast XML parser. It enforces the rule that the XML must be well-formed, but it does not perform any validation and has no support for DTD or XSD schemas. XmlTextReader provides the following functionality:

Checks that the XML is well-formed.
Checks that the DTD is well-formed, but does not use DTD for validation, expanding entity references, or adding default attributes.
Checks the well-formedness of any DOCTYPE nodes.
Checks the well-formedness of entities.
Can read data from different inputs such as streams, TextReaders, and files.
Does not validate against DTDs or Schemas (hence the XmlTextReader has good performance.)

Full Content Reads

The XmlTextReader methods ReadChars, ReadBinHex, and ReadBase64 are used to read large streams of embedded content. The difference is that ReadChars reads content as text, ReadBase64 decodes Base64-encoded content into bytes, and ReadBinHex decodes BinHex-encoded content into bytes (which can be useful if the XML embeds binary data such as images or even video). These methods behave as follows:

They are only available on elements. Calling them on any other node type returns 0 and reads nothing.
ReadChars returns the actual character content of the element, in other words everything between the start and end tags. It does not resolve entities, CDATA, or any other markup within the start and end tags.
There is no normalization, even if the Normalization property was set.
ReadChars returns 0 when it reaches the end of its character stream.

All three methods (ReadChars, ReadBinHex, and ReadBase64) have the following signature, except that for ReadChars the first parameter is char[] rather than byte[]:

ReadX( byte[] Buffer, int BufferOffset, int DataLength );

The first parameter refers to the buffer where content will be written, the second parameter refers to the location within the buffer where content should be written, and the last parameter specifies the number of characters/bytes to write to the buffer. Note that being able to specify a buffer and how much to read means that these three methods can be very efficient for processing very large streams of text/data embedded within the XML. Otherwise, you would have to allocate and deal with very large arrays.

The overall pattern for using these methods is simple. Position the reader on an element using any of the XmlTextReader methods, then call one of these methods successively to read content a chunk at a time. In the following example, an XML document containing an image is loaded and displayed:

<Image name="Sunset">
ABC123DDcDAEB563 ...
</Image>

private void btnReadBinHex_Click(object sender, System.EventArgs e)
{
    try
    {
        // Load XML document and jump to the <Image> document element
        XmlTextReader reader = new XmlTextReader( @"..\..\ReadHexBinData.xml" );
        XmlNodeType nodeType = reader.MoveToContent();

        // Read the image into a memory stream. Note how ReadBase64 is called
        // successively to read the entire image data
        System.IO.MemoryStream memstream = new System.IO.MemoryStream();
        byte[] abData = new byte[100];
        int nByteCount = reader.ReadBase64( abData, 0, 100 );
        while (nByteCount != 0)
        {
            memstream.Write( abData, 0, nByteCount );
            nByteCount = reader.ReadBase64( abData, 0, 100 );
        }

        // Rewind the stream so the image is decoded from the start, then display it
        memstream.Position = 0;
        pictureBox1.Image = Bitmap.FromStream( memstream );
    }
    catch (Exception ex)
    {
        Trace.WriteLine( ex.Message );
    }
}
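The same chunked pattern applies to ReadChars. The sketch below is a console-only variant that does not depend on Windows Forms: it positions an XmlTextReader on an element and drains its text through a deliberately small buffer. The element name, sample text, and class name are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ReadCharsDemo
{
    // Reads the content of the document element chunkSize characters at a
    // time and returns the reassembled text.
    public static string ReadElementInChunks(string xml, int chunkSize)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // ReadChars only works while positioned on an element node.
            reader.MoveToContent();

            var buffer = new char[chunkSize];
            var result = new StringBuilder();
            int count;

            // ReadChars returns 0 once the element content is exhausted.
            while ((count = reader.ReadChars(buffer, 0, buffer.Length)) > 0)
            {
                result.Append(buffer, 0, count);
            }
            return result.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(
            ReadElementInChunks("<note>Hello, chunked world!</note>", 8));
    }
}
```

A small buffer is used here only to force several iterations; in practice the buffer size would be tuned to the amount of embedded data.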

Document Type Information

DOCTYPE Review

To better understand the DOCTYPE declaration, a brief overview of Document Type Definitions (DTD) is required. An XML processor uses a DTD (a collection of markup declarations) at runtime to validate a given XML file against a predefined structure. DTD syntax can be a bit overwhelming, but it basically uses different syntactical elements (like !, (, ), * and others) to define what elements are required in the XML document, what elements are optional, how many occurrences of a given element are allowed, and so on. For example, consider the following DTD document saved in food.dtd:

<!ELEMENT hamburgers    (hamburger*)>
<!ELEMENT hamburger     (name, description, price)>
<!ATTLIST hamburger     lowfat CDATA #IMPLIED>
<!ELEMENT name          (#PCDATA)>
<!ELEMENT description   (#PCDATA)>
<!ELEMENT price         (#PCDATA)>

This document says the following:

A hamburgers element can contain any number of hamburger elements (zero or more).
Each hamburger element must contain three sub-elements, in this order: name, description, and price.
Each hamburger element may carry an optional attribute named lowfat (#IMPLIED means the attribute is not required).
The three sub-elements of hamburger (name, description, and price) must all contain Parsed Character Data (PCDATA).

An XML document that wants to conform to this DTD would then have to add the following line:


<!DOCTYPE hamburgers SYSTEM "food.dtd" >
...

Based on the above, a DOCTYPE declaration allows a document to identify its root element and the associated Document Type Definition (DTD) by reference to an external file. A DOCTYPE declaration is usually required if the document is to be processed by a validating environment.

A DOCTYPE declaration usually contains the following:

The name of the root element.
SYSTEM and PUBLIC identifiers for the DTD that can be used to validate the document structure.
An internal set of DTD declarations that appear inside square brackets ([ ]).

Note the following DOCTYPE declarations:

<!DOCTYPE MyRootElement>
The simplest DOCTYPE declaration, which identifies only the root element.
<!DOCTYPE MyRootElement SYSTEM "www.xyz.com\DTD\books.dtd" >
Identifies the root element and a URI reference for a file containing declarations that make up the DTD file.
<!DOCTYPE MyRootElement PUBLIC "c:\DTD\books.dtd" "www.xyz.com\DTD\books.dtd">
Same as above, except that the additional PUBLIC identifier gives some parsers a separate way to locate the DTD instead of the given URL. This is usually useful if the parser is used on a system where the network might not always be available.
A DOCTYPE declaration can also include declarations directly in an internal subset without having to refer to an external file. In this case, the following syntax would be used:
<!DOCTYPE MyRootElement
[
declarations
]>

XmlTextReader and DOCTYPE

The XmlTextReader ensures that the DOCTYPE declaration is well formed but does not do any DTD validation. When you call Read on a DOCTYPE, the returned NodeType is DocumentType. The PublicLiteral and SystemLiteral are considered attributes, whose names are PUBLIC and SYSTEM, respectively. To retrieve the content of these two attributes use GetAttribute or any other attribute accessing method. For example, if you have the following DOCTYPE declaration:

<!DOCTYPE Books SYSTEM "\\SomeMachineName\DTD\books.dtd" [<!ENTITY e 'ent'>] >

reader.GetAttribute( "SYSTEM" ); // "\\SomeMachineName\DTD\books.dtd"
reader.Value; // <!ENTITY e 'ent'>
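A self-contained sketch of the same idea follows. It assumes a simplified system identifier ("books.dtd", a made-up relative path) and sets XmlResolver to null so that the reader does not try to fetch the external file. The class and method names are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DoctypeDemo
{
    // Returns the DOCTYPE name, its SYSTEM literal and its internal subset.
    public static string[] ReadDoctype()
    {
        const string xml =
            "<!DOCTYPE Books SYSTEM \"books.dtd\" [<!ENTITY e 'ent'>]><Books/>";

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.XmlResolver = null;                  // do not fetch books.dtd
            reader.DtdProcessing = DtdProcessing.Parse; // parse the DOCTYPE node

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.DocumentType)
                {
                    return new[]
                    {
                        reader.Name,                   // root element name
                        reader.GetAttribute("SYSTEM"), // system literal
                        reader.Value                   // internal subset
                    };
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        string[] info = ReadDoctype();
        Console.WriteLine("Name:   " + info[0]);
        Console.WriteLine("SYSTEM: " + info[1]);
        Console.WriteLine("Subset: " + info[2]);
    }
}
```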

Handling White Space

White space is any of the following characters:

Space - 0x20
Carriage Return - 0x0D
Line Feed - 0x0A
Horizontal Tab - 0x09

White space is categorized as significant and insignificant as follows:

Significant white space is any white space inside mixed content, or inside the scope of an xml:space="preserve" attribute. In general, significant white space is any white space you want to have preserved from the original document to the final document.
Insignificant white space is white space that you do not need to preserve from the reading of the original document to the final document.

The XmlTextReader.WhitespaceHandling property determines which white space nodes the reader returns: All, Significant, or None. This property can be changed at any time and takes effect on the next Read operation. In the following example, underscores stand for white space characters: the white space inside the subitem element, which declares xml:space="preserve", is significant, and all other white space is insignificant:

<test>_
____<item>_
________<subitem xml:space="preserve">_
_________</subitem>_
____</item>_
</test>_
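To make the effect of each WhitespaceHandling value concrete, the following sketch reads the same document (with the underscores replaced by real white space and no trailing line break after the root element) and counts the Whitespace and SignificantWhitespace nodes returned. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    private const string Xml =
        "<test>\n" +
        "    <item>\n" +
        "        <subitem xml:space=\"preserve\">\n" +
        "         </subitem>\n" +
        "    </item>\n" +
        "</test>";

    // Returns { number of Whitespace nodes, number of SignificantWhitespace nodes }.
    public static int[] CountWhitespaceNodes(WhitespaceHandling handling)
    {
        int insignificant = 0, significant = 0;
        using (var reader = new XmlTextReader(new StringReader(Xml)))
        {
            reader.WhitespaceHandling = handling;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace) insignificant++;
                if (reader.NodeType == XmlNodeType.SignificantWhitespace) significant++;
            }
        }
        return new[] { insignificant, significant };
    }

    public static void Main()
    {
        foreach (WhitespaceHandling h in Enum.GetValues(typeof(WhitespaceHandling)))
        {
            int[] counts = CountWhitespaceNodes(h);
            Console.WriteLine("{0}: Whitespace={1}, SignificantWhitespace={2}",
                h, counts[0], counts[1]);
        }
    }
}
```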

Attribute Value Normalization

The XmlTextReader.Normalization property determines whether white space normalization and attribute value normalization are performed. Attribute value normalization works as follows:

For a character reference, append the referenced character to the attribute value.
For an entity reference, recursively process the replacement text of the entity.
For white space characters (0x20, 0x0D, 0x0A, 0x09), append 0x20 to the normalized value.
Process other characters by appending them to the normalized value.
If the declared value is not CDATA, then discard any leading and trailing space (0x20) characters, and replace sequences of space (0x20) characters with a single space (0x20).

For example, given the following XML fragment:

Normalization affects how the attr1 attribute is read as follows:

// reader.Normalization = false;
Attribute value: test A B C
1 2 3

//reader.Normalization = true;
Attribute value: test A B C 1 2 3
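The difference can also be observed with a small stand-alone sketch. The item element and its attr1 value below are assumptions made for illustration (an attribute containing a tab and a line feed); they are not the fragment used to produce the output above.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NormalizationDemo
{
    // Reads attr1 with attribute value normalization switched on or off.
    public static string ReadAttr1(bool normalize)
    {
        // The attribute value contains a tab and a line feed.
        const string xml = "<item attr1='A\tB\nC'/>";

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Normalization = normalize;
            reader.MoveToContent();
            return reader.GetAttribute("attr1");
        }
    }

    public static void Main()
    {
        // Replace white space with visible escapes before printing.
        foreach (bool normalize in new[] { false, true })
        {
            string shown = ReadAttr1(normalize)
                .Replace("\t", "\\t")
                .Replace("\n", "\\n");
            Console.WriteLine("Normalization = {0}: {1}", normalize, shown);
        }
    }
}
```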

XmlNodeReader

XmlNodeReader provides a reader over a given DOM node sub-tree. Because XmlNodeReader derives from XmlReader, it provides fast, non-cached, forward-only access to the XML data in an XmlNode. XmlNodeReader provides the following functionality:

Ensures that XML is well-formed.
Expands default attributes and entities if DTD information is present in the document.
Does not support DTD or XML Schema validation.

For example:

// GetNodeFromDocument stands for any code that selects a node from a loaded XmlDocument
XmlNode node = GetNodeFromDocument( doc );
XmlNodeReader reader = new XmlNodeReader( node );

while (reader.Read())
{
    // Process node
}
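A fuller, runnable sketch of the same pattern is shown below. It assumes a small Books document similar to the one used earlier and selects the node with an XPath expression; the class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeReaderDemo
{
    // Returns the names of all elements found in the sub-tree rooted at the
    // first <Book> element.
    public static List<string> ElementNamesUnderBook()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<Books><Book><Author>Yazan Diranieh</Author><ISBN>12345</ISBN></Book></Books>");

        XmlNode node = doc.SelectSingleNode("/Books/Book");
        var names = new List<string>();

        using (var reader = new XmlNodeReader(node))
        {
            // The reader only sees the given node and its descendants.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ElementNamesUnderBook())); // Book, Author, ISBN
    }
}
```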

XmlValidatingReader

XmlValidatingReader represents a reader that provides DTD and XML Schema validation. It is similar to XmlTextReader, except that XmlValidatingReader performs validation as well. The overall pattern for using XmlValidatingReader is:

Instantiate an XmlValidatingReader using one of the available constructors. Most often you will use the constructor that takes an XmlTextReader as an argument.
Set XmlValidatingReader.ValidationType to one of the ValidationType enum values (None, Auto, DTD, Schema, XDR).
If the XmlValidatingReader was set to validate (XmlValidatingReader.ValidationType is not None), then attach a handler to the ValidationEventHandler event.
Now process the XML just as if you were using an XmlReader. While reading XML, any validation errors will raise the ValidationEventHandler event.
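The steps above can be sketched as follows. This is only an illustration under a few assumptions: the hamburger DTD from the DOCTYPE Review section is embedded as an internal subset so that no external file is needed, the sample documents are invented, and the class and method names are hypothetical. Because XmlValidatingReader is marked obsolete in current versions of .NET, the sketch suppresses the corresponding compiler warning.

```csharp
using System;
using System.IO;
using System.Xml;

#pragma warning disable CS0618 // XmlValidatingReader is marked obsolete

public static class DtdValidationDemo
{
    // The food.dtd declarations, embedded as an internal subset.
    private const string Doctype =
        "<!DOCTYPE hamburgers [" +
        "<!ELEMENT hamburgers (hamburger*)>" +
        "<!ELEMENT hamburger (name, description, price)>" +
        "<!ATTLIST hamburger lowfat CDATA #IMPLIED>" +
        "<!ELEMENT name (#PCDATA)>" +
        "<!ELEMENT description (#PCDATA)>" +
        "<!ELEMENT price (#PCDATA)>" +
        "]>";

    // Validates the given document body against the DTD and returns the
    // number of validation events raised.
    public static int CountValidationErrors(string body)
    {
        int errors = 0;
        using (var textReader = new XmlTextReader(new StringReader(Doctype + body)))
        {
            textReader.DtdProcessing = DtdProcessing.Parse;

            // Step 1: wrap the XmlTextReader.
            using (var reader = new XmlValidatingReader(textReader))
            {
                // Step 2: choose the validation type.
                reader.ValidationType = ValidationType.DTD;
                // Step 3: attach a handler; here it only counts the events.
                reader.ValidationEventHandler += (sender, e) => errors++;
                // Step 4: read as usual.
                while (reader.Read()) { }
            }
        }
        return errors;
    }

    public static void Main()
    {
        string valid =
            "<hamburgers><hamburger lowfat=\"yes\"><name>Classic</name>" +
            "<description>Beef</description><price>5</price></hamburger></hamburgers>";
        // The description element is missing here.
        string invalid =
            "<hamburgers><hamburger><name>Classic</name>" +
            "<price>5</price></hamburger></hamburgers>";

        Console.WriteLine("Valid document errors:   " + CountValidationErrors(valid));
        Console.WriteLine("Invalid document errors: " + CountValidationErrors(invalid));
    }
}
```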

Validating XML will be covered in more depth in the Validation of XML with Schemas chapter.