Reading XML with XmlReader

Summary

Overview

The System.Xml namespace contains the following namespaces and core classes:

Namespace                  Core classes and purpose

System.Xml.*               XmlReader and XmlWriter: forward-only, non-cached reader and writer.
                           XmlDocument: represents an XML document in a W3C-style DOM.
System.Xml.XPath           Infrastructure and API for XPath.
System.Xml.Schema          Infrastructure and API for XSD schemas.
System.Xml.Xsl             Infrastructure and API for XSLT transformations.
System.Xml.Serialization   Serializing objects to XML and deserializing XML to objects.
System.Xml.Linq            LINQ-centric version of XmlDocument.

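XmlReader is an abstract base class that provides non-cached, forward-only, read-only access to XML data. Its methods move through the data and read the contents of a node, and its properties reflect the node on which the reader is currently positioned.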

When reading XML, XmlReader checks that the XML is well-formed and throws XmlException if an error is encountered. Because XmlReader allows you to read from potentially slow resources (streams and URIs), it offers asynchronous versions of most of its methods so that you can easily write non-blocking code.
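As a minimal sketch of the two behaviors described above, the following helper counts elements with ReadAsync and reports whether a string is well-formed by catching XmlException. The class and method names are illustrative assumptions, and the input is passed as a string only to keep the example self-contained. Note that the asynchronous methods require XmlReaderSettings.Async to be set to true.

```csharp
using System.IO;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper class, named for illustration only.
public static class AsyncReadSketch
{
    // Counts element nodes without blocking the calling thread.
    public static async Task<int> CountElementsAsync(string xml)
    {
        // Async must be enabled, otherwise ReadAsync throws InvalidOperationException.
        var settings = new XmlReaderSettings { Async = true };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        return count;
    }

    // Returns false when the reader reports a well-formedness error.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

In real code the reader would more typically wrap a network or file stream, which is where the non-blocking reads pay off.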

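The typical usage is to instantiate an XmlReader with the static XmlReader.Create method, passing in a Stream, a TextReader, or a URI string, optionally together with an XmlReaderSettings object that controls parsing options.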

Another useful property on XmlReaderSettings is ConformanceLevel. Its default value of Document instructs the reader to assume a well-formed XML document with a single root node. To read just the inner portion of a file, containing multiple top-level nodes, set ConformanceLevel to Fragment, which accepts input such as:

<Book Name="ABC">
        <Author>Yazan Diranieh</Author>
        <ISBN>12345</ISBN>
        <Publisher>Publisher 1</Publisher>
</Book>
<Book Name="XYZ">
        <Author>Yazan Diranieh</Author>
        <ISBN>6789</ISBN>
        <Publisher>Publisher 2</Publisher>
</Book>
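A minimal sketch of reading such a fragment follows. The class and method names are illustrative assumptions, and the fragment is supplied as a string rather than a file so that the example is self-contained.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, named for illustration only.
public static class FragmentSketch
{
    // Returns the Name attribute of every <Book> element in a multi-root fragment.
    public static List<string> ReadBookNames(string fragment)
    {
        var settings = new XmlReaderSettings
        {
            // Without Fragment, a second top-level element causes an XmlException.
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreWhitespace = true
        };

        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Book")
                    names.Add(reader.GetAttribute("Name"));
            }
        }
        return names;
    }
}
```

With the default ConformanceLevel.Document, the same input would be rejected as soon as the reader reached the second root-level element.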

Another useful property on XmlReaderSettings is CloseInput, which indicates whether the underlying stream or TextReader is closed when the reader is closed. The default value is false.
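The effect of CloseInput can be observed with a short sketch like the one below. The class and method names are illustrative assumptions, and a MemoryStream stands in for whatever stream the application actually reads from.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, named for illustration only.
public static class CloseInputSketch
{
    // Reads a small document and reports whether the stream is still open
    // after the reader has been disposed.
    public static bool StreamIsOpenAfterRead(bool closeInput)
    {
        var stream = new MemoryStream(Encoding.UTF8.GetBytes("<root/>"));
        var settings = new XmlReaderSettings { CloseInput = closeInput };

        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            while (reader.Read()) { }
        }

        // MemoryStream.CanRead is false once the stream has been closed.
        return stream.CanRead;
    }
}
```

With the default (false), the caller remains responsible for disposing the stream; with true, disposing the reader also closes the stream.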

The units of an XML stream are XML nodes. The following are all possible nodes that you may encounter with XmlReader:

public enum XmlNodeType
{
    None = 0,                   // The Read method has not been called yet.
    Element = 1,
    Attribute = 2,
    Text = 3,                   // The text content of a node.
    CDATA = 4,                  // For example, <![CDATA[my escaped text]]>
    EntityReference = 5,        // A reference to an entity 
    Entity = 6,                 // For example, <!ENTITY...>
    ProcessingInstruction = 7,   // For example, <?pi test?>
    Comment = 8,
    Document = 9,               // A document object that, as the root of the document tree,
                                // provides access to the entire XML document.        
    DocumentType = 10,          // For example, <!DOCTYPE...>
    DocumentFragment = 11,
    Notation = 12,              // For example, <!NOTATION...>
    Whitespace = 13,            // White space between markup.        
    SignificantWhitespace = 14, // White space between markup in a mixed content model 
                                // or white space within the xml:space="preserve" scope.
    EndElement = 15,            // An end element tag (for example, </item> ).
    EndEntity = 16,             // Returned when XmlReader gets to the end of the entity 
                                // replacement as a result of a call to XmlReader.ResolveEntity()
    XmlDeclaration = 17,        // for example, <?xml version='1.0'?>
}

The reader traverses the stream in textual (depth-first) order. The Depth property returns the current depth of the reader. The most primitive way to read from an XmlReader is to call Read(), which advances to the next node in the stream (just like MoveNext in IEnumerator). The first call to Read positions the cursor at the first node. Successive calls to Read advance the cursor to the next node (where an XML node can be any of those found in the XmlNodeType enum), until the cursor advances past the last node, at which point Read returns false and the reader should be closed and abandoned.

The following shows how to read each node and output its type (note that attributes are not included in read-based traversal):

static public void BasicUsage3()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            Trace.Write(new string(' ', reader.Depth * 2));
            Trace.WriteLine(reader.NodeType);
        }
    }
}

Input file (Xml\TestFile1.xml):

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
<Books>
    <Book Name="ABC">
        <Author>Yazan Diranieh</Author>
        <ISBN>12345</ISBN>
        <Publisher>Publisher 1</Publisher>
    </Book>
    <Book Name="XYZ">
        <Author>Yazan Diranieh</Author>
        <ISBN>6789</ISBN>
        <Publisher>Publisher 2</Publisher>
    </Book>
</Books>

Output:

XmlDeclaration
Comment
Element
  Element
    Element
      Text
    EndElement
    Element
      Text
    EndElement
    Element
      Text
    EndElement
  EndElement
  Element
    Element
      Text
    EndElement
    Element
      Text
    EndElement
    Element
      Text
    EndElement
  EndElement
EndElement

When writing and debugging XmlReader code, two important string properties on XmlReader are Name and Value. Depending on the node type, either Name or Value or both are populated.

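Quick overview of CDATA, ENTITY, and DOCTYPE: a CDATA section (<![CDATA[...]]>) holds text that the parser does not interpret as markup; an ENTITY declaration defines a named piece of replacement text that is referenced as &name;; and a DOCTYPE declaration names the document type and can contain an internal DTD subset, including ENTITY declarations.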

When this XML document is parsed, the output will be (note the contents of <footer>):

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
  <footer>Writer: Donald Duck. Copyright: W3Schools.</footer>
</note>

Returning to XmlReader, the following XML (Xml\TestFile2.xml) includes a document type, entities, a CDATA section, and a comment:

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
 
<!DOCTYPE Book [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
]>
 
<Book Name="ABC">
    <Author>Yazan Diranieh</Author>
    <ISBN>12345</ISBN>
    <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
    <Publisher>Publisher 1</Publisher>
    <Footer>&writer;&nbsp;(&web;)</Footer>
</Book>

The following code prints the type, name, and value of each node:

static public void NameAndValue()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
    {
        while (reader.Read())
        {
            Trace.Write(reader.NodeType.ToString().PadRight(20, '-'));
            Trace.Write("> ".PadRight(reader.Depth*5));
            Trace.WriteLine(string.Format("Name = {0}, Value = {1}", reader.Name, reader.Value));
        }
    }
}

Output:

XmlDeclaration------> Name = xml, Value = version="1.0" encoding="utf-8"
Comment-------------> Name = , Value =  This is a comment 
DocumentType--------> Name = Book, Value = 
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
 
Element-------------> Name = Book, Value = 
Element------------->    Name = Author, Value = 
Text---------------->         Name = , Value = Yazan Diranieh
EndElement---------->    Name = Author, Value = 
Element------------->    Name = ISBN, Value = 
Text---------------->         Name = , Value = 12345
EndElement---------->    Name = ISBN, Value = 
Element------------->    Name = Quote, Value = 
CDATA--------------->         Name = , Value =  C# operators include <, >, & and others
EndElement---------->    Name = Quote, Value = 
Element------------->    Name = Publisher, Value = 
Text---------------->         Name = , Value = Publisher 1
EndElement---------->    Name = Publisher, Value = 
Element------------->    Name = Footer, Value = 
Text---------------->         Name = , Value = Writer: Yazan Diranieh. (www.diranieh.com)
EndElement---------->    Name = Footer, Value = 
EndElement----------> Name = Book, Value = 

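Note how XmlReader automatically resolved the entities. This works because the settings specify DtdProcessing.Parse; with the default setting (DtdProcessing.Prohibit), encountering the DOCTYPE causes an XmlException.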

Reading Elements

When the structure of the XML is known, XmlReader helps by providing a range of methods that read while presuming a specific structure.

ReadStartElement verifies that the current NodeType is Element and then calls Read, which advances the cursor to the next node. ReadEndElement verifies that the current NodeType is EndElement and then calls Read in the same way. For example, you can read this

<ISBN>12345</ISBN>

with

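reader.ReadStartElement("ISBN");   // verifies <ISBN> and moves to its text node
string isbn = reader.Value;        // value of the text node
reader.Read();                     // moves from the text node to </ISBN>
reader.ReadEndElement();           // verifies </ISBN> and moves past it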

The ReadElementContentAsString method does all of the above: it reads a start element, a text node, and an end element, returning the content as a string:

string isbn = reader.ReadElementContentAsString("ISBN", "");
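The two approaches can be compared side by side in a small sketch. The class and method names are illustrative assumptions. The manual version calls Read between taking the value and ReadEndElement so that the reader is on the end tag when ReadEndElement checks it. The last method uses ReadElementContentAsInt, a typed sibling of ReadElementContentAsString, to show that the same pattern extends to other types.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, named for illustration only.
public static class ReadElementSketch
{
    // Step-by-step version: start tag, text node, end tag.
    public static string ReadManually(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement("ISBN");  // now on the text node
            string isbn = reader.Value;
            reader.Read();                    // now on </ISBN>
            reader.ReadEndElement();          // now past </ISBN>
            return isbn;
        }
    }

    // One-call version: the reader must already be on the start element.
    public static string ReadDirect(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString("ISBN", "");
        }
    }

    // Typed variant: parses the element content as an integer.
    public static int ReadAsInt(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsInt();
        }
    }
}
```

The one-call version is shorter and also copes with content split across several text or CDATA nodes, which the manual version does not.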

The following shows how to read this file (Xml\TestFile2.xml):

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
 
<!DOCTYPE Book [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Yazan Diranieh.">
<!ENTITY web "www.diranieh.com">
]>
 
<Book Name="ABC">
    <Author>Yazan Diranieh</Author>
    <ISBN>12345</ISBN>
    <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
    <Publisher>Publisher 1</Publisher>
    <Footer>&writer;&nbsp;(&web;)</Footer>
</Book>

static public void ReadTestFile2()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;
    settings.IgnoreComments = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
    {
        // MoveToContent is very useful. It allows you to skip XML declarations, whitespace,
        // comments and processing instructions
        reader.MoveToContent();
        reader.Read();      // Read <Book>
        var author = reader.ReadElementContentAsString("Author", "");
        var isbn = reader.ReadElementContentAsString("ISBN", "");
        var quote = reader.ReadElementContentAsString("Quote", "");
        var publisher = reader.ReadElementContentAsString("Publisher", "");

        // Assume that <Footer> is optional. This is how to code around that
        var footer = (reader.Name == "Footer") ? reader.ReadElementContentAsString("Footer", "") : null;

        reader.Read();  // Read </Book>
    }
}

Reading Attributes

XmlReader must be positioned on a start element in order to read attributes. Recall that ReadStartElement verifies that the current NodeType is Element and then calls Read, which advances the cursor to the next node. After that call, the attributes of that element can no longer be read. To read attributes, XmlReader provides an indexer that gives direct (random) access to any attribute on the current element, either by name or by position. Using the indexer is the same as calling GetAttribute:

static public void ReadAttributes()
{
    string filename = @"Xml\TestFile1.xml";
    XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
    using (var reader = XmlReader.Create(filename, settings))
    {
        // Skip xml declarations and comments and go to the root element
        reader.MoveToContent();
 
        // Read attributes on all elements
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
 
            // Read attributes on current element
            Trace.WriteLine("reading attributes on element " + reader.Name);
            for (int i = 0; i < reader.AttributeCount; i++)
                Trace.WriteLine(string.Format("attribute {0} = {1}", i, reader[i]));
        }
    }
}

Output:

reading attributes on element Book
attribute 0 = ABC
reading attributes on element Author
reading attributes on element ISBN
reading attributes on element Publisher
reading attributes on element Book
attribute 0 = XYZ
reading attributes on element Author
reading attributes on element ISBN
reading attributes on element Publisher

Attribute Nodes

An attribute is an XML node as defined by the XmlNodeType enum. To explicitly traverse attribute nodes (just as you would traverse element nodes), usually to parse attribute values into other types, you must make a special diversion from calling Read. The diversion must begin from a start element, and the forward-only rule is relaxed during attribute traversal: you can jump to any attribute (forward or backward) using MoveToAttribute. MoveToElement returns you to the start element from anywhere within the attribute node diversion. The following example illustrates:

static public void ReadAttributeNodes()
{
    string filename = @"Xml\TestFile3.xml";
    XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
    using (var reader = XmlReader.Create(filename, settings))
    {
        // Skip xml declarations and comments and go to the root element
        reader.MoveToContent();
 
        // Skip over all nodes until we reach an element with attributes
        while (reader.Read())
        {
            if (!reader.HasAttributes) continue;
 
            // Approach 1 to reading attribute nodes
            // Note that attribute "Author" comes after attribute "Title". Order is not important
            string author = null;
            string title = null;
            if (reader.MoveToAttribute("Author"))
                author = reader.ReadContentAsString();
 
            if (reader.MoveToAttribute("Title"))
                title = reader.ReadContentAsString();
            Trace.WriteLine(string.Format("Author = {0}. Book Title = {1}", author, title));
 
            // Approach 2 to reading attribute nodes. Here you can traverse each attribute
            // in sequence
            if (reader.MoveToFirstAttribute())
                do
                {
                    string attributeName = reader.Name;
                    string attributeValue = reader.Value;
                    Trace.WriteLine(string.Format("Attribute name: {0}, Attribute value: {1}",
                                        attributeName, attributeValue));
 
                } while (reader.MoveToNextAttribute());
        }
    }            
}

Input file (Xml\TestFile3.xml):

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is a comment -->
<Books>
  <Book Title="ABC" Author="Yazan Diranieh" ISBN="12345" />
  <Book Title="XYZ" Author="Yazan Diranieh" ISBN="67890" />
</Books>

Output:

Author = Yazan Diranieh. Book Title = ABC
Attribute name: Title, Attribute value: ABC
Attribute name: Author, Attribute value: Yazan Diranieh
Attribute name: ISBN, Attribute value: 12345
Author = Yazan Diranieh. Book Title = XYZ
Attribute name: Title, Attribute value: XYZ
Attribute name: Author, Attribute value: Yazan Diranieh
Attribute name: ISBN, Attribute value: 67890

Reading Node by Node

Recall that XmlReader provides forward-only (and read-only, non-cached) access to an XML stream or file. The usual scenario for reading XML with XmlReader classes is:

  1. Instantiate an XmlReader as discussed above
  2. Load the reader with an XML stream or file.
  3. Use the XmlReader.Read method within a while loop to loop over all nodes in the document.
  4. Within the while loop, access various methods and properties of the current node. The current node is the node on which the reader is currently positioned. All methods and properties are with respect to the current node. Note that XmlReader properties and methods may not be available on every node type.

The following example illustrates how to read and process the XML document above (Xml\TestFile1.xml) using XmlTextReader, a concrete XmlReader subclass:

static public void BasicUsage()
{
    // XmlTextReader is disposable
    using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            DisplayNodeInfo(reader);
        }                
    }            
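}

// Takes the XmlReader base type so that it accepts both XmlTextReader instances
// and readers returned by XmlReader.Create
private static void DisplayNodeInfo(XmlReader reader)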
{
    Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
            reader.Name, reader.NodeType,
            string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : 
					        reader.Value));
	 
    // Are there attributes on the current node?
    if (reader.HasAttributes)
    {
        // Current node has attributes
        Trace.WriteLine( string.Format("Current node has {0} attributes", reader.AttributeCount));
 
        // Loop over all attributes for current node
        for (int i = 0; i < reader.AttributeCount; i++)
        {
            // When positioned on an element, use MoveToAttribute to move to a specific attribute
            reader.MoveToAttribute(i);
            Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType, 
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : 
					            reader.Value));
        }
 
        // Move the reader back to the element node that owned the last attribute
        reader.MoveToElement();
        Trace.WriteLine(string.Format("Moved reader back to Node Name: {0}. Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType, 
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) :  
						   reader.Value));
    }
}
private static string GetEmptyStringDisplay(string data)
{
    if (data == string.Empty)
        return "(Empty)";
 
    string display = string.Empty;
    foreach (char c in data)
    {
        switch (c)
        {
            case '\r':
                display += "\\r";
                break;
            case '\n':
                display += "\\n";
                break;
            case ' ':
            default:
                display += "_";
                break;
        }
    }
    return display;
}

Output (indented for clarity)

Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Current node has 2 attributes
Node Name: version, Node Type: Attribute, Node Value: 1.0
Node Name: encoding, Node Type: Attribute, Node Value: utf-8
Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: , Node Type: Comment, Node Value:  This is a comment 
Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: Books, Node Type: Element, Node Value: (Empty)
Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: ABC
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 12345
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 1
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: XYZ
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 6789
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n________
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 2
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
		Node Name: , Node Type: Whitespace, Node Value: \n____
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: , Node Type: Whitespace, Node Value: \n
Node Name: Books, Node Type: EndElement, Node Value: (Empty)

Using XmlReaderSettings we can ignore Whitespace nodes to give the following output:

static public void BasicUsage2()
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
    {
        // Loop over all nodes
        while (reader.Read())
        {
            DisplayNodeInfo(reader);
        }
    }
}

Output:

Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Current node has 2 attributes
Node Name: version, Node Type: Attribute, Node Value: 1.0
Node Name: encoding, Node Type: Attribute, Node Value: utf-8
Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
Node Name: , Node Type: Comment, Node Value:  This is a comment 
Node Name: Books, Node Type: Element, Node Value: (Empty)
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: ABC
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 12345
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 1
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: Element, Node Value: (Empty)
	Current node has 1 attributes
	Node Name: Name, Node Type: Attribute, Node Value: XYZ
	Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
		Node Name: Author, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Yazan Diranieh
		Node Name: Author, Node Type: EndElement, Node Value: (Empty)
		Node Name: ISBN, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: 6789
		Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
		Node Name: Publisher, Node Type: Element, Node Value: (Empty)
		Node Name: , Node Type: Text, Node Value: Publisher 2
		Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
	Node Name: Book, Node Type: EndElement, Node Value: (Empty)
Node Name: Books, Node Type: EndElement, Node Value: (Empty)

Note the usage of MoveToAttribute to go through the attribute list of the current element. After this method has been called, certain properties such as Name, NamespaceURI, and Prefix reflect the properties of the attribute and not those of its containing element.

Also note that an attribute does not always need to be specified on an element. In many cases, a DTD or a schema defines default values for attributes on elements. When using methods that move among attributes, attributes that receive values from a DTD or a schema are acted upon just like attributes that are given values in the XML stream. You can programmatically determine if an attribute has received its value from the DTD or schema using the IsDefault property which returns true if the current node is an attribute that was generated from the default value defined in the DTD or schema.

Note: IsDefault does not apply to all XmlReader-derived classes. For example, XmlTextReader.IsDefault always returns false because XmlTextReader does not expand default attributes from DTDs or schemas. On the other hand, XmlValidatingReader.IsDefault returns true if the current node is an attribute whose value was generated from the default value defined in the DTD or schema, and false if the value was explicitly specified in the XML stream.

Reading Inner and Outer Content

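Another way to read XML with XmlReader-derived classes is to use the ReadInnerXml and ReadOuterXml methods. ReadInnerXml returns all the content of the current node, including markup, but without the node's own start and end tags. ReadOuterXml returns the same content together with the markup of the current node itself.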

The following tables illustrate:

Node type: Element
Content:
    <Book>
       <Author>Yazan</Author>
    </Book>
ReadInnerXml returns:
    <Author>Yazan</Author>
ReadOuterXml returns:
    <Book>
       <Author>Yazan</Author>
    </Book>
Position after ReadInnerXml or ReadOuterXml:
    After the end tag </Book>

Node type: Attribute
Content:
    <Books Publisher="Somebody"/>
ReadInnerXml returns:
    Somebody
ReadOuterXml returns:
    Publisher="Somebody"
Position after ReadInnerXml or ReadOuterXml:
    Remains on the attribute node Publisher
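The element and attribute cases can be tried out with a sketch along these lines. The class and method names are illustrative assumptions, and the XML is passed in as a string to keep the example self-contained.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper class, named for illustration only.
public static class InnerOuterSketch
{
    // Returns the content of the root element without its own tags.
    public static string ElementInner(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();          // position on the root element
            return reader.ReadInnerXml();
        }
    }

    // Returns the root element including its own start and end tags.
    public static string ElementOuter(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }

    // Returns the value of the named attribute on the root element,
    // or null when the attribute is absent.
    public static string AttributeInner(string xml, string attributeName)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (!reader.MoveToAttribute(attributeName))
                return null;
            return reader.ReadInnerXml();    // reader stays on the attribute
        }
    }
}
```

Both methods build a string for the whole subtree, so they are best suited to small sections of a document rather than to very large elements.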

Skipping Content

Content can be skipped in two ways:

  1. Move directly to content using MoveToContent.
  2. Skip over child nodes of the current node using Skip.

Moving directly to content

A content node is defined to be any of the following: non-white-space Text, CDATA, Element, EndElement, EntityReference, or EndEntity.

You can move to any of these nodes using MoveToContent. If the current node is already a content node, MoveToContent stays on it and does not move to a new content node. Otherwise, it skips forward until it finds the next content node or reaches the end of the file. This implies that MoveToContent skips over the following node types: XmlDeclaration, ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace.

This type of navigation is more efficient when the application only needs to deal with content. The alternative is to call Read to move to the next node, determine the type of that node, and finally read its content if the node type supports content.

If the reader is positioned on an attribute, calling MoveToContent moves the current node position to the element that owns the attribute. MoveToContent returns an XmlNodeType that specifies the type of the current node (this information is typically used to skip over random XML). The following example illustrates how to use MoveToContent with the same XML file as above (Xml\TestFile1.xml):

static public void ContentUsage()
{
    XmlNodeType nodeType;
    using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
    {                
        // Calling MoveToContent multiple times does not move the reader to the next content
        nodeType = reader.MoveToContent();
        DisplayNodeInfo(reader);
        nodeType = reader.MoveToContent();
        DisplayNodeInfo(reader);
 
        while (reader.Read())
        {
            Trace.WriteLine(string.Format("Before MoveToContent: Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType,
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));

            reader.MoveToContent();

            Trace.WriteLine(string.Format("After MoveToContent Node Name: {0}, Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType,
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));

            Trace.WriteLine("");
        }
    }
}

Output:

Node Name: Books, Node Type: Element, Node Value: (Empty)
Node Name: Books, Node Type: Element, Node Value: (Empty)

Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n____
After MoveToContent   Node Name: Book,      Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Author,    Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Yazan Diranieh
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Yazan Diranieh
 
Before MoveToContent: Node Name: Author,    Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Author,    Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: ISBN,      Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: 12345
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: 12345
 
Before MoveToContent: Node Name: ISBN,      Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: ISBN,      Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Publisher, Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Publisher 1
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Publisher 1
 
Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n____
After MoveToContent   Node Name: Book,     Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n____
After MoveToContent   Node Name: Book,     Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n________
After MoveToContent   Node Name: Author,   Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Text,        Node Value: Yazan Diranieh
After MoveToContent   Node Name: ,         Node Type: Text,        Node Value: Yazan Diranieh
 
Before MoveToContent: Node Name: Author,   Node Type: EndElement,  Node Value: (Empty)
After MoveToContent   Node Name: Author,   Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Whitespace,  Node Value: \n________
After MoveToContent   Node Name: ISBN,     Node Type: Element,     Node Value: (Empty)
 
Before MoveToContent: Node Name: ,         Node Type: Text,        Node Value: 6789
After MoveToContent   Node Name: ,         Node Type: Text,        Node Value: 6789
 
Before MoveToContent: Node Name: ISBN,     Node Type: EndElement,  Node Value: (Empty)
After MoveToContent   Node Name: ISBN,     Node Type: EndElement,  Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n________
After MoveToContent   Node Name: Publisher, Node Type: Element,    Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Text,       Node Value: Publisher 2
After MoveToContent   Node Name: ,          Node Type: Text,       Node Value: Publisher 2
 
Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
After MoveToContent   Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n____
After MoveToContent   Node Name: Book,      Node Type: EndElement, Node Value: (Empty)
 
Before MoveToContent: Node Name: ,          Node Type: Whitespace, Node Value: \n
After MoveToContent   Node Name: Books,     Node Type: EndElement, Node Value: (Empty)

Skipping over content

The Skip method is used to move over the current element. In other words, if the node type is XmlNodeType.Element, then calling Skip moves over all of the content of the current element and its end tag. For example, if you have the following XML:

<a name="MyName" ID="1">
    <x>123</x>
    <y>456</y>
</a>
<b>
...
</b>

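If you are positioned on the <a> node or any of its attributes, calling Skip positions you on the node that follows </a>, which is the <b> element (assuming insignificant white space is ignored; otherwise you land on the white space node in between). Likewise, if you are positioned on the <x> node, calling Skip positions you on the next element, which is <y>.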
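The behavior of Skip can be checked with a small console sketch. The class name SkipDemo, the wrapping <root> element (added so that the sample is a well-formed document), and the content of <b> are assumptions made for illustration; IgnoreWhitespace is set so that Skip lands directly on the next element rather than on a white space node.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // Hypothetical sample: the <a>/<b> fragment wrapped in a <root> element.
    public const string Xml =
        "<root><a name=\"MyName\" ID=\"1\"><x>123</x><y>456</y></a><b>789</b></root>";

    // Moves to the first element with the given name, calls Skip,
    // and returns the name of the node the reader lands on.
    public static string NameAfterSkipping(string elementName)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml), settings))
        {
            reader.ReadToFollowing(elementName);
            reader.Skip(); // jumps over the element's content and its end tag
            return reader.Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine(NameAfterSkipping("a")); // b
        Console.WriteLine(NameAfterSkipping("x")); // y
    }
}
```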

Namespaces and Prefixes

XmlReader provides two parallel systems for referring to element and attribute names: the Name property, and the NamespaceURI and LocalName pair.

You're using the first system whenever you read an element's Name property or call a method that accepts a single name argument. This works well if no namespaces or prefixes are present. If namespaces and prefixes are present, namespaces are ignored and prefixes become part of the name:

XML Fragment                 Name      Code
<book ...>                   book      reader.ReadStartElement("book")
<book xmlns='blah' .../>     book      reader.ReadStartElement("book")
<x:book ... />               x:book    reader.ReadStartElement("x:book")

The second system works through two namespace-aware properties: LocalName and NamespaceURI. These properties take into account prefixes and default namespaces defined by parent elements. Prefixes are automatically expanded, meaning that NamespaceURI always reflects the correct namespace for the current element, and LocalName is always free of prefixes.

When you pass two name arguments into a method such as ReadStartElement, you're using this name system. For example, consider this XML:

<customer xmlns="MyNamespace" xmlns:other="MyOtherNamespace">
  <address>
    <other:city>SomeCity</other:city>    
    ...
  </address>    
  ...
</customer>

We could read this as follows. Note that prefixes are abstracted away:

reader.ReadStartElement("customer", "MyNamespace");
reader.ReadStartElement("address", "MyNamespace");
reader.ReadStartElement("city", "MyOtherNamespace");

If required, you can see which prefix was used through the Prefix property and convert it into a namespace by calling LookupNamespace.
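As a rough illustration of the two naming systems side by side, the following sketch reads down to the city element and reports what each property returns. The class name NamespaceDemo and the trimmed-down customer document are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NamespaceDemo
{
    // Trimmed-down version of the customer document (the "..." parts are omitted).
    public const string Xml =
        "<customer xmlns=\"MyNamespace\" xmlns:other=\"MyOtherNamespace\">" +
        "<address><other:city>SomeCity</other:city></address></customer>";

    // Returns "Name|LocalName|NamespaceURI|Prefix|LookupNamespace(Prefix)"
    // for the city element.
    public static string DescribeCity()
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            // Two-argument overloads match on local name and namespace URI.
            reader.ReadStartElement("customer", "MyNamespace");
            reader.ReadStartElement("address", "MyNamespace");

            // The reader is now positioned on <other:city>.
            return string.Join("|",
                reader.Name,
                reader.LocalName,
                reader.NamespaceURI,
                reader.Prefix,
                reader.LookupNamespace(reader.Prefix));
        }
    }

    public static void Main()
    {
        // other:city|city|MyOtherNamespace|other|MyOtherNamespace
        Console.WriteLine(DescribeCity());
    }
}
```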

Obsolete Classes

The .NET Framework extends XmlReader in the following classes: XmlTextReader, XmlNodeReader, and XmlValidatingReader.

The following table describes the implementation of each of these classes. Recall that they all derive from XmlReader and hence they all provide forward-only, read-only, non-cached access to XML:

XmlReader-derived class | Description
XmlTextReader | Reads text (also known as character streams). Has no support for DTD or schema.
XmlNodeReader | Reads DOM nodes. In other words, it takes an XmlNode object and reads whatever nodes it finds in it. Has no support for DTD or schema, but can resolve entities defined in a DTD.
XmlValidatingReader | A reader that provides DTD, XSD, and XDR schema validation. Takes an XmlTextReader and adds validation services to it.

The table below describes which reader to use for each scenario:

Scenario | Reader to use | ValidationType property | XmlResolver | Normalization property
Do not need any DTD or schema support but require good performance. | XmlTextReader | Not available | Set to null reference | false
Require XML to be well-formed, including external entities | XmlTextReader | Not available | Set to non-null reference | true
Require XML to be well-formed and valid according to a DTD or schema | XmlValidatingReader | Auto or DTD | Set to non-null reference | Set to true on XmlTextReader before passing XmlTextReader to XmlValidatingReader
Require XML to be well-formed when streaming XML data from an XmlNode | XmlNodeReader | Not available | Set to non-null reference | Not available

Note that when the Normalization property is true, the reader normalizes white space in attribute values and performs character-range checking. As a result, setting Normalization = true lowers performance. XmlResolver, on the other hand, is used to resolve external DTD and schema location references, and to handle any import or include elements found in an XSD. Setting the XmlResolver property to null means that the application will not resolve any external references: if an external DTD or entity is encountered, it is ignored.

XmlTextReader

The previous section showed how to use various features found in all XmlReader-derived classes. This section focuses on features specific to the XmlTextReader class. XmlTextReader is an XmlReader-derived class that provides a fast XML parser. It enforces the rule that the XML must be well-formed, but it does not perform any validation and has no support for DTDs or XSD schemas. XmlTextReader provides the following functionality:

Full Content Reads

The XmlTextReader methods ReadChars, ReadBinHex, and ReadBase64 are used to read large streams of embedded content. The difference is that ReadChars reads content as text, ReadBase64 decodes Base64-encoded content into bytes, and ReadBinHex decodes BinHex-encoded content into bytes (which can be useful if the XML contains international text, images, or even video). These methods perform as follows:

All three methods (ReadChars, ReadBinHex, and ReadBase64) have the following signature, except that for ReadChars the first parameter is char[] rather than byte[]:

ReadX( byte[] Buffer, int BufferOffset, int DataLength );

The first parameter refers to the buffer where content will be written, the second parameter refers to the location within the buffer where content should be written, and the last parameter specifies the number of characters/bytes to write to the buffer. Being able to specify a buffer and how much to read means that these three methods can be very efficient for processing very large streams of text/data embedded within the XML. Otherwise, you would have to allocate and deal with very large arrays.

The overall pattern for using these methods is simple: position the reader on an element using any of the XmlTextReader methods, then call one of these methods successively to read the content a chunk at a time. In the following example, an XML document containing an image is loaded and displayed:

<Image name="Sunset">
ABC123DDcDAEB563 ...
</Image>

private void btnReadBinHex_Click(object sender, System.EventArgs e)
{
    try    
    {
        // Load XML document and jump to the <Image> document element
        XmlTextReader reader = new XmlTextReader( @"..\..\ReadHexBinData.xml" );
        XmlNodeType nodeType = reader.MoveToContent();

        // Read the image into a memory stream. Note how ReadBase64 is called
        // successively to read the entire image data

        System.IO.MemoryStream memstream = new System.IO.MemoryStream();
        byte[] abData = new byte[100];
        int nByteCount = reader.ReadBase64( abData, 0, 100 );
        while (nByteCount != 0)
        {
            memstream.Write( abData, 0, nByteCount );
            nByteCount = reader.ReadBase64( abData, 0, 100 );
        }

        // Rewind the stream so the image is decoded from the start, then display it
        memstream.Position = 0;
        pictureBox1.Image = Bitmap.FromStream( memstream );
    }
    catch (Exception ex)
    {
        Trace.WriteLine( ex.Message );
    }
}
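
The same chunked-read pattern can be tried without a Windows Forms project. The sketch below is a console variant under a few assumptions: the class name ChunkedReadDemo, the in-memory sample document, and the 16-byte chunk size are all made up for illustration. It Base64-encodes 256 known bytes into an <Image> element and decodes them back with successive ReadBase64 calls.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;

public static class ChunkedReadDemo
{
    // Decodes the Base64 content of the document element, chunkSize bytes at a time.
    public static byte[] ReadPayload(string xml, int chunkSize)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        using (var output = new MemoryStream())
        {
            reader.MoveToContent(); // position on the document element

            var buffer = new byte[chunkSize];
            int read;
            // ReadBase64 returns 0 once the element content is exhausted.
            while ((read = reader.ReadBase64(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical payload: the byte values 0..255.
        byte[] original = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        string xml = "<Image name=\"Sunset\">" + Convert.ToBase64String(original) + "</Image>";

        byte[] decoded = ReadPayload(xml, 16);
        Console.WriteLine(decoded.SequenceEqual(original));
    }
}
```

The buffer stays at a fixed size no matter how large the embedded content is, which is the point of these methods.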

Document Type Information
DOCTYPE Review

To better understand the DOCTYPE declaration, a brief overview of Document Type Definitions (DTDs) is required. An XML processor uses a DTD (a collection of markup declarations referenced from a DOCTYPE declaration) at runtime to validate a given XML file against a predefined structure. DTD syntax can be a bit overwhelming, but it basically uses different syntactical elements (such as !, (, ), * and others) to define which elements are required in the XML document, which elements are optional, how many occurrences of a given element are allowed, and so on. For example, consider the following DTD document saved in food.dtd:

<!ELEMENT hamburgers    (hamburger*)>
<!ELEMENT hamburger     (name, description, price)>
<!ATTLIST hamburger     lowfat CDATA #IMPLIED>
<!ELEMENT name          (#PCDATA)>
<!ELEMENT description   (#PCDATA)>
<!ELEMENT price         (#PCDATA)>

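This document says that a hamburgers element contains zero or more hamburger elements; that each hamburger element contains a name, a description, and a price element, in that order; that a hamburger element may carry an optional lowfat attribute holding character data; and that the name, description, and price elements contain text only.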

An XML document that wants to conform to this DTD would then have to add the following line:

<!--  This statement tells the parser to validate the content of the containing
XML document against the schema described in the DTD -->

<!DOCTYPE hamburgers SYSTEM "food.dtd" >
...

Based on the above, a DOCTYPE declaration allows a document to identify its root element and the associated Document Type Definition (DTD) by reference to an external file. A DOCTYPE declaration is usually required if the document is to be processed by a validating environment.

A DOCTYPE declaration usually contains the following:

 Note the following DOCTYPE declarations:

XmlTextReader and DOCTYPE

The XmlTextReader ensures that the DOCTYPE declaration is well formed but does not do any DTD validation. When you call Read on a DOCTYPE, the returned NodeType is DocumentType. The PublicLiteral and SystemLiteral are considered attributes, whose names are PUBLIC and SYSTEM, respectively. To retrieve the content of these two attributes use GetAttribute or any other attribute accessing method. For example, if you have the following DOCTYPE declaration:

<!DOCTYPE Books SYSTEM "\\SomeMachineName\DTD\books.dtd" [<!ENTITY e 'ent'>] >

reader.GetAttribute( "SYSTEM" );        // "\\SomeMachineName\DTD\books.dtd"
reader.Value;                           // <!ENTITY e 'ent'> 
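
A minimal console sketch of reading the DOCTYPE node follows. The class name DocTypeDemo, the sample document, and the relative system identifier "books.dtd" are assumptions for illustration. XmlResolver is set to null so that the reader does not try to fetch the external DTD, and DtdProcessing is set explicitly to Parse so that the DOCTYPE is accepted.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DocTypeDemo
{
    // Hypothetical document with a SYSTEM identifier and an internal subset.
    public const string Xml =
        "<!DOCTYPE Books SYSTEM \"books.dtd\" [<!ENTITY e 'ent'>]><Books/>";

    // Returns { root name, SYSTEM literal, internal subset } of the DOCTYPE node,
    // or null if the document has no DOCTYPE.
    public static string[] ReadDocType(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.DtdProcessing = DtdProcessing.Parse; // accept the DOCTYPE
            reader.XmlResolver = null;                  // do not resolve the external DTD

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.DocumentType)
                {
                    return new[]
                    {
                        reader.Name,                   // root element name
                        reader.GetAttribute("SYSTEM"), // system literal
                        reader.Value                   // internal subset
                    };
                }
            }
            return null;
        }
    }

    public static void Main()
    {
        string[] info = ReadDocType(Xml);
        Console.WriteLine("Root:   " + info[0]);
        Console.WriteLine("SYSTEM: " + info[1]);
        Console.WriteLine("Subset: " + info[2]);
    }
}
```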

Handling White Space

White space is any of the following characters: space (0x20), tab (0x09), carriage return (0x0D), and line feed (0x0A).

White space is categorized as significant and insignificant as follows:

The XmlTextReader.WhitespaceHandling property determines which white space nodes the reader returns: all of them, only significant ones, or none. This property can be changed at any time and takes effect on the next Read operation. In the following example, underscores mark white space. The white space inside the subitem element, which has xml:space="preserve", is significant; all other white space is insignificant:

<test>_
____<item>_
________<subitem xml:space="preserve">_
_________</subitem>_
____</item>_
</test>_
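
To see the effect of each WhitespaceHandling value, the following sketch counts the Whitespace and SignificantWhitespace nodes returned for a document shaped like the one above. The class name WhitespaceDemo and the exact indentation of the sample are assumptions; the sample has no white space after the closing </test> tag.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Same shape as the example above, with real white space instead of underscores.
    public const string Xml =
        "<test>\n" +
        "    <item>\n" +
        "        <subitem xml:space=\"preserve\">\n" +
        "         </subitem>\n" +
        "    </item>\n" +
        "</test>";

    // Returns "insignificant/significant" white space node counts.
    public static string CountWhitespace(WhitespaceHandling handling)
    {
        int insignificant = 0, significant = 0;
        using (var reader = new XmlTextReader(new StringReader(Xml)))
        {
            reader.WhitespaceHandling = handling;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace) insignificant++;
                else if (reader.NodeType == XmlNodeType.SignificantWhitespace) significant++;
            }
        }
        return insignificant + "/" + significant;
    }

    public static void Main()
    {
        Console.WriteLine(CountWhitespace(WhitespaceHandling.All));         // 4/1
        Console.WriteLine(CountWhitespace(WhitespaceHandling.Significant)); // 0/1
        Console.WriteLine(CountWhitespace(WhitespaceHandling.None));        // 0/0
    }
}
```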

Attribute Value Normalization

XmlTextReader.Normalization property determines whether white space normalization and attribute normalization should be done or not.  The following describes what is meant by normalization:

For example, given the following XML fragment:

<item attr1=' test A B C
           1 2 3'/>
<item attr2='&#01;'/>

Normalization affects how the attr1 attribute is read, as follows:

// reader.Normalization = false;
Attribute value: test A B C
           1 2 3

//reader.Normalization = true;
Attribute value: test A B C             1 2 3
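
Both effects of the Normalization property can be observed with a short console sketch. The class name NormalizationDemo and its helper methods are assumptions made for illustration. The first helper returns an attribute value with or without normalization; the second reports whether the out-of-range character reference &#01; in attr2 is accepted.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NormalizationDemo
{
    public const string Attr1Xml = "<item attr1=' test A B C\n           1 2 3'/>";
    public const string Attr2Xml = "<item attr2='&#01;'/>";

    // Reads the named attribute of the document element.
    public static string ReadAttribute(string xml, string name, bool normalize)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Normalization = normalize;
            reader.MoveToContent();
            return reader.GetAttribute(name);
        }
    }

    // Returns true if the document parses, false if the reader rejects it.
    public static bool Parses(string xml, bool normalize)
    {
        try
        {
            ReadAttribute(xml, "attr2", normalize);
            return true;
        }
        catch (XmlException)
        {
            return false; // character-range checking rejected the reference
        }
    }

    public static void Main()
    {
        // The line break survives only when normalization is off.
        Console.WriteLine(ReadAttribute(Attr1Xml, "attr1", false).Contains("\n")); // True
        Console.WriteLine(ReadAttribute(Attr1Xml, "attr1", true).Contains("\n"));  // False

        // &#01; is accepted only when normalization (and range checking) is off.
        Console.WriteLine(Parses(Attr2Xml, false)); // True
        Console.WriteLine(Parses(Attr2Xml, true));  // False
    }
}
```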

XmlNodeReader

XmlNodeReader provides a reader over a given DOM node sub-tree. Because XmlNodeReader derives from XmlReader, it provides fast, non-cached, forward-only access to XML data in an XmlNode. XmlNodeReader provides the following functionality:

For example:

XmlNode node = GetNodeFromDocument( doc );
XmlNodeReader reader = new XmlNodeReader( node );

while (reader.Read())
{
    // Process node
}
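
A self-contained variant of this pattern might look like the sketch below. The class name NodeReaderDemo, the sample Books document, and the XPath expression used to pick the sub-tree are assumptions for illustration; they stand in for the GetNodeFromDocument helper, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeReaderDemo
{
    public const string Xml =
        "<Books>" +
        "<Book Name='ABC'><Author>Yazan Diranieh</Author></Book>" +
        "<Book Name='XYZ'><Author>Yazan Diranieh</Author></Book>" +
        "</Books>";

    // Reads only the sub-tree selected by the XPath expression and
    // returns the names of the elements found in it, in document order.
    public static string ElementNamesUnder(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode node = doc.SelectSingleNode(xpath);

        var names = new List<string>();
        using (var reader = new XmlNodeReader(node))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return string.Join(",", names);
    }

    public static void Main()
    {
        Console.WriteLine(ElementNamesUnder("/Books/Book[@Name='XYZ']")); // Book,Author
    }
}
```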

XmlValidatingReader

XmlValidatingReader represents a reader that provides DTD and XML Schema validation. It is similar to XmlTextReader, except that XmlValidatingReader performs validation as well. The overall pattern for using XmlValidatingReader is:

Validating XML is covered in more depth in the Validation of XML with Schemas chapter.
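
Because XmlValidatingReader is listed among the obsolete classes, the sketch below shows the same general idea (read the document while collecting validation errors through an event handler) using XmlReader.Create with XmlReaderSettings instead. This is a swapped-in technique, not the XmlValidatingReader API itself. The class name DtdValidationDemo and the sample documents, which embed the food.dtd declarations as an internal subset, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DtdValidationDemo
{
    // The food.dtd declarations, embedded as an internal subset.
    private const string DocType =
        "<!DOCTYPE hamburgers [" +
        "<!ELEMENT hamburgers (hamburger*)>" +
        "<!ELEMENT hamburger (name, description, price)>" +
        "<!ATTLIST hamburger lowfat CDATA #IMPLIED>" +
        "<!ELEMENT name (#PCDATA)>" +
        "<!ELEMENT description (#PCDATA)>" +
        "<!ELEMENT price (#PCDATA)>" +
        "]>";

    public const string ValidXml = DocType +
        "<hamburgers><hamburger lowfat='yes'><name>Plain</name>" +
        "<description>Basic</description><price>2.50</price></hamburger></hamburgers>";

    // Invalid: the required <description> element is missing.
    public const string InvalidXml = DocType +
        "<hamburgers><hamburger><name>Plain</name><price>2.50</price></hamburger></hamburgers>";

    // Reads the whole document and returns the validation error messages.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // allow the DOCTYPE
            ValidationType = ValidationType.DTD,  // validate against the DTD
            XmlResolver = null                    // no external references
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as nodes are read
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine("Valid document errors:   " + Validate(ValidXml).Count);
        Console.WriteLine("Invalid document errors: " + Validate(InvalidXml).Count);
    }
}
```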