The System.Xml namespace contains the following namespaces and core classes:
Namespace | Core Classes |
System.Xml.* | XmlReader and XmlWriter: Forward-only, non-cached reader and writer. |
System.Xml.XPath | Infrastructure and API for XPath. |
System.Xml.Schema | Infrastructure and API for XSD schemas. |
System.Xml.Xsl | Infrastructure and API for XSLT transformations. |
System.Xml.Serialization | Serializing objects to XML and deserializing XML to objects. |
System.Xml.Linq | LINQ-centric version of XmlDocument. |
XmlReader is an abstract base class used to provide non-cached, forward-only, read-only access. XmlReader has methods and properties to:
When reading XML, XmlReader checks that the XML is well-formed and throws XmlException if an error is encountered. Because XmlReader allows you to read from potentially slow resources (streams and URIs), it offers asynchronous versions of most of its methods so that you can easily write non-blocking code.
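The well-formedness check can be observed with a small sketch like the one below. The class and method names (WellFormednessSketch, FindFirstError) are hypothetical and exist only for illustration; the sketch assumes the XML is supplied as an in-memory string rather than a file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormednessSketch
{
    // Returns null when the text is well-formed XML; otherwise returns a
    // description of the first error reported by the reader.
    public static string FindFirstError(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // The check happens as nodes are read, so the whole input must be consumed.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

Because the reader is non-cached, an error is only discovered when the cursor reaches it; nodes before the error are still delivered normally.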
The typical usage for XmlReader is to instantiate an XmlReader using XmlReader.Create, passing in:
A stream, a TextReader, or a URI string.
Optionally, an XmlReaderSettings object that controls parsing options such as IgnoreWhitespace.
Another useful property on XmlReaderSettings is ConformanceLevel. Its default value of Document instructs the reader to assume a valid XML document with a single root node. If you wanted to read just the inner portion of the file, containing multiple nodes, set the ConformanceLevel to Fragment:
    <Book Name="ABC">
      <Author>Yazan Diranieh</Author>
      <ISBN>12345</ISBN>
      <Publisher>Publisher 1</Publisher>
    </Book>
    <Book Name="XYZ">
      <Author>Yazan Diranieh</Author>
      <ISBN>6789</ISBN>
      <Publisher>Publisher 2</Publisher>
    </Book>
Another useful property on XmlReaderSettings is CloseInput, which indicates whether to close the underlying stream when the reader is closed. The default value is false.
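As a rough sketch of how these settings fit together, the following reads a fragment that has more than one top-level element. FragmentReadingSketch and ReadBookNames are hypothetical names, and the fragment is assumed to be held in a string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentReadingSketch
{
    // Collects the Name attribute of every <Book> element in an XML fragment.
    public static List<string> ReadBookNames(string xmlFragment)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // allow several top-level nodes
            IgnoreWhitespace = true
        };

        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xmlFragment), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Book")
                    names.Add(reader.GetAttribute("Name"));
            }
        }
        return names;
    }
}
```

With the default ConformanceLevel of Document, the same input would cause an XmlException as soon as the reader reached the second top-level element.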
The units of an XML stream are XML nodes. The following are all possible nodes that you may encounter with XmlReader:
    public enum XmlNodeType
    {
        None = 0,                    // Read method has not been called
        Element = 1,
        Attribute = 2,
        Text = 3,                    // The text content of a node.
        CDATA = 4,                   // For example, <![CDATA[my escaped text]]>
        EntityReference = 5,         // A reference to an entity
        Entity = 6,                  // For example, <!ENTITY...>
        ProcessingInstruction = 7,   // For example, <?pi test?>
        Comment = 8,
        Document = 9,                // A document object that, as the root of the document tree,
                                     // provides access to the entire XML document.
        DocumentType = 10,           // For example, <!DOCTYPE...>
        DocumentFragment = 11,
        Notation = 12,               // For example, <!NOTATION...>
        Whitespace = 13,             // White space between markup.
        SignificantWhitespace = 14,  // White space between markup in a mixed content model
                                     // or white space within the xml:space="preserve" scope.
        EndElement = 15,             // An end element tag (for example, </item>).
        EndEntity = 16,              // Returned when XmlReader gets to the end of the entity
                                     // replacement as a result of a call to XmlReader.ResolveEntity()
        XmlDeclaration = 17,         // For example, <?xml version='1.0'?>
    }
The reader traverses the stream in textual (depth-first) order. The Depth property returns the current depth of the cursor. The most primitive way to read from an XmlReader is to call Read(), which advances to the next node in the stream (just like MoveNext in IEnumerator). The first call to Read positions the cursor at the first node. Successive calls to Read advance the cursor to the next node (where an XML node can be any of those found in the XmlNodeType enum), until the cursor advances past the last node, at which point Read returns false and the reader should be closed and abandoned.
The following shows how to read each node and output its type (note that attributes are not included in read-based traversal):
    static public void BasicUsage3()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
        {
            // Loop over all nodes
            while (reader.Read())
            {
                Trace.Write(new string(' ', reader.Depth * 2));
                Trace.WriteLine(reader.NodeType);
            }
        }
    }
    <?xml version="1.0" encoding="utf-8" ?>
    <!-- This is a comment -->
    <Books>
      <Book Name="ABC">
        <Author>Yazan Diranieh</Author>
        <ISBN>12345</ISBN>
        <Publisher>Publisher 1</Publisher>
      </Book>
      <Book Name="XYZ">
        <Author>Yazan Diranieh</Author>
        <ISBN>6789</ISBN>
        <Publisher>Publisher 2</Publisher>
      </Book>
    </Books>
    XmlDeclaration
    Comment
    Element
      Element
        Element
          Text
        EndElement
        Element
          Text
        EndElement
        Element
          Text
        EndElement
      EndElement
      Element
        Element
          Text
        EndElement
        Element
          Text
        EndElement
        Element
          Text
        EndElement
      EndElement
    EndElement
When writing and debugging XmlReader code, two important string properties on XmlReader are Name and Value. Depending on the node type, either Name or Value or both are populated.
Quick overview of CDATA, ENTITY, and DOCTYPE:
CDATA: All text in an XML document is parsed except for text inside a CDATA section. A CDATA section starts with <![CDATA[ and ends with ]]>. Characters like "<" and "&" are illegal in XML element content because "<" is interpreted as the start of an element while "&" is interpreted as the start of an entity reference. JavaScript, for example, uses "<", "&" and many other such characters as operators; a CDATA section can be used to avoid XML errors when the XML document contains JavaScript code:
    <script>
    <![CDATA[
    function matchwo(a,b)
    {
        if (a < b && a < 0)
        {
            return 1;
        }
        else
        {
            return 0;
        }
    }
    ]]>
    </script>
DOCTYPE: A doctype declaration can also be used to define special characters and character strings. An entity is like a macro. For example, the following DOCTYPE defines three entities (macros) named nbsp, writer and copyright. Note the syntax used to refer to an entity in the <footer> element:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE note [
    <!ENTITY nbsp " ">
    <!ENTITY writer "Writer: Donald Duck.">
    <!ENTITY copyright "Copyright: W3Schools.">
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
      <footer>&writer; &copyright;</footer>
    </note>
When this XML document is parsed, the output will be (note the contents of <footer>):
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
      <footer>Writer: Donald Duck. Copyright: W3Schools.</footer>
    </note>
Back to XmlReader, the following XML is used which includes a document type, entity, CDATA, and a comment:
    <?xml version="1.0" encoding="utf-8" ?>
    <!-- This is a comment -->
    <!DOCTYPE Book [
    <!ENTITY nbsp " ">
    <!ENTITY writer "Writer: Yazan Diranieh.">
    <!ENTITY web "www.diranieh.com">
    ]>
    <Book Name="ABC">
      <Author>Yazan Diranieh</Author>
      <ISBN>12345</ISBN>
      <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
      <Publisher>Publisher 1</Publisher>
      <Footer>&writer; (&web;)</Footer>
    </Book>
    static public void NameAndValue()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.DtdProcessing = DtdProcessing.Parse;
        using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
        {
            while (reader.Read())
            {
                Trace.Write(reader.NodeType.ToString().PadRight(20, '-'));
                Trace.Write("> ".PadRight(reader.Depth * 5));
                Trace.WriteLine(string.Format("Name = {0}, Value = {1}", reader.Name, reader.Value));
            }
        }
    }
    XmlDeclaration------> Name = xml, Value = version="1.0" encoding="utf-8"
    Comment-------------> Name = , Value = This is a comment
    DocumentType--------> Name = Book, Value = <!ENTITY nbsp " "> <!ENTITY writer "Writer: Yazan Diranieh."> <!ENTITY web "www.diranieh.com">
    Element-------------> Name = Book, Value =
    Element-------------> Name = Author, Value =
    Text----------------> Name = , Value = Yazan Diranieh
    EndElement----------> Name = Author, Value =
    Element-------------> Name = ISBN, Value =
    Text----------------> Name = , Value = 12345
    EndElement----------> Name = ISBN, Value =
    Element-------------> Name = Quote, Value =
    CDATA---------------> Name = , Value = C# operators include <, >, & and others
    EndElement----------> Name = Quote, Value =
    Element-------------> Name = Publisher, Value =
    Text----------------> Name = , Value = Publisher 1
    EndElement----------> Name = Publisher, Value =
    Element-------------> Name = Footer, Value =
    Text----------------> Name = , Value = Writer: Yazan Diranieh. (www.diranieh.com)
    EndElement----------> Name = Footer, Value =
    EndElement----------> Name = Book, Value =
Note how XmlReader automatically resolved entities.
When the structure of the XML is known, XmlReader helps by providing a range of methods that read while presuming a specific structure.
ReadStartElement verifies that the current NodeType is Element and then calls Read (which advances the cursor to the next node). ReadEndElement verifies that the current NodeType is EndElement and then calls Read (which advances the cursor to the next node). For example, you can read this
<ISBN>12345</ISBN>
with
    reader.ReadStartElement("ISBN");   // Cursor is now on the text node
    string isbn = reader.Value;
    reader.Read();                     // Advance from the text node to </ISBN>
    reader.ReadEndElement();
The ReadElementContentAsString does all the above: it reads a start element, a text node, and an end element, returning the content as string:
string isbn = reader.ReadElementContentAsString("ISBN", "");
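The two approaches can be compared side by side with a sketch such as the following. ElementContentSketch and its method names are hypothetical, and the input is assumed to be a string whose root element is <ISBN>. Note that the step-by-step version needs an explicit Read to move from the text node to the end tag before ReadEndElement is called.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ElementContentSketch
{
    // Reads <ISBN>...</ISBN> one node at a time.
    public static string ReadStepByStep(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();            // position on <ISBN>
            reader.ReadStartElement("ISBN");   // verify <ISBN>, advance to the text node
            string isbn = reader.Value;        // value of the text node
            reader.Read();                     // advance to </ISBN>
            reader.ReadEndElement();           // verify </ISBN>, advance past it
            return isbn;
        }
    }

    // Reads the start element, text node and end element in a single call.
    public static string ReadInOneCall(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString("ISBN", "");
        }
    }
}
```

Both methods return the same string for the same input; the single-call form is shorter and also copes with an empty element, where the step-by-step version would not find a text node.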
The following shows how to read this file
    <?xml version="1.0" encoding="utf-8" ?>
    <!-- This is a comment -->
    <!DOCTYPE Book [
    <!ENTITY nbsp " ">
    <!ENTITY writer "Writer: Yazan Diranieh.">
    <!ENTITY web "www.diranieh.com">
    ]>
    <Book Name="ABC">
      <Author>Yazan Diranieh</Author>
      <ISBN>12345</ISBN>
      <Quote><![CDATA[ C# operators include <, >, & and others]]></Quote>
      <Publisher>Publisher 1</Publisher>
      <Footer>&writer; (&web;)</Footer>
    </Book>
    static public void ReadTestFile2()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.IgnoreComments = true;
        using (XmlReader reader = XmlReader.Create(@"Xml\TestFile2.xml", settings))
        {
            // MoveToContent is very useful. It allows you to skip XML declarations, whitespace,
            // comments and processing instructions
            reader.MoveToContent();
            reader.Read();      // Read <Book>

            var author = reader.ReadElementContentAsString("Author", "");
            var isbn = reader.ReadElementContentAsString("ISBN", "");
            var quote = reader.ReadElementContentAsString("Quote", "");
            var publisher = reader.ReadElementContentAsString("Publisher", "");

            // Assume that <Footer> is optional. This is how to code around that
            var footer = (reader.Name == "Footer") ? reader.ReadElementContentAsString("Footer", "") : null;

            reader.Read();      // Read </Book>
        }
    }
XmlReader must be positioned on a start element in order to read attributes. Recall that ReadStartElement verifies that the current NodeType is Element and then calls Read (which advances the cursor to the next node). This means that attributes for that element are gone forever! To read attributes, XmlReader provides an indexer that gives direct (random) access to any attribute on the current element, either by name or by position. Note that using an indexer is the same as calling GetAttribute:
    static public void ReadAttributes()
    {
        string filename = @"Xml\TestFile1.xml";
        XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
        using (var reader = XmlReader.Create(filename, settings))
        {
            // Skip xml declarations and comments and go to the root element
            reader.MoveToContent();

            // Read attributes on all elements
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;

                // Read attributes on current element
                Trace.WriteLine("reading attributes on element " + reader.Name);
                for (int i = 0; i < reader.AttributeCount; i++)
                    Trace.WriteLine(string.Format("attribute {0} = {1}", i, reader[i]));
            }
        }
    }
    reading attributes on element Book
    attribute 0 = ABC
    reading attributes on element Author
    reading attributes on element ISBN
    reading attributes on element Publisher
    reading attributes on element Book
    attribute 0 = XYZ
    reading attributes on element Author
    reading attributes on element ISBN
    reading attributes on element Publisher
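The equivalence between the indexer and GetAttribute can be checked with a short sketch. AttributeAccessSketch and ReadThreeWays are hypothetical names; the input is assumed to be a string whose root element carries the attribute of interest.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AttributeAccessSketch
{
    // Reads the same attribute by name, by position, and through GetAttribute.
    // The cursor stays on the start element the whole time.
    public static string[] ReadThreeWays(string xml, string attributeName)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on the root start element
            return new[]
            {
                reader[attributeName],              // indexer by name
                reader[0],                          // indexer by position
                reader.GetAttribute(attributeName)  // explicit method call
            };
        }
    }
}
```

When the requested name does not exist on the current element, both the name indexer and GetAttribute return null rather than throwing.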
An attribute is an XML node as defined by the XmlNodeType enum. To explicitly traverse attribute nodes (just as you would traverse element nodes), usually in order to parse attribute values into other types, you must make a special diversion from calling Read. The diversion must begin from a start element, and the forward-only rule is relaxed during attribute traversal: you can jump to any attribute (forward or backward) using MoveToAttribute. Note that MoveToElement returns you to the start element from any place within the attribute node diversion. The following example illustrates:
    static public void ReadAttributeNodes()
    {
        string filename = @"Xml\TestFile3.xml";
        XmlReaderSettings settings = new XmlReaderSettings {IgnoreWhitespace = true};
        using (var reader = XmlReader.Create(filename, settings))
        {
            // Skip xml declarations and comments and go to the root element
            reader.MoveToContent();

            // Skip over all nodes until we reach an element with attributes
            while (reader.Read())
            {
                if (!reader.HasAttributes) continue;

                // Approach 1 to reading attribute nodes
                // Note that attribute "Author" comes after attribute "Title". Order is not important
                string author = null;
                string title = null;
                if (reader.MoveToAttribute("Author")) author = reader.ReadContentAsString();
                if (reader.MoveToAttribute("Title")) title = reader.ReadContentAsString();
                Trace.WriteLine(string.Format("Author = {0}. Book Title = {1}", author, title));

                // Approach 2 to reading attribute nodes. Here you can traverse each attribute
                // in sequence
                if (reader.MoveToFirstAttribute())
                {
                    do
                    {
                        string attributeName = reader.Name;
                        string attributeValue = reader.Value;
                        Trace.WriteLine(string.Format("Attribute name: {0}, Attribute value: {1}",
                            attributeName, attributeValue));
                    } while (reader.MoveToNextAttribute());
                }
            }
        }
    }
    <?xml version="1.0" encoding="utf-8" ?>
    <!-- This is a comment -->
    <Books>
      <Book Title="ABC" Author="Yazan Diranieh" ISBN="12345" />
      <Book Title="XYZ" Author="Yazan Diranieh" ISBN="67890" />
    </Books>
    Author = Yazan Diranieh. Book Title = ABC
    Attribute name: Title, Attribute value: ABC
    Attribute name: Author, Attribute value: Yazan Diranieh
    Attribute name: ISBN, Attribute value: 12345
    Author = Yazan Diranieh. Book Title = XYZ
    Attribute name: Title, Attribute value: XYZ
    Attribute name: Author, Attribute value: Yazan Diranieh
    Attribute name: ISBN, Attribute value: 67890
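Parsing an attribute value into another type, which is the usual reason for visiting attribute nodes, might look like the sketch below. TypedAttributeSketch and ReadIsbn are hypothetical names, and the ISBN attribute is assumed to hold a value that fits in an int.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedAttributeSketch
{
    // Moves onto the ISBN attribute node of the root element and parses it as an int.
    public static int ReadIsbn(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // the diversion must start from a start element
            if (!reader.MoveToAttribute("ISBN"))
                throw new InvalidOperationException("ISBN attribute not found.");

            int isbn = reader.ReadContentAsInt(); // typed read of the attribute value
            reader.MoveToElement();               // return to the owning element
            return isbn;
        }
    }
}
```

Calling MoveToElement at the end is not strictly required before the next Read, but it leaves the cursor in a predictable place if more work is done on the element.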
Recall that XmlReader provides forward-only (and read-only, non-cached) access to an XML stream or file. The usual scenario for reading XML with XmlReader classes is:
Within the while loop, access various methods and properties of the current node. The current node is the node on which the reader is currently positioned. All methods and properties are with respect to the current node. Note that XmlReader properties and methods may not be available on every node type.
The following example illustrates how to read and process the Xml document above using XmlReader
    static public void BasicUsage()
    {
        // XmlTextReader is disposable
        using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
        {
            // Loop over all nodes
            while (reader.Read())
            {
                DisplayNodeInfo(reader);
            }
        }
    }
    // Accepts the XmlReader base type so that it works with XmlTextReader
    // as well as with readers returned by XmlReader.Create
    private static void DisplayNodeInfo(XmlReader reader)
    {
        Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
            reader.Name, reader.NodeType,
            string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));

        // Are there attributes on the current node?
        if (reader.HasAttributes)
        {
            // Current node has attributes
            Trace.WriteLine(string.Format("Current node has {0} attributes", reader.AttributeCount));

            // Loop over all attributes for current node
            for (int i = 0; i < reader.AttributeCount; i++)
            {
                // When positioned on an element, use MoveToAttribute to move to a specific attribute
                reader.MoveToAttribute(i);
                Trace.WriteLine(string.Format("Node Name: {0}, Node Type: {1}, Node Value: {2}",
                    reader.Name, reader.NodeType,
                    string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));
            }

            // Move the reader back to the element node that owned the last attribute
            reader.MoveToElement();
            Trace.WriteLine(string.Format("Moved reader back to Node Name: {0}. Node Type: {1}, Node Value: {2}",
                reader.Name, reader.NodeType,
                string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));
        }
    }
    private static string GetEmptyStringDisplay(string data)
    {
        if (data == string.Empty) return "(Empty)";

        string display = string.Empty;
        foreach (char c in data)
        {
            switch (c)
            {
                case '\r':
                    display += "\\r";
                    break;
                case '\n':
                    display += "\\n";
                    break;
                case ' ':
                default:
                    display += "_";
                    break;
            }
        }
        return display;
    }
Output (indented for clarity)
    Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
        Current node has 2 attributes
        Node Name: version, Node Type: Attribute, Node Value: 1.0
        Node Name: encoding, Node Type: Attribute, Node Value: utf-8
        Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
    Node Name: , Node Type: Whitespace, Node Value: \n
    Node Name: , Node Type: Comment, Node Value: This is a comment
    Node Name: , Node Type: Whitespace, Node Value: \n
    Node Name: Books, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n____
    Node Name: Book, Node Type: Element, Node Value: (Empty)
        Current node has 1 attributes
        Node Name: Name, Node Type: Attribute, Node Value: ABC
        Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: Author, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: ISBN, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: 12345
    Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: Publisher, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Publisher 1
    Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n____
    Node Name: Book, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n____
    Node Name: Book, Node Type: Element, Node Value: (Empty)
        Current node has 1 attributes
        Node Name: Name, Node Type: Attribute, Node Value: XYZ
        Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: Author, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: ISBN, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: 6789
    Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n________
    Node Name: Publisher, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Publisher 2
    Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n____
    Node Name: Book, Node Type: EndElement, Node Value: (Empty)
    Node Name: , Node Type: Whitespace, Node Value: \n
    Node Name: Books, Node Type: EndElement, Node Value: (Empty)
Using XmlReaderSettings we can ignore Whitespace nodes to give the following output:
    static public void BasicUsage2()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(@"Xml\TestFile1.xml", settings))
        {
            // Loop over all nodes
            while (reader.Read())
            {
                DisplayNodeInfo(reader);
            }
        }
    }
    Node Name: xml, Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
        Current node has 2 attributes
        Node Name: version, Node Type: Attribute, Node Value: 1.0
        Node Name: encoding, Node Type: Attribute, Node Value: utf-8
        Moved reader back to Node Name: xml. Node Type: XmlDeclaration, Node Value: version="1.0" encoding="utf-8"
    Node Name: , Node Type: Comment, Node Value: This is a comment
    Node Name: Books, Node Type: Element, Node Value: (Empty)
    Node Name: Book, Node Type: Element, Node Value: (Empty)
        Current node has 1 attributes
        Node Name: Name, Node Type: Attribute, Node Value: ABC
        Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
    Node Name: Author, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    Node Name: ISBN, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: 12345
    Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    Node Name: Publisher, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Publisher 1
    Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    Node Name: Book, Node Type: EndElement, Node Value: (Empty)
    Node Name: Book, Node Type: Element, Node Value: (Empty)
        Current node has 1 attributes
        Node Name: Name, Node Type: Attribute, Node Value: XYZ
        Moved reader back to Node Name: Book. Node Type: Element, Node Value: (Empty)
    Node Name: Author, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    Node Name: ISBN, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: 6789
    Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    Node Name: Publisher, Node Type: Element, Node Value: (Empty)
    Node Name: , Node Type: Text, Node Value: Publisher 2
    Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    Node Name: Book, Node Type: EndElement, Node Value: (Empty)
    Node Name: Books, Node Type: EndElement, Node Value: (Empty)
Note the usage of MoveToAttribute to go through the attribute list of the current element. After this method has been called, certain properties such as Name, NamespaceURI, and Prefix reflect the properties of the attribute and not those of its containing element.
Also note that an attribute does not always need to be specified on an element. In many cases, a DTD or a schema defines default values for attributes on elements. When using methods that move among attributes, attributes that receive values from a DTD or a schema are acted upon just like attributes that are given values in the XML stream. You can programmatically determine if an attribute has received its value from the DTD or schema using the IsDefault property which returns true if the current node is an attribute that was generated from the default value defined in the DTD or schema.
Note: IsDefault does not apply to all XmlReader-derived classes. For example, XmlTextReader.IsDefault always returns false because XmlTextReader has no support for DTDs and schemas. On the other hand, XmlValidatingReader.IsDefault returns true if the current node is an attribute whose value was generated from the default value defined in the DTD or schema, and returns false if the value was explicitly specified in the XML stream.
Another way to read XML with XmlReader-derived classes is to use ReadInnerXml and/or ReadOuterXml methods. These methods work as follows:
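ReadInnerXml: When positioned on an element, this method returns all of the content of the current node as a string, including markup, but excluding the element's own start and end tags. After this call, the reader is positioned after the current node's end element tag.
ReadOuterXml: Behaves like ReadInnerXml, except that the returned string also includes the current node's start and end tags.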
The following tables illustrate:
Node Type | Content | Return Value | Position After the Call |
Element | <Book> <Author>Yazan</Author> </Book> | ReadInnerXml: <Author>Yazan</Author> ReadOuterXml: <Book> <Author>Yazan</Author> </Book> | ReadInnerXml and ReadOuterXml: after the </Book> end tag |
Attribute | <Books Publisher="Somebody"/> | ReadInnerXml: Somebody ReadOuterXml: Publisher="Somebody" | ReadInnerXml and ReadOuterXml: remains on the attribute node |
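The difference between the two methods can be sketched as follows. InnerOuterXmlSketch and its two methods are hypothetical names; each call creates its own reader because either method consumes the element it is called on.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerOuterXmlSketch
{
    // Returns the markup inside the root element, without the root's own tags.
    public static string Inner(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();      // position on the root start element
            return reader.ReadInnerXml();
        }
    }

    // Returns the markup of the root element, including its start and end tags.
    public static string Outer(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }
}
```

For the input <Book><Author>Yazan</Author></Book>, Inner returns <Author>Yazan</Author> and Outer returns the whole input.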
Content can be skipped in two ways: by calling MoveToContent or by calling Skip.
A content node is defined to be any of the following: non-white-space text, CDATA, Element, EndElement, EntityReference, or EndEntity.
You can move to the next content node using MoveToContent. This method checks whether the current node is a content node. If it is, MoveToContent returns without moving the reader. If it is not, the reader skips forward until it finds the next content node or reaches the end of the file. This implies that MoveToContent skips over the following nodes: XmlDeclaration, ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace.
This type of navigation is more efficient when the application only needs to deal with content. The alternative is to call Read to move to the next node, determine the type of that node, and finally read its content if that node type supports content.
If the reader is already positioned on an attribute, then calling MoveToContent moves the current node position to the owning element for that attribute. MoveToContent returns XmlNodeType which specifies the type of the current node (this information is typically used to skip over random XML). The following example illustrates how to use MoveToContent using the same Xml file above:
    static public void ContentUsage()
    {
        XmlNodeType nodeType;
        using (XmlTextReader reader = new XmlTextReader(@"Xml\TestFile1.xml"))
        {
            // Calling MoveToContent multiple times does not move the reader to the next content
            nodeType = reader.MoveToContent();
            DisplayNodeInfo(reader);
            nodeType = reader.MoveToContent();
            DisplayNodeInfo(reader);

            while (reader.Read())
            {
                Trace.WriteLine(string.Format("Before MoveToContent: Node Name: {0}, Node Type: {1}, Node Value: {2}",
                    reader.Name, reader.NodeType,
                    string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));

                reader.MoveToContent();

                Trace.WriteLine(string.Format("After MoveToContent Node Name: {0}, Node Type: {1}, Node Value: {2}",
                    reader.Name, reader.NodeType,
                    string.IsNullOrWhiteSpace(reader.Value) ? GetEmptyStringDisplay(reader.Value) : reader.Value));
                Trace.WriteLine("");
            }
        }
    }
    Node Name: Books, Node Type: Element, Node Value: (Empty)
    Node Name: Books, Node Type: Element, Node Value: (Empty)
    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n____
    After MoveToContent Node Name: Book, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: Author, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    After MoveToContent Node Name: , Node Type: Text, Node Value: Yazan Diranieh

    Before MoveToContent: Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: Author, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: ISBN, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: 12345
    After MoveToContent Node Name: , Node Type: Text, Node Value: 12345

    Before MoveToContent: Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: Publisher, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: Publisher 1
    After MoveToContent Node Name: , Node Type: Text, Node Value: Publisher 1

    Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n____
    After MoveToContent Node Name: Book, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n____
    After MoveToContent Node Name: Book, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: Author, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: Yazan Diranieh
    After MoveToContent Node Name: , Node Type: Text, Node Value: Yazan Diranieh

    Before MoveToContent: Node Name: Author, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: Author, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: ISBN, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: 6789
    After MoveToContent Node Name: , Node Type: Text, Node Value: 6789

    Before MoveToContent: Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: ISBN, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n________
    After MoveToContent Node Name: Publisher, Node Type: Element, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Text, Node Value: Publisher 2
    After MoveToContent Node Name: , Node Type: Text, Node Value: Publisher 2

    Before MoveToContent: Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)
    After MoveToContent Node Name: Publisher, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n____
    After MoveToContent Node Name: Book, Node Type: EndElement, Node Value: (Empty)

    Before MoveToContent: Node Name: , Node Type: Whitespace, Node Value: \n
    After MoveToContent Node Name: Books, Node Type: EndElement, Node Value: (Empty)
The Skip method is used to move over the current element. In other words, if the node type is XmlNodeType.Element, then calling Skip moves over all of the content of the current element and its end tag. For example, if you have the following XML:
    <a name="MyName" ID="1">
      <x>123</x>
      <y>456</y>
    </a>
    <b>
      ...
    </b>
and you are positioned on the <a> node or any of its attributes, then calling Skip positions you on the next element which is <b>. And if you are positioned on the <x> node, then calling Skip positions you on the next element which is <y>.
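The behaviour of Skip described above can be tried with a sketch along these lines. SkipSketch and NameAfterSkip are hypothetical names. The sketch assumes ConformanceLevel.Fragment, because the sample has two top-level elements, and IgnoreWhitespace, so that the node following the skipped element is an element rather than white space.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SkipSketch
{
    // Finds the first element with the given name, skips it (and all of its
    // children), and returns the name of the node the reader lands on.
    public static string NameAfterSkip(string xml, string elementToSkip)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreWhitespace = true
        };

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementToSkip)
                {
                    reader.Skip(); // moves past the element's content and its end tag
                    return reader.Name;
                }
            }
            return null; // element not found
        }
    }
}
```

Skipping <a> lands on <b>, while skipping <x> lands on its sibling <y>.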
XmlReader provides two parallel systems for referring to element and attribute names:
Name
NamespaceURI and LocalName
You're using the first system whenever you read an element's Name property or call a method that accepts a single name argument. This works well if no namespaces or prefixes are present. If namespaces and prefixes are present, namespaces are ignored and prefixes become part of the name:
XML Fragment | Name | Code |
<book ...> | book | reader.ReadStartElement("book") |
<book xmlns='blah' .../> | book | reader.ReadStartElement("book") |
<x:book ... /> | x:book | reader.ReadStartElement("x:book") |
The second system works through two properties aware of namespaces: LocalName and NamespaceURI. These properties take into account prefixes and default namespaces defined by parent elements. Prefixes are automatically expanded meaning that NamespaceURI always reflects the correct namespace for the current element, and LocalName is always free of prefixes.
When you pass two name arguments into a method such as ReadStartElement, you're using this name system. For example, consider this XML
    <customer xmlns="MyNamespace" xmlns:other="MyOtherNamespace">
      <address>
        <other:city>SomeCity</other:city>
        ...
      </address>
      ...
    </customer>
We could read this as follows. Note that prefixes are abstracted away:
    reader.ReadStartElement("customer", "MyNamespace");
    reader.ReadStartElement("address", "MyNamespace");
    reader.ReadStartElement("city", "MyOtherNamespace");
If required, you can see which prefix was used through the Prefix property and convert it into a namespace by calling LookupNamespace.
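To see both naming systems on the same node, a sketch like the following stops on the city element and reports its names. NamespaceSketch and DescribeCity are hypothetical names; the input is assumed to follow the customer/address/city structure shown above.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NamespaceSketch
{
    // Positions the reader on <other:city> and returns, in order:
    // Name, LocalName, NamespaceURI, Prefix, and the namespace the prefix maps to.
    public static string[] DescribeCity(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadStartElement("customer", "MyNamespace");
            reader.ReadStartElement("address", "MyNamespace");
            reader.MoveToContent(); // land on the city element, skipping any white space

            return new[]
            {
                reader.Name,                          // first system: prefix included
                reader.LocalName,                     // second system: prefix removed
                reader.NamespaceURI,                  // second system: resolved namespace
                reader.Prefix,
                reader.LookupNamespace(reader.Prefix) // prefix converted to a namespace
            };
        }
    }
}
```

For the sample document the result is other:city, city, MyOtherNamespace, other, and MyOtherNamespace.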
The .NET Framework extends XmlReader in the following classes: XmlTextReader, XmlNodeReader, and XmlValidatingReader.
The following table describes the implementation of each of these classes. Recall that they all derive from XmlReader and hence they all provide forward-only, read-only, non-cached access to XML:
XmlReader-derived class | Description |
XmlTextReader | Reads text (also known as character streams). Has no support for DTD or schema. |
XmlNodeReader | Reads DOM nodes. In other words, it takes an XmlNode object and reads whatever nodes it finds in it. Has no support for DTD or schema, but can resolve entities defined in DTD. |
XmlValidatingReader | A reader that provides DTD, XSD, and XDR schema validation. Takes an XmlTextReader and adds validation services to it. |
The table below describes which reader to use for each scenario:
Scenario | Reader to use | ValidationType property | XmlResolver | Normalization property |
Do not need any DTD or schema support but require good performance. | XmlTextReader | Not available | Set to null reference | false |
Require XML to be well-formed, including external entities | XmlTextReader | Not available | Set to non-null reference | true |
Require XML to be well-formed and valid according to a DTD or schema | XmlValidatingReader | Auto or DTD | Set to non-null reference | Set to true on XmlTextReader before passing XmlTextReader to XmlValidatingReader |
Require XML to be well-formed when streaming XML data from an XmlNode | XmlNodeReader | Not available | Set to non-null reference | Not available |
Note that when the Normalization property is true, the reader normalizes white space in attribute values and performs character-range checking. As a result, setting Normalization = true lowers performance. XmlResolver, on the other hand, is used to resolve external DTD and schema location references. It is also used to handle any import or include elements found in an XSD schema. Setting the XmlResolver property to null means that the application will not resolve any external references: if an external DTD or entity is encountered, it is ignored.
The previous section showed how to use various features found in all XmlReader-derived classes. This section focuses on features specific to the XmlTextReader class. XmlTextReader is an XmlReader-derived class and provides a fast XML parser. It enforces the rules that the XML must be well-formed but it does not perform any validations and has no support for DTD or XSD Schemas. XmlTextReader provides the following functionalities:
The XmlTextReader methods ReadChars, ReadBinHex, and ReadBase64 are used to read large streams of embedded content. The difference is that ReadChars reads content as text, ReadBase64 decodes Base64-encoded content into bytes, and ReadBinHex decodes BinHex-encoded content into bytes (which can be useful if the XML contains international text, images or even video). These methods perform as follows:
All three methods, ReadChars, ReadBinHex, and ReadBase64, have the following signature (except that the first parameter of ReadChars is char[] rather than byte[]):
ReadX( byte[] Buffer, int BufferOffset, int DataLength );
The first parameter refers to the buffer where content will be written, the second parameter refers to the location within the buffer where content should be written, and the last parameter specifies the number of characters/bytes to write to the buffer. Being able to specify a buffer and how much to read means that these three methods can be very efficient for processing very large streams of text/data embedded within the XML. Otherwise, you would have to allocate and deal with very large arrays.
The overall pattern for using these methods is simple. Position the reader on an element using any of the XmlTextReader methods, then call one of these methods repeatedly to read the content a chunk at a time. In the following example, an XML document containing an image is loaded and displayed:
<Image name="Sunset">
ABC123DDcDAEB563 ...
</Image>
private void btnReadBinHex_Click(object sender, System.EventArgs e)
{
    try
    {
        // Load the XML document and jump to the <Image> document element
        XmlTextReader reader = new XmlTextReader( @"..\..\ReadHexBinData.xml" );
        XmlNodeType nodeType = reader.MoveToContent();

        // Read the image into a memory stream. Note how ReadBase64 is called
        // successively to read the entire image data
        System.IO.MemoryStream memstream = new System.IO.MemoryStream();
        byte[] abData = new byte[100];
        int nByteCount = reader.ReadBase64( abData, 0, 100 );
        while (nByteCount != 0)
        {
            memstream.Write( abData, 0, nByteCount );
            nByteCount = reader.ReadBase64( abData, 0, 100 );
        }
        reader.Close();

        // Rewind the stream so the image is decoded from the start, then display it
        memstream.Position = 0;
        pictureBox1.Image = Bitmap.FromStream( memstream );
    }
    catch (Exception ex)
    {
        Trace.WriteLine( ex.Message );
    }
}
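The same chunked-read pattern can be tried without a form or an image file. The following is a minimal sketch under stated assumptions: `ChunkedBase64Demo`, `RoundTrip`, the payload string and the buffer size are all hypothetical, and the XML is built in memory by Base64-encoding the payload first.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class ChunkedBase64Demo
{
    // Embeds payload as Base64 in an <Image> element, then reads it back
    // chunkSize bytes at a time with XmlTextReader.ReadBase64.
    public static string RoundTrip(string payload, int chunkSize)
    {
        string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(payload));
        string xml = "<Image name=\"Sunset\">" + encoded + "</Image>";

        var reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent(); // position on the <Image> element

            var decoded = new MemoryStream();
            var buffer = new byte[chunkSize];
            int count;
            // ReadBase64 returns the number of decoded bytes written to the buffer,
            // and 0 once the element content is exhausted.
            while ((count = reader.ReadBase64(buffer, 0, buffer.Length)) != 0)
            {
                decoded.Write(buffer, 0, count);
            }
            return Encoding.UTF8.GetString(decoded.ToArray());
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Because only a small fixed buffer is allocated, the size of the embedded content does not dictate the size of any single array, which is the point of the pattern.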
To better understand the DOCTYPE declaration, a brief overview of Document Type Definitions (DTD) is required. An XML processor uses a DTD (a collection of markup declarations) at runtime to validate a given XML file against a predefined structure. DTD syntax can be a bit overwhelming, but it basically uses different syntactical elements (like !, (, ), * and others) to define which elements are required in the XML document, which elements are optional, how many occurrences of a given element are allowed, and so on. For example, consider the following DTD document saved in food.dtd:
<!ELEMENT hamburgers (hamburger*)>
<!ELEMENT hamburger (name, description, price)>
<!ATTLIST hamburger lowfat CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
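This document says the following:
The hamburgers element contains zero or more hamburger elements.
Each hamburger element contains a name, a description, and a price element, in that order.
A hamburger element may carry an optional lowfat attribute containing character data.
The name, description, and price elements contain text only.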
An XML document that wants to conform to this DTD would then have to add the following line:
<!-- This statement
tells the parser to validate the content of the containing
XML document against the schema described in the DTD -->
<!DOCTYPE hamburgers SYSTEM "food.dtd" >
...
Based on the above, a DOCTYPE declaration allows a document to identify its root element and the associated Document Type Definition (DTD) by reference to an external file. A DOCTYPE declaration is usually required if the document is to be processed by a validating environment.
A DOCTYPE declaration usually contains the following:
Note the following DOCTYPE declarations:
The XmlTextReader ensures that the DOCTYPE declaration is well formed but does not do any DTD validation. When you call Read on a DOCTYPE, the returned NodeType is DocumentType. The PublicLiteral and SystemLiteral are considered attributes, whose names are PUBLIC and SYSTEM, respectively. To retrieve the content of these two attributes use GetAttribute or any other attribute accessing method. For example, if you have the following DOCTYPE declaration:
<!DOCTYPE Books SYSTEM "\\SomeMachineName\DTD\books.dtd" [<!ENTITY e 'ent'>] >
then the following calls return the values shown in the comments:
reader.GetAttribute( "SYSTEM" ); // "\\SomeMachineName\DTD\books.dtd"
reader.Value;                    // <!ENTITY e 'ent'>
White space is any of the following characters:
White space is categorized as significant and insignificant as follows:
The XmlTextReader.WhitespaceHandling property determines which white space nodes the reader returns: all of them, only significant ones, or none. This property can be changed at any time and takes effect on the next Read operation. In the following example, underscores mark white space. The white space inside the <subitem> element is significant because it falls within the xml:space="preserve" scope; all other white space is insignificant:
<test>_
____<item>_
________<subitem xml:space="preserve">_
_________</subitem>_
____</item>_
</test>_
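To see how WhitespaceHandling changes what the reader returns, the sketch below counts white space nodes for the document above under each setting. The `WhitespaceDemo` class and `CountNodes` helper are hypothetical names, and the indentation of the in-memory document only approximates the original layout.

```csharp
using System;
using System.IO;
using System.Xml;

static class WhitespaceDemo
{
    // Counts how many nodes of the given type the reader reports
    // when WhitespaceHandling is set to the given value.
    public static int CountNodes(WhitespaceHandling handling, XmlNodeType type)
    {
        const string xml =
            "<test>\n" +
            "    <item>\n" +
            "        <subitem xml:space=\"preserve\">\n" +
            "         </subitem>\n" +
            "    </item>\n" +
            "</test>";

        var reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        try
        {
            int count = 0;
            while (reader.Read())
            {
                if (reader.NodeType == type) count++;
            }
            return count;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

With `WhitespaceHandling.All` the reader should report four Whitespace nodes and one SignificantWhitespace node; with `WhitespaceHandling.Significant` only the SignificantWhitespace node inside `<subitem>` remains; with `WhitespaceHandling.None` neither kind is returned.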
The XmlTextReader.Normalization property determines whether white space normalization and attribute normalization should be done or not. The following describes what is meant by normalization:
For example, given the following XML fragment:
<item attr1=' test A B C
1 2 3'/>
<item attr2=''/>
Normalization affects how the attr1 attribute is read, as follows:
// reader.Normalization = false;
Attribute value: test A B C
1 2 3
//reader.Normalization = true;
Attribute value: test A B C
1 2 3
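The effect on an attribute value can be checked with a small sketch. It assumes a hypothetical attribute value containing a tab and a line feed (the exact white space of the original sample is not recoverable here), and the `NormalizationDemo` class and `ReadAttr` helper are illustrative names only.

```csharp
using System;
using System.IO;
using System.Xml;

static class NormalizationDemo
{
    // Reads attr1 with Normalization switched on or off and returns its value.
    public static string ReadAttr(bool normalize)
    {
        // Hypothetical value: "A", a tab, "B", a line feed, "C".
        const string xml = "<item attr1='A\tB\nC'/>";

        var reader = new XmlTextReader(new StringReader(xml));
        reader.Normalization = normalize;
        try
        {
            reader.MoveToContent(); // position on <item>
            return reader.GetAttribute("attr1");
        }
        finally
        {
            reader.Close();
        }
    }
}
```

With normalization on, the tab and the line feed should each come back as a single space; with normalization off, the value should be returned as written.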
XmlNodeReader provides a reader over a given DOM node sub-tree. Because XmlNodeReader derives from XmlReader, it provides fast, non-cached, forward-only access to XML data in an XmlNode. XmlNodeReader provides the following functionality:
For example:
XmlNode node = GetNodeFromDocument( doc );
XmlNodeReader reader = new XmlNodeReader( node );
while (reader.Read())
{
    // Process node
}
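A self-contained version of this pattern might look like the sketch below. The in-memory document, the `NodeReaderDemo` class, and the `ElementNames` helper are assumptions made for illustration; the first `<book>` element stands in for whatever node GetNodeFromDocument would return.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class NodeReaderDemo
{
    // Reads the sub-tree rooted at the first <book> element and
    // returns the names of the elements found in it, comma-separated.
    public static string ElementNames()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<books><book name='ABC'><author>A</author></book><book name='XYZ'/></books>");

        XmlNode node = doc.DocumentElement.FirstChild; // the first <book>
        var names = new List<string>();

        using (var reader = new XmlNodeReader(node))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return string.Join(",", names);
    }
}
```

Only the nodes under the supplied XmlNode are visited, so the second `<book>` element never shows up in the result.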
XmlValidatingReader represents a reader that provides DTD and XML Schema validation. It is similar to XmlTextReader, except that XmlValidatingReader performs validation as well. The overall pattern for using XmlValidatingReader is:
Validating XML will be covered in more depth in the Validation of XML with Schemas chapter.