Software That Shines

How to read XML and JSON files with (almost) no work

TL;DR

So, let's say you get some XML files that you have to read.  Generally this task elicits a resigned groan as you think of all the tedious code it will take to parse them.  Even though there are libraries that help with reading XML, and LINQ to XML is really nice, you still have to deal with the attributes, the node names, and so on.

In certain situations, there's a massive shortcut for this built into Visual Studio:  Edit -> Paste Special -> Paste XML As Classes.  (The same menu has Paste JSON As Classes for JSON.)

Paste Special / Paste As Classes

Just copy some XML, create a new code file in your project, and paste as classes. 
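
To give a feel for the result, here is a minimal sketch of the kind of classes you end up with. The sample XML, the class names (Library, LibraryBook), and the PasteAsClassesSketch helper are all made up for illustration, and the classes are a hand-written approximation: what Visual Studio actually generates is more verbose, but it works with XmlSerializer in the same way.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hand-written approximation of classes for the hypothetical sample XML below.
// The real Paste XML As Classes output is more verbose than this.
[XmlRoot("Library")]
public class Library
{
    // Each repeated <Book> element becomes one array entry.
    [XmlElement("Book")]
    public LibraryBook[] Book { get; set; }
}

public class LibraryBook
{
    // Maps the id="..." attribute on <Book>.
    [XmlAttribute("id")]
    public int Id { get; set; }

    // Maps the <Title> child element (element name defaults to the property name).
    public string Title { get; set; }
}

public static class PasteAsClassesSketch
{
    // Hypothetical sample document, not taken from any real feed.
    public const string SampleXml =
        "<Library>" +
        "<Book id=\"1\"><Title>First</Title></Book>" +
        "<Book id=\"2\"><Title>Second</Title></Book>" +
        "</Library>";

    // Deserializes an XML string into the typed object graph.
    public static Library Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Library));
        using (var reader = new StringReader(xml))
        {
            return (Library)serializer.Deserialize(reader);
        }
    }
}
```

With that in place, PasteAsClassesSketch.Parse(PasteAsClassesSketch.SampleXml).Book[1].Title gives you "Second", with no manual node or attribute handling.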


Example project

Let's do a little walk-through on how we can use this.  We'll count how many times the US Federal Government has declared each operating status!  (I simply searched for "status xml" and found this as an example I could use: https://www.opm.gov/xml/operatingstatushistory.xml)

First let's create a new throwaway project in Visual Studio. 

With that done, let's grab the XML -- you can just go to the "Open File" menu in Visual Studio, and paste in that URL as the file name:

Get the XML file into Visual Studio

Now let's make a cs file for the classes.

The classes have to go somewhere, after all...

Copy all the XML from the file, then use Paste XML As Classes in the new cs file, replacing the empty class declaration.

Replace the empty class with the XML classes

Ok, now we have classes for the data.  Now what?  We can read and deserialize it (add this function to Program.cs):

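// Using directives needed at the top of Program.cs:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Mime;
using System.Threading.Tasks;
using System.Xml.Serialization;

// Cache the serializer, since creating XmlSerializer instances is expensive:
// https://support.microsoft.com/en-us/help/886385/memory-usage-is-high-when-you-create-several-xmlserializer-objects-in-asp-net
static XmlSerializer OpmStatusDeserializer { get; } = new XmlSerializer(typeof(ArrayOfCurrentStatus));

private static async Task<IList<ArrayOfCurrentStatusCurrentStatus>> GetStatusesAsync()
{
    using (HttpClient httpClient = new HttpClient())
    {
        // Ask the server for XML explicitly.
        httpClient.DefaultRequestHeaders.Accept.Clear();
        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(MediaTypeNames.Text.Xml));

        using (Stream s = await httpClient.GetStreamAsync(new Uri("https://www.opm.gov/xml/operatingstatushistory.xml")))
        {
            // Deserialize straight from the response stream into the pasted classes.
            ArrayOfCurrentStatus statusSet = (ArrayOfCurrentStatus)OpmStatusDeserializer.Deserialize(s);
            return statusSet.CurrentStatus;
        }
    }
}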

Let's have the app get a count of the different statuses:

static void Main(string[] args)
{
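    // Unlike Wait(), this rethrows the original exception instead of an AggregateException.
    MainAsync().GetAwaiter().GetResult();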
    Console.ReadKey();
}

private static async Task MainAsync()
{
    IList<ArrayOfCurrentStatusCurrentStatus> statuses = await GetStatusesAsync();

    ILookup<string, ArrayOfCurrentStatusCurrentStatus> statusesByType = statuses.ToLookup(s => s.OperatingStatus);

    IDictionary<string, int> statusesAndCounts = statusesByType.ToDictionary(g => g.Key, g => g.Count());

    foreach (var statusAndCount in statusesAndCounts.OrderByDescending(kvp => kvp.Value))
    {
        Console.WriteLine("Count: {0}{2}Status: {1}{2}", statusAndCount.Value, statusAndCount.Key.Trim(), Environment.NewLine);
    }
}

Not terribly exciting, but there you go.  You can also go in the other direction and generate XML files from the classes by populating and serializing them.
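
As a minimal sketch of that other direction, the following populates a class and serializes it to an XML string, then reads it back. StatusMessage and SerializeSketch are hypothetical names standing in for whatever classes you pasted; only the XmlSerializer calls are real API.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical message type, standing in for a pasted/generated class.
public class StatusMessage
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    public string OperatingStatus { get; set; }
}

public static class SerializeSketch
{
    // Cached for the same reason as the deserializer in the walk-through.
    static XmlSerializer Serializer { get; } = new XmlSerializer(typeof(StatusMessage));

    // Populated object -> XML text.
    public static string ToXml(StatusMessage message)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }

    // XML text -> populated object.
    public static StatusMessage FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (StatusMessage)Serializer.Deserialize(reader);
        }
    }
}
```

One thing to watch for: because StringWriter is UTF-16, the XML declaration in the resulting string says encoding="utf-16". If you need UTF-8 bytes, serializing to a Stream instead is the usual route.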

I recently used this for parsing and generating messages when interacting with SQL Server Service Broker from a .NET app!


Bonus info

DataSet support

Additionally, you can work with XML files via System.Data.DataSet.  Depending on your goals, or the size or complexity of the XML, this may be a more convenient or more efficient route.  Of course, the downside is that you lose strong typing: you'll need to work with textual column names.

private static async Task<DataSet> GetStatusesDataSetAsync()
{
    using (HttpClient httpClient = new HttpClient())
    {
        httpClient.DefaultRequestHeaders.Accept.Clear();
        httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(MediaTypeNames.Text.Xml));

        using (Stream s = await httpClient.GetStreamAsync(new Uri("https://www.opm.gov/xml/operatingstatushistory.xml")))
        {
            DataSet dataSet = new DataSet();
            dataSet.ReadXml(s);
            return dataSet;
        }
    }
}
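
Here's a rough sketch of what "working with textual column names" looks like in practice. To keep it self-contained it reads a tiny, trimmed-down sample from a string instead of the live feed; the sample content and the DataSetReadSketch helper are assumptions for illustration. With no schema loaded, ReadXml infers one: the repeated CurrentStatus elements become a table and OperatingStatus becomes a string column.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetReadSketch
{
    // Trimmed-down, made-up sample; the real feed has more fields per status.
    public const string SampleXml =
        "<ArrayOfCurrentStatus>" +
        "<CurrentStatus><OperatingStatus>Open</OperatingStatus></CurrentStatus>" +
        "<CurrentStatus><OperatingStatus>Closed</OperatingStatus></CurrentStatus>" +
        "</ArrayOfCurrentStatus>";

    // Loads the XML and returns the inferred table, looked up by name.
    public static DataTable LoadStatuses(string xml)
    {
        var dataSet = new DataSet();
        using (var reader = new StringReader(xml))
        {
            dataSet.ReadXml(reader);
        }
        return dataSet.Tables["CurrentStatus"];
    }

    // Counts rows whose OperatingStatus column equals the given text.
    public static int CountStatus(DataTable table, string status)
    {
        int count = 0;
        foreach (DataRow row in table.Rows)
        {
            // Column access is by string name, so typos only show up at run time.
            if ((row["OperatingStatus"] as string) == status)
            {
                count++;
            }
        }
        return count;
    }
}
```

The table and column lookups are plain strings, which is exactly the trade-off mentioned above: less code up front, but no compiler help if a name is wrong.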

XSD Schemas

By the way, another thing you can do with the XML file open is create an XSD schema:

Generate an XML schema from the open XML file

You can use that to create an empty DataSet, which you could then populate and use to generate XML.  Let's walk through that: click the Create Schema menu item.  The resulting schema will open in a new window.  Let's add a new XML Schema to the project:

Add the schema to the project

And then just copy the generated schema into that new file.  To make use of it, you can drag it into the project's Resources.

Add the schema file to the assembly resources

Now you can use the DataSet.ReadXmlSchema method.  However, a string argument is treated as a file name.  To pass the schema text itself we can use a TextReader, and the StringReader class is perfect for this:

private static DataSet CreateNewEmptyDataSet()
{
    DataSet dataSet = new DataSet();
    using (TextReader xsdReader = new StringReader(Resources.operatingstatushistory))
    {
        dataSet.ReadXmlSchema(xsdReader);
    }
    return dataSet;
}

Voilà!  An empty DataSet that you can populate.  GetXml will return the XML as a string, while WriteXml will write it to a Stream, TextWriter, XmlWriter, or file (which is likely more efficient, if it's going to be saved or transmitted).
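
A small sketch of both output routes, assuming a one-table DataSet built in code rather than from the schema resource (the table contents and the DataSetWriteSketch helper are illustrative only):

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

public static class DataSetWriteSketch
{
    // Builds a tiny DataSet by hand; in the walk-through this would come from the XSD.
    public static DataSet BuildSample()
    {
        var dataSet = new DataSet("ArrayOfCurrentStatus");
        DataTable table = dataSet.Tables.Add("CurrentStatus");
        table.Columns.Add("OperatingStatus", typeof(string));
        table.Rows.Add("Open");
        return dataSet;
    }

    // Whole document as one in-memory string.
    public static string AsString(DataSet dataSet)
    {
        return dataSet.GetXml();
    }

    // Written to a stream; a FileStream or network stream works the same way.
    public static byte[] AsBytes(DataSet dataSet)
    {
        using (var stream = new MemoryStream())
        {
            dataSet.WriteXml(stream);
            return stream.ToArray();
        }
    }
}
```

A MemoryStream is used here only so the result is easy to inspect; when saving or transmitting you would hand WriteXml the destination stream directly and skip the intermediate string.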

You can also load the schema before reading in the XML (as above), so that the DataSet has exactly the tables and columns you expect and doesn't blow up later when you try to access nonexistent columns and whatnot.
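
Here is one way that schema-first read could look, as a sketch rather than the definitive approach. To stay self-contained it produces the XSD text by exporting it from a DataSet built in code (standing in for the schema stored in Resources); SchemaFirstSketch and the sample table are assumptions. XmlReadMode.IgnoreSchema tells ReadXml to load data into the existing schema instead of inferring a new one.

```csharp
using System;
using System.Data;
using System.IO;

public static class SchemaFirstSketch
{
    // Stand-in for the XSD in Resources: define the shape in code and export it as XSD text.
    public static string BuildSchemaXsd()
    {
        var template = new DataSet("ArrayOfCurrentStatus");
        DataTable table = template.Tables.Add("CurrentStatus");
        table.Columns.Add("OperatingStatus", typeof(string));
        return template.GetXmlSchema();
    }

    // Loads the schema first, then reads the data into that fixed shape.
    public static DataSet ReadWithSchema(string xsd, string xml)
    {
        var dataSet = new DataSet();
        using (var xsdReader = new StringReader(xsd))
        {
            dataSet.ReadXmlSchema(xsdReader);
        }
        using (var xmlReader = new StringReader(xml))
        {
            // Data that does not fit the loaded schema is not added as new columns.
            dataSet.ReadXml(xmlReader, XmlReadMode.IgnoreSchema);
        }
        return dataSet;
    }
}
```

Note that this guarantees the columns you expect exist; it does not reject documents with extra elements. If you need strict validation against the XSD, that is a separate step (for example with a validating XmlReader).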

More details on DataSets and XML


Other XML resources for experimentation

Here are some other XML streams I found that you might find interesting to play around with: