How to read XML and JSON files with (almost) no work
TL;DR
So, let's say you get some XML files that you have to read. Generally this task would elicit a resigned groan as you think of all the tedious programming it will take to parse them -- even though there are libraries that help with reading XML, and LINQ to XML is really nice, you still have to deal with the attributes, the node names, and so on.
In certain situations, there's a massive shortcut for this built into Visual Studio: Edit -> Paste Special -> Paste XML As Classes.
Just copy some XML, create a new code file in your project, and paste as classes.
Example project
Let's do a little walk-through on how we can use this. We'll count how many times the US Federal Government has declared each operating status! (I simply searched for "status xml" and found this as an example I could use: https://www.opm.gov/xml/operatingstatushistory.xml)
First let's create a new throwaway project in Visual Studio.
With that done, let's grab the XML -- you can just go to the "Open File" menu in Visual Studio, and paste in that URL as the file name.
Now let's make a cs file for the classes.
Copy all the XML from the file, and let's paste it as classes and replace the empty declaration.
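For orientation, here is a rough sketch of what the pasted classes boil down to and how they map onto the XML. It is a simplified, hand-written stand-in: the real generated code is more verbose (partial classes, backing fields, extra attributes) and has a property for every element in the feed. The sample XML string is made up for illustration, and only the `ArrayOfCurrentStatus`, `CurrentStatus`, and `OperatingStatus` names come from this walk-through.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-ins for the classes that "Paste XML As Classes" generates.
[XmlRoot("ArrayOfCurrentStatus")]
public class ArrayOfCurrentStatus
{
    // Each repeated <CurrentStatus> element becomes one array entry.
    [XmlElement("CurrentStatus")]
    public ArrayOfCurrentStatusCurrentStatus[] CurrentStatus { get; set; }
}

public class ArrayOfCurrentStatusCurrentStatus
{
    // Maps to the <OperatingStatus> child element by name.
    public string OperatingStatus { get; set; }
}

public static class GeneratedClassesSketch
{
    // Made-up sample data, much smaller than the real feed.
    public const string SampleXml = @"<ArrayOfCurrentStatus>
  <CurrentStatus><OperatingStatus>Open</OperatingStatus></CurrentStatus>
  <CurrentStatus><OperatingStatus>Closed</OperatingStatus></CurrentStatus>
</ArrayOfCurrentStatus>";

    public static ArrayOfCurrentStatus Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(ArrayOfCurrentStatus));
        using (var reader = new StringReader(xml))
        {
            return (ArrayOfCurrentStatus)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        ArrayOfCurrentStatus statusSet = Parse(SampleXml);
        foreach (ArrayOfCurrentStatusCurrentStatus status in statusSet.CurrentStatus)
        {
            Console.WriteLine(status.OperatingStatus);
        }
    }
}
```

The point is that the class and property names mirror the element names, so the serializer can do all of the mapping for you.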
Ok, now we have classes for the data. Now what? We can read and deserialize it (add this function to Program.cs):
    // Needs these namespaces imported at the top of Program.cs:
    // System, System.Collections.Generic, System.IO, System.Net.Http,
    // System.Net.Http.Headers, System.Net.Mime, System.Threading.Tasks,
    // System.Xml.Serialization

    // Cache it: https://support.microsoft.com/en-us/help/886385/memory-usage-is-high-when-you-create-several-xmlserializer-objects-in-asp-net
    static XmlSerializer OpmStatusDeserializer { get; } = new XmlSerializer(typeof(ArrayOfCurrentStatus));

    private static async Task<IList<ArrayOfCurrentStatusCurrentStatus>> GetStatusesAsync()
    {
        using (HttpClient httpClient = new HttpClient())
        {
            // Ask the server for XML only.
            httpClient.DefaultRequestHeaders.Accept.Clear();
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(MediaTypeNames.Text.Xml));

            using (Stream s = await httpClient.GetStreamAsync(new Uri("https://www.opm.gov/xml/operatingstatushistory.xml")))
            {
                // Deserialize straight from the response stream into the pasted classes.
                ArrayOfCurrentStatus statusSet = (ArrayOfCurrentStatus)OpmStatusDeserializer.Deserialize(s);
                return statusSet.CurrentStatus;
            }
        }
    }
Let's have the app get a count of the different statuses:
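    // Also needs System.Linq for ToLookup, ToDictionary, and OrderByDescending.
    static void Main(string[] args)
    {
        // GetAwaiter().GetResult() rethrows the original exception instead of
        // wrapping it in an AggregateException the way Wait() does.
        MainAsync().GetAwaiter().GetResult();
        Console.ReadKey();
    }

    private static async Task MainAsync()
    {
        IList<ArrayOfCurrentStatusCurrentStatus> statuses = await GetStatusesAsync();

        // Group the entries by status text, then count each group.
        ILookup<string, ArrayOfCurrentStatusCurrentStatus> statusesByType = statuses.ToLookup(s => s.OperatingStatus);
        IDictionary<string, int> statusesAndCounts = statusesByType.ToDictionary(g => g.Key, g => g.Count());

        // Most frequent status first.
        foreach (var statusAndCount in statusesAndCounts.OrderByDescending(kvp => kvp.Value))
        {
            Console.WriteLine("Count: {0}{2}Status: {1}{2}", statusAndCount.Value, statusAndCount.Key.Trim(), Environment.NewLine);
        }
    }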
Not terribly exciting, but there you go. You can also go in the other direction and generate XML files from the classes by populating and serializing them.
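Going in the other direction might look something like the sketch below: populate the classes, then hand them to the same kind of `XmlSerializer`. As before, the two classes are simplified hand-written stand-ins for the pasted ones, and the status values are made up.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-ins for the classes that "Paste XML As Classes" generates.
[XmlRoot("ArrayOfCurrentStatus")]
public class ArrayOfCurrentStatus
{
    [XmlElement("CurrentStatus")]
    public ArrayOfCurrentStatusCurrentStatus[] CurrentStatus { get; set; }
}

public class ArrayOfCurrentStatusCurrentStatus
{
    public string OperatingStatus { get; set; }
}

public static class SerializeSketch
{
    // Serializes the object graph to an XML string.
    public static string ToXml(ArrayOfCurrentStatus statusSet)
    {
        var serializer = new XmlSerializer(typeof(ArrayOfCurrentStatus));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, statusSet);
            return writer.ToString();
        }
    }

    // Builds a small, made-up data set to serialize.
    public static ArrayOfCurrentStatus BuildSample()
    {
        return new ArrayOfCurrentStatus
        {
            CurrentStatus = new[]
            {
                new ArrayOfCurrentStatusCurrentStatus { OperatingStatus = "Open" },
                new ArrayOfCurrentStatusCurrentStatus { OperatingStatus = "Closed" }
            }
        };
    }

    public static void Main()
    {
        Console.WriteLine(ToXml(BuildSample()));
    }
}
```

To write to a file or a network stream instead of a string, `Serialize` also accepts a `Stream`.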
I recently used this for parsing and generating messages when interacting with SQL Service Broker from a .NET app!
Bonus info
DataSet support
Additionally, you can work with XML files via the System.Data.DataSet. Depending on your goals, or the size or complexity of the XML, this may be a more convenient or more efficient route. Of course, the downside is that you lose strong typing again: you'll need to work with textual column names.
    private static async Task<DataSet> GetStatusesDataSetAsync()
    {
        using (HttpClient httpClient = new HttpClient())
        {
            httpClient.DefaultRequestHeaders.Accept.Clear();
            httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(MediaTypeNames.Text.Xml));

            using (Stream s = await httpClient.GetStreamAsync(new Uri("https://www.opm.gov/xml/operatingstatushistory.xml")))
            {
                // With no schema loaded, ReadXml infers tables and columns from the XML itself.
                DataSet dataSet = new DataSet();
                dataSet.ReadXml(s);
                return dataSet;
            }
        }
    }
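To give a feel for what "textual column names" means in practice, here is a small sketch that loads a made-up XML string (not the real feed) into a DataSet and reads values back by table and column name. It assumes the usual schema inference, where the repeated `CurrentStatus` element becomes a table and its child elements become string columns.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetSketch
{
    // Made-up sample data shaped like the feed in this walk-through.
    public const string SampleXml = @"<ArrayOfCurrentStatus>
  <CurrentStatus><OperatingStatus>Open</OperatingStatus></CurrentStatus>
  <CurrentStatus><OperatingStatus>Closed</OperatingStatus></CurrentStatus>
</ArrayOfCurrentStatus>";

    public static DataSet Load(string xml)
    {
        var dataSet = new DataSet();
        using (var reader = new StringReader(xml))
        {
            // No schema was loaded first, so the structure is inferred from the data.
            dataSet.ReadXml(reader);
        }
        return dataSet;
    }

    public static void Main()
    {
        DataSet dataSet = Load(SampleXml);

        // Tables and columns are looked up by string; a typo here only fails at run time.
        DataTable table = dataSet.Tables["CurrentStatus"];
        foreach (DataRow row in table.Rows)
        {
            Console.WriteLine(row["OperatingStatus"]);
        }
    }
}
```

Compare `row["OperatingStatus"]` with `status.OperatingStatus` on the pasted classes: the compiler can check the second one, but not the first.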
XSD Schemas
By the way, another thing you can do with the XML file open is create an XSD schema (XML -> Create Schema).
You can use that to create an empty DataSet, which you could then populate and use to generate XML. Let's walk through that: click the Create Schema menu item. The resulting schema will open in a new window. Let's add a new XML Schema to the project:
And then just copy the generated schema into that new file. To make use of it, you can drag it into the project's Resources.
Now you can use the DataSet.ReadXmlSchema method. However, a string argument is treated as a file name. To have the string treated as the schema content itself, we can pass a TextReader instead, and the StringReader class is perfect for this:
    // Needs System.Data and System.IO; Resources is the project's generated resources class.
    private static DataSet CreateNewEmptyDataSet()
    {
        DataSet dataSet = new DataSet();

        // Wrap the schema text in a reader so it isn't mistaken for a file name.
        using (TextReader xsdReader = new StringReader(Resources.operatingstatushistory))
        {
            dataSet.ReadXmlSchema(xsdReader);
        }

        return dataSet;
    }
Voilà! An empty DataSet that you can populate. GetXml will return the XML as a string, while WriteXml writes it to a Stream (which is likely more efficient, if it's going to be saved or transmitted).
You can also use the schema when reading in the XML (as above), to ensure that the XML you're reading actually has the schema you want and doesn't blow up later when you try to access nonexistent columns and whatnot.
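If you want a hard check that incoming XML matches the schema, one option is a validating XmlReader. Note that this is a different technique from DataSet.ReadXmlSchema: it uses XmlReaderSettings with an XmlSchemaSet. The sketch below uses a tiny hand-written XSD and made-up XML strings as stand-ins for the generated schema and the real feed. A reader configured this way could also be handed to DataSet.ReadXml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidationSketch
{
    // Hand-written miniature schema; the one Visual Studio generates will be larger.
    public const string SampleXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"" elementFormDefault=""qualified"">
  <xs:element name=""ArrayOfCurrentStatus"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""CurrentStatus"" maxOccurs=""unbounded"">
          <xs:complexType>
            <xs:sequence>
              <xs:element name=""OperatingStatus"" type=""xs:string"" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Made-up documents: one that fits the schema and one that does not.
    public const string GoodXml =
        "<ArrayOfCurrentStatus><CurrentStatus><OperatingStatus>Open</OperatingStatus></CurrentStatus></ArrayOfCurrentStatus>";
    public const string BadXml =
        "<ArrayOfCurrentStatus><CurrentStatus><Unexpected>1</Unexpected></CurrentStatus></ArrayOfCurrentStatus>";

    // Returns the validation error messages; an empty list means the XML matched.
    public static List<string> Validate(string xml, string xsd)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };

        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, schemaReader);
        }

        // Collect errors instead of letting the reader throw on the first one.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the document is read
        }

        return errors;
    }

    public static void Main()
    {
        Console.WriteLine("Good XML errors: " + Validate(GoodXml, SampleXsd).Count);
        Console.WriteLine("Bad XML errors: " + Validate(BadXml, SampleXsd).Count);
    }
}
```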
Other XML resources for experimentation
Here are some other XML streams I found that you might find interesting to play around with:
- NYC MTA: http://web.mta.info/developers/developer-data-terms.html#data
  - Service Status (I use this in OnTime NYC)
  - Elevator/Escalator status
  - Lost and Found data
  - Other real-time APIs that require a key
- FAA: airport status http://services.faa.gov/docs/services/airport/
- NSA status: https://www.nsa.gov/status/assets/files/status-feed.xml
- Washington DC Metro: https://developer.wmata.com/docs/services/
  - Rail status: https://www.wmata.com/rider_tools/metro_service_status/feeds/rail_Advisories.xml
  - JSON here too: you could use "Paste JSON As Classes" too
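The JSON route works the same way: copy some JSON, use Edit -> Paste Special -> Paste JSON As Classes, and deserialize into the result. Here is a minimal sketch, assuming System.Text.Json is available (on older .NET Framework projects you would more likely reach for a library such as Json.NET). The JSON sample and the class shapes are made up for illustration; pasted classes typically keep the JSON key names as property names and call the root class `Rootobject`.

```csharp
using System;
using System.Text.Json;

// Hand-written stand-ins for what "Paste JSON As Classes" might produce.
public class Rootobject
{
    public Status[] statuses { get; set; }
}

public class Status
{
    public string operatingStatus { get; set; }
}

public static class PasteJsonSketch
{
    // Made-up sample; not taken from any of the feeds listed above.
    public const string SampleJson =
        @"{ ""statuses"": [ { ""operatingStatus"": ""Open"" }, { ""operatingStatus"": ""Closed"" } ] }";

    public static Rootobject Parse(string json)
    {
        // Property names match the JSON keys exactly, so no extra options are needed.
        return JsonSerializer.Deserialize<Rootobject>(json);
    }

    public static void Main()
    {
        Rootobject root = Parse(SampleJson);
        foreach (Status status in root.statuses)
        {
            Console.WriteLine(status.operatingStatus);
        }
    }
}
```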