Tuesday, 20 April 2010

Navigating SSIS packages with Linq to Xml

I’m currently writing a validator tool which compares data files against the metadata from a flat file connection manager in an SSIS package and outputs a data quality report. It’s intended to provide a quick way of verifying customer’s data before doing an on-site demonstration. Hopefully this will solve the problem of wasted journeys that I’ve often had due to data being an incorrect format. The problem stems from the fact that it takes several hours for the customer to download the data from their provider and confidentiality precludes it being sent to us over the wire – in addition to which the width and length of data makes it difficult to fix quality issues on the fly.

The only limitation is that the tool cannot make use of the Dts namespaces (because their availability in the deployment environment cannot be guaranteed), which means that some part of the vehicle is going to need re-inventing. Also, some of the data files we’re dealing with have in excess of 1000 columns, so we’d like to be able to simply reflect the column metadata from the existing dtsx packages (rather than replicate it elsewhere). This has led me to the obvious conclusion of issuing XPath statements against the packages so that I can build a list of columns with data type information and such. However this got rather unwieldy, so I though it would be a good time to use LINQ to XML. I’ve avoided this thus far, as it’s very rare that I work with raw XML these days – I’m one of those lazy facists who believes that XML should never be seen outside of standards documents or the message layer (although as with all such ill-founded pronouncements this falls down somewhat with derivatives such as XHTML and XAML). To be honest, it’s swings and roundabouts between using LINQ to XML and the ‘old way’. It’s still full of string literals and anonymous types, but iteration syntax is slightly more elegant.

I won’t reproduce the entire project here – just enough to make it through the basics of reading the package. I may cover the data type matching in another post. The following constant is used throughout the code:

public const string DTS_NAMESPACE = @"{www.microsoft.com/SqlServer/Dts}";

First up is getting a list of the connection managers in the package (which is a PITA because the name is an attributed child node of the <ConnectionManager> element):

var connectionManagers = from cm in packageXml.Root.Elements(
                DTS_NAMESPACE + "ConnectionManager").Elements(DTS_NAMESPACE + "Property")
                        where cm.Attribute(DTS_NAMESPACE + "Name").Value == "ObjectName"
                        select new
                        {
                            connectionManagerName = cm.Value,
                            connectionManagerXml = cm.Parent
                        };

This gives us the name of the connection manager and the child XML nodes in separate variables. We can then iterate through this (for this example we’re grabbing flat file connections only):

foreach (var connectionManager in connectionManagers)
{
   XDocument connectionXml = XDocument.Parse(
       connectionManager.connectionManagerXml.ToString());

   List<string> connectionTypes = (from ct in connectionXml.Root.Elements(DTS_NAMESPACE + "Property")
                                   where ct.Attribute(DTS_NAMESPACE + "Name").Value == "CreationName"
                                   select ct.Value.ToString()).ToList<string>();

   if (connectionTypes.Count == 1)
   {
       string connectionType = connectionTypes[0];

       if (connectionType.ToUpper() == "FLATFILE")
       {
           // all the Xml for the connection manager is in the connectionXml variable
       }
   }
}

We can get at the connection-level properties using the following method:

private string GetConnectionManagerProperty(
   XDocument connectionManagerXml, string propertyName)
{
   List<string> properties = (from pr in connectionManagerXml.Root.Elements(DtsxPackage.DTS_NAMESPACE + "Property")
                              where pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == propertyName
                              select pr.Value.ToString()).ToList<string>();
   Debug.Assert(properties.Count <= 1);

   if (properties.Count == 1)
   {
       return properties[0];
   }

   return string.Empty;
}

like so:

string unicodeFlag = GetConnectionManagerProperty(connectionManagerXml, "Unicode");

Similarly we can iterate through the columns like so:

var columns = from cl in connectionManagerXml.Root.Descendants(DtsxPackage.DTS_NAMESPACE + "FlatFileColumn")
             select cl;

foreach (var columnProperties in columns)
{
   var properties = (from pr in columnProperties.Elements(DtsxPackage.DTS_NAMESPACE + "Property")
                     where pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "ObjectName" ||
                     pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "DataType" ||
                     pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "DataPrecision" ||
                     pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "DataScale" ||
                     pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "ColumnDelimiter" ||
                     pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value == "MaximumWidth"
                     select new PropertyPair()
                     {
                         Name = pr.Attribute(DtsxPackage.DTS_NAMESPACE + "Name").Value,
                         Value = pr.Value
                     });
}

It would be nice to be able to deconstruct the a dtsx in a single XPath/XML statement, but the rather strange schema for dtsx appears to preclude that. Hopefully some XmlMagician will show me how much more concisely and simply this could be achieved.

Friday, 2 April 2010

DataGridViewComboBoxColumn and access modifiers

Despite my best efforts I’m still working on a Windows Forms project from a few months ago.  Apart from the fact that it’s helping me to appreciate WPF in a big way (particularly the way that modifying complex layouts is so much easier in XAML) it’s also raising issues that I should really be aware of by now.

One of these is that the data source for a DataGridViewComboBoxColumn’s list contents must have a public access modifier (assuming you’re binding to an IList).  This is so trivial I’m only mentioning it because it caused me a hour of grief last night trying to work out why my DataGridView wasn’t playing ball.  I wrongly assumed that I could use the internal access modifier and all would be well as long as the class was in the same project as the DataGridView.  Not so – as long as the fields being bound to the DisplayMember and ValueMember properties are public, then all is well (even if the class itself is internal).

For example, the following class:

internal class MyListSourceClass
{
public string Description {get;set;}
public string Value {get;set;}
}

Will bind quite happily in the following code:

List<MyListSourceClass> listSource = GetListDataFromSomewhere();

// where MyComboBoxColumn is the DataGridViewComboBoxColumn to be populated
MyComboBoxColumn.DataSource = listSource;
MyComboBoxColumn.DisplayMember = "Description";
MyComboBoxColumn.ValueMember = "Value";

but if we change the access modifiers on the Description and Value fields of MyListSourceClass then all sorts of nonsense happens.  I’m so looking forward to getting back to WPF.