Thursday, 11 March 2010

Dynamically creating a ServiceHost for all ServiceContracts in an assembly

I’m currently shoehorning about 100 WCF services into an existing application that uses .NET Remoting. In order to be consistent with the Remoting code they’ll be exposed using wsHttpBinding. In a simple world this would mean that I could keep them completely separate from the existing code base by deploying them to IIS (the target environment is Windows 2003) as .svc files. Unfortunately IIS isn’t on the list of required software for the target environment, and as this isn’t our product we can’t mandate that it should be. The upshot is that we have to self-host the services, which is going to require 100 instances of ServiceHost (one per service contract).

As I don’t fancy hard-coding all the service contracts into the hosting code, I need a quick way of setting up the hosting. On the plus side I’m leaving as much of the existing code alone as possible, which means that I’m putting all the service contracts in a single separate assembly/class library. So what I’ve come up with is some helper code that can be pointed at an assembly to create a service host for each service contract in that assembly. The principle is pretty simple:

  • Reflect over all the classes in the assembly that implement interfaces marked with ServiceContractAttribute
  • Create a ServiceHost for each of these classes and add it to a list

I should highlight that although the methodology does facilitate a generic approach to life-cycle management, this is simply an elegant hack. No application should sensibly self-host this many services. As I understand it (and I’d be grateful for any corrections if I’m talking nonsense here) IIS uses a single listener and fires up services on demand, then keeps them alive for a finite time. Each instance of ServiceHost will have its own listener and keep the service loaded for the lifetime of the host, even if no client ever calls it. So apart from the lack of scalability inherent in this, there is also a start-up delay while all the ServiceHost instances are initialised. I can’t vouch for the scalability of this approach – according to the 2nd edition of Programming WCF Services, incoming calls are dispatched by monitoring threads to the I/O completion thread pool, which has 1,000 threads by default. It’s not entirely clear whether the monitoring threads are themselves pooled, so I don’t know how much more or less efficient this is than hosting in IIS.

The HostHelper class below exposes two methods:

  • GetServiceTypes()

    Returns all the classes that can be exposed as WCF services (i.e. either implement an interface that has ServiceContractAttribute or have the attribute themselves)

  • GetServiceContractType()

    Returns the contract (interface) for the service class

I’ve designed it to sit in the same assembly as the service contracts (hence the use of GetExecutingAssembly() to self-reference), but one could just as easily pass in the assembly to be reflected over as a parameter:

public static class HostHelper
{
   // Returns the contract (interface) exposed by a service class
   public static Type GetServiceContractType(Type type)
   {
       if (!type.IsClass)
       {
           throw new InvalidOperationException("Type must be a class.");
       }

       // Prefer a contract declared on an implemented interface
       Type[] contractInterfaces = type.GetInterfaces();

       foreach (Type contractInterface in contractInterfaces)
       {
           if (contractInterface.GetCustomAttributes(
               typeof(ServiceContractAttribute), true).Length > 0)
           {
               return contractInterface;
           }
       }

       // Fall back to a contract declared directly on the class
       if (type.GetCustomAttributes(
           typeof(ServiceContractAttribute), true).Length > 0)
       {
           return type;
       }

       throw new InvalidOperationException(
           "No ServiceContractAttribute found on the type or its interfaces.");
   }

   public static List<Type> GetServiceTypes()
   {
       Assembly assembly = Assembly.GetExecutingAssembly();
       Debug.Assert(assembly != null);

       List<Type> serviceTypes = new List<Type>();
       Type[] types = assembly.GetTypes();

       foreach (Type type in types)
       {
           bool isWCFType = IsWCFType(type);

           if (isWCFType)
           {
               serviceTypes.Add(type);
           }
       }

       return serviceTypes;
   }

   // True if the type is a class that either carries ServiceContractAttribute
   // itself or implements an interface that does
   private static bool IsWCFType(Type type)
   {
       if (type.IsClass)
       {
           if (type.GetCustomAttributes(typeof(ServiceContractAttribute), true).Length > 0)
           {
               return true;
           }
           else
           {
               Type[] classInterfaces = type.GetInterfaces();

               foreach (Type classInterface in classInterfaces)
               {
                   if (classInterface.GetCustomAttributes(
                       typeof(ServiceContractAttribute), true).Length > 0)
                   {
                       return true;
                   }
               }
           }
       }

       return false;
   }
}
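
For completeness, an overload of GetServiceTypes() that takes the target assembly as a parameter might look something like this (a sketch – it isn’t in the class above, but it reuses the same IsWCFType() filter):

   public static List<Type> GetServiceTypes(Assembly assembly)
   {
       Debug.Assert(assembly != null);

       List<Type> serviceTypes = new List<Type>();

       // Same filter as GetServiceTypes(), applied to the supplied assembly
       foreach (Type type in assembly.GetTypes())
       {
           if (IsWCFType(type))
           {
               serviceTypes.Add(type);
           }
       }

       return serviceTypes;
   }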

The code that runs up the multiple service hosts can be put into a console application:

class Program
{
   private const string BASE_ADDRESS = "http://localhost:8889/";

   private static List<ServiceHost> m_Hosts;

   static void Main(string[] args)
   {
       List<Type> types = HostHelper.GetServiceTypes();

       if (types.Count > 0 && m_Hosts == null)
       {
           m_Hosts = new List<ServiceHost>();
       }

       foreach (Type type in types)
       {
           Type contract = HostHelper.GetServiceContractType(type);
        // A CustomBinding over HTTP transport; WCF adds a text message
        // encoder by default when none is specified
        BindingElement bindingElement = new HttpTransportBindingElement();
        Binding binding = new CustomBinding(bindingElement);
           string fullBaseAddress = string.Concat(BASE_ADDRESS,type.Name);

           ServiceHost host = new ServiceHost(type, new Uri(fullBaseAddress));

           host.AddServiceEndpoint(contract, binding, "");

            ServiceMetadataBehavior metaDataBehavior = new ServiceMetadataBehavior();
            metaDataBehavior.HttpGetEnabled = false;
            host.Description.Behaviors.Add(metaDataBehavior);

            // Metadata is only published via the MEX endpoint (HTTP GET is disabled).
            // The service binding is reused here; MetadataExchangeBindings.CreateMexHttpBinding()
            // would be the more conventional choice.
            host.AddServiceEndpoint(typeof(IMetadataExchange), binding, "MEX");

           Console.WriteLine("{0}: Opening host for {1}", type.Name, DateTime.Now.ToString());
           host.Open();
           Console.WriteLine("{0}: Host for {1} is open", type.Name, DateTime.Now.ToString());

           m_Hosts.Add(host);
       }
      
       Console.Read();

       if (m_Hosts != null && m_Hosts.Count > 0)
       {
           foreach (ServiceHost host in m_Hosts)
           {
               if (host.State == CommunicationState.Opened)
               {
                   host.Close();
               }
           }
       }
   }
}

I’ve added metadata exchange endpoints so that proxies can be generated by pointing SvcUtil at each service’s MEX address (e.g. http://localhost:8889/ActivityMeasureDataAccess/MEX – remember that HTTP GET metadata is disabled); beyond that, all the service hosts are simply stored in the m_Hosts member variable.

This example could easily be ported to a Windows Service to operate like a poor man’s WAS, although it would need handling for the Faulted event to shore up the reliability:

static void host_Faulted(object sender, EventArgs e)
{
   ServiceHost host = sender as ServiceHost;
   Debug.Assert(host != null);

   Debug.Assert(host.State == CommunicationState.Faulted);

   // A faulted host cannot be closed gracefully, so abort it
   host.Abort();
   m_Hosts.Remove(host);

   Type serviceType = host.Description.ServiceType;
   Debug.Assert(host.BaseAddresses.Count == 1);
   Uri baseAddress = host.BaseAddresses[0];
   host = new ServiceHost(serviceType, baseAddress);

   // Other host setup code here (endpoints, behaviors,
   // re-subscribing to Faulted), then host.Open()

   m_Hosts.Add(host);
}

This aborts the faulted host, removes it from the global list, and replaces it with a new one that has the same settings.
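
Wiring the handler up is just a matter of subscribing before each host is opened – in the hosting loop above that would be a one-liner (a sketch, assuming host_Faulted is a static method alongside Main):

// inside the foreach loop in Main, before host.Open()
host.Faulted += host_Faulted;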

Wednesday, 10 March 2010

Using generics with WCF client proxy interfaces

In the spirit of trying to learn something new at least once in a while, I’ve discovered something about interfaces in .NET – namely that a class can implement multiple interfaces with the same method signatures and only has to provide the implementation once. This may sound like a useless feature but it actually solves a problem that I have with WCF service contract factoring and generics (more on that later). To illustrate, let us consider the following interface:

interface IDataAccess<T>
    where T : class
{
    List<T> GetData();
}

A typical use for such an interface would be generic data access. GetData() will always return a list of T (whatever T might be). For the purposes of this example T will be the Car class:

public class Car
{
    public string Make { get; set; }
    public string Registration { get; set; }
    public int? Mileage { get; set; }

    public override string ToString()
    {
        return string.Format("{0}, registration {1}, with {2} miles on the clock",
            Make, Registration, Mileage);
    }
}

We can then provide a concrete implementation of IDataAccess<T>:

public class CarDataAccess : IDataAccess<Car>
{
    public List<Car> GetData()
    {
        Car myCar = new Car()
        {
            Make = "Vauxhall Zafira",
            Mileage = 50000,
            Registration = "DY07 XXX"
        };

        List<Car> myList = new List<Car>();
        myList.Add(myCar);
        return myList;
    }
}

Now we can also ask the class to implement an interface that, rather than using generics, references the concrete (Car) class:

interface ICarDataAccess
{
    List<Car> GetData();
}

and change the first line of the data access class to reflect this:

public class CarDataAccess : IDataAccess<Car>, ICarDataAccess

The great thing is that not only does this compile, but we can use either interface to access the GetData() method of the concrete class:

ICarDataAccess myDataObject = new CarDataAccess();
List<Car> results = myDataObject.GetData();

Console.WriteLine(results[0].ToString());

IDataAccess<Car> myOtherDataObject = myDataObject as IDataAccess<Car>;
Debug.Assert(myOtherDataObject != null);

results = myOtherDataObject.GetData();

Console.WriteLine(results[0].ToString());

So why am I impressed by this – what possible practical use does it have? The answer is that it helps us to overcome one of the major limitations of WCF client proxies – specifically the lack of support for using generic interfaces in service contracts. For example, if you were to decorate the IDataAccess<T> interface with the [ServiceContract] attribute:

[ServiceContract]
interface IDataAccess<T>
    where T : class
{
    List<T> GetData();
}

and deploy the CarDataAccess class as a WCF service, then it would compile, but the service host would throw an InvalidOperationException on start-up with an error message along the lines of:

The contract name ‘xxx’ could not be found in the list of contracts implemented by the service ‘yyy’.

So we have to use the non-generic interface on the service side. However, we can use generics on the client side by manually adding the generic interface to the list of interfaces implemented by the client proxy:

public class CarDataAccessServiceClient : ClientBase<ICarDataAccess>, ICarDataAccess, IDataAccess<Car>
{
    // generated proxy code here
}

This doesn’t break the proxy, but it does allow us to write generic client code:

public void DoSomething<T, D>()
    where T : ICommunicationObject, IDataAccess<D>, IDisposable, new()
    where D : class
{
    T myProxy = new T();

    try
    {
        myProxy.Open();
        List<D> results = myProxy.GetData();
        myProxy.Close();
    }
    catch
    {
        myProxy.Abort();
    }
}
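
Calling it for the car service then collapses to a one-liner (assuming the generated CarDataAccessServiceClient picks its endpoint configuration up via its parameterless constructor, as SvcUtil-generated proxies do):

// from within the class that declares DoSomething<T, D>()
DoSomething<CarDataAccessServiceClient, Car>();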

An easy trick, but one that is about to save me hours of coding.

Does maintainability really matter?

As a software developer I would answer ‘yes’, primarily because it makes my life easier – I suspect that all professional developers would agree. The traditional argument in favour of not being a code cowboy is presented very well in this article. Yet over the last few days, as I transition from code monkey to the man with the business hat on, I’ve struggled to make a compelling case for this being true in all industries.

I’m going to use the word ‘maintainability’ in a very general sense, partly because otherwise I’d have to scour the web looking for the definition most aligned with my own position, but also because I believe the definition drives the justification. Experience and pragmatism have made me aware that all software is ultimately maintainable, in line with the infinite monkey theorem – and therein lies the root of my problem.

A few days ago I was asked by one of my fellow directors if there was a solid case for using SSIS as an ETL tool instead of a combination of .NET data access code and T-SQL. We’ve recently partnered with another of his companies to produce some BI extensions to their software, although as we’re one day away from shipping it was more of a theoretical question. His business case for not using SSIS was that his other company didn’t currently use it, so the lack of in-house skills, in tandem with the fact that it counts as an additional deployment platform, meant that providing support would incur greater overhead and risk. Of course I trotted out all the arguments about it being more maintainable, scalable, reliable, etc. His response: “…that’s all well and good for the developers, but what are the benefits to the customer?”.

In theory one can cite agility as a customer benefit, as it means that new functionality can be rolled out far quicker than for badly engineered software. But then I considered a lot of the work I’ve done in the past few years, and it seems that the companies where poorly engineered software was prevalent were the ones that could afford to pay for large numbers of monkeys – even if they can’t stretch to infinity – without going into the red, in order to deliver new features quickly. In fact, many of these code monkeys have cited similar arguments as justification for not even trying to follow best practice (although hubris does play its part).

For example, a few years ago I worked on the website of a large airline. The website was implemented in type-unsafe VBScript on Classic ASP with a SQL Server 2000 back-end (and no middleware). The absence of intelligent architecture and the prevalence of spaghetti code were enough to make any half-decent developer weep. Yet they wanted to add more functionality to the website despite the in-house developers hitting entropy. So they just hired in a bunch of developers from a consultancy at around £1,000 a day each and set them to work hacking in new functionality – which, despite much swearing at the code, they did very successfully. £150k+ a month may seem a high price to pay for not writing decent code in the first place – but these developers were working alone or in pairs to produce functionality on 30-day sprints that was raising anywhere between £50,000 and £1m extra revenue per month in perpetuity per project.

More recently I worked on a major data warehousing project for a public sector body to replace a system which was taking around 3 days to process less than 12 million rows of data from a text file (although in fairness the width of data was around 1500 fields). The vendor of the existing system was adding new functionality all the time, but was unable to rectify core performance or reliability issues (processing would regularly fail). Despite its failings the organisation in question worked with the system for 3 years, during which time I estimate it cost over £1m in labour inefficiency – but they were able to absorb the cost while continuing core operations (albeit somewhat unreliably). However the replacement system reduced processing time to between 2 and 4 hours which enabled them to spend less time crunching data and more time analysing it (the raison d'être of the organisation). The smaller processing window also added value because it meant that time-sensitive data could be analysed – something which fundamentally altered the strategic capability of the business.

In the case of the airline, customer experience was unaffected – performance of the website was “good enough”. So the only rationale for maintainability would be to save around £1.5m a year (out of a £2.5bn turnover). Placed against the risk of replacing the entire codebase I can see why the status quo has held. So I conclude that:

  • Larger organisations have the resources to compensate for inefficiencies because the revenues are so high and the risk is lower than replacement
  • Small businesses with more fragile cash flow would undoubtedly benefit from better software engineering and use of existing frameworks

However there is a middle ground of businesses and government agencies that could vastly improve the quality of their operations if their systems were more maintainable and developed according to best practice – in the meantime they’ll just soldier on with the proverbial sticking plaster.

As for me, I’ll stick with SSIS for ETL because it’s cheaper for the business and there’s less risk in supporting it than trying to support a mass of spaghetti SQL. Performance would be adequate with either solution – data volumes aren’t likely to increase to the point where any performance difference is noticeable (although if they do then I’ll be glad I chose SSIS). The customer doesn’t care one way or the other as long as things keep working – so the technical argument wins over.