Friday, June 08, 2012

GC pressure series: Introduction and enumerators

As you may already know, I’m a Microsoft developer who works on Visual Studio.  Improved performance and responsiveness is a major goal for the 2012 release.  As much of the new code in Visual Studio 2012 was written in C#, I learned a lot about how much GC pressure can degrade performance. 

Q: Wait.  What is GC pressure?

A: GC pressure happens when your .NET application produces lots of “garbage” by allocating and releasing memory.  .NET has to “collect” all this garbage periodically by walking your application’s memory graph to determine which objects are still used and which have become garbage, then the memory for the garbage is freed. 

Q: Right.  That’s what garbage collection is all about.  So what’s the problem?

A: Normally the CLR’s heuristics for finding good times to pause your application to perform GC gives your application a lean memory footprint and high responsiveness.  The problem comes when you’re in a performance critical section of code in your application, perhaps in a loop of some kind, and within that section you’re producing and discarding lots of managed objects. This causes “GC pressure” which leads .NET to interrupt your perf critical code to run the GC in order to keep your application from running up its memory use, which besides potentially slowing down the machine overall, can eventually lead to an OutOfMemoryException.  The heuristics of .NET’s GC are complex and beyond the scope of my blog.

Q: How do I know if GC pressure is responsible for my performance problems?

A: Like most performance investigations, you find out that GC pressure is your problem by measuring using some profiler tool and noticing that native callstacks are executing while you’re allocating memory that include GC code. 

For instance, some large Visual Studio solutions I measured were spending around 30% of the time simply running the GC.  Imagine if I could wave a wand and make VS 30% faster.  That would be significant, right?  The good news is that GC pressure is very often a solvable problem.  In several upcoming blog posts, I’ll review several discrete sources of GC pressure that are easily solvable, and which allowed us to greatly improve solution load performance (and other scenarios) significantly.

Enumerator structures vs. IEnumerable<T>

It’s often good design to use types that are only as specific as needed.  For instance, your public methods should typically take IList<T> instead of List<T>.  This allows others to call you with a custom list implementation if they need to.  But this can actually cause GC pressure if you use a C# foreach to enumerate the list.  Consider the following:

static void Main() {
   var numbers = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 8 });
   foreach (var number in numbers) {
   }
}

In this case, the C# compiler knows that foreach is enumerating the List<int> type.  The public List<T>.GetEnumerator() method returns a List<T>.Enumerator, which is a struct rather than a class.  This means that C# can generate IL that enumerates the collection without allocating any new objects on the heap: no garbage.  But now consider a only slightly different case:

static void Main() {
   IList<int> numbers = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 8 });
   foreach (var number in numbers) {
   }
}

The only different is that I’ve explicitly typed the List<int> into an IList<int> local variable.  Now when C# enumerates the list with foreach, the only GetEnumerator() method available to call is the one on IList<T>, which returns an IEnumerator<T>.  This is typed as an interface, not a struct.  Although the underlying List<T>.GetEnumerator() will still be invoked, the struct it returns will now have to be boxed so that it is addressible via its IEnumerator<T> interface. This boxing allocates an object on the heap, which will be discarded after the enumeration is over.  If this foreach was itself in a perf critical loop then this could be a significant source of GC pressure.

To wrap up with a final comprehensive example, here are two methods you might define:

static void GarbageProducer(IEnumerable<int> numbers) {
   // numbers.GetEnumerator() can only returned a (boxed)
   // instance of IEnumerator<int>, forcing this method to
   // allocate one object each time this method is invoked.
   foreach (var number in numbers) {
   }
}

static void NoGarbageProducer(List<int> numbers) {
   // numbers.GetEnumerator() will return the stack allocated
   // List<int>.Enumerator struct, so no allocations are required.
   foreach (var number in numbers) {
   }
}

In the above snippet, GarbageProducer allocates memory, but its parameter is typed for maximum portability, whereas NoGarbageProducer never allocates memory but requires a very specific collection type to be passed in.  Which is better?  That depends, of course.  If the method is private, and you’re passing in a List<int> anyway, by all means keep the parameter typed as List<int> and avoid the allocation (in perf critical code anyway).  If the method is public you should probably use IEnumerable<T>.  Yes, it almost guarantees an allocation will be required, but on the other hand it allows virtually any type of collection to be passed in.  If you take List<T> and the caller only has a T[] array, the caller would first have to convert the array to a List<T> instance and then pass it in, and that’s much more costly to memory.

I did say “almost guarantees” an allocation.  Can it be avoided?  Yes, actually.  With some extra work within the method:

static void NoGarbageProducer(IEnumerable<int> numbers) {
   List<int> optimizedCase = numbers as List<int>;
   if (optimizedCase != null) {
      foreach (var number in optimizedCase) {
         DoWork(number);
      }
   } else {
      foreach (var number in numbers) {
         DoWork(number);
      }
   }
}

private static void DoWork(int number) {
}

Obviously this is more code to write and maintain, so it should only be done where measuring performance proves it is worthwhile.  Note that it only makes the enumerating alloc-free when the collection passed in is actually a List<T>, otherwise it falls back to the one-allocation code path.  This is an example of how you might discover you have a very popular method in a perf critical path, and perhaps you observe that you tend to be called with List<T> so you optimize for that case.  You could even optimize for multiple specific collection types, so long as each of them implement the alloc-free struct GetEnumerator pattern.

3 comments:

  1. Well, if I were to write a very tight loop where performance is paramount I wouldn't even consider foreach in the first place. For or while would be better I guess

    ReplyDelete
  2. Thanks for your response, Andrei. In .NET 1.1, foreach was much less efficient when enumerating arrays than the for loop. Since then, the C# compiler has improved significantly, and in fact foreach compiles to a while loop when enumerating arrays. Non-ordered collections don't have indexers, leaving foreach as your only option. The point is write whatever is most readable/maintainable, and then when you know that something is a perf problem, make your optimization (perhaps a while loop instead of a foreach) and then measure to make sure it makes the improvement you expect. I've seen several popular collection implementations (including in .NET itself!) that are *more* efficient when you use the enumerator (aka foreach) than when you use indexers because of the state that the enumerator can store between each step.

    ReplyDelete
  3. Interesting! Good to know. :)

    ReplyDelete