Sunday, June 24, 2012

GC pressure series: hidden boxing

Last time we talked about GC pressure from enumerating collections using their interfaces rather than their concrete types.  It turns out that the garbage produced by using the interfaces is directly caused by boxing, which we’ll explore more in depth and in a broader context here.

Before diving into boxing and when it can implicitly occur, let’s first review some terminology.  If you’re already fluent in value types vs. reference types, feel free to skip to the boxing section.

Value types

In C#, value types are structs.  Primitives you’re very familiar with like integer, float, and Guid are structs, as well as the ones you may define yourself.  If you pass a struct as a parameter to another method, that method gets a copy of that struct – not the original struct.  Consider this example:

static void Main() {
   int seeds = 5;
   Debug.Assert(seeds == 5);

static void AddOneSeed(int seeds) {

The AddOneSeed method received a copy of the integer, which is a struct.  Therefore any changes it made do not propagate back to the caller.  Note that if a “ref” keyword is added to the parameter in the method’s signature that the method accesses the caller’s original struct instead of a copy, allowing the method to modify its caller’s value.

Value types store their data directly in their container.  In the above case, the value types appear as local variables within methods, so the data of these value types is stored directly on the stack of the thread executing these methods.  This also means that value types can vary in size, and the container must know exactly how large the value type is so that enough memory can be allocated for it.

Reference types

In C#, reference types are classes.  Reference types store their data on the heap, and anyone holding a reference type is actually just holding a reference: a memory pointer to that object on the heap.  Classes may contain structs as fields.  This means that value types may actually appear on the heap.  Consider this simple example:

class Fruit {
   int seeds;

Fruit is a reference type that contains a field that is a value type.  This value type always appears on the heap because it’s within an object on the heap.  The memory allocated on the heap by the Fruit object actually includes the value type directly.  In the case of Fruit above, the object on the heap is 12 bytes in size: 8 for the CLR header that exists for every object, and 4 more bytes for the seeds field.

Boxing: When a value type becomes a reference type

Sometimes a value type must be represented as a reference type.  The CLR allows anything to be represented as a System.Object.  But System.Object is a reference type.  Anyone with a reference to Object expects it to be a pointer to an object on the heap, and the size of Object to the holder of the reference is always 4 bytes regardless of the size of that object on the heap (or 8 bytes in a 64-bit process).  How can a value type, which may vary in size, be assigned to a System.Object, which is always exactly 4 bytes?  Boxing.

Boxing happens when the compiler has to assign a value-type to a reference type.  When boxing, the CLR allocates a new object on the heap just large enough to store the particular value type being boxed, and copies the value type into that object.  Now the reference to that object can be passed around simply as System.Object without anyone knowing how large the object really is. To actually look at the value type again, it must be unboxed, where the value is copied back onto the stack, and possibly into a field typed as the matching value type.

Let’s consider the simplest example of boxing.  We have an integer value type on the stack as a local variable in this method.  Then we assign this value type to a reference type.  IComparable is an interface.  Value types can implement interfaces.  But as soon as you refer to the value type by an interface it implements it gets boxed.  Check this out:

static void Main() {
   int number = 5;
   IComparable comparable = number;

The C# compiler compiles the above code to the below IL:

.locals init (
        [0] int32 number,
        [1] class [mscorlib]System.IComparable comparable)
box int32

It’s not important that you understand all the IL, but notice the box instruction.  This tells the CLR to allocate an object and copy the value at the top of the stack into it, and then push a reference to that object onto the stack in its place.  The last instruction then assigns that object reference into the local variable typed as an IComparable. 

But wait… if a struct implements an interface already, why does it have to be boxed when it is assigned to a variable typed as that interface?  The answer lies in its size.  Anything can implement an interface (struct or class), and sizes can vary.  But folks holding a reference to an interface have to know in advance how much space to allocate for that reference.  Since value types vary in size, value types don’t qualify.  Instead, when you’re dealing with interfaces you’re always dealing with reference types, meaning that value types get boxed inside an object on the heap as a reference type.

The important thing to realize here is that although the C# code did not have a new keyword anywhere, an object was allocated anyway.  This is under-the-covers magic by C# and the CLR so that things just work, and normally it’s a good thing.  But this is a post on GC pressure, so of course we’re going to study how boxing can work against you, and how you can avoid it.

How to avoid boxing

The easiest way to avoid boxing is to always refer to a value type by its concrete type rather than by System.Object or an interface the value type implements.  You can still call methods on that struct that implement interfaces just by calling those methods directly.  For example, consider these two calls to CompareTo:

static void Main() {
   int number = 5;

   // CompareTo is found on the IComparable interface
   IComparable comparable = number; // box number
   int result = comparable.CompareTo(8); // box 8

   // CompareTo is also a public method on int directly.
   // In fact two overloads of CompareTo are available
   // on int, one that takes System.Object and another
   // that takes an int, so the C# compiler will choose
   // to call the one that takes int, avoiding both 
   // boxes that the above example causes.
   int result2 = number.CompareTo(8);

The example demonstrates two things.  First is the discrete second boxing that can happen.  Once we’ve boxed the int to an IComparable, there is only one CompareTo method for the compiler to choose from: CompareTo(object).  That means that in addition to having already boxed the 5, the 8 must also be boxed in order to call CompareTo(object).  On the other hand, leaving 5 as an integer avoids the first box, and allows the CompareTo(8) method to bind to int.CompareTo(int), which means the 8 can also remain as a value type.  To better understand what is happening, consider how the Int32 and IComparable types are defined:

public struct Int32 : IComparable, IFormattable, IConvertible, IComparable, IEquatable {
   public int CompareTo(int value);
   public int CompareTo(object value);

public interface IComparable {
   int CompareTo(object obj);

As you can see, when you have an IComparable reference, all the compiler can do is call CompareTo(object). But when you have an Int32, you also can call CompareTo(int) and avoid both boxes.

Keep in mind as well that returning a value type from a method whose signature returns an interface also boxes:

static IComparable GetNumber() {
   return 5; // boxed!

Some pathological cases

An occasional box here and there isn’t going to hurt the performance of your app.  But there are some things you can do that can lead to enough boxing to really hurt in frequently called code.

Boxing in a loop

Suppose you have no choice but to box a value type to call a method that takes an interface.  If you were calling that method in a loop with the same value type each time, you can take care to avoid boxing for each iteration of the loop.

static void BoxEachIteration() {
   int number = 5;
   for (int i = 0; i < 500; i++) {
      Work(number, i);

static void BoxJustOnce() {
   int number = 5;
   IComparable boxedNumber = number;
   for (int i = 0; i < 500; i++) {
      Work(boxedNumber, i);

static void Work(IComparable number, int iteration) {
Boxing in non-mutating methods

When you implement public methods that don’t mutate state, you should generally be very careful not to box at all.  For example, folks expect that retrieving a value from a dictionary should have no side effects including not allocating memory.  This is important because your cheap, non-mutating methods are often the ones that end up being called in loops which may end up being perf critical to the application.

Earlier this week I investigated a performance problem that I found was due to GC pressure caused by a hashtable-like collection that boxed in its Get method.  Ouch.  Here was the Get method:

public T Get(string key)
    return this.Get(new KeyedObject(key));

There's a new keyword there, but KeyedObject was a struct, so supposedly no heap allocations there. But it gets more subtle. Take a look at how the struct is defined, and how the Get overload that is being called is defined:

private struct KeyedObject : IKeyed
    private string name;
    internal KeyedObject(string name);
    string IKeyed.Key { get; }

public T Get(IKeyed item)
  // ...

Do you see the problem? Although the Get(string) method creates a struct, which doesn’t cause GC pressure, it then passes that struct to a method that accepts an interface.  That struct implements that interface, so while the C# compiler allows it, it first has to box the struct, which means all the work the authoring developer went to by using a struct to avoid allocations didn’t actually avoid the allocation after all.  Since this Get(string) method was being called thousands of times in this perf critical scenario, we were generating garbage like crazy and the GC had to keep interrupting the app to clean up.

One possible fix to this is to add a Get(KeyedObject) overload that does the work directly with the value type.  Then the C# compiler would call that one instead of the one taking the interface, and the boxing would disappear.

Non-generic collections

Generic collections aren’t just sugar.  They can help performance tremendously.  Consider a table of value type pairs.  If you use the non-generic Hashtable collection, where keys and values are typed as System.Object, each key and value must get boxed before storage.  That at least is a somewhat fixed cost – it’s not directly garbage because the box is persistent for as long as that entry is in the collection.  But now think about every lookup operation.  Let’s say you have a hashtable where the keys and values are both integers.  You have a key and you want to look up the value.  You are compelled to box the key first, making it impossible to query for values without allocating memory.

var table = new Hashtable();
table[5] = 8; // boxes 5 and 8
int value = (int)table[5]; // box 5 (again)

If on the other hand you use Dictionary<int, int>, all the internal storage mechanisms and methods are strongly typed to take and return integers rather than System.Object, so no boxing occurs. This can have a dramatic impact on performance due to GC pressure as well as memory availability since each integer takes only 4 bytes in memory instead of the minimum 12 bytes required by every GC heap allocated object.

Should I define my type as struct or class?

That’s a broad question and there is more to consider than GC pressure when making this decision.  In my personal experience, almost every time I’ve nobly started with a struct I’ve ended up converting it to a class.  So my personal default is to use class unless struct is required.  There are several reasons for this, but most pertinent to this post are two things to consider:

  1. Structs allow you to allocate and pass around the values without allocating memory on the heap.  Classes must always be allocated on the heap.
  2. Structs can be boxed many times, but classes will only allocate memory once

Every time you assign a struct such that boxing is required, a new object is allocated.  So the same struct can actually end up acquiring lots of memory and producing lots of garbage.  Whereas if you’d just chosen to use a class, the memory is allocated once and shared across all who reference it.  So be careful when you define structs and ask yourself how folks with that struct will use it.  If the likelihood is high that most instances of the struct will get boxed sometime in their lifetime, consider making it a class instead to help keep the GC pressure down.

Friday, June 08, 2012

GC pressure series: Introduction and enumerators

As you may already know, I’m a Microsoft developer who works on Visual Studio.  Improved performance and responsiveness is a major goal for the 2012 release.  As much of the new code in Visual Studio 2012 was written in C#, I learned a lot about how much GC pressure can degrade performance. 

Q: Wait.  What is GC pressure?

A: GC pressure happens when your .NET application produces lots of “garbage” by allocating and releasing memory.  .NET has to “collect” all this garbage periodically by walking your application’s memory graph to determine which objects are still used and which have become garbage, then the memory for the garbage is freed. 

Q: Right.  That’s what garbage collection is all about.  So what’s the problem?

A: Normally the CLR’s heuristics for finding good times to pause your application to perform GC gives your application a lean memory footprint and high responsiveness.  The problem comes when you’re in a performance critical section of code in your application, perhaps in a loop of some kind, and within that section you’re producing and discarding lots of managed objects. This causes “GC pressure” which leads .NET to interrupt your perf critical code to run the GC in order to keep your application from running up its memory use, which besides potentially slowing down the machine overall, can eventually lead to an OutOfMemoryException.  The heuristics of .NET’s GC are complex and beyond the scope of my blog.

Q: How do I know if GC pressure is responsible for my performance problems?

A: Like most performance investigations, you find out that GC pressure is your problem by measuring using some profiler tool and noticing that native callstacks are executing while you’re allocating memory that include GC code. 

For instance, some large Visual Studio solutions I measured were spending around 30% of the time simply running the GC.  Imagine if I could wave a wand and make VS 30% faster.  That would be significant, right?  The good news is that GC pressure is very often a solvable problem.  In several upcoming blog posts, I’ll review several discrete sources of GC pressure that are easily solvable, and which allowed us to greatly improve solution load performance (and other scenarios) significantly.

Enumerator structures vs. IEnumerable<T>

It’s often good design to use types that are only as specific as needed.  For instance, your public methods should typically take IList<T> instead of List<T>.  This allows others to call you with a custom list implementation if they need to.  But this can actually cause GC pressure if you use a C# foreach to enumerate the list.  Consider the following:

static void Main() {
   var numbers = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 8 });
   foreach (var number in numbers) {

In this case, the C# compiler knows that foreach is enumerating the List<int> type.  The public List<T>.GetEnumerator() method returns a List<T>.Enumerator, which is a struct rather than a class.  This means that C# can generate IL that enumerates the collection without allocating any new objects on the heap: no garbage.  But now consider a only slightly different case:

static void Main() {
   IList<int> numbers = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 8 });
   foreach (var number in numbers) {

The only different is that I’ve explicitly typed the List<int> into an IList<int> local variable.  Now when C# enumerates the list with foreach, the only GetEnumerator() method available to call is the one on IList<T>, which returns an IEnumerator<T>.  This is typed as an interface, not a struct.  Although the underlying List<T>.GetEnumerator() will still be invoked, the struct it returns will now have to be boxed so that it is addressible via its IEnumerator<T> interface. This boxing allocates an object on the heap, which will be discarded after the enumeration is over.  If this foreach was itself in a perf critical loop then this could be a significant source of GC pressure.

To wrap up with a final comprehensive example, here are two methods you might define:

static void GarbageProducer(IEnumerable<int> numbers) {
   // numbers.GetEnumerator() can only returned a (boxed)
   // instance of IEnumerator<int>, forcing this method to
   // allocate one object each time this method is invoked.
   foreach (var number in numbers) {

static void NoGarbageProducer(List<int> numbers) {
   // numbers.GetEnumerator() will return the stack allocated
   // List<int>.Enumerator struct, so no allocations are required.
   foreach (var number in numbers) {

In the above snippet, GarbageProducer allocates memory, but its parameter is typed for maximum portability, whereas NoGarbageProducer never allocates memory but requires a very specific collection type to be passed in.  Which is better?  That depends, of course.  If the method is private, and you’re passing in a List<int> anyway, by all means keep the parameter typed as List<int> and avoid the allocation (in perf critical code anyway).  If the method is public you should probably use IEnumerable<T>.  Yes, it almost guarantees an allocation will be required, but on the other hand it allows virtually any type of collection to be passed in.  If you take List<T> and the caller only has a T[] array, the caller would first have to convert the array to a List<T> instance and then pass it in, and that’s much more costly to memory.

I did say “almost guarantees” an allocation.  Can it be avoided?  Yes, actually.  With some extra work within the method:

static void NoGarbageProducer(IEnumerable<int> numbers) {
   List<int> optimizedCase = numbers as List<int>;
   if (optimizedCase != null) {
      foreach (var number in optimizedCase) {
   } else {
      foreach (var number in numbers) {

private static void DoWork(int number) {

Obviously this is more code to write and maintain, so it should only be done where measuring performance proves it is worthwhile.  Note that it only makes the enumerating alloc-free when the collection passed in is actually a List<T>, otherwise it falls back to the one-allocation code path.  This is an example of how you might discover you have a very popular method in a perf critical path, and perhaps you observe that you tend to be called with List<T> so you optimize for that case.  You could even optimize for multiple specific collection types, so long as each of them implement the alloc-free struct GetEnumerator pattern.