
CLR Profiler, your memory spy

Introduction

Memory leaks, haven't we all been haunted by them? Your team has just completed another major release of your web-based product. Important new features have been added and customers really like your new product. Customers rapidly start to adopt your product and deploy it on their servers, and suddenly the call volume in your call center spikes. Customers report severe memory leaks under heavy usage and need to restart your application frequently to keep it up and running. Your support and engineering teams get flooded with support incidents and your customer-facing team starts screaming to get this fixed ASAP. Some of you might have lived through such a scenario and know how much pressure it can cause.

We all know that it is not easy to find the root cause of a memory leak. Your team will spend 99% of its time identifying the root cause and 1% fixing it. There are many possible root causes. The term already says what happens: memory is allocated but never freed, so it can never be reused. This can be caused by a memory block that was allocated and never freed, by an object that was instantiated but never released, by circular references, etc.

Microsoft addressed this problem in .NET, the new world of managed code. The .NET framework takes over memory management for you. It knows every memory block allocated in your application, tracks its usage, detects when it is no longer used and then frees the memory. This is done by the Garbage Collector (GC), and the first part of this article explains how the GC works. But a look under the hood also shows that you still have to worry about memory allocations. You need to understand the memory allocation profile of your application so you know how efficient your memory usage is and, ultimately, how much pressure your application puts on the GC. If your application has a mixture of many short-lived and long-lived objects, it will cause many garbage collections, which can ultimately hurt performance. The second part of this article covers how you can use the CLR Profiler to understand the memory allocation profile of your application and how you can improve it.

A look at value types vs. reference types and stack vs. heap

The .NET framework distinguishes between value types and reference types. Value types include the primitive types like Integer, Boolean, Byte and Character, as well as structs and enumerations. A variable of a value type always contains the value itself. Passing it along as a method parameter or assigning it to another variable always creates a copy of the value. Value types used as local variables or method arguments are stored on the stack; a value type that is a field of a reference type is stored on the heap as part of that object. When a value type is created, .NET makes sure it has a default value, which is the zeroed value of that type (0, false and so on).

Reference types are class types, interface types, delegate types and array types. An instance of a reference type is allocated on the heap, and a pointer on the stack points to that instance on the heap. A variable of a reference type therefore contains the pointer to the heap. Passing it along as a method parameter or assigning it to another variable copies the pointer, but each copy points to the same location on the heap.
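The difference in copy behavior can be sketched in a few lines of C#. The types PointValue and PointReference and the class CopySemanticsDemo are hypothetical names made up for this illustration; they are not part of the .NET framework.

```csharp
using System;

// Hypothetical value type: assignment copies the data itself.
public struct PointValue
{
    public int X;
}

// Hypothetical reference type: assignment copies only the pointer.
public class PointReference
{
    public int X;
}

public static class CopySemanticsDemo
{
    // Returns the X values seen through the ORIGINAL variables
    // after their copies have been modified.
    public static int[] Run()
    {
        PointValue v1 = new PointValue();
        v1.X = 1;
        PointValue v2 = v1;      // full copy of the value
        v2.X = 99;               // changes only the copy

        PointReference r1 = new PointReference();
        r1.X = 1;
        PointReference r2 = r1;  // both variables point to the same heap object
        r2.X = 99;               // visible through r1 as well

        return new int[] { v1.X, r1.X };  // { 1, 99 }
    }
}
```

Calling CopySemanticsDemo.Run() from any Main() method shows that the value type kept its original value while the reference type was changed through the second variable.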

The stack and the heap are two different memory spaces. The stack works on a FILO (First-In-Last-Out) basis: it is a sequential memory block, and memory can be freed only after all the memory allocated above it has been freed. A stack pointer points to the top of the stack and moves as new values are pushed onto the stack. Let's look at the following code sample (not the most efficient code, but it demonstrates stack usage well):

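public class Program
{
    public static void Main()
    {
        // Result lives in the stack frame of Main()
        int Result = Calculate(20, 10);
    }

    private static int Calculate(int Val1, int Val2)
    {
        // Val1, Val2 and Result live in the stack frame of Calculate()
        int Result;
        // calculate the value
        Result = Val1 + Val2;
        // return the value; the stack frame is destroyed on return
        return Result;
    }
}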


There are two methods, Main() and Calculate(). When Main() is called, a stack frame is created for it on the stack. The return address is pushed to the stack, followed by all the arguments. As the method executes and new variables are declared, they are created on the stack as well. If an instance of a reference type is created, the instance goes on the heap and a pointer to it is created on the stack. In our sample Result (type integer) is created on the stack. Then Main() calls Calculate(), which creates a new stack frame. The return address and the two arguments Val1 and Val2 are stored on the stack. Calculate() then creates Result (the local variable) on the stack and executes. It returns the value of Result to the caller, jumps back to the caller (it knows the return address because it has been pushed to the stack) and destroys the stack frame, meaning the stack pointer is moved back and all the memory allocated on the stack by this stack frame (method) is freed. When Main() is done, its stack frame is destroyed too. Walking the stack and looking at all the stack frames gives you the call stack. The call stack is the call hierarchy and shows which methods have been called in which order.
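You can look at the call stack from inside a running method with the System.Diagnostics.StackTrace class. The following is a minimal sketch; the class CallStackDemo and its methods Outer() and Inner() are hypothetical names for illustration. The MethodImpl attributes ask the JIT compiler not to inline or optimize the two methods, on the assumption that both frames then remain visible on the stack.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class CallStackDemo
{
    // Outer() only exists to add one more frame to the call stack.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    public static string[] Outer()
    {
        return Inner();
    }

    // Walks the stack frames of the current thread and collects the method names,
    // starting with the innermost frame.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    private static string[] Inner()
    {
        StackTrace trace = new StackTrace();
        string[] names = new string[trace.FrameCount];
        for (int i = 0; i < trace.FrameCount; i++)
        {
            names[i] = trace.GetFrame(i)?.GetMethod()?.Name ?? "<unknown>";
        }
        return names;
    }
}
```

The first entries of the array returned by CallStackDemo.Outer() are the innermost frames, so you would expect Inner followed by Outer, then whatever method called Outer().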

The heap is used to allocate memory for reference types. Instantiating an object of a type/class tells the .NET framework to allocate memory for it on the heap and to assign the pointer to that memory to the variable (which itself is put on the stack). There is a heap pointer which points to the top of the heap. Allocating new memory is as simple as moving that heap pointer by the amount of memory required. The heap is a collection of allocated reference type instances, and the Garbage Collector of the .NET framework manages the heap.
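To illustrate why moving a heap pointer makes allocations cheap, here is a toy model. It is not how the CLR is implemented; the class ToyBumpHeap and its members are hypothetical, and "addresses" are simply offsets into a byte array.

```csharp
using System;

// Toy model of a heap with a single heap pointer (illustration only).
public sealed class ToyBumpHeap
{
    private readonly byte[] memory;
    private int next;  // the "heap pointer": offset of the next free byte

    public ToyBumpHeap(int size)
    {
        memory = new byte[size];
        next = 0;
    }

    // Number of bytes handed out so far.
    public int Used
    {
        get { return next; }
    }

    // Returns the offset of the new block, or -1 if there is no room left
    // (the point at which a real runtime would trigger a garbage collection).
    public int Allocate(int size)
    {
        if (size <= 0 || size > memory.Length - next)
        {
            return -1;
        }
        int address = next;
        next += size;  // allocating is just moving the heap pointer
        return address;
    }
}
```

With a 100 byte heap, two allocations of 40 bytes land at offsets 0 and 40, and a third one fails because only 20 bytes are left.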

A dive into the .NET Garbage Collector

In the unmanaged world you controlled the memory yourself. You did that by freeing any memory you allocated with malloc and by reference counting. Each COM object has a reference counter. When a COM object is created, or a reference to it is passed around, the reference counter is incremented. When the reference is no longer needed you make sure that the reference counter is decremented again. When the reference counter of a COM object reaches zero, the memory used by the COM object can be freed. The VB runtime takes care of the reference counting for you, but in VB you have to make sure that you set the variable referencing the COM object to "Nothing" when you no longer need the COM object. This puts the burden on the developer and is a frequent cause of memory leaks.

In .NET you no longer allocate memory manually and you no longer do any reference counting. You instantiate types/classes as needed, and once they are out of scope, meaning nothing references them anymore, .NET cleans them up at a later garbage collection. The cleanup is performed by the Garbage Collector of .NET. How does the GC know when an object is no longer needed? It gets that information from different types of roots, storage locations which contain references to objects on the heap.

  • Stack - The stack, which includes all local variables and method arguments of any executing method.
  • Thread local storage - The thread local storage of any thread executing in your application.
  • CPU registers - The CPU registers of any thread executing in your application.
  • Global and static references - All global and static types/classes.

The GC considers every object on the heap garbage unless it is referenced from one of the roots or referenced by objects which are themselves referenced from the roots. So when the GC finds an object referenced by a root, it looks at the object itself, finds all the objects referenced by it, and so forth. This way it finds all objects which are referenced, meaning used. This process is called graphing (also known as marking), as the GC walks all roots and builds a graph of reachable objects. What follows is the compacting phase. The GC always strives toward a contiguous block of used memory followed by a contiguous block of free memory, so that a memory allocation is nothing more than incrementing a heap pointer. This ensures that memory allocations are very fast and cheap.
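The graphing phase is essentially a graph traversal that starts at the roots. The following toy sketch shows the idea; it is not the CLR's implementation, and ToyObject and ToyMarker are hypothetical names. Note that a circular reference does not keep anything alive by itself: objects are kept only if they can be reached from a root.

```csharp
using System;
using System.Collections.Generic;

// Toy heap object that can reference other toy objects (illustration only).
public sealed class ToyObject
{
    public readonly string Name;
    public readonly List<ToyObject> References = new List<ToyObject>();

    public ToyObject(string name)
    {
        Name = name;
    }
}

public static class ToyMarker
{
    // Walks the object graph from the roots and returns every reachable object.
    // Everything that is NOT in the returned set would be considered garbage.
    public static HashSet<ToyObject> FindReachable(IEnumerable<ToyObject> roots)
    {
        HashSet<ToyObject> reachable = new HashSet<ToyObject>();
        Stack<ToyObject> pending = new Stack<ToyObject>(roots);

        while (pending.Count > 0)
        {
            ToyObject current = pending.Pop();
            if (!reachable.Add(current))
            {
                continue;  // already visited, which also handles circular references
            }
            foreach (ToyObject child in current.References)
            {
                pending.Push(child);
            }
        }
        return reachable;
    }
}
```

If a root references A, A references C, C references D and D references A again, then A, C and D are reachable, while an unreferenced object B is not and would be collected.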

In the compacting phase all used objects are moved together into one contiguous memory block. Take an example with four objects: A, B, C and D. During a garbage collection the GC determines in the graphing phase that object B is no longer used. Objects C and D are then moved, so you have a contiguous memory space occupied by objects A, C and D, followed by the free memory. When an object is moved the GC also has to update all pointers referring to it. At the end the GC updates the heap pointer so it again points to the next available memory.

There are two versions of the GC available. The Server GC is used by ASP.NET and is optimized for throughput. The Server GC halts all threads during the graphing and compacting phases. The Workstation GC is used by Windows Forms applications and is optimized for responsiveness. The Workstation GC halts all threads of your application only during the compacting phase, so your application still executes and stays responsive to the user during the graphing phase.
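The A, B, C, D example can be modeled with a small toy sketch. Again this is an illustration under simplifying assumptions, not the CLR's algorithm: ToyBlock and ToyCompactor are hypothetical names, the list is assumed to be sorted by address, and the step of updating all pointers to moved objects is only mentioned in a comment.

```csharp
using System;
using System.Collections.Generic;

// Toy description of one object on the heap (illustration only).
public sealed class ToyBlock
{
    public readonly string Name;
    public readonly int Size;
    public int Address;
    public bool Reachable;

    public ToyBlock(string name, int size, int address, bool reachable)
    {
        Name = name;
        Size = size;
        Address = address;
        Reachable = reachable;
    }
}

public static class ToyCompactor
{
    // Drops unreachable blocks, slides the survivors toward address 0
    // and returns the new heap pointer (first free address).
    // Assumes the list is ordered by ascending address.
    public static int Compact(List<ToyBlock> heap)
    {
        heap.RemoveAll(delegate(ToyBlock block) { return !block.Reachable; });

        int next = 0;
        foreach (ToyBlock block in heap)
        {
            // A real GC would also update every pointer that refers to this block.
            block.Address = next;
            next += block.Size;
        }
        return next;
    }
}
```

For blocks A (16 bytes at 0), B (32 bytes at 16, unreachable), C (8 bytes at 48) and D (24 bytes at 56), compacting leaves A at 0, C at 16 and D at 24, and the heap pointer moves back from 80 to 48.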

There is a common theme amongst most applications. Applications have long-lived objects and short-lived objects. Many objects are created, used and then fall out of scope, meaning there is no reference to them anymore. Most objects fall into this category. Therefore the GC distinguishes between three generations, G0, G1 and G2. Every newly created object is considered to be generation zero. When the GC kicks in it first looks only at G0 objects. If it can free up enough memory, which is the case most of the time, it will stop. This is considered a partial collection. All objects which survive a G0 collection are promoted to G1. If the GC couldn't free up enough memory it will perform a G1 garbage collection. Any object surviving it will be promoted to G2. If that didn't satisfy the memory requirement either, it will perform a full collection which also looks at G2 objects. The full collection is the most expensive. It is recommended to create long-lived objects at the beginning of your application to ease pressure on the GC.
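You can watch an object being promoted with GC.GetGeneration(). The sketch below forces collections with GC.Collect() purely for demonstration, which you would normally avoid in production code. GenerationDemo is a hypothetical class name; the exact generation numbers you see depend on the runtime, so treat the typical 0, 1, 2 sequence as an expectation rather than a guarantee.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one surviving object:
    // right after allocation, after one forced GC and after a second forced GC.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);  // newly allocated
        GC.Collect();                          // forced collection (demo only)
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);  // keep the object referenced until this point
        return seen;
    }
}
```

The object can only stay in its generation or move to an older one, and it never goes beyond GC.MaxGeneration.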

Another optimization is that the .NET framework keeps two heaps, one for large objects and another one for small to medium-sized objects. Objects of 85,000 bytes or more are kept in the large object heap. The GC does not perform any compacting on the large object heap because it would be costly to move large amounts of memory around. On multiprocessor machines the Server GC splits the heap into sections, one section per CPU. The GC then uses one thread per CPU/section for the collection process. Each garbage collection means a performance hit, and of course a full collection is the most expensive one. Therefore it is important to understand the memory allocation profile of your application, as it can have a real impact on its overall performance.
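One way to see the separate treatment of large objects is to ask the GC for the generation of a freshly allocated large array. In the runtimes I am aware of, objects on the large object heap are reported as belonging to the oldest generation right away, but treat that as an assumption to verify on your runtime. LargeObjectDemo is a hypothetical class name, and the array sizes are chosen to be far below and far above the large object threshold.

```csharp
using System;

public static class LargeObjectDemo
{
    // Returns { generation of a small array, generation of a large array },
    // both measured immediately after allocation.
    public static int[] Run()
    {
        byte[] small = new byte[1000];     // small object heap
        byte[] large = new byte[200000];   // large object heap

        int[] generations = new int[]
        {
            GC.GetGeneration(small),
            GC.GetGeneration(large)
        };

        GC.KeepAlive(small);
        GC.KeepAlive(large);
        return generations;
    }
}
```

Because large objects are only cleaned up together with the oldest generation, allocating many short-lived large arrays or strings is a pattern worth looking for when profiling.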

Monitoring the Garbage Collection

The easiest way to understand how much pressure your application puts on the GC is to monitor your application using the Performance Monitor from Windows. Go to the menu "Administrative Tools | Performance" and then click on the "Plus" icon in the toolbar, which allows you to add performance counters. In the drop-down list called "Performance object" select ".NET CLR Memory". Select the following counters in the list:

  • # Gen 0 Collections
  • # Gen 1 Collections
  • # Gen 2 Collections
  • Gen 0 heap size
  • Gen 1 heap size
  • Gen 2 heap size
  • Gen 0 Promoted Bytes/sec
  • Gen 1 Promoted Bytes/sec
  • % Time in GC
  • # Bytes in all Heaps

Select your application from the instance list (start the application beforehand) and then click Add. Click Close, switch to your application, use it as your users would and watch the counters in the Performance Monitor. They show you how many collections are happening for each generation, how many bytes are used in each generation, how much memory is promoted to the next generation and how much time is spent in the GC. This provides a pretty good overview of what is going on. Here are a few general guidelines:

  • Number of collections - The number of G0 collections should be the highest and the number of G2 collections the lowest, which means you have lots of short-lived objects and only some long-lived objects. If you have a high number of G1 and G2 collections then your application keeps hanging on to objects and you should look at how to improve that.
  • Size of promoted memory - The G1 and G2 heap sizes should be low compared to "# Bytes in all Heaps", meaning only a small part of your allocated memory is promoted to G1 or G2. If you have high numbers then you are promoting lots of objects to G1 or G2 and you should look at how to improve that.
  • Time spent in GC - The percentage of time spent in the GC should be low. Anything higher than a few percent warrants a thorough investigation, as your application puts lots of pressure on the GC. This can also be seen in high numbers of G0, G1 and G2 collections in a short period of time.

This gives you a good overview of what is happening on the heap and how much pressure your application puts on the GC. But you need a much more granular view to make smart decisions about which part of your application really puts the most pressure on the GC.
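If you want a quick look at similar numbers from inside your own code, for example to log them around a suspicious operation, the System.GC class exposes collection counts per generation and the size of the managed heap. This is a minimal sketch and no replacement for the performance counters above; GcCounters and Snapshot() are hypothetical names.

```csharp
using System;

public static class GcCounters
{
    // Formats the number of collections per generation since process start
    // and the number of bytes currently thought to be allocated on the managed heap.
    public static string Snapshot()
    {
        return string.Format(
            "Gen0={0} Gen1={1} Gen2={2} ManagedBytes={3}",
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2),
            GC.GetTotalMemory(false));  // false: do not force a collection first
    }
}
```

Logging GcCounters.Snapshot() before and after a piece of work shows how many collections of each generation that work triggered. A Gen2 count that grows nearly as fast as the Gen0 count is the same warning sign described in the guidelines above.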

Introducing the CLR Profiler

That is where the CLR Profiler, a free tool from Microsoft, comes into play. The CLR Profiler provides a granular view of what is happening on the heap of your application. The CLR Profiler (version 2.0 at the time of writing this article) can be downloaded from Microsoft. Once downloaded, unzip it into a folder and launch "CLRProfiler.exe" from the Binaries folder.

The CLR Profiler can only profile .NET applications, as it relies on the CLR profiling API. Please note that the application will run between 10 and 100 times slower than normal, because the profiler records every single memory allocation and garbage collection through the CLR profiling API, which is naturally very resource intensive. This article will not describe the profiling API itself. Detailed documentation is provided by Visual Studio .NET 2003 itself (go to the folder "C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\docs" and open the document named "Profiling.doc"). The CLR Profiler can profile regular applications, Windows services and ASP.NET applications.

  • .NET Applications - Go to the menu "File | Profile Application" and select the executable. This will launch the application (be patient, it might take a while). Use the application as your normal users would. You can go back to the CLR Profiler at any time and click "Kill Application", which will stop the profiling but keeps the application open. If you want to resume the profiling click on "Start Application". It will ask you if you want to save the previous profiling information, which is good practice so you can go back and compare. This will bring up a new instance of the application and you can repeat the procedure. If you want to look at the heap while the profiling continues, click on "Show Heap now". This allows you to view the current state of the heap while the profiling continues. Not all views are available this way (more on that later).
  • Windows Services - Go to the menu "File | Profile Service" and enter the name of the Windows service to profile (you can open the Windows service manager and look up the name of the service). Make sure you put the name in double quotes if it contains spaces, e.g. "My Service". You will see that the dialog creates the commands to start and stop the service for you. Add any needed command line arguments and then click OK. The CLR Profiler restarts the Windows service so it can hook itself into the profiling API of that service. You have the same three options in the CLR Profiler. Click "Kill Service" to restart the service without the CLR Profiler being hooked into it. Click "Start Service" to resume the profiling, which will restart the service again with the CLR Profiler hooked into it. And click "Show Heap now" to see the current snapshot.
  • ASP.NET Applications - Go to the menu "File | Profile ASP.NET", which will restart IIS because the CLR Profiler needs to hook itself into the profiling API of this .NET process. It then shows a dialog which says it is waiting for the ASP.NET process to start, which you trigger by opening a browser and using your web application as your normal users would. You again have the same three options in the CLR Profiler. Click "Kill ASP.NET" to restart IIS without the CLR Profiler being hooked into it. Click "Start ASP.NET" to resume the profiling, which will restart IIS again with the CLR Profiler hooked into it. And click "Show Heap now" to see the current snapshot.

Use the menu item "File | Save Profile As" to save the profiling information to a log file and "File | Open log File" to open the log file again. Through the View menu you can look at different views of the profiling information.

The "Histogram Allocated Types" view
This view shows you the allocated/instantiated types by size. The horizontal axis is the object size and the vertical axis shows all the types of that size. Each type is shown in a different color. The legend on the right side shows the color of each type, how many instances of each type have been instantiated and what percentage of the total heap size they took. It also shows the total heap size, which in our sample is 2.4 MB with 43,193 objects.

http://www.programmersheaven.com/articles/klaus/CLR-Profiler/HistogramAllocatedTypes_small.jpg

This view shows very well which types are instantiated, how many instances of a type are around and which types take up the most of the total heap. In our sample (see graph above) we see that there are hundreds to thousands of String instances around, varying in size. This warrants some further investigation. Clicking on a type in the legend highlights all occurrences in the graph while clicking on a type in the graph highlights the type in the legend. This makes it very easy to identify a type on the graph and also to find all occurrences of a type on the graph. You can also change the scale of the horizontal and vertical axis with the two groups of radio buttons on top of the view. You can also right click on a type in the graph or legend and select "Show Who Allocated" from the popup menu. This brings up the "Allocation Graph" (see more later on).

The "Histogram Relocated Types" view
This graph looks exactly the same as the previous one, except that it does not show the types currently allocated on the heap. It shows the relocated types, which are the types that survived a GC and have been moved. It ultimately shows how much memory (which types, how many instances of each type and the total memory size of each type) had to be moved by the GC. This means the types shown have survived a GC and have been promoted to the next generation. In our sample hundreds to thousands of String instances have survived a GC. This warrants a thorough investigation, as it indicates your application is holding on to these String objects. The number of types and the total size of memory moved should be small. The smaller the better, as it means you have many short-lived objects and only a few long-lived ones. This view provides the same features as the previous one.

The "Objects by Address" view
This view shows a snapshot of the heap, which is by default at the end of the profiling period. You can also call this graph from the "Time Line" view which then shows the heap at the selected time. It shows which objects live in G0, G1 and G2. You also see two separate heaps. The one on the right side is the large-sized heap. The colors indicate which type occupies the location on the heap.

http://www.programmersheaven.com/articles/klaus/CLR-Profiler/ObjectsByAddress_small.jpg

You can again change the scale of the vertical and horizontal axes with the two groups of radio buttons on top of the view. Selecting a type in the legend highlights all its occurrences on the graph. Selecting a type on the graph highlights only that instance so you see its size. Selecting a region on the graph (click and drag the mouse) shows information about the selected region in the legend (which types are living in that region, how many instances have been instantiated for each type, etc.). Right click on the selected region and from the popup menu jump to the "Histogram Allocated Types" view or the "Allocation Graph" view. For the selected objects they show you the object sizes as well as which methods allocated them. You can also select a type in the legend, right click on it and select the same options from the popup menu. This limits the "Histogram Allocated Types" view and the "Allocation Graph" view to the selected type.

The "Histogram by Age" view
This view shows how long instantiated types survive in your application. The horizontal axis shows the time. The more objects you have on the right side, the more objects are hanging around in your application, which warrants a thorough investigation. This graph provides the same features as the "Histogram Allocated Types" view.

The "Allocation Graph" view
This view shows you how much memory has been allocated on the heap by a method which includes also all the allocations of child methods called by this method. To the left you see all the methods which called this method (callers) and to the right you see all the methods it calls itself (callees) plus all the objects it allocated. You read the view from left to right. At the top left you see a box (root) representing the CLR runtime which in our example is 2.4 MB (see screenshot below). It then shows the top level method calls and how much memory each allocated (including all methods called by it). In our example EasyMetaDataGeneratorForm::Main() allocated 2.3 MB, 94.88% of the total heap size.

http://www.programmersheaven.com/articles/klaus/CLR-Profiler/AllocationGraph_small.jpg

You can follow the execution path to the right, Main() called Run() which called way down the line ReadDataIntoDataSet() to read the required data into a data set. ReadDataIntoDataSet() allocated in total about 775kB, 31.50% of the total heap size. And so forth. The height of each box is in relation to the total heap size allocation.

You can click on the menu "Edit | Find routine" to find a method. It will bring that method into the center of the view. You can also click on a method/object in the view, which will highlight it and its connections to the other methods/objects. So it highlights which method called this method (to the left) and which other methods it called or objects it allocated (to the right). You can also right click on a method/object and select "Select callers & callees" in the popup menu. This will select all the methods in the caller hierarchy all the way up to the top calling method (left side) and all the methods called or objects allocated by it downstream (right side). Right clicking on a method and selecting "Prune to callers & callees" from the popup menu filters the view down to the methods/objects upstream and downstream of it in the calling hierarchy, which removes unrelated information from the view and makes it easier to read. Perform the same operation on the top left box (root) to show all methods/objects again. Spend time on that view and understand which method is responsible for which heap allocations.

You can also select a method, then right click on it and select "Copy as text to clipboard" from the popup menu. Paste it in an editor and it shows you in detail all the types allocated by this method and which other methods have been called by it. For each it shows how much memory got allocated and what the contributing percentage to the total memory allocation is (can be slightly off due to rounding errors).

The "Assembly Graph", "Function Graph", "Module Graph" & "Class Graph" view
These views work the same way as the "Allocation Graph", the only difference being that they show which assemblies, functions/methods, modules and classes/types got loaded by a method.

The "Call Graph" view
Works the same as the "Allocation Graph" but it only shows the caller hierarchy, so which methods get called by a method. It also shows how often a method gets called and how many other methods it itself (all children downstream) calls.

The "Time Line" view
This view shows you very well the memory consumption of your application over time. It visualizes how the memory consumption grows till the GC kicks in and clears up the memory. It shows both the large-sized heap (top graph) plus the regular heap. It marks on the horizontal axis each garbage collection (red for G0, green for G1 and blue for G2). In our sample you can see that soon after the application started a G0 happened but it could not free up much memory so soon thereafter a G2 happens which frees up memory. Soon thereafter another G0 happens which this time frees up some memory followed by a huge jump in memory consumption and another G2 collection which frees up most of that memory. This pattern repeats itself for another twelve collections. One obvious fact is that most of the collections are G2 collections which are the most expensive. This indicates most of the objects are long-lived and warrants a thorough investigation. You can improve the performance of your application by making as many of the objects short-lived as possible. This would result in more G0 collections which are much faster.

http://www.programmersheaven.com/articles/klaus/CLR-Profiler/TimeLine_small.jpg

Click and drag the mouse to select a region. The legend then shows information about all the types in that time frame. You can select a type in the legend to highlight it on the graph. Right click on the selected region and you can jump via the popup menu to the "Allocation Graph", "Histogram Allocated Types" and "Histogram Relocated Types" views (which will only cover the selected region). You can do the same by highlighting a type in the legend and right clicking on it.

© Copyright 2009 Programmersheaven.com - All rights reserved.