Wednesday, 25 January 2012

Writing C# Code for Performance

Writing code that runs quickly is sometimes at odds with writing code quickly. C.A.R. Hoare, computer science luminary and discoverer of the QuickSort algorithm, famously proclaimed, "Premature optimization is the root of all evil." The extreme programming design principle of "You Aren't Gonna Need It" (YAGNI) argues against implementing any features, including performance optimizations, until they're needed.
Writing unnecessary code is undoubtedly bad for work efficiency. However, it's important to realize that different situations have different needs. Code for vehicular real-time control systems has inherent up-front responsibilities for stability and performance that aren't present in, say, a small one-off departmental application. Therefore, it's more important in such code to optimize early and often.
Performance tuning for real-world applications often involves activities geared towards finding bottlenecks: in code, in the network transport layer, and at transaction boundaries. However, these techniques alone cannot solve the dreaded problem of uniformly slow code, which surfaces when large bottlenecks have been resolved but the code still exhibits inadequate performance. This is code that has been written without attention to correct usage, often by junior programmers, in the same style across whole modules or applications. Unfortunately, the best solution for this problem is to make sure that all programmers on a project follow correct coding practice when writing the code the first time; coding guidelines and good shared libraries help enormously.
This article presents helpful tips for writing in-process .NET managed code that performs well. It's assumed that basic programming skills such as factoring control structures, pulling work outside of loops whenever possible, caching variables for reuse, use of the switch statement, and the like are known to the average reader.
All code examples referred to in this article can be downloaded from the .NET Developer's Journal Web site, at www.sys-con.com/dotnet/sourcec.cfm. The code comes with a Windows Forms application that can be used to easily view the code and run all the tests on your own machine. You'll need the .NET Runtime 1.1 to run the code.
Tools
While testing tools such as NUnit and the upcoming VS.NET 2005 Team System can help you find bottlenecks, when tuning small sections of code, there's still no substitute for the micro-benchmark. This is because most generic testing frameworks depend on things like delegates, attributes, and/or interface method calls to do testing, and the code usually is not written with benchmarking primarily in mind. This can be very significant if you're interested in measuring the execution time of a batch of code down to the microsecond or even nanosecond level.
A micro-benchmark consists of a tight loop isolating the code that's being tested for performance, with a time reading before and after. When the test has finished, the start time is subtracted from the end time, and this duration is divided by the number of iterations to get the per-iteration time cost. The following code shows a simple micro-benchmark construct:


int loopCount = 1000000000;
long startTime, endTime;
double nanoseconds;

// DateTime ticks are 100-nanosecond units
startTime = DateTime.Now.Ticks;
for(int x = 0; x < loopCount; x++) {
  // put the code to be tested here
}
endTime = DateTime.Now.Ticks;

// convert the tick difference to nanoseconds, then average per iteration
nanoseconds = ((double)(endTime - startTime)) * 100.0 / ((double)loopCount);
Console.WriteLine(nanoseconds.ToString("F") + " ns per operation");
When performing a simple micro-benchmark, it's important to remember a couple of things. First, small fluctuations (noise) are normal, so to obtain the most accurate results, each test should be run several times. In particular, the first set of tests executed after program initialization may be skewed due to the lazy acquisition of resources by the .NET runtime. Also, if your results are very inconsistent, you may not have penetrated the "noise floor" of the measurement. The best solution for this is to increase the number of loops and/or tests. Another thing to remember is that looping itself introduces overhead, and for the most accurate readings, you should subtract this from the result. On a P4-M 2-GHz laptop, the per-loop overhead for a for loop with an int counter in release mode is around 1 nanosecond.
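To correct for that overhead, time an empty loop with the same construct and subtract its per-iteration cost from later measurements. Here's a minimal sketch, reusing the variables from the benchmark above (measuredNs stands in for a real measurement):

// time an empty loop to establish the per-iteration overhead
startTime = DateTime.Now.Ticks;
for(int x = 0; x < loopCount; x++) {
  // intentionally empty
}
endTime = DateTime.Now.Ticks;
double overheadNs = ((double)(endTime - startTime)) * 100.0 / loopCount;

// subtract the overhead from any subsequent measurement:
// double correctedNs = measuredNs - overheadNs;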
I'd never advocate running each code fragment from a long program through micro-benchmarks, but benchmarking is a good way to become familiar with the relative costs of different types of expressions. True knowledge of the performance of your code is built on actual observations. As time goes on, you'll find yourself needing such tests less and less, and you'll keep track in the back of your head of the relative performance of the statements you're writing.
Another important tool is ildasm.exe, the IL disassembler. With it, you can inspect the IL of your release builds to see if your assumptions are correct about what's going on under the covers. IL is not hard to read for a person familiar with the .NET framework; if you're interested in learning more, I suggest starting with Serge Lidin's book on the subject.
A great free tool for decompiling IL to C# or VB source, Reflector, is found at www.aisto.com/roeder/dotnet/; it's incredibly useful for viewing code that ships with the .NET Framework, for those of you less familiar with IL.
The CLR Profiler, available as a free download from Microsoft's Web site, allows you to track memory allocation and garbage collection activity, among other useful features. Also, the MSDN Web site has excellent coverage of performance metrics tools such as WMI and performance counters.
Working with Objects and Value Types
Objects: A Double Whammy
Objects are expensive to use, partly because of the overhead involved in allocating memory from the heap (which is actually well-optimized in .NET) and partly because every created object must eventually be destroyed. The destruction of an object may take longer than its creation and initialization, especially if the class contains a custom finalization routine. Also, the garbage collector runs non-deterministically: there's no guarantee that an object's memory will be immediately reclaimed when it goes out of scope, and until it's collected, this wasted memory can adversely affect performance.
The Garbage Collector in a Nutshell
It's necessary to understand garbage collection to appreciate the full impact of using objects. The single most important fact to know about the garbage collector is that it divides objects into three "generations": 0, 1, and 2. Every object starts out in generation 0; if it survives a garbage collection (because at least one live reference to it remains), it's promoted to generation 1; if it keeps surviving, it eventually moves to generation 2. The cost of collecting an object increases with each generation. For this reason, it's important to avoid creating unnecessary objects, and to release each reference as quickly as possible. The objects that remain will often be long-lived and won't be destroyed until application shutdown.
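You can watch promotion happen with GC.GetGeneration(). This sketch forces collections purely for demonstration (as discussed later, calling GC.Collect() in real code is almost always a mistake):

object obj = new object();
Console.WriteLine(GC.GetGeneration(obj));  // 0: every new object starts here

GC.Collect();                              // demonstration only!
Console.WriteLine(GC.GetGeneration(obj));  // 1: obj survived one collection

GC.Collect();
Console.WriteLine(GC.GetGeneration(obj));  // 2: survived again; now long-lived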
Lazy Instantiation/Initialization
The Singleton design pattern is often used to provide a single global instance of a class. Sometimes a particular singleton won't be needed at all during an application run. It's generally good practice to delay the creation of any object until it's needed, unless there's a specific need to the contrary - for instance, to pre-cache slow-initializing objects such as database connections. The "double-checked locking" pattern is useful in these situations: it avoids taking a lock on every access while still ensuring that the initialization happens only once. Lazy initialization is a technique that can enhance the performance of an entire application through object reduction.
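Here's a minimal sketch of double-checked locking applied to a lazily created singleton; the ConnectionManager name and its initialization work are hypothetical. Note that in .NET 1.1 the instance field should be declared volatile for the pattern to be reliable:

public sealed class ConnectionManager {
  private static volatile ConnectionManager instance;
  private static readonly object syncRoot = new object();

  private ConnectionManager() {
    // hypothetical slow initialization, e.g. opening a database connection
  }

  public static ConnectionManager Instance {
    get {
      if (instance == null) {          // first check: no lock taken
        lock (syncRoot) {
          if (instance == null) {      // second check: inside the lock
            instance = new ConnectionManager();
          }
        }
      }
      return instance;                 // fast path on every later access
    }
  }
}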
Avoiding Use of Class Destructors
Class destructors (written with the ~ClassName syntax in C#, or as a Finalize() method in VB.NET) cause extra overhead for the garbage collector, because it must track which objects have been finalized before their memory can be reclaimed; a finalizable object also survives at least one extra collection, since it must be kept alive until its finalizer has run. I've never had a need for finalizers in a purely managed application.
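When a class genuinely holds an unmanaged resource, the usual approach is deterministic cleanup through IDisposable, telling the garbage collector to skip finalization once cleanup has already happened. A sketch of that pattern (FileWrapper and ReleaseHandle are hypothetical):

public class FileWrapper : IDisposable {
  private IntPtr handle;  // hypothetical unmanaged resource

  public void Dispose() {
    ReleaseHandle(handle);      // deterministic cleanup
    GC.SuppressFinalize(this);  // the GC can now skip the finalization step
  }

  ~FileWrapper() {
    // safety net: runs only if the caller never called Dispose()
    ReleaseHandle(handle);
  }

  private static void ReleaseHandle(IntPtr h) {
    // hypothetical native cleanup call
  }
}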
Casting and Boxing/Unboxing Overhead
Casting is the conversion of a value from one type to another at runtime, and boxing is the creation of a reference wrapper for a value type (unboxing being the conversion back to the wrapped value type). The overhead of both is most heavily felt in the collections classes, as they all - with the exception of certain specialized ones like StringDictionary - store each value as an Object. For instance, when you store an Int32 in an ArrayList, it is boxed (wrapped in a new heap object) when it is inserted; each time the value is read, it is unboxed before it is returned to the calling code.
This will be fixed in the next version of .NET with the introduction of generics, but for now you can avoid it by creating strongly typed collections and by typing variables and parameters as strongly as possible. If you're unsure whether boxing/unboxing is taking place, check the IL of your code for occurrences of the box and unbox instructions.
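Here's a small sketch of where the boxing actually occurs (assumes using System.Collections is in scope):

ArrayList list = new ArrayList();
for(int i = 0; i < 1000; i++) {
  list.Add(i);             // box: a new heap object wraps each int
}
int first = (int)list[0];  // unbox: the value is copied back out

// a strongly typed alternative stores raw values, with no boxing at all
int[] values = new int[1000];
for(int i = 0; i < 1000; i++) {
  values[i] = i;
}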
Trusting the Garbage Collector
Programmers new to .NET sometimes worry about memory allocation to the point that they explicitly invoke System.GC.Collect(). Garbage collection is a fairly expensive process, and it usually works best when left to its own devices. The .NET garbage collector intentionally delays reclamation until memory pressure actually demands it, and longer-lived objects (those that make it to generation 1 or 2) in particular may not be reclaimed for an extended period. Even a simple "Hello, world!" console application may show a working set of 15 MB or more. My advice: don't call GC.Collect() unless you really know what you're doing.
Properties, Methods, and Delegates
Avoiding Overuse of Property Getters and Setters
Most people don't realize that property getters and setters are similar to methods when it comes to overhead; it's mainly syntax that differentiates them. A non-virtual property getter or setter that contains no instructions other than the field access will be inlined by the JIT compiler, but in many other cases this isn't possible. You should carefully consider your use of properties: from inside a class, access fields directly (if possible), and never blindly call properties repeatedly without storing the value in a variable. All that said, this doesn't mean that you should use public fields! Example 1 demonstrates the performance of properties and field access in several common situations.
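As a simple illustration of not calling a property repeatedly, cache a collection's Count before a loop; orders and ProcessOrder here are hypothetical. (A trivial getter like Count may well be inlined anyway, but the habit pays off for getters that do real work.)

// wasteful: the Count getter runs on every iteration
for(int i = 0; i < orders.Count; i++) {
  ProcessOrder(orders[i]);
}

// better: read the property once and loop against a local
int count = orders.Count;
for(int i = 0; i < count; i++) {
  ProcessOrder(orders[i]);
}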
About Delegates
Delegates are slower to execute than interface methods. Delegates are often used to introduce a level of indirection in code, but in almost all cases interfaces allow a cleaner design. Of course, it's impossible to completely shun delegates; the entire event-handling paradigm in .NET is based on them. Example 2 compares the performance of delegates and direct method calls.
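A minimal sketch of the two call styles being compared (IWorker, Worker, and WorkHandler are illustrative names):

public interface IWorker {
  void DoWork();
}

public delegate void WorkHandler();

public class Worker : IWorker {
  public void DoWork() { /* work goes here */ }
}

// elsewhere, in some method:
IWorker worker = new Worker();
worker.DoWork();    // interface call: one virtual dispatch

WorkHandler handler = new WorkHandler(new Worker().DoWork);
handler();          // delegate call: extra indirection through the delegate object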
Minimizing Method Calls
Method inlining is one of the most important optimizations the .NET JIT compiler performs on release builds. If method A calls method B and certain conditions are met, such as the code in method B being small enough, the code from B is copied into A during JIT compilation, eliminating the call entirely. However, the JIT won't or can't inline certain kinds of methods, such as virtual methods or methods over a certain size. Each method invocation or property access that isn't inlined entails real overhead: setting up a stack frame, and so on. Of course, you should never repeatedly call a method for the same result on purpose, but you should also be mindful of the impact of method calls in general.
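For example, a small non-virtual accessor is a good inlining candidate, while a virtual method never is (the 1.1 JIT doesn't inline virtual calls); the Point class here is illustrative:

public class Point {
  private int x;

  // small and non-virtual: the JIT can copy this into the caller,
  // eliminating the call overhead entirely
  public int GetX() { return x; }

  // virtual: dispatched through the vtable at runtime, never inlined
  public virtual int GetXVirtual() { return x; }
}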
