C# Performance
Writing code that runs quickly is sometimes at odds with writing code quickly. C.A.R. Hoare, computer science luminary and inventor of the Quicksort algorithm, famously proclaimed, "Premature optimization is the root of all evil." The extreme programming design principle of "You Aren't Gonna Need It" (YAGNI) argues against implementing any features, including performance optimizations, until they're needed.
Writing unnecessary code is undoubtedly bad for work efficiency. However, it's important to realize that different situations have different needs. Code for vehicular real-time control systems has inherent up-front responsibilities for stability and performance that aren't present in, say, a small one-off departmental application. It's therefore more important in such code to optimize early and often.
Performance tuning for real-world applications often involves activities geared toward finding bottlenecks: in code, in the network transport layer, and at transaction boundaries. However, these techniques alone cannot solve the dreaded problem of uniformly slow code, which surfaces when the large bottlenecks have been resolved but the code still exhibits inadequate performance. This is code that has been written without attention to correct usage, often by junior programmers, in the same style across whole modules or applications. Unfortunately, the best solution for this problem is to make sure that all programmers on a project follow correct coding practice when writing the code the first time; coding guidelines and good shared libraries help enormously.
This article presents helpful tips for writing in-process .NET managed code that performs well. It's assumed that basic programming skills such as factoring control structures, pulling work outside of loops whenever possible, caching variables for reuse, use of the switch statement, and the like are known to the average reader.
All code examples referred to in this article can be downloaded from the .NET Developer's Journal Web site, at www.sys-con.com/dotnet/sourcec.cfm.
The code comes with a Windows Forms application that can be used to
easily view the code and run all the tests on your own machine. You'll
need the .NET Runtime 1.1 to run the code.
Tools
While testing tools such as NUnit and the upcoming VS.NET 2005 Team System can help you find bottlenecks, when tuning small sections of code there's still no substitute for the micro-benchmark. This is because most generic testing frameworks depend on things like delegates, attributes, and/or interface method calls to do testing, and the code usually is not written with benchmarking primarily in mind. This can be very significant if you're interested in measuring the execution time of a batch of code down to the microsecond or even nanosecond level.
A micro-benchmark consists of a tight loop isolating the code being tested for performance, with a time reading before and after. When the test has finished, the start time is subtracted from the end time, and the duration is divided by the number of iterations to get the per-iteration time cost. The following code shows a simple micro-benchmark construct:
int loopCount = 1000000000;
long startTime, endTime;
double nanoseconds;
// DateTime ticks are 100-ns units, so multiplying by 100 converts to nanoseconds
startTime = DateTime.Now.Ticks * 100;
for (int x = 0; x < loopCount; x++) {
    // put the code to be tested here
}
endTime = DateTime.Now.Ticks * 100;
nanoseconds = ((double)(endTime - startTime)) / ((double)loopCount);
Console.WriteLine(nanoseconds.ToString("F") + " ns per operation");
When performing a simple micro-benchmark, it's important to remember a couple of things. First, small fluctuations (noise) are normal, so to obtain the most accurate results, each test should be run several times. In particular, the first set of tests executed after program initialization may be skewed due to the lazy acquisition of resources by the .NET runtime. Also, if your results are very inconsistent, you may not have penetrated the "noise floor" of the measurement. The best solution for this is to increase the number of loops and/or tests. Another thing to remember is that looping itself introduces overhead, and for the most accurate readings, you should subtract this from the result. On a P4-M 2-GHz laptop, the per-loop overhead for a for loop with an int counter in release mode is around 1 nanosecond.
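To calibrate that overhead away, you can time an empty loop with the same construct and subtract the result from later measurements. A minimal sketch (figures will vary by machine):

int loopCount = 1000000000;
long startTime = DateTime.Now.Ticks * 100;
for (int x = 0; x < loopCount; x++) {
    // empty body: times only the increment, compare, and branch
}
long endTime = DateTime.Now.Ticks * 100;
// per-iteration loop overhead, to subtract from real measurements
double overheadNs = ((double)(endTime - startTime)) / ((double)loopCount);
Console.WriteLine(overheadNs.ToString("F") + " ns loop overhead");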
I'd never advocate running each code fragment from a long program through micro-benchmarks, but benchmarking is a good way to become familiar with the relative costs of different types of expressions. True knowledge of the performance of your code is built on actual observations. As time goes on, you'll find yourself needing such tests less and less, and you'll keep track in the back of your head of the relative performance of the statements you're writing.
Another important tool is ildasm.exe, the IL disassembler. With it, you can inspect the IL of your release builds to see whether your assumptions about what's going on under the covers are correct. IL is not hard to read for a person familiar with the .NET Framework; if you're interested in learning more, I suggest starting with Serge Lidin's book on the subject.
A great free tool for decompiling IL to C# or VB source, Reflector, is found at www.aisto.com/roeder/dotnet/; it's incredibly useful for viewing code that ships with the .NET Framework, for those of you less familiar with IL.
The CLR Profiler, available as a free download from Microsoft's Web site, allows you to track memory allocation and garbage collection activity, among other useful features. Also, the MSDN Web site has excellent coverage of performance metrics tools such as WMI and performance counters.
Working with Objects and Value Types
Objects: A Double Whammy
Objects are expensive to use, partly because of the overhead involved in allocating memory from the heap (which is actually well optimized in .NET) and partly because every created object must eventually be destroyed. The destruction of an object may take longer than its creation and initialization, especially if the class contains a custom finalization routine. Also, the garbage collector runs nondeterministically; there's no guarantee that an object's memory will be immediately reclaimed when it goes out of scope, and until it's collected, this wasted memory can adversely affect performance.
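To get a feel for the cost, you can drop an allocation into the micro-benchmark loop shown earlier and compare a small class against an equivalent struct. A sketch, where PointClass and PointStruct are hypothetical stand-ins:

// Hypothetical types: identical fields, very different allocation behavior.
class PointClass  { public int X; public int Y; }
struct PointStruct { public int X; public int Y; }

// Inside the micro-benchmark loop:
PointClass pc = new PointClass();    // heap allocation; creates work for the GC
pc.X = x;
PointStruct ps = new PointStruct();  // lives on the stack; no GC involvement
ps.X = x;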
The Garbage Collector in a Nutshell
It's necessary to understand garbage collection to appreciate the full impact of using objects. The single most important fact to know about the garbage collector is that it divides objects into three "generations": 0, 1, and 2. Every object starts out in generation 0; if it survives a collection (because at least one reference to it is still held), it is promoted to generation 1; if it survives long enough there, it moves on to generation 2. The cost of collecting an object increases with each generation. For this reason, it's important to avoid creating unnecessary objects and to release references as quickly as possible. The objects that are left will often be long-lived and won't be destroyed until application shutdown.
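You can watch this promotion happen with GC.GetGeneration(). A small sketch (the GC.Collect() call is for demonstration only; see "Trusting the Garbage Collector" below):

object obj = new object();
// every new object starts in generation 0
Console.WriteLine(GC.GetGeneration(obj));   // prints 0
GC.Collect();                               // demonstration only; avoid in real code
// the object survived a collection while still referenced, so it was promoted
Console.WriteLine(GC.GetGeneration(obj));   // typically prints 1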
Lazy Instantiation/Initialization
The Singleton design pattern is often used to provide a single global instance of a class. Sometimes a particular singleton won't be needed at all during an application run. It's generally good practice to delay the creation of any object until it's needed, unless there's a specific need to the contrary - for instance, to pre-cache slow-initializing objects such as database connections. The "double-checked locking" pattern is useful in these situations as a way to avoid taking a lock on every access while still ensuring that the initialization is performed only once. Lazy initialization is a technique that can enhance the performance of an entire application through object reduction.
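A minimal sketch of double-checked locking in C#, assuming a hypothetical slow-initializing ConfigCache class (the volatile keyword is what keeps the unlocked first check safe):

public sealed class ConfigCache {
    private static volatile ConfigCache instance;
    private static readonly object syncRoot = new object();

    private ConfigCache() {
        // slow initialization happens here, deferred until first use
    }

    public static ConfigCache Instance {
        get {
            if (instance == null) {            // first check, without taking the lock
                lock (syncRoot) {
                    if (instance == null) {    // second check, inside the lock
                        instance = new ConfigCache();
                    }
                }
            }
            return instance;
        }
    }
}

After the first call, every subsequent access falls through the first null check and never touches the lock at all.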
Avoiding Use of Class Destructors
Class destructors (written with the ~ClassName syntax in C#, and implemented as the Finalize() method in VB.NET) cause extra overhead for the garbage collector, because it must track which objects have been finalized before their memory can be reclaimed. I've never had a need for finalizers in a purely managed application.
Casting and Boxing/Unboxing Overhead
Casting is the conversion of a value from one type to another at runtime, and boxing is the creation of a reference wrapper for a value type (unboxing being the conversion back to the wrapped value type). The overhead of both is most heavily felt in the collection classes, as they all - with the exception of certain specialized ones like StringDictionary - store each value as an Object. For instance, when you store an Int32 in an ArrayList, it is boxed (wrapped in an object) when it is inserted; each time the value is read, it is unboxed before being returned to the calling code.
This will be fixed in the next version of .NET with the introduction of generics, but for now you can avoid it by creating strongly typed collections and by typing variables and parameters as strongly as possible. If you're unsure about whether boxing/unboxing is taking place, you can check the IL of your code for appearances of the box and unbox instructions.
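The round trip is easy to see in a sketch; IntStack below is a hypothetical strongly typed alternative (ArrayList lives in System.Collections):

// Boxing: each Add() wraps the int in a new heap object; each read unboxes it.
ArrayList list = new ArrayList();
list.Add(42);           // IL: box
int n = (int)list[0];   // IL: unbox

// A hypothetical strongly typed collection that never boxes:
public class IntStack {
    private int[] items = new int[16];
    private int count = 0;

    public void Push(int value) {
        if (count == items.Length) {           // grow the backing array as needed
            int[] bigger = new int[items.Length * 2];
            Array.Copy(items, bigger, count);
            items = bigger;
        }
        items[count++] = value;                // stored directly; no box
    }

    public int Pop() {
        return items[--count];                 // returned directly; no unbox
    }
}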
Trusting the Garbage Collector
Programmers new to .NET sometimes worry about memory allocation to the point that they explicitly invoke System.GC.Collect(). Garbage collection is a fairly expensive process, and it usually works best when left to its own devices. The .NET garbage collection scheme can intentionally delay reclamation of objects until the memory is actually needed, and in particular longer-lived objects (those that make it to generation 1 or 2) may not be reclaimed for an extended period. Even a simple "Hello, world!" console application may allocate 15 MB or more of memory for its "working set." My advice: don't call GC.Collect() unless you really know what you're doing.
Properties, Methods, and Delegates
Avoiding Overuse of Property Getters and Setters
Most people don't realize that property getters and setters are similar to methods when it comes to overhead; it's mainly syntax that differentiates them. A non-virtual property getter or setter that contains no instructions other than the field access will be inlined by the JIT compiler, but in many other cases this isn't possible. You should carefully consider your use of properties: from inside a class, access fields directly (if possible), and never blindly call properties repeatedly without storing the value in a variable. All that said, this doesn't mean you should use public fields! Example 1 demonstrates the performance of properties and field access in several common situations.
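The caching advice looks like this in practice (list and Process() here are hypothetical; list is any object whose Count getter doesn't get inlined):

// Re-executes the Count getter on every pass through the loop:
for (int i = 0; i < list.Count; i++) {
    Process(list[i]);
}

// Calls the getter once and caches the result in a local:
int count = list.Count;
for (int i = 0; i < count; i++) {
    Process(list[i]);
}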
About Delegates
Delegates are slower to execute than interface methods. Delegates are often used to introduce a level of indirection in code, but in almost all cases interfaces allow a cleaner design. Of course, it's impossible to completely shun delegates; the entire event-handling paradigm in .NET is based on them. Example 2 compares the performance of delegates and direct method calls.
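The shape of such a comparison, using the micro-benchmark construct from earlier (IWorker, Worker, and WorkHandler are hypothetical types):

public interface IWorker { void Work(); }
public class Worker : IWorker { public void Work() { } }
public delegate void WorkHandler();

// setup, outside the timing loop
Worker worker = new Worker();
IWorker viaInterface = worker;
WorkHandler viaDelegate = new WorkHandler(worker.Work);

// time each of these in its own micro-benchmark loop:
viaInterface.Work();   // interface method call
viaDelegate();         // delegate invocation; the slower of the two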
Minimizing Method Calls
The .NET JIT compiler performs many optimizations for release builds. One of them is called "method inlining": if method A calls method B and certain other conditions are met, such as the code in method B being small enough, the code from B will be copied into A during compilation. However, the JIT won't or can't inline certain kinds of methods, such as virtual methods or methods over a certain size. Each method invocation or property access entails significant overhead, such as the allocation of a stack frame. Of course, you should never repeatedly call a method for the same result on purpose, but you should also be mindful of the impact of method calls in general.
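A sketch of the distinction (Shape is a hypothetical class; whether Area() actually gets inlined depends on the JIT's internal heuristics):

public class Shape {
    private int side = 10;

    // small and non-virtual: a good candidate for inlining in release builds
    public int Area() {
        return side * side;
    }

    // virtual: the JIT can't inline this, because the actual target
    // isn't known until runtime
    public virtual int AreaVirtual() {
        return side * side;
    }
}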