Comparing implementations with BenchmarkDotNet

 
 
  • Gérald Barré

Sometimes you want to improve the performance of a function, so you need to compare different implementations to find the fastest in terms of time, memory, or both. You could create a console application and use a Stopwatch to measure each variant. But how can you easily compare behavior across x64 and x86, or across different runtimes? Are the executions properly isolated? And why is one implementation faster than another?
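
For comparison, the manual Stopwatch approach might look like the minimal sketch below. `NaiveTiming` and `Measure` are hypothetical names used only for illustration. The sketch has no warm-up, no process isolation, and no statistical analysis, so JIT compilation and garbage collection noise end up in the measurement.

```csharp
using System;
using System.Diagnostics;

public static class NaiveTiming
{
    // Hypothetical helper: runs the action `iterations` times
    // and returns the total elapsed wall-clock time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `NaiveTiming.Measure(() => BitConverter.ToString(data), 100_000)` gives a single number for one runtime and one platform, which is rarely enough to answer the questions above.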

To help you in this task, you can use BenchmarkDotNet, a powerful .NET library for benchmarking.

Let's test BenchmarkDotNet with a simple function that converts a byte array to a hexadecimal string. We'll use 4 implementations from StackOverflow:

  1. The basic implementation, often found on StackOverflow

    C#
    public string ToHexWithStringBuilder(byte[] bytes)
    {
        var hex = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
            hex.Append(b.ToString("X2"));
        return hex.ToString();
    }
  2. Another implementation using BitConverter, slightly shorter

    C#
    public string ToHexWithBitConverter(byte[] bytes)
    {
        var hex = BitConverter.ToString(bytes);
        return hex.Replace("-", "");
    }
  3. Another implementation with bit operations

    C#
    public string ToHexWithLookupAndShift(byte[] bytes)
    {
        const string hexAlphabet = "0123456789ABCDEF";
        var result = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            result.Append(hexAlphabet[b >> 4]);
            result.Append(hexAlphabet[b & 0xF]);
        }
        return result.ToString();
    }
  4. The last one is trickier, but it works.

    C#
    public string ToHexWithByteManipulation(byte[] bytes)
    {
        var c = new char[bytes.Length * 2];
        int b;
        for (int i = 0; i < bytes.Length; i++)
        {
            b = bytes[i] >> 4;
            c[i * 2] = (char)(55 + b + (((b - 10) >> 31) & -7));
            b = bytes[i] & 0xF;
            c[i * 2 + 1] = (char)(55 + b + (((b - 10) >> 31) & -7));
        }
        return new string(c);
    }
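
The expression in the last implementation can be puzzling at first. The sketch below isolates it and compares it with a straightforward branching version for all 16 nibble values. `HexDigitDemo`, `Branchless`, `Branching`, and `AllNibblesMatch` are hypothetical names introduced for this illustration only.

```csharp
using System;

public static class HexDigitDemo
{
    // Same expression as in ToHexWithByteManipulation.
    // For b < 10, (b - 10) is negative, so (b - 10) >> 31 is -1 (all bits set)
    // and the mask yields -7: 55 + b - 7 = 48 + b, i.e. '0' + b.
    // For b >= 10, the shift yields 0 and the mask yields 0: 55 + b, i.e. 'A' + (b - 10).
    public static char Branchless(int b) => (char)(55 + b + (((b - 10) >> 31) & -7));

    // Equivalent version with an explicit branch, for comparison.
    public static char Branching(int b) => b < 10 ? (char)('0' + b) : (char)('A' + (b - 10));

    // Returns true when both versions agree for every nibble value (0 to 15).
    public static bool AllNibblesMatch()
    {
        for (int b = 0; b < 16; b++)
        {
            if (Branchless(b) != Branching(b))
                return false;
        }
        return true;
    }
}
```

The arithmetic avoids a conditional jump per digit, which is one plausible reason this variant is worth including in the comparison.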

#Using BenchmarkDotNet to compare the 4 implementations

First, create a console application. Add the following NuGet packages:

  • BenchmarkDotNet
  • BenchmarkDotNet.Diagnostics.Windows: provides additional data about runs
Shell
dotnet add package BenchmarkDotNet
dotnet add package BenchmarkDotNet.Diagnostics.Windows

Then, create a class containing the code to benchmark, with one method per implementation. Each method must be decorated with the [Benchmark] attribute. To test across different array sizes, BenchmarkDotNet provides the [Params] attribute. Here is how it looks:

C#
[Orderer(SummaryOrderPolicy.FastestToSlowest)] // Order the results from fastest to slowest
[RyuJitX64Job, LegacyJitX86Job] // Run with x64 and x86 runtimes
[MemoryDiagnoser] // Analyse the memory usage
public class ByteArrayToHexaBenchmark
{
    // Byte array to convert; rebuilt in Setup() for each Size value
    private byte[] _array;

    [Params(10, 1000, 10000)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _array = Enumerable.Range(0, Size).Select(i => (byte)i).ToArray();
    }

    // Code to benchmark
    [Benchmark(Baseline = true)]
    public string ToHexWithStringBuilder() => ToHexWithStringBuilder(_array);

    [Benchmark]
    public string ToHexWithBitConverter() => ToHexWithBitConverter(_array);

    [Benchmark]
    public string ToHexWithLookupAndShift() => ToHexWithLookupAndShift(_array);

    [Benchmark]
    public string ToHexWithByteManipulation() => ToHexWithByteManipulation(_array);

    // Actual implementations
    // code omitted for brevity... copy from above
}

Then, you run the benchmark:

C#
public class Program
{
    public static void Main()
    {
        BenchmarkRunner.Run<ByteArrayToHexaBenchmark>();
    }
}

Now, run the application in Release configuration (for example, with dotnet run -c Release) to get the results:

[Figure: BenchmarkDotNet results]

With the results ordered from fastest to slowest, it's easy to find the best implementation.

If you want to understand why a method behaves differently, you can use diagnosers. In the previous example, we used the [MemoryDiagnoser] attribute to measure memory usage per run. You can also use [InliningDiagnoser] to determine whether methods are inlined by the JIT. For more advanced data, [HardwareCounters] can surface metrics such as the number of branch mispredictions, giving you deep insight into your code's runtime behavior.
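
As an illustration, hardware counters could be enabled as in the sketch below. It assumes Windows, the BenchmarkDotNet.Diagnostics.Windows package, and usually an elevated process, since the counters are collected through ETW. `BranchBenchmark` and `CountLarge` are hypothetical names, and the JIT may compile the comparison without a branch, so the counters will not necessarily differ between the two methods.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;

[MemoryDiagnoser]
[HardwareCounters(HardwareCounter.BranchMispredictions, HardwareCounter.BranchInstructions)]
public class BranchBenchmark
{
    private int[] _sorted;
    private int[] _unsorted;

    [GlobalSetup]
    public void Setup()
    {
        // Fixed seed so every run works on the same data
        var random = new Random(42);
        _unsorted = Enumerable.Range(0, 32_768).Select(_ => random.Next(256)).ToArray();
        _sorted = _unsorted.OrderBy(x => x).ToArray();
    }

    // Counts the values >= 128; the condition is the branch under observation
    private static int CountLarge(int[] values)
    {
        int count = 0;
        foreach (int value in values)
        {
            if (value >= 128)
                count++;
        }
        return count;
    }

    [Benchmark(Baseline = true)]
    public int Sorted() => CountLarge(_sorted);

    [Benchmark]
    public int Unsorted() => CountLarge(_unsorted);
}
```

When the counters are available, they appear as extra columns in the summary table next to the timing and allocation data.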

#Comparing multiple runtimes

First, add all desired frameworks to the csproj file:

csproj (MSBuild project file)
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net6.0;net5.0;net48</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.13.1" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" Version="0.13.1" />
  </ItemGroup>
</Project>

Then, add a job per framework to compare:

C#
[Config(typeof(CustomConfiguration))]
public class MyBenchmark
{
    private class CustomConfiguration : ManualConfig
    {
        public CustomConfiguration()
        {
            AddJob(Job.Default.WithRuntime(ClrRuntime.Net48));
            AddJob(Job.Default.WithRuntime(CoreRuntime.Core50));
            AddJob(Job.Default.WithRuntime(CoreRuntime.Core60));
        }
    }

    [Benchmark]
    public void Foo()
    {
        // Benchmark body
    }
}

#Comparing multiple runtime knobs

You can also compare the same code under different runtime settings by adding one job per set of environment variables:

C#
[Config(typeof(CustomConfiguration))]
public class MyBenchmark
{
    private class CustomConfiguration : ManualConfig
    {
        public CustomConfiguration()
        {
            AddJob(Job.Default.WithId("Inlining enabled"));

            AddJob(Job.Default.WithId("Inlining disabled")
                .WithEnvironmentVariables(
                    new EnvironmentVariable("COMPlus_JitNoInline", "1")));

            AddJob(Job.Default.WithId("Dynamic PGO")
                .WithEnvironmentVariables(
                    new EnvironmentVariable("DOTNET_TieredPGO", "1"),
                    new EnvironmentVariable("DOTNET_TC_QuickJitForLoops", "1"),
                    new EnvironmentVariable("DOTNET_ReadyToRun", "0")));
        }
    }

    [Benchmark]
    public void Foo()
    {
        // Benchmark body
    }
}

#Conclusion

BenchmarkDotNet is easy to set up and gives you reliable results with little effort. Thanks to the diagnosers, you can understand how a function behaves at runtime and take action to improve it. BenchmarkDotNet deserves a place in your toolbox.
