It's very common to split a string into lines. You can write something like that:
C#
var str = "Nickname: meziantou\r\nName: Gérald Barré";
var separators = new [] { '\r', '\n' };
var lines = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
// - Allocate 1 string per line + 1 array
// - The Split method may allocate
}
This code creates one string per line and allocates one array, resulting in many allocations. It also splits the entire string upfront, even if you only need a few lines. You can also use a StringReader:
C#
var reader = new StringReader(str);
string line;
while ((line = reader.ReadLine()) != null)
{
// - Allocate 1 string per line
// - The ReadLine method may allocate
}
But this still allocates one StringReader and one string per line.
In modern .NET, you can take advantage of ReadOnlySpan<char> to avoid allocations when working with strings. The goal is to create a SplitLines method that allocates nothing and can be used this way:
C#
foreach (ReadOnlySpan<char> line in str.SplitLines())
{
// whatever, but without any allocation 😊
}
Before showing the code, here are some important C# / .NET concepts to understand.
##Making code compatible with the foreach operator
The foreach statement is not limited to IEnumerable<T> and can be applied to an instance of any type that satisfies the following conditions:
- has the public parameterless
GetEnumerator method whose return type is either class, struct, or interface type, - the return type of the
GetEnumerator method has the public Current property and the public parameterless MoveNext method whose return type is Boolean.
C#
var enumerable = new List<int> { 1, 2, 3 };
foreach(var item in enumerable)
{
Console.WriteLine(item);
}
// The foreach operator is rewritten by the compiler as the following code
var enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
var item = enumerator.Current;
Console.WriteLine(item);
}
This means the GetEnumerator method can return a struct that has a bool MoveNext() method and a Current property. This avoids one allocation, since the result does not need to be cast to IEnumerator<T>.
##Span<T> and ReadOnlySpan<T>
These types provide a type-safe and memory-safe representation of a contiguous region of arbitrary memory. They can represent any array-like type (array, string, ArraySegment<T>) and benefit from the same optimizations as arrays, making them fast. Spans are useful for slicing array-like objects. For strings, slicing with a Span is similar to Substring, but without any allocation since Span<T> is a value type (struct).
C#
ReadOnlySpan<int> array = new int[]{1,2,3,4,5,6};
ReadOnlySpan<int> slice = array[1..^1]; // [2,3,4,5]
ReadOnlySpan<char> str = "meziantou";
ReadOnlySpan<char> substring = str[0..3]; // mez
In our case, ReadOnlySpan<char> lets us slice the string line by line using the Slice method. Unlike Substring, it does not create a new string on the heap, so there is no allocation.
##ref struct
Span<T> and ReadOnlySpan<T> are ref struct types. Instances of a ref struct are allocated on the stack and cannot escape to the managed heap. The compiler enforces this by reporting an error when you attempt an operation that would allocate the struct on the heap. For instance:
A ref struct can't be a declared type of a field of a class or a non-ref struct
👉🏼 This means the enumerator that we'll implement needs to be a ref struct to contains the ReadOnlySpan<char>
A ref struct can't implement interfaces
👉🏼 Hopefully, you don't need to implement an interface to be compatible with the foreach operator
A ref struct can't be boxed to System.ValueType or System.Object
👉🏼 The foreach operator doesn't cast the value of the iterator, so this is ok
Note that there are other limitations explained in the documentation but they don't apply to the code of this post.
#SplitLines implementation
Now that you understand the foreach operator and ref structs, let's see the code:
C#
using System;
public static class StringExtensions
{
public static LineSplitEnumerator SplitLines(this string str)
{
// LineSplitEnumerator is a struct so there is no allocation here
return new LineSplitEnumerator(str.AsSpan());
}
// Must be a ref struct as it contains a ReadOnlySpan<char>
public ref struct LineSplitEnumerator
{
private ReadOnlySpan<char> _str;
public LineSplitEnumerator(ReadOnlySpan<char> str)
{
_str = str;
Current = default;
}
// Needed to be compatible with the foreach operator
public LineSplitEnumerator GetEnumerator() => this;
public bool MoveNext()
{
var span = _str;
if (span.Length == 0) // Reach the end of the string
return false;
var index = span.IndexOfAny('\r', '\n');
if (index == -1) // The string is composed of only one line
{
_str = ReadOnlySpan<char>.Empty; // The remaining string is an empty string
Current = new LineSplitEntry(span, ReadOnlySpan<char>.Empty);
return true;
}
if (index < span.Length - 1 && span[index] == '\r')
{
// Try to consume the '\n' associated to the '\r'
var next = span[index + 1];
if (next == '\n')
{
Current = new LineSplitEntry(span.Slice(0, index), span.Slice(index, 2));
_str = span.Slice(index + 2);
return true;
}
}
Current = new LineSplitEntry(span.Slice(0, index), span.Slice(index, 1));
_str = span.Slice(index + 1);
return true;
}
public LineSplitEntry Current { get; private set; }
}
public readonly ref struct LineSplitEntry
{
public LineSplitEntry(ReadOnlySpan<char> line, ReadOnlySpan<char> separator)
{
Line = line;
Separator = separator;
}
public ReadOnlySpan<char> Line { get; }
public ReadOnlySpan<char> Separator { get; }
// This method allow to deconstruct the type, so you can write any of the following code
// foreach (var entry in str.SplitLines()) { _ = entry.Line; }
// foreach (var (line, endOfLine) in str.SplitLines()) { _ = line; }
// https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/functional/deconstruct?WT.mc_id=DT-MVP-5003978#deconstructing-user-defined-types
public void Deconstruct(out ReadOnlySpan<char> line, out ReadOnlySpan<char> separator)
{
line = Line;
separator = Separator;
}
// This method allow to implicitly cast the type into a ReadOnlySpan<char>, so you can write the following code
// foreach (ReadOnlySpan<char> entry in str.SplitLines())
public static implicit operator ReadOnlySpan<char>(LineSplitEntry entry) => entry.Line;
}
}
Here are some usage examples of the SplitLines() method:
C#
var str = "Nickname: meziantou\r\nName: Gérald Barré";
foreach (ReadOnlySpan<char> line in str.SplitLines())
{
Console.WriteLine(line.ToString());
}
foreach (var (line, endOfLine) in str.SplitLines())
{
Console.Write(line.ToString());
Console.Write(endOfLine.ToString());
}
foreach (var lineEntry in str.SplitLines())
{
Console.Write(lineEntry.Line.ToString());
Console.Write(lineEntry.Sparator.ToString());
}
#Benchmark
Let's create a quick benchmark with BenchmarkDotNet:
C#
[MemoryDiagnoser]
public class SplitLinesBenchmark
{
private const string Data = "Nickname: meziantou\r\nFirstName: Gérald\nLastName: Barré";
private readonly char[] Separators = new[] { '\r', '\n' };
[Benchmark]
public void StringReader()
{
var reader = new StringReader(Data);
string line;
while ((line = reader.ReadLine()) != null)
{
}
}
[Benchmark]
public void Split()
{
foreach (var line in Data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
{
}
}
[Benchmark]
public void Split_Preallocated()
{
foreach (var line in Data.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
{
}
}
[Benchmark]
public void Span()
{
foreach (ReadOnlySpan<char> item in Data.SplitLines())
{
}
}
}
public class Program
{
public static void Main(string[] args) => BenchmarkRunner.Run<SplitLinesBenchmark>();
}
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22621
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.5.22307.18
[Host] : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
DefaultJob : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|
| StringReader | 37.51 ns | 0.639 ns | 0.598 ns | 0.0124 | 208 B |
| Split | 73.75 ns | 0.854 ns | 0.799 ns | 0.0186 | 312 B |
| Split_Preallocated | 69.16 ns | 1.231 ns | 1.152 ns | 0.0167 | 280 B |
| Span | 35.87 ns | 0.734 ns | 0.785 ns | - | - |
The code is faster and doesn't allocate at all 😃
#Additional references
Do you have a question or a suggestion about this post? Contact me!