Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra record is read when file has line ending in the end #21

Open
joelverhagen opened this issue Jan 12, 2024 · 1 comment
Open

Extra record is read when file has line ending in the end #21

joelverhagen opened this issue Jan 12, 2024 · 1 comment
Assignees

Comments

@joelverhagen
Copy link

It is common to have a new line (\n or \r\n) at the end of a text file following the last line (e.g. in Posix as shared on Stack Overflow). Generally this is not seen as the separator for a new CSV record. For the CSV records in my benchmark (https://github.com/joelverhagen/NCsvPerf), all of the parsers I've tested so far have this property of not yielding an empty record at the end.

Repro of what I am talking about:

using System.Text;
using Addax.Formats.Tabular;

var lines = new[] { "a,b,c", "1,2,3", "x,y,z" };
var file = string.Join("\r\n", lines) + "\r\n"; // line ending at the end
var dialect = new TabularDialect("\r\n", ',', '\"');
var stream = new MemoryStream(Encoding.UTF8.GetBytes(file));
using (var reader = new TabularReader(stream, dialect))
{
    while (reader.TryPickRecord())
    {
        Console.WriteLine("Record:");
        while (reader.TryReadField())
        {
            Console.Write("  Field: ");
            if (reader.TryGetString(out var value))
            {
                Console.WriteLine(value);
            }
            else
            {
                Console.WriteLine("(no value)");
            }
        }
    }
}

Actual output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z
Record:
  Field:

Expected output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z

I think this can be easily worked around by detecting a single empty string field on a line when more fields are expected, which is what I will do for my benchmark which will include Addax.

Nice work on the library! Thanks!

joelverhagen added a commit to joelverhagen/NCsvPerf that referenced this issue Jan 12, 2024
@alexanderkozlenko alexanderkozlenko self-assigned this Jan 12, 2024
@alexanderkozlenko
Copy link
Owner

That's a valid case about the line ending in the end. The library provides two types of readers, and the intention behind this API design is to give developers flexibility. TabularReader is a low-level API that exposes the file structure exactly as it is, including all line endings and comments, which may be critical in some use cases. TabularReader<T> is a high-level API that focuses on consuming records in a structured and user-friendly way, ignoring empty lines and the line ending in the end of a file. If we adjust the example to use the latter, we observe the desired behavior:

using (var reader = new TabularReader<MyRecord>(stream, dialect))
{
    while (reader.TryReadRecord())
    {
        Console.WriteLine("Record:");
        Console.WriteLine("  Field: {0}", reader.CurrentRecord.Field0);
        Console.WriteLine("  Field: {0}", reader.CurrentRecord.Field1);
        Console.WriteLine("  Field: {0}", reader.CurrentRecord.Field2);
    }
}

[TabularRecord]
internal class MyRecord
{
    [TabularFieldOrder(0)]
    public string? Field0 { get; set; }
    [TabularFieldOrder(1)]
    public string? Field1 { get; set; }
    [TabularFieldOrder(2)]
    public string? Field2 { get; set; }
}
Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z

In some scenarios, such as the benchmark project, it may require additional handling of the trailing line ending. However, unless this behavior proves to be a significant blocker for adoption, I would like to keep the current API shape to aligns with the initial library's goals.

Thank you for including the library in the benchmark, I appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants