It is common for a text file to end with a newline (\n or \r\n) after the last line (e.g. in POSIX, as shared on Stack Overflow). Generally, this trailing newline is not treated as the separator for a new CSV record. For the CSV files in my benchmark (https://github.com/joelverhagen/NCsvPerf), none of the parsers I've tested so far yield an empty record at the end.
Repro of what I am talking about:
using System.Text;
using Addax.Formats.Tabular;

var lines = new[] { "a,b,c", "1,2,3", "x,y,z" };
var file = string.Join("\r\n", lines) + "\r\n"; // line ending at the end
var dialect = new TabularDialect("\r\n", ',', '\"');
var stream = new MemoryStream(Encoding.UTF8.GetBytes(file));

using (var reader = new TabularReader(stream, dialect))
{
    while (reader.TryPickRecord())
    {
        Console.WriteLine("Record:");
        while (reader.TryReadField())
        {
            Console.Write(" Field: ");
            if (reader.TryGetString(out var value))
            {
                Console.WriteLine(value);
            }
            else
            {
                Console.WriteLine("(no value)");
            }
        }
    }
}
Actual output:
Record:
Field: a
Field: b
Field: c
Record:
Field: 1
Field: 2
Field: 3
Record:
Field: x
Field: y
Field: z
Record:
Field:
Expected output:
Record:
Field: a
Field: b
Field: c
Record:
Field: 1
Field: 2
Field: 3
Record:
Field: x
Field: y
Field: z
I think this can be easily worked around by detecting a record that consists of a single empty string field when more fields are expected, which is what I will do in my benchmark, which will include Addax.
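To make the workaround concrete, here is a minimal sketch of the check, assuming the fields of each record have already been collected into a list. The names `TrailingRecordFilter`, `IsTrailingLineEndingArtifact`, and `SkipArtifacts` are hypothetical helpers made up for illustration; they are not part of Addax, and the sketch deliberately does not call any Addax API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Addax: filters out records that look like
// the artifact of a line ending at the end of the file.
public static class TrailingRecordFilter
{
    // True when the record has exactly one field, that field is empty,
    // and more than one field was expected per record.
    public static bool IsTrailingLineEndingArtifact(
        IReadOnlyList<string> fields, int expectedFieldCount)
    {
        return expectedFieldCount > 1
            && fields.Count == 1
            && fields[0].Length == 0;
    }

    // Yields every record except those matching the check above.
    public static IEnumerable<IReadOnlyList<string>> SkipArtifacts(
        IEnumerable<IReadOnlyList<string>> records, int expectedFieldCount)
    {
        foreach (var record in records)
        {
            if (!IsTrailingLineEndingArtifact(record, expectedFieldCount))
            {
                yield return record;
            }
        }
    }
}
```

In the repro above, this would mean gathering the values from `TryGetString` into a list for each record and running the check before using the record. Note that the check also drops blank lines in the middle of a file, which is fine for the benchmark data but may not be what every caller wants.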
Nice work on the library! Thanks!
That's a valid point about the line ending at the end of the file. The library provides two types of readers, and the intention behind this API design is to give developers flexibility. TabularReader is a low-level API that exposes the file structure exactly as it is, including all line endings and comments, which may be critical in some use cases. TabularReader<T> is a high-level API that focuses on consuming records in a structured and user-friendly way, ignoring empty lines and the line ending at the end of a file. If we adjust the example to use the latter, we observe the desired behavior:
Record:
Field: a
Field: b
Field: c
Record:
Field: 1
Field: 2
Field: 3
Record:
Field: x
Field: y
Field: z
In some scenarios, such as the benchmark project, the trailing line ending may require additional handling. However, unless this behavior proves to be a significant blocker for adoption, I would like to keep the current API shape so that it stays aligned with the library's initial goals.
Thank you for including the library in the benchmark, I appreciate it!