docs: Describe using --date-format and --datetime-format to disable type inference for dates and datetimes, closes #917
jpmckinney committed Oct 17, 2023
1 parent 8628306 commit cbd68e3
Showing 2 changed files with 6 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/common_arguments.rst
@@ -57,7 +57,7 @@ These arguments can be used to override csvkit's default "smart" parsing of CSV

For example, to disable CSV sniffing, set :code:`--snifflimit 0` and then, if necessary, set the :code:`--delimiter` and :code:`--quotechar` options yourself. Or, set :code:`--snifflimit -1` to use the entire file as the sample, instead of the first 1024 bytes.

-To disable type inference, add the :code:`--no-inference` flag.
+To disable type inference, add the :code:`--no-inference` flag. To prevent text values from being converted to dates or datetimes, set the :code:`--date-format` and/or :code:`--datetime-format` options to a non-occurring value, like ``-``.

The output of csvkit's tools is always formatted with "default" formatting options. This means that when executing multiple csvkit commands (either with a pipe or through intermediary files) it is only ever necessary to specify these arguments the first time (and doing so for subsequent commands will likely cause them to fail).

7 changes: 5 additions & 2 deletions docs/tricks.rst
@@ -114,13 +114,16 @@ Although these issues are annoying, in most cases, CSV sniffing Just Works™. D
CSV data interpretation
-----------------------

-* Are the numbers ``1`` and ``0`` being interepted as ``True`` and ``False``?
+* Are the numbers ``1`` and ``0`` being interpreted as ``True`` and ``False``?
* Are phone numbers changing to integers and losing their leading ``+`` or ``0``?
+* Are text values incorrectly being converted to dates or datetimes?
* Is the Italian comune of "None" being treated as a null value?

These may be symptoms of csvkit's type inference being too aggressive for your data. CSV is a text format, but it may contain text representing numbers, dates, booleans or other types. csvkit attempts to reverse engineer that text into proper data types—a process called "type inference".

-For some data, type inference can be error prone. If necessary you can disable it with the :code:`--no-inference` switch. This will force all columns to be treated as regular text.
+For some data, type inference can be error prone. If necessary you can disable it with the :code:`--no-inference` option. This will force all columns to be treated as regular text.
+
+To prevent values from being converted to dates or datetimes, set the :code:`--date-format` and/or :code:`--datetime-format` options to a non-occurring value, like ``-``.
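
The following is a minimal sketch of why a non-occurring format value works. It is a simplified model built on Python's standard ``datetime.strptime``, not csvkit's actual code, and it assumes that an explicit format is matched strictly against each value. The ``infer_date`` helper is hypothetical and exists only for illustration.

```python
from datetime import datetime


def infer_date(value, date_format):
    """Return a date if `value` matches `date_format`, else the original text.

    Hypothetical helper: a simplified stand-in for strict date parsing.
    """
    try:
        return datetime.strptime(value, date_format).date()
    except ValueError:
        # The value does not match the format, so it stays as text.
        return value


values = ["2023-10-17", "1/2", "May"]

# With a real format, matching values are converted to dates.
print([infer_date(v, "%Y-%m-%d") for v in values])

# With the format "-", only the literal text "-" could match,
# so every value in this list is left as text.
print([infer_date(v, "-") for v in values])
```

Under this model, the format ``-`` disables date conversion only as long as no cell contains exactly ``-``, which is why the value must not occur in the data.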

Slow performance
----------------
