Releases: eBay/tsv-utils
tsv-summarize bug fixes
tsv-summarize
- The newest operators were not correctly hooked up to the command line arguments.
- Correction to bash-completion
tsv-summarize: Missing field support
Two enhancements:
- tsv-summarize: Missing value support. New command line options to either exclude or replace empty fields. Treating empty fields as missing values is a common pattern in some data sets. Also, some new operators related to missing values.
- Bash-completion: Definitions enabling command option completion in bash shells. Needs to be manually installed. Details in the Tips & Tricks document.
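Here is a minimal Python sketch of how empty fields might be handled before computing a mean, to show the difference between excluding them and replacing them. It is not tsv-summarize's implementation, which is written in D. The function names `column_values` and `mean` and the parameters `exclude_missing` and `replace_missing` are hypothetical. They are not the tool's actual command line options.

```python
from typing import List, Optional


def column_values(rows: List[List[str]], field: int,
                  exclude_missing: bool = False,
                  replace_missing: Optional[str] = None) -> List[str]:
    """Collect values of a 0-based field, treating empty strings as missing."""
    values = []
    for row in rows:
        value = row[field]
        if value == "":
            if exclude_missing:
                continue  # drop the empty field entirely
            if replace_missing is not None:
                value = replace_missing  # substitute the replacement value
        values.append(value)
    return values


def mean(values: List[str]) -> float:
    """Arithmetic mean; raises ValueError if any value is not a number."""
    nums = [float(v) for v in values]
    return sum(nums) / len(nums)


rows = [["a", "10"], ["b", ""], ["c", "20"]]
print(mean(column_values(rows, 1, exclude_missing=True)))  # 15.0
print(mean(column_values(rows, 1, replace_missing="0")))   # 10.0
```

If neither option is given, the empty string reaches `float()` and raises `ValueError`. That is why an explicit missing-value policy is useful.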
Improved numeric printing in tsv-summarize
Fixed some edge cases where numbers were unintentionally printed in exponential notation.
tsv-summarize: numeric formatting; Doc updates
- tsv-summarize: Improved formatting of numeric values, especially when using the --p|float-precision option.
- Reorganization of the documentation.
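Unintended exponential notation can be avoided with a simple approach, sketched here in Python. The number is formatted with a fixed number of digits after the decimal point, since the `f` format type never switches to exponent form, and trailing zeros are then trimmed. This is an illustrative assumption, not tsv-summarize's actual formatting code. `format_number` and its `precision` argument are hypothetical and only loosely analogous to --p|float-precision.

```python
def format_number(value: float, precision: int = 12) -> str:
    """Format value in fixed-point notation, then trim trailing zeros."""
    text = f"{value:.{precision}f}"  # 'f' never produces exponent notation
    if "." in text:
        # Trim zeros after the decimal point, then any dangling '.'.
        text = text.rstrip("0").rstrip(".")
    if text == "-0":
        text = "0"  # values that round to zero lose their sign
    return text


print(str(1e20))                      # 1e+20 (Python's default repr)
print(format_number(1e20, 2))         # 100000000000000000000
print(format_number(0.000012345, 6))  # 0.000012
print(format_number(2.5, 2))          # 2.5
```

The trade-off is that very small values lose significant digits. For example, `format_number(1e-9, 6)` prints `0`, which is one reason a tool might choose the notation per value rather than always using fixed-point.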
tsv-sample: Weighted reservoir sampling
New tool, tsv-sample, does sampling and randomization of data file lines. Both uniform and weighted random sampling are supported. Weighted sampling gets weights from a field in the input data. Implemented using reservoir sampling.
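Weighted reservoir sampling works in a single pass with memory bounded by the sample size. One well-known method is the A-Res algorithm of Efraimidis and Spirakis. It gives each line the key u^(1/w), where u is uniform on [0, 1) and w is the line's weight, and keeps the k lines with the largest keys. The Python sketch below illustrates that technique. It is not claimed to be tsv-sample's exact implementation. The function name and the input format (pairs of line and weight) are assumptions.

```python
import heapq
import random
from typing import Iterable, List, Tuple


def weighted_reservoir_sample(items: Iterable[Tuple[str, float]], k: int,
                              rng: random.Random) -> List[str]:
    """A-Res: keep the k items with the largest keys u ** (1 / weight)."""
    heap: List[Tuple[float, int, str]] = []  # min-heap of (key, seq, item)
    for seq, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # in this sketch, non-positive weights are never selected
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, seq, item))
        elif key > heap[0][0]:
            # Evict the current smallest key in favor of the larger one.
            heapq.heapreplace(heap, (key, seq, item))
    # Order the selected items by descending key (a weighted random order).
    return [item for _, _, item in sorted(heap, reverse=True)]


rng = random.Random(42)  # seeded for reproducibility
lines = [("line1", 1.0), ("line2", 5.0), ("line3", 0.5), ("line4", 2.0)]
print(weighted_reservoir_sample(lines, 2, rng))
```

If every weight is 1, the keys are just u, so the same code performs uniform sampling. If k is at least the number of lines, it returns every line in a weighted random order, which is one way to randomize line order.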
Copyright notice updates for 2017
v1.0.12 Copyright notice date updates. (#18)
tsv-filter and tsv-summarize updates
- tsv-filter: New tests --is-numeric, --is-finite, --is-nan, --is-infinity. These are useful to ensure that numeric tests like --gt (greater than) are run only on field values with validly formatted numbers.
- tsv-summarize: Take advantage of the faster topN in DMD/Phobos version 2.073.
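The following Python sketch models how these tests combine with a numeric comparison: a row passes a greater-than test only if its field first parses as a finite number. This is not tsv-filter's code, and the helper names are hypothetical. Python's `float()` accepts some spellings, such as `inf`, `nan`, or surrounding whitespace, and its number parsing may differ in details from tsv-filter's.

```python
import math
from typing import List


def is_numeric(text: str) -> bool:
    """True if text parses as a number (Python float() rules; an assumption)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def is_finite(text: str) -> bool:
    return is_numeric(text) and math.isfinite(float(text))


def is_nan(text: str) -> bool:
    return is_numeric(text) and math.isnan(float(text))


def is_infinity(text: str) -> bool:
    return is_numeric(text) and math.isinf(float(text))


def gt_filter(rows: List[List[str]], field: int, threshold: float) -> List[List[str]]:
    """Keep rows whose field is a finite number greater than threshold."""
    return [row for row in rows
            if is_finite(row[field]) and float(row[field]) > threshold]


rows = [["a", "12"], ["b", "n/a"], ["c", "inf"], ["d", "3"], ["e", ""]]
print(gt_filter(rows, 1, 5))  # [['a', '12']]
```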
First release of tsv-append
tsv-append concatenates multiple TSV files, similar to the Unix cat utility. It is header aware, writing the header from only the first file. It also supports source tracking, adding a column indicating the original file to each row.
Concatenation with header support is useful when preparing data for traditional Unix utilities like sort and sed or applications that read a single file.
Source tracking is useful when creating long/narrow form tabular data. This format is used by many statistics and data mining packages.
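The following Python function sketches header-aware concatenation with source tracking. For brevity it works on in-memory lines rather than files. Several details are assumptions for illustration: the name `append_tsv`, its parameters, placing the source column first, and the default header name `file`. They do not describe tsv-append's options or output.

```python
from typing import Iterable, List, Tuple


def append_tsv(sources: Iterable[Tuple[str, List[str]]], has_header: bool = True,
               track_source: bool = False, source_header: str = "file") -> List[str]:
    """Concatenate TSV lines from several sources, keeping only the first header.

    sources: (source_name, lines) pairs; lines have no trailing newlines.
    """
    output: List[str] = []
    header_written = False
    for name, lines in sources:
        body = lines
        if has_header and lines:
            if not header_written:
                header = lines[0]
                output.append(f"{source_header}\t{header}" if track_source else header)
                header_written = True
            body = lines[1:]  # skip the header line of every source
        for line in body:
            # With source tracking, prefix each data row with its source name.
            output.append(f"{name}\t{line}" if track_source else line)
    return output


a = ["x\ty", "1\t2"]
b = ["x\ty", "3\t4"]
print(append_tsv([("a.tsv", a), ("b.tsv", b)], track_source=True))
# ['file\tx\ty', 'a.tsv\t1\t2', 'b.tsv\t3\t4']
```

The tracked output is in long/narrow form: each row records its origin in an explicit column instead of implicitly by which file it was stored in.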
Better command options: --help/help-verbose; --H|header
Minor improvements to command line arguments:
- Use --help/help-verbose rather than --help/help-brief. The short version (-h | --help) is generally more useful than the long form.
- Add -H as a short form of --header
Fix DUB build error
v1.0.8 Fix DUB build error. Add common directory to tsv-filter build. Fix DM…