Command line CSV processing tool based on csvkit. But faster and less memory intensive. And written in Go.
To install on Apple OS X, open a Terminal window and run
/bin/bash <(curl -s https://raw.githubusercontent.com/DataFoxCo/gocsv/latest/scripts/install-latest-darwin-amd64.sh)
The tool is built for pipelining, so most commands accept a CSV from standard output and output to standard output.
Subcommands:
- describe - Get basic information about a CSV.
- dimensions (alias:
dims
) - Get the dimensions of a CSV. - clean - Clean a CSV of common formatting issues.
- delimiter (alias:
delim
) - Change the delimiter being used for a CSV. - tsv - Transform a CSV into a TSV.
- head - Extract the first N rows from a CSV.
- tail - Extract the last N rows from a CSV.
- headers - View the headers from a CSV.
- view - Display a CSV in a pretty tabular format.
- stats - Get some basic statistics on a CSV.
- rename - Rename the headers of a CSV.
- behead - Remove header row(s) from a CSV.
- autoincrement (alias:
autoinc
) - Add a column of incrementing integers to a CSV. - stack - Stack multiple CSVs into one CSV.
- split - Split a CSV into multiple files.
- sort - Sort a CSV based on one or more columns.
- filter - Extract rows whose column match some criterion.
- replace - Replace values in cells by regular expression.
- select - Extract specified columns.
- sample - Sample rows.
- unique (alias:
uniq
) - Extract unique rows based upon certain columns. - join - Join two CSVs based on equality of elements in a column.
- xlsx - Convert sheets of a XLSX file to CSV.
Get basic information about a CSV. This will output the number of rows and columns in the CSV, the column headers in the CSV, and the inferred type of each column.
Usage
gocsv describe FILE
Alias: dims
Get the dimensions of a CSV.
Usage
gocsv dimensions FILE
Clean a CSV of common formatting issues. Currently this consists of making sure all rows are the same length (padding short rows and trimming long ones) and removing empty rows at the end.
Usage:
gocsv clean [--verbose] [--no-trim] [--excel] [--numbers] FILE
Arguments:
--verbose
(optional) Print out to stderr when cleaning the CSV.--no-trim
(optional) Do not remove trailing rows that are empty.--excel
(optional) Clean the CSV for issues that will cause problems with Excel. See Excel specifications and limitations.- Truncate any cells that exceed the maximum character limit of 32767.
--numbers
(optional) Clean the CSV for issues that will cause problems with Numbers.- Truncate the number of rows in the CSV at 65535, the maximum amount of rows that Numbers displays.
Alias: delim
Change the delimiter being used for a CSV.
Usage:
gocsv delim [--input INPUT_DELIMITER] [--output OUTPUT_DELIMITER] FILE
Arguments:
--input
(shorthand-i
, optional) The delimiter used in the input. Defaults to,
.--output
(shorthand-o
, optional) The delimiter used in the output. Defaults to,
.
Transform a CSV into a TSV. It is shortand for gocsv delim -o "\t" FILE
. This can very useful if you want to pipe the result to pbcopy
(OS X) in order to paste it into a spreadsheet tool.
Usage:
gocsv tsv FILE
Extract the first N rows from a CSV.
Usage:
gocsv head [-n N] FILE
Arguments:
-n
(optional) The number of rows to extract. IfN
is an integer, it will extract the first N rows. IfN
is prepended with+
, it will extract all except the last N rows.
Extract the last N rows from a CSV.
Usage:
gocsv tail [-n N] FILE
Arguments:
-n
(optional) The number of rows to extract. IfN
is an integer, it will extract the last N rows. IfN
is prepended with+
, it will extract all except the first N rows.
View the headers of a CSV along with the index of each header.
Usage:
gocsv headers FILE
Display a CSV in a pretty tabular format.
Usage:
gocsv view [-n N] [--max-width N] FILE
Arguments:
-n
(optional) Display only the first N rows of the CSV.--max-width
(optional, default 20, shorthand-w
) The maximum width of each cell for display. If a cell exceeds the maximum width, it will be truncated in the display.
If the length of a cell exceeds --max-width
it will be truncated with an ellipsis. If a cell contains a new-line character, only the first line will be displayed.
Get some basic statistics on a CSV.
Usage:
gocsv stats FILE
Rename the headers of a CSV.
Usage:
gocsv rename --columns COLUMNS --names NAMES FILE
Arguments:
--columns
(shorthand-c
) A comma-separated list of the columns to rename. See Specifying Columns for more details.--names
A comma-separated list of names to change each column to. This must be the same length as and match the order of thecolumns
argument.
Remove the header from a CSV
Usage:
gocsv behead [-n N] FILE
Arguments:
-n
(optional) Number of header rows to remove. Defaults to 1.
Alias: autoinc
Append (or prepend) a column of incrementing integers to each row. This can be helpful to be able to map back to the original row after a number of transformations.
Usage:
gocsv autoincrement [--prepend] [--name NAME] [--seed SEED] FILE
Arguments:
--prepend
(optional) Prepend the new column rather than the default append.--name
(optional) Specify a name for the autoincrementing column. Defaults toID
.--seed
(optional) Specify the integer to begin incrementing from. Defaults to1
.
Stack multiple CSVs to create a larger CSV. Optionally include an indication of which file a row came from in the final CSV.
Usage:
gocsv stack [--filenames] [--groups GROUPS] [--group-name GROUP_NAME] FILE [FILES]
Arguments:
--filenames
(optional) Use the names of each file as the group variable. By default the column will be named "File".--groups
(optional) Comma-separated list to use as the names of the groups for each row. There must be as many groups as there are files. By default the column will be named "Group".--group-name
(optional) Name of the grouping column in the final CSV.
Note that --groups
and --filenames
are mutually exclusive.
Also note that the stack
subcommand does not support piping from standard input.
Split a CSV into multiple files.
Usage:
gocsv split --max-rows N [--filename-base FILENAME] FILE
Arguments:
--max-rows
Maximum number of rows per final CSV.--filename-base
(optional) Prefix of the resulting files. The file outputs will be appended with"-1.csv"
,"-2.csv"
, etc. If not specified, the base filename will be the same as the base of the input filename, unless the input is specified by standard input. If so, then the base filename will beout
.
Sort a CSV by multiple columns, with or without type inference. The currently supported types are float, int, date, and string.
Usage:
gocsv sort --columns COLUMNS [--reverse] [--no-inference] FILE
Arguments:
--columns
(shorthand-c
) A comma-separated list (in order) of the columns to sort against. See Specifying Columns for more details.--reverse
(optional) Reverse the order of sorting. By default the sort order is ascending.--no-inference
(optional) Skip type inference when sorting.
Filter a CSV by rows whose columns match some criterion.
Usage:
gocsv filter [--columns COLUMNS] [--regex REGEX] [--gt N] [--gte N] [--lt N] [--lte N] [--exclude] FILE
Arguments:
--columns
(optional, shorthand-c
) A comma-separated list of the columns to filter against. If no columns are specified, then filter checks every column on a row. If a row matches on any of the columns, the row is considered a match. See Specifying Columns for more details.--regex
(optional) Regular expression to use to match against.--case-insensitive
(optional, shorthand-i
) When using the--regex
flag, use this flag to specify a case insensitive match rather than the default case sensitive match.--gt
,--gte
,--lt
,--lte
(optional) Compare against a number.--exclude
(optional) Exclude rows that match. Default is to include.
Note that one of --regex
, --gt
, --gte
, --lt
, or --lte
must be specified.
Replace values in cells by regular expression.
Usage:
gocsv replace [--columns COLUMNS] --regex REGEX --repl REPLACEMENT FILE
Arguments:
--columns
(optional, shorthand-c
) A comma-separated list of the columns to run replacements on. If no columns are specified, then replace runs the replacement operation on cells in every column. See Specifying Columns for more details.--regex
Regular expression to use to match against for replacement.--case-insensitive
(optional, shorthand-i
) Use this flag to specify a case insensitive match for replacement rather than the default case sensitive match.--repl
String to use for replacement.
Note that if you have a capture group in the --regex
argument, you can use expand the replacement using, for example "\$1"
.
Select (or exclude) columns from a CSV
Usage:
gocsv select --columns COLUMNS [--exclude] FILE
Arguments:
--columns
(shorthand-c
) A comma-separated list (in order) of the columns to select. If you want to select a column multiple times, you can! See Specifying Columns for more details.--exclude
(optional) Exclude the specified columns (default is to include).
Alias: uniq
Extract unique rows based upon certain columns.
Usage:
gocsv unique [--columns COLUMNS] [--sorted] [--count] FILE
Arguments
--columns
(optional, shorthand-c
) A comma-separated list (in order) of the columns to use to define uniqueness. If no columns are specified, it will perform uniqueness across the entire row. See Specifying Columns for more details.--sorted
(optional) Specify whether the input is sorted. If the input is sorted, the unique subcommand will run more efficiently.--count
(optional) Append a column with the header "Count" to keep track of how many times that unique row occurred in the input.
Sample rows from a CSV
Usage
gocsv sample -n NUM_ROWS [--replace] [--seed SEED] FILE
Arguments:
-n
The number of rows to sample.--replace
(optional) Whether to sample with replacement. Defaults tofalse
.--seed
(optional) Integer seed to use for generating pseudorandom numbers for sampling.
Join two CSVs using an inner (default), left, right, or outer join.
Usage:
gocsv join --columns COLUMNS [--left] [--right] [--outer] LEFT_FILE RIGHT_FILE
Arguments:
--columns
(shorthand-c
) A comma-separated list (in order) of the columns to use for joining. You must specify either 1 or 2 columns. When 1 is specified, it will join the CSVs using that column in both the left and right CSV. When 2 are specified, it will join using the first column on the left CSV and the second column on the right CSV. See Specifying Columns for more details.--left
(optional) Perform a left join.--right
(optional) Perform a right join.--outer
(optional) Perform an outer join.
Note that by default it will perform an inner join. It will exit if you specify multiple types of join.
Convert sheets of a XLSX file to CSV.
Usage:
gocsv xlsx [--list-sheets] [--dirname DIRNAME] [--sheet SHEET] FILE
Arguments:
--list-sheets
(optional) List the sheets in the XLSX file.--sheet
(optional) Specify the sheet (by index or name) of the sheet to convert.--dirname
(optional) Name of directory to output CSV conversions of sheets fromFILE
. If this is not specified, the command will output the CSV files to a directory with the same name asFILE
(without the.xlsx
extension).
By default the xlsx
subcommand will convert all the sheets in FILE
to CSVs to a directory with the same name as FILE
.
When specifying a column on the command line, you can specify either the index or the name of the column. The tool will always try to interpret the column first by index and then by name. The tool uses 1-based indexing (as in the output of the headers subcommand). When specifying the name, it will use the first column that is an exact case-sensitive match.
Because all of the subcommands (other than stack) support receiving a CSV from standard input, you can easily pipeline:
cat test-files/left-table.csv \
| gocsv join --left --columns LID,RID test-files/right-table.csv \
| gocsv filter --columns XYZ --regex "[ev]e-\d$" \
| gocsv select --columns LID,XYZ \
| gocsv sort --columns LID,XYZ
Subcommand | Input | Output |
---|---|---|
describe | ✔ | N/A |
dimensions | ✔ | N/A |
clean | ✔ | ✔ |
tsv | ✔ | ✔ |
delimiter | ✔ | ✔ |
head | ✔ | ✔ |
tail | ✔ | ✔ |
headers | ✔ | N/A |
view | ✔ | N/A |
stats | ✔ | N/A |
rename | ✔ | ✔ |
behead | ✔ | ✔ |
autoincrement | ✔ | ✔ |
stack | Coming Soon | ✔ |
split | ✔ | N/A |
sort | ✔ | ✔ |
filter | ✔ | ✔ |
replace | ✔ | ✔ |
select | ✔ | ✔ |
sample | ✔ | ✔ |
unique | ✔ | ✔ |
join | ✔ | ✔ |
xlsx | N/A | * |
* xlsx
sends output to standard out when using the --sheet
flag.
gocsv tsv test-files/left-table.csv | pbcopy
gocsv select --columns 2,1 test-files/left-table.csv
gocsv select --columns 1,1,2,2 test-files/left-table.csv
gocsv join --left --columns LID,RID test-files/left-table.csv test-files/right-table.csv
gocsv select --columns LID test-files/left-table.csv | gocsv behead | sort | uniq
gocsv select --columns LID test-files/left-table.csv | gocsv behead | sort | uniq -c | sort -nr
gocsv filter --columns ABC --regex "-1$" test-files/left-table.csv
gocsv replace --columns ABC --regex "^(.*)-(\d)$" -i --repl "\$2-\$1" test-files/left-table.csv
gocsv sort --columns LID,ABC --reverse test-files/left-table.csv
gocsv stack --groups "Primer Archivo,Segundo Archivo,Tercer Archivo" --group-name "Orden de Archivo" test-files/stack-1.csv test-files/stack-2.csv test-files/stack-3.csv
For the latest pre-built binaries, see the Latest Release page.
Open a Terminal window and paste the following command:
/bin/bash <(curl -s https://raw.githubusercontent.com/DataFoxCo/gocsv/latest/scripts/install-latest-darwin-amd64.sh)
This will install gocsv
at /usr/local/bin/gocsv
.
To install the pre-built binary for Apple OS X, download the gocsv-darwin-amd64.zip
file. It should download into your ~/Downloads
directory. To install it, open a Terminal window and do the following:
cd ~/Downloads
unzip gocsv-darwin-amd64.zip
mv gocsv-darwin-amd64/gocsv /usr/local/bin
rmdir gocsv-darwin-amd64
To verify that it has installed, open a new Terminal window and run
gocsv help
You should see the gocsv
help message.
Installing the pre-built binary for Linux is very similar to installing the binary for Apple OS X. First, download gocsv-linux-amd64.zip
. Assuming this downloads to your ~/Downloads
directory, open a Terminal window and run the following commands:
cd ~/Downloads
unzip gocsv-linux-amd64.zip
mv gocsv-linux-amd64/gocsv /usr/local/bin
rmdir gocsv-linux-amd64
To verify that it has installed, open a new Terminal window and run
gocsv help
You should see the gocsv
help message.
Download gocsv-windows-amd64.zip
. Then good luck.