csvstat: Add a "Non-null values" statistic and --non-nulls option, #774
jpmckinney committed Oct 17, 2023
1 parent 54afbfc commit 85de9ea
Showing 3 changed files with 17 additions and 7 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.rst
@@ -1,6 +1,8 @@
 Unreleased
 ----------
 
+* :doc:`/scripts/csvstat` reports a "Non-null values" statistic (or a :code:`nonnulls` column when :code:`--csv` is set).
+* :doc:`/scripts/csvstat` adds a :code:`--non-nulls` option to only output counts of non-null values.
 * feat: Add a :code:`--null-value` option to commands with the :code:`--blanks` option, to convert additional values to NULL.
 * Add Python 3.12 support.

@@ -166,7 +168,7 @@ This is a minor release which fixes several bugs reported in the :code:`1.0.0` r

 * :doc:`/scripts/csvstat` no longer crashes when a :code:`Number` column has :code:`None` as a frequent value. (#738)
 * :doc:`/scripts/csvlook` documents that output tables are Markdown-compatible. (#734)
-* :doc:`/scripts/csvstat` accepts a :code:`--csv` flag for tabular output. (#584)
+* :doc:`/scripts/csvstat` adds a :code:`--csv` flag for tabular output. (#584)
 * :doc:`/scripts/csvstat` output is easier to read. (#714)
 * :doc:`/scripts/csvpy` has a better description when using the :code:`--agate` flag. (#729)
 * Fix a Python 2.6 bug preventing :doc:`/scripts/csvjson` from parsing utf-8 files. (#732)
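The first two changelog entries can be illustrated with a standard-library sketch of what "non-null values" means for a CSV column (the data here is hypothetical; csvkit treats blank cells as nulls):

```python
import csv
import io

# Hypothetical CSV: column "b" has one blank cell, which csvkit treats as null.
data = "a,b\n1,x\n2,\n3,y\n"

reader = csv.DictReader(io.StringIO(data))
rows = list(reader)

# Count non-null (non-blank) values per column: the figure the new
# "Non-null values" statistic reports for each column.
nonnulls = {name: sum(1 for row in rows if row[name] != '')
            for name in reader.fieldnames}
print(nonnulls)  # {'a': 3, 'b': 2}
```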
8 changes: 8 additions & 0 deletions csvkit/utilities/csvstat.py
@@ -19,6 +19,10 @@
         'aggregation': agate.HasNulls,
         'label': 'Contains null values: ',
     }),
+    ('nonnulls', {
+        'aggregation': agate.Count,
+        'label': 'Non-null values: ',
+    }),
     ('unique', {
         'aggregation': None,
         'label': 'Unique values: ',
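For context, `agate.Count` given only a column name returns the number of non-null values in that column, which is why it backs the new `nonnulls` entry. A dependency-free sketch of that behavior (the function name is mine, not csvkit's):

```python
def count_nonnulls(values):
    # Mirrors what agate.Count computes when given only a column name:
    # the number of entries that are not null/None.
    return sum(1 for value in values if value is not None)

print(count_nonnulls(['x', None, 'y', None]))  # 2
```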
@@ -79,6 +83,9 @@ def add_arguments(self):
         self.argparser.add_argument(
             '--nulls', dest='nulls_only', action='store_true',
             help='Only output whether columns contain nulls.')
+        self.argparser.add_argument(
+            '--non-nulls', dest='nonnulls_only', action='store_true',
+            help='Only output counts of non-null values.')
         self.argparser.add_argument(
             '--unique', dest='unique_only', action='store_true',
             help='Only output counts of unique values.')
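The `--nulls`, `--non-nulls`, and `--unique` flags all follow the same `store_true` pattern. A standalone argparse sketch of that pattern (not the actual CSVStat class):

```python
import argparse

parser = argparse.ArgumentParser(description='store_true flag-pattern sketch')
parser.add_argument('--nulls', dest='nulls_only', action='store_true',
                    help='Only output whether columns contain nulls.')
parser.add_argument('--non-nulls', dest='nonnulls_only', action='store_true',
                    help='Only output counts of non-null values.')

# Passing --non-nulls sets only its own flag; the others default to False.
args = parser.parse_args(['--non-nulls'])
print(args.nonnulls_only, args.nulls_only)  # True False
```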
@@ -351,6 +358,7 @@ def format_decimal(d, f='%.3f', no_grouping_separator=False):
     return locale.format_string(f, d, grouping=not no_grouping_separator).rstrip('0').rstrip('.')
 
 
+# These are accessed via: globals().get(f'get_{op_name}')
 def get_type(table, column_id, **kwargs):
     return f'{table.columns[column_id].data_type.__class__.__name__}'
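The new comment documents a name-based dispatch: each statistic's helper function is looked up from module globals by operation name. A minimal illustration of that pattern (these names are hypothetical, not csvstat's actual helpers):

```python
def get_type(value):
    # Hypothetical stand-in for csvstat's per-statistic get_* helpers.
    return type(value).__name__

# Dispatch by name, as the comment in csvstat.py describes.
op_name = 'type'
handler = globals().get(f'get_{op_name}')
print(handler(3.14))  # float
```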

12 changes: 6 additions & 6 deletions tests/test_utilities/test_csvstat.py
@@ -76,14 +76,14 @@ def test_csv(self):
         header = next(reader)
 
         self.assertEqual(header[1], 'column_name')
-        self.assertEqual(header[4], 'unique')
+        self.assertEqual(header[5], 'unique')
 
         row = next(reader)
 
         self.assertEqual(row[1], 'state')
         self.assertEqual(row[2], 'Text')
-        self.assertEqual(row[5], '')
-        self.assertEqual(row[11], '2')
+        self.assertEqual(row[6], '')
+        self.assertEqual(row[12], '2')
 
     def test_csv_columns(self):
         output = self.get_output_as_io(['--csv', '-c', '4', 'examples/realdata/ks_1033_data.csv'])
@@ -93,14 +93,14 @@ def test_csv_columns(self):
         header = next(reader)
 
         self.assertEqual(header[1], 'column_name')
-        self.assertEqual(header[4], 'unique')
+        self.assertEqual(header[5], 'unique')
 
         row = next(reader)
 
         self.assertEqual(row[1], 'nsn')
         self.assertEqual(row[2], 'Text')
-        self.assertEqual(row[5], '')
-        self.assertEqual(row[11], '16')
+        self.assertEqual(row[6], '')
+        self.assertEqual(row[12], '16')
 
     def test_decimal_format(self):
         output = self.get_output(['-c', 'TOTAL', '--mean', 'examples/realdata/FY09_EDU_Recipients_by_State.csv'])
