Commit 4c45839

📝 Add csv.Sniffer methods
1 parent 6af00ef commit 4c45839

2 files changed, +63 -11 lines changed

CHANGELOG.rst

+1
@@ -23,6 +23,7 @@ emergencies when we need to start branches for older versions.
 Added
 ~~~~~
 
+* 📝 Add csv.Sniffer methods
 * 📝 Add the removal of git lfs
 
 `24.3.0 <https://github.com/cusyio/Python4DataScience/compare/24.2.0...24.3.0>`_: 2024-11-03

docs/data-processing/serialisation-formats/csv/example.ipynb

+62-11
@@ -1478,7 +1478,7 @@
 {
 "data": {
 "text/plain": [
-"<pandas.io.parsers.readers.TextFileReader at 0x137d11220>"
+"<pandas.io.parsers.readers.TextFileReader at 0x116412300>"
 ]
 },
 "execution_count": 16,
@@ -1746,16 +1746,67 @@
 " print(line)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "0ed726c4-5e09-4676-bcf0-f78e9f7a10e0",
+"metadata": {},
+"source": [
+"[Sniffer.has_header](https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header) analyses your csv file and returns ``True`` if the first row appears to be a series of column headers.\n",
+"\n",
+"<div class=\"alert alert-block alert-info\">\n",
+"\n",
+"**Note:**\n",
+"\n",
+"This method is only a rough heuristic and can produce both false-positive and false-negative results.\n",
+"</div>"
+]
+},
+{
+"cell_type": "markdown",
+"id": "a19c05c1-e947-471b-8089-8e36e65b4268",
+"metadata": {},
+"source": [
+"[Sniffer.sniff](https://docs.python.org/3/library/csv.html#csv.Sniffer.sniff) also analyses your csv file, but returns one of the following dialect subclasses."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 26,
+"id": "263a8cb4-4ae1-46f0-963f-9d2df2de45ed",
+"metadata": {},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"['', 'Title', 'Language', 'Authors', 'License', 'Publication date', 'doi']\n",
+"['0', 'Python basics', 'en', 'Veit Schiele', 'BSD-3-Clause', '2021-10-28', '']\n",
+"['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2019-06-27', '']\n",
+"['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', 'BSD-3-Clause', '2020-10-26', '']\n",
+"['3', 'PyViz Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2020-04-13', '']\n"
+]
+}
+],
+"source": [
+"with open('out.csv') as f:\n",
+"    dialect = csv.Sniffer().sniff(f.read(1024))\n",
+"    f.seek(0)\n",
+"    reader = csv.reader(f, dialect)\n",
+"\n",
+"    for line in reader:\n",
+"        print(line)"
+]
+},
 {
 "cell_type": "markdown",
 "id": "3bc7ee20",
 "metadata": {},
 "source": [
-"### Dialekte\n",
+"### Dialects\n",
 "\n",
-"csv-Dateien gibt es in vielen verschiedenen Varianten. Das Python csv-Modul kommt bereits mit drei verschiedenen Dialekten:\n",
+"csv files are available in many different variants. The Python csv module already comes with three different dialects:\n",
 "\n",
-"Parameter | excel | excel-tab | unix\n",
+"Parameters | [excel](https://docs.python.org/3/library/csv.html#csv.excel) | [excel-tab](https://docs.python.org/3/library/csv.html#csv.excel_tab) | [unix](https://docs.python.org/3/library/csv.html#csv.unix_dialect)\n",
 ":--- | :--- | :--- | :---\n",
 "`delimiter` | `','` | `'\\\\t'` | `','`\n",
 "`quotechar` | `'\\\"'` | `'\\\"'` | ` '\\\"'`\n",
@@ -1780,7 +1831,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": 27,
 "id": "c6d73a1e",
 "metadata": {},
 "outputs": [],
@@ -1804,7 +1855,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": 28,
 "id": "85ac6d66",
 "metadata": {},
 "outputs": [
@@ -1837,7 +1888,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 28,
+"execution_count": 29,
 "id": "341af079",
 "metadata": {},
 "outputs": [
@@ -1856,7 +1907,7 @@
 " 'doi': ('', '', '', '')}"
 ]
 },
-"execution_count": 28,
+"execution_count": 29,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1881,7 +1932,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 29,
+"execution_count": 30,
 "id": "69f3c21a",
 "metadata": {},
 "outputs": [],
@@ -1895,7 +1946,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 30,
+"execution_count": 31,
 "id": "ff5b4f67",
 "metadata": {},
 "outputs": [
@@ -1907,7 +1958,7 @@
 " '2,Jupyter Tutorial,en,Veit Schiele\\n']"
 ]
 },
-"execution_count": 30,
+"execution_count": 31,
 "metadata": {},
 "output_type": "execute_result"
 }

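The added notebook cells describe `Sniffer.has_header` but only run `Sniffer.sniff`. A minimal sketch of how the two might be combined, assuming sample data modelled on the `out.csv` output shown in the diff; the inline `sample` string and the names `sniffer`, `header` and `records` are illustrative and not part of the commit:

```python
import csv
import io

# Illustrative sample modelled on the out.csv output in the diff
# (an assumption, not the actual file from the repository).
sample = (
    ",Title,Language,Authors,License,Publication date,doi\n"
    "0,Python basics,en,Veit Schiele,BSD-3-Clause,2021-10-28,\n"
    "1,Jupyter Tutorial,en,Veit Schiele,BSD-3-Clause,2019-06-27,\n"
    "2,Jupyter Tutorial,de,Veit Schiele,BSD-3-Clause,2020-10-26,\n"
    "3,PyViz Tutorial,en,Veit Schiele,BSD-3-Clause,2020-04-13,\n"
)

sniffer = csv.Sniffer()

# Guess the dialect; the optional delimiters argument narrows the
# candidates the sniffer is allowed to pick from.
dialect = sniffer.sniff(sample, delimiters=",;\t")

# Heuristic check whether the first row looks like column headers.
has_header = sniffer.has_header(sample)

# Parse the sample with the sniffed dialect and split off the header row
# only if the heuristic reported one.
rows = list(csv.reader(io.StringIO(sample), dialect))
if has_header:
    header, records = rows[0], rows[1:]
else:
    header, records = None, rows

print(repr(dialect.delimiter), has_header)
print(header)
print(len(records))
```

Because `has_header` is only a heuristic, it is probably worth checking its result against a file whose layout is known before relying on it for unknown input.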