📝 Add csv.Sniffer methods

veit · veit · commit 4c458390f227 · 2025-03-02T12:02:41.000+01:00
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -23,6 +23,7 @@ emergencies when we need to start branches for older versions.
 Added
 ~~~~~
 
+* 📝 Add csv.Sniffer methods
 * 📝 Add the removal of git lfs
 
 `24.3.0 <https://github.com/cusyio/Python4DataScience/compare/24.2.0...24.3.0>`_: 2024-11-03
diff --git a/docs/data-processing/serialisation-formats/csv/example.ipynb b/docs/data-processing/serialisation-formats/csv/example.ipynb
@@ -1478,7 +1478,7 @@
     {
      "data": {
       "text/plain": [
-       "<pandas.io.parsers.readers.TextFileReader at 0x137d11220>"
+       "<pandas.io.parsers.readers.TextFileReader at 0x116412300>"
       ]
      },
      "execution_count": 16,
@@ -1746,16 +1746,67 @@
     "    print(line)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0ed726c4-5e09-4676-bcf0-f78e9f7a10e0",
+   "metadata": {},
+   "source": [
+    "[Sniffer.has_header](https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header) analyses your CSV file and returns ``True`` if the first row appears to be a series of column headers.\n",
+    "\n",
+    "<div class=\"alert alert-block alert-info\">\n",
+    "\n",
+    "**Note:**\n",
+    "\n",
+    "This method is only a rough heuristic and can produce both false-positive and false-negative results.\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a19c05c1-e947-471b-8089-8e36e65b4268",
+   "metadata": {},
+   "source": [
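+    "[Sniffer.sniff](https://docs.python.org/3/library/csv.html#csv.Sniffer.sniff) also analyses your CSV file, but returns a [Dialect](https://docs.python.org/3/library/csv.html#csv.Dialect) subclass that reflects the parameters it found, such as the delimiter and the quote character."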
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "263a8cb4-4ae1-46f0-963f-9d2df2de45ed",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['', 'Title', 'Language', 'Authors', 'License', 'Publication date', 'doi']\n",
+      "['0', 'Python basics', 'en', 'Veit Schiele', 'BSD-3-Clause', '2021-10-28', '']\n",
+      "['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2019-06-27', '']\n",
+      "['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', 'BSD-3-Clause', '2020-10-26', '']\n",
+      "['3', 'PyViz Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2020-04-13', '']\n"
+     ]
+    }
+   ],
+   "source": [
+    "with open('out.csv', newline='') as f:\n",
+    "    dialect = csv.Sniffer().sniff(f.read(1024))\n",
+    "    f.seek(0)\n",
+    "    reader = csv.reader(f, dialect)\n",
+    "\n",
+    "    for line in reader:\n",
+    "        print(line)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "3bc7ee20",
    "metadata": {},
    "source": [
-    "### Dialekte\n",
+    "### Dialects\n",
     "\n",
-    "csv-Dateien gibt es in vielen verschiedenen Varianten. Das Python csv-Modul kommt bereits mit drei verschiedenen Dialekten:\n",
+    "CSV files come in many different variants. The Python csv module already ships with three dialects:\n",
     "\n",
-    "Parameter | excel | excel-tab | unix\n",
+    "Parameters | [excel](https://docs.python.org/3/library/csv.html#csv.excel) | [excel-tab](https://docs.python.org/3/library/csv.html#csv.excel_tab) | [unix](https://docs.python.org/3/library/csv.html#csv.unix_dialect)\n",
     ":--- | :--- | :--- | :---\n",
     "`delimiter` | `','` | `'\\\\t'` | `','`\n",
     "`quotechar` | `'\\\"'` | `'\\\"'` | ` '\\\"'`\n",
@@ -1780,7 +1831,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 27,
    "id": "c6d73a1e",
    "metadata": {},
    "outputs": [],
@@ -1804,7 +1855,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 28,
    "id": "85ac6d66",
    "metadata": {},
    "outputs": [
@@ -1837,7 +1888,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 29,
    "id": "341af079",
    "metadata": {},
    "outputs": [
@@ -1856,7 +1907,7 @@
        " 'doi': ('', '', '', '')}"
       ]
      },
-     "execution_count": 28,
+     "execution_count": 29,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1881,7 +1932,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 30,
    "id": "69f3c21a",
    "metadata": {},
    "outputs": [],
@@ -1895,7 +1946,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 31,
    "id": "ff5b4f67",
    "metadata": {},
    "outputs": [
@@ -1907,7 +1958,7 @@
        " '2,Jupyter Tutorial,en,Veit Schiele\\n']"
       ]
      },
-     "execution_count": 30,
+     "execution_count": 31,
      "metadata": {},
      "output_type": "execute_result"
     }