"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.table(results, end=3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d6d6313-88a1-452f-9323-e977013d0d5a",
+ "metadata": {},
+ "source": [
+ "The results are shown inside the sentences that they occur in.\n",
+ "`p`s are too big to fit into sentences, so the `p`s are left out and only the images show up.\n",
+ "\n",
+ "We can make the display richer: instead of a plain table, we can unfold the sentences in the results:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "4d5f1536-4194-4c9d-86b1-a3dbd44580db",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "result 1"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1001:4
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1001:4
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "result 2"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "result 3"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.show(results, end=3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "194d0133-3fba-4926-b56b-a033295941b8",
+ "metadata": {},
+ "source": [
+ "The results are collected and shown in their surrounding sentence. \n",
+ "\n",
+ "Note that we see only the sentences that contain an image.\n",
+ "\n",
+ "But we can see more if we tell text-fabric to condense the result not in sentences, but in `p`s:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "9f95f839-91c4-41a3-9fc8-f8d62854f1dc",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "result 1"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1001:4
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "result 2"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "result 3"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1002:3
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.show(results, end=3, condenseType=\"p\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f4def1a-f87a-4f82-a537-0e3fadbcd6a1",
+ "metadata": {},
+ "source": [
+ "# Formulas\n",
+ "\n",
+ "Now let's look for formulas that have a square root in them.\n",
+ "\n",
+ "Note that in TeX a square root is written as `\\sqrt`.\n",
+ "\n",
+ "The TeX source of a formula is contained in the `tex` feature of a formula node, provided\n",
+ "the formula is written in TeX. Not all formulas are written in TeX."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "b9a91170-0cd7-4ecc-b6cb-5b4fb08f7f05",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"\"\"\n",
+ "formula tex~sqrt\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "0c638c11-2064-4c91-b359-81ac6562b251",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0.00s 54 results\n"
+ ]
+ }
+ ],
+ "source": [
+ "results = A.search(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "331153b5-2783-4b65-a161-28f157a0382b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "sentence 1"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2152:4
sentence 10
formula TeX
tex={\\rm m} - {{\\displaystyle\\strut {\\rm n}\\over \\displaystyle\\strut {\\rm z}}}{\\rm x} + \\sqrt{{\\rm mm} + {\\rm ox} - {{\\displaystyle\\strut {\\rm p}\\over \\displaystyle\\strut {\\rm m}}}{\\rm xx}}
${\\rm m} - {{\\displaystyle\\strut {\\rm n}\\over \\displaystyle\\strut {\\rm z}}}{\\rm x} + \\sqrt{{\\rm mm} + {\\rm ox} - {{\\displaystyle\\strut {\\rm p}\\over \\displaystyle\\strut {\\rm m}}}{\\rm xx}}$
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "sentence 2"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "3 3174:14
sentence 2
formula TeX
tex=vv = - {{1\\over 3}}nv + 2ev + {{2\\over 3}}ne - 2ee
$vv = - {{1\\over 3}}nv + 2ev + {{2\\over 3}}ne - 2ee$
formula TeX
tex=x - {{1\\over 6}}n \\pm \\sqrt{{{1\\over 36}}nn + {{1\\over 3}}nx - xx}
$x - {{1\\over 6}}n \\pm \\sqrt{{{1\\over 36}}nn + {{1\\over 3}}nx - xx}$
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "sentence 3"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "3 3174:15
sentence 1
formula TeX
tex=y = {{1\\over 6}}n \\pm \\sqrt{{{1\\over 36}}nn + {{1\\over 3}}nx - xx}
$y = {{1\\over 6}}n \\pm \\sqrt{{{1\\over 36}}nn + {{1\\over 3}}nx - xx}$
formula TeX
tex=x^{4}.. - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0
$x^{4}.. - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0$
formula TeX
tex=z^{6}. - {{2\\over 27}}n^{4}zz - {{1\\over 81}}n^{6} = 0
$z^{6}. - {{2\\over 27}}n^{4}zz - {{1\\over 81}}n^{6} = 0$
formula TeX
tex={{1\\over 3}}nn
formula TeX
tex=z = n\\sqrt{{{1\\over 3}}}
$z = n\\sqrt{{{1\\over 3}}}$
formula TeX
tex=x^{4} - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0
$x^{4} - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0$
formula TeX
tex=xx - nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} - {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0
$xx - nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} - {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0$
formula TeX
tex=xx + nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} + {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0
$xx + nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} + {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0$
formula TeX
tex=x = n\\sqrt{{{1\\over 12}}} \\pm \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}
$x = n\\sqrt{{{1\\over 12}}} \\pm \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
tex=n\\sqrt{{{1\\over 12}}} + \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}
$n\\sqrt{{{1\\over 12}}} + \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
tex=n\\sqrt{{{1\\over 12}}} - \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}
$n\\sqrt{{{1\\over 12}}} - \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
tex=\\sqrt{{\\displaystyle\\strut {2nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{1\\over 3}}nn}
$\\sqrt{{\\displaystyle\\strut {2nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{1\\over 3}}nn}$
formula TeX
tex=\\sqrt{{\\displaystyle\\strut {4nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{2\\over 3}}nn}
$\\sqrt{{\\displaystyle\\strut {4nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{2\\over 3}}nn}$
formula TeX
tex=\\sqrt{36\\sqrt{3} - 54}
$\\sqrt{36\\sqrt{3} - 54}$
formula TeX
tex=\\sqrt{4\\sqrt{3} - 6}
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.show(results, end=3, condensed=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a70cab4-9316-46cd-beb5-66edd63f4a58",
+ "metadata": {},
+ "source": [
+ "We can get rid of the TeX codes.\n",
+ "\n",
+ "We see them because our query mentions the feature `tex`, but we can turn that off with `queryFeatures=False` (showing the 3rd result only):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "028d28df-ff81-4e47-b904-974749c69ee2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "sentence 3"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "3 3174:15
sentence 1
formula TeX
$y = {{1\\over 6}}n \\pm \\sqrt{{{1\\over 36}}nn + {{1\\over 3}}nx - xx}$
formula TeX
$x^{4}.. - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0$
formula TeX
$z^{6}. - {{2\\over 27}}n^{4}zz - {{1\\over 81}}n^{6} = 0$
formula TeX
$z = n\\sqrt{{{1\\over 3}}}$
formula TeX
$x^{4} - {{1\\over 9}}n^{3}x + {{1\\over 54}}n^{4} = 0$
formula TeX
$xx - nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} - {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0$
formula TeX
$xx + nx\\sqrt{{{1\\over 3}} + {{1\\over 6}}nn} + {\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{ 3}}} = 0$
formula TeX
$x = n\\sqrt{{{1\\over 12}}} \\pm \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
$n\\sqrt{{{1\\over 12}}} + \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
$n\\sqrt{{{1\\over 12}}} - \\sqrt{{\\displaystyle\\strut {nn}\\over \\displaystyle\\strut {6\\sqrt{3}}} - {{1\\over 12}}nn}$
formula TeX
$\\sqrt{{\\displaystyle\\strut {2nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{1\\over 3}}nn}$
formula TeX
$\\sqrt{{\\displaystyle\\strut {4nn}\\over \\displaystyle\\strut {3\\sqrt{3}}} - {{2\\over 3}}nn}$
formula TeX
$\\sqrt{36\\sqrt{3} - 54}$
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.show(results, start=3, end=3, condensed=True, queryFeatures=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d89b91b-dc66-4cc8-aec7-43e4a2765164",
+ "metadata": {},
+ "source": [
+ "## Formulas without TeX\n",
+ "\n",
+ "We gather the formulas not written in TeX:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "398e1c5f-01f2-41d8-9317-ab73dd1f4ce1",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0.01s 5981 results\n"
+ ]
+ }
+ ],
+ "source": [
+ "query = \"\"\"\n",
+ "formula notation#TeX\n",
+ "\"\"\"\n",
+ "\n",
+ "results = A.search(query)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6df9780c-8aef-40ff-b474-7939e7bc6425",
+ "metadata": {},
+ "source": [
+ "The majority of the formulas are not written in TeX. Let's sample a few:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "90250c53-9231-4a76-87ab-94c2df23284a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "n | p | formula\n",
+ "1 | 7 7538:6 | IN\n",
+ "2 | 2 2122:12 | G\n",
+ "3 | 1 1020:7 | ZZ\n",
+ "4 | 2 2164:25 | GR\n",
+ "5 | 2 2159:5 | g\n",
+ "6 | 2 2156:9 | C in A + C in E − Aq − A in E bis − Eq\n",
+ "7 | 2 2126:16 | AO\n",
+ "8 | 1 1f1b:3 | DE\n",
+ "9 | 7 7547:12 | S\n",
+ "10 | 4 4303:13 | BG\n",
+ "11 | 1 1063:9 | B\n",
+ "12 | 6 6408:5 | x\n",
+ "13 | 3 3198:15 | FL\n",
+ "14 | 1 1020:12 | Q\n",
+ "15 | 1 1020:10 | LD\n",
+ "16 | 1 1066:7 | B\n",
+ "17 | 2 2156:6 | DN\n",
+ "18 | 2 2156:15 | EO\n",
+ "19 | 4 4289:9 | AC\n",
+ "20 | 6 6467:5 | g"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from random import seed, sample\n",
+ "\n",
+ "seed(42)\n",
+ "\n",
+ "selected = sample(results, 20)\n",
+ "\n",
+ "A.table(selected)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aab833a7-aaaa-49ee-bb23-06712544bc17",
+ "metadata": {},
+ "source": [
+ "These formulas are all so simple that TeX was not needed to display them.\n",
+ "\n",
+ "Let's see the first 2 of them in context:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "e3c73134-887d-4640-b138-46e08b7936db",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "sentence 1"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2122:12
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "sentence 2"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "7 7538:6
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.show(selected[0:2], condensed=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "09577595-dbb0-4322-a61f-1615613325d3",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "---\n",
+ "\n",
+ "# Contents\n",
+ "\n",
+ "* **[start](start.ipynb)** intro and highlights\n",
+ "* **search** turbo charge your hand-coding with search templates\n",
+ "* **[compute](compute.ipynb)** sink down a level and compute it yourself\n",
+ "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n",
+ "\n",
+ "Advanced\n",
+ "\n",
+ "* **[similar sentences](similar.ipynb)** find similar sentences\n",
+ "\n",
+ "CC-BY Dirk Roorda"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
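A plain-Python sketch of the two feature conditions used in the search notebook above, `tex~sqrt` and `notation#TeX`. It assumes that `~` is an unanchored regular-expression search and that `#` also accepts nodes that have no value for the feature. The dictionaries are hypothetical stand-ins for formula nodes, not Text-Fabric objects, and `matches_regex` and `differs_from` are illustrative helpers, not Text-Fabric functions.

```python
import re
from random import Random

# Hypothetical stand-ins for formula nodes. The TeX strings are copied from
# the search results shown in the notebook; the last two records are assumed
# to carry neither a `notation` nor a `tex` value.
formulas = [
    {"notation": "TeX", "tex": r"z = n\sqrt{{{1\over 3}}}"},
    {"notation": "TeX", "tex": r"{{1\over 3}}nn"},
    {"notation": "TeX", "tex": r"\sqrt{36\sqrt{3} - 54}"},
    {"notation": None, "tex": None},
    {"notation": None, "tex": None},
]


def matches_regex(value, pattern):
    """Assumed reading of `feature~pattern`: regex search on an existing value."""
    return value is not None and re.search(pattern, value) is not None


def differs_from(value, unwanted):
    """Assumed reading of `feature#value`: the value is absent or different."""
    return value != unwanted


# Counterpart of `formula tex~sqrt`
with_sqrt = [f for f in formulas if matches_regex(f["tex"], "sqrt")]

# Counterpart of `formula notation#TeX`
without_tex = [f for f in formulas if differs_from(f["notation"], "TeX")]

# A private Random instance makes the sample repeatable without changing the
# global random state (the notebook calls seed(42) on the global generator).
rng = Random(42)
picked = rng.sample(without_tex, 1)
```

Here `with_sqrt` holds the two records whose TeX contains a square root, and `without_tex` holds the two records without TeX notation. In the notebook itself the same selections come from `A.search(query)`.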
diff --git a/tutorial/similar.ipynb b/tutorial/similar.ipynb
new file mode 100644
index 0000000..a88232c
--- /dev/null
+++ b/tutorial/similar.ipynb
@@ -0,0 +1,1916 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "4a486875-147d-492a-9003-f05c48d841fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "---\n",
+ "\n",
+ "To get started: consult [start](start.ipynb)\n",
+ "\n",
+ "---\n",
+ "\n",
+ "# Similar sentences\n",
+ "\n",
+ "We explore the similar sentences in the letters of Descartes.\n",
+ "\n",
+ "They have already been detected and stored in an *edge* feature by running\n",
+ "the notebook [parallels](../programs/parallels.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f0597b0-6f7d-4610-91bb-6aa93a5c3f7a",
+ "metadata": {},
+ "source": [
+ "# Incantation\n",
+ "\n",
+ "The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are\n",
+ "explained in the [start tutorial](start.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "b8d43d3f-d00a-4ec3-b690-d0fa6fc9dcbe",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "156b5da3-563a-4081-967b-afd74cc314a3",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-05-24T10:06:39.818664Z",
+ "start_time": "2018-05-24T10:06:39.796588Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from tf.app import use"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "d77aff2b-9f7d-45fb-a1a2-7d31c16c2bca",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "TF-app: ~/github/CLARIAH/descartes-tf/app"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "data: ~/github/CLARIAH/descartes-tf/tf/1.0"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "data: ~/github/CLARIAH/descartes-tf/parallels/tf/1.0"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "This is Text-Fabric 11.0.7\n",
+ "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n",
+ "\n",
+ "28 features found and 0 ignored\n",
+ " 0.09s Dataset without structure sections in otext:no structure functions in the T-API\n",
+ " 0.34s All features loaded/computed - for details use TF.isLoaded()\n",
+ " 0.01s All additional features loaded - for details use TF.isLoaded()\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " Text-Fabric: Text-Fabric API 11.0.7, CLARIAH/descartes-tf/app v3, Search Reference\n",
+ " Data: DESCARTES-TF, Character table, Feature docs\n",
+ " Node types\n",
+ "Name | # of nodes | # slots/node | % coverage\n",
+ "volume | 8 | 85241.88 | 100\n",
+ "letter | 725 | 940.60 | 100\n",
+ "page | 2884 | 236.45 | 100\n",
+ "postscriptum | 56 | 46.79 | 0\n",
+ "opener | 545 | 1.97 | 0\n",
+ "closer | 541 | 13.10 | 1\n",
+ "address | 86 | 15.22 | 0\n",
+ "head | 725 | 23.37 | 2\n",
+ "p | 8438 | 80.82 | 100\n",
+ "sentence | 14332 | 45.74 | 96\n",
+ "hi | 5972 | 4.63 | 4\n",
+ "formula | 6200 | 1.21 | 1\n",
+ "figure | 319 | 1.00 | 0\n",
+ "word | 681935 | 1.00 | 100\n",
+ " Sets: no custom sets\n",
+ " Features:\n",
+ "Similar Sentences\n",
+ "name | type | description\n",
+ "sim | int | similarity between sentences based on the Levenshtein ratio\n",
+ "Descartes = Descartes, all letters\n",
+ "type | description\n",
+ "str | alternative date of a letter\n",
+ "str | alternative ids of a letter, comma separated\n",
+ "str | certainty of something\n",
+ "str | date of a letter\n",
+ "str | id of a letter\n",
+ "str | person involved in the transmission of the letter from sender to receiver\n",
+ "str | whether the word is in italic\n",
+ "str | whether the word is in the margin\n",
+ "str | whether the word is in subscript\n",
+ "str | whether the word is in supscript\n",
+ "str | language of a letter\n",
+ "str | level of a paragraph when it acts like a heading\n",
+ "int | number of whatever element\n",
+ "str | notation method of a formula\n",
+ "str |\n",
+ "str | nonword chars after a word\n",
+ "str | recipient of a letter\n",
+ "str | location from where a letter was received\n",
+ "str | person responsible for something\n",
+ "str | sender of a letter\n",
+ "str | location from where a letter was sent\n",
+ "str | unformatted TeX code of a formula, without the `$`\n",
+ "str | transcription of a word\n",
+ "str | kind of a node; \"empty\"; \"formula\", \"head\", \"symbol\", \"illustration\"\n",
+ "str | url of a graphic node\n",
+ "none |\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "data: ~/github/CLARIAH/descartes-tf/source/illustrations"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "Found 5 symbols
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "Found 310 illustrations
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A = use(\n",
+ " \"CLARIAH/descartes-tf:clone\",\n",
+ " checkout=\"clone\",\n",
+ " hoist=globals(),\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f100a88e-ae13-4921-abb8-6bf5ee732af2",
+ "metadata": {},
+ "source": [
+ "# Use the similar sentences module\n",
+ "\n",
+ "You see an extra module **Similar Sentences** listed with one feature: `sim`.\n",
+ "It is in *italics*, which indicates it is an edge feature.\n",
+ "\n",
+ "We count how many similar pairs there are, how many 100% similar pairs there are,\n",
+ "and how many more than 90% but not 100%."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "055e8536-98e4-4c30-a7a2-aba9d581d7f7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0.01s 1199 results\n"
+ ]
+ }
+ ],
+ "source": [
+ "query = \"\"\"\n",
+ "sentence\n",
+ "-sim> sentence\n",
+ "\"\"\"\n",
+ "results = A.search(query)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ba934873-7783-49ab-a10a-3d62bb3fe776",
+ "metadata": {},
+ "source": [
+ "We collect the 100% results, in bidirectional form."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "16e963dd-5be2-4cbb-9b8f-a366b6fb506b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0.02s 1234 results\n"
+ ]
+ }
+ ],
+ "source": [
+ "query100 = \"\"\"\n",
+ "sentence\n",
+ "<sim=100> sentence\n",
+ "\"\"\"\n",
+ "results100 = A.search(query100)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4958ee6b-2d67-428f-a8e3-5d9b39284f0e",
+ "metadata": {},
+ "source": [
+ "Let's collect the pairs with a similarity above 90%."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "e97d4c90-e166-4e35-ac33-18fdb8ef0d3a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0.01s 724 results\n"
+ ]
+ }
+ ],
+ "source": [
+ "query90 = \"\"\"\n",
+ "sentence\n",
+ "-sim>90> sentence\n",
+ "\"\"\"\n",
+ "results90 = A.search(query90)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9adf7387-f2c9-4b4d-8e0b-8431dcb083f4",
+ "metadata": {},
+ "source": [
+ "Let's weed out the 100% pairs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "55e7a921-a323-44a7-9fdd-7b66fe94ea5a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "107"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "results100set = set(results100)\n",
+ "\n",
+ "results = tuple(r for r in results90 if r not in results100set)\n",
+ "len(results)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d06355a8-e1f9-4d94-9c45-1b62f06016d3",
+ "metadata": {},
+ "source": [
+ "We show the first 2 of these highly similar sentence pairs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "6ad33c70-5837-4b43-958f-be8f34995313",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "n | p | sentence | sentence |
\n",
+ "1 | 1 1018:4 | 708669 Il faut observer que la ligne NM , qui est le milieu de la lame PNOM , doit être exactement parallèle à l'axe AB de la première machine, et que la ligne perpendiculaire qui tomberait de l'axe AB sur les planches GH et IK , tombe justement sur cette ligne NM .\n",
+ "De plus, aux dernières figures, il faut que la même ligne NM , prolongée, passe justement par le centre de la roue Q ,\n",
+ "et se rencontre faire une ligne droite avec l'axe RS , sur lequel tourne le verre. | 708698 Enfin vous dites qu'il faut aussi observer que la ligne NM , qui fait le milieu de la lame PNOM , doit être exactement parallèle à l'axe AB de la première machine, et que la ligne perpendiculaire qui tomberait de l'axe AB sur les planches GH et IK , tombe E justement sur cette ligne NM . De plus, aux dernières figures, il faut que la même ligne NM prolongée passe justement par le centre de la roue Q et se rencontre faire une ligne droite avec l'axe RS , sur lequel tourne le verre. |
\n",
+ "2 | 2 2122:10 | 710640 Faisons après cela qu' A , l'un des bouts de cette corde, étant attaché ferme à quelque clou, l'autre C soit derechef soutenu par un homme; et il est évident que cet homme, en C , n'aura besoin, non plus que devant, pour soutenir le poids E , que de la force qu'il faut pour soutenir cent livres: à cause que le clou qui est vers A y fait le même office que l'homme que nous y supposions auparavant. | 712335 Puis, si on suppose que A , l'un des bouts de cette corde, soit attaché ferme à quelque clou, et que l'autre C soit derechef soutenu par un homme, il est évident que cet homme en C n'aura besoin non plus que devant, pour soutenir ce poids E , que de la force qu'il faut pour soutenir 100 livres, à cause que le clou qui sera vers A y fera le même office que l'homme que nous y supposions auparavant. |
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "A.table(results, withNodes=True, end=2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51419d90-d1b8-4421-b81b-52a1eba0726d",
+ "metadata": {},
+ "source": [
+ "Unfortunately, the generic mechanism of text-fabric does not show the passage of the second sentence of each similar pair.\n",
+ "\n",
+ "We can make a display by hand, and also show the similarity:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "d82e9b4e-e565-407b-80c5-5d67a9afe2f5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/markdown": [
+ "### 1 similarity 96\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1018:4 Il faut observer que la ligne NM , qui est le milieu de la lame PNOM , doit être exactement parallèle à l'axe AB de la première machine, et que la ligne perpendiculaire qui tomberait de l'axe AB sur les planches GH et IK , tombe justement sur cette ligne NM .\n",
+ "De plus, aux dernières figures, il faut que la même ligne NM , prolongée, passe justement par le centre de la roue Q ,\n",
+ "et se rencontre faire une ligne droite avec l'axe RS , sur lequel tourne le verre.
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "1 1019:11 Enfin vous dites qu'il faut aussi observer que la ligne NM , qui fait le milieu de la lame PNOM , doit être exactement parallèle à l'axe AB de la première machine, et que la ligne perpendiculaire qui tomberait de l'axe AB sur les planches GH et IK , tombe
E justement sur cette ligne NM . De plus, aux dernières figures, il faut que la même ligne NM prolongée passe justement par le centre de la roue Q et se rencontre faire une ligne droite avec l'axe RS , sur lequel tourne le verre. "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 2 similarity 92\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2122:10 Faisons après cela qu' A , l'un des bouts de cette corde, étant attaché ferme à quelque clou, l'autre C soit derechef soutenu par un homme; et il est évident que cet homme, en C , n'aura besoin, non plus que devant, pour soutenir le poids E , que de la force qu'il faut pour soutenir cent livres: à
cause que le clou qui est vers A y fait le même office que l'homme que nous y supposions auparavant. "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2164:15 Puis, si on suppose que A , l'un des bouts de cette corde, soit attaché ferme à quelque clou, et que l'autre C soit derechef soutenu par un homme, il est évident que cet homme en C n'aura besoin non plus que devant, pour soutenir ce poids E , que de la force qu'il faut pour soutenir 100 livres, à cause que le clou qui sera vers A y fera le même office que l'homme que nous y supposions auparavant.
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 3 similarity 97\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2122:10 Enfin, posons que cet homme qui est vers C tire la corde pour faire hausser le poids E ; et il est évident que, s'il y emploie la force qu'il faut pour lever 100 livres à la hauteur de deux pieds, il fera hausser ce poids E , qui en pèse 200, de la hauteur d'un pied: car la corde ABC étant doublée comme elle est, on la doit tirer de deux pieds par le bout C , pour faire autant hausser le poids E que si deux hommes la tiraient, l'un par le bout A et l'autre par le bout C , chacun de la longueur d'un pied seulement.
"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2164:15 Enfin, supposant que cet homme, qui est vers C , tire la corde pour faire hausser le poids E , il est évident que, s'il y emploie la force qu'il faut pour lever 100 livres à la hauteur de deux pieds, il fera hausser ce poids E , qui en pèse deux cents, de la hauteur d'un pied; car la corde ABC étant doublée comme elle est, on la doit tirer de deux pieds, par le bout C , pour faire autant hausser ce poids E que si deux hommes la tiraient, l'un par le bout A et l'autre par le bout C , chacun de la longueur d'un pied seulement."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 4 similarity 95\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2122:29 Et pour mesurer exactement quelle doit être cette force en chaque point de la ligne courbe ABCDE , il faut savoir qu'elle y agit tout de même que si elle traînait le poids sur un plan circulairement incliné, et que l'inclination de chacun des points de ce plan circulaire se doit mesurer par celle de la ligne droite qui touche le cercle en ce point."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2164:25 Or, pour mesurer exactement quelle doit être cette force en chaque point de la ligne courbe ABCDE , il faut penser qu'elle y agit tout de même que si elle traînait le poids sur un plan circulairement incliné, et l'inclination de chacun des points de ce plan circulaire, ou sphérique, se doit mesurer par celle de la ligne droite qui touche le cercle en ce point-là."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 5 similarity 96\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:8 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam quadrati BC ad quadratum OI , quia punctum < O > est extra parabolen."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:10 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam quadrati BC ad quadratum OI , quia punctum O est extra ellipsim."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 6 similarity 93\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:8 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam quadrati BC ad quadratum OI , quia punctum < O > est extra parabolen."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:12 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam BC ad OI , quia punctum O est extra hyperbolen."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 7 similarity 97\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:9 Sed propter similitudinem triangulorum, ut
BC quadratum ad OI quadratum, ita CE quadratum ad IE quadratum; major <igitur> erit proportio CD ad DI , quam quadrati CE ad quadratum IE .\n",
+ "<c> cum diametro concurrens. "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:11 Sed propter similitudinem triangulorum, ut BC quadratum ad OI quadratum, ita CE quadratum ad IE quadratum; major igitur erit proportio CD ad DI , quam quadrati CE ad quadratum IE .\n",
+ " \n",
+ "<d>
cto E cum diametro concurrens. "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 8 similarity 92\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:10 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam quadrati BC ad quadratum OI , quia punctum O est extra ellipsim."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:12 Ergo sumendo quodlibet punctum in recta BE , et ab eo ducendo ordinatam OI , a puncto autem B ordinatam BC , major erit proportio CD ad DI ,\n",
+ "quam BC ad OI , quia punctum O est extra hyperbolen."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 9 similarity 98\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:14 Cum autem punctum B detur, <datur applicata BC ; ergo punctum C >. Datur etiam CD . Sit igitur CD aequalis B datae."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:17 Cum autem punctum B detur, datur applicata BC ; ergo punctum C . Datur etiam CD . Sit igitur CD aequalis D datae."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "### 10 similarity 98\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:14 Cum autem punctum B detur, <datur applicata BC ; ergo punctum C >. Datur etiam CD . Sit igitur CD aequalis B datae."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "2 2144:20 Cum autem punctum B detur, datur applicata BC ; ergo punctum C . Datur etiam CD . Sit igitur CD aequalis D datae."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
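+ "# Show the first 10 result pairs, each under a heading with its similarity\n",
+ "for i, (s1, s2) in enumerate(results[:10], start=1):\n",
+ "    # E.sim.f(s1) yields (target, similarity) pairs; look up the entry for s2\n",
+ "    sim = dict(E.sim.f(s1))[s2]\n",
+ "    A.dm(f\"### {i} similarity {sim}\\n\")\n",
+ "    A.plain(s1)\n",
+ "    A.plain(s2)"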
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "83ab330a-f78a-46a0-b62f-dfd88db4b247",
+ "metadata": {},
+ "source": [
+ "## Edges: low-level\n",
+ "\n",
+ "We can list all edges going out from a reference node.\n",
+ "What we see is a tuple of pairs: each pair holds a target node and the similarity between the reference node and that target node."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "246c6cc2-92c7-4e99-8c6b-776280a7f337",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "refNode1=722606\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "((722626, 94),)"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "refNode1 = results[-1][0]\n",
+ "print(f\"{refNode1=}\")\n",
+ "\n",
+ "E.sim.f(refNode1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e48070ee-eb27-4adc-adaf-68e0af1c8c1e",
+ "metadata": {},
+ "source": [
+ "Likewise, we can list the edges coming into a node: the nodes that target it, each with its similarity. Here the reference node is the other member of the same pair:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "f1f3cc0b-df96-4e29-9d9b-779e1660b50a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "refNode2=722626\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "((722606, 94),)"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "refNode2 = results[-1][1]\n",
+ "print(f\"{refNode2=}\")\n",
+ "\n",
+ "E.sim.t(refNode2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "333412e0-75a8-4901-b062-07a1673293a8",
+ "metadata": {},
+ "source": [
+ "Both sets of nodes are similar to the reference node, and it is inconvenient to call both `.f()` and `.t()` to collect all similar sentences.\n",
+ "\n",
+ "But there is another way: `.b()` combines the outgoing and incoming edges of a node:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "a3a37416-96d9-49f8-92d5-b74cb7af6f9d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((719355, 80), (722626, 94))"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "E.sim.b(refNode1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "1d6ecc9b-b990-41ee-9b35-babec04d03f0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((722606, 94),)"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "E.sim.b(refNode2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "09577595-dbb0-4322-a61f-1615613325d3",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "---\n",
+ "\n",
+ "# Contents\n",
+ "\n",
+ "* **[start](start.ipynb)** intro and highlights\n",
+ "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n",
+ "* **[compute](compute.ipynb)** sink down a level and compute it yourself\n",
+ "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n",
+ "\n",
+ "Advanced\n",
+ "\n",
+ "* **similar sentences** find similar sentences\n",
+ "\n",
+ "CC-BY Dirk Roorda"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorial/start.ipynb b/tutorial/start.ipynb
index 48c41dc..98a3eed 100644
--- a/tutorial/start.ipynb
+++ b/tutorial/start.ipynb
@@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "
\n",
- "
\n",
- "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
"\n",
"# Start\n",
"\n",
@@ -179,7 +179,19 @@
{
"data": {
"text/html": [
- "data: ~/github/CLARIAH/descartes-tf/tf/0.9"
+ "data: ~/github/CLARIAH/descartes-tf/tf/1.0"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "data: ~/github/CLARIAH/descartes-tf/parallels/tf/1.0"
],
"text/plain": [
""
@@ -195,9 +207,9 @@
"This is Text-Fabric 11.0.7\n",
"Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n",
"\n",
- "26 features found and 0 ignored\n",
- " 0.08s Dataset without structure sections in otext:no structure functions in the T-API\n",
- " 0.31s All features loaded/computed - for details use TF.isLoaded()\n",
+ "28 features found and 0 ignored\n",
+ " 0.09s Dataset without structure sections in otext:no structure functions in the T-API\n",
+ " 0.35s All features loaded/computed - for details use TF.isLoaded()\n",
" 0.01s All additional features loaded - for details use TF.isLoaded()\n"
]
},
@@ -219,28 +231,21 @@
"\n",
" volume | \n",
" 8 | \n",
- " 85287.50 | \n",
+ " 85241.88 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" letter | \n",
" 725 | \n",
- " 941.10 | \n",
+ " 940.60 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" page | \n",
" 2884 | \n",
- " 236.58 | \n",
- " 100 | \n",
- "
\n",
- "\n",
- "\n",
- " p | \n",
- " 8438 | \n",
- " 80.86 | \n",
+ " 236.45 | \n",
" 100 | \n",
"
\n",
"\n",
@@ -252,10 +257,17 @@
"\n",
"\n",
"\n",
- " head | \n",
- " 725 | \n",
- " 23.37 | \n",
- " 2 | \n",
+ " opener | \n",
+ " 545 | \n",
+ " 1.97 | \n",
+ " 0 | \n",
+ "
\n",
+ "\n",
+ "\n",
+ " closer | \n",
+ " 541 | \n",
+ " 13.10 | \n",
+ " 1 | \n",
"
\n",
"\n",
"\n",
@@ -266,10 +278,24 @@
"
\n",
"\n",
"\n",
- " closer | \n",
- " 541 | \n",
- " 13.10 | \n",
- " 1 | \n",
+ " head | \n",
+ " 725 | \n",
+ " 23.37 | \n",
+ " 2 | \n",
+ "
\n",
+ "\n",
+ "\n",
+ " p | \n",
+ " 8438 | \n",
+ " 80.82 | \n",
+ " 100 | \n",
+ "
\n",
+ "\n",
+ "\n",
+ " sentence | \n",
+ " 14332 | \n",
+ " 45.74 | \n",
+ " 96 | \n",
"
\n",
"\n",
"\n",
@@ -280,16 +306,9 @@
"
\n",
"\n",
"\n",
- " opener | \n",
- " 545 | \n",
- " 1.97 | \n",
- " 0 | \n",
- "
\n",
- "\n",
- "\n",
" formula | \n",
" 6200 | \n",
- " 1.27 | \n",
+ " 1.21 | \n",
" 1 | \n",
"
\n",
"\n",
@@ -302,19 +321,85 @@
"\n",
"\n",
" word | \n",
- " 682300 | \n",
+ " 681935 | \n",
" 1.00 | \n",
" 100 | \n",
"
\n",
"\n",
" Sets: no custom sets
\n",
" Features:
\n",
+ "Similar Sentences
\n",
+ " \n",
+ "\n",
+ "
\n",
+ "
\n",
+ "sim\n",
+ "
\n",
+ "
int
\n",
+ "\n",
+ "
\n",
+ " similarity between sentences based on the Levenshtein ratio
\n",
+ " \n",
+ " \n",
+ "\n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "\n",
"Descartes = Descartes, all letters
\n",
" \n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
@@ -339,7 +424,7 @@
"\n",
"
\n",
"\n",
"