Commit

Update documentation
dolsysmith committed Jan 29, 2024
1 parent df96596 commit ac3528f
Showing 11 changed files with 91 additions and 58 deletions.
3 changes: 0 additions & 3 deletions Untitled

This file was deleted.

Binary file added _images/bigram-table-2.png
Binary file added _images/bigram-table.png
Binary file added _images/play-button.png
Binary file added _images/rabelais-1.png
Binary file added _images/rabelais-2.png
Binary file added _images/save-to-drive.png
Binary file added _images/variable.png
87 changes: 58 additions & 29 deletions _sources/reading_writing_machines.ipynb
@@ -49,12 +49,21 @@
"\n",
"To use this document in Colab, follow the steps below (or watch this short video):\n",
"\n",
-"1. Click the rocket icon in the upper-right corner of this page.\n",
+"1. Click the rocket icon {fas}`fa-rocket` in the upper-right corner of this page.\n",
"2. Select the `Google Colab` option.\n",
"3. A new tab should open in your browser, displaying the content of this page but in the Colab interface. \n",
-"4. Click the option `Save a copy of Drive` to create your own copy of the document (so that you can preserve any changes you make as well as the output from your interactions).\n",
-"5. The Colab version of the document is backed by a running instance of the Python kernel, meaning that you can run the code and see the results in the Colab interface. To run any of the code sections, click the play icon to the left of that section, as in the image below.\n",
-"6. The logic of a programming language like Python is strictly linear. Running code sections out of order, or skipping a code section, will often result in errors, which will appear below the code section that triggered the error, in lieu of the expected output. If you start to encounter unexpected errors, go to the Colab menu at the top of the browser window and select `Kernel - Restart Kernel`. Then run each of the code sections in order, starting from the top of the document. \n",
+"4. Within that tab, click the button `Copy to Drive` from the menu at the top to create your own copy of the document (so that you can preserve any changes you make as well as the output from your interactions). \n",
+"```{image} ./save-to-drive.png\n",
+":alt: Button with the text \"Copy to Drive\" and the Google Drive symbol\n",
+":width: 150px\n",
+"```\n",
+"5. The Colab version of the document is backed by a running instance of the Python kernel (the program that runs Python programs), meaning that you can run the code and see the results in the Colab interface. To run any of the code sections, click the play icon to the left of that section, as in the image below.\n",
+"```{image} ./play-button.png\n",
+":alt: Python code with play button icon to the left; red arrow pointing to the play button.\n",
+":width: 250px\n",
+"```\n",
+"\n",
+"6. The logic of a programming language like Python is strictly linear. Running code sections out of order, or skipping a code section, will often result in errors, which will appear below the code section that triggered the error, in lieu of the expected output. If you start to encounter unexpected errors, go to the Colab menu at the top of the browser window and select `Runtime - Restart Session`. Then run each of the code sections in order, starting from the top of the document. \n",
"\n",
"Note that the `Reading the code` sections, as well as footnotes and references, have been omitted from the Colab version of this document in order to make the latter easier to use. So you may want to toggle back and forth between the two versions. (I apologize for that inconvenience -- it's a technical limitation I have not had time to address.)\n"
]
@@ -101,23 +110,26 @@
"id": "c43f26d2-9ed3-49fb-a2c6-9591eb1738da",
"metadata": {},
"source": [
-"In the encoding specified by the Python language, the equals sign (`=`) is an instruction that loosely translates to: \"Store this value (on the right side) somewhere in memory, and give that location in memory the provided label (on the left side).\" The following image presents one way of imagining what happens in response to this code (with the caveat that, ultimately, the letters and numbers are represented by their binary encoding). "
+"In the encoding specified by the Python language, the equals sign (`=`) is an instruction that loosely translates to: \"Store this value (on the right side) somewhere in memory, and give that location in memory the provided name (on the left side).\" The following image presents one way of imagining what happens in response to this code (with the caveat that, ultimately, the letters and numbers are represented by their binary encoding). "
]
},
{
"cell_type": "markdown",
"id": "063a8ee0-c7cb-4ce2-b74d-d447fb9b0865",
"metadata": {},
"source": [
-"[image here]"
+"```{image} variable.png\n",
+":alt: Shows the words \"answer to everything\" in one box, with an arrow pointing to a box in the middle, from which another arrow points to the number 42 in the third box. The first box is labeled \"name,\" the second \"variable,\" and the third, \"value.\" \n",
+":width: 500px\n",
+"```"
]
},
{
"cell_type": "markdown",
"id": "fb21619b-0b45-4159-b520-63c6f4f08952",
"metadata": {},
"source": [
-"By running the previous line of code, we have created a _variable_ called `answer_to_everything`. We can use the variable to retrieve its value (for use in other parts of our program). Run the code below to see some output."
+"By running the previous line of code, we have created a _variable_, which maps the name `answer_to_everything` to the value `42`. We can use the variable to retrieve its value (for use in other parts of our program). Run the code below to see some output."
]
},
{
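The cells in this hunk describe assignment as binding a name to a value. A minimal plain-Python sketch of that picture follows. The second name, `another_name`, is a hypothetical addition for illustration, and the sketch does not try to model memory locations.

```python
# A minimal sketch of assignment as a name -> value mapping.
answer_to_everything = 42        # bind the name to the value 42
print(answer_to_everything)      # look the value up by name; prints 42

# A second (hypothetical) name can refer to the very same value.
another_name = answer_to_everything
print(another_name is answer_to_everything)  # prints True: one value, two names

# Rebinding one name points it at a new value; the other name is unaffected.
answer_to_everything = 43
print(answer_to_everything, another_name)    # prints 43 42
```

Assignment does not copy the value here. It attaches another name to the same value, which is why rebinding one name leaves the other alone.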
@@ -164,7 +176,7 @@
"id": "76593f24-7ab4-48dd-a6a6-19d2b2016e13",
"metadata": {},
"source": [
-"A misspelled variable causes Python to abort its computation. Imagine if conversation ground to a halt whenever one of the parties mispronounced a word or used a malapropism!\n",
+"A misspelled variable name causes Python to abort its computation. Imagine if conversation ground to a halt whenever one of the parties mispronounced a word or used a malapropism!\n",
"\n",
"I tend to say that Python is extremely literal. But of course, this is merely an analogy, and a loose one. There is no room for metaphor in programming languages, at least, not as far as the computation itself is concerned. The operation of a language like Python is determined by the algorithms used to implement it. Given the same input and the same conditions of operation, a given Python program should produce the same output every time. (If it does not, that's usually considered a bug.)"
]
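The cell above notes that a misspelled variable name makes Python abort its computation. The error Python raises in that case is a `NameError`. As a small sketch (the misspelling `answer_to_everthing` is a hypothetical example), the error can be caught so that you can inspect it without stopping the rest of a script:

```python
answer_to_everything = 42

try:
    # Typo: "everthing" is missing a "y", so this name was never assigned.
    print(answer_to_everthing)
except NameError as err:
    # Python abandons the statement as soon as it meets a name it does not know.
    message = str(err)
    print(message)  # the message names the unknown variable
```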
@@ -637,23 +649,33 @@
{
"cell_type": "markdown",
"id": "8b2c4e45-e87f-4c24-8f71-739f4b007180",
-"metadata": {},
+"metadata": {
+"jp-MarkdownHeadingCollapsed": true
+},
"source": [
"#### More tedious counting\n",
"\n",
"One way to construct such an analysis is as follows: represent your sample of text as a continuous string of characters. (As we've seen, that's easy to do in Python.) Then \"glue\" it to another string, representing the same text, but with every character shifted to the left by one position. For example, the first several characters of the first sentence from _Gargantua and Pantagruel_ would look like this:\n",
"\n",
-"[image]\n",
+"```{image} ./rabelais-1.png\n",
+":alt: The text \"Most noble and illust\" is shown twice, on two consecutive lines, with each letter surrounded by a box. The second line is shifted to the left one character, so that the \"M\" of the first line appears above the \"o\" of the second line, etc.\n",
+":width: 500px\n",
"\n",
-"With the exception of the dangling left-most and right-most characters, you now have a pair of strings that yield, for each position, a pair of characters.\n",
+"```\n",
+"With the exception of the dangling left-most and right-most characters, you now have a pair of strings that yield, for each position, a pair of characters. In the image below, the first few successive pairs are shown, along with the position of each pair of characters with respect to the \"glued\" strings.\n",
"\n",
-"[image with highlighting]\n",
+"```{image} ./rabelais-2.png\n",
+":alt: The text \"Most noble and illust\" is shown as above, with the addition of alternating yellow and blue highlighting to identify pairs of letters, and numbers along the bottom, starting at 0. \n",
+":width: 500px\n",
"\n",
+"```\n",
"These pairs are called bigrams. But in order to construct a Markov chain, we're not just counting bigrams. Rather, we want to create what's called a _transition table_: a table where we can look up a given character -- the letter `e`, say -- and then for any other character that can follow `e`, find the frequency with which it occurs in that position (i.e., following an `e`). If a given character never follows another character, its bigram doesn't exist in the table. \n",
"\n",
-"Below are shown the most common bigrams in such a transition table created on the basis of _Gargantua and Pantagruel_.\n",
-"\n",
-"[image]"
+"Below are shown some of the most common bigrams in such a transition table created on the basis of _Gargantua and Pantagruel_.\n",
+"```{image} ./bigram-table.png\n",
+":alt: A table with the letters \"h,\" a space, \"o,\" \"e,\" and \"i\" along the top (column headers), and \"t,\" space, \"c,\" \"w,\" \"s,\" and \"g\" along the left-hand side (row labels), and numbers in the cells of the table. \n",
+":width: 300px\n",
+"```"
]
},
{
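The shift-and-glue procedure and the transition table described in this hunk can be sketched with the standard library. The sketch assumes the sample string from the `rabelais-1.png` alt text. The names `bigrams` and `transitions` are illustrative only. They are not the notebook's `create_ngrams` or `create_transition_table`, whose definitions do not appear in this diff.

```python
from collections import Counter, defaultdict

text = "Most noble and illust"

# "Glue" the text to a copy of itself shifted left by one character.
# zip() pairs character i with character i + 1 and stops at the end of
# the shorter string, so the dangling last character drops out.
bigrams = list(zip(text, text[1:]))
print(bigrams[:4])  # [('M', 'o'), ('o', 's'), ('s', 't'), ('t', ' ')]

# Transition table: for each character, count the characters that follow it.
# A character that never follows a given character simply has no entry.
transitions = defaultdict(Counter)
for current, following in bigrams:
    transitions[current][following] += 1

print(dict(transitions["s"]))  # {'t': 2}
print(dict(transitions["l"]))  # {'e': 1, 'l': 1, 'u': 1}
```

This table holds raw counts. Dividing each count by its row total would turn the counts into the frequencies the text describes.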
@@ -975,7 +997,7 @@
},
{
"cell_type": "code",
-"execution_count": 55,
+"execution_count": null,
"id": "f257d727-09ad-48c6-8939-de57e8e566a6",
"metadata": {},
"outputs": [],
@@ -1051,11 +1073,17 @@
"\n",
"So instead of a table like this:\n",
"\n",
-"[image]\n",
+"```{image} ./bigram-table.png\n",
+":alt: A table with the letters \"h,\" a space, \"o,\" \"e,\" and \"i\" along the top (column headers), and \"t,\" space, \"c,\" \"w,\" \"s,\" and \"g\" along the left-hand side (row labels), and numbers in the cells of the table. \n",
+":width: 300px\n",
+"```\n",
"\n",
-"we have this:\n",
+"we have this (where the `h`, `b`, and `w` in the row labels are all preceded by the space character):\n",
"\n",
-"[image] \n",
+"```{image} ./bigram-table-2.png\n",
+":alt: A table with the letters \"e,\" \"a,\" space, \"i,\" \"o\" along the top (column headers), and \"th,\" space \"h\", space \"b\", \"er,\" and space \"w\" along the left-hand side (row labels), and numbers in the cells of the table. \n",
+":width: 300px\n",
+"```\n",
"\n",
"Note, however, that throughout these experiments, the level of approximation to any particular understanding of \"the English lexicon\" depends on the nature of the data from which we derive our frequencies. Urquhart's translation of Rabelais, dating from the 17th Century, has a rather distinctive vocabulary, as you might expect, even with the modernized spelling and grammar of the Project Gutenberg edition. \n",
"\n",
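The revised text describes a table whose row labels are two-character contexts, such as `th` or a space followed by `h`, rather than single characters. A hedged sketch of that generalization follows, along with Markov-chain sampling from the table. The names `ngram_table` and `generate` and the sample sentence are hypothetical stand-ins. They are not the notebook's `create_transition_table` or `create_sample`.

```python
import random
from collections import Counter, defaultdict


def ngram_table(text, n):
    """Hypothetical helper: map each (n - 1)-character context to a Counter
    of the characters that follow it in `text`."""
    table = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context = text[i:i + n - 1]
        following = text[i + n - 1]
        table[context][following] += 1
    return table


def generate(table, seed, length, rng=random):
    """Hypothetical helper: extend `seed` one character at a time, choosing
    each next character in proportion to how often it followed the current
    (n - 1)-character context in the source text."""
    k = len(next(iter(table)))  # context length, i.e. n - 1
    out = seed
    for _ in range(length):
        counts = table.get(out[-k:])
        if not counts:  # context never seen in the source: stop early
            break
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


sample = "the theme of the thesis"
table = ngram_table(sample, 3)   # contexts are two characters long
print(dict(table["th"]))         # {'e': 4}
print(dict(table[" t"]))         # {'h': 3}
print(generate(table, "th", 10, random.Random(0)))
```

With `n = 3`, each row label is two characters long, like the rows in `bigram-table-2.png`. Larger values of `n` give longer contexts, and the output then tends to reproduce longer stretches of the source text.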
@@ -1064,7 +1092,7 @@
},
{
"cell_type": "code",
-"execution_count": 56,
+"execution_count": null,
"id": "410bc842-9823-4d44-8627-95c56fb40b08",
"metadata": {},
"outputs": [],
@@ -1079,20 +1107,20 @@
" max=max_value,\n",
" description='Set value of n:')\n",
" \n",
-"def create_update_function(ttable, text, transition_function, slider):\n",
+"def create_update_function(text, transition_function, slider):\n",
" '''\n",
" returns a callback function for use in updating the provided transition table with ngrams from text, given slider.value, as well as an output widget\n",
" for displaying the output of the callback\n",
" '''\n",
" output = widgets.Output()\n",
" def on_update(change):\n",
" with output:\n",
-" nonlocal ttable\n",
+" global ttable\n",
" ttable = transition_function(create_ngrams(text, slider.value))\n",
" print(f'Updated! Value of n is now {slider.value}.')\n",
" return on_update, output\n",
"\n",
-"def create_generate_function(ttable, sample_function, slider):\n",
+"def create_generate_function(sample_function, slider):\n",
" '''\n",
" returns a callback function for use in generating new random samples from the provided transition table.\n",
" '''\n",
@@ -1118,10 +1146,11 @@
"metadata": {},
"outputs": [],
"source": [
+"ttable = g_ttable\n",
"ngram_slider = create_slider()\n",
-"update_callback, update_output = create_update_function(g_ttable, g_text_norm, create_transition_table, ngram_slider)\n",
+"update_callback, update_output = create_update_function(g_text_norm, create_transition_table, ngram_slider)\n",
"update_button = create_button(\"Update table\", update_callback)\n",
-"generate_callback, generate_output = create_generate_function(g_ttable, create_sample, ngram_slider)\n",
+"generate_callback, generate_output = create_generate_function(create_sample, ngram_slider)\n",
"generate_button = create_button(\"New sample\", generate_callback)\n",
"display(ngram_slider, update_button, update_output, generate_button, generate_output)\n"
]
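The two hunks above replace `nonlocal ttable` with `global ttable`. They also make both callbacks rely on a module-level `ttable`, which the new `ttable = g_ttable` line initializes.

On one reading of the diff, the old `nonlocal` version rebound only the `ttable` parameter of `create_update_function`. A generator built by a separate `create_generate_function(g_ttable, ...)` call therefore kept using the table it was originally given.

The sketch below reproduces that difference without `ipywidgets`. The factory names are hypothetical. The sketch also assumes that `create_generate_function`, whose body is not shown here, reads the module-level `ttable`.

```python
# Old pattern (sketch): each factory closes over its own copy of the name.
def old_update_factory(table, new_table):
    def on_update():
        nonlocal table          # rebinds this factory's parameter only
        table = new_table
    return on_update

def old_generate_factory(table):
    def on_generate():
        return table            # still the table passed in at creation time
    return on_generate

original = {"e": {"a": 1}}
old_update = old_update_factory(original, {"th": {"e": 4}})
old_generate = old_generate_factory(original)
old_update()
print(old_generate())  # {'e': {'a': 1}}: the update is never seen

# New pattern (sketch): both callbacks use one module-level name.
ttable = {"e": {"a": 1}}

def update_factory(new_table):
    def on_update():
        global ttable           # rebind the module-level name
        ttable = new_table
    return on_update

def generate_factory():
    def on_generate():
        return ttable           # looked up when called, so it sees updates
    return on_generate

update = update_factory({"th": {"e": 4}})
generate = generate_factory()
update()
print(generate())  # {'th': {'e': 4}}
```

A `global` is the simplest fix inside a notebook. Another approach is to have both factories share a single mutable object, such as a dict holding the current table.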
@@ -1194,7 +1223,7 @@
},
{
"cell_type": "code",
-"execution_count": 57,
+"execution_count": null,
"id": "827fa854-204d-47e2-ae88-06f0b107c359",
"metadata": {},
"outputs": [],
@@ -1238,11 +1267,11 @@
"metadata": {},
"outputs": [],
"source": [
-"g_ttable_w = create_ttable_words(create_ngrams(g_text_words))\n",
+"ttable = create_ttable_words(create_ngrams(g_text_words))\n",
"ngram_slider_w = create_slider()\n",
-"update_callback_w, update_output_w = create_update_function(g_ttable_w, g_text_words, create_ttable_words, ngrams_slider_w)\n",
+"update_callback_w, update_output_w = create_update_function(g_text_words, create_ttable_words, ngram_slider_w)\n",
"update_button_w = create_button(\"Update table\", update_callback_w)\n",
-"generate_callback_w, generate_output_w = create_generate_function(g_ttable_w, create_sample_words)\n",
+"generate_callback_w, generate_output_w = create_generate_function(create_sample_words, ngram_slider_w)\n",
"generate_button_w = create_button(\"New sample\", generate_callback_w)\n",
"display(ngram_slider_w, update_button_w, update_output_w, generate_button_w, generate_output_w)\n"
]