diff --git a/grain_size_tools/example_notebooks/start.ipynb b/grain_size_tools/example_notebooks/start.ipynb index 98c578a..e31da33 100644 --- a/grain_size_tools/example_notebooks/start.ipynb +++ b/grain_size_tools/example_notebooks/start.ipynb @@ -8,16 +8,40 @@ "\n", "> IMPORTANT NOTE: This Jupyter notebook example only applies to GrainSizeTools v3.0+. Please check your script version before using this notebook. You will be able to reproduce all the results shown in this tutorial using the dataset provided with the script, the ```file data_set.txt```\n", "\n", - "## Running the script\n", + "## Running the script in Jupyter lab/notebooks\n", "\n", "The first step is to execute the code to get all the functionalities. Jupyter lab (or Jupyter notebooks) allows you to run any code using the following code snippet: ``%run + the Python file to run``. In this case you must set the full filepath that indicates where the file ``GrainSizeTools_script.py`` is located in your system. If the script was executed correctly you will see that all GrainSizeTools (GST) modules have been loaded correctly and a welcome message as follows:" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "module plot imported\n", + "module averages imported\n", + "module stereology imported\n", + "module piezometers imported\n", + "module template imported\n", + "\n", + "======================================================================================\n", + "Welcome to GrainSizeTools script\n", + "======================================================================================\n", + "A free open-source cross-platform script to visualize and characterize grain size\n", + "population and estimate differential stress via paleopizometers.\n", + "\n", + "Version: v3.0RC0 (2020-04-23)\n", + "Documentation: https://marcoalopez.github.io/GrainSizeTools/\n", + 
"\n", + "Type get.functions_list() to get a list of the main methods\n", + "\n" + ] + } + ], "source": [ "%run C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/GrainSizeTools_script.py" ] @@ -26,12 +50,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As indicated in the welcome message, we can get a list of the main methods at any time by typing in the console:" + "---\n", + "\n", + "## Get information on the GrainSizeTools methods\n", + "\n", + "First, to get a list of the main methods, type:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -69,6 +97,78 @@ "get.functions_list()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The script is implemented around several modules. To access a method within a module you will have to type the name of the module and then, separated by a dot, the name of the method. For example, to access the method ``qq_plot`` of the plot module you should write:\n", + "\n", + "```python\n", + "plot.qq_plot()\n", + "```\n", + "and then provide the required parameters within the parentheses.\n", + "\n", + "To list the methods within a module, type the module name followed by a dot and press the Tab key; a complete list of methods will pop up.\n", + "\n", + "### Get detailed information on methods\n", + "\n", + "You can get detailed information about any method or function of the script in different ways. The first is through the console using the character ? 
before the method" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\u001b[1;31mSignature:\u001b[0m \u001b[0mconf_interval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mconfidence\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m0.95\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;31mDocstring:\u001b[0m\n", + "Estimate the confidence interval using the t-distribution with n-1\n", + "degrees of freedom t(n-1). This is the way to go when sample size is\n", + "small (n < 30) and the standard deviation cannot be estimated accurately.\n", + "For large datasets, the t-distribution approaches the normal distribution.\n", + "\n", + "Parameters\n", + "----------\n", + "data : array-like\n", + " the dataset\n", + "\n", + "confidence : float between 0 and 1, optional\n", + " the confidence interval, default = 0.95\n", + "\n", + "Assumptions\n", + "-----------\n", + "the data follows a normal or symmetric distrubution (when sample size\n", + "is large)\n", + "\n", + "call_function(s)\n", + "----------------\n", + "Scipy's t.interval\n", + "\n", + "Returns\n", + "-------\n", + "the arithmetic mean, the error, and the limits of the confidence interval\n", + "\u001b[1;31mFile:\u001b[0m c:\\users\\marco\\documents\\github\\grainsizetools\\grain_size_tools\\grainsizetools_script.py\n", + "\u001b[1;31mType:\u001b[0m function\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "?conf_interval" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another option in Jupyter's lab is to get the information interactively without having to call the help from the console. To do this, right-click on the mouse and select \"Show Context Help\" from the menu. 
Now, every time you write a method in the interactive console, all the information will automatically appear in the \"Contextual help\" window. In this case, you may prefer to rearrange the windows using drag and drop so that you can see the notebook and the contextual help in parallel." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -95,7 +195,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -322,7 +422,7 @@ "[2661 rows x 11 columns]" ] }, - "execution_count": 10, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -352,7 +452,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -361,7 +461,7 @@ "pandas.core.frame.DataFrame" ] }, - "execution_count": 11, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -398,7 +498,7 @@ "dataset = pd.read_csv(get_filepath(), sep='\\t')\n", "```\n", "\n", - "Lastly, Pandas also allows to directly import tabular data from the clipboard (i.e. data copied using copy-paste commands). For example, after copying the table from a text/excel file or a website using: \n", + "Lastly, Pandas also lets you import tabular data directly from the clipboard (i.e. data copied using copy-paste commands). For example, after copying the table from a text file, Excel spreadsheet or website, use:\n", "\n", "```python\n", "dataset = pd.read_clipboard()\n", @@ -412,7 +512,491 @@ "metadata": {}, "source": [ "## Basic tabular data (Pandas) manipulation\n", - "\n" + "\n", + "Let's first see what the data set looks like. To do this, you can call the variable (as in the previous example) or use the ``head()`` and ``tail()`` methods to show only the first (or last) rows of the data set" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Area Circ. Feret FeretX FeretY FeretAngle MinFeret AR \\\n", + "0 1 157.25 0.680 18.062 1535.0 0.5 131.634 13.500 1.101 \n", + "1 2 2059.75 0.771 62.097 753.5 16.5 165.069 46.697 1.314 \n", + "2 3 1961.50 0.842 57.871 727.0 65.0 71.878 46.923 1.139 \n", + "3 4 5428.50 0.709 114.657 1494.5 83.5 19.620 63.449 1.896 \n", + "4 5 374.00 0.699 29.262 2328.0 34.0 33.147 16.000 1.515 \n", + "\n", + " Round Solidity \n", + "0 0.908 0.937 \n", + "1 0.761 0.972 \n", + "2 0.878 0.972 \n", + "3 0.528 0.947 \n", + "4 0.660 0.970 " + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dataset.head() # returns 5 rows by default; you can pass any number within the parentheses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our dataset (aka the DataFrame) has 10 named columns plus an unnamed one. To access a column, put its name in quotes inside square brackets, as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 314.5\n", + "1 4119.5\n", + "2 3923.0\n", + "3 10857.0\n", + "4 748.0\n", + " ... \n", + "2656 905.0\n", + "2657 2162.5\n", + "2658 1027.0\n", + "2659 555.5\n", + "2660 1450.0\n", + "Name: Area, Length: 2661, dtype: float64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get the column 'Area' and multiply it by two\n", + "dataset['Area'] * 2" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Area Circ. Feret FeretX FeretY FeretAngle MinFeret AR \\\n", + "0 157.25 0.680 18.062 1535.0 0.5 131.634 13.500 1.101 \n", + "1 2059.75 0.771 62.097 753.5 16.5 165.069 46.697 1.314 \n", + "2 1961.50 0.842 57.871 727.0 65.0 71.878 46.923 1.139 \n", + "3 5428.50 0.709 114.657 1494.5 83.5 19.620 63.449 1.896 \n", + "4 374.00 0.699 29.262 2328.0 34.0 33.147 16.000 1.515 \n", + "... ... ... ... ... ... ... ... ... \n", + "2656 452.50 0.789 28.504 1368.0 1565.5 127.875 22.500 1.235 \n", + "2657 1081.25 0.756 47.909 1349.5 1569.5 108.246 31.363 1.446 \n", + "2658 513.50 0.720 32.962 1373.0 1586.0 112.286 20.496 1.493 \n", + "2659 277.75 0.627 29.436 1316.0 1601.5 159.102 17.002 1.727 \n", + "2660 725.00 0.748 39.437 1335.5 1615.5 129.341 28.025 1.351 \n", + "\n", + " Round Solidity \n", + "0 0.908 0.937 \n", + "1 0.761 0.972 \n", + "2 0.878 0.972 \n", + "3 0.528 0.947 \n", + "4 0.660 0.970 \n", + "... ... ... \n", + "2656 0.810 0.960 \n", + "2657 0.692 0.960 \n", + "2658 0.670 0.953 \n", + "2659 0.579 0.920 \n", + "2660 0.740 0.960 \n", + "\n", + "[2661 rows x 10 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Remove the column without a name from the DataFrame\n", + "dataset.drop(' ', axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some things to try (just copy-paste in interactive cells below and explore):\n", + "\n", + "```python\n", + "dataset.mean() # estimate the mean for all columns\n", + "dataset['Area'].mean() # estimate the mean only for the column Area\n", + "dataset.std() # estimate the (Bessel corrected) standard deviation\n", + "dataset.dropna() # remove missing values from the data\n", + "dataset.describe() # generate descriptive statistics\n", + "dataset.info() # display info of the DataFrame\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + " 1331.000000\n", + "Area 1213.823750\n", + "Circ. 0.730233\n", + "Feret 44.808749\n", + "FeretX 1533.019542\n", + "FeretY 764.684517\n", + "FeretAngle 90.622313\n", + "MinFeret 31.016330\n", + "AR 1.451459\n", + "Round 0.719253\n", + "Solidity 0.943795\n", + "dtype: float64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# estimate the mean of all columns\n", + "np.mean(dataset) # alternatively you can use dataset.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "dataset[['Circ.', 'Round', 'Solidity']].boxplot()" ] }, {