diff --git a/DOCS/_Paleopizometry.md b/DOCS/_Paleopizometry.md
index b745297..ab1caaa 100644
--- a/DOCS/_Paleopizometry.md
+++ b/DOCS/_Paleopizometry.md
@@ -14,16 +14,18 @@ For the first requirement, the GrainSizeTools script includes common mineral pha
For the second requirement, the function will automatically convert the equivalent circular diameter to linear intercepts where applicable using the de Hoff and Rhines (1968) correction. That is, **you don't have to worry about whether the piezometer was originally calibrated using linear intercepts**; always enter the equivalent circular diameters in microns.

-The third requirement is key for a correct estimation of the differential stress since each paleopiezometer was calibrated for a specific average grain size (e.g. the arithmetic mean, median, or RMS mean) and, hence, **only provides valid results if the same type of average is used**. Also, **you should not use any type of stereological correction for the estimation of the average grain size**, if the author(s) of the piezometer used any type of stereological correction, the average grain size will be automatically corrected by the function.
+The third requirement is key for a correct estimation of the differential stress since each paleopiezometer was calibrated for a specific average grain size (e.g. the arithmetic mean, median or RMS mean) and, hence, **only provides valid results if the same type of average is used**. Also, **you should not apply any stereological correction when estimating the average grain size**; if the author(s) of the piezometer used a stereological correction, the average grain size will be corrected automatically by the function.

The fourth requirement means that the user has to decide whether or not to correct the differential stress estimate for plane stress using the correction factor proposed by Paterson and Olgaard (2000). The rationale for this is that the experiments designed to calibrate piezometers are mainly performed in uniaxial compression, while natural shear zones approximately behave as plane stress volumes.

-Below are examples of how to obtain information about the different piezometers and define these parameters.
+In the next subsection, we show examples of how to obtain information about the different piezometers and how to define these parameters.

## Get information on piezometric relations

-Table 1 provides a list of the all piezometric relations currently available in the GrainSizeTools script with features (the type of average to use and the DRX mechanism) and references. The experimentally-derived parameters are provided in Tables 2 to 5. Besides, you can get information from the console on the different available piezometric relations just by typing ``piezometers.*()``, where * is the mineral phase, either ``quartz``, ``calcite``, ``olivine``, or ``feldspar``. For example:
+Table 1 lists all the piezometric relations currently available in the GrainSizeTools script together with their main features (the type of average to use and the DRX mechanism) and references. The experimentally derived parameters are provided in Tables 2 to 5.
+
+In addition, you can get information on the available piezometric relations interactively from the console just by typing ``piezometers.*()``, where * is the mineral phase, either ``quartz``, ``calcite``, ``olivine``, or ``feldspar``.
For example: ```python piezometers.quartz() @@ -41,7 +43,7 @@ Available piezometers: 'Twiss' ``` -Also, if you want to obtain the complete information of a specific piezometer you can do it in the following way: +If you want to get the details of a particular piezometric relationship you can do so as follows. Remember that the relationship between recrystallized grain size and differential stress is ***σ~d~ = Bg^-m^*** where σ~d~ and g are the differential stress and the average grain size respectively. ```python piezometers.quartz('Twiss') @@ -50,12 +52,19 @@ piezometers.quartz('Twiss') ``` (550, 0.68, - 'Ensure that you entered the apparent grain size as the arithmeic mean grain size', + 'Ensure that you entered the apparent grain size as the arithmetic mean grain size', True, 1.5) ``` -where...TODO +Note the five different outputs separated by commas which correspond with: +- the constant *B* of the piezometric relation +- the exponent *m* of the piezometric relation +- A warning indicating the average to use with this piezometric relation +- An indication of whether the piezometric relation was calibrated using linear intercepts (if ``False`` the piezometric relation was calibrated using equivalent circular diameters. +- The stereological correction factor used (if applicable). If ``False``, no stereological correction applies. + + **Table 1.** Relation of piezometers (in alphabetical order) and the apparent grain size required to obtain meaningful differential stress estimates @@ -134,12 +143,79 @@ where...TODO -## Using the ``calc_diffstress()`` function +## Estimate differential stress using the ``calc_diffstress()`` function + +Let us first look at the documentation of the: + +```python +?calc_diffstress + +Signature: calc_diffstress(grain_size, phase, piezometer, correction=False) +Docstring: +Apply different piezometric relations to estimate the differential +stress from average apparent grain sizes. The piezometric relation has +the following general form: + +df = B * grain_size**-m + +where df is the differential stress in [MPa], B is an experimentally +derived parameter in [MPa micron**m], grain_size is the aparent grain +size in [microns], and m is an experimentally derived exponent. + +Parameters +---------- +grain_size : positive scalar or array-like + the apparent grain size in microns + +phase : string {'quartz', 'olivine', 'calcite', or 'feldspar'} + the mineral phase + +piezometer : string + the piezometric relation + +correction : bool, default False + correct the stress values for plane stress (Paterson and Olgaard, 2000) + + References +----------- +Paterson and Olgaard (2000) https://doi.org/10.1016/S0191-8141(00)00042-0 +de Hoff and Rhines (1968) Quantitative Microscopy. Mcgraw-Hill. New York. + +Call functions +-------------- +piezometers.quartz +piezometers.olivine +piezometers.calcite +piezometers.albite + +Assumptions +----------- +- Independence of temperature (excepting Shimizu piezometer), total strain, +flow stress, and water content. +- Recrystallized grains are equidimensional or close to equidimensional when +using a single section. +- The piezometer relations requires entering the grain size as "average" +apparent grain size values calculated using equivalent circular diameters +(ECD) with no stereological correction. See documentation for more details. 
+- When required, the grain size value will be converted from ECD to linear +intercept (LI) using a correction factor based on de Hoff and Rhines (1968): +LI = (correction factor / sqrt(4/pi)) * ECD +- Stress estimates can be corrected from uniaxial compression (experiments) +to plane strain (nature) multiplying the paleopiezometer by 2/sqrt(3) +(Paterson and Olgaard, 2000) + +Returns +------- +The differential stress in MPa (a float) +File: c:\users\marco\documents\github\grainsizetools\grain_size_tools\grainsizetools_script.py +Type: function + +``` -The ``calc_diffstress`` requires three (obligatory) inputs: (1) the average grain size **in microns**, (2) the mineral phase, and (3) the piezometric relation to use. We provide few examples below: +As indicated in the documentation, the ``calc_diffstress()`` requires three (obligatory) inputs: (1) the average grain size in microns, (2) the mineral phase, and (3) the piezometric relation to use. We provide a few examples below: ```python -calc_diffstress(12.0, phase='quartz', piezometer='Twiss') +calc_diffstress(12, phase='quartz', piezometer='Twiss') ``` ``` @@ -147,18 +223,16 @@ calc_diffstress(12.0, phase='quartz', piezometer='Twiss') differential stress = 83.65 MPa INFO: -Ensure that you entered the apparent grain size as the arithmeic mean grain size +Ensure that you entered the apparent grain size as the arithmetic mean grain size ECD was converted to linear intercepts using de Hoff and Rhines (1968) correction -============================================================================ +=========================================================================== ``` -TODO - - +The function returns the differential stress (in MPa) plus some relevant information about the corrections made and the type of average expected. Most piezometric calibrations were calibrated using uniaxial compression deformation experiments while in nature most shear zones approximately behaves as plane stress. Due to this, it may be necessary to correct the differential stress value. The ``calc_diffstress()`` allows you to apply the correction proposed by Paterson and Olgaard (2000) for this as follows (note the slightly different value of differential stress): ```python # Apply the same piezometric relation but correct the estimate for plane stress -calc_diffstress(12.0, phase='quartz', piezometer='Twiss', correction=True) +calc_diffstress(12, phase='quartz', piezometer='Twiss', correction=True) ``` ``` @@ -166,19 +240,57 @@ calc_diffstress(12.0, phase='quartz', piezometer='Twiss', correction=True) differential stress = 96.59 MPa INFO: -Ensure that you entered the apparent grain size as the arithmeic mean grain size +Ensure that you entered the apparent grain size as the arithmetic mean grain size ECD was converted to linear intercepts using de Hoff and Rhines (1968) correction ============================================================================ ``` Note that the stress estimate is a bit different compare to the value without the correction. +Some paleopiezometers require uncommon averages such as the root mean square or RMS, for example: + +```python +piezometers.quartz('Stipp_Tullis') + +(669.0, + 0.79, + 'Ensure that you entered the apparent grain size as the root mean square (RMS)', + False, + False) +``` + +In this case you should estimate the RMS as +$RMS = \sqrt{\dfrac{1}{n} (x_{1}^2 + x_{2}^2 + ... 
+ x_{n}^2)}$ + +```python +# Import the example dataset +filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt' +dataset = pd.read_csv(filepath, sep='\t') +dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi) # estimate ECD + +# estimate the root mean squared +rms = np.sqrt(np.mean(dataset['diameters']**2)) # note that in Python the exponent operator is ** (as in Fortran) not ^ (as in Matlab) + +calc_diffstress(rms, phase='quartz', piezometer='Stipp_Tullis') +``` + +``` +============================================================================ +differential stress = 36.79 MPa +INFO: +Ensure that you entered the apparent grain size as the root mean square (RMS) +============================================================================ +``` -You can pass as input an array of grain size values instead of a scalar value, in this case the function will returns an array of values + + +## Estimation of the differential stress using arrays of values + +Alternatively, you can use (NumPy) arrays as input to estimate several differential stresses at once. In this case, the ``calc_diffstress()`` function will return a NumPy array, so it is generally more useful to store it in a variable as in the example below. ```python -ameans = np.array([12.23, 13.71, 12.76, 11.73, 12.69, 10.67]) +ameans = np.array([12.23, 13.71, 12.76, 11.73, 12.69, 10.67]) # a set of average grain size values estimates = calc_diffstress(ameans, phase='olivine', piezometer='VanderWal_wet') estimates ``` @@ -193,12 +305,50 @@ Differential stresses in MPa array([167.41, 153.66, 162.16, 172.73, 162.83, 185.45]) ``` -For example, the piezometer relation of Stipp and Tullis (2003) requires entering the grain size as *the root mean square (RMS) using equivalent circular diameters with no stereological correction*, and so on. Table 1 show all the implemented piezometers in GrainSizeTools v3.0+ and the apparent grain size required for each one. Despite some piezometers were originally calibrated using linear intercepts (LI), the script will always require entering a specific grain size average measured as equivalent circular diameters (ECD). The script will automatically approximate the ECD value to linear intercepts using the De Hoff and Rhines (1968) empirical relation. Also, the script takes into account if the authors originally used a specific correction factor for the grain size. For more details on the piezometers and the assumption made using the command ```help()``` in the console as follows: +If the set of estimated values belongs to the same structural element (e.g. different areas of the same mylonite or different rocks within the same shear zone), you may want to estimate the average differential stress from all the data. The GrainSizeTools script provides a method named ``conf_interval()`` for this. ```python -help(calc_diffstress) +?conf_interval + +Signature: conf_interval(data, confidence=0.95) +Docstring: +Estimate the confidence interval using the t-distribution with n-1 +degrees of freedom t(n-1). This is the way to go when sample size is +small (n < 30) and the standard deviation cannot be estimated accurately. +For large datasets, the t-distribution approaches the normal distribution. 
+
+Parameters
+----------
+data : array-like
+    the dataset
+
+confidence : float between 0 and 1, optional
+    the confidence interval, default = 0.95
+
+Assumptions
+-----------
+the data follows a normal or symmetric distribution (when sample size
+is large)
+
+call_function(s)
+----------------
+Scipy's t.interval
+
+Returns
+-------
+the arithmetic mean, the error, and the limits of the confidence interval
+File: c:\users\marco\documents\github\grainsizetools\grain_size_tools\grainsizetools_script.py
+Type: function
+```

-# alternatively in Jupyterlab:
-?calc_diffstress
+```python
+conf_interval(estimates)
+```
+
+```
+Mean = 167.37 ± 11.41
+Confidence set at 95.0 %
+Max / min = 178.79 / 155.96
+Coefficient of variation = ±6.8 %
```
diff --git a/DOCS/_Plot_module.md b/DOCS/_Plot_module.md
index 3aea4c0..ce7f1ef 100644
--- a/DOCS/_Plot_module.md
+++ b/DOCS/_Plot_module.md
@@ -1,10 +1,10 @@
# The plot module: visualizing grain size distributions

-The plot module allows several visualizations of the grain size distribution. All methods of the *plot* module can be invoked by writing ```plot.*```, where * refers to the plot to be used.
+The plot module includes a series of plots to visualize and characterize grain size populations. All methods of the *plot* module can be invoked by writing ```plot.*```, where * refers to the plot to be used.

-> 👉 If you write ``plot.`` and then press the tab key a menu will pop up with all the methods implemented in the plot module
+> 👉 If you write ``plot.`` and then hit the tab key a menu will pop up with all the methods implemented in the module

-The main method is named ```plot.distribution()``` and it allows to visualize the grain size population through the histogram and/or the kernel density estimate (KDE), as well as the location of the different averages in the distribution (Fig. 7). To use it we call this function and pass as an argument the population of grain sizes as follows:
+The main method is ```plot.distribution()```. It allows you to visualize the grain size population using the histogram and/or the kernel density estimate (KDE) and shows the location of the different averages (Fig. 1). The simplest example of use is to pass the column with the diameters as follows:

```python
plot.distribution(dataset['diameters'])
```
@@ -20,7 +20,7 @@ plot.distribution(dataset['diameters'])
![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_distribution.png?raw=true)

-*Figure 7. The ```plot.distribution()``` return with default options. This shows the histogram, the kernel density estimate (KDE) of the distribution, and the location of the different averages*
+*Figure 1. The ```plot.distribution()``` plot with default options. It shows the histogram and the kernel density estimate (KDE) of the distribution, and the location of the averages estimated by default by the ``summarize`` function*

The method returns a plot, the number of classes and bin size of the histogram, and the bandwidth (or kernel) of the KDE. The ```plot.distribution()``` method contains different options that we will comment on in turn:
@@ -111,9 +111,36 @@ Note, however, that the bandwidth affects the location of the KDE-based mode. Fo

-### The area-weighted distribution
+## Testing lognormality

-The plot module also allows plotting the area-weighted distribution of grain sizes using the function ``area_weighted()``. This function also returns some basic statistics such as the area-weighted mean and the histogram features.
For example: +Sometimes can be helpful to test whether the data follows or deviates from a lognormal distribution. For example, to find out if the dataset is suitable for applying the two-step stereological method or which confidence interval method is best. The script uses two methods to test whether the distribution of grain size follows a lognormal distribution. One is a visual method named [quantile-quantile (q-q) plots]([https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot](https://en.wikipedia.org/wiki/Q–Q_plot)) and the other is a quantitative test named the [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro–Wilk_test). For this we use the GrainSizeTools function ```test_lognorm``` as follows : + +```python +plot.qq_plot(dataset['diameters'], figsize=(6, 5)) +``` + +``` +======================================= +Shapiro-Wilk test (lognormal): +0.99, 0.01 (test statistic, p-value) +It doesnt look like a lognormal distribution (p-value < 0.05) +(╯°□°)╯︵ ┻━┻ +======================================= +``` + +![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_qqplot.png?raw=true) + +*Figure X. The q-q plot of the test dataset. Note that the distribution of apparent grain sizes deviates from the logarithmic at the ends*. + +The Shapiro-Wilk test returns two different values, the test statistic and the p-value. This test considers the distribution to be lognormally distributed when the p-value is greater than 0.05. The q-q plot is a visual test that when the points fall right onto the reference line it means that the grain size values are lognormally distributed. The q-q plot has the advantage over the Shapiro-Wilk test that it shows where the distribution deviates from lognormality. + +> 👉 To know more about the q-q plot see https://serialmentor.com/dataviz/ + + + +## The area-weighted distribution + +The plot module also allows plotting the area-weighted distribution of grain sizes using the function ``area_weighted()``. This function also returns some basic statistics such as the area-weighted mean and the histogram features. For example: ```python plot.area_weighted(dataset['diameters'], dataset['Area']) @@ -137,50 +164,42 @@ plot.area_weighted(dataset['diameters'], dataset['Area']) > 👉 ***When to use and not to use the area-weighted approach?*** > -> You **should not use** the area-weighted mean for the calibration of paleopiezometers or for the comparison of grain size populations, as this is a poorly optimised central tendency measure ([Lopez-Sanchez, 2020](https://doi.org/10.1016/j.jsg.2020.104042)). On the other hand, the area-weighted distribution is useful to visualize...TODO +> You **should not use** the area-weighted mean for the calibration of paleopiezometers or for the comparison of grain size populations, as this is a poorly optimised central tendency measure ([Lopez-Sanchez, 2020](https://doi.org/10.1016/j.jsg.2020.104042)). On the other hand, the area-weighted distribution is useful to visualize the coarser size range, since in number-weighted distributions these sizes are diluted but can represent a significant area or volume. -### Testing lognormality -Sometimes can be helpful to test whether the data follows or deviates from a lognormal distribution. For example, to find out if the data set is suitable for applying the two-step stereological method or which confidence interval method is best for the arithmetic mean. The script use two methods to test whether the distribution of grain size follows a lognormal distribution. 
One is a visual method named [quantile-quantile (q-q) plots]([https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot](https://en.wikipedia.org/wiki/Q–Q_plot)) and the other is a quantitative test named the [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro–Wilk_test). For this we use the GrainSizeTools function ```test_lognorm``` as follows : + +### Normalized grain size distributions + +Normalized grain size distributions are representations of the entire grain population standardized using an average of the population, usually the arithmetic mean or the median. The advantage of normalized distribution is that it allows the comparison of grain size distribution with different average grain sizes. For example, to check whether two or more grain size distributions have similar shapes we can compare their standard deviations (SD) or their interquartile ranges (IQR). In this case, the method `plot.normalized()` display the distribution on a logarithmic scale and provides the SD or IQR of the normalized population depending on the chosen normalizing factor. ```python -plot.qq_plot(dataset['diameters']) +plot.normalized(dataset['diameters'], avg='amean') ``` ``` ======================================= -Shapiro-Wilk test (lognormal): -0.99, 0.03 (test statistic, p-value) -It doesnt look like a lognormal distribution (p-value < 0.05) -(╯°□°)╯︵ ┻━┻ +Normalized SD = 0.165 +KDE bandwidth = 0.04 ======================================= ``` -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_qqplot.png?raw=true) - -*Figure X. The q-q plot of the test dataset. Note that the distribution of apparent grain sizes deviates from the logarithmic at the ends*. - -The Shapiro-Wilk test will return two different values, the test statistic and the p-value. This test considers the distribution to be lognormally distributed when the p-value is greater than 0.05. The q-q plot is a visual test that when the points fall right onto the reference line it means that the grain size values are lognormally distributed. The q-q plot has the advantage over the Shapiro-Wilk test that it shows where the distribution deviates from lognormality. - -> 👉 To know more about the q-q plot see https://serialmentor.com/dataviz/ - +![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_normalized.png?raw=true) +*Figure X. KDE of the log-transformed grain size distribution normalized to the arithmetic mean (note that amean = 1).* -### Normalized grain size distributions - -Normalized grain size distributions are representations of the entire grain population standardized using an average, usually the arithmetic mean or median. The advantage of normalized distributions is that they allow comparison of grain size distribution when the average grain size between distributions differs significantly. For example, to check whether two or more grain size distributions have similar shapes we can compare their standard deviations (SD) or their interquartile ranges (IQR). In this case, the method shows the normalized distribution on a logarithmic scale and provides the SD or IQR of the normalized population depending on the chosen normalizing factor. +Let's play by changing some of the function parameters. In this case, we are going to establish the median as an alternative normalization factor, and we are also going to smooth the kernel density estimator by increasing the value from 0.04 (estimated according to the Silverman rule) to 0.1. 
Also, we will set the appearance of the figure using the ``figsize`` parameter, where the values within the parentheses are the (width, height) in inches.

```python
-plot.normalized(dataset['diameters'], avg='amean')
+plot.normalized(dataset['diameters'], avg='median', bandwidth=0.1, figsize=(6, 5))
```

```
=======================================
-Normalized SD = 0.165
-KDE bandwidth = 0.04
+Normalized IQR = 0.221
+KDE bandwidth = 0.1
=======================================
```

-![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_normalized.png?raw=true)
+![]()

-*Figure X. KDE of the log-transformed grain size distribution normalized to the arithmetic mean (note that amean = 1).*
\ No newline at end of file
+Note that in this case the method returns the normalized interquartile range (IQR) rather than the normalized standard deviation. Also, note that the kernel density estimate appears smoother, resembling an almost perfect normal distribution.
\ No newline at end of file
diff --git a/DOCS/_Stereology_module.md b/DOCS/_Stereology_module.md
index 0b0beba..f2d904b 100644
--- a/DOCS/_Stereology_module.md
+++ b/DOCS/_Stereology_module.md
@@ -1,9 +1,20 @@
# The stereology module

-TODO
+The main purpose of stereology is to extract quantitative information from microscope images. It is a set of mathematical methods relating two-dimensional measures obtained on sections to three-dimensional parameters defining the structure. Note that the aim of stereology is not to reconstruct the 3D geometry of the material (as in tomography) but to estimate a particular 3D feature. In this particular case, the goal is to approximate the actual (3D) grain size distribution from the apparent (2D) grain size distribution obtained in sections.
+
+The GrainSizeTools script includes two stereological methods: 1) the Saltykov method, and 2) the two-step method. Before looking at their functionalities, applications and limitations, let's import the example dataset.
+
+```python
+# Import the example dataset
+filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt'
+dataset = pd.read_csv(filepath, sep='\t')
+dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi)  # estimate ECD
+```

## The Saltykov method

+TODO: explain functionalities, applications and limitations
+
```python
stereology.Saltykov(dataset['diameters'], numbins=11, calc_vol=50)
```
@@ -15,11 +26,13 @@ bin size = 14.24
=======================================
```

+![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/saltykov.png?raw=true)
+

## The two-step method

-TODO
+TODO: functionalities, applications and limitations

```python
stereology.calc_shape(dataset['diameters'])
@@ -34,3 +47,4 @@ Geometric mean (scale) = 36.05 ± 1.27
=======================================
```

+![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/2step.png?raw=true)
\ No newline at end of file
diff --git a/DOCS/_data_import_handling.md b/DOCS/_data_import_handling.md
deleted file mode 100644
index 0e72570..0000000
--- a/DOCS/_data_import_handling.md
+++ /dev/null
@@ -1,166 +0,0 @@
-# Import and handling of (tabular) data
-
-## Using the Spyder data importer
-
-If you are in Spyder, the easiest way to import data is through the Spyder data importer. To do this, select the variable browser and then click on the import data icon in the upper left (Fig. 3).
A new window will pop up showing different import options (Fig. 4). - -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/new_variable_explorer.png?raw=true) - -*Figure 3. The variable explorer window in Spyder. Note that the variable explorer label is selected at the bottom (indicated with an arrow). To launch the data importer click on the top left icon (indicated by a circle).* - -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/import_data.png?raw=true) - -*Figure 4. The two-step process for importing data with the Spyder data importer. At left, the first step where the main options are the (1) the variable name, (2) the type of data to import (set to data), (3) the column separator (set to Tab), and (4) the rows to skip (set to 0 as this assumes that the first row is the column names). At right, the second step where you can preview the data and set the variable type. In this case, choose import as DataFrame, which is the best choice for tabular data.* - -Once you press "Done" (in the bottom right) the dataset will appear within the variable explorer window as shown in figure 3. Note that it provides information about the type of variable (a Dataframe), the number of rows and columns (2661 x 11), and the column names. If you want to get more details or edit something, double-click on this variable and a new window will pop up (Fig. 5). Also, you can do a right-click on this variable and several options will appear. - -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/variable_explorer02.png?raw=true) - -*Figure 5. Representation of the dataset in the Spyder variable explorer. Note that the colours change with values.* - -> 👉 More info on the Spyder variable explorer here: https://docs.spyder-ide.org/variableexplorer.html - - - ---- - - - -## Importing tabular data using the console - -An alternative option is to import the data using the console. For this, [Pandas](https://pandas.pydata.org/about/index.html) is the de facto standard Python library for data analysis and manipulation of tabular datasets (CSV, excel or text files among others). The library includes several tools for reading files and handling of missing data. Also, when running the GrainSizeTools script pandas is imported as ```pd``` for its general use. - -All Pandas methods to read data are all named ```pd.read_*``` where * is the file type. For example: - -```python -pd.read_csv() # read csv or txt files, default delimiter is ',' -pd.read_table() # read general delimited file, default delimiter is '\t' (TAB) -pd.read_excel() # read excel files -pd.read_html() # read HTML tables -... # etc. -``` - -> 👉 For other supported file types see https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html - -The only mandatory argument for the reading methods is to define the path (local or URL) with the location of the file to be imported. For example: - - -```python -# read file to create a Pandas DataFrame (i.e. a table) -# note that the file path is within quotes (either single or double) -dataset = pd.read_table('DATA/data_set.txt') -``` - -Pandas' reading methods give you a lot of control over how a file is read. To keep things simple, we list here the most commonly used options: - -```python -sep # The delimiter to use (alternatively you can also use the word delimiter) -header # Row number(s) to use as the column names. By default it takes the first row as the column names (header=0). 
If there is no columns names in the file you must set header=None -skiprows # Number of lines to skip at the start of the file (an integer). -na_filter # Detect missing value markers. False by default. -sheet_name # Only for excel files, the excel sheet name either a number or the full name of the sheet. - -``` - -An example using several optional arguments might be: - -```python -dataset = pd.read_csv('DATA/data_set.csv', sep=';', skiprows=5, na_filter=True) -``` - -which in plain language means that we are importing a ``csv`` file named ``data_set`` that is located in the folder ``DATA``. The data is delimited by a semicolon and we ignore the first five lines of the file (*i.e.* column names are supposed to appear in the sixth row). Last, we want all missing values to be handled during the import. - -> 👉 more details on Pandas csv read method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html - -The GrainSizeTools script includes an own method named ```get_filepath()``` to get the path of the file through a file selection dialog instead of directly writing it. This can be used in two ways: - -```python -# store the path in a variable (here named filepath for convenience) and then use it when calling the read method -filepath = get_filepath() -dataset = pd.read_csv(filepath, sep='\t') - -# use get_filepath() directly within the read method -dataset = pd.read_csv(get_filepath(), sep='\t') -``` - -Lastly, Pandas also allows to directly import tabular data from the clipboard (i.e. data copied using copy-paste commands). For this, after copying the table (from a text/excel file or a website) call the method: - -```python -dataset = pd.read_clipboard() -``` - -The copied table will appear in the variable explorer. - - - ---- - - - -## Basic tabular data manipulation (Pandas dataframes) - -In the examples above, we imported the data as a *Dataframe*, which for simplicity is just a Python “object” containing tabular data. - -```python -type(dataset) # show the variable type -``` - -``` -pandas.core.frame.DataFrame -``` - -### Visualize the DataFrame - -For visualizing the data at any time, you can use the variable explorer in Spyder (Fig. 5) or directly typing the name of the variable in the console and press enter. - -```python -# show the DataFrame in the console or the notebook -dataset -``` - -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/dataframe_output.png?raw=true) - -Alternatively, if you want to view few rows use: - -```python -# visualize the first rows, you can define the number of rows whithin the parentheses -dataset.head() -# view the last rows -dataset.tail() -``` - -### Interacting with columns (creating columns, operations, etc.) - -To select one or more columns of the DataFrame, you need to type the name of the DataFrame and then the name of the column within brackets as follows: - -```python -# select a specific column of the dataset, note that the name of the column is in quotes. -dataset['AR'] * 2 # select the values of the column named 'AR' and multiply it by two - -# select several columns and estimate the arithmetic mean -# note the double brackets when calling more than one column! 
-np.mean(dataset[['Area', 'Feret']]) - -# estimate the arithmetic mean of all columns -np.mean(dataset) -``` - -A real-case scenario might be to estimate the apparent diameters of the grains from the sectional areas using the equivalent circular diameter (ECD) formula, which is - -ECD = 2 * √(area / π) - -Indeed, this is the case with the imported example dataset where the sectional areas not the apparent diameters are provided. In the example below, we are generating a new column named ``diameters`` with the equivalent circular diameters of the grain - - -```python -dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi) -dataset.head() -``` - -![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/dataframe_diameters.png?raw=true) - -Now, you can see that a new column named diameters appear when displaying the dataset. - - - -> 👉 In the examples above we define the square root as ``np.sqrt``, the arithmetic mean as ``np.mean``, and pi as ``np.pi``. In this case, ``np.`` stems for Numpy or numerical Python, a basic package for scientific computing with Python, and the word after the dot with the method or the scientific value to be applied. If you write in the console ``np.`` and then press the TAB key, you will see a large list of available methods. In general, the method names are equivalent to those used in MATLAB but always by adding the ``np.`` first. \ No newline at end of file diff --git a/DOCS/_describe.md b/DOCS/_describe.md new file mode 100644 index 0000000..9a60608 --- /dev/null +++ b/DOCS/_describe.md @@ -0,0 +1,190 @@ +# Describing the population of grain sizes + +The method to describe the properties of the grain size population is named ``summarize()``. Before we get into the details of the method, let's run the GrainSizeTools script, load the example dataset, and create a toy dataset with known parameters. + +```python +# Load the example dataset +filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt' +dataset = pd.read_csv(filepath, sep='\t') + +# estimate equivalent circular diameters (ECDs) +dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi) +dataset +``` + +![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/dataframe_output.png?raw=true) + +```python +# Set the population properties +scale = np.log(20) # set sample geometric mean to 20 +shape = np.log(1.5) # set the lognormal shape to 1.5 + +# generate a random lognormal population of size 500 +np.random.seed(seed=1) # this is to generate always the same population for reproducibility +toy_dataset = np.random.lognormal(mean=scale, sigma=shape, size=500) +``` + +We are now ready to check what we can get from the function `summarize()`. The simplest example of use would be to pass the data containing the diameters. For simplicity's sake, let's do it with the toy dataset first. 
+ +```python +summarize(toy_dataset) +``` + +``` +============================================================================ +CENTRAL TENDENCY ESTIMATORS +============================================================================ +Arithmetic mean = 22.13 microns +Confidence intervals at 95.0 % +mCox method: 21.35 - 22.98 (-3.5%, +3.8%), length = 1.623 +============================================================================ +Geometric mean = 20.44 microns +Confidence interval at 95.0 % +CLT method: 19.73 - 21.17 (-3.5%, +3.6%), length = 1.441 +============================================================================ +Median = 20.32 microns +Confidence interval at 95.0 % +robust method: 19.33 - 21.42 (-4.9%, +5.4%), length = 2.096 +============================================================================ +Mode (KDE-based) = 17.66 microns +Maximum precision set to 0.1 +KDE bandwidth = 2.78 (silverman rule) + +============================================================================ +DISTRIBUTION FEATURES +============================================================================ +Sample size (n) = 500 +Standard deviation = 9.07 (1-sigma) +Interquartile range (IQR) = 11.44 +Lognormal shape (Multiplicative Standard Deviation) = 1.49 +============================================================================ +Shapiro-Wilk test warnings: +Data is not normally distributed! +Normality test: 0.88, 0.00 (test statistic, p-value) +============================================================================ +``` + +By default, the `summarize()` function returns: + +- Different **central tendency estimators** ("averages") including the arithmetic and geometric means, the median, and the KDE-based mode (i.e. frequency peak). +- The **confidence intervals** for the different means and the median at 95% of certainty in absolute value and percentage relative to the average (*a.k.a* coefficient of variation). The meaning of these intervals is that, given the observed data, there is a 95% probability (one in 20) that the true value of grain size falls within this credible interval. The function provides the lower and upper bounds of the confidence interval, the error in percentage respect to the average, and the interval length. +- The methods used to estimate the confidence intervals for each average (excepting for the mode). The function `summarize()` automatically choose the optimal method depending on distribution features (see below) +- The sample size and two population dispersion measures: the (Bessel corrected) [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) and the [interquartile range](https://en.wikipedia.org/wiki/Interquartile_range). +- The shape of the lognormal distribution using the multiplicative standard deviation (MSD) +- A Shapiro-Wilk test warning indicating when the data deviates from normal and/or lognormal (when p-value < 0.05). + +Note that here the Shapiro-Wilk test warning tells us that the distribution is not normally distributed, which is to be expected since we know that this is a lognormal distribution. Note that the geometric mean and the lognormal shape are very close to the values used to generate the synthetic dataset, 20 and 1.5 respectively. 
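+Because the geometric mean and the MSD of a lognormal population are simply the exponentials of the mean and the standard deviation of the log-transformed values, you can cross-check the figures reported by ``summarize()`` directly with NumPy. The snippet below is a minimal sketch (plain NumPy, not part of the GrainSizeTools API) that reuses the ``toy_dataset`` variable created above; the printed values should be close to the 20 and 1.5 used to generate the population.
+
+```python
+# cross-check with plain NumPy (not a GrainSizeTools function)
+log_diams = np.log(toy_dataset)
+geo_mean = np.exp(log_diams.mean())  # geometric mean = exp(mean of the ln-transformed values)
+msd = np.exp(log_diams.std())        # multiplicative SD (shape) = exp(SD of the ln-transformed values)
+print(f'geometric mean = {geo_mean:.2f}')  # ~20.4
+print(f'MSD (shape) = {msd:.2f}')          # ~1.5
+```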
+ +Now, let's do the same using the dataset that comes from a real rock, for this, we have to pass the column with the diameters: + +```python +summarize(dataset['diameters']) +``` + +``` +============================================================================ +CENTRAL TENDENCY ESTIMATORS +============================================================================ +Arithmetic mean = 34.79 microns +Confidence intervals at 95.0 % +CLT (ASTM) method: 34.09 - 35.48, (±2.0%), length = 1.393 +============================================================================ +Geometric mean = 30.10 microns +Confidence interval at 95.0 % +CLT method: 29.47 - 30.75 (-2.1%, +2.2%), length = 1.283 +============================================================================ +Median = 31.53 microns +Confidence interval at 95.0 % +robust method: 30.84 - 32.81 (-2.2%, +4.1%), length = 1.970 +============================================================================ +Mode (KDE-based) = 24.31 microns +Maximum precision set to 0.1 +KDE bandwidth = 4.01 (silverman rule) + +============================================================================ +DISTRIBUTION FEATURES +============================================================================ +Sample size (n) = 2661 +Standard deviation = 18.32 (1-sigma) +Interquartile range (IQR) = 23.98 +Lognormal shape (Multiplicative Standard Deviation) = 1.75 +============================================================================ +Shapiro-Wilk test warnings: +Data is not normally distributed! +Normality test: 0.94, 0.00 (test statistic, p-value) +Data is not lognormally distributed! +Lognormality test: 0.99, 0.03 (test statistic, p-value) +============================================================================ +``` + +Leaving aside the difference in numbers, there are some subtle differences compared to the results obtained with the toy dataset. First, the confidence interval method for the arithmetic mean is no longer the modified Cox (mCox) but the one based on the central limit theorem (CLT) advised by the [ASTM](https://en.wikipedia.org/wiki/ASTM_International). As previously noted, the function ```summarize()``` automatically choose the optimal confidence interval method depending on distribution features. We show below the decision tree flowchart for choosing the optimal confidence interval estimation method, which is based on [Lopez-Sanchez (2020)](https://doi.org/10.1016/j.jsg.2020.104042). + +![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/avg_map.png?raw=true) + +The reason why the CLT method applies in this case is that the grain size distribution not enough lognormal-like (note the Shapiro-Wilk test warning with a p-value < 0.05), and this might cause an inaccurate estimate of the arithmetic mean confidence interval. + +Now, let's focus on the different options of the ``summarize()`` method. + +``` +Signature: +summarize( + data, + avg=('amean', 'gmean', 'median', 'mode'), + ci_level=0.95, + bandwidth='silverman', + precision=0.1, +) +Docstring: +Estimate different grain size statistics. This includes different means, +the median, the frequency peak grain size via KDE, the confidence intervals +using different methods, and the distribution features. 
+ +Parameters +---------- +data : array_like + the size of the grains + +avg : string, tuple or list; optional + the averages to be estimated + + | Types: + | 'amean' - arithmetic mean + | 'gmean' - geometric mean + | 'median' - median + | 'mode' - the kernel-based frequency peak of the distribution + +ci_level : scalar between 0 and 1; optional + the certainty of the confidence interval (default = 0.95) + +bandwidth : string {'silverman' or 'scott'} or positive scalar; optional + the method to estimate the bandwidth or a scalar directly defining the + bandwidth. It uses the Silverman plug-in method by default. + +precision : positive scalar or None; optional + the maximum precision expected for the "peak" kde-based estimator. + Default is 0.1. Note that this has nothing to do with the + confidence intervals + +Call functions +-------------- +- amean, gmean, median, and freq_peak (from averages) + +Examples +-------- +>>> summarize(dataset['diameters']) +>>> summarize(dataset['diameters'], ci_level=0.99) +>>> summarize(np.log(dataset['diameters']), avg=('amean', 'median', 'mode')) + +Returns +------- +None +File: c:\users\marco\documents\github\grainsizetools\grain_size_tools\grainsizetools_script.py +Type: function +``` + + + +> **TODO:** +- explain the different options of ``summarize()`` through examples +- examples using log-transformed populations + diff --git a/DOCS/_first_steps.md b/DOCS/_first_steps.md index 87d3680..d5ed27e 100644 --- a/DOCS/_first_steps.md +++ b/DOCS/_first_steps.md @@ -1,6 +1,4 @@ -*last update 2020/05/02* - -# Getting started using the GrainSizeTools script: first steps +# Getting started: first steps using the GrainSizeTools script Installing Python for data science ------------- @@ -77,7 +75,7 @@ and provide the required parameters within the parenthesis. To access the methods within a module, type the module name plus the dot and hit the tab key and a complete list of methods will pop up. -### Get detailed information on methods +#### Get detailed information on methods You can get detailed information about any method or function of the script in different ways. The first is through the console using the character ? before the method @@ -134,21 +132,31 @@ filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA # import the data dataset = pd.read_table(filepath) + +#display the data +dataset ``` -Once the data is imported the dataset will appear in the variable explorer. Some important things to note about the code snippet used above is that we used the ``pd.read_table()`` method to import the file. By default, this method assumes that the data to import is stored in a text file separated by tabs. Alternatively you can use the ``pd.read_csv()`` method (note that csv means comma-separated values) and set the delimiter to ``'\t'`` as follows: ``pd.read_csv(filepath, sep='\t')``. +![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/dataframe_output.png?raw=true) + +Some important things to note about the code snippet used above are: + +- We used the ``pd.read_table()`` method to import the file. By default, this method assumes that the data to import is stored in a text file separated by tabs. Alternatively you can use the ``pd.read_csv()`` method (note that csv means comma-separated values) and set the delimiter to ``'\t'`` as follows: ``pd.read_csv(filepath, sep='\t')``. 
+- When calling the variable ``dataset`` it returs a visualization of the dataset imported, which is a tabular-like dataset with 2661 entries and 11 columns with different grain properties. In Python, this type of tabular-like objects are called (Pandas) *DataFrame* and allow a flexible and easy to use data analysis. Just for checking: ```python +# show the variable type type(dataset) + pandas.core.frame.DataFrame ``` Pandas' reading methods give you a lot of control over how a file is read. To keep things simple, I list the most commonly used arguments: ```python -sep # Delimiter to use. +sep # Delimiter/separator to use. header # Row number(s) to use as the column names. By default it takes the first row as the column names (header=0). If there is no columns names in the file you must set header=None skiprows # Number of lines to skip at the start of the file (an integer). na_filter # Detect missing value markers. False by default. @@ -162,7 +170,7 @@ An example using several optional arguments might be: dataset = pd.read_csv('DATA/data_set.csv', sep=';', skiprows=5, na_filter=True) ``` -which in plain language means that we are importing a ``csv`` file named ``data_set`` that is located in the folder ``DATA``. The data is delimited by a semicolon and we ignore the first five lines of the file (*i.e.* column names are supposed to appear in the sixth row). Last, we want all missing values to be handled during the import. +which in plain language means that we are importing a (fictitious) ``csv`` file named ``data_set`` that is located in the folder ``DATA``. The data is delimited by a semicolon and we ignore the first five lines of the file (*i.e.* column names are supposed to appear in the sixth row). Last, we want all missing values to be handled during the import. > 👉 more details on Pandas csv read method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html @@ -180,41 +188,21 @@ dataset = pd.read_csv(get_filepath(), sep='\t') Lastly, Pandas also allows to directly import tabular data from the clipboard (i.e. data copied using copy-paste commands). For example, after copying the table from a text file, excel spreadsheet or a website using: ```python -pd.read_clipboard() +dataset = pd.read_clipboard() ``` ---- + ## Basic data manipulation (using Pandas) Let's first see how the data set looks like. Instead of calling the variable (as in the example before) we now use the ``head()`` and ``tail()`` methods so that it only shows us the first (or last) rows of the data set ```python -type(dataset) # show the variable type -``` - -``` -pandas.core.frame.DataFrame -``` - -For visualizing the data at any time, you can use the variable explorer in Spyder (Fig. 5) or directly typing the name of the variable in the console and press enter. - -```python -# show the DataFrame in the console -dataset +dataset.head() # returns 5 rows by default, you can define any number within the parenthesis ``` ![](https://github.com/marcoalopez/GrainSizeTools/blob/master/FIGURES/dataframe_output.png?raw=true) -Alternatively, if you want to view just few rows use: - -```python -# visualize the first rows, you can define the number of rows whithin the parentheses -dataset.head() -# view the last rows -dataset.tail() -``` - The example dataset has 11 different columns (one without a name). 
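+Before selecting individual columns, it can be handy to list the column names and data types first. The snippet below is a minimal sketch using two standard Pandas tools (the ``columns`` attribute and the ``info()`` method) on the ``dataset`` DataFrame imported above:
+
+```python
+# list the column names of the DataFrame
+dataset.columns
+
+# or get a compact summary: column names, data types and non-null counts
+dataset.info()
+```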
To interact with one of the columns we must call its name in square brackets with the name in quotes as follows: ```python @@ -245,7 +233,10 @@ dataset = dataset.drop(' ', axis=1) dataset.head(3) ``` +![]() + If you want to remove more than one column pass a list of columns instead as in the example below: + ```python dataset.drop(['FeretX', 'FeretY'], axis=1) ``` @@ -269,7 +260,7 @@ dataset.head() You can see a new column named diameters. -> 👉 In the examples above we define the square root as ``np.sqrt``, the arithmetic mean as ``np.mean``, and pi as ``np.pi``. In this case, ``np.`` stems for Numpy or numerical Python, a basic package for scientific computing with Python, and the keyword after the dot is the method or the scientific value to be applied. If you write in the console ``np.`` and then press the TAB key, you will see a large list of available methods. In general, the method names are equivalent to those used in MATLAB but always by adding the ``np.`` first. +> 👉 In the examples above we define the square root as ``np.sqrt``, the arithmetic mean as ``np.mean``, and pi as ``np.pi``. In this case, ``np.`` stems for NumPy or numerical Python, a basic package for scientific computing with Python, and the keyword after the dot is the method or the scientific value to be applied. If you write in the console ``np.`` and then press the TAB key, you will see a large list of available methods. In general, the method names are equivalent to those used in MATLAB but always by adding the ``np.`` first. ### A list of useful Pandas methods diff --git a/DOCS/running_tests.md b/DOCS/running_tests.md deleted file mode 100644 index 1acc2fa..0000000 --- a/DOCS/running_tests.md +++ /dev/null @@ -1,391 +0,0 @@ -# Running test - -This document provides a way to check the functionality of the script. For this, use the data provided with the script in the file ``data_set.txt`` , copy the commands indicated in the different sections and paste them in the console, and check if the results are the same. - - - -## Test the ``extract_column`` function - -```python -# find and open the data_set.txt file through a file selection dialog ->>> areas = extract_column() - - Area Circ. Feret ... MinFeret AR Round Solidity -0 1 157.25 0.680 18.062 ... 13.500 1.101 0.908 0.937 -1 2 2059.75 0.771 62.097 ... 46.697 1.314 0.761 0.972 -2 3 1961.50 0.842 57.871 ... 46.923 1.139 0.878 0.972 -3 4 5428.50 0.709 114.657 ... 63.449 1.896 0.528 0.947 -4 5 374.00 0.699 29.262 ... 16.000 1.515 0.660 0.970 - -[5 rows x 11 columns] - Area Circ. Feret ... MinFeret AR Round Solidity -2656 2657 452.50 0.789 28.504 ... 22.500 1.235 0.810 0.960 -2657 2658 1081.25 0.756 47.909 ... 31.363 1.446 0.692 0.960 -2658 2659 513.50 0.720 32.962 ... 20.496 1.493 0.670 0.953 -2659 2660 277.75 0.627 29.436 ... 17.002 1.727 0.579 0.920 -2660 2661 725.00 0.748 39.437 ... 28.025 1.351 0.740 0.960 - -[5 rows x 11 columns] - -column extracted: -Area = [ 157.25 2059.75 1961.5 ... 513.5 277.75 725. ] -n = 2661 - - -# Do the same specifying the file path (define your own absolute filepath) ->>> extract_column(file_path='.../GrainSizeTools/data_set.txt') - - -# Extract other column ->>> extract_column(col_name='Feret') -... -column extracted: -Feret = [18.062 62.097 57.871 ... 32.962 29.436 39.437] - - -# catching common exceptions ->>> extract_column(col_name='Foo') -... 
-KeyError: 'Foo' - -# filepath does not exist -FileNotFoundError: File b'.../data_set.txt' does not exist -``` - - - -## Test the ``area2diameter`` function - -```python ->>> area2diameter(areas) -array([14.14980277, 51.210889 , 49.97458721, ..., 25.56967943, - 18.80537911, 30.3825389 ]) - - ->>> area2diameter(areas, correct_diameter=1.0) -array([15.14980277, 52.210889 , 50.97458721, ..., 26.56967943, - 19.80537911, 31.3825389 ]) -``` - - - -## Test the ``calc_grain_size`` function - -```python -# check with default options ->>> diameters = area2diameter(areas) ->>> calc_grain_size(diameters) - -DESCRIPTIVE STATISTICS - -Arithmetic mean grain size = 34.79 microns -Standard deviation = 18.32 (1-sigma) -RMS mean = 39.31 microns -Geometric mean = 30.1 microns - -Median grain size = 31.53 microns -Interquartile range (IQR) = 23.98 - -Peak grain size (based on KDE) = 24.28 microns -KDE bandwidth = 4.01 (silverman rule) - -HISTOGRAM FEATURES -The modal interval is 16.83 - 20.24 -The number of classes are 45 -The bin size is 3.41 according to the auto rule - - -# check using different grain size scales (use arithmetic mean and SD to compare) ->>> calc_grain_size(diameters, plot='sqrt') -... -Arithmetic mean grain size = 5.7 microns -Standard deviation = 1.53 (1-sigma) - - ->>> calc_grain_size(diameters, plot='log') -... -Arithmetic mean grain size = 3.4 microns -Standard deviation = 0.56 (1-sigma) - - ->>> calc_grain_size(diameters, plot='log10') -... -Arithmetic mean grain size = 1.48 microns -Standard deviation = 0.24 (1-sigma) - - ->>> calc_grain_size(diameters, areas=areas, plot='area') -... -Area-weighted mean grain size = 53.88 microns -... -The number of classes are 46 -The bin size is 3.4 according to the auto rule - - ->>> calc_grain_size(diameters, areas=areas, plot='norm') -Define the normalization factor (1 to 3) -1 -> mean; 2 -> median; 3 -> max_freq: 1 -... -Arithmetic mean grain size = 1.0 microns -Standard deviation = 0.16 (1-sigma) - -Define the normalization factor (1 to 3) -1 -> mean; 2 -> median; 3 -> max_freq: 2 -... -Median grain size = 1.0 microns -Interquartile range (IQR) = 0.22 - -Define the normalization factor (1 to 3) -1 -> mean; 2 -> median; 3 -> max_freq: 3 -... -Peak grain size (based on KDE) = 1.0 microns -KDE bandwidth = 0.03 (silverman rule) - - -# catching common mistakes/exceptions -# bad name ->>> calc_grain_size(diameters, plot='foo') -ValueError: The type of plot has been misspelled, please use 'lin', 'log', 'log10', 'sqrt', 'norm', or 'area' - - # missing areas when using the area weighted approach ->>> calc_grain_size(diameters, plot='areas') -You must provide the areas of the grain sections! - -# wrong choice when using normalization approach ->>> calc_grain_size(diameters, areas=areas, plot='norm') -Define the normalization factor (1 to 3) -1 -> mean; 2 -> median; 3 -> max_freq: 4 -ValueError: Normalization factor has to be defined as 1, 2, or 3 -``` - -```python -# Test binsize functionality. Note that we just check a number (not all) of different plug-in methods since this functionality belong to the numpy package and hence it is already tested by numpy developers. - ->>> calc_grain_size(diameters, binsize='doane') -... -The bin size is 9.02 according to the doane rule - - ->>> calc_grain_size(diameters, binsize='scott') -... -The bin size is 4.51 according to the scott rule - - -# ad hoc bin size ->>> calc_grain_size(diameters, binsize=7.5) -... 
-HISTOGRAM FEATURES -The modal interval is 18.19 - 25.69 -The number of classes are 21 - - -# catching common mistakes -# bad name ->>> calc_grain_size(diameters, binsize='foo') -ValueError: 'foo' is not a valid estimator for `bins` -``` - -```python -# Test kde bandwidth functionality - ->>> calc_grain_size(diameters, bandwidth='scott') -... -Peak grain size (based on KDE) = 24.2 microns -KDE bandwidth = 3.78 (scott rule) -... - ->>> calc_grain_size(diameters, bandwidth=6.0) -... -Peak grain size (based on KDE) = 25.29 microns -KDE bandwidth = 6.0 -... - -# catching common mistakes/exceptions -# bad name ->>> calc_grain_size(diameters, bandwidth='foo') -ValueError: `bw_method` should be 'scott', 'silverman', a scalar or a callable. -``` - - - -## Test the ``Saltykov`` function - -```python ->>> Saltykov(diameters) # check plot -bin size = 15.66 - - ->>> Saltykov(diameters, numbins=14) # check plot -bin size = 11.19 - - ->>> Saltykov(diameters, numbins=16, calc_vol=40) -volume fraction (up to 40 microns) = 20.33 % -bin size = 9.79 - - -# Get the frequency and the right edges of the classes ->>> Saltykov(diameters, return_data=True) -(array([ 7.82979491, 23.48938472, 39.14897454, 54.80856436, - 70.46815417, 86.12774399, 101.78733381, 117.44692362, - 133.10651344, 148.76610326]), - array([2.67457256e-03, 2.30102443e-02, 2.03855325e-02, 1.15382229e-02, - 3.80778332e-03, 1.86010761e-03, 5.21259159e-04, 2.47891410e-05, - 0.00000000e+00, 3.61215495e-05])) - - -# generating text files with the output ->>> Saltykov(diameters, text_file='foo.csv') -The file foo.csv was created # check file -bin size = 15.66 - ->>> Saltykov(diameters, text_file='bar.txt') -The file bar.txt was created # check file -bin size = 15.66 - - -# test left edge (see plot) ->>> Saltykov(diameters, left_edge=5.0) ->>> Saltykov(diameters, left_edge='min') # check using min(diameters) - - -# catching common mistakes -# set a grain size higher than the greatest grain size in the population to estimate the volume (it should return 100%) ->>> Saltykov(diameters, calc_vol=10000) -volume fraction (up to 10000 microns) = 100 % - - -# not specifiying the correct type of text file ->>> Saltykov(diameters, text_file='foo') ->>> Saltykov(diameters, text_file='foo.xlsx') -ValueError: text file must be specified as .csv or .txt -``` - - - - - -## Test the ``calc_shape`` function - -```python -# default parameters ->>> calc_shape(diameters) -OPTIMAL VALUES -Number of clasess: 11 -MSD (shape) = 1.63 ± 0.06 -Geometric mean (location) = 36.05 ± 1.27 - - ->>> calc_shape(diameters, class_range=(12, 18)) -OPTIMAL VALUES -Number of clasess: 12 -MSD (shape) = 1.64 ± 0.07 -Geometric mean (location) = 36.22 ± 1.62 - - ->>> calc_shape(diameters, initial_guess=True) -Define an initial guess for the MSD parameter (the default value is 1.2; MSD > 1.0): 1.6 -Define an initial guess for the geometric mean (the default value is 35.0): 40.0 -# You should obtain the same results provided in the first example -``` - - - - - -```python ->>> my_results = [165.3, 174.2, 180.1] ->>> confidence_interval(data=my_results, confidence=0.95) -Confidence set at 99.0 % -Mean = 173.2 ± 42.69 -Max / min = 215.89 / 130.51 -Coefficient of variation = 24.6 % - - ->>> confidence_interval(data=my_results, confidence=0.99) -Confidence set at 99.0 % -Mean = 173.2 ± 42.69 -Max / min = 215.89 / 130.51 -Coefficient of variation = 24.6 % - - -# catching common mistakes -confidence_interval(data=my_results, confidence=1.2) -... 
-ValueError: alpha must be between 0 and 1 inclusive -``` - - - - - -```python -# check "Available piezometers ->>> quartz() -Available piezometers: -'Cross' -'Cross_hr' -'Holyoke' -'Holyoke_BLG' -'Shimizu' -'Stipp_Tullis' -'Stipp_Tullis_BLG' -'Twiss' - ->>> olivine() -Available piezometers: -'Jung_Karato' -'VanderWal_wet' - ->>> feldspar() -Available piezometers: -'Post_Tullis_BLG' - ->>> calcite() -Available piezometers: -'Barnhoorn' -'Platt_Bresser' -'Rutter_SGR' -'Rutter_GBM' -'Valcke' - -# Check estimates (TODO: Automatize this using all the piezometers!) ->>> calc_diffstress(grain_size=5.7, phase='quartz', piezometer='Stipp_Tullis') -differential stress = 169.16 MPa -Ensure that you entered the apparent grain size as the root mean square (RMS)! - - ->>> calc_diffstress(grain_size=35, phase='olivine', piezometer='VanderWal_wet') -differential stress = 282.03 MPa -Ensure that you entered the apparent grain size as the mean in linear scale! - - ->>> calc_diffstress(grain_size=35, phase='calcite', piezometer='Rutter_SGR') -differential stress = 35.58 MPa -Ensure that you entered the apparent grain size as the mean in linear scale! - - -# Catching exceptions -# wrong phase name ->>> calc_diffstress(grain_size=5.7, phase='foo', piezometer='Stipp_Tullis') -ValueError: Phase name misspelled. Please choose between valid mineral names - -# wrong piezometer name ->>> calc_diffstress(grain_size=5.7, phase='calcite', piezometer='Stipp_Tullis') -Available piezometers: -'Barnhoorn' -'Platt_Bresser' -'Rutter_SGR' -'Rutter_GBM' -'Valcke -... -ValueError: Piezometer name misspelled. Please choose between valid piezometer - -# missing required positional arguments ->>> calc_diffstress(grain_size=5.7, phase='calcite') -TypeError: calc_diffstress() missing 1 required positional argument: 'piezometer' - ->>> calc_diffstress(grain_size=5.7, piezometer='Stipp_Tullis') -TypeError: calc_diffstress() missing 1 required positional argument: 'phase' -``` - diff --git a/grain_size_tools/GrainSizeTools_script.py b/grain_size_tools/GrainSizeTools_script.py index c88b671..bbd76d7 100644 --- a/grain_size_tools/GrainSizeTools_script.py +++ b/grain_size_tools/GrainSizeTools_script.py @@ -82,7 +82,7 @@ def conf_interval(data, confidence=0.95): print('Mean = {mean:0.2f} ± {err:0.2f}' .format(mean=amean, err=err)) print('Confidence set at {} %' .format(confidence * 100)) print('Max / min = {max:0.2f} / {min:0.2f}' .format(max=high, min=low)) - print('Coefficient of variation = {:0.1f} %' .format(100 * err / amean)) + print('Coefficient of variation = ±{:0.1f} %' .format(100 * err / amean)) return amean, err, (low, high) diff --git a/grain_size_tools/example_notebooks/getting_started.ipynb b/grain_size_tools/example_notebooks/getting_started.ipynb index d99367c..839595e 100644 --- a/grain_size_tools/example_notebooks/getting_started.ipynb +++ b/grain_size_tools/example_notebooks/getting_started.ipynb @@ -970,7 +970,7 @@ "source": [ "You can see a new column named diameters.\n", "\n", - "> 👉 In the examples above we define the square root as ``np.sqrt``, the arithmetic mean as ``np.mean``, and pi as ``np.pi``. In this case, ``np.`` stems for Numpy or numerical Python, a basic package for scientific computing with Python, and the keyword after the dot is the method or the scientific value to be applied. If you write in the console ``np.`` and then press the TAB key, you will see a large list of available methods. 
In general, the method names are equivalent to those used in MATLAB but always by adding the ``np.`` first.\n", + "> 👉 In the examples above we define the square root as ``np.sqrt``, the arithmetic mean as ``np.mean``, and pi as ``np.pi``. In this case, ``np.`` stems for NumPy or numerical Python, a basic package for scientific computing with Python, and the keyword after the dot is the method or the scientific value to be applied. If you write in the console ``np.`` and then press the TAB key, you will see a large list of available methods. In general, the method names are equivalent to those used in MATLAB but always by adding the ``np.`` first.\n", "\n", "### A list of useful Pandas methods\n", "\n", @@ -1241,7 +1241,7 @@ { "data": { "text/plain": [ - "" + "" ] }, "execution_count": 12, diff --git a/grain_size_tools/example_notebooks/grain_size_description.ipynb b/grain_size_tools/example_notebooks/grain_size_description.ipynb index 48f9417..7d788ec 100644 --- a/grain_size_tools/example_notebooks/grain_size_description.ipynb +++ b/grain_size_tools/example_notebooks/grain_size_description.ipynb @@ -292,9 +292,9 @@ "source": [ "# Load the example dataset\n", "filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt'\n", - "dataset = pd.read_csv(filepath, sep='\\t') # Import the example dataset\n", + "dataset = pd.read_csv(filepath, sep='\\t')\n", "\n", - "# estimate ECD\n", + "# estimate equivalent circular diameters (ECDs)\n", "dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi)\n", "dataset" ] @@ -385,15 +385,15 @@ "By default, the ```summarize()``` function returns:\n", "\n", "- Different **central tendency estimators** (\"averages\") including the arithmetic and geometric means, the median, and the KDE-based mode (i.e. frequency peak).\n", - "- The **confidence intervals** for the different means and the median at 95% of certainty in absolute value and in percentage relative to the average (*a.k.a* coefficient of variation). The meaning of these intervals is that, given the observed data, there is a 95% probability (one in 20) that the true value of grain size falls within this credible interval. The function provides the lower and upper bounds of the confidence interval, the error in percentage respect to the average, and the interval length. \n", + "- The **confidence intervals** for the different means and the median at 95% of certainty in absolute value and percentage relative to the average (*a.k.a* coefficient of variation). The meaning of these intervals is that, given the observed data, there is a 95% probability (one in 20) that the true value of grain size falls within this credible interval. The function provides the lower and upper bounds of the confidence interval, the error in percentage respect to the average, and the interval length. \n", "- The methods used to estimate the confidence intervals for each average (excepting for the mode). 
The function ```summarize()``` automatically choose the optimal method depending on distribution features (see below)\n",
 "- The sample size and two population dispersion measures: the (Bessel corrected) [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) and the [interquartile range](https://en.wikipedia.org/wiki/Interquartile_range).\n",
 "- The shape of the lognormal distribution using the multiplicative standard deviation (MSD)\n",
 "- A Shapiro-Wilk test warning indicating when the data deviates from normal and/or lognormal (when p-value < 0.05).\n",
 "\n",
- "Note that here the Shapiro-Wilk test warning tell us that the distribution is not normally distributed, which is to be expected since we know that this is a lognormal distribution. Note that the geometric mean and the lognormal shape are very close to the values used to generate the syntethic dataset, 20 and 1.5 respectively.\n",
+ "Note that here the Shapiro-Wilk test warning tells us that the distribution is not normally distributed, which is to be expected since we know that this is a lognormal distribution. Note that the geometric mean and the lognormal shape are very close to the values used to generate the synthetic dataset, 20 and 1.5 respectively.\n",
 "\n",
- "Now, let's do the same using the dataset that comes from a real rock, for this we have to pass the column with the diameters:"
+ "Now, let's do the same using the dataset that comes from a real rock, for this, we have to pass the column with the diameters:"
 ]
 },
 {
@@ -450,7 +450,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Leaving aside the difference in numbers, there are some subtle differences compared to the results obtained with the toy dataset. First, the confidence interval method for the arithmetic mean is no longer the modified Cox (mCox) but the one based on the central limit theorem (CLT) advised by the [ASTM](https://en.wikipedia.org/wiki/ASTM_International). As previously noted, the function ```summarize()``` automatically choose the optimal confidence interval method depending on distribution features. We show below the decision tree flowchart for choosing the optimal confidence interval estimation method, which is based on [Lopez-Sanchez (2020)](https://doi.org/10.1016/j.jsg.2020.104042)"
+ "Leaving aside the difference in numbers, there are some subtle differences compared to the results obtained with the toy dataset. First, the confidence interval method for the arithmetic mean is no longer the modified Cox (mCox) but the one based on the central limit theorem (CLT) advised by the [ASTM](https://en.wikipedia.org/wiki/ASTM_International). As previously noted, the function ```summarize()``` automatically chooses the optimal confidence interval method depending on distribution features. We show below the decision tree flowchart for choosing the optimal confidence interval estimation method, which is based on [Lopez-Sanchez (2020)](https://doi.org/10.1016/j.jsg.2020.104042)."
] }, { @@ -464,7 +464,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The reason why the CLT method applies in this case is that the grain size distribution not enough lognormal-like (note the Shapiro-Wilk test warning with a p-value < 0.05!), and this might cause an inaccurate estimate of the arithmetic mean confidence interval.\n", + "The reason why the CLT method applies in this case is that the grain size distribution not enough lognormal-like (note the Shapiro-Wilk test warning with a p-value < 0.05), and this might cause an inaccurate estimate of the arithmetic mean confidence interval.\n", "\n", "Now, let's focus on the different options of the ``summarize()`` method." ] diff --git a/grain_size_tools/example_notebooks/paleopiezometry_examples.ipynb b/grain_size_tools/example_notebooks/paleopiezometry_examples.ipynb index 048c2d9..9f1ed7e 100644 --- a/grain_size_tools/example_notebooks/paleopiezometry_examples.ipynb +++ b/grain_size_tools/example_notebooks/paleopiezometry_examples.ipynb @@ -57,11 +57,11 @@ "\n", "For the second requirement, the function will automatically convert the equivalent circular diameter to linear intercepts where applicable using de Hoff and Rhines (1968) correction. This is, **you don't have to worry about whether the piezometer was originally calibrated using linear intercepts**, always use the equivalent circular diameters in microns.\n", "\n", - "The third requirement is key for a correct estimation of the differential stress since each paleopiezometer was calibrated for a specific average grain size (e.g. the arithmetic mean, median, or RMS mean) and, hence, **only provides valid results if the same type of average is used**. Also, **you should not use any type of stereological correction for the estimation of the average grain size**, if the author(s) of the piezometer used any type of stereological correction, the average grain size will be automatically corrected by the function. \n", + "The third requirement is key for a correct estimation of the differential stress since each paleopiezometer was calibrated for a specific average grain size (e.g. the arithmetic mean, median or RMS mean) and, hence, **only provides valid results if the same type of average is used**. Also, **you should not use any type of stereological correction for the estimation of the average grain size**, if the author(s) of the piezometer used any type of stereological correction, the average grain size will be automatically corrected by the function. \n", "\n", "The fourth requirement means that the user has to decide whether to correct or not the estimate of the differential stress for plane stress using the correction factor proposed by Paterson and Olgaard (2000). The rationale for this is that the experiments designed to calibrate piezometers are mainly performed in uniaxial compression while natural shear zones approximately behave as plane stress volumes.\n", "\n", - "In the next subsection we will show examples of how to obtain information about the different piezometers and define these parameters, but let's first load the script." + "In the next subsection, we will show examples of how to obtain information about the different piezometers and define these parameters, but let's first load the script." 
] }, { @@ -143,7 +143,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that we obtain five different outputs separated by commas which correspond with:\n", + "Note the five different outputs separated by commas which correspond with:\n", "- the constant *B* of the piezometric relation\n", "- the exponent *m* of the piezometric relation\n", "- A warning indicating the average to use with this piezometric relation\n", @@ -236,7 +236,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As indicated in the documentation, the ``calc_diffstress()`` requires three (obligatory) inputs: (1) the average grain size in microns, (2) the mineral phase, and (3) the piezometric relation to use. We provide few examples below:" + "As indicated in the documentation, the ``calc_diffstress()`` requires three (obligatory) inputs: (1) the average grain size in microns, (2) the mineral phase, and (3) the piezometric relation to use. We provide a few examples below:" ] }, { @@ -266,7 +266,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The function returns the differential stress (in MPa) plus some relevant information about the corrections made and the type of average expected. Most piezometric calibrations were calibrated using uniaxial compression deformation experiments while in nature most shear zones approximately behaves as plane stress. Due to this, it may be necessary to correct the differential stress value. The ``calc_diffstress()`` allows you to apply the correction proposed by Paterson and Olgaard (2000) for this as follows (note the slighly different value of differential stress):" + "The function returns the differential stress (in MPa) plus some relevant information about the corrections made and the type of average expected. Most piezometric calibrations were calibrated using uniaxial compression deformation experiments while in nature most shear zones approximately behaves as plane stress. Due to this, it may be necessary to correct the differential stress value. The ``calc_diffstress()`` allows you to apply the correction proposed by Paterson and Olgaard (2000) for this as follows (note the slightly different value of differential stress):" ] }, { @@ -367,7 +367,7 @@ "source": [ "## Estimation of the differential stress using arrays of values\n", "\n", - "Alternatively, you can use (numpy) arrays as input to estimate several differential stresses at once. In this case, the ``calc_diffstress()`` function will return a numpy array, so it is generally more useful to store it in a variable as in the example below." + "Alternatively, you can use (NumPy) arrays as input to estimate several differential stresses at once. In this case, the ``calc_diffstress()`` function will return a NumPy array, so it is generally more useful to store it in a variable as in the example below." ] }, { @@ -407,7 +407,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If the set of estimated values belongs to the same structural element (e.g. different areas of the the same mylonite or different rocks within the same shear zone), you may want to estimate an average differential stress from all the data. The GrainSizeTools script provides a method named ``conf_interval()`` for this." + "If the set of estimated values belongs to the same structural element (e.g. different areas of the same mylonite or different rocks within the same shear zone), you may want to estimate the average differential stress from all the data. 
The GrainSizeTools script provides a method named ``conf_interval()`` for this."
 ]
 },
 {
@@ -470,7 +470,7 @@
 "Mean = 167.37 ± 11.41\n",
 "Confidence set at 95.0 %\n",
 "Max / min = 178.79 / 155.96\n",
- "Coefficient of variation = 6.8 %\n"
+ "Coefficient of variation = ±6.8 %\n"
 ]
 },
 {
diff --git a/grain_size_tools/example_notebooks/plot_module_examples.ipynb b/grain_size_tools/example_notebooks/plot_module_examples.ipynb
index 19bd150..279dbdb 100644
--- a/grain_size_tools/example_notebooks/plot_module_examples.ipynb
+++ b/grain_size_tools/example_notebooks/plot_module_examples.ipynb
@@ -6,7 +6,7 @@
 "source": [
 "# The plot module: visualizing grain size distributions\n",
 "\n",
- "The plot module includes a series of plots to visualize and characterize grain populations. Before we get into the details, let's run the GrainSizeTools script and load the example dataset."
+ "The plot module includes a series of plots to visualize and characterize grain size populations. Before we get into the details, let's run the GrainSizeTools script and load the example dataset."
 ]
 },
 {
@@ -301,9 +301,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "As already mentioned, the plot module allows several types of visualizations. All methods of the *plot* module can be invoked by writing ```plot.*```, where * refers to the plot to be used.\n",
+ "All methods of the *plot* module can be invoked by writing ```plot.*```, where * refers to the plot to be used.\n",
 "\n",
- "> 👉 If you write ``plot.`` and then press the tab key a menu will pop up with all the methods implemented in the plot module\n",
+ "> 👉 If you write ``plot.`` and then hit the tab key a menu will pop up with all the methods implemented in the module\n",
 "\n",
 "The main method is ```plot.distribution()```. The method allows to visualize the grain size population using the histogram and/or the kernel density estimate (KDE) and provides the location of the different averages. The simplest example of use would be to pass the column with the diameters as follows:"
 ]
 },
@@ -345,7 +345,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "By default, the ```plot.distribution()``` function returns a plot containing the histogram and the kernel density values of the distribution and the location of the main averages estimated by the function ``summarize`` by default."
+ "By default, the ```plot.distribution()``` function returns a plot containing the histogram and the kernel density values of the distribution and the location of the averages estimated by the function ``summarize``."
 ]
 },
 {
@@ -477,7 +477,7 @@
 "source": [
 "## Testing lognormality\n",
 "\n",
- "Sometimes can be helpful to test whether the data follows or deviates from a lognormal distribution. For example, to find out if the data set is suitable for applying the two-step stereological method or which confidence interval method is best for the arithmetic mean. The script use two methods to test whether the distribution of grain size follows a lognormal distribution. One is a visual method named [quantile-quantile (q-q) plots]([https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot](https://en.wikipedia.org/wiki/Q–Q_plot)) and the other is a quantitative test named the [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro–Wilk_test). For this we use the GrainSizeTools function ```test_lognorm``` as follows :"
+ "Sometimes it can be helpful to test whether the data follows or deviates from a lognormal distribution. For example, to find out if the dataset is suitable for applying the two-step stereological method or which confidence interval method is best. The script uses two methods to test whether the distribution of grain size follows a lognormal distribution. One is a visual method named [quantile-quantile (q-q) plots](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) and the other is a quantitative test named the [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro–Wilk_test). For this, we use the GrainSizeTools function ```test_lognorm``` as follows:"
 ]
 },
 {
@@ -491,9 +491,9 @@
 "text": [
 "=======================================\n",
 "Shapiro-Wilk test (lognormal):\n",
- "0.99, 0.12 (test statistic, p-value)\n",
- "It looks like a lognormal distribution\n",
- "(⌐■_■)\n",
+ "0.98, 0.00 (test statistic, p-value)\n",
+ "It doesnt look like a lognormal distribution (p-value < 0.05)\n",
+ "(╯°□°)╯︵ ┻━┻\n",
 "=======================================\n"
 ]
 },
 {
@@ -525,6 +525,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+ "The Shapiro-Wilk test returns two values: the test statistic and the p-value. This test considers the distribution to be lognormally distributed when the p-value is greater than 0.05. The q-q plot is a visual test: when the points fall right onto the reference line, the grain size values are lognormally distributed. The q-q plot has the advantage over the Shapiro-Wilk test that it shows where the distribution deviates from lognormality. \n",
+ "\n",
+ "> 👉 To learn more about the q-q plot, see https://serialmentor.com/dataviz/\n",
+ "\n",
+ "---\n",
+ "\n",
 "## The area-weighted distribution\n",
 "\n",
 "The plot module also allows plotting the area-weighted distribution of grain sizes using the function ``area_weighted()``. This function also returns some basic statistics such as the area-weighted mean and the histogram features. For example:"
 ]
 },
@@ -580,7 +586,7 @@
 "source": [
 "> 👉 ***When to use and not to use the area-weighted approach?***\n",
 ">\n",
- "> You **should not use** the area-weighted mean for the calibration of paleopiezometers or for the comparison of grain size populations, as this is a poorly optimised central tendency measure ([Lopez-Sanchez, 2020](https://doi.org/10.1016/j.jsg.2020.104042)). On the other hand, the area-weighted distribution is useful to visualize...TODO\n",
+ "> You **should not use** the area-weighted mean for the calibration of paleopiezometers or for the comparison of grain size populations, as this is a poorly optimised central tendency measure ([Lopez-Sanchez, 2020](https://doi.org/10.1016/j.jsg.2020.104042)). On the other hand, the area-weighted distribution is useful to visualize the coarser size range, since in number-weighted distributions these sizes are diluted but can represent a significant area or volume.\n",
 "\n",
 "## Normalized grain size distributions\n",
 "\n",
@@ -614,7 +620,7 @@
 }
 ],
 "source": [
- "fig4, ax = plot.normalized(dataset['diameters'])"
+ "fig4, ax = plot.normalized(dataset['diameters'], avg='amean')"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Let's play by changing some of the function parameters. In this case, we are going to establish the median as an alternative normalization factor, and we are also going to smooth the kernel density estimator by increaing the value from 0.04 (estimated according to the Silverman rule) to 0.1. 
In addition, we will set the appearance of the figure using the figsize parameter, where the values within the parentheses are the (width, height) in inches." + "Let's play by changing some of the function parameters. In this case, we are going to establish the median as an alternative normalization factor, and we are also going to smooth the kernel density estimator by increasing the value from 0.04 (estimated according to the Silverman rule) to 0.1. Also, we will set the appearance of the figure using the figsize parameter, where the values within the parentheses are the (width, height) in inches." ] }, { @@ -652,7 +658,7 @@ "data": { "text/plain": [ "(
,\n",
 " )"
 ]
 },
 "execution_count": 12,
diff --git a/grain_size_tools/example_notebooks/stereology_module_examples.ipynb b/grain_size_tools/example_notebooks/stereology_module_examples.ipynb
index 4f1a2df..5819d35 100644
--- a/grain_size_tools/example_notebooks/stereology_module_examples.ipynb
+++ b/grain_size_tools/example_notebooks/stereology_module_examples.ipynb
@@ -6,7 +6,9 @@
 "source": [
 "# The stereology module\n",
 "\n",
- "TODO"
+ "The main purpose of stereology is to extract quantitative information from microscope images. It is a set of mathematical methods relating two-dimensional measures obtained on sections to three-dimensional parameters defining the structure. Note that the aim of stereology is not to reconstruct the 3D geometry of the material (as in tomography) but to estimate a particular 3D feature. In this particular case, the aim is to approximate the actual (3D) grain size distribution from the apparent (2D) grain size distribution obtained in sections. \n",
+ "\n",
+ "The GrainSizeTools script includes two stereological methods: 1) the Saltykov method, and 2) the two-step method. Before looking at their functionalities, applications and limitations, let's run the script and import the example dataset."
 ]
 },
 {
@@ -43,26 +45,27 @@
 "%run C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/GrainSizeTools_script.py"
 ]
 },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## The Saltykov method\n",
- "\n",
- "TODO"
- ]
- },
 {
 "cell_type": "code",
 "execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
+ "# Import the example dataset\n",
 "filepath = 'C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/DATA/data_set.txt'\n",
- "dataset = pd.read_csv(filepath, sep='\\t') # Import the example dataset\n",
+ "dataset = pd.read_csv(filepath, sep='\\t')\n",
 "dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi) # estimate ECD"
 ]
 },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## The Saltykov method\n",
+ "\n",
+ "TODO: explain functionalities, applications and limitations"
+ ]
+ },
 {
 "cell_type": "code",
 "execution_count": 3,
 "metadata": {},
@@ -110,7 +113,7 @@
 "source": [
 "## The two-step method\n",
 "\n",
- "TODO"
+ "TODO: explain functionalities, applications and limitations"
 ]
 },
 {