first version of joss paper, scipy implementation geometric distribution
1 parent 3ae4f00, commit 8d64800
Showing 14 changed files with 248 additions and 123 deletions.
Binary file modified (+482 Bytes, 100%): ...ntation/continuous/document_continuous_distributions/phitter_continuous_distributions.pdf
Binary file not shown.
@@ -0,0 +1,51 @@
@article{marsaglia2004evaluating,
    title = {Evaluating the {Anderson-Darling} distribution},
    author = {Marsaglia, George and Marsaglia, John},
    journal = {Journal of Statistical Software},
    volume = {9},
    pages = {1--5},
    year = {2004}
}

@book{walck1996hand,
    title = {Hand-book on statistical distributions for experimentalists},
    author = {Walck, Christian and others},
    year = {1996},
    publisher = {Stockholms universitet}
}

@article{george2011estimation,
    title = {Estimation of parameters of {Johnson's} system of distributions},
    author = {George, Florence and Ramachandran, KM},
    journal = {Journal of Modern Applied Statistical Methods},
    volume = {10},
    pages = {494--504},
    year = {2011}
}

@article{sinclair1988approximations,
    title = {Approximations to the distribution function of the {Anderson-Darling} test statistic},
    author = {Sinclair, CD and Spurr, BD},
    journal = {Journal of the American Statistical Association},
    volume = {83},
    number = {404},
    pages = {1190--1191},
    year = {1988},
    publisher = {Taylor \& Francis}
}

@book{mclaughlin2001compendium,
    title = {A compendium of common probability distributions},
    author = {McLaughlin, Michael P},
    year = {2001},
    publisher = {Michael P. McLaughlin}
}

@article{lewis1961distribution,
    title = {Distribution of the {Anderson-Darling} statistic},
    author = {Lewis, Peter AW},
    journal = {The Annals of Mathematical Statistics},
    pages = {1118--1124},
    year = {1961},
    publisher = {JSTOR}
}
@@ -0,0 +1,73 @@
---
title: "Phitter: A Python Library for Probability Distribution Fitting and Analysis"
tags:
  - Python
  - Statistics
  - probability distributions
  - data analysis
  - machine learning
  - simulation
  - monte carlo
authors:
  - name: Sebastián José Herrera Monterrosa
    orcid: 0009-0002-2766-642X
    affiliation: 1
affiliations:
  - name: Pontificia Universidad Javeriana
    index: 1
date: 26 March 2024
bibliography: paper.bib
---

# Summary

Phitter is a Python library designed to analyze datasets and determine the best analytical probability distributions that represent them. It provides a comprehensive suite of tools for fitting and analyzing over 80 probability distributions, both continuous and discrete. Phitter implements three goodness-of-fit tests and offers interactive visualizations to aid in the analysis process. For each selected probability distribution, Phitter provides a standard modeling guide along with detailed spreadsheets that outline the methodology for using the chosen distribution in various fields such as data science, operations research, and artificial intelligence.

# Statement of Need

In the fields of data science, statistics, and machine learning, understanding the underlying probability distributions of datasets is crucial for accurate modeling and prediction. However, identifying the most appropriate distribution for a given dataset can be a complex and time-consuming task. Phitter addresses this need by providing a user-friendly, efficient, and comprehensive tool for probability distribution fitting and analysis.

Phitter stands out from existing tools by offering:

1. A wide range of over 80 probability distributions, including both continuous and discrete options.
2. Implementation of multiple goodness-of-fit tests (Chi-Square, Kolmogorov-Smirnov, and Anderson-Darling).
3. Interactive visualizations for better understanding and interpretation of results.
4. Accelerated fitting capabilities for large datasets (over 100K samples).
5. Detailed modeling guides and spreadsheets for practical application in various fields.

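To make the role of a goodness-of-fit test concrete, the following is a minimal sketch of how a Kolmogorov-Smirnov statistic can separate a plausible candidate distribution from a poor one. It uses only the Python standard library and is not Phitter's implementation: the helper `ks_statistic`, the synthetic samples, and the choice of a normal candidate fitted by sample mean and standard deviation are all assumptions made for illustration.

```python
import random
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of `data` and a candidate `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Synthetic samples (illustrative only): one roughly normal, one strongly skewed.
rng = random.Random(42)
normal_data = [rng.gauss(10.0, 2.0) for _ in range(1000)]
skewed_data = [rng.expovariate(1.0) for _ in range(1000)]

# Fit the same candidate family (normal) to each sample and compare the statistics.
d_normal = ks_statistic(normal_data, NormalDist.from_samples(normal_data).cdf)
d_skewed = ks_statistic(skewed_data, NormalDist.from_samples(skewed_data).cdf)

print(f"KS statistic, normal sample: {d_normal:.4f}")
print(f"KS statistic, skewed sample: {d_skewed:.4f}")
```

A smaller statistic indicates a closer match between the data and the candidate, which is the kind of quantity a fitting tool can use to rank distributions.
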
# Features and Functionality

Phitter offers a range of features designed to streamline the process of probability distribution analysis:

- **Flexible Fitting**: Users can fit both continuous and discrete distributions to their data.
- **Customizable Analysis**: Options to specify the number of bins, confidence level, and distributions to fit.
- **Parallel Processing**: Support for multi-threaded fitting to improve performance.
- **Comprehensive Output**: Detailed summaries of fitted distributions, including parameters, test statistics, and rankings.
- **Visualization Tools**: Functions to plot histograms, PDFs, ECDFs, and Q-Q plots for visual analysis.
- **Distribution Utilities**: Methods to work with individual distributions, including CDF, PDF, PPF, and sampling functions.

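The distribution utilities listed above (PDF, CDF, PPF, and sampling) can be illustrated with a short sketch. It relies on `statistics.NormalDist` from the Python standard library as a stand-in, since Phitter's own method names are not shown here; the parameter values are arbitrary assumptions.

```python
from statistics import NormalDist

# Stand-in distribution with assumed parameters (mean 50, standard deviation 5).
dist = NormalDist(mu=50.0, sigma=5.0)

density_at_mean = dist.pdf(50.0)   # PDF: density at a point
prob_below_55 = dist.cdf(55.0)     # CDF: P(X <= 55), one sigma above the mean
p90 = dist.inv_cdf(0.90)           # PPF (inverse CDF): the 90th percentile
samples = dist.samples(5, seed=1)  # Sampling: five reproducible random draws

print(f"pdf(50)   = {density_at_mean:.4f}")
print(f"cdf(55)   = {prob_below_55:.4f}")
print(f"ppf(0.90) = {p90:.3f}")
print(f"samples   = {[round(s, 2) for s in samples]}")
```

The PPF and the CDF are inverses of each other, so `dist.cdf(p90)` recovers 0.90 up to floating-point error.
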
# Implementation and Usage

Phitter is implemented in Python and is available via PyPI. It requires Python 3.9 or higher. The library can be easily installed using pip:

```
pip install phitter
```

Basic usage involves creating a `PHITTER` object with a dataset and calling the `fit()` method:

```python
import phitter

data = [...] # Your dataset
phi = phitter.PHITTER(data)
phi.fit()
```

More advanced usage allows for customization of fitting parameters and specific distribution analysis.

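The number of bins is one fitting parameter whose effect is easy to show. The sketch below computes a Pearson chi-square statistic over equal-width bins using only the standard library. It is an illustration under stated assumptions, not Phitter's code: the function `chi_square_statistic`, its `num_bins` argument, and the synthetic samples are hypothetical names and data chosen for this example.

```python
import random
from statistics import NormalDist


def chi_square_statistic(data, dist, num_bins):
    """Pearson chi-square statistic of `data` against `dist` on equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    observed = [0] * num_bins
    for x in data:
        # Clamp the index so that the sample maximum lands in the last bin.
        observed[min(int((x - lo) / width), num_bins - 1)] += 1

    n = len(data)
    statistic = 0.0
    for i, count in enumerate(observed):
        # Treat the two outer bins as open-ended so expected counts sum to n.
        p_left = 0.0 if i == 0 else dist.cdf(lo + i * width)
        p_right = 1.0 if i == num_bins - 1 else dist.cdf(lo + (i + 1) * width)
        expected = n * (p_right - p_left)
        if expected > 0:
            statistic += (count - expected) ** 2 / expected
    return statistic


# Synthetic samples (illustrative only).
rng = random.Random(7)
normal_data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
skewed_data = [rng.expovariate(1.0) for _ in range(1000)]

# The same normal candidate family is scored against both samples.
chi2_normal = chi_square_statistic(
    normal_data, NormalDist.from_samples(normal_data), num_bins=10
)
chi2_skewed = chi_square_statistic(
    skewed_data, NormalDist.from_samples(skewed_data), num_bins=10
)

print(f"chi-square, normal sample: {chi2_normal:.1f}")
print(f"chi-square, skewed sample: {chi2_skewed:.1f}")
```

Changing `num_bins` changes both the observed counts and the expected counts, so the statistic, and potentially the ranking of candidate distributions, depends on that choice.
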
# Conclusion

Phitter provides researchers, data scientists, and statisticians with a powerful tool for probability distribution analysis. By offering a comprehensive set of distributions, multiple goodness-of-fit tests, and interactive visualizations, Phitter simplifies the process of identifying and working with probability distributions in various data-driven fields.

# References
@@ -1,4 +1,4 @@
-__version__ = "0.0.8"
+__version__ = "0.7.1"

 from .main import PHITTER
 from phitter import continuous
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "phitter"
-version = "0.0.8"
+version = "0.7.1"
 description = "Find the best probability distribution for your dataset"
 authors = [{name = "Sebastián José Herrera Monterrosa", email = "[email protected]"}]
 readme = "README.md"
@@ -36,7 +36,8 @@ dependencies = [
     "scipy>=1.1.0",
     "plotly>=5.14.0",
     "kaleido>=0.2.1",
-    "matplotlib>=3.3"
+    "matplotlib>=3.3",
+    "pandas>=1.5.0"
 ]

 [project.urls]
6 files renamed without changes.