Serial No. | Target Audience |
---|---|
1 | Scientific Research |
2 | Health & Medical Laboratories |
3 | Data Scientists |
4 | Educational Institutions |
5 | Statisticians |
6 | Natural Mathematics Researchers |
7 | Information System Analysts |
8 | Software Developers |
9 | Finance & Accounting Professionals |
10 | Machine Learning Engineers |
11 | Market Analysts |
12 | Economists |
13 | Business Intelligence Analysts |
14 | Operations Researchers |
15 | Environmental Scientists |
16 | Policy Analysts |
17 | Social Science Researchers |
18 | Clinical Data Managers |
19 | Actuaries |
20 | Product Managers |
My name is Sameer, a data scientist and software developer. I have designed this application as a foundation for others to solve data-related problems across fields such as scientific research, finance, and health. Through this application, you can learn how data is collected, cleaned, analyzed, and interpreted to derive meaningful insights.
Many organizations and researchers struggle to analyze vast amounts of data efficiently. Traditional methods can be time-consuming and often require specialized knowledge in both statistics and programming. The challenge is to provide an accessible tool that leverages data science and statistical techniques to automate data analysis tasks for various applications, including predictive modeling, trend analysis, and hypothesis testing.
The main objective of this project is to provide knowledge about statistical and machine learning models, demonstrating how scientific computing and programming can be used to automate complex analysis tasks. By making these techniques accessible, users can enhance their decision-making processes and generate insights more effectively.
- Data Collection: Data is gathered from relevant sources depending on the field of application (e.g., health records, financial data, survey data).
- Data Cleaning: The data undergoes preprocessing to handle missing values, correct inaccuracies, and transform data types for accurate analysis.
- Data Analysis: Using descriptive and inferential statistical methods, key patterns and trends are identified within the dataset.
- Model Development: Machine learning models, including regression and classification models, are developed to predict outcomes and identify patterns.
- Visualization: Interactive visualizations such as histograms, ogives, and scatter plots help in the intuitive understanding of results.
- Interpretation: Insights are derived from the results, helping users make data-driven decisions relevant to their field of interest.
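As a concrete illustration of the collection and cleaning steps above, here is a minimal pandas sketch; the file name `records.csv` and the `age` column are placeholders for illustration, not files shipped with the project:

```python
import pandas as pd

# Placeholder dataset; any tabular source with a numeric column works the same way.
df = pd.read_csv("records.csv")

df = df.drop_duplicates()                                # remove duplicate rows
df["age"] = pd.to_numeric(df["age"], errors="coerce")    # fix inconsistent types
df = df.dropna(subset=["age"])                           # handle missing values

# Descriptive statistics give a first view of patterns and trends.
print(df["age"].describe())
```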
S.No | Topic | Description |
---|---|---|
1 | COVARIANCE | Measure of the joint variability of two variables. |
2 | ADVANCED MULTIVARIATE REGRESSION | Regression techniques involving multiple predictors and response variables. |
3 | TRENDS BY GEO-REFERENCING | Analyze data trends based on geographic information. |
4 | DESCRIPTIVE STATISTICS ANALYTICS | Summary and analysis of data with central tendency, dispersion, etc. |
5 | MULTIPLE REGRESSION ANALYSIS | Model the relationship between one dependent variable and multiple independent variables. |
6 | SALES TRENDS BY DATE RANGE | Analyze sales patterns over a specified time period. |
7 | BUSINESS TARGET BY PROGRESS | Evaluate business performance relative to targets. |
8 | INTERACTIVE VISUALIZATION GRAPHS | Dynamic and user-interactive data visualizations. |
9 | STATISTICS FOR GROUPED DATA | Statistical analysis where data is organized into groups or intervals. |
10 | STATISTICS FOR UNGROUPED DATA | Statistical analysis of raw, ungrouped data values. |
11 | ADVANCED PYTHON QUERY | Techniques for complex data querying using Python. |
12 | OUTLIER DETECTION TECHNIQUES | Methods for identifying abnormal data points in datasets. |
13 | HYPOTHESIS TESTING | Statistical method to test assumptions or claims about a population. |
14 | FREQUENCY DISTRIBUTION | Representation of data showing the number of observations within intervals. |
15 | NORMAL DISTRIBUTIONS | Bell-shaped distribution that is symmetrical about the mean. |
16 | PROBABILITY DISTRIBUTIONS | Function that shows the likelihood of different outcomes in an experiment. |
17 | LOGISTIC REGRESSION | Model for estimating the probabilities of binary outcomes. |
18 | ESTIMATION OF POPULATION | Inference of population parameters based on sample data. |
19 | PROBABILITY DENSITY | Function describing the likelihood of a continuous random variable's outcome. |
- Data Source: Loads dataset from a CSV file for analysis.
- Purpose: Creates age intervals (e.g., 0-10, 11-20) and labels for categorizing age data into discrete groups.
- Purpose: Generates a table counting occurrences within each age interval, facilitating grouped data analysis.
- Purpose: Calculates essential statistics for grouped data, aiding in understanding data distribution:
- Mean: Computes the weighted average midpoint of age intervals.
- Mode: Identifies the most frequent age interval.
- Median: Uses the cumulative frequencies to locate the interval containing the middle observation.
- Variance and Standard Deviation: Measures the spread of data points around the mean.
- Skewness and Kurtosis: Assesses the symmetry and peakedness of the data distribution.
- Interquartile Range (IQR): Calculates the spread between the first and third quartiles.
- Standard Error: Measures the precision of the sample mean.
- Purpose: Displays key grouped data statistics (mean, median, mode, etc.) to the user in an interactive dashboard.
- Purpose: Plots a normal distribution curve to visualize data symmetry, with an annotation for skewness, allowing for a visual assessment of data distribution shape.
- Purpose: Presents a frequency table with cumulative frequencies, providing insights into data distribution across age intervals.
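The grouped-data statistics above can be sketched with pandas and NumPy roughly as follows; the CSV path `grouped_data.csv` and the `age` column are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("grouped_data.csv")                     # placeholder file name
bins = range(0, 101, 10)                                 # age intervals of width 10
df["age_group"] = pd.cut(df["age"], bins=bins)

freq = df["age_group"].value_counts().sort_index()       # frequency table per interval
midpoints = np.array([interval.mid for interval in freq.index])

mean = np.average(midpoints, weights=freq)               # weighted mean of interval midpoints
variance = np.average((midpoints - mean) ** 2, weights=freq)
std_dev = np.sqrt(variance)
mode_interval = freq.idxmax()                            # most frequent interval

print(freq)
print("Mean:", mean, "Std dev:", std_dev, "Modal interval:", mode_interval)
```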
- Data Loading and Selection
  - Loads data from an Excel file (`data.xlsx`) and uses it for analytical processing.
  - Allows users to filter data by `Region`, `Location`, and `Construction` fields for customized analysis.
- Descriptive Analytics
  - Computes key summary statistics such as Sum, Mode, Mean, and Median for the `Investment` column.
  - Displays these metrics in the Streamlit interface for easy visualization.
- Data Visualization
  - Histograms: Visualize the frequency distribution of variables in the dataset.
  - Bar Chart: Shows investments by `BusinessType`, providing a breakdown of investments by type.
  - Line Chart: Visualizes investments by `State`, showing trends across different states.
  - Pie Chart: Represents `Ratings` by `Region`, showing the proportion of ratings for each region.
- Target Tracking and Progress Bar
  - Defines a target for investment and calculates the current percentage toward this target.
  - Provides a progress bar to visually represent how close the current investment is to the target.
- Quartile Analysis
  - Uses a box plot to analyze the distribution of `Investment` by `BusinessType`, displaying quartiles and helping identify outliers.
- User Interface with Interactive Elements
  - Includes an interactive sidebar with options to navigate between different views (`Home`, `Progress`).
  - Enables selection of quantitative features for exploring distributions and trends.
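A minimal Streamlit sketch of the filtering, descriptive metrics, and progress bar described above might look like this; the column names follow the list, while the target value is an assumption:

```python
import pandas as pd
import streamlit as st

df = pd.read_excel("data.xlsx")

# Sidebar filter on Region; Location and Construction would follow the same pattern.
region = st.sidebar.multiselect("Region", options=df["Region"].unique())
filtered = df[df["Region"].isin(region)] if region else df

col1, col2, col3, col4 = st.columns(4)
col1.metric("Sum", f"{filtered['Investment'].sum():,.0f}")
col2.metric("Mean", f"{filtered['Investment'].mean():,.0f}")
col3.metric("Median", f"{filtered['Investment'].median():,.0f}")
col4.metric("Mode", f"{filtered['Investment'].mode().iloc[0]:,.0f}")

target = 3_000_000  # assumed investment target, not a value from the project
st.progress(min(filtered["Investment"].sum() / target, 1.0))
```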
- Data Loading and Cleaning:
  - Reads data from an Excel file (`hypothesis.xlsx`).
  - Drops unnecessary columns to focus on relevant fields for hypothesis testing.
- Hypothesis Formulation:
  - Defines null and alternative hypotheses for comparing the mean revenues of Group A and Group B.
- Confidence Level Setup:
  - Sets a confidence level of 95% for statistical significance in hypothesis testing.
- T-Test for Independent Samples:
  - Conducts a t-test to compare the means of two independent groups (Group A and Group B).
  - Calculates the t-statistic and p-value for hypothesis evaluation.
- Sample Statistics Calculation:
  - Computes and displays the sample mean and standard deviation for both groups.
  - Confirms the sample sizes and uses the t-distribution, which is appropriate for small samples (n < 30).
- Critical Value Determination:
  - Calculates the critical value based on the confidence level and sample size.
- T-Distribution Curve Generation:
  - Generates a probability density curve for visualizing the t-distribution.
- Decision-Making:
  - Compares the computed t-statistic with the critical value to decide whether to reject the null hypothesis.
- Visualization of Results:
  - Displays the t-distribution curve with the critical value, t-statistic, and rejection region annotated.
  - Uses visual aids (vertical lines, filled regions) to highlight decision boundaries and critical regions.
- Summary Metrics Display:
  - Shows computed and critical values in a dashboard format.
  - Presents the sample size and statistical metrics in a well-organized layout using Streamlit components.
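The t-test workflow above can be reproduced with SciPy along these lines; the revenue figures are made-up illustrative samples:

```python
import numpy as np
from scipy import stats

group_a = np.array([120, 135, 128, 140, 132, 125, 138])   # illustrative revenues
group_b = np.array([118, 122, 130, 119, 127, 121, 124])

# Independent-samples t-test (assuming equal variances).
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05                                               # 95% confidence level
dof = len(group_a) + len(group_b) - 2
t_critical = stats.t.ppf(1 - alpha / 2, dof)               # two-tailed critical value

reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, critical = {t_critical:.3f}, reject H0: {reject_h0}")
```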
- Data Source: Loaded from a CSV file (`advanced_regression.csv`).
- Feature Columns: `interest_rate`, `unemployment_rate`, `index_price`.
- Filtering: Data is filtered based on user-selected year and month.
- Used `sns.regplot` to visually explore relationships between features.
- Calculated and displayed the correlation matrix for the variables.
- Regression plots show the relationships between `interest_rate` and `unemployment_rate`, and between `interest_rate` and `index_price`.
- Box plots detect outliers in the dataset.
- Displayed histograms for variable frequency distributions.
- Used `sns.pairplot` to examine pairwise relationships.
- Checked for missing values and displayed the count of `NaN` entries in each column.
- Provided descriptive statistics (mean, standard deviation, etc.) for each variable.
- Split the data into training and testing sets using `train_test_split`.
- Applied standardization using `StandardScaler` to scale features.
- Built a linear regression model using `LinearRegression` from `sklearn`.
- Used cross-validation to evaluate model performance.
- Predicted the target variable (`index_price`) on the test dataset.
- Calculated and displayed the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
- Calculated and displayed the R² and Adjusted R² values for model performance.
- Computed residuals and visualized them with a normal distribution curve to check the error distribution.
- Used OLS (Ordinary Least Squares) regression from `statsmodels` to obtain detailed model insights, including coefficients and p-values.
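A condensed sketch of the modelling steps listed above, assuming `advanced_regression.csv` contains the three columns named earlier:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("advanced_regression.csv")
X = df[["interest_rate", "unemployment_rate"]]
y = df["index_price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)                    # standardize the features
model = LinearRegression().fit(scaler.transform(X_train), y_train)

y_pred = model.predict(scaler.transform(X_test))
mse = mean_squared_error(y_test, y_pred)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mse, "RMSE:", np.sqrt(mse))
print("R²:", model.score(scaler.transform(X_test), y_test))
print("CV R²:", cross_val_score(model, scaler.transform(X), y, cv=5).mean())
```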
- Data Loading: Loads data from an Excel file, allowing for further statistical operations and visualizations.
- Feature Selection: Provides a selector for the `X` variable, enabling dynamic analysis of various numerical features against the target variable.
- Statistical Model Fitting: Fits an Ordinary Least Squares (OLS) regression model to examine the relationship between the selected `X` feature and the target variable (`Projects`).
- Key Statistical Metrics Calculation:
  - Intercept: Displays the intercept term of the model, representing the baseline effect on `Projects`.
  - R-Squared: Shows the R-squared value, providing insight into the model's explanatory power.
  - Adjusted R-Squared: Adjusts for the number of predictors to gauge model fit accuracy.
  - Standard Error: Provides the standard error, indicating the precision of the intercept estimate.
- Predictions and Residuals Calculation: Calculates model predictions and residuals for further analysis.
- Data Visualization:
  - Line of Best Fit Plot: Generates a scatter plot with a line of best fit to visualize the relationship between the selected `X` feature and `Projects`, assessing the model fit visually.
  - Grid and Border Customization: Customizes plot appearance for better interpretability.
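The OLS fit described above can be sketched with `statsmodels` as follows; the Excel file name is an assumption, and `Dependant` stands in for whichever `X` feature the user selects:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("regression.xlsx")           # assumed file name

X = sm.add_constant(df["Dependant"])            # add the intercept term
model = sm.OLS(df["Projects"], X).fit()

print(model.params)                             # intercept and slope
print("R²:", model.rsquared, "Adjusted R²:", model.rsquared_adj)
print(model.bse)                                # standard errors of the estimates

predictions = model.predict(X)
residuals = df["Projects"] - predictions        # residuals for further analysis
```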
- Data Loading
  - Loads the dataset from a CSV file for analysis.
- Quartile and IQR Calculation
  - Calculates the 1st Quartile (Q1), 3rd Quartile (Q3), and Interquartile Range (IQR) to understand the spread of the dataset.
- Basic Statistics Computation
  - Determines minimum, maximum, and median values to summarize the dataset's range and central tendency.
- Ogives Plotting
  - Generates Less Than and Greater Than ogives to visualize the cumulative frequency distribution.
  - Adds a vertical line and annotation at the median value to highlight central tendency in the plot.
- Display Statistics in Streamlit Dashboard
  - Displays quartiles, IQR, min, max, and median values in an interactive layout for user insights.
  - Applies styling to metrics for improved readability and visual appeal.
- Interactive Visualization
  - Presents the ogives plot in Streamlit to allow for intuitive data exploration.
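A rough sketch of the ogive construction above; the CSV path and `age` column are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

values = pd.read_csv("grouped_data.csv")["age"]          # placeholder file and column
counts, edges = np.histogram(values, bins=10)

less_than = np.insert(np.cumsum(counts), 0, 0)           # cumulative "less than" frequencies
greater_than = np.append(np.cumsum(counts[::-1])[::-1], 0)  # cumulative "greater than" frequencies

plt.plot(edges, less_than, marker="o", label="Less Than Ogive")
plt.plot(edges, greater_than, marker="o", label="Greater Than Ogive")
plt.axvline(values.median(), linestyle="--", label="Median")
plt.xlabel("Age")
plt.ylabel("Cumulative frequency")
plt.legend()
plt.show()
```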
- Data Loading and Selection
  - Loads data from an Excel file (`data.xlsx`) and uses it for analytical processing.
  - Allows users to filter data by `Region`, `Location`, and `Construction` fields for customized analysis.
- Descriptive Analytics
  - Computes key summary statistics such as Sum, Mode, Mean, and Median for the `Investment` column.
  - Displays these metrics in the Streamlit interface for easy visualization.
- Data Visualization
  - Histograms: Visualize the frequency distribution of variables in the dataset.
  - Bar Chart: Shows investments by `BusinessType`, providing a breakdown of investments by type.
  - Line Chart: Visualizes investments by `State`, showing trends across different states.
  - Pie Chart: Represents `Ratings` by `Region`, showing the proportion of ratings for each region.
- Target Tracking and Progress Bar
  - Defines a target for investment and calculates the current percentage toward this target.
  - Provides a progress bar to visually represent how close the current investment is to the target.
- Quartile Analysis
  - Uses a box plot to analyze the distribution of `Investment` by `BusinessType`, displaying quartiles and helping identify outliers.
- User Interface with Interactive Elements
  - Includes an interactive sidebar with options to navigate between different views (`Home`, `Progress`).
  - Enables selection of quantitative features for exploring distributions and trends.
- The dashboard loads an Excel dataset (`regression.xlsx`) containing information on `Dependant`, `Wives`, and `Projects`.
- Extracts the independent variables (`Dependant` and `Wives`) and the dependent variable (`Projects`) for use in regression analysis.
- A Linear Regression model is trained on the dataset using `Dependant` and `Wives` to predict `Projects` (the dependent variable).
- Predictions are made using the trained model and stored for further analysis.
- The Intercept (B0) and Coefficients (B1, B2) for the independent variables are calculated and displayed. These represent the linear relationship between the predictors and the dependent variable.
- R-squared (R²): Measures the proportion of variance in the dependent variable explained by the independent variables.
- Adjusted R-squared: Adjusts R² for the number of predictors in the model, penalizing predictors that do not improve the fit.
- Sum of Squared Errors (SSE): Calculates the total error between the predicted and actual values.
- Sum of Squared Regression (SSR): Measures the variation explained by the model.
- Displays a table with the actual and predicted `Projects` (Y) values, along with the SSE and SSR values for each data point.
- Residuals: The difference between the actual and predicted values of `Projects` is calculated.
- A scatter plot of the residuals versus the predicted values is displayed to visualize model fit.
- A Kernel Density Estimation (KDE) plot of the residuals is shown to analyze their distribution.
- Users can input new values for `Dependant` and `Wives` in a sidebar form.
- Upon submission, the model predicts the number of `Projects` for the provided inputs and displays the result.
- The user can download the dataset with the actual values, predicted values, SSE, and SSR as a CSV file.
- Regression Line and Scatter Plot: Visualizes the relationship between actual and predicted values, including the line of best fit.
- Residual Plot: Shows the distribution of residuals using a KDE plot.
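The two-predictor regression and its error decomposition can be sketched as follows, assuming `regression.xlsx` contains the columns named above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_excel("regression.xlsx")
X = df[["Dependant", "Wives"]]
y = df["Projects"]

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

print("B0 (intercept):", model.intercept_)
print("B1, B2 (coefficients):", model.coef_)

sse = np.sum((y - y_pred) ** 2)               # error not explained by the model
ssr = np.sum((y_pred - y.mean()) ** 2)        # variation explained by the model
r2 = ssr / (ssr + sse)
n, k = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print("SSE:", sse, "SSR:", ssr, "R²:", r2, "Adjusted R²:", adj_r2)
```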
- The application uses an Excel file (`normal_distr.xlsx`) to load the dataset, which contains student marks.
- The data is cleaned by extracting the 'Marks' column for analysis.
- A slider is created for users to select an X value from the data range (min, max, mean).
- Mean & Standard Deviation: The application calculates the population mean and standard deviation of the marks.
- Z-Score Calculation: The Z-score is calculated using the formula `Z = (X - Mean) / Standard Deviation`, where `X` is the user-selected value.
- Probability Calculation: The cumulative distribution function (CDF) for the Z-score is computed using the normal distribution.
- Standard Normal Distribution Curve: A line plot of the standard normal distribution (`Z ~ N(0, 1)`) is generated using Plotly.
  - A red marker indicates the selected Z-score value.
  - The shaded area on the graph represents the probability for the selected Z-score value.
- Standardized Marks Distribution: A plot shows the probability distribution of the standardized marks.
- Probability of Selected X: Another plot shows the probability density associated with the selected X value.
- The application standardizes the marks (i.e., converts the marks into Z-scores) for comparison across datasets.
- The standardized marks are added as a new column in the dataset.
- A Z-table is generated that maps Z-scores to their corresponding cumulative probabilities.
- The table allows the user to quickly reference the probability associated with different Z-scores.
- Filters: The user can filter the data using a multiselect dropdown for columns such as "fullname", "gender", "Marks", "Probability", and "Standardized Marks".
- PDF Download: The Z-table can be downloaded as a PDF file for further use or offline reference.
- The sidebar allows the user to interact with the X value slider and see the corresponding changes in the graph and statistics.
- Various interactive graphs display the probability distributions and Z-score information dynamically.
- The application offers insights such as the probability of the selected X value, the Z-score, and the standard deviation, helping users understand the statistical significance of their data.
- The output is displayed in a structured layout with expandable sections for viewing different analyses:
  - Estimation Parameters
  - Normal Curves
  - Standardized Student Marks Table
  - Z Table
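The Z-score and probability calculations above reduce to a few lines with SciPy; the file name follows the list, while the example selection `x = 65` is an assumption:

```python
import pandas as pd
from scipy.stats import norm

marks = pd.read_excel("normal_distr.xlsx")["Marks"]
mu, sigma = marks.mean(), marks.std(ddof=0)      # population mean and standard deviation

x = 65                                           # example user-selected value
z = (x - mu) / sigma                             # Z = (X - Mean) / Standard Deviation
probability = norm.cdf(z)                        # cumulative probability P(X <= x)

standardized = (marks - mu) / sigma              # standardized marks column
print("Z-score:", z, "Probability:", probability)
```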
In this page, we are performing a population estimation based on a sample dataset containing ages. The analysis involves calculating sample statistics and confidence intervals for the population mean and standard deviation. The critical steps and results are presented below, with visualizations to enhance the understanding of the statistical concepts.
Key Data Science and Statistical Concepts Used
The data is loaded from a CSV file, and the `age` column is extracted for statistical analysis.
- Sample Size (n): The number of entries in the `age` column.
- Sample Mean: The average age in the sample.
- Sample Standard Deviation: The measure of variability in the sample.
- Population Size (N): The total number of individuals in the population (set to 1000 in this case).
- Confidence Level (95%): The level of certainty we have in our estimation.
- Population Mean Confidence Interval: A range within which the true population mean is likely to lie, calculated using the sample mean and sample standard deviation.
- Population Standard Deviation Confidence Interval: A range within which the true population standard deviation is likely to lie, calculated using the sample's chi-square distribution.
This metric is used to estimate the precision of the sample mean as an estimate of the population mean.
We calculate the critical z-value for a 95% confidence level using the standard normal distribution, which helps in defining the range of values for the confidence interval.
The normal distribution curve is plotted to represent the probability density of the sample mean. A shaded region is used to represent the 95% confidence interval for the population mean.
- A normal distribution curve is plotted using Plotly.
- The 95% confidence interval is shaded under the curve to visualize the area within which the population mean is expected to lie.
- Markers are added to highlight the sample mean and the confidence interval bounds.
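A minimal sketch of the two confidence intervals described above, using an illustrative sample of ages:

```python
import numpy as np
from scipy import stats

ages = np.array([23, 31, 45, 29, 38, 41, 27, 33, 36, 30])   # illustrative sample
n, mean, sd = len(ages), ages.mean(), ages.std(ddof=1)

z = stats.norm.ppf(0.975)                          # critical z-value for 95% confidence
mean_ci = (mean - z * sd / np.sqrt(n), mean + z * sd / np.sqrt(n))

# Chi-square based interval for the population standard deviation.
chi2_lo = stats.chi2.ppf(0.025, n - 1)
chi2_hi = stats.chi2.ppf(0.975, n - 1)
sd_ci = (np.sqrt((n - 1) * sd**2 / chi2_hi), np.sqrt((n - 1) * sd**2 / chi2_lo))

print("Mean CI:", mean_ci)
print("Std dev CI:", sd_ci)
```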
- Dataset Loading: A CSV file (`sales.csv`) is read into a pandas DataFrame for analysis.
- Date Filtering: Users can filter the dataset by a date range (start and end dates). The data is filtered based on the `OrderDate` column to display relevant sales data.
- Data Exploration: A DataFrame explorer is used to interactively view and filter the dataset, making it easier for users to explore the data.
- Metrics Calculation:
  - Total Products in Inventory: Counts `Product` entries to display the number of inventory items.
  - Total Price Sum: The sum of all `TotalPrice` values is displayed to give an overall view of sales revenue.
  - Price Range Analysis:
    - The maximum and minimum prices for products are calculated and displayed.
    - The price range (difference between the maximum and minimum prices) is calculated.
- These metrics provide key insights into inventory and sales data.
- Dot Plot: A scatter plot is used to visualize the relationship between `Product` and `TotalPrice`. Each point represents a product with its corresponding total price, and products are color-coded by their category.
- Bar Graph: A bar chart is used to display the relationship between `Product` and `UnitPrice`. The chart aggregates `UnitPrice` over months to show trends in pricing.
- Scatter Plot: A scatter plot is created based on user-selected features. It visualizes relationships between categorical (qualitative) data (`feature_x`) and numerical (quantitative) data (`feature_y`).
- Bar Chart of Quantities: A bar chart visualizes the total quantity sold for each product, helping to analyze product demand.
- Date Range Selection: Users can select a date range from the sidebar, allowing them to filter sales data dynamically.
- Feature Selection: Users can select features for the x and y axes to explore relationships in the data through scatter plots.
- Data Table: The filtered dataset is displayed interactively for further analysis.
- Price Range Insights: The metrics calculated (maximum, minimum, range) help users identify high-value and low-value products, which is critical for pricing strategies.
- Sales Trend Analysis: The dot plot and bar charts help identify trends in product sales, such as which products have higher sales and which products are more expensive.
- Business Metrics: The overall revenue and inventory metrics provide insights into the health of the business and help with decision-making.
This page is focused on descriptive analytics and basic statistics. The main tasks involve:
- Data cleaning and filtering.
- Displaying key business metrics related to product pricing and sales volume.
- Visualizing the relationship between various features such as product prices and quantities.
- Providing interactive tools for users to explore the dataset and extract insights.
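A small Streamlit sketch of the date filtering and metrics described above; the column names follow the list, and the `Quantity` column used for the quantities chart is an assumption:

```python
import pandas as pd
import streamlit as st

df = pd.read_csv("sales.csv", parse_dates=["OrderDate"])

# Sidebar date-range filter on the OrderDate column.
start, end = st.sidebar.date_input(
    "Date range", [df["OrderDate"].min(), df["OrderDate"].max()]
)
mask = (df["OrderDate"] >= pd.Timestamp(start)) & (df["OrderDate"] <= pd.Timestamp(end))
filtered = df[mask]

st.metric("Products in inventory", filtered["Product"].count())
st.metric("Total revenue", f"{filtered['TotalPrice'].sum():,.2f}")
st.metric("Price range", filtered["TotalPrice"].max() - filtered["TotalPrice"].min())

# Total quantity sold per product (assumes a Quantity column).
st.bar_chart(filtered.groupby("Product")["Quantity"].sum())
```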
- Part 1: Introduction
- Part 2: Admin Theme
- Part 3: Model Training and Prediction
- Part 4: View, URL, and Template Rendering
- Part 5: How to Generate 10000 Fake Dataset CSV
- Episode 6: Project Overview
- Business Analytics Web Dashboard
- Analytics Website Dashboard
- Logistic Multiple Regression Analytics Web
- Normal Probability Distribution Analytics Web
- Python: Query Operations
- Python: Binomial Probability Distributions
- Hypothesis Testing T Distribution Curve
- Frequency Distribution Table
- Geo Referencing Business Trends
- Multiple Linear Regression Web Project
- Python: Web Dashboard: DashPlotly Framework and Dash
- Python: Web Dashboard using DashPlotly Framework
- Python: Multiple Linear Regression
- Logistic Regression Analysis
- PygWalker Graph Creator
- Sales Analytics Web Dashboard
- Analytics Dashboard with MySQL
- Business Intelligent Analytics Web Dashboard
- Descriptive Analytics Web Dashboard 1
- Descriptive Analytics Web Dashboard 2
- Analytics Dashboard Website with Graphs 3
- Add new Record to Excel file via Web Interface
- CrossTabulation Web App
- +255675839840
- +255656848274
- +255738144353