PCA, data processing basics
cr-mao committed Sep 8, 2024
1 parent 3c204db commit a7790b3
Showing 62 changed files with 7,813 additions and 707 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1 +1,4 @@
.idea

.vscode
mydata
74 changes: 36 additions & 38 deletions README.md
@@ -14,7 +14,7 @@

jupyter notebook ,numpy,pandas,matplotlib

- [Development environment](datahandling/docs/开发环境.md)
- [Development environment](datahandling/开发环境.md)
- [Terminology used in the data field](datahandling/数据领域中的专业术语.md)
- [NumPy data basics](datahandling/01-NumpyArrayBasics/01-NumpyArrayBasics.ipynb)
- [Creating NumPy arrays](datahandling/02-NumpyCreateArray/02CreateNumpyArray.ipynb)
@@ -30,52 +30,51 @@ jupyter notebook ,numpy,pandas,matplotlib
- [Series: creation, attributes, and computation](datahandling/21-SeriesBasic/seriesBasic.ipynb)
- [Series indexing and basic operations](datahandling/22-SerieIndexAndOperation/22-seriesIndexAndOperation.ipynb)
- pandas
- [DataFrame creation, basic attributes, indexing and slicing](datahandling/23-PandasDataframeBasic/dataframeBasic.ipynb)
- [DataFrame methods and indexing techniques](datahandling/24-PandasDataframeMethodAndIndex/dataframeMethodAndIndex.ipynb)
- [DataFrame statistical and logical operations](datahandling/25-PandasDataframeStatAndLogic/dataframeStatAndLogic.ipynb)
- [DataFrame computation](datahandling/26-PandasDataframeCompute/dataframe_compute.ipynb)
- [Time series](datahandling/27-PandasTime/pandas_time.ipynb)
- [I/O reading and writing, missing-value handling, discretization](datahandling/28-PandasIoAndNanAndDiscrete/pandasIoNan.ipynb)



- [DataFrame creation, basic attributes, indexing and slicing](datahandling/23-PandasDataframeBasic/dataframeBasic.ipynb)
- [DataFrame methods and indexing techniques](datahandling/24-PandasDataframeMethodAndIndex/dataframeMethodAndIndex.ipynb)
- [DataFrame statistical and logical operations](datahandling/25-PandasDataframeStatAndLogic/dataframeStatAndLogic.ipynb)
- [DataFrame computation](datahandling/26-PandasDataframeCompute/dataframe_compute.ipynb)
- [Time series](datahandling/27-PandasTime/pandas_time.ipynb)
- [I/O reading and writing, missing-value handling, discretization](datahandling/28-PandasIoAndNanAndDiscrete/pandasIoNan.ipynb)
- matplot
- [matplot basics](datahandling/31-Matplotlib-Basics/Matplotlib-Basics.ipynb)
- [matplot: other topics](datahandling/32-Matplot/matplot.ipynb)

## Machine learning

- knn
- [kNN theory and formulas](machinelearning/01knn.md)
- [Implementing our own kNN](machinelearning/knn/01-kNNBasics/kNNBasics.ipynb)
- [kNN in sklearn](machinelearning/knn/02-kNNInScikitLearn/kNNinScikitlearn.ipynb)
- [Splitting training and test datasets](machinelearning/knn/03-TrainTestSplit/TrainTestSplit.ipynb)
- [Accuracy of results](machinelearning/knn/04-AccuracyScore/AccuracyScore.ipynb)
- [Searching for hyperparameters](machinelearning/knn/05-HyperParameters/HyperParameters.ipynb)
- [Grid search for hyperparameters](machinelearning/knn/06-GridSearch/GridSearch.ipynb)
- [Data normalization and standardization](machinelearning/knn/07-FeatureScaling/FeatureScaling.ipynb)
- [Standardization in sklearn](machinelearning/knn/08-ScalerinScikitLearn/ScalerInScikitLearn.ipynb)
- [kNN theory and formulas](machinelearning/01knn.md)
- [Implementing our own kNN](machinelearning/knn/01-kNNBasics/kNNBasics.ipynb)
- [kNN in sklearn](machinelearning/knn/02-kNNInScikitLearn/kNNinScikitlearn.ipynb)
- [Splitting training and test datasets](machinelearning/knn/03-TrainTestSplit/TrainTestSplit.ipynb)
- [Accuracy of results](machinelearning/knn/04-AccuracyScore/AccuracyScore.ipynb)
- [Searching for hyperparameters](machinelearning/knn/05-HyperParameters/HyperParameters.ipynb)
- [Grid search for hyperparameters](machinelearning/knn/06-GridSearch/GridSearch.ipynb)
- [Data normalization and standardization](machinelearning/knn/07-FeatureScaling/FeatureScaling.ipynb)
- [Standardization in sklearn](machinelearning/knn/08-ScalerinScikitLearn/ScalerInScikitLearn.ipynb)
- Linear regression
- [Linear regression theory and formulas](machinelearning/02线性回归.md)
- [Simple linear regression implementation](machinelearning/linearRegression/01-SimpleLinearRegressionImplementation/SimpleLinearRegressionImplementation.ipynb)
- [Vectorized operations are more efficient](machinelearning/linearRegression/02-Vectorization/Vectorization.ipynb)
- [Metrics for evaluating regression algorithms: MSE and MAE](machinelearning/linearRegression/03-RegressionMetricsMSE-vs-MAE/RegressionMetricsMSE-vs-MAE.ipynb)
- [The best metric for evaluating linear regression: R Squared](machinelearning/linearRegression/04-R-Squared/R-Squared.ipynb)
- [Multiple linear regression via the normal equation](machinelearning/linearRegression/05-OurLinearRegression/OurLinearRegression.ipynb)
- [Linear regression in sklearn](machinelearning/linearRegression/06-RegressionInScikitLlearn/RegressionInScikitlearn.ipynb)
- [Simulating underfitting and overfitting, regularization](machinelearning/linearRegression/08-UnderfittingAndOverfitting/underfittingAndOverfitting.ipynb)

- [Linear regression theory and formulas](machinelearning/02线性回归.md)
- [Simple linear regression implementation](machinelearning/linearRegression/01-SimpleLinearRegressionImplementation/SimpleLinearRegressionImplementation.ipynb)
- [Vectorized operations are more efficient](machinelearning/linearRegression/02-Vectorization/Vectorization.ipynb)
- [Metrics for evaluating regression algorithms: MSE and MAE](machinelearning/linearRegression/03-RegressionMetricsMSE-vs-MAE/RegressionMetricsMSE-vs-MAE.ipynb)
- [The best metric for evaluating linear regression: R Squared](machinelearning/linearRegression/04-R-Squared/R-Squared.ipynb)
- [Multiple linear regression via the normal equation](machinelearning/linearRegression/05-OurLinearRegression/OurLinearRegression.ipynb)
- [Linear regression in sklearn](machinelearning/linearRegression/06-RegressionInScikitLlearn/RegressionInScikitlearn.ipynb)
- [Simulating underfitting and overfitting, regularization](machinelearning/linearRegression/08-UnderfittingAndOverfitting/underfittingAndOverfitting.ipynb)

- Gradient descent
- [Gradient descent theory and formulas](machinelearning/03梯度下降法.md)
- [Simulating gradient descent (single variable)](machinelearning/gradientDescent/01-GradientDescentSimulations/01-GradientDescentSimulations.ipynb)
- [Implementing gradient descent in linear regression](machinelearning/gradientDescent/02-ImplementGradientDescentInLinearRegression/02-ImplementGradientDescentInLinearRegression.ipynb)
- [Vectorized gradient descent formula and performance comparison with the normal equation](machinelearning/gradientDescent/03-VectorizeGradientDescent/03-VectorizeGradientDescent.ipynb)
- [Stochastic gradient descent](machinelearning/gradientDescent/04-StochasticGradientDescent/04-StochasticGradientDescent.ipynb)
- [Stochastic gradient descent in sklearn](machinelearning/gradientDescent/05-SGDInScikitLearn/SGDInScikitLearn.ipynb)
- [Gradient descent theory and formulas](machinelearning/03梯度下降法.md)
- [Simulating gradient descent (single variable)](machinelearning/gradientDescent/01-GradientDescentSimulations/01-GradientDescentSimulations.ipynb)
- [Implementing gradient descent in linear regression](machinelearning/gradientDescent/02-ImplementGradientDescentInLinearRegression/02-ImplementGradientDescentInLinearRegression.ipynb)
- [Vectorized gradient descent formula and performance comparison with the normal equation](machinelearning/gradientDescent/03-VectorizeGradientDescent/03-VectorizeGradientDescent.ipynb)
- [Stochastic gradient descent](machinelearning/gradientDescent/04-StochasticGradientDescent/04-StochasticGradientDescent.ipynb)
- [Stochastic gradient descent in sklearn](machinelearning/gradientDescent/05-SGDInScikitLearn/SGDInScikitLearn.ipynb)
- PCA
- [PCA theory and formulas](machinelearning/PCA.md)
- Logistic regression
- [Logistic regression theory and formulas](machinelearning/04逻辑回归.md)
- [Logistic regression theory and formulas](machinelearning/04逻辑回归.md)

- Naive Bayes



### Case studies

@@ -86,7 +85,6 @@ jupyter notebook ,numpy,pandas,matplotlib
- [User spending power, standardized Euclidean distance](machinelearning/recommand/02distance/distance.ipynb)
- [Finding the most similar users with NearestNeighbors and cosine similarity](machinelearning/recommand/03NearestNeighborsAndConsineSimiarity/NearestNeighbors_and_consine_simiarity.ipynb)


## links

- Machine Learning (Formula Derivation and Code Implementation)
@@ -1723,17 +1723,18 @@
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-23T05:43:57.038124Z",
"start_time": "2024-08-23T05:43:57.033665Z"
"end_time": "2024-09-07T02:34:42.994496Z",
"start_time": "2024-09-07T02:34:42.989930Z"
}
},
"cell_type": "code",
"source": [
"import numpy as np \n",
"fruit_price=np.array([[5,4,3]]) \n",
"jinshu=np.array([[2],[3],[1]]) \n",
"print('Total price of the fruit:\\n',fruit_price@jinshu) \n",
"print('Total price of the fruit:\\n',np.dot(fruit_price,jinshu))\n",
"\n",
"print(np.sum(np.dot(fruit_price,jinshu)))\n",
"# NumPy's matrix-multiplication operator @ (or the dot method) performs the matrix product, which makes it easy to get the total price\n"
],
"outputs": [
@@ -1744,11 +1745,12 @@
"Total price of the fruit:\n",
" [[25]]\n",
"Total price of the fruit:\n",
" [[25]]\n"
" [[25]]\n",
"25\n"
]
}
],
"execution_count": 8
"execution_count": 2
},
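Note on the cell above: a minimal sketch, assuming the same prices and quantities, of why the two matrix-product prints show `[[25]]` while the `np.sum` line shows `25`. Multiplying a (1, 3) row by a (3, 1) column yields a (1, 1) array, so one more reduction is needed to get a plain scalar. The names `total_matrix`, `total_scalar`, `total_via_sum` and `total_via_flat` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Same data as the notebook cell: price per jin and number of jin bought.
fruit_price = np.array([[5, 4, 3]])   # shape (1, 3)
jinshu = np.array([[2], [3], [1]])    # shape (3, 1)

# (1, 3) @ (3, 1) -> (1, 1): a 2-D array that holds a single value.
total_matrix = fruit_price @ jinshu

# Three equivalent ways to reduce the 1x1 result to a scalar.
total_scalar = total_matrix.item()                            # extract the only element
total_via_sum = np.sum(np.dot(fruit_price, jinshu))           # what the new cell line does
total_via_flat = np.dot(fruit_price.ravel(), jinshu.ravel())  # 1-D inner product

print(total_matrix.shape, total_scalar, total_via_sum, total_via_flat)
```

Flattening both operands first is one way to skip the intermediate 1x1 matrix entirely; whether that reads better than `np.sum` is a matter of taste.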
{
"metadata": {},
@@ -816,6 +816,48 @@
}
],
"execution_count": 29
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-09-06T15:40:48.482736Z",
"start_time": "2024-09-06T15:40:48.402018Z"
}
},
"cell_type": "code",
"source": [
"import numpy as np \n",
"x=np.random.randint(0,50,(50,2))"
],
"outputs": [],
"execution_count": 2
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-09-06T15:41:46.155432Z",
"start_time": "2024-09-06T15:41:46.152552Z"
}
},
"cell_type": "code",
"source": "print(np.mean(x))",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"22.93\n"
]
}
],
"execution_count": 6
},
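Note on the cell above: `np.mean(x)` with no `axis` argument averages all 100 values of the (50, 2) array into one number. A hedged sketch of the per-column variant follows, which is the form usually needed when centering features before PCA. The seeded generator and the names `overall_mean`, `column_means` and `x_demean` are assumptions for illustration; the notebook uses `np.random.randint` without a seed, so its printed value of 22.93 will not be reproduced here.

```python
import numpy as np

# Seeded generator so the run is repeatable (an assumption; the notebook is unseeded).
rng = np.random.default_rng(0)
x = rng.integers(0, 50, size=(50, 2))   # 50 samples, 2 features, values in [0, 50)

overall_mean = np.mean(x)               # one scalar over every element
column_means = np.mean(x, axis=0)       # one mean per feature, shape (2,)

# Centering: subtract each feature's mean; broadcasting applies it row by row.
x_demean = x - column_means

print(overall_mean)
print(column_means)
print(x_demean.mean(axis=0))            # each entry is zero up to floating-point error
```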
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": ""
}
],
"metadata": {
@@ -15,7 +15,11 @@
{
"metadata": {},
"cell_type": "markdown",
"source": "### series",
"source": [
"### series\n",
"\n",
"A single-column, multi-row structure"
],
"id": "1cef1671ff84ea8e"
},
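Note on the markdown cell above: a small sketch of what "single column, multiple rows" means in practice for a pandas `Series`. The data, the index labels and the name `price` are hypothetical and not taken from the notebook.

```python
import pandas as pd

# Hypothetical data: three labeled values forming one column.
s = pd.Series([5, 4, 3], index=["apple", "banana", "pear"], name="price")

print(s.shape)       # one dimension only: rows, no column axis
print(s.ndim)
print(s["banana"])   # label-based access to a single row

# Promoting the Series to a DataFrame makes the single column explicit.
df = s.to_frame()
print(df.shape)
```

Converting with `to_frame()` is shown only to make the shape contrast visible; a `Series` is normally used directly.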
{
474 changes: 474 additions & 0 deletions datahandling/31-Matplotlib-Basics/Matplotlib-Basics.ipynb

Large diffs are not rendered by default.

447 changes: 0 additions & 447 deletions datahandling/31matplot/matplot.ipynb

This file was deleted.

591 changes: 591 additions & 0 deletions datahandling/32-Matplot/matplot.ipynb

Large diffs are not rendered by default.
