Series operations

cr-mao · Aug 26, 2024 · 55c546e
1 parent 522e6a5
commit 55c546e
Showing 11 changed files with 1,275 additions and 16 deletions.
diff --git a/README.md b/README.md
@@ -24,6 +24,8 @@ jupyter notebook ,numpy,pandas,matplotlib
 - [NumPy sorting and arg-index operations](datahandling/07-NumpyArgAndSortOperation/07-ArgAndSortOperation.ipynb)
 - [NumPy comparisons and fancy indexing](datahandling/08-ComparisonAndFancyIndexing/08-ComparisonAndFancyIndexing.ipynb)
 - [Data structures in pandas](datahandling/20-PandasDataFrameSeriesPanel/pandasDataFrameSeriesPanel.ipynb)
+- [Series creation, attributes, and arithmetic](datahandling/21-SeriesBasic/seriesBasic.ipynb)
+- [Series indexing and basic operations](datahandling/22-SerieIndexAndOperation/22-seriesIndexAndOperation.ipynb)
 
 
 ### Machine learning
@@ -48,17 +50,23 @@ jupyter notebook ,numpy,pandas,matplotlib
   - [Multiple linear regression via the normal equation](machinelearning/linearRegression/05-OurLinearRegression/OurLinearRegression.ipynb)
   - [Linear regression in scikit-learn](machinelearning/linearRegression/06-RegressionInScikitLlearn/RegressionInScikitlearn.ipynb)
 - Gradient descent
+  - [Gradient descent: theory and formulas](machinelearning/03梯度下降法.md)
   - [Simulating gradient descent (single variable)](machinelearning/gradientDescent/01-GradientDescentSimulations/01-GradientDescentSimulations.ipynb)
   - [Implementing gradient descent in linear regression](machinelearning/gradientDescent/02-ImplementGradientDescentInLinearRegression/02-ImplementGradientDescentInLinearRegression.ipynb)
   - [Vectorized gradient descent formula, with a performance comparison against the normal equation](machinelearning/gradientDescent/03-VectorizeGradientDescent/03-VectorizeGradientDescent.ipynb)
-  - [Stochastic echelon descent](machinelearning/gradientDescent/04-StochasticGradientDescent/04-StochasticGradientDescent.ipynb)
+  - [Stochastic gradient descent](machinelearning/gradientDescent/04-StochasticGradientDescent/04-StochasticGradientDescent.ipynb)
 
 
 
 ## links
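 - [Python-based data analysis and visualization](https://juejin.cn/book/7240731597035864121)
 - [scikit-learn official site](https://scikit-learn.org/stable/index.html)
 - [Getting started with machine learning in Python 3: classic algorithms and applications](https://coding.imooc.com/class/chapter/169.html)
+- Books
+  - 机器学习(公式推导与代码实现) (Machine Learning: Formula Derivation and Code Implementation)
+  - 从零开始机器学习的数学原理和算法实践 (Machine Learning from Scratch: Mathematical Principles and Algorithm Practice)
+  - 跟着迪哥学python数据分析与机器学习实战 (Learning Python Data Analysis and Machine Learning in Practice with Dige)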
+
 
 
 
diff --git a/datahandling/20-PandasDataFrameSeriesPanel/pandasDataFrameSeriesPanel.ipynb b/datahandling/20-PandasDataFrameSeriesPanel/pandasDataFrameSeriesPanel.ipynb
diff --git a/datahandling/20-PandasDataFrameSeriesPanel/股票.csv b/datahandling/20-PandasDataFrameSeriesPanel/股票.csv
@@ -0,0 +1,4 @@
+date,open,close,high,low,volume
+2020-01-02,16.024,16.244,16.324,15.924,1530231
+2020-01-03,16.314,16.554,16.684,16.294,1116194
+2020-01-06,16.384,16.444,16.714,16.284,862083
diff --git a/datahandling/21-SeriesBasic/seriesBasic.ipynb b/datahandling/21-SeriesBasic/seriesBasic.ipynb
@@ -0,0 +1,346 @@
+{
+ "cells": [
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Series\n",
+    "\n",
+    "A Series consists of two parts: a label index (`index`) and a one-dimensional array of values (`values`).\n",
+    "\n",
+    "Constructor: `pandas.Series(data, index)`\n",
+    "\n",
+    "This notebook passes data to a Series in three forms: a list, a NumPy array, and a dictionary.\n",
+    "\n",
+    "A Series is somewhat like a one-dimensional array in PHP.\n"
+   ],
+   "id": "4cf95c18716226d4"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:09:26.689571Z",
+     "start_time": "2024-08-26T03:09:26.355388Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "data1 = [93, 92, 79, 59]\n",
+    "index = ['张飞', '关羽', '赵云', '貂蝉']\n",
+    "Score = pd.Series(data1, index)\n",
+    "print(Score)"
+   ],
+   "id": "bf0142e39b6f6fce",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "张飞    93\n",
+      "关羽    92\n",
+      "赵云    79\n",
+      "貂蝉    59\n",
+      "dtype: int64\n"
+     ]
+    }
+   ],
+   "execution_count": 1
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:10:13.139102Z",
+     "start_time": "2024-08-26T03:10:13.131617Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# The name attribute labels what the data represents; here '学生成绩' means student scores\n",
+    "import pandas as pd\n",
+    "data1 = [93, 92, 79, 59]\n",
+    "index = ['张飞', '关羽', '赵云', '貂蝉']\n",
+    "Score = pd.Series(data1, index, name='学生成绩')\n",
+    "print(Score)"
+   ],
+   "id": "765d1dfc79cda2ef",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "张飞    93\n",
+      "关羽    92\n",
+      "赵云    79\n",
+      "貂蝉    59\n",
+      "Name: 学生成绩, dtype: int64\n"
+     ]
+    }
+   ],
+   "execution_count": 2
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "#### Creating a Series from a list\n",
+   "id": "ece9f19202a36779"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:18:56.260035Z",
+     "start_time": "2024-08-26T03:18:56.253975Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "data1 = [93, 92, 79, 59]\n",
+    "index = ['张飞', '关羽', '赵云', '貂蝉']\n",
+    "Score = pd.Series(data1, index)\n",
+    "print(Score)"
+   ],
+   "id": "b3817d29ce8b7f10",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "张飞    93\n",
+      "关羽    92\n",
+      "赵云    79\n",
+      "貂蝉    59\n",
+      "dtype: int64\n"
+     ]
+    }
+   ],
+   "execution_count": 3
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "#### Creating a Series from a NumPy array\n",
+    "\n",
+    "\n",
+    "\n"
+   ],
+   "id": "35f8095263bff462"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:19:30.530510Z",
+     "start_time": "2024-08-26T03:19:30.525235Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "data1 = np.array([93, 92, 79, 59])\n",
+    "index = ['张飞', '关羽', '赵云', '貂蝉']\n",
+    "Score = pd.Series(data1, index)\n",
+    "print(Score)"
+   ],
+   "id": "c8d99a858c169a95",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "张飞    93\n",
+      "关羽    92\n",
+      "赵云    79\n",
+      "貂蝉    59\n",
+      "dtype: int64\n"
+     ]
+    }
+   ],
+   "execution_count": 4
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:22:56.069167Z",
+     "start_time": "2024-08-26T03:22:56.062796Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
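+    "# No copy is made here: the Series shares memory with the NumPy array, so modifying the Series also changes data1\n",
+    "Score[\"张飞\"] = 80\n",
+    "print(data1)"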
+   ],
+   "id": "96eed9c8206e854e",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[80 92 79 59]\n"
+     ]
+    }
+   ],
+   "execution_count": 5
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
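+    "#### Creating a Series from a dictionary\n",
+    "\n",
+    "When a Series is created from a dictionary, the keys automatically become the index, so no separate index needs to be set.\n",
+    "\n",
+    "Series objects support the `in` and `not in` operators, which test the index labels rather than the values.\n",
+    "\n",
+    "Unlike the NumPy array case, a Series created from a dictionary does not share memory with it: the values are placed in a new array, so modifying the Series leaves the dictionary unchanged.\n",
+    "\n"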
+   ],
+   "id": "1079c2719adaa357"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T03:23:49.876990Z",
+     "start_time": "2024-08-26T03:23:49.870028Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "data = {'张飞': 93, '关羽': 92, '赵云': 79, '貂蝉': 59}\n",
+    "Score = pd.Series(data)\n",
+    "print(Score)"
+   ],
+   "id": "5862a61fb75b1d70",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "张飞    93\n",
+      "关羽    92\n",
+      "赵云    79\n",
+      "貂蝉    59\n",
+      "dtype: int64\n"
+     ]
+    }
+   ],
+   "execution_count": 6
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
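+    "### Series attributes\n",
+    "\n",
+    "Attributes and what they return:\n",
+    "\n",
+    "- `size`: the number of elements\n",
+    "- `index`: the index labels\n",
+    "- `dtypes`: the data type of the values\n",
+    "- `values`: the data as an array"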
+   ],
+   "id": "766c3cafbce88ac4"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T05:27:19.432154Z",
+     "start_time": "2024-08-26T05:27:19.426781Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "print(Score.size)\n",
+    "print(Score.index)\n",
+    "print(Score.dtypes)\n",
+    "print(Score.values)"
+   ],
+   "id": "aeef7b26dc8786d4",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "4\n",
+      "Index(['张飞', '关羽', '赵云', '貂蝉'], dtype='object')\n",
+      "int64\n",
+      "[93 92 79 59]\n"
+     ]
+    }
+   ],
+   "execution_count": 8
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": "### Arithmetic operations",
+   "id": "318239e23d4aa318"
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2024-08-26T05:27:45.902568Z",
+     "start_time": "2024-08-26T05:27:45.892758Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import pandas as pd\n",
+    "data = {1: 93, 2: 92, 3: 79, 4: 59}\n",
+    "Score = pd.Series(data)\n",
+    "print(Score * 1.5)"
+   ],
+   "id": "fddabc21aaf710af",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1    139.5\n",
+      "2    138.0\n",
+      "3    118.5\n",
+      "4     88.5\n",
+      "dtype: float64\n"
+     ]
+    }
+   ],
+   "execution_count": 9
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "outputs": [],
+   "execution_count": null,
+   "source": "",
+   "id": "16354096f564187b"
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
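
In `seriesBasic.ipynb` above, writing to a Series built from a NumPy array also changes `data1`, and the notebook says `in` / `not in` work on a Series. The following is a minimal sketch of those two points and is not part of the commit; the variable names (`maybe_view`, `independent`, `scores_dict`, `from_dict`) are illustrative assumptions. Whether the constructor shares memory with the array may depend on the pandas version and its copy-on-write setting, so the sketch checks this with `np.shares_memory` instead of assuming it, and passes `copy=True` where an independent Series is wanted.

```python
import numpy as np
import pandas as pd

names = ['张飞', '关羽', '赵云', '貂蝉']

# NumPy input: sharing memory with the source array is version dependent,
# so inspect it instead of relying on it.
arr = np.array([93, 92, 79, 59])
maybe_view = pd.Series(arr, index=names)
print("shares memory with the array:",
      np.shares_memory(arr, maybe_view.to_numpy()))

# copy=True requests an independent copy of the data.
independent = pd.Series(arr, index=names, copy=True)
independent['张飞'] = 80
print(arr[0])  # 93, the source array is untouched

# Dictionary input: keys become the index, values go into a new array.
scores_dict = {'张飞': 93, '关羽': 92, '赵云': 79, '貂蝉': 59}
from_dict = pd.Series(scores_dict)
from_dict['张飞'] = 80
print(scores_dict['张飞'])  # 93, the dictionary is untouched

# `in` tests index labels, not values.
print('关羽' in from_dict)       # True: '关羽' is a label
print(92 in from_dict)           # False: 92 is a value, not a label
print(92 in from_dict.values)    # True: search the values explicitly
```

When the original array must stay unchanged, passing `copy=True` is the safer choice, since it does not rely on the constructor's default behavior.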