Contents
- Environment setup
- python_learning_note
- numpy
- pandas
- matplotlib-seaborn
- Web scraping
- String formatting
- Calling a function with vs. without parentheses
- Default function arguments
- enumerate(): takes an iterable and yields (index, value) pairs
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons, start=1))  # start counting from 1
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
>>> seq = ['one', 'two', 'three']
>>> for i, element in enumerate(seq):
...     print(i, element)
...
0 one
1 two
2 three
- Print the full array without truncation
np.set_printoptions(threshold=np.inf)
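A minimal sketch of what the threshold option changes. The array size (2000) and the default threshold of 1000 are only illustrative choices; the sketch compares the repr of the same array before and after lifting the threshold.

```python
import numpy as np

arr = np.arange(2000)

# With a threshold of 1000 elements, longer arrays are summarized with "...".
np.set_printoptions(threshold=1000)
short_repr = repr(arr)

# threshold=np.inf disables summarization, so every element is printed.
np.set_printoptions(threshold=np.inf)
full_repr = repr(arr)

print("..." in short_repr, "..." in full_repr)

# Restore the usual threshold so later output is not flooded.
np.set_printoptions(threshold=1000)
```

The setting is global, so it is worth restoring it once the full dump is no longer needed.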
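- Loading data
- Loading an .npz file
npz = np.load('file.npz')
# If the data cannot be loaded directly from a URL, download the file locally first and then load it.
# npz is an NpzFile object, so its contents cannot be inspected directly.
# List the arrays stored in it
>>> npz.files
['y', 'x']
>>> npz.f.x  # or npz['x']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  # the stored array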
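The round trip can be sketched end to end as follows. The file name `demo.npz` and the two arrays are made up for the illustration; only `np.savez`, `np.load`, `.files` and key access are taken from NumPy itself.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.npz")

# Keyword names become the keys of the archive.
np.savez(path, x=np.arange(10), y=np.sin(np.arange(10)))

# The context manager closes the underlying file once we are done.
with np.load(path) as npz:
    keys = sorted(npz.files)   # names of the stored arrays
    x = npz["x"]               # dictionary-style access returns the array
    y = npz["y"]

print(keys)
print(x)
```

Arrays read inside the `with` block stay usable afterwards, because each lookup loads the full array into memory.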
- Matrix indexing and slicing
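- The random module
- permutation
- seed
- uniform
- randint
Basic usage:
np.random.randint(1, 5, (3, 3)) takes (low, high, output shape as a tuple) and returns a 3x3 array of integers drawn from [low, high). The upper bound is excluded, so every value here lies between 1 and 4.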
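A short sketch that touches the four functions listed above. The seed value and the sizes are arbitrary choices for the illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is reproducible

perm = np.random.permutation(5)            # shuffled copy of 0..4
u = np.random.uniform(0.0, 1.0, size=4)    # floats in [0, 1)
r = np.random.randint(1, 5, (3, 3))        # ints in [1, 5), shape (3, 3)

# Re-seeding with the same value reproduces the same sequence.
np.random.seed(0)
perm_again = np.random.permutation(5)

print(perm, u, r, sep="\n")
```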
- The linalg module
- norm
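As a rough illustration of `np.linalg.norm`, the sketch below computes a few common norms. The vector and matrix are made-up inputs.

```python
import numpy as np

v = np.array([3.0, 4.0])
m = np.array([[1.0, 2.0], [3.0, 4.0]])

l2 = np.linalg.norm(v)                  # Euclidean (L2) norm of a vector
l1 = np.linalg.norm(v, ord=1)           # L1 norm: |3| + |4|
fro = np.linalg.norm(m)                 # Frobenius norm is the default for 2-D input
row_norms = np.linalg.norm(m, axis=1)   # one L2 norm per row

print(l2, l1, fro, row_norms)
```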
- Getting the column names of a DataFrame
- df.columns.values
- list(df)
- df.values returns the contents of df as a NumPy array
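The three accessors above, tried on a tiny made-up frame (the column names `a` and `b` are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

cols_array = df.columns.values   # NumPy array of column labels
cols_list = list(df)             # iterating a DataFrame yields its column labels
data = df.values                 # 2-D NumPy array of the cell values

print(cols_array, cols_list, data, sep="\n")
```

Because the columns mix int and float, `df.values` upcasts everything to a single float dtype.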
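- pd.cut & pd.qcut: cut splits the range of the values into equal-width bins, while qcut splits by quantiles (quartiles, the median, and so on), so each bin holds roughly the same number of observations.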
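The difference shows up most clearly with an outlier. A minimal sketch, assuming a made-up series and two bins labelled "low" and "high":

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# cut: 2 equal-width bins over [1, 100]; the outlier sits alone in the top bin.
by_width = pd.cut(values, bins=2, labels=["low", "high"])

# qcut: 2 quantile-based bins split at the median; half of the values in each.
by_quantile = pd.qcut(values, q=2, labels=["low", "high"])

print(by_width.value_counts().to_dict())
print(by_quantile.value_counts().to_dict())
```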
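- pd.groupby
- Example: what the as_index parameter does (What is as_index in groupby in pandas?)
With as_index=True, the group keys become the index of the result, so df.loc[] must be given a label such as 'bk1'.
With as_index=False, the group keys stay in a regular column and the result gets a default integer index, so df.loc[] takes 0, 1, 2, ...
Either way, df.iloc[1] works and selects the same group.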
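A small sketch of the two behaviours. The frame, the `book` and `price` column names and the `'bk1'` key are illustrative assumptions, not data from these notes.

```python
import pandas as pd

df = pd.DataFrame({"book": ["bk1", "bk1", "bk2"], "price": [10, 20, 30]})

keyed = df.groupby("book", as_index=True)["price"].sum()    # Series indexed by book
flat = df.groupby("book", as_index=False)["price"].sum()    # DataFrame with a book column

by_label = keyed.loc["bk1"]          # group keys are the index, so look up by label
by_position = flat.loc[0, "price"]   # default RangeIndex, so labels are 0, 1, 2, ...

print(keyed, flat, sep="\n")
```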
df.groupby('day')['total_bill'].mean()
df.groupby('day').filter(lambda x : x['total_bill'].mean() > 20)
df.groupby('day')['total_bill'].transform(lambda x : x/x.mean())
if we want to get a single value for each group -> use aggregate()
if we want to get a subset of the input rows -> use filter()
if we want to get a new value for each input row -> use transform()
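The three rules can be checked on a small frame. The sketch assumes a made-up table with the same `day` and `total_bill` columns used above.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 50.0],
})
grouped = df.groupby("day")["total_bill"]

# aggregate: one value per group
per_group = grouped.aggregate("mean")

# filter: keeps the rows of every group whose mean bill exceeds 20
kept = df.groupby("day").filter(lambda g: g["total_bill"].mean() > 20)

# transform: one value per input row, here each bill relative to its group mean
scaled = grouped.transform(lambda s: s / s.mean())

print(per_group, kept, scaled, sep="\n")
```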
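- pd.drop: drop rows or columns
- Dropping columns
df.drop(['label'], axis=1, inplace=True)
axis=1 drops columns; inplace=True modifies df in place and returns None instead of a new DataFrame.
- Dropping rows (why can't pd.drop() by index number row)
drop() takes index labels, not positions, so positions are first converted to labels through df.index:
df.drop(df.index[[0, 2]])
or df.drop(df.index[np.arange(0, 2)])  # positions 0 and 1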
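A runnable sketch of the calls above. The frame and its string index (`r0` to `r3`) are invented so that labels and positions are visibly different.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "label": list("abcd")},
                  index=["r0", "r1", "r2", "r3"])

no_label = df.drop(["label"], axis=1)           # new frame without the column
rows_0_2 = df.drop(df.index[[0, 2]])            # positions 0 and 2 -> labels r0, r2
rows_0_1 = df.drop(df.index[np.arange(0, 2)])   # positions 0 and 1 -> labels r0, r1

print(no_label, rows_0_2, rows_0_1, sep="\n")
```

Without `inplace=True`, each call returns a new frame and leaves `df` unchanged.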
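- Sorting each column (why sort_values() is different from sort_values().values)
1. df = df.apply(lambda x: x.sort_values()): each sorted column keeps its index, so the columns are realigned on that index when they are combined and every value stays tied to its original row label.
2. df.apply(lambda x: x.sort_values().values): each sorted column is first returned as a NumPy array, and the arrays are then combined by position into the DataFrame, so every column ends up sorted independently.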
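The two variants can be compared side by side. A minimal sketch on a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [1, 3, 2]})

# Sorted Series keep their labels, so apply() realigns them on the index.
aligned = df.apply(lambda col: col.sort_values())

# .values drops the labels, so each column is placed back by position.
independent = df.apply(lambda col: col.sort_values().values)

print(aligned, independent, sep="\n")
```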
find maximum value in col C in pandas dataframe while group by both col A and B
df.groupby(['RT','Similarity','Name'],as_index=False)['Quality'].max()
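How to replace one col values with another col values in conditions [duplicate]: use mask to apply the condition. mask replaces the values where the condition is True and keeps the original values where it is False.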
df['RT'] = df['RT'].mask(df['similarity'] > 0.99, df['patch'])
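A small sketch of the replacement, next to the equivalent `where` and `np.where` forms. The numbers are invented; the column names follow the line above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RT": [1.0, 2.0, 3.0],
    "similarity": [0.5, 0.995, 1.0],
    "patch": [10.0, 20.0, 30.0],
})
cond = df["similarity"] > 0.99

masked = df["RT"].mask(cond, df["patch"])            # replace where cond is True
via_where = df["RT"].where(~cond, df["patch"])       # keep where ~cond is True
via_numpy = np.where(cond, df["patch"], df["RT"])    # plain ndarray, index is lost

print(masked.tolist())
```

`mask` and `where` return a Series that keeps the index, while `np.where` returns a bare array.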
Pandas mask / where methods versus NumPy np.where (link)
- np.c_: treats each 1-D array as a column vector and stacks the columns side by side into a 2-D array
Examples
--------
>>> np.c_[np.array([1,2,3]), np.array([4,5,6])]
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> np.c_[np.array([[1,2,3]]), 0, 0, np.array([[4,5,6]])]
array([[1, 2, 3, 0, 0, 4, 5, 6]])
- subplot & subplots: plt.subplot() returns a single Axes, while plt.subplots() returns a (fig, axes) pair, which makes subplots() more convenient.
Why do many examples use “fig, ax = plt.subplots()” in Matplotlib/pyplot/python
differences between subplot() and subplots()
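A minimal sketch contrasting the two calls. The `Agg` backend is chosen here only so the snippet runs without a display, and the 1x2 grid is an arbitrary layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# subplot(): adds one Axes at a time to the current figure.
fig1 = plt.figure()
ax_left = plt.subplot(1, 2, 1)
ax_right = plt.subplot(1, 2, 2)

# subplots(): creates the figure and the whole grid of Axes in one call.
fig2, axes = plt.subplots(nrows=1, ncols=2)
axes[0].plot([0, 1], [0, 1])
axes[1].plot([0, 1], [1, 0])
```

With `subplots()` the figure handle is available right away, which helps for calls such as `fig2.savefig(...)`.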
- matplotlib colors (reposted)
- ax.legend(loc=1) sets the legend position; common values are 'best' (0), 'upper right' (1), and 'upper left' (2)
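The numeric code and the string name are interchangeable. A short sketch, with a made-up line and label and the `Agg` backend assumed for headless runs:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="y = x^2")

legend_by_code = ax.legend(loc=1)              # numeric code for 'upper right'
legend_by_name = ax.legend(loc="upper left")   # string form; replaces the previous legend
```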
- sns.countplot
- scatter
- bar
- Official barplot example
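- sns.FacetGrid
- The bins parameter sets how many bars the histogram is drawn with. These notes record bins=False as the way to get only the kernel density curve; in the older sns.distplot that switch is the hist=False argument.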