Working with Excel in Python pandas: A Detailed Guide

见贤思齐 · Posted on 2024-9-5 00:39:31

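Contents

1 Overview
  1.1 Differences between pandas and openpyxl
  1.2 Series and DataFrame
2 Common operations
  2.1 Creating an Excel file: to_excel()
  2.2 Reading an Excel file: read_excel()
    2.2.1 header: row index of the header row
    2.2.2 index_col: index column
    2.2.3 dtype: data types
    2.2.4 skiprows: number of rows to skip
    2.2.5 usecols: selecting columns
    2.2.6 head(n), tail(n): first and last n rows
  2.3 Reading and writing data
    2.3.1 at[]: accessing a single cell
    2.3.2 loc[]: filtering data
    2.3.3 sort_values(): sorting data
3 In practice
  3.1 Iterating over an Excel file

1 Overview

1.1 Differences between pandas and openpyxl

Both the pandas and openpyxl libraries can process Excel files in Python. The main differences:

pandas:
① Excels at data manipulation and analysis. It provides functions for reading data from many file formats (including Excel) and is especially useful for filtering data, aggregating data, handling missing values, and other data transformation tasks.
② Easy to use. The DataFrame object is quick and convenient to work with, and very powerful.

openpyxl: Focuses on cell-level formatting. This library also lets us work with Excel files directly.

pandas is fast, and whatever pandas cannot do can be handed to openpyxl, for example cell comments, background fill colors, and so on.

1.2 Series and DataFrame

Series: a sequence. It can be thought of as a "one-dimensional array" made up of a single row or a single column; whether it acts as a row or as a column is decided by the DataFrame it is placed in.
DataFrame: a data frame. It can be thought of as a "two-dimensional array" made up of rows and columns.

```python
import pandas as pd

# Series example
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3], name='A')
print(s)
# 1    a
# 2    b
# 3    c
# Name: A, dtype: object

# DataFrame example
s1 = pd.Series(['a', 'b', 'c'], index=[1, 2, 3], name='A')
s2 = pd.Series(['aa', 'bb', 'cc'], index=[1, 2, 3], name='B')
s3 = pd.Series(['aaa', 'bbb', 'ccc'], index=[1, 2, 3], name='C')

# Option 1: use each Series as a row
df = pd.DataFrame([s1, s2, s3])
print(df)
#      1    2    3
# A    a    b    c
# B   aa   bb   cc
# C  aaa  bbb  ccc

# Option 2: use each Series as a column
df = pd.DataFrame({s1.name: s1, s2.name: s2, s3.name: s3})
print(df)
#    A   B    C
# 1  a  aa  aaa
# 2  b  bb  bbb
# 3  c  cc  ccc
```

2 Common operations

2.1 Creating an Excel file: to_excel()

```python
import pandas as pd

# Test data
data = {'ID': [1, 2, 3], 'Name': ['张三', '李四', '王五']}

# 1. Create the DataFrame object
df = pd.DataFrame(data=data)

# Optional: use ID as the index. If no index is set,
# the default integer index (0 to n-1) is used.
df = df.set_index('ID')               # form 1
# df.set_index('ID', inplace=True)    # form 2

# 2. Write to Excel at the given path (an existing file is overwritten)
df.to_excel(r'C:\Users\Administrator\Desktop\Temp\1.xlsx')
```

Effect of setting the index: without it, the sheet gets an extra leading column holding the default index 0, 1, 2; with it, ID is the first column.

2.2 Reading an Excel file: read_excel()

```python
import pandas as pd

# 1. Read the Excel file. The first sheet is read by default.
student = pd.read_excel(r'C:\Users\Administrator\Desktop\Temp\1.xlsx')

# 2. Commonly used attributes
print(student.shape)    # shape (rows, columns)
print(student.columns)  # column names
```

Reading a specific sheet:

```python
import pandas as pd

# 1. Read a specific sheet. The two forms below are equivalent when the
#    second sheet is named 'Sheet2' (sheet positions are 0-based).
student = pd.read_excel(r'C:\Users\Administrator\Desktop\Temp\1.xlsx', sheet_name=1)
# student = pd.read_excel(r'C:\Users\Administrator\Desktop\Temp\1.xlsx', sheet_name='Sheet2')

# 2. Commonly used attributes
print(student.shape)    # shape (rows, columns)
print(student.columns)  # column names
```

2.2.1 header: row index of the header row

Case 1: the default. The first row is the header (row index 0, i.e. header=0).

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# 1. Read the Excel file (by default the first row is the header, i.e. header=0)
student = pd.read_excel(filePath)
print(student.columns)
# Index(['ID', 'Name', 'Age', 'Grade'], dtype='object')
```

Case 2: use the n-th row as the header.

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# Case 2: the header we want is on the second row of the sheet (header=1)
student = pd.read_excel(filePath, header=1)
print(student.columns)
# Index(['ID', 'Name', 'Age', 'Grade'], dtype='object')
```

Case 3: the sheet has no header row, so the column names must be supplied by hand.

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# Case 3: the sheet has no header row; set the column names manually
student = pd.read_excel(filePath, header=None)
student.columns = ['ID', 'Name', 'Age', 'Grade']
student.set_index('ID', inplace=True)  # set the index column, modifying student in place
student.to_excel(filePath)             # write back to Excel
print(student)
#    Name  Age  Grade
# ID
# 1    张三   18     90
# 2    李四   20     70
# 3    王五   21     80
# 4    赵六   19     90
```

2.2.2 index_col: index column

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# Read without an index column (a default index starting at 0 is added)
student = pd.read_excel(filePath)
print(student)
#    ID Name  Age  Grade
# 0   1   张三   18     90
# 1   2   李四   20     70
# 2   3   王五   21     80
# 3   4   赵六   19     90

# Read with ID as the index column
student = pd.read_excel(filePath, index_col='ID')
print(student)
#    Name  Age  Grade
# ID
# 1    张三   18     90
# 2    李四   20     70
# 3    王五   21     80
# 4    赵六   19     90
```

Index-related usage:

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# 1. Read the Excel file and specify the index column
student = pd.read_excel(filePath, index_col='ID')
```

2.2.3 dtype: data types

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# 1. Read the Excel file and specify the data type of each column
student = pd.read_excel(filePath, dtype={'ID': str, 'Name': str, 'Age': int, 'Grade': float})
print(student)
#   ID Name  Age  Grade
# 0  1   张三   18   90.0
# 1  2   李四   20   70.0
# 2  3   王五   21   80.0
# 3  4   赵六   19   90.0
```

2.2.4 skiprows: number of rows to skip

For example, when the sheet starts with blank rows and the actual data begins on row 3, the first 2 rows need to be skipped.

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

student = pd.read_excel(filePath, skiprows=2)
print(student)
#    ID Name  Age  Grade
# 0   1   张三   18     90
# 1   2   李四   20     70
# 2   3   王五   21     80
# 3   4   赵六   19     90
```

2.2.5 usecols: selecting columns

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# Read Excel columns B through D (both inclusive)
student = pd.read_excel(filePath, usecols='B:D')
print(student)
#   Name  Age  Grade
# 0   张三   18     90
# 1   李四   20     70
# 2   王五   21     80
# 3   赵六   19     90
```

2.2.6 head(n), tail(n): first and last n rows

Sometimes an Excel file holds a lot of data, and looking at all of it is slow and unnecessary; while testing, a few rows are enough. (head() and tail() slice a DataFrame that has already been fully loaded; to limit the number of rows actually read, read_excel() also accepts an nrows argument.)

```python
import pandas as pd

# 1. Read the Excel file
student = pd.read_excel(r'C:\Users\Administrator\Desktop\Temp\1.xlsx')

# First 3 rows (default is 5)
print(student.head(3))

# Last 3 rows (default is 5)
print(student.tail(3))
```

2.3 Reading and writing data

2.3.1 at[]: accessing a single cell

```python
import pandas as pd

# File path
filePath = r'C:\Users\Administrator\Desktop\Temp\1.xlsx'

# 1. Read the Excel file (index_col=None keeps the default index)
student = pd.read_excel(filePath, index_col=None)

for i in student.index:
    # Read or write a single cell: row i of the ID column
    student.at[i, 'ID'] = i + 2

print(student)
```

2.3.2 loc[]: filtering data

```python
import pandas as pd

def age_18_to_20(age):
    return 18  # the source text is cut off at this point
```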
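Because the listing for loc[] breaks off, the following is only a minimal sketch of how filtering with loc[] could look, not the author's original code. The body of age_18_to_20 (an inclusive range check) is an assumption inferred from the function name alone, and the in-memory DataFrame reuses the student rows printed earlier so that no Excel file is needed.

```python
import pandas as pd


def age_18_to_20(age):
    # Assumed predicate: inferred from the function name, not from the source
    return 18 <= age <= 20


# Sample data mirroring the student table shown in earlier sections
student = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['张三', '李四', '王五', '赵六'],
    'Age': [18, 20, 21, 19],
    'Grade': [90, 70, 80, 90],
})

# apply() calls the predicate once per value and yields a boolean Series
mask = student['Age'].apply(age_18_to_20)

# loc[] keeps only the rows where the mask is True
filtered = student.loc[mask]
print(filtered)

# A vectorized alternative: Series.between() is inclusive on both ends by default
filtered_between = student.loc[student['Age'].between(18, 20)]
```

The apply() form is handy when the condition is already written as a plain function; between() avoids the per-element Python call and is usually the better choice for simple numeric ranges.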

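Returning to the single-cell access from section 2.3.1, here is a small self-contained sketch of reading and writing cells with at[]. It works on an in-memory DataFrame instead of the Excel file, so the sample data and the variable first_name are illustrative assumptions; only the i + 2 assignment follows the article.

```python
import pandas as pd

# Illustrative data; the article reads this table from an Excel file instead
student = pd.DataFrame({
    'ID': [0, 0, 0],
    'Name': ['张三', '李四', '王五'],
})

for i in student.index:
    # at[row_label, column_label] addresses exactly one cell
    student.at[i, 'ID'] = i + 2

# Reading a single cell uses the same indexer
first_name = student.at[0, 'Name']

print(student)
print(first_name)
```

Writing through student.at[i, 'ID'] addresses the DataFrame directly, which avoids the chained form student['ID'].at[i] = ... where the assignment may land on a temporary copy.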