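| Library | Main purpose | Key features | Install command |
|---|---|---|---|
| PyPDF2 | Manipulating and merging PDFs | Split, merge, rotate, and crop PDF pages; extract text and metadata | `pip install PyPDF2` |
| pdfminer.six | Extracting PDF text | Accurate extraction of text and images with layout analysis; handles complex layouts and fonts | `pip install pdfminer.six` |
| ReportLab | Generating PDF files | Create complex PDF documents containing text, images, graphics, and tables | `pip install reportlab` |
| PyMuPDF | Reading and manipulating PDFs | Extract text and images; work with pages, annotations, and bookmarks; render PDF pages | `pip install PyMuPDF` |
| pdfplumber | Extracting tables and text | Accurate extraction and analysis of tables and text in PDFs | `pip install pdfplumber` |
| Camelot | Extracting PDF tables | Accurate table detection and extraction; export to CSV, Excel, JSON, and other formats | `pip install "camelot-py[cv]"` |
| tabula-py | Extracting PDF tables | Wrapper around the Java library tabula; extracts tables as a DataFrame or CSV (requires a Java runtime) | `pip install tabula-py` |
| Slate | Extracting PDF text | Simple text extraction tool built on pdfminer | `pip install slate` |
| pdfquery | Advanced text extraction | Combines PDFMiner and lxml; supports complex jQuery/XPath-style queries for locating and extracting text | `pip install pdfquery` |
| PDFKit | HTML to PDF | Converts HTML documents to PDF by wrapping the wkhtmltopdf tool (installed separately) | `pip install pdfkit` |
| pdf2image | PDF to images | Converts PDF pages to PIL image objects using poppler (installed separately) | `pip install pdf2image` |

Detailed examples

1. PyPDF2

Purpose: manipulating and merging PDFs

```python
import PyPDF2

# Read a PDF file (PyPDF2 3.x API; PdfFileReader/getPage/PdfFileMerger were removed)
with open('sample.pdf', 'rb') as file:
    reader = PyPDF2.PdfReader(file)
    # Extract the text of the first page while the file is still open
    page = reader.pages[0]
    print(page.extract_text())

# Merge PDF files
merger = PyPDF2.PdfMerger()
merger.append('sample1.pdf')
merger.append('sample2.pdf')
merger.write('merged.pdf')
merger.close()
```

2. pdfminer.six

Purpose: extracting PDF text

```python
from pdfminer.high_level import extract_text

# Extract the text of every page as a single string
text = extract_text('sample.pdf')
print(text)
```

3. ReportLab

Purpose: generating PDF files

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Create a PDF; coordinates are in points, with the origin at the bottom-left corner
c = canvas.Canvas('generated.pdf', pagesize=letter)
c.drawString(100, 750, 'Hello, World!')
c.save()
```

4. PyMuPDF (fitz)

Purpose: reading and manipulating PDFs

```python
import fitz  # PyMuPDF

# Read a PDF file
document = fitz.open('sample.pdf')
page = document.load_page(0)
text = page.get_text()
print(text)

# Extract the images embedded in the first page
for img in document.get_page_images(0):
    xref = img[0]  # cross-reference number of the image object
    pix = fitz.Pixmap(document, xref)
    if pix.n - pix.alpha >= 4:  # CMYK: convert to RGB, since PNG cannot store CMYK
        pix = fitz.Pixmap(fitz.csRGB, pix)
    pix.save(f'image_{xref}.png')  # illustrative output file name
document.close()
```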