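I. Background

When scraping web pages with Python, we usually end up with raw HTML. The soup.find() and soup.find_all() methods of BeautifulSoup let us filter that HTML down to the data we actually want.

II. Usage

1. soup.find()

1.1 Searching by name

The code is as follows:

    from bs4 import BeautifulSoup

    html_string = """
    蔡x坤
    唱跳rap篮球你干嘛"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Search by tag name; find() returns only the first matching tag (or None)
    tag_list = soup.find(name="h1")
    print(tag_list)

The result is:

    蔡x坤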
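A minimal runnable sketch of the lookup by name, for readers who want to try it directly. The markup in `html_string` is an assumption made for illustration: the tag names, the `class="item"` attribute, and the ids `x1`, `x2`, and `x3` are chosen to be consistent with the queries used in this article, and the variable names `first_h1` and `missing` are hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed sample markup (illustrative only): an <h1>, a <ul> with four <li>
# items, and a <div>, all wrapped in a parent <div>.
html_string = """
<div id="x1">
    <h1 class="item">蔡x坤</h1>
    <ul class="item" id="x2">
        <li>唱</li>
        <li>跳</li>
        <li>rap</li>
        <li>篮球</li>
    </ul>
    <div id="x3">你干嘛</div>
</div>
"""

soup = BeautifulSoup(html_string, features="html.parser")

# find() returns the first tag whose name matches, as a Tag object
first_h1 = soup.find(name="h1")
print(first_h1)       # <h1 class="item">蔡x坤</h1>
print(first_h1.text)  # 蔡x坤

# When nothing matches, find() returns None instead of raising an error
missing = soup.find(name="table")
print(missing)        # None
```

Because find() may return None, it is usually worth checking the result before reading `.text` or `.attrs` from it.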
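1.2 Searching by attributes (attrs)

The code is as follows:

    from bs4 import BeautifulSoup

    html_string = """
    蔡x坤
    唱跳rap篮球你干嘛"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Search by attributes; returns the first tag whose id is "x3"
    tag_list = soup.find(attrs={"id": "x3"})
    print(tag_list)

The result is:

    你干嘛

1.3 Searching by name and attrs together

The code is as follows:

    from bs4 import BeautifulSoup

    html_string = """
    蔡x坤
    我是一名练习生唱跳rap篮球你干嘛"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Search by name and attrs; both conditions must match
    tag_list = soup.find(name="ul", attrs={"id": "x2"})
    print(tag_list)

The result is:

    唱跳rap篮球

2. soup.find_all()

2.1 Finding multiple tags by name

The code is as follows:

    from bs4 import BeautifulSoup

    html_string = """
    蔡x坤
    我是一名练习生唱跳rap篮球你干嘛"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Find every tag with the given name; find_all() returns a list of tags
    tag_list = soup.find_all(name="li")
    for tag in tag_list:
        print(tag.name, tag.text)

The result is (name and text of each tag):

    li 唱
    li 跳
    li rap
    li 篮球

2.2 Finding multiple tags by attrs

The code is as follows:

    from bs4 import BeautifulSoup

    html_string = """
    蔡x坤
    我是一名练习生唱跳rap篮球"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Find every tag whose class is "item"
    tag_list = soup.find_all(attrs={"class": "item"})
    for tag in tag_list:
        print(tag.name, tag.text)

The result is (name and text of each tag):

    h1 蔡x坤
    ul 我是一名练习生
    ul 唱跳rap篮球

2.3 Controlling recursion with recursive (default: True)

With recursive=False, only the direct children of a tag are searched:

    from bs4 import BeautifulSoup

    html_string = """你干嘛这是一句话这是一句话的一句话这也是一句话"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # Direct children only
    tag_list1 = soup.find(attrs={"id": "x1"})
    for tag in tag_list1.find_all(recursive=False):
        print(tag)

The result is (recursive=False, direct children only):

    你干嘛这是一句话这是一句话的一句话这也是一句话

With recursive=True, all descendants are searched, however deeply nested:

    from bs4 import BeautifulSoup

    html_string = """你干嘛这是一句话这是一句话的一句话这也是一句话"""

    soup = BeautifulSoup(html_string, features="html.parser")

    # All descendants
    tag_list1 = soup.find(attrs={"id": "x1"})
    for tag in tag_list1.find_all(recursive=True):
        print(tag)

The result is (recursive=True, all descendants):

    你干嘛这是一句话这是一句话的一句话这也是一句话这是一句话这是一句话的一句话这也是一句话

III. Case study

As an example, we scrape the car brands listed on Yiche (易车网). This example follows one by the Python instructor Wu Peiqi (武沛齐).

1. Analyze the page

Open https://car.yiche.com/ in a Chrome incognito window and inspect the page. The brand names turn out to be stored in the elements matched by name="div", attrs={"class": "item-brand"}.

2. Send the request and get the HTML text

    import requests
    from bs4 import BeautifulSoup

    # Fetch the HTML text
    res = requests.get(url="https://car.yiche.com/")

3. Filter the data

    import requests
    from bs4 import BeautifulSoup

    # Fetch the HTML text
    res = requests.get(url="https://car.yiche.com/")

    soup = BeautifulSoup(res.text, features="html.parser")

    # List that collects the brand names
    result_list = []

    # Filter the data: each brand is a div with class "item-brand",
    # and its name is stored in the data-name attribute
    tag_list = soup.find_all(name="div", attrs={"class": "item-brand"})
    for tag in tag_list:
        result_list.append(tag.attrs["data-name"])

    print(result_list)

4. Result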
['奥迪','埃安','AITO','阿斯顿·马丁','阿维塔','阿尔法·罗密欧','爱驰','AUXUN傲旋','ALPINA','Apollo','阿尔卑斯','Abarth','ABT','安凯客车','安徽猎豹','Arash','Aurus','艾康尼克','AgileAutomotive','APEX','ATS','Ariel','Aspark','ARMADILLO','Alpine','AURA','Aviar','ACSchnitzer','Atlis','AEHRA','Aria','AlphaMotor','AZNOM','AEVROBOTICS','阿尔特','AFEELA','ASKA','AKXY2','Alef','AIM','ATOM','安培','本田','奔驰','比亚迪','宝马','别克','保时捷','北京','奔腾','宝骏','标致','宾利','BAW北汽制造','北京汽车','布加迪','博速','北汽昌河','奔驰卡车','巴菲特','霸王龙','宝沃','北汽瑞翔','北汽新能源','北汽幻速','北奔重卡','百智新能源','北汽威旺','北汽雷驰','宾尼法利纳','比速汽车','百度Apollo','比德文汽车','铂驰','宝骐汽车','博世','拜腾','北汽泰普','宝腾','BAO','保斐利','北汽道达','北汽黑豹','北京清行','博郡汽车','Bowler','BAC','Bertone','BollingerMotors','BeyonCa','Brabham','比克汽车','Bremach','Bizzarrini','宝雅','长安','长安启源','长安欧尚','长城','长安凯程','长安跨越','创维汽车','曹操','橙仕','乘龙汽车','成功汽车','车驰汽车','超境汽车','ChargeCars','采埃孚','CANDELA','车和家','Cupra','长江EV','Conquest','Corbellati','昶洧','Czinger','Caterham','Canoo','Continental','大众','东风风神','东风风行','东风风光','东风奕派','东风','东风纳米','道奇','东风小康','DS','东南','东风轻型车','东风风度','大运','大力牛魔王','东风御风','东风商用车','电动屋','东风富康','滴滴','东风·瑞泰特','大乘汽车','东风氢舟','Dianchè','DeTomaso','Drako','电咖','大迪','Delage','DEUSAutomobiles','大发','DAVIDBROWN','达契亚','Donkervoort','Datsun','dÄHLer','DeLorean','ElectraMeccanica','Elektron','EdisonFuture','Elemental','E.Go','E-Legend','丰田','福特','福田','法拉利','飞凡汽车','方程豹','飞碟汽车','菲亚特','福迪','FaradayFuture','丰田纺织','法诺新能源','FOXTRON','Foxe-mobility','弗那萨利','Frangivento','Fresco','Fisker','辅恒汽车','广汽传祺','高合汽车','广汽集团','观致','GMC','国机智骏','光冈','谷歌','国金汽车','GTA','广汽吉奥','国吉商用车','G&BDesign','广通客车','广汽日野','Gumpert','格罗夫','GAZ','NEVS国能汽车','GFGStyle','GLM','G-Power','Gemballa','Ginetta','GMA','GYON','GUNTHERWERKS','国新新能源','高通','红旗','哈弗','昊铂','海马','合创汽车','悍马','恒驰','黄海','汉腾汽车','华晨新日','活越','华梓汽车','恒润汽车','华泰','汉龙汽车','恒天','华夏领舰','哈飞','华菱汽车','华颂','Hyperion','Hennessey','红星汽车','华凯','宏远汽车','华普','海格','HOFELE','汇众','毫末智行','华骐','HOPIUM','HispanoSuiza','华利','霍顿','Hudson','HURTAN','Holon','iCAR汽车','Inferno','Italdesign','INEOS','Icona','INKAS','IZERA','IED','INDI','Indig
o','吉利汽车','捷途','捷达','吉利银河','捷豹','极氪','Jeep','ARCFOX极狐','吉利几何','江铃','捷尼赛思','金杯','江汽集团','江淮瑞风','极石汽车','江淮汽车','江铃集团新能源','江淮钇为','Polestar极星','极越','钧天','金龙','江南汽车','金旅','江铃旅居车','九龙','嘉远汽车','江铃重汽','君马汽车','金冠汽车','江铃晶马汽车','金琥汽车','吉威新能源','Jannarelly','奇点汽车','佳跃','吉祥汽车','凯迪拉克','凯翼','开瑞','科尼赛克','克蒂汽车','开云汽车','克莱斯勒','科瑞斯的','克慕勒','凯马汽车','Karlmann','焜驰','开沃汽车','卡尔森','KTM','凯佰赫','Kimera','开利','Karma','卡威','KHANN','卡升','克罗斯哈特','路虎','领克','雷克萨斯','理想汽车','林肯','零跑汽车','劳斯莱斯','兰博基尼','岚图汽车','铃木','路特斯','凌宝汽车','雷诺','猎豹汽车','雷丁','蓝电','拉帝','LUMMA','陆风','Lorinser','菱势汽车','力帆汽车','理念','雷达汽车','LEVC','莲花汽车','领途汽车','联合卡车','LG','龙程汽车','LIMGENE凌际','拉达','莱茵汽车','陆地方舟','罗夫哈特','拉共达','Lucid','灵悉','蓝旗亚','绿驰','LITE','雷诺三星','罗孚','朗世','Lightyear','LOCALMOTORS','领志','LeSEE','LordstownMotors','LUNAZ','LIUX','洛轲智能','蓝擎汽车','马自达','名爵','玛莎拉蒂','MINI','迈凯伦','迈巴赫','猛士','迈莎锐','摩登汽车','迈越','摩根','敏安汽车','Michelin米其林','曼','Mole','迈迈','Manhart','MeyersManx','MILITEM','MAGNA','Micro','Munro','Mazzanti','美亚','Mahindra','MELKUS','Mobilize','Moke','Mopar','魅族','Mullen','哪吒汽车','纳智捷','南骏汽车','NamX','诺博汽车','Naran','nanoFLOWCELL','NeuronEV','NEXTLEVEL','Nikola','Noble','Novitec','欧拉','讴歌','OBBIN','欧宝','欧铃汽车','欧朗','欧联','帕加尼','朋克汽车','Posaidon','Puritalia','Praga','PiëchAutomotive','佩奇奥','PogeaRacing','ProjectArrow','奇瑞','起亚','启辰','奇瑞新能源','乔治·巴顿','庆铃五十铃','青岛解放','前途','前晨汽车','全球鹰','骐铃汽车','奇鲁汽车','清源汽车','轻橙时代','日产','荣威','睿蓝汽车','RAM','瑞弗','瑞驰新能源','容大智造','Rezvani','如虎','瑞麒','瑞腾汽车','锐马克','RIVIAN','Ringbrothers','RENOVO','Rinspeed','Radical','Radford','REVOZERO','深蓝汽车','上汽大通MAXUS','三菱','斯柯达','斯巴鲁','思皓','smart','SWM斯威汽车','思铭','SERES赛力斯','SONGSANMOTORS','陕汽重卡','上汽轻卡','时风','三一集团','沙龙汽车','上汽红岩','斯堪尼亚','山姆','盛唐','SHELBY','陕汽轻卡','双龙','萨博','世爵','陕汽商用车','上汽集团','上海','赛麟','斯太尔','神州','斯达泰克','双环','陕汽通家','申龙客车','上喆','SSC','SONY','首望','SVE','STI','SPIRRA','SCOUT','SpyrosPanopoulos','速达','Spofec','SonoMotors','Scion','Share2Drive','SINCARS','Singer','深向','Silence','世极','特斯拉','坦克','腾势','天际汽车','泰卡特','塔塔','泰克鲁斯·腾风','Triton','TopCar','TouringSuperleggera','Troller'
,'Togg','TheonDesign','Tramontana','TOROIDION','TVR','TECHART','途柚汽车','THOR','拓锐斯特','Telo','THK','TWR','天地良心汽车','UgurSahinDesign','Ultima','Uniti','Vinfast','VegaInnovations','VLFAutomotive','VandaElectric','VANTAS','VIRITECH','Venturi','Vanwall','五菱汽车','沃尔沃','蔚来','魏牌','五十铃','未奥汽车','沃尔沃卡车','威马汽车','维努斯','瓦滋','WALD','威兹曼','伟昊汽车','潍柴欧睿','潍柴英致','万象汽车','威麟','WMotors','沃克斯豪尔','WayRay','现代','雪佛兰','星途','小鹏','雪铁龙','小米汽车','小鹏汇天','小跑车','小虎','SRM鑫源','新龙马汽车','徐工汽车','新吉奥','AM晓奥汽车','西雅特','现代摩比斯','小猬汽车','新特汽车','星客特','英菲尼迪','仰望','依维柯','一汽解放','云度','一汽','远程','野马汽车','一汽解放轻卡','驭胜','远航汽车','宇通客车','野马新能源','运良','一汽凌河','雅升汽车','YAMAHA','游侠','云雀汽车','悠遥科技','永源','IMSA英飒','悠跑科技','悠宝利','银隆新能源','御捷','翼刻','裕路','易电易行','一汽富维','怡亚通','越界','智己汽车','智界','众泰','中国重汽VGV','中国重汽','中兴','中华','重汽王牌','重汽豪曼','知豆','智行盒子','自游家','中欧房车','智点汽车','正道汽车','中通客车','之诺','Zenvo','中植汽车','777']
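The filtering step can also be tried without network access. The following is a rough sketch under stated assumptions, not the site's actual markup: `sample_html` is a made-up fragment that only mimics the `item-brand` / `data-name` pattern described in the case study, and `extract_brand_names` is a hypothetical helper name. It uses `tag.get("data-name")` instead of `tag.attrs["data-name"]` so that a div without the attribute is skipped rather than raising a KeyError.

```python
from bs4 import BeautifulSoup


def extract_brand_names(html):
    """Return the data-name value of every div with class "item-brand"."""
    soup = BeautifulSoup(html, features="html.parser")
    names = []
    for tag in soup.find_all(name="div", attrs={"class": "item-brand"}):
        # get() returns None when the attribute is absent
        name = tag.get("data-name")
        if name is not None:
            names.append(name)
    return names


# Made-up fragment for illustration; the real page may be structured differently
sample_html = """
<div class="brand-list">
    <div class="item-brand" data-name="奥迪"></div>
    <div class="item-brand" data-name="本田"></div>
    <div class="item-brand"></div>
    <div class="other" data-name="ignored"></div>
</div>
"""

print(extract_brand_names(sample_html))  # ['奥迪', '本田']
```

With a live page, the same helper could be called on `res.text` from the request in step 3. Since a site's markup can change at any time, an empty list is a signal to re-inspect the page rather than proof that there are no brands.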