Disclaimer: this program is for personal study only.

Goal: to make copying down NetEase Cloud Music hot comments easier, write a crawler that takes a playlist id, then fetches and saves the hot comments of every song in that playlist. This is a record of the first simple crawler I completed on my own.

Screenshots: (the console window and the saved file)

As an aside, let me recommend this Japanese-song playlist.

The code is below.

Libraries: requests, bs4, json, csv, time (optional)

Code walkthrough:
1. Define constants and design the interface
2. Fetch the song information
3. Create the spreadsheet
4. Fetch and save the comment information
5. Show the final summary

Drawback: when the playlist contains many songs, the run takes a very long time.

The end. "The best state of life is a quiet, desolate kind of bustling vigor." — Mu Xin
Beginner's Python Crawler Notes (1)
```python
import requests
from bs4 import BeautifulSoup
import json
import csv
import time

t1 = time.perf_counter()  # time.clock() was deprecated and removed in Python 3.8
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
}
print("===========================\n获取网易云音乐歌单中所有歌曲热评\n===========================")
playlist_id = input("请输入歌单id:")  # avoid shadowing the built-in id()
print("正在搜索。。。\n")
playlist_url = f'https://music.163.com/playlist?id={playlist_id}'
rs = requests.session()
r = rs.get(playlist_url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
main = soup.find('ul', {'class': 'f-hide'})
name_list = []
id_list = []
for music in main.find_all('a'):
    music_id = music['href'][9:]  # strip the "/song?id=" prefix
    music_name = music.text
    name_list.append(music_name)
    id_list.append(music_id)
total = len(name_list)
count = 1
all_count = 0
file = open("网易云音乐歌单热评.csv", "w", encoding="gb18030", newline="")
csv_writer = csv.writer(file)  # avoid shadowing csv.writer itself
for i in id_list:
    csv_writer.writerow([count, name_list[count - 1]])
    get_url = "https://music.163.com/api/v1/resource/comments/R_SO_4_" + i + "?limit=0&offset=0"
    r = requests.get(get_url, headers=headers)
    json_dict = json.loads(r.content.decode("utf-8"))
    hotcomments = json_dict["hotComments"]
    all_count += len(hotcomments)
    no = 1
    for j in hotcomments:
        nickname = j["user"]["nickname"]
        content = j["content"].replace("\n", " ")
        liked = str(j["likedCount"]) + "赞"
        csv_writer.writerow(["", "", no, nickname, liked, content])
        no += 1
    count += 1
    csv_writer.writerow('')
file.close()
t2 = time.perf_counter() - t1
print(f"共找到{total}首歌 {all_count}条热评 用时{t2:.2f}s", end="\n===========================\n")
input("按任意键退出。。。")
```
```python
# Start timing
t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
}
print("===========================\n获取网易云音乐歌单中所有歌曲热评\n===========================")
playlist_id = input("请输入歌单id:")
print("正在搜索。。。\n")
playlist_url = f'https://music.163.com/playlist?id={playlist_id}'  # build the playlist URL
```
```python
# Get the song names and song ids from the playlist
rs = requests.session()
r = rs.get(playlist_url, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')  # the relevant info can be read from the page source
main = soup.find('ul', {'class': 'f-hide'})
name_list = []
id_list = []
for music in main.find_all('a'):
    music_id = music['href'][9:]
    music_name = music.text
    name_list.append(music_name)
    id_list.append(music_id)
# Counters
total = len(name_list)
count = 1
all_count = 0
```
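The slice `[9:]` works because every link in the hidden `<ul class="f-hide">` list has an `href` of the form `/song?id=…`, and `"/song?id="` is exactly 9 characters. A minimal offline sketch of the parsing step, using a made-up HTML snippet and hypothetical song ids:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the playlist page's hidden song list
html = ('<ul class="f-hide">'
        '<a href="/song?id=123456">Song A</a>'
        '<a href="/song?id=789012">Song B</a>'
        '</ul>')
soup = BeautifulSoup(html, "html.parser")  # html.parser avoids the lxml dependency
main = soup.find("ul", {"class": "f-hide"})
# [9:] strips the "/song?id=" prefix, leaving just the numeric id
songs = [(a["href"][9:], a.text) for a in main.find_all("a")]
print(songs)  # [('123456', 'Song A'), ('789012', 'Song B')]
```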
```python
# Pitfall: set encoding to gb18030 (so Excel opens the Chinese text correctly)
# and newline to "" (so the csv module controls line endings itself)
file = open("网易云音乐歌单热评.csv", "w", encoding="gb18030", newline="")
csv_writer = csv.writer(file)
```
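Without `newline=""`, on Windows the csv module's own `\r\n` row endings get translated again by the file object, producing a blank line after every row. A small sketch of the two row shapes this script writes, using an in-memory buffer in place of the real file:

```python
import csv
import io

# io.StringIO stands in for open("网易云音乐歌单热评.csv", "w", newline="")
buf = io.StringIO()
w = csv.writer(buf)
w.writerow([1, "song title"])                                # header row for one song
w.writerow(["", "", 1, "nickname", "100赞", "comment text"])  # one hot-comment row
print(buf.getvalue().splitlines())
# ['1,song title', ',,1,nickname,100赞,comment text']
```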
Key point: this API returns a song's comments.
```python
# "https://music.163.com/api/v1/resource/comments/R_SO_4_" + id + "?limit=0&offset=0"
# Look up the comments by song id and record them in comment order
for i in id_list:
    csv_writer.writerow([count, name_list[count - 1]])
    get_url = "https://music.163.com/api/v1/resource/comments/R_SO_4_" + i + "?limit=0&offset=0"
    r = requests.get(get_url, headers=headers)
    json_dict = json.loads(r.content.decode("utf-8"))
    hotcomments = json_dict["hotComments"]
    all_count += len(hotcomments)
    no = 1
    # Pull the relevant fields out of the returned JSON
    for j in hotcomments:
        nickname = j["user"]["nickname"]
        content = j["content"].replace("\n", " ")
        liked = str(j["likedCount"]) + "赞"
        csv_writer.writerow(["", "", no, nickname, liked, content])
        no += 1
    count += 1
    csv_writer.writerow('')
file.close()
```
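The extraction step above can be exercised offline. A sketch, assuming a hypothetical response with the same shape the script reads (`hotComments`, `user.nickname`, `content`, `likedCount`); the nickname and count here are made up:

```python
import json

# Hypothetical payload mimicking the shape of the comments API response
raw = json.dumps({
    "hotComments": [
        {"user": {"nickname": "listener_1"},
         "content": "first line\nsecond line",
         "likedCount": 1024},
    ]
})
data = json.loads(raw)
rows = []
for no, c in enumerate(data["hotComments"], start=1):
    rows.append([no,
                 c["user"]["nickname"],
                 str(c["likedCount"]) + "赞",
                 c["content"].replace("\n", " ")])  # keep each comment on one CSV line
print(rows)  # [[1, 'listener_1', '1024赞', 'first line second line']]
```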
```python
# Stop timing
t2 = time.perf_counter() - t1
# Report the counts
print(f"共找到{total}首歌 {all_count}条热评 用时{t2:.2f}s", end="\n===========================\n")
input("按任意键退出。。。")
```
Further reading: a multithreaded crawler.
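Since each song's comments are fetched independently, the long run time noted above is mostly waiting on the network, which threads can overlap. A minimal sketch using `concurrent.futures.ThreadPoolExecutor`; `fetch_hot_comments` is a hypothetical stand-in for the real `requests.get` call, and a real crawler would also rate-limit to avoid being blocked:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_hot_comments(song_id):
    # Placeholder for the real requests.get(...) to the comments API;
    # the sleep simulates network latency
    time.sleep(0.1)
    return song_id, []

song_ids = ["1", "2", "3", "4"]
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order, so results still line up with song_ids
    results = list(pool.map(fetch_hot_comments, song_ids))
elapsed = time.perf_counter() - t0
print(f"{len(results)} songs in {elapsed:.2f}s")  # ~0.1s instead of ~0.4s sequentially
```

Ordered results matter here because the CSV pairs each comment block with its song's position in the playlist.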