Error when scraping Amazon with Selenium and bs4

Data mining, Python, web scraping
2022-02-21 01:13:06

I'm using BeautifulSoup and webdriver for a course project to scrape disposable diapers on Amazon and collect each item's name, price, number of reviews, and rating.

My goal is to end up with something like this, which I will then split into separate columns:

    Diapers Size 4, 150 Count - Pampers Swaddlers Disposable Baby Diapers, One Month Supply
    4.0 out of 5 stars
    1,982
    $43.98
    ($0.29/Count)

Unfortunately, after 50 items have been printed, I get this message: Message: no such element: unable to locate element: {"method":"css selector","selector":".a-last"}

Here is my code:

    import pandas as pd
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    URL = "https://www.amazon.com/s?k=baby+disposable&rh=n%3A166772011&ref=nb_sb_noss"
    driver = webdriver.Chrome(service=Service("C:/Users/Desktop/chromedriver_win32/chromedriver.exe"))
    driver.get(URL)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    df = pd.DataFrame(columns=["Product Name", "Rating", "Number of Reviews", "Price", "Price Count"])

    while True:
        # Each match is one product card on the current results page.
        for i in soup.find_all(class_="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 s-result-item sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"):
            ProductName = i.find(class_="a-size-base-plus a-color-base a-text-normal").text
            print(ProductName)
            # find() returns None when a field is missing, so .text raises AttributeError.
            try:
                Rating = i.find(class_="a-icon-alt").text
            except AttributeError:
                Rating = "Null"
            print(Rating)
            try:
                NumberOfReviews = i.find(class_="a-size-base").text
            except AttributeError:
                NumberOfReviews = "Null"
            print(NumberOfReviews)
            try:
                Price = i.find(class_="a-offscreen").text
            except AttributeError:
                Price = "Null"
            print(Price)
            try:
                PriceCount = i.find(class_="a-size-base a-color-secondary").text
            except AttributeError:
                PriceCount = "Null"
            print(PriceCount)
            # Append one row; DataFrame.append was removed in pandas 2.0.
            df.loc[len(df)] = [ProductName, Rating, NumberOfReviews, Price, PriceCount]
        nextlink = soup.find(class_="a-disabled a-last")
        if nextlink:
            print("This is the last page. ")
            break
        else:
            # The "no such element" error quoted above is raised by this lookup.
            driver.find_element(By.CLASS_NAME, "a-last").click()
            subhtml = driver.page_source
            soup = BeautifulSoup(subhtml, "html.parser")

Unfortunately, I've hit a roadblock trying to figure out why it isn't picking up a-last.

1 Answer

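The element you are trying to find (I assume it is a link or button with the class a-last) is most likely not on the page. When the error occurs, look at your Chrome window and see what it is displaying. Check whether the element is actually on the page.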
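One way to run that check from code is to ask for all matches instead of a single one. In Selenium, find_elements returns an empty list when nothing matches rather than raising, so the script can report what the browser is showing instead of crashing. This is only a rough sketch: click_next_if_present is a made-up helper name, and the literal string "css selector" is used in place of By.CSS_SELECTOR (it has the same value) so the snippet has no imports.

```python
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR


def click_next_if_present(driver, selector=".a-last"):
    """Click the first element matching `selector`.

    Returns True if something was clicked, False if nothing matched.
    `driver` is expected to be a Selenium WebDriver (hypothetical helper).
    """
    # find_elements (plural) returns [] instead of raising NoSuchElementException.
    matches = driver.find_elements(CSS_SELECTOR, selector)
    if not matches:
        # Report where the browser ended up so the page can be inspected.
        print("No element matches", selector, "on", driver.current_url)
        print("Page title:", driver.title)
        return False
    matches[0].click()
    return True
```

In the question's loop, this could stand in for the direct click: if it returns False, break out and look at the printed title and URL to see what page was actually served.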

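It may be that amazon.com is simply showing a CAPTCHA, so all the usual elements have disappeared from the screen. For example, if you run a while True loop against Google with no wait time between requests, it will show you a CAPTCHA after a number of requests.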
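If the lack of waiting is the cause, a pause between page loads is the obvious thing to try. The sketch below sleeps for a random interval; polite_pause is a hypothetical helper, and the 2 to 5 second range is an arbitrary assumption, not a value known to keep Amazon from showing a CAPTCHA.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration between the two bounds and return it.

    The default bounds are arbitrary; `sleep` can be swapped out for testing.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling polite_pause() right after the click on the next-page button, and before reading driver.page_source, also gives the new page time to load before it is parsed.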