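I am using BeautifulSoup and Selenium WebDriver for a course project to scrape disposable diapers on Amazon, collecting each product's name, price, number of reviews, and rating.
My goal is to end up with something like the following, with each piece of information split into its own column: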
Diapers Size 4, 150 Count - Pampers Swaddlers Disposable Baby Diapers, One Month Supply
4.0 out of 5 stars
1,982
$43.98
($0.29/Count)
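A minimal sketch of how those raw strings could be turned into typed column values, assuming every field has exactly the shape shown in the sample above. `parse_listing` is a hypothetical helper name used only for illustration; it is not part of my scraper, and it does not handle missing ("Null") ratings, reviews, or prices.

```python
import re


def parse_listing(rating_text, reviews_text, price_text, unit_price_text):
    """Convert the raw strings scraped for one product into numbers.

    Assumes the formats from the sample: "4.0 out of 5 stars", "1,982",
    "$43.98" and "($0.29/Count)".
    """
    rating = float(rating_text.split()[0])                  # leading number of the rating text
    reviews = int(reviews_text.replace(",", ""))            # drop the thousands separator
    price = float(price_text.lstrip("$").replace(",", ""))  # drop the currency sign
    # The unit price sits between "$" and "/"; None when that pattern is absent
    match = re.search(r"\$([\d.]+)/", unit_price_text)
    unit_price = float(match.group(1)) if match else None
    return {
        "Rating": rating,
        "Number of Reviews": reviews,
        "Price": price,
        "Price Count": unit_price,
    }


row = parse_listing("4.0 out of 5 stars", "1,982", "$43.98", "($0.29/Count)")
print(row)
```

The dictionary keys match the column names of the DataFrame, so each result could be stored as one row next to the product name.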
Unfortunately, after 50 records have been scraped, I get this message:
Message: no such element: unable to locate element: {"method":"css selector","selector":".a-last"}
Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Amazon search results for disposable baby diapers
URL = "https://www.amazon.com/s?k=baby+disposable&rh=n%3A166772011&ref=nb_sb_noss"
driver = webdriver.Chrome('C:/Users/Desktop/chromedriver_win32/chromedriver.exe')
driver.get(URL)
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
df = pd.DataFrame(columns=["Product Name", "Rating", "Number of Reviews", "Price", "Price Count"])
while True:
    # Each search result card on the current page
    for i in soup.find_all(class_="sg-col-4-of-24 sg-col-4-of-12 sg-col-4-of-36 s-result-item sg-col-4-of-28 sg-col-4-of-16 sg-col sg-col-4-of-20 sg-col-4-of-32"):
        ProductName = i.find(class_="a-size-base-plus a-color-base a-text-normal").text
        print(ProductName)
        # find() returns None when a field is missing, so .text raises AttributeError
        try:
            Rating = i.find(class_="a-icon-alt").text
        except AttributeError:
            Rating = "Null"
        print(Rating)
        try:
            NumberOfReviews = i.find(class_="a-size-base").text
        except AttributeError:
            NumberOfReviews = "Null"
        print(NumberOfReviews)
        try:
            Price = i.find(class_="a-offscreen").text
        except AttributeError:
            Price = "Null"
        print(Price)
        try:
            PriceCount = i.find(class_="a-size-base a-color-secondary").text
        except AttributeError:
            PriceCount = "Null"
        print(PriceCount)
        df = df.append({"Product Name": ProductName, "Rating": Rating, "Number of Reviews": NumberOfReviews,
                        "Price": Price, "Price Count": PriceCount}, ignore_index=True)
    # A disabled "next" control marks the last page
    nextlink = soup.find(class_="a-disabled a-last")
    if nextlink:
        print("This is the last page. ")
        break
    else:
        # This is the call that raises the "no such element" error
        progress = driver.find_element_by_class_name('a-last').click()
        subhtml = driver.page_source
        soup = BeautifulSoup(subhtml, "html.parser")
Unfortunately, I have hit a roadblock trying to figure out why it is not finding the a-last element.
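One thing that may be worth checking is whether the pagination control is actually present in the page at the moment it is looked up. The sketch below is a hand-rolled polling loop (a simple stand-in for Selenium's WebDriverWait, not a confirmed fix for this error). `wait_for_next_link` and `is_disabled` are hypothetical helper names; `driver` is assumed to be a Selenium WebDriver whose `find_elements` accepts the locator strategy string "css selector", the same string that appears in the error message.

```python
import time

# Selenium's By.CSS_SELECTOR is this same string, so no Selenium import is needed here.
CSS_SELECTOR = "css selector"


def wait_for_next_link(driver, timeout=10.0, poll=0.5):
    """Poll for the '.a-last' pagination element; return it, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising when nothing matches
        matches = driver.find_elements(CSS_SELECTOR, ".a-last")
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def is_disabled(element):
    """True if the element carries the 'a-disabled' class (assumed last-page marker)."""
    classes = (element.get_attribute("class") or "").split()
    return "a-disabled" in classes
```

In the loop above, the click would then happen only when `wait_for_next_link(driver)` returns an element for which `is_disabled` is false; a `None` result would end the loop cleanly instead of raising, and would also show whether the element is simply missing from that page.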