Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

for loop - for statement not working in a Selenium scraper

The for statement in my Selenium scraper, which collects articles, does not work as expected. The goal is to scrape the contents of every article (title, date, office, sort, article) shown on the page that loads from the URL.

However, only the first article is scraped. I suspect the problem is in the pandas DataFrame, but I'm not sure.

import time
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36")
chrome_options.add_argument("lang=ko_KR")

wd = webdriver.Chrome(executable_path='c:/chromedriver.exe', options=chrome_options)
wd.implicitly_wait(10)

news_df = pd.DataFrame(columns=('Title', 'Date', 'Office', 'Sort', 'Article'))
idx = 0
news_url = 'https://newslibrary.naver.com/search/searchByKeyword.nhn#%7B%22mode%22%3A1%2C%22sort%22%3A0%2C%22trans%22%3A%221%22%2C%22pageSize%22%3A10%2C%22keyword%22%3A%22%EA%B1%B4%EC%84%A4%EC%82%B0%EC%97%85%22%2C%22status%22%3A%22success%22%2C%22startIndex%22%3A1%2C%22page%22%3A1%2C%22startDate%22%3A%221945-01-01%22%2C%22endDate%22%3A%221945-12-31%22%7D'
wd.get(news_url)

data = wd.find_elements_by_css_selector('#searchlist > ul > li:nth-child(1)')
try:
    for da in data:
        title = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/h3/a').get_attribute('title')
        date = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[1]').text
        office = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[2]').text
        sort = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[4]').text
        article = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/div').text
        article = article.replace("\n", "")
        article = article.replace("\r", "")
        article = article.replace("", "")
       
        news_df.loc[idx] = [title, date, office, sort, article]
        idx += 1
        
except AttributeError:
    pass

wd.close()
print('Complete!')
Question from: https://stackoverflow.com/questions/65898271/for-statement-not-working-in-seleniums-scraper
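
One likely cause, judging only from the code in the question: the CSS selector ends in li:nth-child(1), so data holds a single element and the loop body runs once. In addition, every XPath begins with // and hard-codes li[1]; Selenium resolves a path that starts with // against the whole document, not against da, so each iteration would read the first article anyway. Below is a minimal sketch of a possible fix, not a tested solution for this site. It assumes every result row has the layout implied by the question's own XPaths. The names scrape_rows, COLUMNS, CSS_SELECTOR and XPATH are made up for this illustration, and rows are collected as plain dicts instead of being written into the DataFrame one at a time.

```python
# Locator strategy strings. These are the values of Selenium's
# By.CSS_SELECTOR and By.XPATH, spelled out so the sketch needs no imports.
CSS_SELECTOR = "css selector"
XPATH = "xpath"

# Column names taken from the question's DataFrame.
COLUMNS = ("Title", "Date", "Office", "Sort", "Article")


def scrape_rows(driver):
    """Return one dict per search result found under #searchlist.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the results page (hypothetical helper, not from the question).
    """
    rows = []
    # No :nth-child(1) here, so every <li> result is matched, not just the first.
    items = driver.find_elements(CSS_SELECTOR, "#searchlist > ul > li")
    for item in items:
        # Each XPath starts with "./" so it is resolved relative to `item`.
        title = item.find_element(XPATH, "./div[2]/h3/a").get_attribute("title")
        date = item.find_element(XPATH, "./div[2]/ul/li[1]").text
        office = item.find_element(XPATH, "./div[2]/ul/li[2]").text
        sort = item.find_element(XPATH, "./div[2]/ul/li[4]").text
        article = item.find_element(XPATH, "./div[2]/div").text
        # Strip line breaks from the article preview.
        article = article.replace("\n", "").replace("\r", "")
        rows.append(dict(zip(COLUMNS, (title, date, office, sort, article))))
    return rows
```

With a live driver this would be called as rows = scrape_rows(wd) after wd.get(news_url), and pd.DataFrame(rows, columns=list(COLUMNS)) would then build the frame in one step. Also note that a missing element raises NoSuchElementException in Selenium, which the question imports but never catches; the except AttributeError: pass block would not handle it.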


1 Reply

Waiting for answers
