python - Scraping CSS Background-image urls with bs4?

I am trying to scrape certain URLs from websites. Sometimes, however, these URLs only appear as CSS background or background-image URLs. Whatever I tried, I could not reach them, and unfortunately I am not in a position to provide a "this is what I could do so far" code snippet.

I am using bs4 and requests. Each website the script visits has its own CSS files with their own naming conventions, so there is no single, known xyz.css file to target. Instead, the script has to find the relevant stylesheet and scrape the URL from it. I would really appreciate any hints or help.
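A minimal sketch of the "find the relevant stylesheet" step, assuming the stylesheets are linked from the page with `<link>` tags. `stylesheet_urls` is a hypothetical helper name, not part of bs4. It resolves relative hrefs with `urllib.parse.urljoin`, which matters because `requests.get` cannot fetch a bare relative path such as `/static/main.css`. Stylesheets pulled in through `@import` or written inline in `<style>` blocks are not covered here.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def stylesheet_urls(page_html, page_url):
    """Return absolute URLs of the external stylesheets linked from a page.

    A <link> counts as a stylesheet if its rel includes "stylesheet" or its
    href path ends in ".css". Relative hrefs are resolved against page_url.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    urls = []
    for link in soup.find_all("link", href=True):
        rel = link.get("rel") or []
        if isinstance(rel, str):  # bs4 normally returns rel as a list; be safe
            rel = rel.split()
        href = link["href"]
        # Compare the path only, so "theme.css?v=3" still counts as CSS
        is_css = urlparse(href).path.lower().endswith(".css")
        if is_css or "stylesheet" in [r.lower() for r in rel]:
            absolute = urljoin(page_url, href)
            if absolute not in urls:  # skip duplicates, keep page order
                urls.append(absolute)
    return urls
```

The page HTML would typically come from `requests.get(page_url, headers=custom_headers).text`. Checking `rel` in addition to the file extension catches stylesheets served from URLs that do not end in `.css`.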

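```python
import re
import requests
from bs4 import BeautifulSoup as tarhana  # assumed: "tarhana" is an alias for BeautifulSoup

# `soup` (the parsed page) and `custom_headers` are defined earlier in the script
cssList = soup.find_all('link', {'href': re.compile(r'\.css')})
for css in cssList:
    css = css['href']
    css_response = requests.get(css, headers=custom_headers, verify=True, timeout=2)
    css_soup = tarhana(css_response.content, features='lxml')
    # bs4 parses markup, not CSS, so this search finds nothing in a stylesheet
    bgimg = css_soup.find_all('background-image', url=re.compile('svg|logo'))
    for bg in bgimg:
        pass  # collect the matching URLs here
```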

In other words, the script should fetch every CSS file, search each one for background-image: url(...) declarations, and return the URLs that contain certain keywords (such as svg or logo) as a list.
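Because bs4 parses HTML and XML rather than CSS, a stylesheet has no `background-image` tags for `find_all` to match. One workable alternative is a regular expression over the stylesheet text. The sketch below assumes the keywords are plain substrings (here `svg` and `logo`, taken from the question's regex); `extract_background_urls` is a hypothetical helper name. It is not a full CSS parser: values containing `;` (such as base64 `data:` URIs) are cut short and skipped, and escaped characters inside `url()` are not handled.

```python
import re
from urllib.parse import urljoin

# Comments are removed first so commented-out rules are ignored.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
# A background or background-image declaration; its value runs to the next ';' or '}'.
# The lookbehind skips selectors such as ".background:hover".
DECLARATION_RE = re.compile(r"(?<![\w.#-])background(?:-image)?\s*:\s*([^;}]*)", re.IGNORECASE)
# url(...) with optional single or double quotes around the address.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def extract_background_urls(css_text, keywords, base_url=""):
    """Return background(-image) URLs in css_text that contain any keyword.

    Relative URLs are resolved against base_url, which should be the URL of
    the stylesheet itself, since CSS resolves url() against the stylesheet.
    """
    css_text = COMMENT_RE.sub("", css_text)
    wanted = [k.lower() for k in keywords]
    found = []
    for declaration in DECLARATION_RE.finditer(css_text):
        # A single declaration may hold several url() values, e.g. layered backgrounds
        for _quote, url in URL_RE.findall(declaration.group(1)):
            if any(k in url.lower() for k in wanted):
                full = urljoin(base_url, url)  # unchanged when base_url is empty
                if full not in found:
                    found.append(full)
    return found
```

In the question's loop, the bs4 call could be replaced with `extract_background_urls(css_response.text, ['svg', 'logo'], css)`, where `css` should be the stylesheet's absolute URL. Passing the stylesheet URL rather than the page URL as `base_url` is deliberate: relative `url()` values in CSS resolve against the stylesheet's location. For messier stylesheets, a dedicated CSS parser such as tinycss2 would be more robust than a regex.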

Question from: https://stackoverflow.com/questions/65864143/scraping-css-background-image-urls-with-bs4
