Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Scraping CSS background-image URLs with bs4?

I am trying to scrape certain URLs from websites. Sometimes these URLs are only given in CSS, as background or background-image URLs. No matter what I tried, I could not reach these URLs, so unfortunately I cannot provide a working 'this is what I have so far' code snippet.

I am using bs4 and requests. Each website the script visits has its own CSS files with their own naming conventions, so I cannot hard-code a file name such as xyz.css. The script has to find the relevant stylesheet itself and scrape the URL from it. I would really appreciate a hint or some help. This is roughly what I have in mind:

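import re
import requests
from bs4 import BeautifulSoup as tarhana  # assumed alias; the snippet calls bs4 under this name

# soup and custom_headers are defined earlier in the script
cssList = soup.find_all('link', {'href': re.compile(r'\.css')})
for css in cssList:
    css = css['href']
    css_response = requests.get(css, headers=custom_headers, verify=True, timeout=2)
    css_soup = tarhana(css_response.content, features='lxml')
    # intended: find background-image rules whose url() mentions svg or logo
    bgimg = css_soup.find_all('background-image', url=re.compile('svg|logo'))
    for bg in bgimg:
        pass  # collect the matching URLs here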

So the script should fetch every CSS file linked from the page, search each one for background-image: url(...) declarations, and return as a list the URLs that contain certain keywords.
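One possible way to do the search step described above, offered as a rough sketch rather than a definitive answer: a stylesheet is not HTML, so bs4 has no tags to find in it, and the CSS text can be searched with regular expressions instead. The function name extract_background_urls, the sample stylesheet, and the example.com URL below are made up for illustration. The keywords svg and logo come from the question. A regex is not a full CSS parser, so this will miss some cases, for example data: URIs that contain a semicolon.

```python
import re
from urllib.parse import urljoin

# A "background" or "background-image" declaration, up to the next ; { or }.
DECLARATION = re.compile(r'background(?:-image)?\s*:([^;{}]*)', re.IGNORECASE)
# Every url(...) inside such a declaration, with or without quotes.
URL_VALUE = re.compile(r'url\(\s*[\'"]?([^\'")]+?)[\'"]?\s*\)', re.IGNORECASE)


def extract_background_urls(css_text, base_url, keywords=('svg', 'logo')):
    """Return absolute background URLs in css_text that contain a keyword.

    base_url is the URL the CSS was loaded from; relative url() values
    are resolved against it. Duplicates are dropped, order is preserved.
    """
    found = []
    for declaration in DECLARATION.findall(css_text):
        for raw_url in URL_VALUE.findall(declaration):
            if any(keyword in raw_url.lower() for keyword in keywords):
                absolute = urljoin(base_url, raw_url)
                if absolute not in found:
                    found.append(absolute)
    return found


# Hypothetical stylesheet text, standing in for css_response.text.
sample_css = """
.header { background-image: url("img/logo.svg"); }
.hero { background: #fff url(/static/hero.jpg) no-repeat; }
.icon { background: url('../icons/menu.svg') center; color: red; }
"""

print(extract_background_urls(sample_css, "https://example.com/css/site.css"))
```

In the loop from the question, this would replace the bs4 call on the stylesheet: pass css_response.text and the URL of the CSS file to the function and extend a result list with what it returns. The href of a link tag can be relative, so it is probably worth resolving it with urljoin against the page URL before calling requests.get. The same function should also work on the text of style tags and on inline style attributes, with the page URL as base_url.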

question from:https://stackoverflow.com/questions/65864143/scraping-css-background-image-urls-with-bs4

