I am trying to scrape certain URLs from websites. Sometimes these URLs appear only in the CSS, as the value of a background or background-image property. No matter what I tried, I could not reach these URLs, so unfortunately the snippet below is only a rough attempt that does not work.
I am using bs4 and requests. Each website the script visits has its own CSS files with their own naming conventions, so I cannot hard-code a file name such as xyz.css. The script has to find the relevant stylesheet itself and scrape the URL from it.
I would really appreciate a hint or some help.
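import re
import requests
from bs4 import BeautifulSoup as tarhana  # tarhana is assumed to be an alias for BeautifulSoup

# soup and custom_headers come from the earlier request for the page itself (not shown)
cssList = soup.find_all('link', {'href': re.compile(r'\.css')})
for css in cssList:
    css_href = css['href']
    css_response = requests.get(css_href, headers=custom_headers, verify=True, timeout=2)
    css_soup = tarhana(css_response.content, features='lxml')
    # This is where it fails: CSS is not HTML, so 'background-image' is not a tag bs4 can find
    bgimg = css_soup.find_all('background-image', url=re.compile('svg|logo'))
    for bg in bgimg:
        pass  # collect the matching URLs here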
So the script should fetch every CSS file, look in each one for background-image: url(...), and, if the URL contains certain keywords (for example svg or logo), return the matching URLs as a list.
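A minimal sketch of that behaviour, assuming the stylesheet text is searched with a regular expression instead of being parsed by BeautifulSoup (which only understands HTML/XML). The function name `extract_bg_urls`, the sample CSS and the example.com address are hypothetical and only there for illustration. The pattern picks up just the first url(...) in each background declaration:

```python
import re
from urllib.parse import urljoin

# Matches "background: ... url(...)" and "background-image: url(...)",
# with or without quotes around the URL. Only the first url() of a
# declaration is captured.
BG_URL_RE = re.compile(
    r"background(?:-image)?\s*:[^;{}]*?url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)",
    re.IGNORECASE,
)


def extract_bg_urls(css_text, css_url, keywords=("svg", "logo")):
    """Return absolute background URLs in css_text that contain any keyword."""
    keyword_re = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)
    found = []
    for raw in BG_URL_RE.findall(css_text):
        if keyword_re.search(raw):
            # Paths inside url() are relative to the stylesheet, not the page.
            found.append(urljoin(css_url, raw.strip()))
    return found


# Hypothetical stylesheet content, used here in place of a real download.
sample_css = """
.header { background-image: url("../img/logo.png"); }
.hero   { background: #fff url(/static/hero.jpg) no-repeat; }
.icon   { background: url('icons/menu.svg') center; }
"""

urls = extract_bg_urls(sample_css, "https://example.com/assets/css/main.css")
print(urls)
```

Inside the loop over the link tags, the same function could be called as extract_bg_urls(css_response.text, css_response.url), so that relative paths are resolved against the address the stylesheet was actually served from.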
question from:
https://stackoverflow.com/questions/65864143/scraping-css-background-image-urls-with-bs4