python - Web scraping not working with BeautifulSoup

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

Sorry in advance for any bad formatting; this is my very first post!

I'm trying to create a program that scrapes "CoinMarketCap" and compares the prices from a South African exchange (Luno) and all the other Bitcoin exchanges.

Sadly, it doesn't work on the https://coinmarketcap.com/de/currencies/bitcoin/markets/ page. It works on the https://coinmarketcap.com/de/exchanges/luno/ page though.

Any suggestions? Here is my code:

from bs4 import BeautifulSoup 
import requests
from time import sleep
from random import randint

def scrapeWebsite(link):
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    results = requests.get(link, headers=headers)

    src = results.content

    soup = BeautifulSoup(src,features="html.parser")

    items = []

    print(soup.prettify())

    for tr in soup.find_all("tr"):
        line = ""
        for td in tr.find_all("td"):
            line = line + td.text + "/"
            if(td.text == "Kürzlich"):
                items.append(line)
    return items



itemsLuno = scrapeWebsite("https://coinmarketcap.com/de/currencies/bitcoin/markets/")

# Coins on Luno are: Bitcoin, Ethereum, Litecoin and Ripple

for item in itemsLuno:
    print(item)

question from:https://stackoverflow.com/questions/65861022/webscraping-not-working-with-beautfiulsoup

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:26:38+0000

The content of the first page is generated by JavaScript, so requests only fetches the initial, unmodified HTML: the response the server sends before any JavaScript has run in a browser. You can check this by looking at the raw response, for example the output of print(soup.prettify()) in your code.
In your case, you need to render the JavaScript content before you crawl the page. You can do that with Selenium, or with the Scrapy framework combined with a JavaScript rendering add-on such as scrapy-splash. For example, with Selenium:

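from selenium import webdriver
import time

url = "https://coinmarketcap.com/de/currencies/bitcoin/markets/"

driver = webdriver.Firefox()  # starts a Firefox browser (Firefox must be installed)
driver.get(url)
time.sleep(5)  # give the JavaScript time to render the page
html = driver.page_source  # HTML after rendering; pass this to BeautifulSoup
driver.quit()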
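
A fixed sleep can be too short on a slow connection and wasteful on a fast one. As a rough sketch (not tested against the live site), an explicit wait can be combined with the parsing loop from the question. The helper names extract_rows and fetch_rendered_html are made up for illustration, and the "tr td" selector assumes the markets are rendered as ordinary table rows:

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return the cell texts of every table row that contains <td> cells."""
    soup = BeautifulSoup(html, features="html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows only contain <th> cells, so they are skipped
            rows.append(cells)
    return rows


def fetch_rendered_html(url, timeout=15):
    """Load url in Firefox and return the HTML once a table cell is present."""
    # Imported here so that extract_rows also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Assumption: the market data ends up in <tr><td>...</td></tr> rows.
        # Raises TimeoutException if no such cell appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "tr td"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

With Selenium and Firefox installed, extract_rows(fetch_rendered_html(url)) should give one list of cell texts per table row, which can then be filtered for the Luno entries. Running extract_rows on the plain requests response is also a quick way to confirm the diagnosis: if it returns an empty list, the table is not in the static HTML.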
