Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python Web Scraping Using Selenium

Website to scrape: https://idaman.kpkt.gov.my/idv5xe/98_eHome/maklumatProjek.cfm?pmju_kod=8898&proj_kod_Fasa=1

Items to scrape are in bold (marked with ** in the HTML below) - Part 1

<form onsubmit="return lucee_form_c9u.check();" name="myForm" enctype="multipart/form-data" action="mPPTProjek3.cfm?mn=BPPT" method="post">
    <div align="center" style="background-color: white; border: 1px solid grey;">
        <br />
        <table class="MainContent" width="100%" align="center">
            <tbody>
                <tr style="font-weight: bold;">
                    <td class="column" width="30%">Nama Pemaju</td>
                    <td>
                        :
                        <a style="color: blue;" href="maklumatPemaju.cfm?pmju_Kod=8877">**RAPID UNITY SDN. BHD.**</a>
                        <font color="red">* Klik Untuk Melihat Maklumat</font>
                    </td>
                </tr>
                <tr>
                    <td class="column">Kod Pemaju</td>
                    <td>: **8877**</td>
                </tr>

                <tr>
                    <td class="column">Kod Fasa</td>
                    <td>: **1**</td>
                </tr>

                <tr>
                    <td class="column">Nama Pemajuan</td>
                    <td>: **TAMAN UNITY**</td>
                </tr>
            </tbody>
        </table>
    </div>
</form>
question from:https://stackoverflow.com/questions/66045610/python-web-scraping-using-selenium

1 Reply

I think you can get what you want with just requests and BeautifulSoup, as follows:

import requests
from bs4 import BeautifulSoup

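# A session reuses the connection (and any cookies) across both requests
s = requests.Session()

params = {"pmju_Kod": 8877, "proj_Kod_Fasa": 1}
r = s.get("https://idaman.kpkt.gov.my/idv5xe/98_eHome/maklumatProjek.cfm", params=params)
r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(r.content, "html.parser")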

tables = soup.find_all('table', class_="MainContent")

items = []

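# Developer name: the text of the link in the first table
items.append(tables[0].a.text)

# Each remaining row is a (label, ": value") pair; strip the leading ": "
data = [[td.text for td in tr.find_all('td')] for tr in tables[0].find_all('tr')]
items.append(data[1][1].strip(': '))
items.append(data[2][1].strip(': '))
items.append(data[3][1].strip(': '))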

# Rows 1 and 2 of the fourth MainContent table, columns 2 to 6
data = [[td.text for td in tr.find_all('td')] for tr in tables[3].find_all('tr')]

for row in data[1:3]:
    items.extend([row[2].strip(), row[3].strip(), row[4], row[5], row[6]])

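# Pemajuan table: loaded from a separate template URL
# Note: this rekid value is hard-coded for this particular project
params['rekid'] = 419975503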
r2 = s.get('https://idaman.kpkt.gov.my/idv5xe/98_eHome/template/pemajuan.cfm', params=params)
soup2 = BeautifulSoup(r2.content, "html.parser")
table = soup2.find('table', class_="MainContent")
data = [[td.text for td in tr.find_all('td')] for tr in table.find_all('tr')]
items.append(data[-1][1].strip(': '))

print(items)

This would give you the following items:

['RAPID UNITY SDN. BHD.', '8877', '1', 'TAMAN UNITY', 'RUMAH BERKEMBAR', 'HARGA TINGGI', '1', '370,000.00', '394,900.00', 'RUMAH TERES', 'HARGA TINGGI', '1', '190,000.00', '290,550.00', '0%']
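
The positional lookups above (data[1][1], data[2][1], ...) depend on the row order never changing. As a rough sketch of one alternative, the rows could be keyed by their label cell instead. The helper name rows_to_dict and the sample_rows list are hypothetical and only mirror the shape of the HTML in the question; they are not taken from the live page. The "Nama Pemaju" row is left out because its cell also contains the "* Klik Untuk Melihat Maklumat" note, so the link text is still best read separately as in the code above.

```python
def rows_to_dict(rows):
    """Map each row's label cell to its cleaned value cell.

    `rows` is a list of lists of cell strings, shaped like the `data`
    list built with [[td.text for td in ...] for tr in ...] above.
    """
    result = {}
    for cells in rows:
        if len(cells) < 2:
            continue  # skip rows that have no value cell
        # Collapse runs of whitespace and newlines into single spaces
        label = " ".join(cells[0].split())
        value = " ".join(cells[1].split()).lstrip(": ").strip()
        if label:
            result[label] = value
    return result


# Hypothetical sample rows, copied by hand from the question's HTML
sample_rows = [
    ["Kod Pemaju", ": 8877"],
    ["Kod Fasa", ": 1"],
    ["Nama Pemajuan", ": TAMAN UNITY"],
]

info = rows_to_dict(sample_rows)
print(info["Nama Pemajuan"])  # TAMAN UNITY
```

With the real page you would pass the data list built from tables[0] and then look values up by label, for example info["Kod Pemaju"].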