I am supposed to use Beautiful Soup 4 to obtain course information off of my school's website as an exercise.
I have been at this for the past few days and my code still does not work.
The first thing I ask the user to do is input the course catalog abbreviation.
For example, ICS is the abbreviation for Information and Computer Sciences.
The program should then use Beautiful Soup 4 to list all of the courses and how many students are enrolled in each.
While I was able to get the input portion to work, I still have errors or the program just stops.
Question: Is there a way for Beautiful Soup to accept user input so that when the user inputs ICS, the output would be a list of all courses that are related to ICS?
Here is the code and my attempt at it:
from bs4 import BeautifulSoup
import requests
import re

# get input for course
course = input('Enter the course:')
# here is the page link
BASE_AVAILABILITY_URL = f"https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={course}"
# get request and response
page_response = requests.get(BASE_AVAILABILITY_URL)
# getting Beautiful Soup to gather the html content
page_content = BeautifulSoup(page_response.content, 'html.parser')
# getting course information
main = page_content.find_all(class_='parent clearfix')
main_p = "".join(str(x) for x in main)
# get the course anchor tags
main_q = BeautifulSoup(main_p, "html.parser")
courses = main.find('a', href=True)
# get each course name
# empty list for the course names
courses_list = []
for a in courses:
    courses_list.append(a.text)
search = input('Enter the course title:')
for course in courses_list:
    if re.search(search, course, re.IGNORECASE):
        print(course)
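For comparison, here is a minimal sketch of how the attempt above could be restructured. Beautiful Soup only parses HTML and does not take user input itself, so the input goes into the URL and the fetched page is handed to the parser. Two things in the attempt stand out: `find_all` returns a list-like result with no `find` method of its own, and `find` returns a single tag, so looping over its result walks that tag's children rather than a list of links. The helper names (`build_url`, `link_texts`, `filter_titles`, `main`) and the `campus` and `term` parameter names are made up for illustration, and whether the course names on this page actually sit in `<a href>` tags is an assumption carried over from the attempt, not something verified against the site.

```python
import re
from urllib.parse import urlencode

from bs4 import BeautifulSoup

AVAILABILITY_URL = "https://www.sis.hawaii.edu/uhdad/avail.classes"


def build_url(subject, campus="MAN", term="202010"):
    """Build the listing URL for one subject code such as 'ICS'."""
    query = urlencode({"i": campus, "t": term, "s": subject.strip().upper()})
    return f"{AVAILABILITY_URL}?{query}"


def link_texts(html):
    """Return the stripped text of every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all (not find) returns every match, so the result can be looped over.
    return [a.get_text(strip=True) for a in soup.find_all("a", href=True)]


def filter_titles(titles, pattern):
    """Keep titles containing `pattern`, ignoring case; the input is matched literally."""
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    return [title for title in titles if regex.search(title)]


def main():
    # Imported here so the helpers above can be tried without network access.
    import requests

    subject = input("Enter the course: ")
    response = requests.get(build_url(subject), timeout=30)
    response.raise_for_status()  # stop early on an HTTP error status
    titles = link_texts(response.text)
    search = input("Enter the course title: ")
    for title in filter_titles(titles, search):
        print(title)
```

Calling `main()` runs the interactive version. Keeping the parsing in `link_texts` means it can be checked against a small HTML string before pointing it at the live page.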
This is the original code that was provided in the Jupyter Notebook:
import requests, bs4

BASE_AVAILABILITY_URL = f"https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={course}"
#get input for course
course = input('Enter the course:')

def scrape_availability(text):
    soup = bs4.BeautifulSoup(text)
    r = requests.get(str(BASE_AVAILABILITY_URL) + str(course))
    rows = soup.select('.listOfClasses tr')
    for row in rows[1:]:
        columns = row.select('td')
        class_name = columns[2].contents[0]
        if len(class_name) > 1 and class_name != b'\xa0':
            print(class_name)
            print(columns[4].contents[0])
            print(columns[7].contents[0])
            print(columns[8].contents[0])
What's odd is that if the user saves the HTML file, uploads it into Jupyter Notebook, and then opens the file to be read, the courses are displayed.
But for this task, the user cannot save files; the output has to come directly from the user's input.
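Since parsing a saved copy works, one possible approach is to keep the row parsing in a function that takes an HTML string and feed it the live response text instead of file contents. The sketch below assumes the live page has the same `.listOfClasses` table as the saved file and reads the same four columns the provided code prints (indices 2, 4, 7 and 8). `URL_TEMPLATE` and `parse_availability` are illustrative names, and the subject is filled into the URL only after it is known, because an f-string is evaluated at the moment it is written and so cannot refer to a `course` variable that is assigned later.

```python
import bs4

# {subject} is filled in later with str.format, once the user's input is known.
URL_TEMPLATE = "https://www.sis.hawaii.edu/uhdad/avail.classes?i=MAN&t=202010&s={subject}"


def parse_availability(html):
    """Return one tuple per class row, taken from columns 2, 4, 7 and 8."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select(".listOfClasses tr")[1:]:  # skip the header row
        columns = row.select("td")
        if len(columns) < 9:
            continue  # spacer or note rows do not have all the columns
        # get_text(strip=True) also drops non-breaking spaces from &nbsp; cells
        class_name = columns[2].get_text(strip=True)
        if not class_name:
            continue
        results.append(
            tuple(columns[i].get_text(strip=True) for i in (2, 4, 7, 8))
        )
    return results


def scrape_availability(subject):
    """Fetch the live listing for `subject` and parse it; needs network access."""
    import requests

    url = URL_TEMPLATE.format(subject=subject.strip().upper())
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return parse_availability(response.text)
```

With this split, `parse_availability(open("saved.html").read())` and `scrape_availability(input("Enter the course: "))` go through the same parsing code, which makes it easier to tell a parsing problem from a fetching problem. The file name `saved.html` is only a placeholder.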
Asked by usukidoll; translated from Stack Overflow.