I'm working on a personal project: sentiment analysis with NLTK and VADER to compare presidential speeches.
I used Beautiful Soup to scrape one of George Washington's speeches and managed to put the speech in a list, but I'm not sure of the best way to go further. Most examples seem to read the text from a plain text file, but when I write my list to a file, the output includes the list's brackets, which makes it hard to work with. Should I store the scraped speech in a file, or just work from the list? Or should I put the speech into a DataFrame right away? I'm not sure.
    from bs4 import BeautifulSoup
    import requests
    import spacy
    import pandas as pd

    page_link = 'https://www.ourdocuments.gov/doc.php?flash=false&doc=11&page=transcript'
    page_response = requests.get(page_link, timeout=5)
    page_content = BeautifulSoup(page_response.content, "html.parser")

    # Collect the text of the first seven <p> elements on the page
    textContent = []
    for i in range(0, 7):
        paragraphs = page_content.find_all("p")[i].text
        textContent.append(paragraphs)

    # str() of a list writes its repr, so the file gets the brackets and quotes too
    toWrite = open('washington.txt', 'w')
    line = textContent
    toWrite.write(str(line))
    toWrite.close()
Any help or pointers would be greatly appreciated.
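For reference, here is a minimal sketch of one possible way to keep the scraped paragraphs as a list while also saving them as plain text without the brackets. It joins the list into a single string before writing, instead of calling `str()` on the list. The helper names `save_paragraphs` and `load_paragraphs`, the sample file name, and the placeholder paragraphs are all assumptions made for illustration; they are not part of the original script or of any library.

```python
import tempfile
from pathlib import Path


def save_paragraphs(paragraphs, path):
    """Write paragraphs as plain text, separated by blank lines (hypothetical helper)."""
    # Collapse runs of whitespace inside each paragraph and drop empty ones
    cleaned = [" ".join(p.split()) for p in paragraphs]
    text = "\n\n".join(p for p in cleaned if p)
    Path(path).write_text(text, encoding="utf-8")
    return text


def load_paragraphs(path):
    """Read the file back into a list of paragraph strings (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8")
    return [p for p in text.split("\n\n") if p]


# Placeholder strings standing in for the scraped textContent list
sample = [
    "First paragraph of the speech.",
    "Second   paragraph,\nwith messy whitespace.",
    "",
]

# A temp-directory path is used here only so the sketch runs anywhere
out_path = Path(tempfile.gettempdir()) / "washington_sample.txt"
saved = save_paragraphs(sample, out_path)
restored = load_paragraphs(out_path)
```

Either form should then work for the analysis: each string in `restored` can be passed to `SentimentIntensityAnalyzer().polarity_scores(...)` from `nltk.sentiment.vader` (after `nltk.download('vader_lexicon')`), which returns a dict with `neg`, `neu`, `pos`, and `compound` scores. Whether a file, a list, or a DataFrame is the better container probably depends on how many speeches end up being compared.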