As a Python novice, I am working on an exercise to practice importing data in Python. Eventually I want to analyze data from different podcasts (information on the podcasts themselves and on every episode) by putting the data into a coherent dataframe and working on it with NLP.
So far I have managed to read a list of RSS feeds and get the information on every single episode of the RSS feed (a post).
But I am having trouble finding an integrated workflow in Python to gather both
- information on every single episode of the RSS feed (a post)
- and general information about the RSS feed (like title of the podcast)
in one go.
Code
This is what I have got so far:
import feedparser
import pandas as pd

rss_feeds = ['http://feeds.feedburner.com/TEDTalks_audio',
             'https://joelhooks.com/rss.xml',
             'https://www.sciencemag.org/rss/podcast.xml',
             ]
# number of feeds is reduced for testing

posts = []
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])
Output
The dataframe contains 652 non-null objects in three columns (as intended) - one row for every post made in every podcast. The title column refers to the title of the episode, but not to the title of the podcast (which in the first example is 'Ted Talk Daily').