python - Import RSS with FeedParser and Get Both Posts and General Information to Single Pandas DataFrame

posted Feb 19, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am a Python novice working on an exercise to practice importing data in Python. Eventually I want to analyze data from different podcasts (information on the podcast itself and on every episode) by putting the data into a single coherent DataFrame and working on it with NLP.

So far I have managed to read a list of RSS feeds and get the information on every single episode of the RSS feed (a post).

But I am having trouble finding an integrated workflow in Python that gathers both

information on every single episode of the RSS feed (a post)
and general information about the RSS feed (like the title of the podcast) in one go.

Code: this is what I have so far

import feedparser
import pandas as pd

# number of feeds is reduced for testing
rss_feeds = [
    'http://feeds.feedburner.com/TEDTalks_audio',
    'https://joelhooks.com/rss.xml',
    'https://www.sciencemag.org/rss/podcast.xml',
]

# one tuple per episode, collected across all feeds
posts = []
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])

Output: the DataFrame has 652 non-null entries in each of the three columns (as intended), basically one row for every post in every podcast. The title column holds the title of the episode, not the title of the podcast (which in this example is 'TED Talks Daily').

	title	link	summary
0	3 questions to ask yourself about everything y...	https://www.ted.com/talks/stacey_abrams_3_ques...	How you respond to setbacks is what defines yo...
1	What your sleep patterns say about your relati...	https://www.ted.com/talks/tedx_shorts_what_you...	Wendy Troxel looks at the cultural expectation...
2	How we can actually pay people enough -- with ...	https://www.ted.com/talks/ted_business_how_we_...	Capitalism urgently needs an upgrade, says Pay...

1 Reply

深蓝 · Answer 1 · 2021-02-19T03:50:47+0000

feedparser keeps the feed-level metadata on the feed attribute of the parsed result, so in this case the podcast title can be accessed with feed.feed.title:

# ...
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((feed.feed.title, post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['feed_title', 'title', 'link', 'summary'])
df

Output:

          feed_title            title             link          summary
0    TED Talks Daily  3 ways compa...  https://www....  When we expe...
1    TED Talks Daily  How we could...  https://www....  Concrete is ...
2    TED Talks Daily  3 questions ...  https://www....  How you resp...
3    TED Talks Daily  What your sl...  https://www....  Wendy Troxel...
4    TED Talks Daily  How we can a...  https://www....  Capitalism u...
..               ...              ...              ...              ...
649  Science Maga...  Science Podc...  https://traf...  Fear-enhance...
650  Science Maga...  Science Podc...  https://traf...  Discussing t...
651  Science Maga...  Science Podc...  https://traf...  Talking kids...
652  Science Maga...  Science Podc...  https://traf...  The minimum ...
653  Science Maga...  Science Podc...  https://traf...  The origin o...
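
Not every feed fills in every field, and attribute access such as post.summary raises an error when the field is absent. As a rough sketch of a more defensive variant (the helper names rows_from_feed and build_dataframe and the extra feed_link column are made up for illustration and are not part of the answer above), the same idea can be written with dict-style .get() lookups, which the objects returned by feedparser.parse() support:

```python
COLUMNS = ["feed_title", "feed_link", "title", "link", "summary"]


def rows_from_feed(parsed):
    """Return one dict per episode, each carrying the feed-level fields too.

    `parsed` is the object returned by feedparser.parse(). It allows
    dict-style access, so a plain dict with the same keys works as well.
    """
    feed_info = parsed.get("feed", {})
    rows = []
    for post in parsed.get("entries", []):
        rows.append({
            # feed-level information, repeated on every episode row
            "feed_title": feed_info.get("title"),
            "feed_link": feed_info.get("link"),
            # episode-level information; missing fields become None
            "title": post.get("title"),
            "link": post.get("link"),
            "summary": post.get("summary"),
        })
    return rows


def build_dataframe(urls):
    """Fetch every URL and stack all episodes into one DataFrame."""
    # Third-party imports are kept local (an illustrative choice) so that
    # rows_from_feed() can be tried without installing anything.
    import feedparser
    import pandas as pd

    rows = []
    for url in urls:
        rows.extend(rows_from_feed(feedparser.parse(url)))
    return pd.DataFrame(rows, columns=COLUMNS)
```

Calling build_dataframe(rss_feeds) with the list from the question should then give one row per episode, with None in any cell whose field the feed does not provide.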
