podcast-data-lab/podcast-data-generator

Podcast Data Generator

About

Given an OPML file of podcasts, this project generates a dataset containing named entities extracted from the description of each podcast (and its episodes), along with color palette information from the image associated with each podcast.

Usage

Download the latest release and unzip it. All of the data is in JSON format.
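
As a rough sketch of reading the unzipped release from Python: the helper name `load_json_files` and the folder name `podcast-data` are hypothetical, and the only assumption about the release is that it contains standalone `.json` files. The structure inside each file is not assumed here.

```python
import json
from pathlib import Path


def load_json_files(data_dir):
    """Load every .json file under data_dir into a dict keyed by relative path."""
    root = Path(data_dir)
    loaded = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[str(path.relative_to(root))] = json.load(handle)
    return loaded


# Example usage ("podcast-data" is a hypothetical folder name):
#   data = load_json_files("podcast-data")
#   print(f"Loaded {len(data)} JSON files")
```

Keying the result by relative path keeps the sketch independent of how the release names or nests its files.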

Terms Used

  • OPML - 'a popular XML format used to store and exchange outlines with attributes' (Opml.org).
  • Natural Language Processing - commonly shortened to NLP - is the field concerned with analyzing and extracting meaning from text-based data.
  • Named Entity Recognition - the process of extracting 'topics' (named entities), such as the names of people and organizations, from text. It is a sub-field of NLP.

Extract from your own OPML File

You can extract data from your own OPML file to create your own dataset.
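
To illustrate what an OPML subscription list contains, here is a minimal sketch that lists the feed URLs in one. It is not this project's extraction code. It relies only on the OPML convention that feed entries are `outline` elements carrying an `xmlUrl` attribute. The function name `feed_urls_from_opml` and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return (title, feed URL) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so feeds grouped into folders are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("text") or outline.get("title") or url
            feeds.append((title, url))
    return feeds


# Hypothetical sample data, not taken from the project.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>My podcasts</title></head>
  <body>
    <outline text="Feeds">
      <outline type="rss" text="Example Show" xmlUrl="https://example.com/feed.xml"/>
    </outline>
  </body>
</opml>"""

for title, url in feed_urls_from_opml(SAMPLE_OPML):
    print(title, url)
```

Most podcast apps can export subscriptions in this format, so a quick check like this can confirm that a file holds the expected feeds before generating a dataset from it.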

Use Cases

Data Analysis

  • Research which topics are popular or commonly mentioned in podcasts
  • Find related podcasts based on the topics mentioned in different podcasts
  • Analyze common patterns and wordings surrounding certain topics
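
One way the "related podcasts" idea above could be sketched is with Jaccard similarity over each podcast's set of named entities. This is an illustrative choice of metric, not something the project prescribes, and the input shape (a dict mapping podcast title to a set of entity strings) is an assumption rather than the dataset's actual schema.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target, entities_by_podcast, top_n=3):
    """Rank the other podcasts by entity overlap with the target podcast."""
    target_entities = entities_by_podcast[target]
    scores = [
        (other, jaccard(target_entities, entities))
        for other, entities in entities_by_podcast.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical entity sets, for illustration only.
PODCASTS = {
    "Show A": {"NASA", "SpaceX", "Mars"},
    "Show B": {"NASA", "Mars", "Hubble"},
    "Show C": {"NBA", "Lakers"},
}

print(most_similar("Show A", PODCASTS))
```

Jaccard similarity ignores how often an entity is mentioned. A frequency-weighted measure may suit larger datasets better.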

App Development

  • Use the podcast data as mock data for an application. A demo app can be found here.
  • Use the color palette information to style and theme an application, or to personalize its visual elements
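
As a small theming example, the sketch below picks black or white text for a given palette color using the WCAG relative-luminance and contrast-ratio formulas. It assumes palette colors are available as `#rrggbb` hex strings. The dataset's actual color format is not stated here and may differ, and the function names are illustrative.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    value = hex_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_color):
    """WCAG relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(ch) for ch in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def readable_text_color(background_hex):
    """Return whichever of white or black has the higher contrast ratio."""
    lum = relative_luminance(background_hex)
    contrast_with_white = 1.05 / (lum + 0.05)
    contrast_with_black = (lum + 0.05) / 0.05
    return "#ffffff" if contrast_with_white >= contrast_with_black else "#000000"


# Example: a dark palette color as a card background calls for white text.
print(readable_text_color("#1a1a2e"))
```

The same luminance value could also be used to order a palette from dark to light when choosing background and accent colors.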

Continued and Future Work

  • Refine extraction of named entities
  • Extract more data points, such as ad information and location
