This dataset is built from an OPML file of podcasts. It contains named entities extracted from the descriptions of each podcast (and its episodes), as well as color palette information from the image associated with each podcast.
Download the latest release and unzip it. All the data is in JSON format.
- OPML is 'a popular XML format used to store and exchange outlines with attributes' (Opml.org).
- Natural Language Processing (NLP) is the field concerned with analyzing and extracting meaning from text-based data.
- Named Entity Recognition (NER), a sub-field of NLP, is the process of extracting named entities ('topics') such as names, people, and organizations from text.
You can also extract data from your own OPML file to create your own dataset.
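As a rough illustration of the first step, here is a minimal sketch of reading podcast feeds out of an OPML file using only Python's standard library. It assumes the common subscription-export layout, in which each feed is an `outline` element carrying `text` and `xmlUrl` attributes; the sample document and the `list_feeds` helper are hypothetical and are not part of this project's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real OPML exports may nest outlines differently.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Podcast subscriptions</title></head>
  <body>
    <outline text="Feeds">
      <outline type="rss" text="Example Show" xmlUrl="https://example.com/feed.xml"/>
      <outline type="rss" text="Another Show" xmlUrl="https://example.org/rss"/>
    </outline>
  </body>
</opml>
"""


def list_feeds(opml_text):
    """Return a list of {'title', 'url'} dicts, one per outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks every nested <outline>, so grouping folders are handled too.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            title = outline.get("text") or outline.get("title") or ""
            feeds.append({"title": title, "url": url})
    return feeds


feeds = list_feeds(SAMPLE_OPML)
for feed in feeds:
    print(feed["title"], feed["url"])
```

Each feed URL found this way could then be fetched to obtain the podcast and episode descriptions that the entity extraction works on.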
- Research which topics are popular and commonly mentioned in podcasts
- Find related podcasts based on the topics mentioned in different podcasts
- Analyze common patterns and wording surrounding certain topics
- Use podcast data as mock data for an application. A demo app can be found here.
- Use the color palette information to add visual elements to an application, such as styling, theming, and personalization
- Refine extraction of named entities
- Extract more data points, such as ad information and location
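One of the uses listed above, finding related podcasts based on shared topics, can be sketched with a simple set-overlap score. The example below uses Jaccard similarity over each podcast's set of named entities. The in-memory `entities_by_podcast` mapping and the `related` helper are hypothetical; the actual JSON layout of the dataset may differ, so loading and reshaping the data is left out.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related(target, entities_by_podcast, top_n=3):
    """Rank other podcasts by entity overlap with `target`, highest score first."""
    target_entities = entities_by_podcast[target]
    scores = [
        (other, jaccard(target_entities, entities))
        for other, entities in entities_by_podcast.items()
        if other != target
    ]
    # Drop podcasts with no shared entities, then sort by score (ties by name).
    scores = [pair for pair in scores if pair[1] > 0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical data: podcast title -> set of extracted named entities.
entities_by_podcast = {
    "Show A": {"NASA", "Mars", "SpaceX"},
    "Show B": {"NASA", "Mars", "Apollo"},
    "Show C": {"SpaceX", "Tesla"},
    "Show D": {"Jazz", "Blues"},
}

result = related("Show A", entities_by_podcast)
print(result)  # [('Show B', 0.5), ('Show C', 0.25)]
```

Jaccard similarity is only one possible choice; it ignores how often an entity is mentioned, so a frequency-weighted measure may work better on real data.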