Improve channel and EPG matching with Sørensen-Dice similarity algorithm #962

Open
wants to merge 1 commit into
base: Piers


@alpgul alpgul commented Feb 21, 2025

  • Add StringComparer utility to calculate string similarity
  • Enhance channel and EPG matching to use fuzzy string comparison
  • Update channel and EPG finding methods to support more flexible matching
  • Modify EPG entry description to include channel ID

Author

@alpgul alpgul left a comment

I didn't add the algorithm to the MergeEpgDataIntoMedia method in pvr.iptvsimple\src\iptvsimple\Epg.cpp because it causes a performance loss. It also needs to be debugged: it returns "-1" on the first run, so it contains a bug.

@alpgul alpgul force-pushed the ImproveEpgSearch branch 2 times, most recently from 74804a4 to 48abda6 Compare February 22, 2025 07:14
Author

@alpgul alpgul left a comment

const MediaEntry* Media::FindMediaEntry(const std::string& id, const std::string& displayName) const
MergeEpgDataIntoMedia
For optimization, the FindMediaEntry and MergeEpgDataIntoMedia methods should not be executed if the media is in recordings. Additionally, an isMedia attribute should be defined in the channel definition of epg.xml to prevent unnecessary performance loss.

@@ -214,6 +214,7 @@ bool EpgEntry::UpdateFrom(const xml_node& programmeNode, const std::string& id,
return false;

m_broadcastId = static_cast<int>(programmeStart);
m_plot = id + "\n" + GetNodeValue(programmeNode, "desc");
Member

Why prepend the id onto the plot?

Author

Is there a more suitable tag where I can display the EPG Channel ID value in the EPG View? This would help users identify any incorrect matches.

@phunkyfish
Member

Many channel ids are short in nature.

For example: BBC1, BBC2, BBC3, etc.

If you have these channels but only BBC4 in the XMLTV data, what happens?

Matches for channel IDs are supposed to be exact by their nature as IDs.

I think we should put this feature behind a setting so that a user can switch it on if they choose to do so. WDYT?

Author

@alpgul alpgul left a comment

A minimum similarity percentage can be added in settings. I included this code m_plot = id + "\n" + GetNodeValue(programmeNode, "desc"); as an example to help in such cases. This is the actual EPG channel ID value being added. I added it to the plot since it doesn't look visually disruptive. Additional usable tags from the EPG screen can be added there if available.

@phunkyfish
Member

phunkyfish commented Feb 22, 2025

> A minimum similarity percentage can be added in settings. I included this code m_plot = id + "\n" + GetNodeValue(programmeNode, "desc"); as an example to help in such cases. This is the actual EPG channel ID value being added. I added it to the plot since it doesn't look visually disruptive. Additional usable tags from the EPG screen can be added there if available.

Ok, but this should only be for testing and not in the final PR. I suggest a Debug log statement that can be grepped for to find false positives.

The default similarity setting should be 100% match, then users or addons can set this to another value to enable fuzzy matching. You could also supply some labelled values instead of users trying to set a magic number.
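The suggestions above (an exact-match default, labelled values instead of a magic number, and a greppable debug line) could be sketched roughly as follows. Every name here (MatchLevel, FindBestMatch, the "FuzzyEpgMatch" log prefix) and every preset value is an assumption made for illustration and does not come from the PR; std::clog stands in for whatever logger the add-on actually uses.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical labelled presets, in percent. Names and values are illustrative only.
enum class MatchLevel : int
{
  Exact = 100,
  Strict = 90,
  Relaxed = 80
};

using SimilarityFn = std::function<double(const std::string&, const std::string&)>;

// Returns the index of the best candidate whose score reaches the threshold,
// or -1 when no candidate qualifies. The default threshold means exact match only.
int FindBestMatch(const std::string& id,
                  const std::vector<std::string>& candidates,
                  const SimilarityFn& similarity,
                  int thresholdPercent = static_cast<int>(MatchLevel::Exact))
{
  int best = -1;
  double bestScore = 0.0;
  for (std::size_t i = 0; i < candidates.size(); ++i)
  {
    // At 100% a plain comparison is used, so the similarity function is never called.
    const double score = thresholdPercent >= 100
                             ? (candidates[i] == id ? 1.0 : 0.0)
                             : similarity(id, candidates[i]);
    if (score * 100.0 >= thresholdPercent && score > bestScore)
    {
      best = static_cast<int>(i);
      bestScore = score;
    }
  }

  // Log only non-exact matches so false positives can be grepped for.
  if (best >= 0 && candidates[best] != id)
    std::clog << "FuzzyEpgMatch: '" << id << "' -> '" << candidates[best]
              << "' score=" << bestScore << '\n';
  return best;
}
```

With the default threshold, a playlist ID of BBC1 never matches an XMLTV ID of BBC4. Fuzzy matching only takes effect once a user or add-on lowers the threshold, and each non-exact match then leaves one log line to search for.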

Author

@alpgul alpgul left a comment

Add EPG channel name match threshold setting

  • Introduce new setting to control channel name matching similarity
  • Add configuration for minimum similarity percentage when matching EPG channels
  • Update settings XML, language strings, and implementation to support new threshold

Author

@alpgul alpgul left a comment

I've set the default to 100% and the increment to 5. The similarity between BBC1 and BBC4 comes out as 75%, so an 80% threshold works well.

- Add StringComparer utility to calculate string similarity
- Enhance channel and EPG matching to use fuzzy string comparison
- Update channel and EPG finding methods to support more flexible matching
- Modify EPG entry description to include channel ID
- Improve EPG loading efficiency by filtering out empty entries