Improve channel and EPG matching with Sørensen-Dice similarity algorithm #962
base: Piers
alpgul commented Feb 21, 2025
- Add StringComparer utility to calculate string similarity
- Enhance channel and EPG matching to use fuzzy string comparison
- Update channel and EPG finding methods to support more flexible matching
- Modify EPG entry description to include channel ID
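For readers unfamiliar with the metric, the Sørensen-Dice coefficient of two token sets X and Y is 2|X ∩ Y| / (|X| + |Y|). The following is a minimal sketch over character bigrams, not the PR's actual StringComparer: the name DiceSimilarity, the case folding, and the whitespace stripping are all assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not the PR's StringComparer): Sørensen-Dice
// similarity over character bigrams, returning a value in [0, 1].
double DiceSimilarity(const std::string& a, const std::string& b)
{
  // Assumption: compare case-insensitively and ignore whitespace.
  auto normalise = [](const std::string& s) {
    std::string out;
    for (unsigned char c : s)
      if (!std::isspace(c))
        out.push_back(static_cast<char>(std::tolower(c)));
    return out;
  };

  const std::string x = normalise(a);
  const std::string y = normalise(b);

  if (x == y)
    return 1.0; // identical after normalisation
  if (x.size() < 2 || y.size() < 2)
    return 0.0; // too short to form a bigram

  // Count the bigrams of the first string (multiset semantics).
  std::unordered_map<std::string, int> bigrams;
  for (std::size_t i = 0; i + 1 < x.size(); ++i)
    ++bigrams[x.substr(i, 2)];

  // Consume one stored bigram for every matching bigram of the second string.
  int shared = 0;
  for (std::size_t i = 0; i + 1 < y.size(); ++i)
  {
    auto it = bigrams.find(y.substr(i, 2));
    if (it != bigrams.end() && it->second > 0)
    {
      --it->second;
      ++shared;
    }
  }

  const double total = static_cast<double>((x.size() - 1) + (y.size() - 1));
  return 2.0 * shared / total;
}
```

With bigrams, BBC1 and BBC4 share two of their three bigrams each, giving 2·2 / (3 + 3) ≈ 0.67. The PR's own StringComparer may tokenise differently and report another value for the same pair.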
I didn't add the algorithm to the MergeEpgDataIntoMedia method in pvr.iptvsimple\src\iptvsimple\Epg.cpp because it causes a performance loss. It also needs to be debugged: it returns "-1" on the first run, so it contains a bug.
74804a4 to 48abda6
const MediaEntry* Media::FindMediaEntry(const std::string& id, const std::string& displayName) const
MergeEpgDataIntoMedia
For optimization purposes, the FindMediaEntry and MergeEpgDataIntoMedia methods should not be executed if the media is in recordings. Additionally, an isMedia attribute should be defined in the channel definition of epg.xml, which would avoid unnecessary performance loss.
@@ -214,6 +214,7 @@ bool EpgEntry::UpdateFrom(const xml_node& programmeNode, const std::string& id,
    return false;

  m_broadcastId = static_cast<int>(programmeStart);
  m_plot = id + "\n" + GetNodeValue(programmeNode, "desc");
Why prepend the id onto the plot?
Is there a more suitable tag where I can display the EPG Channel ID value in the EPG View? This would help users identify any incorrect matches
Many channel IDs are short by nature, for example BBC1, BBC2, BBC3, etc. If you have these channels but only BBC4 in the XMLTV data, what happens? Matches for channel IDs are supposed to be exact, by their nature as IDs. I think this feature should be put behind a setting that a user can switch on if they choose to do so. WDYT?
A minimum similarity percentage can be added in settings. I included the line m_plot = id + "\n" + GetNodeValue(programmeNode, "desc"); as an example to help in such cases. It adds the actual EPG channel ID value. I put it in the plot since it isn't visually disruptive there. Other usable tags from the EPG screen could be used instead if available.
OK, but this should only be for testing and not in the final PR. I suggest a Debug log statement that can be grepped for to find false positives. The default similarity setting should be a 100% match; users or addons can then set it to another value to enable fuzzy matching. You could also supply some labelled values instead of having users set a magic number.
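The suggestion above could be sketched roughly as follows. Everything here is hypothetical and not taken from the PR: the MatchLevel presets and their percentages, the MatchResult struct, and FindBestMatch are illustrative names, and the similarity function and debug logger are passed in as callbacks so that the sketch does not assume any particular Kodi or add-on API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical labelled presets, so users do not have to pick a magic number.
enum class MatchLevel : int
{
  Exact = 100,  // default: IDs must match exactly, no fuzzy matching
  Strict = 90,  // assumed value, for illustration only
  Relaxed = 80, // assumed value, for illustration only
};

struct MatchResult
{
  int index;    // position in the candidate list, or -1 if nothing matched
  double score; // similarity of the accepted match in [0, 1]
  bool fuzzy;   // true if the accepted match is not an exact match
};

using Similarity = std::function<double(const std::string&, const std::string&)>;
using DebugLog = std::function<void(const std::string&)>;

MatchResult FindBestMatch(const std::string& wanted,
                          const std::vector<std::string>& candidates,
                          MatchLevel level,
                          const Similarity& similarity,
                          const DebugLog& log)
{
  const double threshold = static_cast<int>(level) / 100.0;
  MatchResult best{-1, 0.0, false};

  for (std::size_t i = 0; i < candidates.size(); ++i)
  {
    // An exact match always wins and is never reported as fuzzy.
    if (candidates[i] == wanted)
      return MatchResult{static_cast<int>(i), 1.0, false};

    // At the default level, skip the similarity computation entirely.
    if (level == MatchLevel::Exact)
      continue;

    const double score = similarity(wanted, candidates[i]);
    if (score >= threshold && score > best.score)
      best = MatchResult{static_cast<int>(i), score, true};
  }

  // Emit a greppable debug line so false positives can be found in the log.
  if (best.fuzzy && log)
    log("EPG fuzzy match: '" + wanted + "' -> '" + candidates[best.index] + "'");

  return best;
}
```

Keeping Exact as the default means existing setups behave as before, and the fixed log prefix gives users a single string to search for when checking which channels were matched fuzzily.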
48abda6 to 108c6a7
Add EPG channel name match threshold setting
- Introduce new setting to control channel name matching similarity
- Add configuration for minimum similarity percentage when matching EPG channels
- Update settings XML, language strings, and implementation to support new threshold
108c6a7 to 88e0626
I've set the default to 100% and the increment to 5. The similarity between BBC1 and BBC4 comes out as 75%, and matching performs well at an 80% threshold.
- Add StringComparer utility to calculate string similarity
- Enhance channel and EPG matching to use fuzzy string comparison
- Update channel and EPG finding methods to support more flexible matching
- Modify EPG entry description to include channel ID
- Improve EPG loading efficiency by filtering out empty entries
88e0626 to cd0cbae