[ENH] ShapeletTransform: binary ig calculation problem #1322

Open
zjeqw opened this issue Mar 18, 2024 · 2 comments
Labels
transformations Transformations package

Comments

@zjeqw

zjeqw commented Mar 18, 2024

Describe the bug

The current _calc_binary_ig() evaluates split points that fall between data points with the same feature value but different labels. No threshold can separate such points, so the returned information gain may be misleading for datasets that contain many of them.

Steps/Code to reproduce the bug

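from aeon.transformations.collection.shapelet_based._shapelet_transform import _calc_binary_ig

# (distance, label) pairs sorted by distance; label is +1 for this class, -1 for other
orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
c1, c2 = 3, 2  # three items labelled +1, two labelled -1
print(_calc_binary_ig(orderline, c1, c2))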

Expected results

0.42

Actual results

0.97
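
For reference, both figures can be traced with a small pure-Python sketch. It assumes that _calc_binary_ig takes the parent entropy minus the weighted entropies of the two sides, for every prefix of the orderline. The helper names binary_entropy and ig_per_split are illustrative and not part of aeon.

```python
import math


def binary_entropy(c1, c2):
    """Entropy (in bits) of a two-class set with c1 and c2 members."""
    total = c1 + c2
    ent = 0.0
    for count in (c1, c2):
        if count:
            p = count / total
            ent -= p * math.log2(p)
    return ent


def ig_per_split(orderline, c1, c2):
    """Return (split index, information gain) for every prefix split.

    Assumption: split i places orderline[0..i] on the left-hand side.
    """
    parent = binary_entropy(c1, c2)
    total = c1 + c2
    left_c1 = left_c2 = 0
    gains = []
    for i, (_, label) in enumerate(orderline):
        if label > 0:
            left_c1 += 1
        else:
            left_c2 += 1
        left_prop = (i + 1) / total
        gain = (
            parent
            - left_prop * binary_entropy(left_c1, left_c2)
            - (1 - left_prop) * binary_entropy(c1 - left_c1, c2 - left_c2)
        )
        gains.append((i, gain))
    return gains


orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
for i, gain in ig_per_split(orderline, c1=3, c2=2):
    print(i, round(gain, 2))
```

Under these assumptions, split 1 separates two items that share the distance 2 and yields 0.97, while split 2, the only boundary between distinct distances (2 and 3), yields 0.42.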

Versions

System:
python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
executable: c:\xxx\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 22.3.1
setuptools: 57.4.0
scikit-learn: 1.4.0
aeon: 0.7.1
statsmodels: None
numpy: 1.24.0
scipy: 1.10.1
pandas: 2.0.3
matplotlib: 3.5.0
joblib: 1.3.2
numba: 0.58.1
pmdarima: None
tsfresh: None

zjeqw added the bug (Something isn't working) label Mar 18, 2024
MatthewMiddlehurst added the transformations (Transformations package) label Mar 18, 2024
@TonyBagnall
Contributor

Thanks for this, we will take a look next week.

@TonyBagnall
Contributor

Next week became next month, sorry about that.

I don't think this really constitutes a bug; it's true to the algorithm.

I guess for the above you are recommending that we ignore splits between items with the same distance, such as
[(2,-1),(2,-1)] [(2,1),(3,1),(3,1)]
So we would first evaluate the default split
[ ] [(2,-1),(2,-1),(2,1),(3,1),(3,1)]
then skip
[(2,-1)] [(2,-1),(2,1),(3,1),(3,1)] (split == 0, I think, by the logic)
and
[(2,-1),(2,-1)] [(2,1),(3,1),(3,1)] (split == 1)

then continue with
[(2,-1),(2,-1),(2,1)] [(3,1),(3,1)] (split == 2)

I can enforce this:

    # evaluate each split point
    for split in range(len(orderline)):
        next_class = orderline[split][1]  # +1 if this class, -1 if other
        # Check here that the distance is different to the next one
        if split == 0 and orderline[split][0] == orderline[split+1][0]:
            continue
        elif orderline[split][0] == orderline[split-1][0]:
            continue

I need to double check the logic, which is a bit confusing around the first item, but this gives me an IG of 0.770950, not 0.42.
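
One possible way to reconcile the two numbers, sketched under assumptions: the snippet above does not show where the class counts are incremented, but if the continue fires before that update, the skipped items never reach the left-hand counts. The only split that is then evaluated (index 3) sees one +1 item on the left and two items of each class on the right, which gives roughly 0.97095 - 0.2 = 0.77095. The variant below instead updates the counts first and skips a split only when the next distance equals the current one. calc_binary_ig_skip_ties is a hypothetical name, and the body is a pure-Python re-implementation, not aeon's code.

```python
import math


def binary_entropy(c1, c2):
    """Entropy (in bits) of a two-class set with c1 and c2 members."""
    total = c1 + c2
    ent = 0.0
    for count in (c1, c2):
        if count:
            p = count / total
            ent -= p * math.log2(p)
    return ent


def calc_binary_ig_skip_ties(orderline, c1, c2):
    """Best information gain over splits that fall between distinct distances.

    Assumptions: orderline is sorted by distance, labels are +1 / -1, and
    split i places orderline[0..i] on the left-hand side.
    """
    parent = binary_entropy(c1, c2)
    total = c1 + c2
    left_c1 = left_c2 = 0
    best = 0.0
    last = len(orderline) - 1
    for split, (dist, label) in enumerate(orderline):
        # Always update the counts, even when this split is skipped below.
        if label > 0:
            left_c1 += 1
        else:
            left_c2 += 1
        # No threshold can separate equal distances, so skip that boundary.
        if split < last and orderline[split + 1][0] == dist:
            continue
        left_prop = (split + 1) / total
        gain = (
            parent
            - left_prop * binary_entropy(left_c1, left_c2)
            - (1 - left_prop) * binary_entropy(c1 - left_c1, c2 - left_c2)
        )
        best = max(best, gain)
    return best


orderline = [(2, -1), (2, -1), (2, 1), (3, 1), (3, 1)]
print(round(calc_binary_ig_skip_ties(orderline, 3, 2), 2))
```

On the orderline from the report this evaluates only the boundary between distances 2 and 3 (plus the trivial all-left split) and prints 0.42, matching the expected result in the issue.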

TonyBagnall removed the bug (Something isn't working) label May 20, 2024
TonyBagnall changed the title from "[BUG] ShapeletTransform: binary ig calculation problem" to "[ENH] ShapeletTransform: binary ig calculation problem" May 20, 2024