Information Theoretic Methods #8

Open
FaridehJalali85 opened this issue Sep 21, 2021 · 5 comments

Comments

@FaridehJalali85

Hi Adam,

I have 500 examples, a continuous outcome (with values above and below 0) and 33000 features. I would like to use FEAST, ... to find the most informative features. As far as I know, I have to discretise the outcome. Can you please advise how I should do that? Also, are you aware of any information theoretic packages that can deal with a continuous outcome?

Please see some of the outcome values in the list below.

0.55228076
-0.3197724
0.58774863
-0.5174945
-0.173138
1.27375815
0.78408593
0.74372201
-1.3886196
0.43545769
-0.0689654
0.18626918
1.8202002
0.10355088
-0.0560193
0.29268956
-0.4401979
1.61399178
-0.704535
0.00430693
1.41157343
-0.0646488
-1.4474665
0.4307273
0.44970806
-0.3885697
1.10758465
-0.1339137
0.12522725
-0.3379575
-1.099666
0.31524279
-0.7666343
0.22144974
-0.8017797
-1.1909404
-0.7958541
-1.5830851
1.03128861
-1.0312886
-1.736507
0.77242535
1.06106625
0.95375794
-0.8626751
-0.4025419
-0.1078819
0.71006597
0.5422753
-2.2562499
-1.3238074
-1.919012
1.24519952
-0.4930275
0.47848871
1.21762188
-0.6239719
1.42333462
-0.0086139
0.89435119
-1.698852
1.77679865
-0.038772
-0.2569066
0.28371043
-0.360848
-0.2658198
0.8440787
-0.9812682
0.67719571
0.31977237
-2.0077154
0.48332361
0.70453501
0.52242483
-0.9537579
-0.8197281
-1.2835121
-0.5076713
0.35625516
0.03015302
0.05601926
-0.1994326
1.13176806
-2.3148972
1.3886196
0.5724601
-1.8925387
-1.1822326
-0.2926896
0.37467286
-0.6610331
-0.2971879
-1.3555018
-1.5536207
0.73805488
-0.3746729
-0.6031758
-0.6935372
1.43529606
1.48532767
0.629213
0.19943255
-0.7211937
-1.068659
-0.4165933
0.01292112
1.26412388
-0.9335899
0.22586574
-2.2044686
-0.6450414
-0.5724601
1.00954199
0.05170609
-1.0840322
-0.5027781
-0.7782424
-0.2258657
-0.5623418
-0.1644006
-0.7156187
-0.8879448
-0.2792295
-0.0776024
-0.4544784
-0.8137162
0.93358993
0.73241136
1.89253869
-1.366375
-1.472472
0.15131773
-0.6135407
-0.1165502
0.80177968
0.92695117
1.71737513
0.77824241
-1.4853277
-0.2347111
-0.5522808
0.85644338
0.09922176
-0.5573042
-1.1566322
0.34252111
0.37005675
1.51180219
-1.5118022
0.02153627
0.37929696
1.28351212
-0.0948945
-0.7551286
1.59834762
0.14696264
-0.4784887
1.56818275
0.60317579
0.82576977
1.05353422
0.33795753
0.83184174
-2.702943
-0.3107197
0.72679107
1.40000431
-0.379297
-0.2702843
-0.4307273
1.18223265
0.3516698
1.13997784
1.73650701
0.60835006
1.47247197
0.03446219
1.55362072
-0.5224248
1.06865897
2.46384107
0.38856968
-0.333401
0.18188866
2.31489724
-0.3654485
-0.5422753
1.36637495
-0.112215
-0.3152428
0.42600652
-0.1426103
0.94027009
-0.7899563
-0.5472712
0.39787592
1.25460554
-1.3135329
0.42129522
0.12088759
0.86267505
0.99530565
-1.4233346
0.97432155
0.19065327
0.26581978
-0.6664013
-2.158
0.9007945
0.0776024
-0.3016924
-1.9763958
0.27475411
1.08403222
0.81972809
0.23028616
2.00771543
1.33422361
0.55730423
-0.9402701
0.76086887
-0.6083501
0.14261033
-0.4354577
1.6634623
-1.009542
-0.473665
-0.6880696
0.34709182
-1.6634623
0.65035348
-0.3562552
0.92035301
0.85024479
-0.6717887
1.04606148
-0.4978969
0.4544784
-1.680894
-2.1157713
-0.1469626
1.94690278
-2.041138
0.49789692
-1.1650809
0.11655021
-1.1317681
1.03864671
-1.6300412
2.15799996
1.16508088
0.30169235
0.23471108
-2.5652793
0.01722854
-2.382787
0.46885224
-0.8318417
0.038772
2.92573583
0.41659325
0.41190048
0.58263735
-0.2747541
-1.6139918
-0.1687677
-0.0430825
0.78995634
0.40721675
-1.0918158
-1.1236339
-1.2176219
0.25690662
-1.5393809
-0.030153
-1.7562965
1.52544651
-0.9882626
-1.3774121
0.83794453
-0.6771957
0.24801382
-1.4115734
0.27028425
-1.7767986
0.56739369
-2.0770094
-1.2546055
-1.7173751
-0.9007945
-0.2126306
-0.1818887
-1.0763138
-0.3470918
-0.2038279
-1.2086323
0.07328321
0.20822723
-0.8564434
1.09966602
0.96742157
1.97639581
-0.2837104
-0.920353
0.71561873
1.91901198

Thanks
Fari

@Craigacp
Owner

MIToolbox operates on discrete inputs, so you will need to discretise your data before using it; otherwise it will apply a standard discretisation, which probably doesn't do what you want. In the past we've used 10 bins of equal width over the range (min, max), and that has tended to work reasonably well.
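As a rough illustration of the "10 bins of equal width in the range (min, max)" suggestion, here is a minimal pure-Python sketch. The function name `equal_width_bins` is made up for this example and is not part of MIToolbox or FEAST; the sample values are the first ten outcomes from the list in the question.

```python
def equal_width_bins(values, n_bins=10):
    """Map each value to an integer bin index in [0, n_bins - 1].

    Bins have equal width and together span the range (min, max) of the data.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# First few outcome values from the list in the question.
outcome = [0.55228076, -0.3197724, 0.58774863, -0.5174945, -0.173138,
           1.27375815, 0.78408593, 0.74372201, -1.3886196, 0.43545769]
labels = equal_width_bins(outcome, n_bins=10)
print(labels)
```

The resulting integer labels are the kind of discrete input described above. The same function could be applied to each continuous feature column as well as to the outcome.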

I believe scikit-learn has a continuous/discrete mutual information calculation, or there are packages like ITE (https://bitbucket.org/szzoli/ite-in-python/src/master/) which provide many different estimators for the mutual information.
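For the scikit-learn route, one option is `mutual_info_regression` from `sklearn.feature_selection`, which estimates the mutual information between each feature and a continuous target without binning the target. The sketch below uses synthetic data as a stand-in for the real dataset (the sizes, the seed and the dependence on feature 3 are all assumptions made for the example).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 500 examples, 20 features
# (far fewer than 33000, to keep the sketch quick to run).
n_samples, n_features = 500, 20
X = rng.normal(size=(n_samples, n_features))
# Hypothetical continuous outcome that depends only on feature 3.
y = X[:, 3] + 0.1 * rng.normal(size=n_samples)

# Nearest-neighbour MI estimate between each feature and the continuous outcome.
scores = mutual_info_regression(X, y, random_state=0)

# Rank features from most to least informative.
ranking = np.argsort(scores)[::-1]
print(ranking[:5])
```

Note that this scores each feature on its own, so it gives a univariate ranking only.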

@FaridehJalali85
Author

Thanks Adam

Can you please elaborate on using 10 bins of equal width in the range (min, max) for the outcome? Is there any other approach for mapping the continuous outcome to a classification task?
Thanks
Fari

@Craigacp
Owner

Craigacp commented Oct 5, 2021

There are many different binning algorithms. Equal frequency binning (where the bin widths are set so that each bin contains the same number of elements) interacts oddly with information theoretic feature selection, as it makes every feature maximum entropy. We used equal width binning in our papers on feature selection and it worked well. You can also set the bins based on the mean and standard deviation if you think the variable is approximately Gaussian distributed, or use some meaningful bins if you have domain knowledge about the feature values.
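To see why equal frequency binning makes every feature maximum entropy, the following sketch compares the entropy of the two binning schemes on one skewed variable. All function names and the exponential sample are assumptions made for the illustration, not code from FEAST or MIToolbox.

```python
import math
import random
from collections import Counter


def entropy_bits(labels):
    """Plug-in entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    # Rank the values, then split the ranks into n_bins equally sized groups.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


random.seed(0)
# Hypothetical skewed feature: 500 draws from an exponential distribution.
feature = [random.expovariate(1.0) for _ in range(500)]

h_width = entropy_bits(equal_width_bins(feature, 10))
h_freq = entropy_bits(equal_frequency_bins(feature, 10))
print(f"equal width:     {h_width:.3f} bits")
print(f"equal frequency: {h_freq:.3f} bits (maximum is {math.log2(10):.3f})")
```

With equal frequency bins every variable ends up with the same flat histogram, so its entropy is log2 of the number of bins regardless of its original shape. Equal width bins keep the shape of the distribution, which is reflected in a lower entropy for the skewed sample here.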

@FaridehJalali85
Author

Thanks Adam,

Can the class (outcome) take more than two labels with your information theoretic feature selection toolbox, or must it be binary?
Thanks

@Craigacp
Owner

Multi-class is fine.
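Mutual information between discrete variables is defined for any number of class labels, which is why a binned outcome with 10 values poses no problem. A minimal plug-in estimate is sketched below; this is an illustration with made-up toy data and a hypothetical function name, not the toolbox's own implementation.

```python
import math
from collections import Counter


def mutual_information_bits(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Hypothetical toy data: a three-class outcome and two discretised features.
outcome = [0, 0, 1, 1, 2, 2, 0, 1, 2]
informative = [0, 0, 1, 1, 2, 2, 0, 1, 2]    # identical to the outcome
uninformative = [0, 1, 0, 1, 0, 1, 2, 2, 2]  # each value occurs once per class

mi_informative = mutual_information_bits(informative, outcome)
mi_uninformative = mutual_information_bits(uninformative, outcome)
print(mi_informative, mi_uninformative)
```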
