Kmeans-NMI-DMIFS

Automatic hybrid feature selection that converts continuous data to discrete data using a combination of K-means and NMI, followed by DMIFS

The proposed hybrid method has two parts. The first part is a proposed two-stage filter method called Kmeans-NMI. In its first stage, features with continuous values are converted to corresponding features with discrete values using the K-means clustering algorithm. In its second stage, the most informative features are selected using the concept of Maximum Relevance Minimum Redundancy (MRMR) with Normalized Mutual Information (NMI). The second part automatically applies the existing filter Dynamic Mutual Information Feature Selection (DMIFS) to generate the final feature subset. The hybrid Kmeans-NMI-DMIFS is thus the combination of the proposed filter method (Kmeans-NMI) and an existing automatic filter feature selection approach (DMIFS). Experiments on high-dimensional medical data sets indicate that Kmeans-NMI-DMIFS can automatically determine the number of selected features while achieving the best performance among the state-of-the-art methods it was compared with.
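As a rough illustration of the Kmeans-NMI idea, the sketch below discretizes one continuous feature with a plain 1-D K-means (Lloyd's algorithm) and then scores the discretized feature against the class label with NMI. It is not the code in this repository: the toy data, the number of bins `k`, the evenly spaced initialization, and the normalization by sqrt(H(X)·H(Y)) are all assumptions made for the example, and the repository's own scripts may use different choices.

```matlab
% Toy data (illustrative only): one continuous feature and a class label.
x = [0.10; 0.20; 0.15; 0.90; 1.00; 0.95; 0.50; 0.55];
y = [1; 1; 1; 2; 2; 2; 1; 2];
k = 3;                                    % number of bins (assumed value)

% Stage 1: 1-D K-means (Lloyd's algorithm) to turn x into discrete bins.
centers = linspace(min(x), max(x), k)';   % evenly spaced start (assumed)
for iter = 1:100
    % Assign every value to its nearest center (implicit expansion, R2016b+).
    [~, bins] = min(abs(x - centers'), [], 2);
    newCenters = centers;
    for c = 1:k
        if any(bins == c)
            newCenters(c) = mean(x(bins == c));
        end
    end
    if isequal(newCenters, centers)
        break;                            % assignments no longer change
    end
    centers = newCenters;
end

% Stage 2: normalized mutual information between the bins and the labels.
[~, ~, yi] = unique(y);                   % map labels to 1..numClasses
joint = accumarray([bins, yi], 1, [k, max(yi)]) / numel(x);  % joint distribution
px = sum(joint, 2);                       % marginal of the discretized feature
py = sum(joint, 1);                       % marginal of the label
indep = px * py;                          % product of marginals
nz = joint > 0;                           % skip empty cells (0*log 0 = 0)
miVal = sum(joint(nz) .* log2(joint(nz) ./ indep(nz)));
hx = -sum(px(px > 0) .* log2(px(px > 0)));
hy = -sum(py(py > 0) .* log2(py(py > 0)));
% Assumed normalization; undefined if either entropy is zero.
nmiVal = miVal / sqrt(hx * hy);
fprintf('MI = %.4f bits, NMI = %.4f\n', miVal, nmiVal);
```

A feature whose bins line up well with the class label gets an NMI close to 1, while a feature that is independent of the label gets a value close to 0, which is what makes the score usable for ranking features.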

INSTRUCTIONS:

Run mainKmenas.m in MATLAB. The script is organized as follows:
Lines 4-19: read the data sets
Lines 20-23: split into data and labels
Lines 32-50: cluster the features with additive K-means
Lines 53-88: calculate Normalized Mutual Information (NMI)
Lines 89-93: run Dynamic Mutual Information Feature Selection (DMIFS)
Lines 97-99: calculate performance (accuracy, precision, recall, and F1)
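
The four performance measures in the last step can be sketched for a binary problem as follows. This is a minimal example with made-up label vectors and hypothetical variable names; it does not reproduce Acc.m or peformance.m from this repository, which may compute the measures differently (for example, averaged over several classes).

```matlab
% Made-up true and predicted labels (1 = positive, 0 = negative).
yTrue = [1; 1; 1; 0; 0; 0; 1; 0];
yPred = [1; 0; 1; 0; 1; 0; 1; 0];

% Confusion-matrix counts.
tp = sum(yPred == 1 & yTrue == 1);   % true positives
fp = sum(yPred == 1 & yTrue == 0);   % false positives
fn = sum(yPred == 0 & yTrue == 1);   % false negatives
tn = sum(yPred == 0 & yTrue == 0);   % true negatives

accVal  = (tp + tn) / numel(yTrue);  % fraction of correct predictions
precVal = tp / (tp + fp);            % of predicted positives, how many are right
recVal  = tp / (tp + fn);            % of actual positives, how many are found
f1Val   = 2 * precVal * recVal / (precVal + recVal);  % harmonic mean

fprintf('ACC %.2f  Precision %.2f  Recall %.2f  F1 %.2f\n', ...
        accVal, precVal, recVal, f1Val);
```

Precision and recall are undefined when their denominators are zero (no predicted or no actual positives), so real code would need to guard against that case.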

Performance: The experiments use stratified 10-fold cross-validation to randomly divide the original dataset into training and test sets for performance evaluation. Three classifiers are used.

Decision Tree (DT): The decision tree classifier is used as the machine learning model to classify the dataset, for two reasons. First, the decision tree is also the fundamental tool for feature selection in the feature importance ranking methodology. Second, the proposed method produces a dataset that suits the decision tree method: it has a small number of features and samples, heterogeneous values within each feature, and no redundant or duplicate data.

Random Forest (RF): RF is an ensemble of multiple decision trees that uses majority voting and has several uses in high-dimensional and challenging pattern recognition scenarios. Individual decision trees have high variance, which RF tries to reduce.

K-Nearest Neighbors (KNN): KNN is a machine learning algorithm with acceptable classification performance when the data is not easily separable. For each query, KNN searches all stored data points for the K closest ones, which are referred to as the K nearest neighbors.
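
The evaluation protocol can be sketched as below: a stratified 10-fold split built by hand, with a small KNN classifier scored on each fold. This is an illustration under stated assumptions, not the repository's Classify.m: the synthetic two-class data, the seed, the neighbor count `K = 3`, and the squared Euclidean distance are all chosen for the example. Only core MATLAB functions are used, so no toolbox is required.

```matlab
rng(0);                                   % fixed seed for a reproducible split
n = 40;
X = [randn(n/2, 2); randn(n/2, 2) + 3];   % two Gaussian blobs (toy data)
y = [ones(n/2, 1); 2 * ones(n/2, 1)];
numFolds = 10;
K = 3;                                    % number of neighbors (assumed value)

% Stratified fold assignment: shuffle the samples of each class, then deal
% them out to the folds round-robin so every fold keeps the class ratio.
fold = zeros(n, 1);
classes = unique(y);
for c = 1:numel(classes)
    idx = find(y == classes(c));
    idx = idx(randperm(numel(idx)));
    fold(idx) = mod((0:(numel(idx) - 1))', numFolds) + 1;
end

% Evaluate KNN on each fold, training on the remaining nine.
foldAcc = zeros(numFolds, 1);
for f = 1:numFolds
    testMask = (fold == f);
    Xtr = X(~testMask, :);  ytr = y(~testMask);
    Xte = X(testMask, :);   yte = y(testMask);
    pred = zeros(size(yte));
    for i = 1:size(Xte, 1)
        d = sum((Xtr - Xte(i, :)).^2, 2); % squared Euclidean distances
        [~, order] = sort(d);
        pred(i) = mode(ytr(order(1:K)));  % majority vote of the K nearest
    end
    foldAcc(f) = mean(pred == yte);
end
meanAcc = mean(foldAcc);
fprintf('Mean accuracy over %d folds: %.3f\n', numFolds, meanAcc);
```

Stratifying matters most for small or imbalanced medical data sets, where a purely random split could leave a fold with few or no samples of the minority class.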


sadeqa20/Kmeans-NMI-DMIFS
