diff --git a/README.md b/README.md
index 5a54d4a..772e38f 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# EAGLET: Evolutionary AlGorithm for multi-Label Ensemble opTimization
-EME is an evolutionary approach for the automatic generation of ensembles of diverse and competitive multi-label classifiers. It takes into account characteristics of the multi-label data such as the relationships among the labels, imbalance of the data, and the complexity of the output space. The ensemble is based on small projections of the label space, considering in this way the relationships among the labels but also reducing the computational cost in cases where the output space is complex. Further, EME takes into account all the labels approximately the same number of times in the ensemble, regardless of their frequency or its ease to be predicted; so that the imbalance of the data is considered and the infrequent labels are not ignored. For that, the fitness function takes into account both the predictive performance of the model and the number of times that each label is considered in the ensemble.
+EAGLET is an algorithm that selects simple, accurate, and diverse multi-label classifiers to build an ensemble. The method implicitly considers characteristics of the data, such as the relationships among labels and their degree of imbalance. To model the relationships among labels, each classifier in the ensemble focuses on a small subset of the label space, which yields models with relatively low computational complexity and lower imbalance in the output space. The ensemble is built incrementally from the population of multi-label classifiers: at each step, the member that best fits the ensemble generated so far, considering both predictive performance and diversity, is selected.
More information about this algorithm can be find in the following article:
> Jose M. Moyano, Eva L. Gibaja, Krzysztof J. Cios, Sebastián Ventura. "Combining Accurate and Diverse Multi-Label Classifiers Based on Projections of the Output Space Using Evolutionary Algorithms". Submitted to ---. (2019).
@@ -18,40 +18,41 @@ The configuration file is a xml file including the parameters of the evolutionar
- 25
+ 25
- true
+ 0.75
- data/emotions_train1.arff
- data/emotions_test1.arff
- data/emotions.xml
+ data/Yeast/Yeast-train1.arff
+ data/Yeast/Yeast-test1.arff
+ data/Yeast/Yeast.xml
- 10
+ 10
-* The configuration file must start with the `````` tag and then the `````` tag, the last indicating the class with the evolutionary algorithm, in our case ```eme.EnsembleAlgorithm```.
+* The configuration file must start with the `````` tag and then the `````` tag, the latter indicating the class of the evolutionary algorithm, in our case ```eaglet.algorithm.MLCAlgorithm```.
* The `````` must determine the seed for random numbers with the ```seed``` attribute. Further, it may indicate the type of the rand-gen-factory, which by default is ```net.sf.jclec.util.random.RanecuFactory```. If several seeds are going to be used, the tag `````` may be used, including inside the different seeds, as follows:
@@ -64,13 +65,13 @@ The configuration file is a xml file including the parameters of the evolutionar
* The parents selector is determined with the ```<parents-selector>``` tag. If, for example, the tournament selector is selected, its size is determined with the sub-tag ```<tournament-size>```.
* The size of the population is determined with the `````` tag.
* The number of generations of the evolutionary algorithm is determined with the `````` tag.
-* The `````` tag determines the type of recombinator or crossover operator. In EME, three crossover operators are implemented: ```ModelCrossover```, ```MultiModelCrossover```, and ```UniformModelCrossover```. Further, the probability to apply this operator to each individual is determined with the ```rec-prob``` attribute.
-* The `````` tag determines the type of mutation operator. In EME, two crossover operators are implemented: the basic ```IntraModelMutator```, and ```PhiBasedIntraModelMutator```. Further, the probability to apply this operator to each individual is determined with the ```mut-prob``` attribute.
+* The `````` tag determines the type of recombinator or crossover operator. In EAGLET, the ```RandomCrossover``` operator is implemented. Further, the probability of applying this operator to each individual is determined with the ```rec-prob``` attribute.
+* The `````` tag determines the type of mutation operator. In EAGLET, the basic ```RandomMutator``` operator is implemented. Further, the probability of applying this operator to each individual is determined with the ```mut-prob``` attribute.
* The number of classifiers in each ensemble is determined by the `````` tag.
* The number of labels of each classifier, or size of the *k*-labelset, is determined by the `````` tag.
* The threshold used for the final prediction of the ensemble is determined with the ```<prediction-threshold>``` tag.
-* The `````` tag determines if the coverage ratio measure is included in the fitness of the individuals. The coverage ratio takes into account the number of times that each label appears in the ensemble.
-* With the `````` tag, the datasets used for training (for the evolutionary algorithm) and testing (for testing the final ensemble obtained by EME) are determined with the tags `````` and `````` respectively. The `````` tag indicates the xml file of the dataset (Mulan format, [see more](http://www.uco.es/kdis/mllresources/)). Several datasets, or several partitions of the same dataset may be used, including the tag ``````, and the different datasets inside, as follows:
+* The ```<beta-member-selection>``` tag determines the value of beta (between 0 and 1) used when selecting the members of the ensemble; the greater the beta value, the more diversity is favored in the selection.
+* With the `````` tag, the datasets used for training (for the evolutionary algorithm) and testing (for testing the final ensemble obtained by EAGLET) are determined with the tags `````` and `````` respectively. The `````` tag indicates the xml file of the dataset (Mulan format, [see more](http://www.uco.es/kdis/mllresources/)). Several datasets, or several partitions of the same dataset may be used, including the tag ``````, and the different datasets inside, as follows:
@@ -91,14 +92,14 @@ The configuration file is a xml file including the parameters of the evolutionar
-* The `````` tag determines the class used as listener; it is the responsible of creating the different reports during and at the end of the evolutionary process. By default, the listener used is the one of the ```eme.EnsembleListener``` class. The `````` tag determines the directory where the reports of the different executions are stored. The `````` tag indicates the filename of the global report file. Finally, the `````` tag indicates the frequency with which the reports for the iterations are created.
+* The ```<listener>``` tag determines the class used as listener, which is responsible for creating the different reports during and at the end of the evolutionary process. By default, the ```eaglet.algorithm.MLCListener``` class is used. The `````` tag determines the directory where the reports of the different executions are stored. The `````` tag indicates the filename of the global report file. Finally, the `````` tag indicates the frequency with which the reports for the iterations are created.
Then, several more characteristics of the evolutionary algorithm could be modified in the configuration file, but they are optional and default values for them are given if they are not included in this file:
* The ```<validation-set>``` tag indicates if the training set is divided into training and validation, in order to evaluate the individuals with a different dataset to which was used to train them. By default, its value is ```false```.
-* The `````` tag determines the class of the evaluator used for evaluating the individuals. Since only one evaluator has been implemented in EME, its default value is ```eme.EnsembleMLCEvaluator```.
-* The `````` tag determines the class that generates the initial population of individuals. By default, the ```eme.EnsembleMLCCreator``` class is used.
+* The ```<evaluator>``` tag determines the class of the evaluator used for evaluating the individuals. Since only one evaluator has been implemented in EAGLET, its default value is ```eaglet.algorithm.MLCEvaluator```.
+* The ```<provider>``` tag determines the class that generates the initial population of individuals. By default, the ```eaglet.individualCreator.FrequencyBasedIndividualCreator``` class is used.
-Two multi-label datasets (*emotions* and *yeast*) have been included in the repository as example; however, a wide variety of dataset are available at the [KDIS Research Group Repository](http://www.uco.es/kdis/mllresources/). Further, two example configuration files (*Experiment_emotions.cfg* and *Experiment_yeast.cfg*) are also provided.
+The *Yeast* multi-label dataset has been included in the repository as an example; however, a wide variety of datasets is available at the [KDIS Research Group Repository](http://www.uco.es/kdis/mllresources/). Further, an example configuration file (*Experiment_Yeast.cfg*) is also provided.
### References
diff --git a/cfg/Experiment_Yeast.cfg b/cfg/Experiment_Yeast.cfg
new file mode 100644
index 0000000..5e3409e
--- /dev/null
+++ b/cfg/Experiment_Yeast.cfg
@@ -0,0 +1,32 @@
+ 2
+ 50
+ 25
+ 12
+ 3
+ 0.5
+ 0.75
+ data/Yeast/Yeast-train1.arff
+ data/Yeast/Yeast-test1.arff
+ data/Yeast/Yeast.xml
+ reports/EnsembleMLC
+ summaryEnsembleMLC
+ 10
diff --git a/data/Yeast/Yeast-test1.arff b/data/Yeast/Yeast-test1.arff
new file mode 100644
index 0000000..bd3b986
--- /dev/null
+++ b/data/Yeast/Yeast-test1.arff
@@ -0,0 +1,605 @@
+@relation Yeast
+@attribute Att1 numeric
+@attribute Att2 numeric
+@attribute Att3 numeric
+@attribute Att4 numeric
+@attribute Att5 numeric
+@attribute Att6 numeric
+@attribute Att7 numeric
+@attribute Att8 numeric
+@attribute Att9 numeric
+@attribute Att10 numeric
+@attribute Att11 numeric
+@attribute Att12 numeric
+@attribute Att13 numeric
+@attribute Att14 numeric
+@attribute Att15 numeric
+@attribute Att16 numeric
+@attribute Att17 numeric
+@attribute Att18 numeric
+@attribute Att19 numeric
+@attribute Att20 numeric
+@attribute Att21 numeric
+@attribute Att22 numeric
+@attribute Att23 numeric
+@attribute Att24 numeric
+@attribute Att25 numeric
+@attribute Att26 numeric
+@attribute Att27 numeric
+@attribute Att28 numeric
+@attribute Att29 numeric
+@attribute Att30 numeric
+@attribute Att31 numeric
+@attribute Att32 numeric
+@attribute Att33 numeric
+@attribute Att34 numeric
+@attribute Att35 numeric
+@attribute Att36 numeric
+@attribute Att37 numeric
+@attribute Att38 numeric
+@attribute Att39 numeric
+@attribute Att40 numeric
+@attribute Att41 numeric
+@attribute Att42 numeric
+@attribute Att43 numeric
+@attribute Att44 numeric
+@attribute Att45 numeric
+@attribute Att46 numeric
+@attribute Att47 numeric
+@attribute Att48 numeric
+@attribute Att49 numeric
+@attribute Att50 numeric
+@attribute Att51 numeric
+@attribute Att52 numeric
+@attribute Att53 numeric
+@attribute Att54 numeric
+@attribute Att55 numeric
+@attribute Att56 numeric
+@attribute Att57 numeric
+@attribute Att58 numeric
+@attribute Att59 numeric
+@attribute Att60 numeric
+@attribute Att61 numeric
+@attribute Att62 numeric
+@attribute Att63 numeric
+@attribute Att64 numeric
+@attribute Att65 numeric
+@attribute Att66 numeric
+@attribute Att67 numeric
+@attribute Att68 numeric
+@attribute Att69 numeric
+@attribute Att70 numeric
+@attribute Att71 numeric
+@attribute Att72 numeric
+@attribute Att73 numeric
+@attribute Att74 numeric
+@attribute Att75 numeric
+@attribute Att76 numeric
+@attribute Att77 numeric
+@attribute Att78 numeric
+@attribute Att79 numeric
+@attribute Att80 numeric
+@attribute Att81 numeric
+@attribute Att82 numeric
+@attribute Att83 numeric
+@attribute Att84 numeric
+@attribute Att85 numeric
+@attribute Att86 numeric
+@attribute Att87 numeric
+@attribute Att88 numeric
+@attribute Att89 numeric
+@attribute Att90 numeric
+@attribute Att91 numeric
+@attribute Att92 numeric
+@attribute Att93 numeric
+@attribute Att94 numeric
+@attribute Att95 numeric
+@attribute Att96 numeric
+@attribute Att97 numeric
+@attribute Att98 numeric
+@attribute Att99 numeric
+@attribute Att100 numeric
+@attribute Att101 numeric
+@attribute Att102 numeric
+@attribute Att103 numeric
+@attribute Class1 {0,1}
+@attribute Class2 {0,1}
+@attribute Class3 {0,1}
+@attribute Class4 {0,1}
+@attribute Class5 {0,1}
+@attribute Class6 {0,1}
+@attribute Class7 {0,1}
+@attribute Class8 {0,1}
+@attribute Class9 {0,1}
+@attribute Class10 {0,1}
+@attribute Class11 {0,1}
+@attribute Class12 {0,1}
+@attribute Class13 {0,1}
+@attribute Class14 {0,1}
diff --git a/data/Yeast/Yeast-train1.arff b/data/Yeast/Yeast-train1.arff
new file mode 100644
index 0000000..c833810
--- /dev/null
+++ b/data/Yeast/Yeast-train1.arff
@@ -0,0 +1,2054 @@
+@relation Yeast
+@attribute Att1 numeric
+@attribute Att2 numeric
+@attribute Att3 numeric
+@attribute Att4 numeric
+@attribute Att5 numeric
+@attribute Att6 numeric
+@attribute Att7 numeric
+@attribute Att8 numeric
+@attribute Att9 numeric
+@attribute Att10 numeric
+@attribute Att11 numeric
+@attribute Att12 numeric
+@attribute Att13 numeric
+@attribute Att14 numeric
+@attribute Att15 numeric
+@attribute Att16 numeric
+@attribute Att17 numeric
+@attribute Att18 numeric
+@attribute Att19 numeric
+@attribute Att20 numeric
+@attribute Att21 numeric
+@attribute Att22 numeric
+@attribute Att23 numeric
+@attribute Att24 numeric
+@attribute Att25 numeric
+@attribute Att26 numeric
+@attribute Att27 numeric
+@attribute Att28 numeric
+@attribute Att29 numeric
+@attribute Att30 numeric
+@attribute Att31 numeric
+@attribute Att32 numeric
+@attribute Att33 numeric
+@attribute Att34 numeric
+@attribute Att35 numeric
+@attribute Att36 numeric
+@attribute Att37 numeric
+@attribute Att38 numeric
+@attribute Att39 numeric
+@attribute Att40 numeric
+@attribute Att41 numeric
+@attribute Att42 numeric
+@attribute Att43 numeric
+@attribute Att44 numeric
+@attribute Att45 numeric
+@attribute Att46 numeric
+@attribute Att47 numeric
+@attribute Att48 numeric
+@attribute Att49 numeric
+@attribute Att50 numeric
+@attribute Att51 numeric
+@attribute Att52 numeric
+@attribute Att53 numeric
+@attribute Att54 numeric
+@attribute Att55 numeric
+@attribute Att56 numeric
+@attribute Att57 numeric
+@attribute Att58 numeric
+@attribute Att59 numeric
+@attribute Att60 numeric
+@attribute Att61 numeric
+@attribute Att62 numeric
+@attribute Att63 numeric
+@attribute Att64 numeric
+@attribute Att65 numeric
+@attribute Att66 numeric
+@attribute Att67 numeric
+@attribute Att68 numeric
+@attribute Att69 numeric
+@attribute Att70 numeric
+@attribute Att71 numeric
+@attribute Att72 numeric
+@attribute Att73 numeric
+@attribute Att74 numeric
+@attribute Att75 numeric
+@attribute Att76 numeric
+@attribute Att77 numeric
+@attribute Att78 numeric
+@attribute Att79 numeric
+@attribute Att80 numeric
+@attribute Att81 numeric
+@attribute Att82 numeric
+@attribute Att83 numeric
+@attribute Att84 numeric
+@attribute Att85 numeric
+@attribute Att86 numeric
+@attribute Att87 numeric
+@attribute Att88 numeric
+@attribute Att89 numeric
+@attribute Att90 numeric
+@attribute Att91 numeric
+@attribute Att92 numeric
+@attribute Att93 numeric
+@attribute Att94 numeric
+@attribute Att95 numeric
+@attribute Att96 numeric
+@attribute Att97 numeric
+@attribute Att98 numeric
+@attribute Att99 numeric
+@attribute Att100 numeric
+@attribute Att101 numeric
+@attribute Att102 numeric
+@attribute Att103 numeric
+@attribute Class1 {0,1}
+@attribute Class2 {0,1}
+@attribute Class3 {0,1}
+@attribute Class4 {0,1}
+@attribute Class5 {0,1}
+@attribute Class6 {0,1}
+@attribute Class7 {0,1}
+@attribute Class8 {0,1}
+@attribute Class9 {0,1}
+@attribute Class10 {0,1}
+@attribute Class11 {0,1}
+@attribute Class12 {0,1}
+@attribute Class13 {0,1}
+@attribute Class14 {0,1}
diff --git a/data/Yeast/Yeast.xml b/data/Yeast/Yeast.xml
new file mode 100644
index 0000000..16d3ce1
--- /dev/null
+++ b/data/Yeast/Yeast.xml
@@ -0,0 +1 @@
\ No newline at end of file
diff --git a/src/eaglet/algorithm/MLCAlgorithm.java b/src/eaglet/algorithm/MLCAlgorithm.java
index d6a988d..0d32ae2 100644
--- a/src/eaglet/algorithm/MLCAlgorithm.java
+++ b/src/eaglet/algorithm/MLCAlgorithm.java
@@ -4,6 +4,7 @@
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import org.apache.commons.configuration.Configuration;
import eaglet.individualCreator.EagletIndividualCreator;
@@ -170,7 +171,7 @@ private enum ValidationSetTechnique{
boolean weightVotesByFrequency;
- * Alpha value to multiply by distance to the ensemble in member selection
+ * beta value to multiply by distance to the ensemble in member selection
double betaMemberSelection;
@@ -257,10 +258,66 @@ public int getNumberOfEvaluatedIndividuals(){
return this.tableFitness.size();
+ /**
+ * Configure some default aspects and parameters of EAGLET to make the configuration easier
+ *
+ * @param configuration Configuration
+ */
+ private void configureEagletDefaults(Configuration configuration) {
+ //Species
+ configuration.setProperty("species[@type]", "net.sf.jclec.binarray.BinArrayIndividualSpecies");
+ configuration.setProperty("species[@genotype-length]", "1");
+ //Variable
+ configuration.addProperty("variable", "false");
+ //Validation set (only if not provided)
+ if(! configuration.containsKey("validation-set")) {
+ configuration.addProperty("validation-set", "false");
+ }
+ //Evaluator (only if not provided)
+ if(! configuration.containsKey("evaluator[@type]")) {
+ configuration.addProperty("evaluator[@type]", "eaglet.algorithm.MLCEvaluator");
+ }
+ //Provider (only if not provided)
+ if(! configuration.containsKey("provider[@type]")) {
+ configuration.addProperty("provider[@type]", "eaglet.individualCreator.FrequencyBasedIndividualCreator");
+ }
+ //Randgen type (only if not provided)
+ if(! configuration.containsKey("rand-gen-factory[@type]")) {
+ configuration.addProperty("rand-gen-factory[@type]", "net.sf.jclec.util.random.RanecuFactory");
+ }
+ //Parents-selector (only if not provided)
+ if(! configuration.containsKey("parents-selector[@type]")) {
+ configuration.addProperty("parents-selector[@type]", "net.sf.jclec.selector.TournamentSelector");
+ }
+ if(! configuration.containsKey("parents-selector.tournament-size")) {
+ configuration.addProperty("parents-selector.tournament-size", "2");
+ }
+ //Listener type (only if not provided)
+ if(! configuration.containsKey("listener[@type]")) {
+ configuration.addProperty("listener[@type]", "eaglet.algorithm.MLCListener");
+ }
+ //Other parameters
+ if(! configuration.containsKey("predictThreshold")) {
+ configuration.addProperty("predictThreshold", "false");
+ }
+ if(! configuration.containsKey("weightVotesByFrequency")) {
+ configuration.addProperty("weightVotesByFrequency", "false");
+ }
+ }
public void configure(Configuration configuration)
+ configureEagletDefaults(configuration);
try {
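The default-filling step above follows a simple rule: forced keys are always overwritten, while optional keys receive a value only when the user left them out. A minimal sketch of that rule, assuming a plain `java.util.Map` as a stand-in for the Apache Commons `Configuration` object; the class name `DefaultsSketch` and the method `applyDefaults` are hypothetical, and only a few of the keys from the diff are shown:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for configureEagletDefaults, using a Map instead of Configuration
class DefaultsSketch {

    /** Fills in defaults without overwriting optional keys the user already set. */
    static void applyDefaults(Map<String, String> cfg) {
        // Forced value: always overwritten, mirroring setProperty in the diff
        cfg.put("species[@genotype-length]", "1");

        // Optional values: only set when the key is absent,
        // mirroring the containsKey / addProperty pairs in the diff
        cfg.putIfAbsent("validation-set", "false");
        cfg.putIfAbsent("evaluator[@type]", "eaglet.algorithm.MLCEvaluator");
        cfg.putIfAbsent("parents-selector.tournament-size", "2");
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("validation-set", "true"); // value provided by the user
        applyDefaults(cfg);
        // The user's "true" survives; the missing keys get their defaults
        System.out.println(cfg);
    }
}
```

Keeping the forced keys and the optional keys in separate groups makes it easy to see which settings a configuration file can still override.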
@@ -285,6 +342,8 @@ public void configure(Configuration configuration)
* - Generate trainDataset with bagging and validation set with outOfBag instances?
//Use or not a validation set to evaluate individuals
+ useValidationSet = configuration.getBoolean("validation-set");
useValidationSet = configuration.getBoolean("validation-set");
@@ -322,17 +381,17 @@ public void configure(Configuration configuration)
//Get number of labels
numberLabels = datasetTrain.getNumLabels();
- numClassifiers = configuration.getInt("max-number-classifiers");
+ numClassifiers = configuration.getInt("number-classifiers");
predictThreshold = configuration.getBoolean("predictThreshold");
predictionThreshold = configuration.getDouble("prediction-threshold");
variable = configuration.getBoolean("variable");
- maxNumLabelsClassifier = configuration.getInt("max-number-labels-classifier");
+ maxNumLabelsClassifier = configuration.getInt("number-labels-classifier");
weightVotesByFrequency = configuration.getBoolean("weightVotesByFrequency");
- betaMemberSelection = configuration.getDouble("alpha-member-selection");
+ betaMemberSelection = configuration.getDouble("beta-member-selection");
// Set provider settings
((EagletIndividualCreator) provider).setMaxNumLabelsClassifier(maxNumLabelsClassifier);
@@ -589,7 +648,7 @@ private EnsembleMLC generateEnsemble(List members, int n){
- private List selectEnsembleMembers(List individuals, int n, int [] expectedVotes, double alpha){
+ private List selectEnsembleMembers(List individuals, int n, int [] expectedVotes, double beta){
//Copy of the expectedVotes array
int [] expectedVotesCopy = new int[numberLabels];
System.arraycopy(expectedVotes, 0, expectedVotesCopy, 0, numberLabels);
@@ -597,7 +656,7 @@ private List selectEnsembleMembers(List individuals, i
//Weights for each label
double [] weights = weightsPerLabel.clone();
-// double alpha = 0.5;
+// double beta = 0.5;
byte [][] EnsembleMatrix = new byte[n][numberLabels];
List members = new ArrayList();
@@ -625,7 +684,7 @@ private List selectEnsembleMembers(List individuals, i
//Update fitness for all individuals
for(int i=0; i selectEnsembleMembers(List individuals, i
double [] candidatesFitness = new double[candidates.size()];
for(int i=0; i
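The role of `beta` in member selection can be illustrated with a greedy loop. This is a sketch under stated assumptions, not the code of `selectEnsembleMembers`: it assumes a score of the form `beta * distance + (1 - beta) * fitness`, uses a normalized Hamming distance between label subsets, and ignores the expected votes and per-label weights that the real method also handles. The names `BetaSelectionSketch`, `select`, `distance`, and `distanceToEnsemble` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of beta-weighted greedy selection
class BetaSelectionSketch {

    /** Hamming distance between two label subsets, normalized to [0, 1]. */
    static double distance(boolean[] a, boolean[] b) {
        int diff = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) diff++;
        }
        return (double) diff / a.length;
    }

    /** Mean distance from a candidate to the members chosen so far (0 if none). */
    static double distanceToEnsemble(boolean[] candidate, List<boolean[]> members) {
        if (members.isEmpty()) return 0.0;
        double sum = 0.0;
        for (boolean[] m : members) sum += distance(candidate, m);
        return sum / members.size();
    }

    /** Greedily picks n candidate indices; the score formula is an assumption. */
    static List<Integer> select(boolean[][] labelsets, double[] fitness, int n, double beta) {
        List<Integer> chosen = new ArrayList<>();
        List<boolean[]> members = new ArrayList<>();
        while (chosen.size() < n && chosen.size() < labelsets.length) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < labelsets.length; i++) {
                if (chosen.contains(i)) continue; // already in the ensemble
                double score = beta * distanceToEnsemble(labelsets[i], members)
                             + (1.0 - beta) * fitness[i];
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            chosen.add(best);
            members.add(labelsets[best]);
        }
        return chosen;
    }

    public static void main(String[] args) {
        boolean[][] labelsets = {
            {true, true, false, false},  // candidate 0
            {true, true, false, false},  // candidate 1: same labels as candidate 0
            {false, false, true, true}   // candidate 2: complementary labels
        };
        double[] fitness = {0.9, 0.85, 0.6};
        System.out.println(select(labelsets, fitness, 2, 0.0));  // prints [0, 1]
        System.out.println(select(labelsets, fitness, 2, 0.75)); // prints [0, 2]
    }
}
```

With `beta = 0` the loop ranks purely by fitness and picks the two identical label subsets; with `beta = 0.75`, the value used in the example configuration, the second pick moves to the candidate that covers different labels.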