Added cfgs and datasets and more
Further, in the cfg, alpha has been replaced by beta (along with the code that controls it).
Updated Readme.md
i02momuj committed Apr 12, 2019
1 parent b56d91e commit ce25638
Showing 6 changed files with 2,786 additions and 34 deletions.
53 changes: 27 additions & 26 deletions README.md
@@ -1,6 +1,6 @@
# EAGLET: Evolutionary AlGorithm for multi-Label Ensemble opTimization

EME is an evolutionary approach for the automatic generation of ensembles of diverse and competitive multi-label classifiers. It takes into account characteristics of the multi-label data such as the relationships among the labels, imbalance of the data, and the complexity of the output space. The ensemble is based on small projections of the label space, considering in this way the relationships among the labels but also reducing the computational cost in cases where the output space is complex. Further, EME takes into account all the labels approximately the same number of times in the ensemble, regardless of their frequency or its ease to be predicted; so that the imbalance of the data is considered and the infrequent labels are not ignored. For that, the fitness function takes into account both the predictive performance of the model and the number of times that each label is considered in the ensemble.
EAGLET is an algorithm for the selection of simple, accurate and diverse multi-label classifiers to build an ensemble. This method implicitly considers characteristics of the data, such as the relationships among labels and the degree of imbalance of the labels, when building the ensemble. In order to model the relationships among labels, each classifier of the ensemble focuses on a small subset of the label space, resulting in models with a relatively low computational complexity and lower imbalance in the output space. The resulting ensemble is generated incrementally from the population of multi-label classifiers: at each step, the member that best fits the ensemble generated so far, considering both predictive performance and diversity, is selected.

More information about this algorithm can be found in the following article:
> Jose M. Moyano, Eva L. Gibaja, Krzysztof J. Cios, Sebastián Ventura. "Combining Accurate and Diverse Multi-Label Classifiers Based on Projections of the Output Space Using Evolutionary Algorithms". Submitted to ---. (2019).
@@ -18,40 +18,41 @@ The configuration file is a xml file including the parameters of the evolutionar

```xml
<experiment>
<process algorithm-type="eme.EnsembleAlgorithm">
<process algorithm-type="eaglet.algorithm.MLCAlgorithm">
<rand-gen-factory seed="10"/>
<parents-selector type="net.sf.jclec.selector.TournamentSelector">
<tournament-size>2</tournament-size>
</parents-selector>
<population-size>50</population-size>
<max-of-generations>25</max-of-generations>
<recombinator type="eme.rec.UniformModelCrossover" rec-prob="0.8" />
<mutator type="eme.mut.PhiBasedIntraModelMutator" mut-prob="0.2" />
<max-of-generations>25</max-of-generations>
<recombinator type="eaglet.recombinator.RandomCrossover" rec-prob="0.7" />
<mutator type="eaglet.mutator.RandomMutator" mut-prob="0.2" />
<number-classifiers>12</number-classifiers>
<number-labels-classifier>3</number-labels-classifier>
<prediction-threshold>0.5</prediction-threshold>
<use-coverage>true</use-coverage>
<beta-member-selection>0.75</beta-member-selection>
<dataset>
<train-dataset>data/emotions_train1.arff</train-dataset>
<test-dataset>data/emotions_test1.arff</test-dataset>
<xml>data/emotions.xml</xml>
<train-dataset>data/Yeast/Yeast-train1.arff</train-dataset>
<test-dataset>data/Yeast/Yeast-test1.arff</test-dataset>
<xml>data/Yeast/Yeast.xml</xml>
</dataset>
<listener type="eme.EnsembleListener">
<listener type="eaglet.algorithm.MLCListener">
<report-dir-name>reports/EnsembleMLC</report-dir-name>
<global-report-name>summaryEnsembleMLC</global-report-name>
<report-frequency>10</report-frequency>
<report-frequency>10</report-frequency>
</listener>
</process>
</experiment>

```

* The configuration file must start with the ```<experiment>``` tag and then the ```<process>``` tag, the last indicating the class with the evolutionary algorithm, in our case ```eme.EnsembleAlgorithm```.
* The configuration file must start with the ```<experiment>``` tag, followed by the ```<process>``` tag; the latter indicates the class with the evolutionary algorithm, in our case ```eaglet.algorithm.MLCAlgorithm```.
* The ```<rand-gen-factory>``` tag must set the seed for random numbers with the ```seed``` attribute. It may also indicate the type of the rand-gen-factory, which by default is ```net.sf.jclec.util.random.RanecuFactory```. If several seeds are going to be used, the tag ```<rand-gen-factory multi="true">``` may be used, with the different seeds listed inside it, as follows:
```xml
<rand-gen-factory multi="true">
...
```
@@ -64,13 +65,13 @@ The configuration file is a xml file including the parameters of the evolutionar
* The parents selector is determined with the ```<parents-selector>``` tag. If, for example, the tournament selector is selected, its size is determined with the sub-tag ```<tournament-size>```.
* The size of the population is determined with the ```<population-size>``` tag.
* The number of generations of the evolutionary algorithm is determined with the ```<max-of-generations>``` tag.
* The ```<recombinator>``` tag determines the type of recombinator or crossover operator. In EME, three crossover operators are implemented: ```ModelCrossover```, ```MultiModelCrossover```, and ```UniformModelCrossover```. Further, the probability to apply this operator to each individual is determined with the ```rec-prob``` attribute.
* The ```<mutator>``` tag determines the type of mutation operator. In EME, two crossover operators are implemented: the basic ```IntraModelMutator```, and ```PhiBasedIntraModelMutator```. Further, the probability to apply this operator to each individual is determined with the ```mut-prob``` attribute.
* The ```<recombinator>``` tag determines the type of recombinator or crossover operator. In EAGLET, the ```RandomCrossover``` operator is implemented. Further, the probability of applying this operator to each individual is determined with the ```rec-prob``` attribute.
* The ```<mutator>``` tag determines the type of mutation operator. In EAGLET, the basic ```RandomMutator``` is implemented. Further, the probability to apply this operator to each individual is determined with the ```mut-prob``` attribute.
* The number of classifiers in each ensemble is determined by the ```<number-classifiers>``` tag.
* The number of labels of each classifier, or size of the *k*-labelset, is determined by the ```<number-labels-classifier>``` tag.
* The threshold used for the final prediction of the ensemble is determined with the ```<prediction-threshold>``` tag.
* The ```<use-coverage>``` tag determines if the coverage ratio measure is included in the fitness of the individuals. The coverage ratio takes into account the number of times that each label appears in the ensemble.
* With the ```<dataset>``` tag, the datasets used for training (for the evolutionary algorithm) and testing (for testing the final ensemble obtained by EME) are determined with the tags ```<train-dataset>``` and ```<test-dataset>``` respectively. The ```<xml>``` tag indicates the xml file of the dataset (Mulan format, [see more](http://www.uco.es/kdis/mllresources/)). Several datasets, or several partitions of the same dataset may be used, including the tag ```<dataset multi="true">```, and the different datasets inside, as follows:
* The ```<beta-member-selection>``` tag determines the value of beta (between 0 and 1) for the selection of members in the ensemble; the greater the beta value, the more diverse individuals are selected.
* Within the ```<dataset>``` tag, the datasets used for training (for the evolutionary algorithm) and testing (for testing the final ensemble obtained by EAGLET) are given with the tags ```<train-dataset>``` and ```<test-dataset>``` respectively. The ```<xml>``` tag indicates the XML file of the dataset (Mulan format, [see more](http://www.uco.es/kdis/mllresources/)). Several datasets, or several partitions of the same dataset, may be used by means of the tag ```<dataset multi="true">```, with the different datasets inside it, as follows:
```xml
<dataset multi="true">
<dataset>
@@ -91,14 +92,14 @@ The configuration file is a xml file including the parameters of the evolutionar
...
</dataset>
```
* The ```<listener>``` tag determines the class used as listener; it is the responsible of creating the different reports during and at the end of the evolutionary process. By default, the listener used is the one of the ```eme.EnsembleListener``` class. The ```<report-dir-name>``` tag determines the directory where the reports of the different executions are stored. The ```<global-report-name>``` tag indicates the filename of the global report file. Finally, the ```<report-frequency>``` tag indicates the frequency with which the reports for the iterations are created.
* The ```<listener>``` tag determines the class used as listener; it is responsible for creating the different reports during and at the end of the evolutionary process. By default, the listener used is the ```eaglet.algorithm.MLCListener``` class. The ```<report-dir-name>``` tag determines the directory where the reports of the different executions are stored. The ```<global-report-name>``` tag indicates the filename of the global report file. Finally, the ```<report-frequency>``` tag indicates how often the reports for the iterations are created.
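
To illustrate the ```<beta-member-selection>``` setting described above, the fragment below sketches two alternative values. The specific numbers are illustrative assumptions, not recommendations from the authors; only the valid range (0 to 1) and the direction of the effect (a greater beta selects more diverse members) come from the description above.

```xml
<!-- Alternative A (illustrative value): lower beta, less emphasis on diversity -->
<beta-member-selection>0.25</beta-member-selection>

<!-- Alternative B (illustrative value): higher beta, more emphasis on diversity -->
<beta-member-selection>0.9</beta-member-selection>
```

Only one of these alternatives would be placed inside the ```<process>``` block of a given configuration file.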

Several other characteristics of the evolutionary algorithm can also be modified in the configuration file; they are optional, and default values are used if they are not included:
* The ```<validation-set>``` tag indicates whether the training set is divided into training and validation sets, in order to evaluate the individuals with a dataset different from the one used to train them. By default, its value is ```false```.
* The ```<evaluator>``` tag determines the class of the evaluator used for evaluating the individuals. Since only one evaluator has been implemented in EME, its default value is ```eme.EnsembleMLCEvaluator```.
* The ```<provider>``` tag determines the class that generates the initial population of individuals. By default, the ```eme.EnsembleMLCCreator``` class is used.
* The ```<evaluator>``` tag determines the class of the evaluator used for evaluating the individuals. Since only one evaluator has been implemented in EAGLET, its default value is ```eaglet.algorithm.MLCEvaluator```.
* The ```<provider>``` tag determines the class that generates the initial population of individuals. By default, the ```eaglet.individualCreator.FrequencyBasedIndividualCreator``` class is used.
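
As a rough sketch, the optional settings above might be written inside ```<process>``` as follows. The values are the defaults listed above; the exact syntax is an assumption (a ```type``` attribute, mirroring how ```<listener>``` and ```<recombinator>``` are declared, and a plain boolean body for ```<validation-set>```), so it should be checked against the code before relying on it.

```xml
<!-- Hypothetical fragment: optional tags placed inside <process>. -->
<!-- The syntax is assumed; the values are the documented defaults. -->
<validation-set>false</validation-set>
<evaluator type="eaglet.algorithm.MLCEvaluator" />
<provider type="eaglet.individualCreator.FrequencyBasedIndividualCreator" />
```

Since these are the default values, omitting the three tags should give the same behaviour.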

Two multi-label datasets (*emotions* and *yeast*) have been included in the repository as example; however, a wide variety of dataset are available at the [KDIS Research Group Repository](http://www.uco.es/kdis/mllresources/). Further, two example configuration files (*Experiment_emotions.cfg* and *Experiment_yeast.cfg*) are also provided.
The *Yeast* multi-label dataset has been included in the repository as an example; however, a wide variety of datasets is available at the [KDIS Research Group Repository](http://www.uco.es/kdis/mllresources/). Further, an example configuration file (*Experiment_Yeast.cfg*) is also provided.

### References

32 changes: 32 additions & 0 deletions cfg/Experiment_Yeast.cfg
@@ -0,0 +1,32 @@
<experiment>
<process algorithm-type="eaglet.algorithm.MLCAlgorithm">
<rand-gen-factory seed="10"/>

<parents-selector type="net.sf.jclec.selector.TournamentSelector">
<tournament-size>2</tournament-size>
</parents-selector>

<population-size>50</population-size>
<max-of-generations>25</max-of-generations>

<recombinator type="eaglet.recombinator.RandomCrossover" rec-prob="0.7" />
<mutator type="eaglet.mutator.RandomMutator" mut-prob="0.2" />

<number-classifiers>12</number-classifiers>
<number-labels-classifier>3</number-labels-classifier>
<prediction-threshold>0.5</prediction-threshold>
<beta-member-selection>0.75</beta-member-selection>

<dataset>
<train-dataset>data/Yeast/Yeast-train1.arff</train-dataset>
<test-dataset>data/Yeast/Yeast-test1.arff</test-dataset>
<xml>data/Yeast/Yeast.xml</xml>
</dataset>

<listener type="eaglet.algorithm.MLCListener">
<report-dir-name>reports/EnsembleMLC</report-dir-name>
<global-report-name>summaryEnsembleMLC</global-report-name>
<report-frequency>10</report-frequency>
</listener>
</process>
</experiment>