The WOWA operator (Torra) is a powerful aggregation operator that combines multiple input values into a single score. This is particularly interesting for detection and ranking systems that rely on multiple heuristics: the system can use WOWA to produce a single meaningful score.
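To illustrate how the operator works, here is a minimal, self-contained sketch of WOWA. It assumes a piecewise-linear interpolation of the quantifier built from the w weights (an actual implementation may use a smoother monotone interpolation), and assumes both weight vectors sum to 1; the class and method names are hypothetical and not part of the library.

```java
import java.util.Arrays;

// Minimal WOWA sketch (assumption: piecewise-linear quantifier;
// w and p are each assumed to sum to 1).
public class WowaSketch {

    // Piecewise-linear quantifier w*: interpolates the points
    // (i/n, w_1 + ... + w_i) for i = 0..n, with w*(0) = 0 and w*(1) = 1.
    static double quantifier(double[] w, double x) {
        int n = w.length;
        if (x <= 0) return 0;
        if (x >= 1) return 1;
        double pos = x * n;
        int i = (int) Math.floor(pos);      // segment index
        double cum = 0;
        for (int j = 0; j < i; j++) cum += w[j];
        return cum + (pos - i) * w[i];      // linear within segment i
    }

    // WOWA(a) = sum_i omega_i * a_sigma(i), where sigma sorts a in
    // decreasing order and omega_i = w*(P_i) - w*(P_{i-1}), with P_i
    // the cumulated p weights of the i largest inputs.
    static double wowa(double[] w, double[] p, double[] a) {
        Integer[] idx = new Integer[a.length];
        for (int i = 0; i < a.length; i++) idx[i] = i;
        Arrays.sort(idx, (x, y) -> Double.compare(a[y], a[x]));  // decreasing
        double result = 0, prev = 0, cumP = 0;
        for (int i : idx) {
            cumP += p[i];
            double cur = quantifier(w, cumP);
            result += (cur - prev) * a[i];
            prev = cur;
        }
        return result;
    }

    public static void main(String[] args) {
        // With uniform w, the quantifier is the identity and WOWA
        // reduces to the weighted mean with the p weights.
        double[] w = {0.25, 0.25, 0.25, 0.25};
        double[] p = {0.1, 0.2, 0.3, 0.4};
        double[] a = {0.4, 0.3, 0.2, 0.1};
        System.out.println(wowa(w, p, a));
    }
}
```

With uniform w weights the result equals the p-weighted mean; with uniform p weights it reduces to a plain OWA. This is what makes WOWA a generalization of both.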
A Java implementation of WOWA is available at https://github.com/tdebatty/java-aggregation.
The WOWA operator requires two sets of parameters: p weights and w weights. In this project we use a genetic algorithm to compute the best values for p and w weights. For the training, the algorithm uses a dataset of input vectors together with the expected aggregated score of each vector.
This project is a Java implementation of the PHP wowa-training project.
Using Maven:
<dependency>
<groupId>be.cylab</groupId>
<artifactId>java-wowa-training</artifactId>
<version>0.0.4</version>
</dependency>
https://mvnrepository.com/artifact/be.cylab/java-wowa-training
public static void main(String[] args) {
Logger logger = Logger.getLogger(Trainer.class.getName());
logger.setLevel(Level.INFO);
int population_size = 100;
int crossover_rate = 60;
int mutation_rate = 10;
int max_generation = 110;
SelectionMethod selection_method = SelectionMethod.RWS;
GenerationMethod generation_population_method = GenerationMethod.RANDOM;
TrainerParameters parameters = new TrainerParameters(logger, population_size,
crossover_rate, mutation_rate, max_generation, selection_method, generation_population_method);
//Input data
List<List<Double>> data = new ArrayList<>();
data.add(new ArrayList<>(Arrays.asList(0.1, 0.2, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.8, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.2, 0.6, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.2, 0.5, 0.8)));
data.add(new ArrayList<>(Arrays.asList(0.5, 0.1, 0.2, 0.3)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.1, 0.1, 0.1)));
//Expected aggregated value for each data vector
List<Double> expected = new ArrayList<>(Arrays.asList(0.1, 0.2, 0.3, 0.4, 0.5, 0.6));
//Create object for the type of Solution (fitness score evaluation)
SolutionDistance solution_type = new SolutionDistance();
//Create trainer object
Trainer trainer = new Trainer(parameters, solution_type);
AbstractSolution solution = trainer.run(data, expected);
//Display solution
System.out.println(solution);
}
The example above will produce something like:
SolutionDistance{
weights_w=[0.1403303611048977, 0.416828569516884, 0.12511121306189063, 0.1872211165629538, 0.1305087298401635],
weights_p=[0.0123494228072248, 0.10583088288437666, 0.5459452827654444, 0.17470250892324257, 0.1611718492107217],
distance=8.114097675242476}
The run method returns a solution object, consisting of p weights and w weights to use with the WOWA operator, plus the total distance between the expected aggregated values that are given as parameter, and the aggregated values computed by WOWA using these weights.
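The distance reported by the solution can be illustrated with a short sketch. The assumption here is that the fitness is the sum of absolute differences between the expected and the computed aggregated values; the library may use another norm, and the class and method names below are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the distance criterion: sum of absolute
// differences between expected and WOWA-computed aggregated values.
public class DistanceSketch {

    static double distance(List<Double> expected, List<Double> computed) {
        double total = 0;
        for (int i = 0; i < expected.size(); i++) {
            total += Math.abs(expected.get(i) - computed.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two vectors aggregated to 0.15 and 0.1 instead of 0.1 and 0.2.
        System.out.println(distance(List.of(0.1, 0.2), List.of(0.15, 0.1)));
    }
}
```

The genetic algorithm searches for the p and w weights that minimize this total distance over the whole training set.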
The run method can be used with ArrayLists, as in the example above, or with file names. In that case, one CSV file contains the data and the second contains the expected results.
population_size : size of the population used by the genetic algorithm. Suggested value : 100
crossover_rate : percentage of the population generated by crossover. Must be between 1 and 100. Suggested value : 60
mutation_rate : probability of a random element change in the population. Must be between 1 and 100. Suggested value : 15
selection_method : method used to select elements in the population when generating the next generation. SELECTION_METHOD_RWS for Roulette Wheel Selection and SELECTION_METHOD_TOS for Tournament Selection.
max_generation : maximum number of iterations of the algorithm.
generation_population_method : method used to generate the initial population. POPULATION_INITIALIZATION_RANDOM for a fully random initialization and POPULATION_INITIALIZATION_QUASI_RANDOM for a population seeded with specific elements.
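The two selection methods listed above can be sketched with their textbook definitions (the library's internals may differ; fitness is treated here as "higher is better", whereas a distance-based fitness would need to be inverted first):

```java
import java.util.Random;

// Sketch of the two standard selection methods named above.
public class SelectionSketch {

    // Roulette Wheel Selection: pick index i with probability
    // proportional to fitness[i].
    static int rouletteWheel(double[] fitness, Random rnd) {
        double total = 0;
        for (double f : fitness) total += f;
        double r = rnd.nextDouble() * total;
        double cum = 0;
        for (int i = 0; i < fitness.length; i++) {
            cum += fitness[i];
            if (r < cum) return i;
        }
        return fitness.length - 1;  // guard against rounding errors
    }

    // Tournament Selection: draw `size` random candidates, keep the best.
    static int tournament(double[] fitness, int size, Random rnd) {
        int best = rnd.nextInt(fitness.length);
        for (int k = 1; k < size; k++) {
            int challenger = rnd.nextInt(fitness.length);
            if (fitness[challenger] > fitness[best]) best = challenger;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] fitness = {1.0, 3.0, 6.0};
        Random rnd = new Random(42);
        System.out.println(rouletteWheel(fitness, rnd));
        System.out.println(tournament(fitness, 2, rnd));
    }
}
```

Roulette wheel selection preserves more diversity but is sensitive to large fitness differences; tournament selection gives direct control over selection pressure through the tournament size.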
The algorithm is built to be used with different methods to evaluate the fitness score of each chromosome. Two criteria are already implemented : distance and AUC.
It is possible to create a new Solution type with a new evaluation criterion. The new Solution type must inherit from the AbstractSolution class and override the computeScoreTo method. It is also necessary to modify the createSolutionObject method in the Factory class.
public static void main(String[] args) {
Logger logger = Logger.getLogger(Trainer.class.getName());
logger.setLevel(Level.INFO);
int population_size = 100;
int crossover_rate = 60;
int mutation_rate = 10;
int max_generation = 110;
int selection_method = TrainerParameters.SELECTION_METHOD_RWS;
int generation_population_method = TrainerParameters.POPULATION_INITIALIZATION_RANDOM;
TrainerParameters parameters = new TrainerParameters(logger, population_size,
crossover_rate, mutation_rate, max_generation, selection_method, generation_population_method);
//Input data
List<List<Double>> data = new ArrayList<>();
data.add(new ArrayList<>(Arrays.asList(0.1, 0.2, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.8, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.2, 0.6, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.2, 0.5, 0.8)));
data.add(new ArrayList<>(Arrays.asList(0.5, 0.1, 0.2, 0.3)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.1, 0.1, 0.1)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.2, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.8, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.2, 0.6, 0.3, 0.4)));
data.add(new ArrayList<>(Arrays.asList(0.5, 0.1, 0.2, 0.3)));
data.add(new ArrayList<>(Arrays.asList(0.1, 0.1, 0.1, 0.1)));
//Expected aggregated value for each data vector
List<Double> expected = new ArrayList<>(Arrays.asList(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0));
//Create object for the type of Solution (fitness score evaluation)
SolutionDistance solution_type = new SolutionDistance(data.get(0).size());
//Create trainer object
Trainer trainer = new Trainer(parameters, solution_type);
List<AbstractSolution> solutions = trainer.runKFold(data, expected, 2, 2);
//Display solution
for (AbstractSolution solution : solutions) {
System.out.println(solution.getFitnessScore());
System.out.println(solution.getAucRoc());
System.out.println(solution.getAucPr());
}
}
The runKFold method runs a k-fold cross-validation. Concretely, it separates the dataset into k folds. A single fold is retained as the validation data for testing the model, and the remaining k - 1 folds are used as training data. The cross-validation process is then repeated k times, with each of the k folds used exactly once as the validation data. The k results can then be averaged to produce a single estimation. For each tested fold, the Area Under the Curve is also computed to evaluate the classification efficiency (this only works if the expected vector contains only 0 and 1 values).
The code above produces a result similar to:
SolutionDistance{
weights_w=[0.8673383311511217, 0.04564604584006219, 0.0647437341741078, 0.022271888834708403],
weights_p=[0.5933035227430291, 0.10784413855996985, 0.03387258778518031, 0.26497975091182074],
fitness score=2.2260299633096268}
0.16666666666666666
SolutionDistance{
weights_w=[0.7832984118592771, 0.12307744745817546, 0.07982187970335382, 0.013802260979193624],
weights_p=[0.01945033161182157, 0.3466399858254755, 0.18834296208558235, 0.44556672047712065],
fitness score=1.7056044468736795}
0.4166666666666667
As output, the runKFold method returns an ArrayList that contains the best solution for each fold. A k-fold cross-validation also computes the AUC of a ROC curve and of a Precision-Recall curve. The runKFold method takes as arguments the dataset (data and expected results), the number of folds used in the cross-validation, and a value that can increase the number of alerts if this number is too low. This is useful to increase the penalty for failing to detect an alert.
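The AUC of the ROC curve mentioned above can be sketched with its rank formulation: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The library's own computation may differ; this class is only an illustration.

```java
// Rank-based sketch of the Area Under the ROC Curve.
public class AucSketch {

    // labels must contain only 0 (negative) and 1 (positive).
    static double aucRoc(double[] scores, int[] labels) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) continue;
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) continue;
                pairs++;
                if (scores[i] > scores[j]) wins += 1;        // positive ranked higher
                else if (scores[i] == scores[j]) wins += 0.5; // tie
            }
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        int[] labels = {1, 1, 0, 0};
        // Every positive outranks every negative: AUC = 1.0
        System.out.println(aucRoc(scores, labels));
    }
}
```

An AUC of 0.5 corresponds to random ranking, which is why the 0.16 and 0.41 values in the output above indicate a poorly discriminating fold.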
As for simple training, the runKFold method can be used as in the example above or with CSV files. In this case, the arguments are Strings containing the file names.
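The fold partitioning described earlier can be sketched as follows. The splitting strategy shown here (element i goes to fold i modulo k) is one common choice and an assumption; the library may shuffle the data first or split it differently.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a k-fold split: fold i receives every element whose
// index is congruent to i modulo k.
public class KFoldSketch {

    static <T> List<List<T>> split(List<T> data, int k) {
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
        for (int i = 0; i < data.size(); i++) {
            folds.get(i % k).add(data.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5);
        System.out.println(split(data, 2));  // → [[1, 3, 5], [2, 4]]
    }
}
```

Each fold then serves once as the validation set while the others are concatenated into the training set, as described above.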