WOWA Training

The WOWA operator (Torra) is a powerful aggregation operator that combines multiple input values into a single score. This is particularly useful for detection and ranking systems that rely on multiple heuristics: with WOWA, such a system can produce a single meaningful score.

A PHP implementation of WOWA is available at https://github.com/tdebatty/php-aggregation-operators

The WOWA operator requires two sets of parameters: p weights and w weights. In this project we use a genetic algorithm to compute the best values for p and w weights. For the training, the algorithm uses a dataset of input vectors together with the expected aggregated score of each vector.
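To make the role of the two weight vectors concrete, here is a minimal sketch of a WOWA aggregation. It is not the code of the linked PHP implementation: the function names wowa_quantifier and wowa_sketch are hypothetical, and the interpolation function is assumed to be piecewise linear, which is a common simplification. The p weights are attached to the inputs themselves, while the w weights are attached to the positions of the inputs once sorted in decreasing order.

```php
<?php
// Sketch only: w* is assumed to be the piecewise-linear interpolation of the
// points (i/n, w_1 + ... + w_i), starting at (0, 0).
function wowa_quantifier(array $w, float $x): float
{
    $n = count($w);
    $pos = max(0.0, min(1.0, $x)) * $n; // position on the [0, n] scale
    $k = (int) floor($pos);             // number of fully covered w weights
    if ($k >= $n) {
        return array_sum($w);
    }
    return array_sum(array_slice($w, 0, $k)) + ($pos - $k) * $w[$k];
}

// Hypothetical helper, not part of any library API.
function wowa_sketch(array $w, array $p, array $values): float
{
    // Visit the inputs from the largest value to the smallest.
    $order = array_keys($values);
    usort($order, function ($a, $b) use ($values) {
        return $values[$b] <=> $values[$a];
    });

    $result = 0.0;
    $accP = 0.0;     // cumulated p weights of the inputs visited so far
    $previous = 0.0; // value of w* before adding the current input
    foreach ($order as $index) {
        $accP += $p[$index];
        $current = wowa_quantifier($w, $accP);
        // The weight of this input is the increase of w* caused by its p weight.
        $result += ($current - $previous) * $values[$index];
        $previous = $current;
    }
    return $result;
}
```

Under this assumption, uniform w weights reduce the sketch to a weighted mean driven by p, and uniform p weights reduce it to an OWA driven by w.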

Installation

composer require cylab-be/wowa-training

Usage

Example

require __DIR__ . "/vendor/autoload.php";

use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;

// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::DEBUG));

$parameters = new TrainerParameters(
    $logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters);

// Input data
$data = [
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
];

// expected aggregated value for each data vector
$expected = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6];

var_dump($trainer->run($data, $expected));

The example above will produce something like:

class RUCD\Training\Solution#56 (3) {
  public $weights_w =>
  array(4) {
    [0] =>
    double(0.31568310640557)
    [1] =>
    double(0.37517587135019)
    [2] =>
    double(0.23165073663557)
    [3] =>
    double(0.077490285608666)
  }
  public $weights_p =>
  array(4) {
    [0] =>
    double(0.67852325915809)
    [1] =>
    double(0.0083157109614166)
    [2] =>
    double(0.082353710617992)
    [3] =>
    double(0.2308073192625)
  }
  public $distance =>
  double(0.51636277259465)
}

The run method returns a solution object containing the p weights and w weights to use with the WOWA operator, plus the total distance between the expected aggregated values passed as a parameter and the aggregated values that WOWA computes with these weights.
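The distance field can be read as a measure of how far the trained operator is from the expected scores. The sketch below illustrates the idea with a hypothetical helper, total_distance_sketch, which sums the absolute differences; the metric that the library actually uses is not stated here and may differ (for example a Euclidean distance). An arithmetic mean stands in for WOWA so that the sketch runs on its own.

```php
<?php
// Hypothetical helper: sums |expected - aggregated| over the dataset.
// The metric used by the library itself may be different.
function total_distance_sketch(callable $aggregate, array $data, array $expected): float
{
    $distance = 0.0;
    foreach ($data as $i => $vector) {
        $distance += abs($expected[$i] - $aggregate($vector));
    }
    return $distance;
}

// Stand-in aggregator (arithmetic mean); in practice this would be WOWA
// configured with the trained weights_w and weights_p.
$mean = function (array $vector): float {
    return array_sum($vector) / count($vector);
};

$distance = total_distance_sketch(
    $mean,
    [[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]],
    [0.1, 0.6]
);
```

A smaller distance means that the aggregated scores are closer to the expected ones.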

Parameters description

Cross validation

Example

require __DIR__ . "/vendor/autoload.php";

use RUCD\Training\Trainer;
use RUCD\Training\TrainerParameters;
use RUCD\Training\SolutionDistance;
use RUCD\Training\SolutionAUC;

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$populationSize = 100;
$crossoverRate = 60;
$mutationRate = 3;
$selectionMethod = TrainerParameters::SELECTION_METHOD_RWS;
$maxGeneration = 100;
$populationInitializationMethod = TrainerParameters::INITIAL_POPULATION_GENERATION_RANDOM;
$solutionType = new SolutionDistance();

// For logging you can use any implementation of PSR Logger
$logger = new Logger('wowa-training-test');
$logger->pushHandler(new StreamHandler('php://stdout', Logger::WARNING));

$parameters = new TrainerParameters(
    $logger, $populationSize, $crossoverRate, $mutationRate, $selectionMethod, $maxGeneration, $populationInitializationMethod);
$trainer = new Trainer($parameters, $solutionType);

// Input data
$data = [
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
  [0.1, 0.2, 0.3, 0.4],
  [0.1, 0.8, 0.3, 0.4],
  [0.2, 0.6, 0.3, 0.4],
  [0.1, 0.2, 0.5, 0.8],
  [0.5, 0.1, 0.2, 0.3],
  [0.1, 0.1, 0.1, 0.1],
];

// expected aggregated value for each data vector
$expected = [1,0,0,1,0,1,0,0,0,1,0,0];

var_dump($trainer->runKFold($data, $expected, 3));

The runKFold method runs a k-fold cross-validation. Concretely, it splits the dataset into k folds. In each round, a single fold is retained as validation data for testing the model, and the remaining k − 1 folds are used as training data. The process is repeated k times, with each of the k folds used exactly once as validation data. The k results can then be averaged to produce a single estimate. For each tested fold, the Area Under the Curve (AUC) is also computed to evaluate classification performance (this works only if the expected vector contains only 0 and 1).
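The splitting step can be sketched as follows. This is an illustration only: the function names are hypothetical, and the round-robin assignment of samples to folds is an assumption, since how runKFold actually distributes the samples (shuffled, contiguous, ...) is not described here.

```php
<?php
// Sketch: assign sample indices 0..count-1 to k folds in round-robin order.
function kfold_indices_sketch(int $count, int $k): array
{
    $folds = array_fill(0, $k, []);
    for ($i = 0; $i < $count; $i++) {
        $folds[$i % $k][] = $i;
    }
    return $folds;
}

// Sketch: build the k (training, validation) splits. Each fold is used
// exactly once for validation; the other k - 1 folds form the training set.
function kfold_splits_sketch(int $count, int $k): array
{
    $folds = kfold_indices_sketch($count, $k);
    $splits = [];
    foreach ($folds as $f => $validation) {
        $training = [];
        foreach ($folds as $g => $fold) {
            if ($g !== $f) {
                $training = array_merge($training, $fold);
            }
        }
        $splits[] = ['training' => $training, 'validation' => $validation];
    }
    return $splits;
}

$splits = kfold_splits_sketch(12, 3);
```

With 12 samples and k = 3, as in the example above, each split holds 8 training indices and 4 validation indices.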

As output, runKFold generates an array that contains, for each fold, the w and p weight vectors and the AUC value.
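For readers unfamiliar with the AUC, the sketch below computes it with the pairwise formulation: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counting for one half. This is equivalent to the area under the ROC curve. The function auc_sketch is hypothetical; how the library computes the AUC internally is not shown here.

```php
<?php
// Sketch: AUC as the probability that a positive sample (label 1) is
// scored higher than a negative sample (label 0). Ties count for 0.5.
function auc_sketch(array $scores, array $labels): float
{
    $positives = [];
    $negatives = [];
    foreach ($scores as $i => $score) {
        if ($labels[$i] == 1) {
            $positives[] = $score;
        } else {
            $negatives[] = $score;
        }
    }
    if (count($positives) === 0 || count($negatives) === 0) {
        throw new InvalidArgumentException('AUC needs at least one positive and one negative label.');
    }

    $wins = 0.0;
    foreach ($positives as $pos) {
        foreach ($negatives as $neg) {
            if ($pos > $neg) {
                $wins += 1.0;
            } elseif ($pos == $neg) {
                $wins += 0.5;
            }
        }
    }
    return $wins / (count($positives) * count($negatives));
}
```

An AUC of 1 means that every positive sample is ranked above every negative one, 0.5 corresponds to a random ranking, and 0 means that the ranking is fully inverted.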

The runKFold example above produces a result similar to:

array(3) {
  [0]=>
  array(2) {
    ["auc"]=>
    float(0.5)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#133 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.16573697533351)
        [1]=>
        float(0.76165292950897)
        [2]=>
        float(0.024253730247718)
        [3]=>
        float(0.048356364909798)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.20097150002833)
        [1]=>
        float(0.020364990979043)
        [2]=>
        float(0.17636230606784)
        [3]=>
        float(0.60230120292479)
      }
      ["distance"]=>
      float(1.7892117370011)
    }
  }
  [1]=>
  array(2) {
    ["auc"]=>
    float(0)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#146 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.18742088232865)
        [1]=>
        float(0.57233147854378)
        [2]=>
        float(0.22507083815429)
        [3]=>
        float(0.015176800973267)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.076670559592882)
        [1]=>
        float(0.019193144442706)
        [2]=>
        float(0.18316950831007)
        [3]=>
        float(0.72096678765435)
      }
      ["distance"]=>
      float(1.3403524893715)
    }
  }
  [2]=>
  array(2) {
    ["auc"]=>
    float(1)
    ["solution"]=>
    object(RUCD\Training\SolutionDistance)#12 (3) {
      ["weights_w"]=>
      array(4) {
        [0]=>
        float(0.16274887804484)
        [1]=>
        float(0.527446888854)
        [2]=>
        float(0.21225455965351)
        [3]=>
        float(0.097549673447646)
      }
      ["weights_p"]=>
      array(4) {
        [0]=>
        float(0.10891441031576)
        [1]=>
        float(0.023649196569852)
        [2]=>
        float(0.24106562811561)
        [3]=>
        float(0.62637076499877)
      }
      ["distance"]=>
      float(2.0314776184856)
    }
  }
}
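Since the k results are meant to be averaged into a single estimate, the per-fold AUC values can be combined as in the sketch below. It relies only on the "auc" key visible in the output above; the helper mean_auc_sketch is hypothetical, and the $results array is hand-written to mimic that output rather than produced by runKFold.

```php
<?php
// Sketch: average the "auc" entries of an array shaped like the
// runKFold output shown above.
function mean_auc_sketch(array $results): float
{
    return array_sum(array_column($results, 'auc')) / count($results);
}

// Hand-written stand-in for the output above (solutions omitted).
$results = [
    ['auc' => 0.5],
    ['auc' => 0.0],
    ['auc' => 1.0],
];

$meanAuc = mean_auc_sketch($results); // 0.5 for these three folds
```

On such a small dataset the per-fold values vary widely, so the average is only a rough estimate.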

References

Check this project on GitLab