
        Criteria

Contents
1       Introduction
2       Usage
3       Adding new criteria


1       Introduction

The Criteria module contains functions that calculate the error of a given
solution. They provide a uniform interface, so that new criteria can be
added without changing the algorithms that use them.


2       Usage

If you only need to pass a criterion to an algorithm, you can do it as
follows:

First, create a suitable DistanceInfo object. See distance.txt for more info.
ciCriterionDefaultDistance returns the identifier of the default distance
function for the criterion. Use it unless there is a good reason to do
otherwise.

Second, create a CriterionInfo object with ciNew. Pass it the TRAININGSET,
the DistanceInfo object, the criterion type and, if you are making a
user-defined criterion, a set of function pointers. If the Criterion
parameter is not CR_User, the function pointer values have no effect.

Note that the CriterionInfo object should only be used with the TRAININGSET
it was initialized with, because the initialization step is allowed to
calculate extra information from it, e.g. to speed up later computations.

Third, pass the object to the algorithm you are using.

Fourth, delete the CriterionInfo object with ciDelete. After that you may
delete the DistanceInfo object as well.
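
The order of the four steps above can be sketched as follows. This is only
an illustration of the create/use/delete order: the real signatures of the
criteria and distance functions are not shown in this document, so every
type and function below (DemoDistanceInfo, DemoCriterionInfo, demoLifecycle
and so on) is a hypothetical stand-in, and a plain array of doubles takes
the place of the TRAININGSET.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these names or signatures are taken
   from the real distance or criteria headers. */
typedef struct {
    int distanceType;
} DemoDistanceInfo;

typedef struct {
    const double *data;             /* training set this object is bound to */
    int count;
    const DemoDistanceInfo *dist;   /* borrowed, so it must outlive us */
} DemoCriterionInfo;

static DemoDistanceInfo *demoDistanceNew(int distanceType)
{
    DemoDistanceInfo *d = malloc(sizeof *d);
    if (d != NULL) d->distanceType = distanceType;
    return d;
}

static DemoCriterionInfo *demoCriterionNew(const double *data, int count,
                                           const DemoDistanceInfo *dist)
{
    DemoCriterionInfo *c;
    assert(data != NULL && count > 0 && dist != NULL);
    c = malloc(sizeof *c);
    if (c != NULL) {
        c->data = data;
        c->count = count;
        c->dist = dist;
    }
    return c;
}

/* Stand-in for an algorithm: squared error against a single centroid. */
static double demoAlgorithm(const DemoCriterionInfo *c, double centroid)
{
    double error = 0.0;
    int i;
    for (i = 0; i < c->count; i++) {
        double diff = c->data[i] - centroid;
        error += diff * diff;
    }
    return error;
}

/* Returns the error, or -1.0 if an allocation failed. */
double demoLifecycle(const double *data, int count, double centroid)
{
    DemoDistanceInfo *dist;
    DemoCriterionInfo *crit;
    double error;

    dist = demoDistanceNew(0);                     /* step 1 */
    if (dist == NULL) return -1.0;
    crit = demoCriterionNew(data, count, dist);    /* step 2 */
    if (crit == NULL) {
        free(dist);
        return -1.0;
    }
    error = demoAlgorithm(crit, centroid);         /* step 3 */
    free(crit);                                    /* step 4: criterion first */
    free(dist);                                    /* distance info last */
    return error;
}
```

The criterion object keeps a pointer to the distance object in this sketch,
which is one plausible reason for deleting the criterion before the
distance information.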


If you need to use the CriterionInfo object yourself, there are four macros
that should be used: ciEvaluate calculates the error of the solution,
ciPartitionOptimally performs partitioning, ciLocalPartition partitions
only the points belonging to a specified cluster, and
ciCalculateOptimalCentroids updates the model.

The latter three macros take an array of integers that tells which clusters
have changed; ciLocalPartition takes a cluster index as well. An array
element is 0 if the cluster has not changed and non-zero if it has. Some
operations can be performed faster when it is known which clusters have
changed and which have not. If in doubt, fill the arrays with non-zero
values. Each array must be long enough to hold one value per cluster. These
arrays are also passed as parameters to some algorithms. The return value
of ciEvaluate is a float; the others return the number of changed clusters.
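
To illustrate why the change flags help, here is a minimal sketch of a
centroid update that skips clusters whose flag is 0. It is not the code
behind ciCalculateOptimalCentroids: the names demoUpdateCentroids and
demoMarkAllChanged, the 1-D data and the assumption that the flag array is
rewritten on return are all hypothetical and chosen for the example only.

```c
#include <assert.h>

/* The "if in doubt" case: flag every cluster as changed. */
void demoMarkAllChanged(int *changed, int k)
{
    int c;
    for (c = 0; c < k; c++) changed[c] = 1;
}

/* Recompute the mean of every cluster whose flag is non-zero.
   On return changed[c] is non-zero only for clusters whose centroid
   value actually moved; the return value is the number of such clusters.
   (Hypothetical sketch, not the library's implementation.) */
int demoUpdateCentroids(const double *points, const int *labels, int n,
                        double *centroids, int *changed, int k)
{
    int c, i, moved = 0;
    assert(points && labels && centroids && changed);
    for (c = 0; c < k; c++) {
        double sum = 0.0;
        int members = 0;
        if (changed[c] == 0) continue;  /* same members, mean still valid */
        for (i = 0; i < n; i++) {
            if (labels[i] == c) {
                sum += points[i];
                members++;
            }
        }
        changed[c] = 0;
        if (members > 0 && centroids[c] != sum / members) {
            centroids[c] = sum / members;
            changed[c] = 1;
            moved++;
        }
    }
    return moved;
}
```

A caller that does not know which clusters have changed would call
demoMarkAllChanged first. That costs a full pass over all clusters but
cannot skip a cluster that needed updating.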


3       Adding new criteria

Basic operations such as partitioning and centroid update may depend on the
criterion, for example because of assumptions about what the model actually
is. You must therefore provide functions that perform these basic
operations. Note that if partitioning and centroid update are done the same
way as for MSE, you can use MSEPartition, MSECentroid and MSELocal.

When defining your own criterion, you can provide a function that initializes
any criterion-specific data you may have. This data is then passed to your
functions through the AuxiliaryData pointer. You must provide both
initialization and destruction functions. The criterion-specific data may be
anything you want. Note that if you use MSEPartition you must create the
data it expects, as done in MSEInit in criteria.c. If one of your functions
doesn't use the auxiliary data, indicate this in its parameter list, e.g.
with "void* UnusedParameter".
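
The auxiliary-data convention described above can be sketched with a small
table of function pointers. The exact set of pointers that ciNew expects is
not listed in this document, so the DemoUserCriterion struct, its fields
and all demo* functions below are hypothetical. The sketch shows one
criterion that uses its auxiliary data and one that does not and marks the
parameter as unused.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical function-pointer set for a user-defined criterion. */
typedef struct {
    void  *(*init)(const double *data, int n);
    void   (*destroy)(void *AuxiliaryData);
    double (*evaluate)(const double *data, int n, double centroid,
                       void *AuxiliaryData);
} DemoUserCriterion;

/* Criterion 1: mean squared error; 1/n is precomputed at init time. */
typedef struct {
    double invCount;
} DemoAux;

static void *demoMeanInit(const double *data, int n)
{
    DemoAux *aux;
    (void)data;
    assert(n > 0);
    aux = malloc(sizeof *aux);
    if (aux != NULL) aux->invCount = 1.0 / n;
    return aux;
}

static void demoMeanDestroy(void *AuxiliaryData)
{
    free(AuxiliaryData);
}

static double demoMeanEvaluate(const double *data, int n, double centroid,
                               void *AuxiliaryData)
{
    const DemoAux *aux = AuxiliaryData;
    double sum = 0.0;
    int i;
    if (aux == NULL) return -1.0;       /* init failed */
    for (i = 0; i < n; i++) {
        double diff = data[i] - centroid;
        sum += diff * diff;
    }
    return sum * aux->invCount;
}

/* Criterion 2: largest absolute deviation; needs no auxiliary data. */
static void *demoMaxInit(const double *data, int n)
{
    (void)data;
    (void)n;
    return NULL;
}

static void demoMaxDestroy(void *UnusedParameter)
{
    (void)UnusedParameter;
}

static double demoMaxEvaluate(const double *data, int n, double centroid,
                              void *UnusedParameter)
{
    double worst = 0.0;
    int i;
    (void)UnusedParameter;
    for (i = 0; i < n; i++) {
        double diff = data[i] - centroid;
        if (diff < 0.0) diff = -diff;
        if (diff > worst) worst = diff;
    }
    return worst;
}

const DemoUserCriterion demoMeanCriterion =
    { demoMeanInit, demoMeanDestroy, demoMeanEvaluate };
const DemoUserCriterion demoMaxCriterion =
    { demoMaxInit, demoMaxDestroy, demoMaxEvaluate };

/* Initialize, evaluate once and clean up, whatever the criterion is. */
double demoRun(const DemoUserCriterion *cr, const double *data, int n,
               double centroid)
{
    void *aux = cr->init(data, n);
    double error = cr->evaluate(data, n, centroid, aux);
    cr->destroy(aux);
    return error;
}
```

Because both criteria supply an init and a destroy function, the caller in
demoRun never has to know whether auxiliary data exists.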

The functions that update the centroids and the partitioning have no
specific restrictions on them, except that they should do the job from the
point of view of your criterion.





