Classification in Machine Learning is the problem that identifies to which set of categories does a new observation belong. Classification falls under the category of Supervised Machine Learning
.
Any algorithm that implements classification is known as classifier
The classifiers supported in PHP-ML are
The train
and predict
method are same for all classifiers. The only difference would be in the underlying algorithm used.
Before we can start with predicting a new observation, we need to train our classifier. Consider the following code
// Import library
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;
// Data for training classifier
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]]; // Training samples
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
// Initialize the classifier
$classifier = new SVC(Kernel::LINEAR, $cost = 1000);
// Train the classifier
$classifier->train($samples, $labels);
The code is pretty straight forward. $cost
used above is a measure of how much we want to avoid misclassifying each training example. For a smaller value of $cost
you might get misclassified examples. By default it is set to 1.0
Now that we have the classifier trained we can start making some actual predictions. Consider the following codes that we have for predictions
$classifier->predict([3, 2]); // return 'b'
$classifier->predict([[3, 2], [1, 5]]); // return ['b', 'a']
The classifier in the case above can take unclassified samples and predicts there labels. predict
method can take a single sample as well as an array of samples.
The classfier for this algorithm takes in two parameters and can be initialized like
$classifier = new KNearestNeighbors($neighbor_num=4);
$classifier = new KNearestNeighbors($neighbor_num=3, new Minkowski($lambda=4));
$neighbor_num
is the number of nearest neighbours to scan in knn algorithm while the second parameter is distance metric which by default in first case would be Euclidean
. More on Minkowski can be found here.
Following is a short example on how to use this classifier
// Training data
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
// Initialize classifier
$classifier = new KNearestNeighbors();
// Train classifier
$classifier->train($samples, $labels);
// Make predictions
$classifier->predict([3, 2]); // return 'b'
$classifier->predict([[3, 2], [1, 5]]); // return ['b', 'a']
NaiveBayes Classifier
is based on Bayes' theorem
and does not need any parameters in constructor.
The following code demonstrates a simple prediction implementation
// Training data
$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
$labels = ['a', 'b', 'c'];
// Initialize classifier
$classifier = new NaiveBayes();
// Train classifier
$classifier->train($samples, $labels);
// Make predictions
$classifier->predict([3, 1, 1]); // return 'a'
$classifier->predict([[3, 1, 1], [1, 4, 1]); // return ['a', 'b']
Till now we only used arrays of integer in all our case but that is not the case in real life. Therefore let me try to describe a practical situation on how to use classifiers.
Suppose you have an application that stores characteristics of flowers in nature. For the sake of simplicity we can consider the color and length of petals. So there two characteristics would be used to train our data.
color
is the simpler one where you can assign an int value to each of them and for length, you can have a range like(0 mm,10 mm)=1 , (10 mm,20 mm)=2
. With the initial data train your classifier. Now one of your user needs identify the kind of flower that grows in his backyard. What he does is select thecolor
of the flower and adds the length of the petals. You classifier running can detect the type of flower ("Labels in example above")