on 08-21-2017 05:14 AM - last edited on 08-24-2017 07:41 AM by ian.anderson
Implement machine-learning classification algorithms as Operators in Spatial Modeler. They can be used to perform multi-class prediction.
Options might include the following; the bullets under each algorithm summarize when to use it.
CART Decision Tree
Decision Trees assign class labels through a chain of simple, sequential tests. Tests are applied at the interior nodes of the tree, the branches encode the resulting decision sequences, and the leaves represent the class labels.
· Simple and easy to understand
· Less influenced by outliers, so good for classifying noisy data.
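As an illustrative sketch only (the idea does not prescribe an implementation; scikit-learn and the toy data below are my assumptions, not part of the proposal), a CART-style decision tree operator might wrap something like:

```python
# Illustrative only: a CART-style decision tree via scikit-learn,
# not the proposed Spatial Modeler operator. Toy data is made up.
from sklearn.tree import DecisionTreeClassifier

# Two features per sample; two class labels (0 and 1).
X = [[0, 0], [1, 1], [0, 1], [1, 0], [2, 2], [2, 3]]
y = [0, 0, 0, 0, 1, 1]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Classify a sample near the class-1 cluster.
print(clf.predict([[2, 2]]))
```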
K Nearest Neighbors
Find the K nearest neighbors of each point and assign the point the most frequently occurring class among them.
· Good for uniformly sampled data.
· A common baseline for binary classification problems (Yes/No)
· Sensitive to noise, so clean training data is needed.
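A minimal from-scratch sketch of the K-nearest-neighbors vote described above (purely illustrative; a real operator would use an optimized spatial index, and the data below is invented):

```python
# Illustrative KNN voting, not the proposed operator.
from collections import Counter
import math

def knn_predict(train, labels, point, k=3):
    """Return the majority class among the k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: cluster "A" near the origin, cluster "B" near (5, 5).
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, (5.5, 5.5)))  # → B
```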
Naive Bayes
A classification technique based on Bayes’ Theorem with an assumption of independence among predictors. A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
· When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models and requires less training data.
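To make the idea concrete, here is a hedged sketch using scikit-learn's Gaussian Naive Bayes (my choice of library and data, not something the idea specifies):

```python
# Illustrative only: Gaussian Naive Bayes on made-up, well-separated data.
from sklearn.naive_bayes import GaussianNB

X = [[1.0, 2.0], [1.2, 1.9], [8.0, 8.0], [8.1, 7.9]]
y = [0, 0, 1, 1]

nb = GaussianNB().fit(X, y)

# One sample near each cluster.
print(nb.predict([[1.1, 2.1], [7.9, 8.2]]))
```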
Radius Neighbors
Find the neighbors within a fixed radius of each point and assign the point the most frequently occurring class among them.
· Better choice than K Nearest Neighbors when data is not uniformly sampled
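The fixed-radius vote differs from KNN only in how neighbors are selected; a from-scratch sketch (illustrative toy data, my own helper name):

```python
# Illustrative radius-neighbors voting, not the proposed operator.
from collections import Counter
import math

def radius_predict(train, labels, point, radius):
    """Majority class among training points within `radius` of `point`."""
    votes = Counter(labels[i] for i, p in enumerate(train)
                    if math.dist(p, point) <= radius)
    if not votes:
        return None  # no neighbors inside the radius
    return votes.most_common(1)[0][0]

train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(radius_predict(train, labels, (5.5, 5.5), radius=1.5))  # → B
```

Unlike KNN, a query far from all training points yields no vote at all, which is one reason it behaves better on non-uniformly sampled data.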
Random Forest
In a Random Forest we grow multiple trees, as opposed to the single tree of the CART model. Each tree gives a classification, and we say the tree “votes” for that class; the forest chooses the classification with the most votes.
· Multiple trees as opposed to a single tree in the CART model
· Often regarded as a good default for many classification problems. Generally start with this classifier and evaluate whether the results are appropriate.
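A sketch of the voting ensemble described above, again assuming scikit-learn and toy data of my own invention:

```python
# Illustrative only: a Random Forest where 25 trees each "vote" and the
# majority class wins. Not the proposed Spatial Modeler operator.
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 1], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# A sample near the class-1 cluster; the forest returns the majority vote.
print(rf.predict([[5.5, 5.5]]))
```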
Support Vector Machine (SVM)
Support Vector Machine (SVM) performs classification by constructing hyperplanes in a multidimensional space that separate the different classes.
· Works extremely well when there is a clear margin of separation. Don't use it if the target classes overlap.
· Does not perform well on highly skewed/imbalanced training data sets.
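Finally, a hedged sketch of a linear SVM on data with a clear margin of separation (scikit-learn and the toy points are my assumptions):

```python
# Illustrative only: a linear SVM fits a separating hyperplane between
# two well-separated toy clusters. Not the proposed operator.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y = [-1, -1, -1, 1, 1, 1]

svm = SVC(kernel="linear").fit(X, y)

# One query point on each side of the margin.
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))
```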