- Let's say you are writing nice, clean machine learning code (e.g. Linear Regression).
- Cross-validation is a natural next step after learning Linear Regression, because it helps you assess and improve your predictions using the K-fold strategy.
- We divide the dataset into K equal parts (the K folds, or `cv`).
- Then we train the model on the larger portion and test it on the held-out fold.
- The accompanying graph shows K-fold cross-validation for the Boston dataset with a Linear Regression model.
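As a minimal sketch of the K-fold procedure described above (using scikit-learn's `KFold` and a synthetic regression dataset as a stand-in, since the post's Boston data and plotting code are not reproduced here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the Boston dataset used in the post
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Split into K = 5 equal folds; each fold serves once as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kf)

print(scores)         # one R^2 score per fold
print(scores.mean())  # averaged estimate of generalization performance
```

Each fold's score comes from a model trained on the other four folds, so the mean score is a less optimistic estimate than training and testing on the same data.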
Cross-validation helps improve your predictions using the K-fold strategy. What is K-fold, you ask? Check out this post for a visual explanation.
Continue reading “Visualizing Cross-validation Code”
- The misclassification error is much lower when surrogate splits are used with decision trees.
- The example shows how decision trees with surrogate splits can be used to improve prediction accuracy in the presence of missing data.
The comparison code and its output:

```matlab
y_pred1 = predict(RF1, Xtest);
confmat1 = confusionmat(Ytest, y_pred1);
y_pred2 = predict(RF2, Xtest);
confmat2 = confusionmat(Ytest, y_pred2);
disp('Confusion Matrix - without surrogates')
disp(confmat1)
disp('Confusion Matrix - with surrogates')
disp(confmat2)
```

```
Confusion Matrix - without surrogates
    67     1
    24    13
Confusion Matrix - with surrogates
    65     3
     4    33
```
- A misclassification error that decreases as the number of trees grows indicates good performance.
- There are several ways to improve prediction accuracy when some predictors have missing values, without discarding the affected observations entirely.
To read the full article, click here.
@MATLAB: “Learn how to improve prediction accuracy in the presence of missing data!”
Classification in the presence of missing data