Caret R Package - classification and regression training
(http://topepo.github.io/caret/index.html)
The caret package (short for classification and regression training) is a set of functions that streamline the process of training predictive models for complex regression and classification problems. The package contains tools for:
- data splitting
- pre-processing
- feature selection
- model tuning using resampling
- variable importance estimation
The caret package has many dependencies. To install it, first install minqa, RcppEigen, scales, lme4, ggplot2, reshape2 and BradleyTerry2 one by one, then install caret itself:
Step 1: install.packages("minqa")
Step 2: install.packages("RcppEigen")
Step 3: install.packages("scales")
Step 4: install.packages("lme4")
Step 5: install.packages("ggplot2")
Step 6: install.packages("reshape2")
Step 7: install.packages("BradleyTerry2")
Step 8: install.packages("caret", dependencies = c("Depends", "Suggests"))
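Before installing, it can help to see which of these packages are already present. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical and is not part of caret.

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependency names taken from the list above.
deps <- c("minqa", "RcppEigen", "scales", "lme4",
          "ggplot2", "reshape2", "BradleyTerry2")
to_install <- missing_packages(deps)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so the check can be run without touching the library.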
Example of predicting with the "glm" method:
library(caret)
library(kernlab)
data(spam)
inTrain <- createDataPartition(y=spam$type, p=0.75, list=FALSE)  # partition 75% training and 25% testing
training <- spam[inTrain, ]
testing <- spam[-inTrain, ]
> dim(training)
[1] 3451   58
> dim(testing)
[1] 1150   58
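When y is a factor, createDataPartition samples within each class, so the training and testing sets keep roughly the same class balance. The base R sketch below imitates that idea; it is an illustration, not caret's implementation. The function name stratified_split is hypothetical, and the class counts (2788 nonspam, 1813 spam) are an assumption about the spam data rather than something loaded from kernlab.

```r
# Illustrative stratified split in base R (not caret's actual code).
# Returns the row indices selected for training.
stratified_split <- function(y, p = 0.75) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    # Sample a fraction p of the rows in this class, rounding up.
    idx[sample.int(length(idx), size = ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Stand-in labels with the assumed class counts of the spam data.
y <- factor(rep(c("nonspam", "spam"), times = c(2788, 1813)))

set.seed(1234)
in_train <- stratified_split(y, p = 0.75)

length(in_train)           # number of training rows
table(y[in_train])         # class counts in the training set
table(y[-in_train])        # class counts in the testing set
```

Under these assumptions the split sizes match the dim() output above: 3451 rows for training and 1150 for testing.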
> set.seed(1234)
> fit <- train(type~., data=training, method="glm")
Loading required namespace: e1071
There were 26 warnings (use warnings() to see them)
> fit
Generalized Linear Model

3451 samples
  57 predictor
   2 classes: 'nonspam', 'spam'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 3451, 3451, 3451, 3451, 3451, 3451, ...
Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD
  0.9207482  0.8330317  0.008444636  0.01755059

> fit$finalModel
  Call:  NULL
  Coefficients:
      (Intercept)               make  
       -1.515e+00         -3.393e-01  
          address                all  
       -1.482e-01          9.183e-02  
            num3d                our  
        2.531e+00          5.661e-01  
             over             remove  
        4.999e-01          2.612e+00  
         internet              order  
        5.661e-01          8.957e-01  
             mail            receive  
        9.189e-02         -2.957e-01  
             will             people  
       -1.321e-01         -2.583e-01  
           report          addresses  
        1.068e-01          1.121e+00  
             free           business  
        9.468e-01          1.080e+00  
            email                you  
        1.910e-02          8.164e-02  
           credit               your  
        1.387e+00          2.326e-01  
             font             num000  
        3.465e-01          3.525e+00  
            money                 hp  
        1.376e+00         -1.982e+00  
              hpl             george  
       -1.369e+00         -9.258e+00  
           num650                lab  
        9.965e-01         -2.143e+00  
             labs             telnet  
       -6.141e-01         -1.234e-01  
           num857               data  
        2.369e+00         -9.245e-01  
           num415              num85  
        1.111e+00         -2.231e+00  
       technology            num1999  
        7.566e-01          8.572e-02  
            parts                 pm  
       -5.501e-01         -1.005e+00  
           direct                 cs  
       -2.563e-01         -4.692e+01  
          meeting           original  
       -2.173e+00         -9.787e-01  
          project                 re  
       -1.610e+00         -7.536e-01  
              edu              table  
       -1.483e+00         -3.167e+00  
       conference      charSemicolon  
       -4.491e+00         -1.623e+00  
 charRoundbracket  charSquarebracket  
        1.356e-01         -6.342e-01  
  charExclamation         charDollar  
        2.497e-01          5.745e+00  
         charHash         capitalAve  
        2.223e+00         -1.661e-03  
      capitalLong       capitalTotal  
        8.800e-03          7.263e-04  
  Degrees of Freedom: 3450 Total (i.e. Null);  3393 Residual
Null Deviance:     4628 
Residual Deviance: 1297        AIC: 1413
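The coefficients above are on the log-odds scale. With a two-level factor outcome, glm models the probability of the second level (here spam), so a positive coefficient raises the odds of spam and a negative one lowers them. As a minimal sketch, three coefficients copied from the output above can be converted to odds ratios:

```r
# Three coefficients copied from the fit$finalModel output (log-odds scale).
coefs <- c(remove = 2.612, charDollar = 5.745, george = -9.258)

# exp() turns a log-odds coefficient into an odds ratio per one-unit
# increase in the predictor, holding the other predictors fixed.
odds_ratios <- exp(coefs)
print(signif(odds_ratios, 3))
```

Odds ratios above 1 (remove, charDollar) point toward spam; values below 1 (george) point toward nonspam.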
PREDICTIONS:
> predictions<- predict(fit, newdata=testing)
> predictions
   [1] spam    spam    spam    spam    spam   
   [6] spam    nonspam spam    spam    spam   
  [11] spam    spam    spam    spam    spam   
  [16] spam    nonspam spam    spam    spam   
  [21] spam    spam    spam    spam    spam   
  [26] nonspam spam    spam    spam    spam   
  [31] nonspam spam    spam    spam    spam   
  [36] spam    spam    spam    spam    spam   
  [41] spam    spam    spam    spam    spam   
  [46] spam    spam    spam    spam    spam   
  [51] spam    spam    spam    nonspam spam   
  [56] spam    spam    spam    spam    spam   
  [61] spam    spam    spam    spam    spam   
  [66] spam    spam    spam    spam    spam   
  [71] spam    spam    spam    spam    nonspam
> confusionMatrix(predictions, testing$type)
Confusion Matrix and Statistics

          Reference
Prediction nonspam spam
   nonspam     659   50
   spam         38  403

               Accuracy : 0.9235
                 95% CI : (0.9066, 0.9382)
    No Information Rate : 0.6061
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.839
 Mcnemar's Test P-Value : 0.241

            Sensitivity : 0.9455
            Specificity : 0.8896
         Pos Pred Value : 0.9295
         Neg Pred Value : 0.9138
             Prevalence : 0.6061
         Detection Rate : 0.5730
   Detection Prevalence : 0.6165
      Balanced Accuracy : 0.9176

       'Positive' Class : nonspam
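The headline statistics can be recomputed by hand from the four cell counts, which makes it clearer what each one measures. The sketch below uses only base R and the counts copied from the confusion matrix above; the variable names are illustrative.

```r
# Confusion matrix copied from the output above:
# rows = prediction, columns = reference (matrix() fills column by column).
cm <- matrix(c(659, 38, 50, 403), nrow = 2,
             dimnames = list(Prediction = c("nonspam", "spam"),
                             Reference  = c("nonspam", "spam")))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n

# "nonspam" is the positive class in the output above.
sensitivity <- cm["nonspam", "nonspam"] / sum(cm[, "nonspam"])
specificity <- cm["spam", "spam"] / sum(cm[, "spam"])
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the row and column totals
# would produce by chance.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity,
              balanced_accuracy = balanced_accuracy,
              kappa = kappa_stat), 4))
```

The rounded values agree with the Accuracy, Sensitivity, Specificity, Balanced Accuracy and Kappa lines reported by confusionMatrix.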