Caret R Package - classification and regression training

(http://topepo.github.io/caret/index.html)

The caret package (short for classification and regression training) is a set of functions that streamline the process of building predictive models for complex regression and classification problems. The package contains tools for:

data splitting
pre-processing
feature selection
model tuning using resampling
variable importance estimation

The following steps install the caret package, which has many dependencies.

Install minqa, RcppEigen, scales, lme4, ggplot2, reshape2 and BradleyTerry2 one by one (scales has no separate step below; it is pulled in as a dependency of ggplot2).

Step 1: install.packages("minqa")

Step 2: install.packages("RcppEigen")

Step 3: install.packages("lme4")

Step 4: install.packages("ggplot2")

Step 5: install.packages("reshape2")

Step 6: install.packages("BradleyTerry2")

Step 7: install.packages("caret", dependencies = c("Depends", "Suggests"))
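Before running the steps above, it can help to check which of these packages are already installed. The following is a minimal sketch using only base R. The helper name missing_packages is hypothetical, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Packages named in the text above.
needed <- c("minqa", "RcppEigen", "scales", "lme4",
            "ggplot2", "reshape2", "BradleyTerry2")

# Hypothetical helper: return the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result (character(0)) would mean that only the final caret install is still needed.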

Example of prediction using the "glm" method:

library(caret)

library(kernlab)

data(spam)

inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE) # 75% training, 25% testing

training <- spam[inTrain, ]

testing <- spam[-inTrain, ]

> dim(training)

[1] 3451 58

> dim(testing)

[1] 1150 58
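When y is a factor, createDataPartition samples within each class, so the training and testing sets keep roughly the same class balance as the full data. The idea can be sketched in base R on a toy factor. This is an illustration only, not caret's own code; the helper name stratified_split and the use of ceiling() for the per-class count are assumptions.

```r
# Hypothetical helper: stratified split in base R (illustration only).
stratified_split <- function(y, p = 0.75) {
  # Group row indices by class label.
  idx_by_class <- split(seq_along(y), y)
  # Within each class, draw a proportion p of the rows without replacement.
  picked <- lapply(idx_by_class, function(idx) {
    n_take <- ceiling(length(idx) * p)  # assumed rounding rule
    idx[sample.int(length(idx), n_take)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1234)
# Toy labels: 60 nonspam and 40 spam.
y <- factor(rep(c("nonspam", "spam"), times = c(60, 40)))
in_train <- stratified_split(y, p = 0.75)

print(table(y[in_train]))   # class counts in the training part
print(table(y[-in_train]))  # class counts in the testing part
```

Both parts keep the 60/40 ratio of the toy labels, which is the property a stratified split is meant to preserve.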

> set.seed(1234)

> fit<-train(type~., data=training, method="glm")

Loading required namespace: e1071

There were 26 warnings (use warnings() to see them)

> fit

Generalized Linear Model

3451 samples

57 predictor

2 classes: 'nonspam', 'spam'

No pre-processing

Resampling: Bootstrapped (25 reps)

Summary of sample sizes: 3451, 3451, 3451, 3451, 3451, 3451, ...

Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD
  0.9207482  0.8330317  0.008444636  0.01755059
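"Bootstrapped (25 reps)" means that the model was refitted on 25 bootstrap samples, each the same size as the training set (hence the repeated 3451), and scored on the rows left out of each sample. The sketch below reproduces that idea in base R on simulated data. It is not the spam data and not caret's internal code, and all variable names are made up for illustration.

```r
set.seed(1234)

# Simulated two-class data: one noisy predictor.
n <- 200
x <- rnorm(n)
y <- factor(ifelse(x + rnorm(n) > 0, "spam", "nonspam"))
dat <- data.frame(x = x, y = y)

# 25 bootstrap repetitions: fit on the resample, score on the out-of-bag rows.
boot_acc <- replicate(25, {
  in_boot <- sample.int(n, n, replace = TRUE)   # same size as the data
  oob <- setdiff(seq_len(n), in_boot)           # rows not drawn this time
  m <- glm(y ~ x, data = dat[in_boot, ], family = binomial)
  p <- predict(m, newdata = dat[oob, ], type = "response")  # P(spam)
  pred <- ifelse(p > 0.5, "spam", "nonspam")
  mean(pred == dat$y[oob])
})

print(mean(boot_acc))  # analogous to the Accuracy column
print(sd(boot_acc))    # analogous to the Accuracy SD column
```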

> fit$finalModel

Call:  NULL

Coefficients:

      (Intercept)               make

       -1.515e+00         -3.393e-01

          address                all

       -1.482e-01          9.183e-02

            num3d                our

        2.531e+00          5.661e-01

             over             remove

        4.999e-01          2.612e+00

         internet              order

        5.661e-01          8.957e-01

             mail            receive

        9.189e-02         -2.957e-01

             will             people

       -1.321e-01         -2.583e-01

           report          addresses

        1.068e-01          1.121e+00

             free           business

        9.468e-01          1.080e+00

            email                you

        1.910e-02          8.164e-02

           credit               your

        1.387e+00          2.326e-01

             font             num000

        3.465e-01          3.525e+00

            money                 hp

        1.376e+00         -1.982e+00

              hpl             george

       -1.369e+00         -9.258e+00

           num650                lab

        9.965e-01         -2.143e+00

             labs             telnet

       -6.141e-01         -1.234e-01

           num857               data

        2.369e+00         -9.245e-01

           num415              num85

        1.111e+00         -2.231e+00

       technology            num1999

        7.566e-01          8.572e-02

            parts                 pm

       -5.501e-01         -1.005e+00

           direct                 cs

       -2.563e-01         -4.692e+01

          meeting           original

       -2.173e+00         -9.787e-01

          project                 re

       -1.610e+00         -7.536e-01

              edu              table

       -1.483e+00         -3.167e+00

       conference      charSemicolon

       -4.491e+00         -1.623e+00

 charRoundbracket  charSquarebracket

        1.356e-01         -6.342e-01

  charExclamation         charDollar

        2.497e-01          5.745e+00

         charHash         capitalAve

        2.223e+00         -1.661e-03

      capitalLong       capitalTotal

        8.800e-03          7.263e-04

Degrees of Freedom: 3450 Total (i.e. Null);  3393 Residual

Null Deviance:     4628

Residual Deviance: 1297        AIC: 1413
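The coefficients of a binomial glm are on the log-odds scale. With a two-level factor response, base R's glm treats the first level as failure, so these coefficients should describe the odds of "spam" (assuming caret passes the factor through unchanged). Exponentiating a coefficient gives an odds ratio. A small sketch using three values copied from the output above:

```r
# Three coefficients copied from the printed model (log-odds scale).
coefs <- c(remove = 2.612, charDollar = 5.745, george = -9.258)

# Odds ratio per one-unit increase in each predictor.
odds_ratios <- exp(coefs)
print(odds_ratios)

# Values above 1 raise the odds of "spam"; values below 1 lower them.
print(odds_ratios > 1)
```

Under this reading, frequent use of "remove" or "$" pushes a message towards spam, while "george" pushes it strongly towards nonspam.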

PREDICTIONS:

> predictions<- predict(fit, newdata=testing)

> predictions

   [1] spam    spam    spam    spam    spam

   [6] spam    nonspam spam    spam    spam

  [11] spam    spam    spam    spam    spam

  [16] spam    nonspam spam    spam    spam

  [21] spam    spam    spam    spam    spam

  [26] nonspam spam    spam    spam    spam

  [31] nonspam spam    spam    spam    spam

  [36] spam    spam    spam    spam    spam

  [41] spam    spam    spam    spam    spam

  [46] spam    spam    spam    spam    spam

  [51] spam    spam    spam    nonspam spam

  [56] spam    spam    spam    spam    spam

  [61] spam    spam    spam    spam    spam

  [66] spam    spam    spam    spam    spam

  [71] spam    spam    spam    spam    nonspam

> confusionMatrix(predictions,testing$type)

Confusion Matrix and Statistics

          Reference
Prediction nonspam spam
   nonspam     659   50
   spam         38  403

Accuracy : 0.9235

95% CI : (0.9066, 0.9382)

No Information Rate : 0.6061

P-Value [Acc > NIR] : <2e-16

Kappa : 0.839

Mcnemar's Test P-Value : 0.241

Sensitivity : 0.9455

Specificity : 0.8896

Pos Pred Value : 0.9295

Neg Pred Value : 0.9138

Prevalence : 0.6061

Detection Rate : 0.5730

Detection Prevalence : 0.6165

Balanced Accuracy : 0.9176

'Positive' Class : nonspam
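The main statistics above can be recomputed by hand from the four counts in the confusion matrix, with "nonspam" as the positive class. A minimal base R sketch (the variable names are chosen here for illustration):

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference).
cm <- matrix(c(659, 38, 50, 403), nrow = 2,
             dimnames = list(Prediction = c("nonspam", "spam"),
                             Reference  = c("nonspam", "spam")))
n <- sum(cm)

accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["nonspam", "nonspam"] / sum(cm[, "nonspam"])  # true positive rate
specificity <- cm["spam", "spam"] / sum(cm[, "spam"])           # true negative rate
ppv         <- cm["nonspam", "nonspam"] / sum(cm["nonspam", ])  # Pos Pred Value
npv         <- cm["spam", "spam"] / sum(cm["spam", ])           # Neg Pred Value

# Cohen's kappa: observed agreement corrected for chance agreement.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

print(round(c(Accuracy = accuracy, Kappa = kappa,
              Sensitivity = sensitivity, Specificity = specificity,
              PPV = ppv, NPV = npv), 4))
```

The rounded values agree with the Accuracy, Kappa, Sensitivity, Specificity, Pos Pred Value and Neg Pred Value lines printed by confusionMatrix.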


Wednesday, 11 March 2015
