Making a Receiver Operating Characteristic (ROC) curve in R

infoart.ca · Oct 9, 2023
The ROC curve is a graphical representation that illustrates the performance of a binary classifier across different classification thresholds. It is widely used in machine learning and statistics to evaluate the predictive power of a model and determine the trade-off between sensitivity and specificity.

To demonstrate ROC curve analysis, we will use the mtcars dataset, which contains information about various car models. We will build a logistic regression model to predict whether a car has high or low fuel efficiency based on its characteristics, and we will use the pROC package to evaluate it.
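If the pROC package is not already installed, it can be installed from CRAN first:

# install pROC from CRAN (only needed once)
install.packages("pROC")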

library(pROC)

Let’s start by loading the mtcars dataset and exploring its structure:

data(mtcars)
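To get a quick sense of the variables available (for example, that disp and hp are numeric), one option is to inspect the structure:

# look at variable names and types
str(mtcars)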

The mtcars dataset contains various variables related to car specifications, including the outcome of interest, “mpg” (miles per gallon). For this analysis, let’s convert “mpg” into a binary outcome by categorizing cars with mpg greater than the median as high fuel efficiency and the rest as low fuel efficiency. We create a labelled version for the frequency table and a numeric 0/1 version (1 = high efficiency) for the regression model:

 
mtcars$mpg_cat <- ifelse(mtcars$mpg > median(mtcars$mpg), "High", "Low")
mtcars$mpg_binary <- ifelse(mtcars$mpg > median(mtcars$mpg), 1, 0)
table(mtcars$mpg_cat)

High  Low 
  15   17 

Building the Logistic Regression Model

We model the probability of high fuel efficiency (mpg_binary = 1) as a function of engine displacement (disp) and horsepower (hp):

model <- glm(mpg_binary ~ disp + hp, data = mtcars, family = "binomial")
summary(model)

Call:
glm(formula = mpg_binary ~ disp + hp, family = "binomial", data = mtcars)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.58261  -0.10067  -0.01016   0.29146   1.75376  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  8.96510    3.78501   2.369   0.0179 *
disp        -0.02833    0.01449  -1.955   0.0506 .
hp          -0.02685    0.02586  -1.038   0.2993  
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 44.236 on 31 degrees of freedom
Residual deviance: 12.327 on 29 degrees of freedom
AIC: 18.327

Number of Fisher Scoring iterations: 7

Both disp and hp have negative coefficients, so higher engine displacement and higher horsepower are associated with lower odds of being in the high fuel efficiency group, although only disp approaches statistical significance (p ≈ 0.05) in this small sample.
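As a small addition to the original code, the coefficients can be put on the odds-ratio scale by exponentiating them:

# odds ratios for a one-unit increase in each predictor
exp(coef(model))

For example, exp(-0.02833) is about 0.972, so each extra cubic inch of displacement multiplies the odds of being in the high-efficiency group by roughly 0.97, holding horsepower constant.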

Now, let’s move on to evaluating the model’s performance using the ROC curve analysis. To generate the ROC curve and calculate the area under the curve (AUC), we will use the `roc` function from the `pROC` package.

predicted_probs <- predict(model, type = "response")
roc_obj <- roc(mtcars$mpg_binary, predicted_probs)

The roc function takes the true outcome variable (here mpg_binary, the same 0/1 variable used to fit the model) and the predicted probabilities from the logistic regression, and returns an object of class “roc” that represents the ROC curve.
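The AUC can also be read directly from this object (an extra line beyond the original code); it should match the value printed on the plot below:

# area under the ROC curve
auc(roc_obj)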

Now, to visualize the ROC curve, we can use the `plot` function:

plot(roc_obj, main = "ROC Curve", print.auc = TRUE, auc.polygon = TRUE, grid = TRUE)

The `plot` function draws the ROC curve; with print.auc = TRUE the AUC value is printed on the plot, and auc.polygon = TRUE shades the area under the curve.
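If you also want the threshold that best balances sensitivity and specificity (an optional extra step), the coords function in pROC can report it:

# "best" threshold by Youden's index, with its sensitivity and specificity
coords(roc_obj, "best", ret = c("threshold", "sensitivity", "specificity"))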

Interpreting the ROC Curve

The ROC curve provides a graphical representation of the trade-off between sensitivity and specificity. The closer the curve is to the top-left corner, the better the model’s performance. The area under the ROC curve (AUC) summarizes the model’s discriminative power: an AUC of 0.5 means the model does no better than random guessing, while an AUC of 1 means perfect discrimination. A higher AUC indicates better predictive performance.

Here the AUC is 0.980, which indicates that the model discriminates very well between cars with high and low fuel efficiency using the two predictors (engine displacement and horsepower) in the logistic regression.

In practical terms, an AUC of 0.980 means that if we pick two cars at random, one with high fuel efficiency and one with low fuel efficiency, the model will assign the higher predicted probability to the high-efficiency car about 98% of the time. This indicates a strong ability to distinguish between the two categories.
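With only 32 cars, the AUC estimate is fairly uncertain, so it can be worth adding a confidence interval (another optional step; pROC uses the DeLong method by default):

# 95% confidence interval for the AUC
ci.auc(roc_obj)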

Happy ROCking!
