Kickstarting R - Adding lines to a plot

Many of the lines added to plots are just straight lines that span the plot, and abline() is a good choice for these. Suppose that we want to add a vertical line at 2.5 on the x axis to divide the women who completed high school from those who didn't.

> abline(v=2.5,col=3,lty=3)

This produces a green, dotted, vertical line across the plot. To divide the other axis, suppose that age 33 is to be marked.

> abline(h=33,col=4,lty=2)

This draws a blue, dashed, horizontal line at 33 on the y axis. abline() can also display regression lines when it is given a fitted linear model.

> abline(lm(infert$age~as.numeric(infert$educ)),col=2,lty=1)

This draws a solid, red line illustrating the regression of age on education.
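The three calls above can be tried together in one short script. This is only a sketch: the base scatterplot is an assumption, since the plot being annotated is not shown on this page, and the names edu_num and age_fit are introduced here purely for illustration.

```r
# Load the infertility data that ships with R
data(infert)

# Code the education factor as integers (see levels(infert$education))
edu_num <- as.numeric(infert$education)

# Base scatterplot; an assumption, since the original plot is not shown here
plot(edu_num, infert$age, xlab = "Education level", ylab = "Age")

abline(v = 2.5, col = 3, lty = 3)   # green, dotted, vertical line
abline(h = 33, col = 4, lty = 2)    # blue, dashed, horizontal line

# Fit age as a function of education, then draw the fitted line
age_fit <- lm(infert$age ~ edu_num)
abline(age_fit, col = 2, lty = 1)   # red, solid regression line
```

Storing the fitted model in a variable before passing it to abline() also makes it easy to inspect the intercept and slope with coef(age_fit).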

Hypothetical distribution curves

Sometimes a hypothetical distribution curve for the data illustrated will give the viewer a better notion of how the distribution in the population might look (we sincerely hope). If you can write down the function that describes the distribution you think underlies the data, you can use curve() to add it to your plot. Using the airquality data, plot a histogram of airquality$Ozone. Suppose you think that the frequency of a given concentration of ozone on any day is described by two linear functions, one valid for the range 0 to 120, and the other for 120 and up.

> data(airquality)
> airhist<-hist(airquality$Ozone)
> curve(40-(x/3.3+1),from=0,to=120,add=TRUE)
> curve(6.6-(x/30),from=120,to=180,add=TRUE)
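curve() also accepts the name of a function, so the two pieces can be wrapped in one function and drawn with a single call. The following is a minimal sketch of that idea; ozone_guess is a hypothetical helper name, and the break at 120 and the upper limit of 180 are taken from the calls above.

```r
data(airquality)

# Hypothetical helper: the two linear pieces joined at x = 120
ozone_guess <- function(x) {
  ifelse(x < 120, 40 - (x / 3.3 + 1), 6.6 - x / 30)
}

# Histogram of ozone readings (missing values are dropped by hist())
airhist <- hist(airquality$Ozone)

# Draw the piecewise function over the histogram in one call
curve(ozone_guess, from = 0, to = 180, add = TRUE)
```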

This might impress an uncritical audience, but it is completely fabricated. When you are at a loss for what the underlying distribution might be, it may be better to just smooth the data and plot the result.

> library(plotrix) # assumed source of rescale(), which is not in base R
> airhist<-hist(airquality$Ozone)
> airspline<-spline(airhist$counts)
> lines(rescale(airspline$x,range(airhist$mids)),airspline$y)

There are a number of smoothing algorithms available in R, including spline(). Producing smoothed curves for hist() or barplot() is a common problem, partly because the horizontal axis on these plots is not scaled in an obvious way. As you can see, hist() returns a list whose mids component contains the midpoints of the bars; barplot() returns the bar midpoints directly. The function rescale(), which is not part of base R, does a simple linear transformation of one vector of values into a new scale. In this case, the scaling was by about a factor of 20.
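If no rescale() function is at hand, the linear transformation is easy to write directly. The sketch below uses a hypothetical stand-in called rescale_linear(); it is not the original rescale(), only an illustration of the same kind of mapping from the old range of a vector onto a new range.

```r
# Hypothetical stand-in for rescale(): map x linearly onto newrange
rescale_linear <- function(x, newrange) {
  oldrange <- range(x, na.rm = TRUE)
  newrange[1] + (x - oldrange[1]) * diff(newrange) / diff(oldrange)
}

data(airquality)
airhist <- hist(airquality$Ozone)
airspline <- spline(airhist$counts)   # x runs over the bar indices 1, 2, ...

# Stretch the bar indices onto the bar midpoints, then draw the smooth
lines(rescale_linear(airspline$x, range(airhist$mids)), airspline$y)

# Ratio of the new span to the old span, i.e. the scaling factor
diff(range(airhist$mids)) / diff(range(airspline$x))
```

The last line prints the factor by which the horizontal scale was stretched, which can be compared with the figure quoted above.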

For more information, see An Introduction to R: Examining the distribution of a set of data.
