Epidemiology Tools on the Net

Selected examples

[EpiTools Home]

Problem: Constructing epidemic curves (Part 1 of 2)

Constructing epidemic curves is a common and very important practice in epidemiology. Epidemic curves are used to monitor disease occurrence, to detect outbreaks, to generate hypotheses about the cause of an outbreak, to monitor the impact of intervention efforts, and to predict the course of an epidemic. Simple epidemic curves can be constructed using a spreadsheet. For spreadsheet examples, see the excellent tutorial on constructing epidemic curves by the North Carolina Center for Public Health Preparedness.

However, if you need to construct numerous epidemic curves because you are monitoring several geographic sites; to update frequently rapidly evolving data; to manipulate efficiently large data sets; to plot high-quality, publication-ready graphics; or to automate the production of plots, then R is available to do the job. In the example that follows we take you through a step-by-step, iterative process of constructing epidemic curves for West Nile Virus infections in California as of December 14, 2004. Once the R code is created it becomes trivial to re-run the code with updated data.

Solution: 'epicurve' functions

The R code that follows appears as it would in your text editor. If you have 'epitools' installed and loaded in R on your computer, you can paste this code into your R program.

Step 1. Read in data set and try epicurve.dates function.


library(epitools)
wdat <- read.table("http://www.medepi.net/data/wnv/wnv2004-12-14.txt",
                    sep = ",", header = TRUE, as.is = TRUE)

##fig1
epicurve.dates(wdat$date.onset)

Step 2. Change default axis labeling.


##suppress default axis labels and then re-label
##fig2
xv <- epicurve.dates(wdat$date.onset, axisnames = FALSE)
axis(1, at = xv$xvals, labels = xv$cmday, tick = FALSE, line = 0)
axis(1, at = xv$xvals, labels = xv$cmonth, tick = FALSE, line = 1)
      

Step 3. Using colors.plot function, add some color to this plot. (See next example to see how colors.plot function works.) Also, suppress default x-axis labels and then specify the x-axis labels precisely.


##fig3
xv <- epicurve.dates(wdat$date.onset,
                     axisnames = FALSE, col = "royalblue1")
dd <- xv$cmday==1 | xv$cmday==15
dd2 <- xv$cmday==15
axis(1, at = xv$xvals[dd], labels = xv$cmday[dd], tick = FALSE, line = 0)
axis(1, at = xv$xvals[dd], labels = xv$cmonth[dd], tick = FALSE, line = 1)
      

Step 4. Try epicurve.weeks function, stratify by clinical syndrome, add default legend, and add titles, etc.


##fig4
xv <- epicurve.weeks(wdat$date.onset, strata = wdat$syndrome,
                     min.date = "2004-01-01", max.date = "2004-12-31",
                     axisnames = FALSE,
                     legend.text = TRUE)
axis(1, at = xv$xvals, labels = xv$cweek, tick = FALSE, 
     line = -0.5, cex.axis = 0.8)
axis(1, at = xv$xvals, labels = xv$cmonth, tick = FALSE, 
     line = 0.5, cex.axis = 0.8)
axis(1, at = median(xv$xvals), labels = "Disease Week & Calendar Month, 2004", 
     tick = FALSE, line = 1.5)
title(main = "West Nile Virus Human Cases Reported in California
              by Disease Week as of December 14, 2004", ylab = "Cases")
      

Step 5. Use colorbrewer.display function to select sequential colors for 3 categories (similar to the sequential colors above).


colorbrewer.display(3, "seq")
      

Step 6. Use colorbrewer.palette function to create a palette and plot epidemic curve. Try red-like colors first.


seq3.r <- colorbrewer.palette(3, "seq", "r") ##red-like
seq3.a <- colorbrewer.palette(3, "seq", "a") ##blue-like

##Re-level syndrome categories using 'factor' function
##fig5
syndrome <- factor(wdat$syndrome, levels = c("WNND", "WNF", "Unknown"))
xv <- epicurve.weeks(wdat$date.onset, strata = syndrome,
                     min.date = "2004-01-01", max.date = "2004-12-31",
                     axisnames = FALSE, col = seq3.r[3:1],
                     legend.text = TRUE)
title(main = "West Nile Virus Human Cases Reported in California
              by Disease Week as of December 14, 2004", ylab = "Cases")
axis(1, at = xv$xvals, labels = xv$cweek, tick = FALSE, 
     line = -0.5, cex.axis = 0.8)
axis(1, at = xv$xvals, labels = xv$cmonth, tick = FALSE, 
     line = 0.5, cex.axis = 0.8)
axis(1, at = median(xv$xvals), labels = "Disease Week & Calendar Month, 2004", 
     tick = FALSE, line = 1.5)
      

Step 7. Try blue-like colors and add annotations to the plot.


##fig6
syndrome <- factor(wdat$syndrome, levels = c("WNND", "WNF", "Unknown"))
xv <- epicurve.weeks(wdat$date.onset, strata = syndrome,
                     min.date = "2004-01-01", max.date = "2004-12-31",
                     axisnames = FALSE, col = seq3.a[3:1],
                     legend.text = TRUE)
title(main = "West Nile Virus Human Cases Reported in California
              by Disease Week as of December 14, 2004", ylab = "Cases")
axis(1, at = xv$xvals, labels = xv$cweek, tick = FALSE, 
     line = -0.5, cex.axis = 0.8)
axis(1, at = xv$xvals, labels = xv$cmonth, tick = FALSE, 
     line = 0.5, cex.axis = 0.8)
axis(1, at = median(xv$xvals), labels = "Disease Week & Calendar Month, 2004", 
     tick = FALSE, line = 1.5)

##add arrows and annotations
zdates <- as.week(c("2004-02-24", "2004-04-14",
                    "2004-05-17", "2004-06-20"))
znames <- c("+ Bird\n2/24","+ Mosquito\n4/14",
            "+ Chicken\n5/17","+ Horse\n6/20")
x0 <- c(xv$xvals[xv$cstratum==zdates$stratum[1]],
        xv$xvals[xv$cstratum==zdates$stratum[2]],
        xv$xvals[xv$cstratum==zdates$stratum[3]],
        xv$xvals[xv$cstratum==zdates$stratum[4]])
y0 <- c(10,20,30,40)
x1 <- c(xv$xvals[xv$cstratum==zdates$stratum[1]],
        xv$xvals[xv$cstratum==zdates$stratum[2]],
        xv$xvals[xv$cstratum==zdates$stratum[3]],
        xv$xvals[xv$cstratum==zdates$stratum[4]])
y1 <- c(0, 0, 6, 18) 
arrows(x0, y0, x1, y1, length = 0.15)
text(x1, y0 + 5, znames, cex=.8)
      

Step 8.Try epicurve.months function. Which epidemic curve do you like best?


##fig7
syndrome <- factor(wdat$syndrome, levels = c("WNND", "WNF", "Unknown"))
xv <- epicurve.months(wdat$date.onset, strata = syndrome,
                     min.date = "2004-01-01", max.date = "2004-12-31",
                     axisnames = FALSE, col = seq3.a[3:1],
                     legend.text = TRUE)
title(main = "West Nile Virus Human Cases Reported in California
              by Month as of December 14, 2004", ylab = "Cases")
axis(1, at = xv$xvals, labels = xv$cmonth, tick = FALSE, line = -0.5, cex.axis = 0.8)
axis(1, at = median(xv$xvals), labels = "Calendar 2004", tick = FALSE, line = 0.5)
      

Problem: Constructing epidemic curves (Part 2 of 2)

Constructing epidemic curves can be challenging when the illness onset times are measured in 1-hour or 1/2-hour units and the events span several days. The Center for Disease Control and Prevention (CDC) Oswego outbreak module has been used for many years to train epidemiologists in conducting an outbreak investigation of a foodborne illness. Although this is considered an introductory-level training module, constructing this epidemic curve using a computer requires some work. To convince yourself, load this data into your favorite data management software/statistical package and generate the plots belows. This is the raw data as it appears in CDC publications. You'll have to do some data cleaning.

To make the creation of these types of plots easier, we wrote the 'epicurve.hours' function.

Solution: 'epicurve.hours' function

The R code that follows appears as it would in your text editor. If you have 'epitools' installed and loaded in R on your computer, you can paste this code into your R program.

Step 1. Read in and prepare data.


library(epitools)
odat <- read.table("http://www.medepi.net/data/oswego/oswego.txt",
                   sep = "", header = TRUE, as.is = TRUE)
odat[1:10,]
## create vector with meal date and time
mdt <- paste("4/18/1940", odat$meal.time)
## convert into standard date and time
meal.dt <- strptime(mdt, "%m/%d/%Y %I:%M %p")
## create vector with onset date and time
odt <- paste(paste(odat$onset.date,"/1940",sep=""), odat$onset.time)
## convert into standard date and time
onset.dt <- strptime(odt, "%m/%d/%Y %I:%M %p")

Step 2. Plot epidemic curve using 1-hour categories.


##using epicurve.hours function
col3seq.d <- rev(colorbrewer.palette(3,"seq","d"))

xv <- epicurve.hours(onset.dt, "1940-04-18 12:00:00", "1940-04-19 12:00:00",
                     axisnames = FALSE, axes = FALSE, ylim = c(0,11),
                     col = col3seq.d[2], segments =  TRUE,
                     strata = odat$sex)

hh <- xv$chour12==3 | xv$chour12== 6 | xv$chour12== 9
hh2 <- xv$chour12==12
hh3 <- xv$chour12==1
hlab <- paste(xv$chour12,xv$campm2,sep="")
hlab2 <- paste(xv$cmonth,xv$cmday)
axis(1, at = xv$xval[hh], labels = xv$chour12[hh], tick = FALSE, line = -.2)
axis(1, at = xv$xval[hh2], labels = hlab[hh2], tick = FALSE, line = -.2)
axis(1, at = xv$xval[hh3], labels = hlab2[hh3], tick = FALSE, line = 1.0)
axis(2, las = 1)
title(main = "Figure 1. Cases of Gastrointestinal Illness
by Symptom Onset Times (Hour Category)
Oswego County, New York, April 18-19, 2004",
      xlab = "Time of Onset",
      ylab = "Cases")

Step 2. Plot epidemic curve using 1/2-hour categories.


##half-hour
xv <- epicurve.hours(onset.dt, "1940-04-18 12:00:00", "1940-04-19 12:00:00",
                     axisnames = FALSE, axes = FALSE, ylim = c(0,11),
                     col = col3seq.d[2], segments =  TRUE,
                     half.hour = TRUE, strata = odat$sex)
hh <- xv$chour12==3 | xv$chour12== 6 | xv$chour12== 9
hh2 <- xv$chour12==12
hh3 <- xv$chour12==1
hlab <- paste(xv$chour12,xv$campm2,sep="")
hlab2 <- paste(xv$cmonth,xv$cmday)
axis(1, at = xv$xval[hh], labels = xv$chour12[hh], tick = FALSE, line = -.2)
axis(1, at = xv$xval[hh2], labels = hlab[hh2], tick = FALSE, line = -.2)
axis(1, at = xv$xval[hh3], labels = hlab2[hh3], tick = FALSE, line = 1.0)
axis(2, las = 1)
title(main = "Figure 2. Cases of Gastrointestinal Illness
by Symptom Onset Times (1/2 Hour Category)
Oswego County, New York, April 18-19, 2004",
      xlab = "Time of Onset",
      ylab = "Cases")
      

Problem: Selecting default colors for graphics

Selecting good colors for your graphics is crucial to optimizing the display of epidemiologic data. R has tremendous capabilities for selecting colors (try 'help.search("colors")'). However, most of us just need a broad selection from which to choose without having to specify RGB numbers. R has 657 named colors ready for action. The new colors.plot function plots the 657 colors and allows you to interactively point to the colors that interest you, and then it reports the corresponding color names.

To really learn how to appropriately select colors for any type of display visit Cindy Brewer's ColorBrewer site.

Solution: 'color.plot' function


library(epitools)
> ##plot R's 657 named colors
> colors.plot()

> ##activate 'locator' to identify color names
> colors.plot(locator = TRUE) ##select 5 colors from plot
   x  y     color.names
1  3  2          brown1
2 12 21    springgreen2
3 22 14 lightgoldenrod2
4 16  2      cadetblue4
5 30  1           blue4


Problem: Exact confidence intervals for age-adjusted rates

To make valid comparisons between rates from different groups (e.g., geographic area, ethnicity), epidemiologists often adjust for differences in age distribution to remove the confounding affect of age. When the number of events or rates are very small (as is often the case for local area studies), the normal approximation method of calculating confidence intervals may give a negative number for the lower confidence limit. To avoid this common pitfall, one can approximate exact confidence intervals. The ageadjust.direct and the ageadjust.indirect functions implement these methods (Anderson 1998).

Below is a comparison of United States white male cancer mortality data for the years 1940 and 1960. We replicate the analysis provided in Selvin (2004)

References

Solution: 'ageadjust.direct' and 'ageadjust.indirect'


library(epitools)
> ##From Selvin (2004)
> ##enter data

> dth60 <- scan()
1: 141 926 1253 1080 1869 4891 14956 30888 41725 26501 5928
12: 
Read 11 items
> pop60 <- scan()
1: 1784033 7065148 15658730 10482916 9939972 10563872 9114202
8: 6850263 4702482 1874619 330915
12: 
Read 11 items
> dth40 <- scan()
1: 45 201 320 670 1126 3160 9723 17935 22179 13461 2238
12: 
Read 11 items
> pop40 <- scan()
1: 906897 3794573 10003544 10629526 9465330 8249558 7294330
8: 5022499 2920220 1019504 142532
12: 
Read 11 items
> ##calculate age-specific rates
> rate60 <- dth60/pop60
> rate40 <- dth40/pop40
> 
> #create array for display
> tab <- array(c(dth60, pop60, round(rate60*100000,1), dth40, pop40,
+ round(rate40*100000,1)),c(11,3,2))
> agelabs <- c("<1", "1-4", "5-14", "15-24", "25-34", "35-44", "45-54",
+ "55-64", "65-74", "75-84", "85+")
> dimnames(tab) <- list(agelabs,c("Deaths", "Population", "Rate"),
+ c("1960", "1940"))
> tab
, , 1960

      Deaths Population   Rate

<1       141    1784033    7.9
1-4      926    7065148   13.1
5-14    1253   15658730    8.0
15-24   1080   10482916   10.3
25-34   1869    9939972   18.8
35-44   4891   10563872   46.3
45-54  14956    9114202  164.1
55-64  30888    6850263  450.9
65-74  41725    4702482  887.3
75-84  26501    1874619 1413.7
85+     5928     330915 1791.4

, , 1940

      Deaths Population   Rate
<1        45     906897    5.0
1-4      201    3794573    5.3
5-14     320   10003544    3.2
15-24    670   10629526    6.3
25-34   1126    9465330   11.9
35-44   3160    8249558   38.3
45-54   9723    7294330  133.3
55-64  17935    5022499  357.1
65-74  22179    2920220  759.5
75-84  13461    1019504 1320.3
85+     2238     142532 1570.2

> 
> ##implement direct age standardization using 'ageadjust.direct'
> dsr <- ageadjust.direct(count = dth40, pop = pop40, stdpop = pop60)
>
> round(100000*dsr, 2) ##rate per 100,000 per year
crude.rate   adj.rate        lci        uci 
    119.53     139.25     138.21     140.29 
> 
> ##implement indirect age standardization using 'ageadjust.indirect'
> isr <- ageadjust.indirect(count = dth40, pop = pop40,
+                           stdcount = dth60, stdpop = pop60)

>
> round(isr$sir, 2)         ##standarized incidence ratio
observed      exp      sir      lci      uci 
71058.00 85556.89     0.83     0.82     0.84 
>
> round(100000*isr$rate, 1) ##rate per 100,000 per year
crude.rate   adj.rate        lci        uci 
     166.1      137.9      136.9      139.0 
>