Functions for descriptive statistics

Keon-Woong Moon

2020-01-26

You can easily make tables summarizing descriptive statistics with the webr package.

Installation of packages

Install the latest versions of the “webr” and “moonBook” packages from GitHub.

if(!require(devtools)) install.packages("devtools")
devtools::install_github("cardiomoon/webr")
devtools::install_github("cardiomoon/moonBook")   # For examples
devtools::install_github("cardiomoon/rrtable")    # For reproducible research

Load packages

require(webr)
require(moonBook) # For data acs

Summarizing Frequencies

You can summarize frequencies with the freqSummary() function, which returns a character matrix. To get the same summary as a formatted table, use the freqTable() function.

freqSummary(acs$Dx)
                Count Percent Valid Percent Cum Percent
NSTEMI          "153" "17.9"  "17.9"        "17.9"     
STEMI           "304" "35.5"  "35.5"        "53.3"     
Unstable Angina "400" "46.7"  "46.7"        "100.0"    
Sum             "857" "100.0" "100.0"       ""         
freqTable(acs$Dx)

rowname          Count  Percent  Valid Percent  Cum Percent
NSTEMI             153     17.9           17.9         17.9
STEMI              304     35.5           35.5         53.3
Unstable Angina    400     46.7           46.7        100.0
Sum                857    100.0          100.0
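
To make the columns of this table concrete, here is a minimal base R sketch that rebuilds them from the three counts shown above. It does not use webr and is not how freqTable() is implemented; the vector dx is a stand-in reconstructed from the printed counts, so it contains no missing values and "Percent" and "Valid Percent" coincide.

```r
# Stand-in data rebuilt from the counts printed above (not the real acs$Dx vector)
dx <- rep(c("NSTEMI", "STEMI", "Unstable Angina"), times = c(153, 304, 400))

counts  <- table(dx)                  # frequency of each level
percent <- 100 * prop.table(counts)   # share of all observations, in percent

freq <- data.frame(
  Count      = as.vector(counts),
  Percent    = round(as.vector(percent), 1),
  CumPercent = round(cumsum(as.vector(percent)), 1),  # running total of the unrounded shares
  row.names  = names(counts)
)
print(freq)
```

The cumulative column is accumulated before rounding, which is why it ends at exactly 100.0.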

Ready for reproducible research

The freqTable() function returns an object of class “flextable”. With this object, you can easily export the table to HTML, PDF, docx, or pptx files. Calling class() on the result shows:

[1] "flextable"

Frequency table for a continuous variable

You can also make a frequency table for a continuous variable. Because every distinct value gets its own row, the resulting table can be long.

rowname  Count  Percent  Valid Percent  Cum Percent
10.4         2      6.2            6.2          6.2
13.3         1      3.1            3.1          9.4
14.3         1      3.1            3.1         12.5
14.7         1      3.1            3.1         15.6
15           1      3.1            3.1         18.8
15.2         2      6.2            6.2         25.0
15.5         1      3.1            3.1         28.1
15.8         1      3.1            3.1         31.2
16.4         1      3.1            3.1         34.4
17.3         1      3.1            3.1         37.5
17.8         1      3.1            3.1         40.6
18.1         1      3.1            3.1         43.8
18.7         1      3.1            3.1         46.9
19.2         2      6.2            6.2         53.1
19.7         1      3.1            3.1         56.2
21           2      6.2            6.2         62.5
21.4         2      6.2            6.2         68.8
21.5         1      3.1            3.1         71.9
22.8         2      6.2            6.2         78.1
24.4         1      3.1            3.1         81.2
26           1      3.1            3.1         84.4
27.3         1      3.1            3.1         87.5
30.4         2      6.2            6.2         93.8
32.4         1      3.1            3.1         96.9
33.9         1      3.1            3.1        100.0
Sum         32    100.0          100.0

Frequency table for two categorical variables

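You can cross-tabulate two categorical variables with the x2Table() function. The table shows counts with row-wise percentages, and the caption reports a chi-squared test of independence together with Cramer's V.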

x2Table(acs,Dx,sex)

rowname          Female        Male          Total
NSTEMI           50 (32.70%)   103 (67.30%)  153 (100 %)
STEMI            84 (27.60%)   220 (72.40%)  304 (100 %)
Unstable Angina  153 (38.20%)  247 (61.80%)  400 (100 %)
Total            287 (33.50%)  570 (66.50%)  857 (100 %)

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
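
The caption values can be cross-checked with base R alone. The sketch below is an illustration, not the code x2Table() runs internally: it re-enters the six observed counts from the table above, runs stats::chisq.test(), and derives Cramer's V from the usual formula sqrt(chi2 / (n * (min(rows, cols) - 1))).

```r
# Observed counts copied from the table above (rows: Dx, columns: sex)
counts <- matrix(
  c( 50, 103,
     84, 220,
    153, 247),
  nrow = 3, byrow = TRUE,
  dimnames = list(Dx  = c("NSTEMI", "STEMI", "Unstable Angina"),
                  sex = c("Female", "Male"))
)

res  <- chisq.test(counts)        # Pearson chi-squared test of independence
chi2 <- unname(res$statistic)
n    <- sum(counts)

# Cramer's V rescales chi-squared to the range 0..1
cramers_v <- sqrt(chi2 / (n * (min(dim(counts)) - 1)))

# Row-wise percentages: each diagnosis sums to 100 across sex
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(round(c(chi2 = chi2, df = unname(res$parameter),
              V = cramers_v, p = res$p.value), 4))
print(row_pct)
```

With only 857 observations spread over six cells, a V of about 0.1 indicates a weak association even though the p-value is below 0.05.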

You can make a table with column-wise percentages by setting margin=2.

x2Table(acs,Dx,sex,margin=2)

rowname          Female        Male          Total
NSTEMI           50 (17.40%)   103 (18.10%)  153 (17.90%)
STEMI            84 (29.30%)   220 (38.60%)  304 (35.50%)
Unstable Angina  153 (53.30%)  247 (43.30%)  400 (46.70%)
Total            287 (100 %)   570 (100 %)   857 (100 %)

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
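
The same column-wise view can be sketched in base R with prop.table(), where margin = 2 also means "within each column". This is an illustration on the counts printed above, not webr's own code.

```r
# Observed counts copied from the table above (rows: Dx, columns: sex)
counts <- matrix(
  c( 50, 103,
     84, 220,
    153, 247),
  nrow = 3, byrow = TRUE,
  dimnames = list(Dx  = c("NSTEMI", "STEMI", "Unstable Angina"),
                  sex = c("Female", "Male"))
)

# margin = 2 divides every cell by its column total,
# so each sex sums to 100 across the three diagnoses
col_share <- 100 * prop.table(counts, margin = 2)
col_pct   <- round(col_share, 1)
print(col_pct)
```

Changing the margin only changes the denominators; the counts and the chi-squared statistic stay the same, which is why both captions are identical.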

You can hide the percentages by setting show.percent=FALSE.

x2Table(acs,Dx,sex,show.percent=FALSE)

rowname          Female  Male  Total
NSTEMI               50   103    153
STEMI                84   220    304
Unstable Angina     153   247    400
Total               287   570    857

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123

Numerical summary

Numerical summary of a vector

You can make a numerical summary with the numSummary() function. Applied to a continuous vector, it returns the following one-row summary. Internally, this function uses psych::describe().

# A tibble: 1 x 12
      n  mean    sd median trimmed   mad   min   max range   skew kurtosis    se
  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>
1   857  63.3  11.7     64    63.6  13.3    28    91    63 -0.175   -0.566 0.400

     n   mean     sd  median  trimmed    mad    min    max  range   skew  kurtosis    se
857.00  63.31  11.70   64.00    63.56  13.34  28.00  91.00  63.00  -0.18     -0.57  0.40
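
To show what each column means, here is a minimal base R sketch that computes the same twelve statistics by hand. It is not the psych::describe() source. It assumes a 10% trimmed mean, the default scaling of stats::mad(), and moment-based skew and kurtosis divided by powers of the sample standard deviation; check the psych documentation if you need exact agreement. The built-in mtcars$mpg vector stands in for acs$age so the example runs without moonBook.

```r
x <- mtcars$mpg          # stand-in data; replace with your own numeric vector
x <- x[!is.na(x)]        # summaries are computed on non-missing values

n <- length(x)
m <- mean(x)
s <- sd(x)

stats <- data.frame(
  n        = n,
  mean     = m,
  sd       = s,
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),             # assumed trimming fraction
  mad      = mad(x),                          # median absolute deviation, scaled by 1.4826
  min      = min(x),
  max      = max(x),
  range    = max(x) - min(x),
  skew     = sum((x - m)^3) / (n * s^3),      # assumed skew definition
  kurtosis = sum((x - m)^4) / (n * s^4) - 3,  # assumed excess kurtosis definition
  se       = s / sqrt(n)                      # standard error of the mean
)
print(round(stats, 2))
```

Two of these definitions can be checked against the age row above without any assumptions: range is max minus min (91 - 28 = 63), and se is sd / sqrt(n) (11.70 / sqrt(857) is about 0.40).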

Numerical summary of a data.frame or a tibble

You can also make a numerical summary of a data.frame. The numSummary() function uses is.numeric() to select the numeric columns and summarizes each of them in its own row.

# A tibble: 9 x 13
  vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
  <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3   28    91    63   -0.175  -0.566 
2 EF      723  55.8  9.62   58.1    56.8  7.86  18    79    61   -0.978   1.11  
3 heig…   764 163.   9.08  165     164.   7.41 130   185    55   -0.440  -0.0145
4 weig…   766  64.8 11.4    65      64.5 10.4   30   112    82    0.336   0.444 
5 BMI     764  24.3  3.35   24.2    24.2  3.01  15.6  41.4  25.8  0.668   2.12  
6 TC      834 185.  47.8   183     184.  43.0   25   493   468    0.737   3.77  
7 LDLC    833 117.  41.1   114     115.  40.0   15   366   351    0.787   2.33  
8 HDLC    834  38.2 11.1    38      38.0 10.4    4    89    85    0.366   1.46  
9 TG      842 125.  90.9   106.    111.  60.0   11   877   866    3.02   14.9   
# … with 1 more variable: se <dbl>

rowname  vars         n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
1        age     857.00   63.31  11.70   64.00    63.56  13.34   28.00   91.00   63.00  -0.18     -0.57  0.40
2        EF      723.00   55.83   9.62   58.10    56.77   7.86   18.00   79.00   61.00  -0.98      1.11  0.36
3        height  764.00  163.18   9.08  165.00   163.52   7.41  130.00  185.00   55.00  -0.44     -0.01  0.33
4        weight  766.00   64.84  11.36   65.00    64.55  10.38   30.00  112.00   82.00   0.34      0.44  0.41
5        BMI     764.00   24.28   3.35   24.16    24.16   3.01   15.62   41.42   25.80   0.67      2.12  0.12
6        TC      834.00  185.20  47.77  183.00   183.76  43.00   25.00  493.00  468.00   0.74      3.77  1.65
7        LDLC    833.00  116.58  41.09  114.00   114.62  40.03   15.00  366.00  351.00   0.79      2.33  1.42
8        HDLC    834.00   38.24  11.09   38.00    37.95  10.38    4.00   89.00   85.00   0.37      1.46  0.38
9        TG      842.00  125.24  90.85  105.50   111.29  60.05   11.00  877.00  866.00   3.02     14.91  3.13
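
The column selection described above can be sketched in base R as follows. This is an illustration of the is.numeric() filtering idea, not webr's implementation. It uses the built-in iris data, whose Species factor plays the role of the non-numeric columns in acs, and computes only three statistics to keep the example short.

```r
# Flag the numeric columns, then keep only those
is_num       <- vapply(iris, is.numeric, logical(1))
numeric_only <- iris[, is_num, drop = FALSE]

# One row per numeric variable, counting only non-missing values in n
summary_tbl <- data.frame(
  vars = names(numeric_only),
  n    = vapply(numeric_only, function(x) sum(!is.na(x)), numeric(1)),
  mean = vapply(numeric_only, mean, numeric(1), na.rm = TRUE),
  sd   = vapply(numeric_only, sd,   numeric(1), na.rm = TRUE),
  row.names = NULL
)
print(summary_tbl)
```

Counting only non-missing values explains why n differs between rows in the acs summary (for example 857 for age but 723 for EF).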

Use of dplyr::group_by() and dplyr::select() functions to summarize

You can use the dplyr::select() function to choose the variables to summarize.

# A tibble: 2 x 13
  vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
  <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3     28    91    63 -0.175   -0.566
2 EF      723  55.8  9.62   58.1    56.8  7.86    18    79    61 -0.978    1.11 
# … with 1 more variable: se <dbl>

rowname  vars       n   mean     sd  median  trimmed    mad    min    max  range   skew  kurtosis    se
1        age   857.00  63.31  11.70   64.00    63.56  13.34  28.00  91.00  63.00  -0.18     -0.57  0.40
2        EF    723.00  55.83   9.62   58.10    56.77   7.86  18.00  79.00  61.00  -0.98      1.11  0.36

You can combine dplyr::group_by() and dplyr::select() to summarize the selected variables separately for each group.

# A tibble: 4 x 14
# Groups:   sex [2]
  sex   vars      n  mean    sd median trimmed   mad   min   max range    skew
  <chr> <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Male  age     570  60.6 11.2    61      60.6 11.9   28      91  63   -0.0148
2 Male  EF      483  55.6  9.40   57.3    56.4  8.01  18      79  61   -0.789 
3 Fema… age     287  68.7 10.7    70      69.4 10.4   39      90  51   -0.593 
4 Fema… EF      240  56.3 10.1    59.2    57.6  7.19  18.4    75  56.6 -1.30  
# … with 2 more variables: kurtosis <dbl>, se <dbl>

sex     vars       n   mean     sd  median  trimmed    mad    min    max  range   skew  kurtosis    se
Male    age   570.00  60.61  11.23   61.00    60.65  11.86  28.00  91.00  63.00  -0.01     -0.36  0.47
Male    EF    483.00  55.62   9.40   57.30    56.38   8.01  18.00  79.00  61.00  -0.79      0.76  0.43
Female  age   287.00  68.68  10.73   70.00    69.43  10.38  39.00  90.00  51.00  -0.59     -0.26  0.63
Female  EF    240.00  56.27  10.06   59.25    57.57   7.19  18.40  75.00  56.60  -1.30      1.70  0.65
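
The shape of this grouped result (one row per group and variable) can be reproduced with a split-apply-combine sketch in base R. It illustrates the idea only and is not how numSummary() handles grouped data. The built-in mtcars data stands in for acs, with am as the grouping variable and mpg and hp as the selected variables; summarise_one() is a hypothetical helper defined here.

```r
# Hypothetical helper: a few statistics for one numeric vector
summarise_one <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Split the selected columns by group (am is 0 or 1 in mtcars)
by_group <- split(mtcars[, c("mpg", "hp")], mtcars$am)

# Summarize every variable inside every group, then stack the pieces
rows <- lapply(names(by_group), function(g) {
  stats <- t(sapply(by_group[[g]], summarise_one))  # one row per variable
  data.frame(am = g, vars = rownames(stats), stats)
})
result <- do.call(rbind, rows)
rownames(result) <- NULL
print(result)
```

With two groups and two variables the result has four rows, matching the layout of the sex-by-variable table above.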

You can also summarize by multiple grouping variables at once, which gives one row per combination of group levels and variable.

# A tibble: 12 x 15
# Groups:   sex, Dx [6]
   sex   Dx    vars      n  mean    sd median trimmed   mad   min   max range
   <chr> <chr> <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Male  STEMI age     220  59.4 11.7    59.5    59.4 11.1   30    86    56  
 2 Male  STEMI EF      195  52.4  8.90   54      52.9  8.45  18    73.6  55.6
 3 Fema… STEMI age      84  69.1 10.4    70      70.0 10.4   42    89    47  
 4 Fema… STEMI EF       77  52.3 10.9    55.7    53.7  9.04  18.4  67.1  48.7
 5 Male  NSTE… age     103  61.1 11.6    59      61.3 13.3   28    85    57  
 6 Male  NSTE… EF       94  55.1  9.42   58      55.9  7.12  21.8  74    52.2
 7 Fema… Unst… age     153  67.7 10.7    70      68.3  8.90  39    90    51  
 8 Fema… Unst… EF      118  59.4  8.76   61.1    60.8  5.49  22    71.9  49.9
 9 Male  Unst… age     247  61.4 10.6    61      61.4 10.4   35    91    56  
10 Male  Unst… EF      194  59.1  8.67   60      60.2  5.93  24.7  79    54.3
11 Fema… NSTE… age      50  70.9 11.4    74.5    71.9  8.90  42    88    46  
12 Fema… NSTE… EF       45  54.8  9.10   57      55.3  9.79  36.8  75    38.2
# … with 3 more variables: skew <dbl>, kurtosis <dbl>, se <dbl>

sex     Dx               vars       n   mean     sd  median  trimmed    mad    min    max  range   skew  kurtosis    se
Male    STEMI            age   220.00  59.43  11.72   59.50    59.43  11.12  30.00  86.00  56.00   0.00     -0.55  0.79
Male    STEMI            EF    195.00  52.37   8.90   54.00    52.88   8.45  18.00  73.60  55.60  -0.62      0.53  0.64
Female  STEMI            age    84.00  69.11  10.36   70.00    70.04  10.38  42.00  89.00  47.00  -0.65     -0.09  1.13
Female  STEMI            EF     77.00  52.32  10.94   55.70    53.72   9.04  18.40  67.10  48.70  -1.17      1.01  1.25
Male    NSTEMI           age   103.00  61.15  11.57   59.00    61.28  13.34  28.00  85.00  57.00  -0.11     -0.53  1.14
Male    NSTEMI           EF     94.00  55.08   9.42   58.00    55.86   7.12  21.80  74.00  52.20  -0.83      0.57  0.97
Female  Unstable Angina  age   153.00  67.72  10.67   70.00    68.33   8.90  39.00  90.00  51.00  -0.54     -0.34  0.86
Female  Unstable Angina  EF    118.00  59.40   8.76   61.10    60.79   5.49  22.00  71.90  49.90  -1.86      4.06  0.81
Male    Unstable Angina  age   247.00  61.44  10.57   61.00    61.41  10.38  35.00  91.00  56.00   0.07     -0.15  0.67
Male    Unstable Angina  EF    194.00  59.14   8.67   60.00    60.15   5.93  24.70  79.00  54.30  -1.25      2.54  0.62
Female  NSTEMI           age    50.00  70.88  11.35   74.50    71.88   8.90  42.00  88.00  46.00  -0.72     -0.34  1.61
Female  NSTEMI           EF     45.00  54.85   9.10   57.00    55.26   9.79  36.80  75.00  38.20  -0.32     -0.83  1.36
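
Grouping by two variables means summarizing every observed combination of their levels. A minimal base R sketch of that idea, using stats::aggregate() on the built-in mtcars data (am and cyl stand in for sex and Dx, and only the count and the mean are computed), could look like this. It is not the dplyr pipeline that produced the table above.

```r
# Count and mean of mpg for every observed (am, cyl) combination
n_tbl    <- aggregate(mpg ~ am + cyl, data = mtcars, FUN = length)
mean_tbl <- aggregate(mpg ~ am + cyl, data = mtcars, FUN = mean)

# Join the two summaries on the grouping columns
result <- merge(n_tbl, mean_tbl, by = c("am", "cyl"),
                suffixes = c("_n", "_mean"))
print(result)
```

Two levels of am times three levels of cyl give six groups, just as two sexes times three diagnoses give the six groups reported in the tibble header above.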

For reproducible research

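You can use the rrtable package for reproducible research. Describe each report item in a data.frame with its type, title, and the R code that produces it, then pass the data.frame to data2pptx() or data2docx() to generate the report file.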

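require(rrtable)
require(dplyr)   # For %>%, group_by() and select() used in the code strings below
# One element per report item: output type, title, and the R code to evaluate
type <- c("table", "table")
title <- c("Frequency Table", "Numerical Summary")
code <- c("freqTable(acs$Dx)", "acs %>% group_by(sex) %>% select(EF,age) %>% numSummaryTable")
data <- data.frame(type, title, code, stringsAsFactors = FALSE)
data2pptx(data)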
[1] "/var/folders/ft/_w6lflrs4mz4f8n_r5w_h7vh0000gn/T/Rtmp5bd86I/Report.pptx"
data2docx(data)
[1] "/var/folders/ft/_w6lflrs4mz4f8n_r5w_h7vh0000gn/T/Rtmp5bd86I/Report.docx"