6 Working with Tables in R | Data Analysis and Processing with R based on IBIS data (2024)

6.1 Intro

Tables are often essential for organizing and summarizing your data, especially with categorical variables. When you create a table in R, it is stored as a specific type of object (called “table”): an array of counts that can be indexed in much the same way as a data frame. Though this may seem strange since datasets are stored as data frames, this means working with tables will be very easy since we have covered data frames in detail over the previous tutorials. In this chapter, we will discuss how to create various types of tables, and how to use various statistical methods to analyze tabular data. Throughout the chapter, the AOSI dataset will be used.

6.2 Creating Basic Tables: table() and xtabs()

A contingency table is a tabulation of counts and/or percentages for one or more variables. In R, these tables can be created using table() along with some of its variations. To use table(), simply add in the variables you want to tabulate, separated by commas. Note that table() does not have a data= argument like many other functions do (e.g., ggplot2 functions), so you must reference each variable using dataset$variable. By default, missing values are excluded from the counts; if you want a count for these missing values you must specify the argument useNA="ifany" or useNA="always". The examples below show how to use this function.

aosi_data <- read.csv("Data/cross-sec_aosi.csv", stringsAsFactors=FALSE, na.strings = ".")

# Table for gender
table(aosi_data$Gender)
## 
## Female   Male 
##    235    352

# Table for study site
table(aosi_data$Study_Site)
## 
## PHI SEA STL UNC 
## 149 152 145 141

# Two-way table for gender and study site
table(aosi_data$Gender, aosi_data$Study_Site)
##         
##          PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

# Notice order matters: 1st variable is row variable, 2nd variable is column variable
# Let's try adding in the useNA argument
table(aosi_data$Gender, aosi_data$Study_Site, useNA = "ifany")
##         
##          PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

table(aosi_data$Gender, aosi_data$Study_Site, useNA = "always")
##         
##          PHI SEA STL UNC <NA>
##   Female  55  67  60  53    0
##   Male    94  85  85  88    0
##   <NA>     0   0   0   0    0

# Let's save one of these tables to use for later examples
table_ex <- table(aosi_data$Gender, aosi_data$Study_Site)
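Because the AOSI variables above contain no missing values, the effect of useNA is hard to see in that output. As a minimal sketch, the following uses a small made-up vector (the name site and its values are hypothetical, not taken from the AOSI data) to show how the three settings differ when a missing value is actually present.

```r
# Hypothetical toy vector with one missing value (not from the AOSI data)
site <- c("PHI", "SEA", NA, "PHI", "UNC")

default_counts <- table(site)                    # the NA is silently dropped
ifany_counts   <- table(site, useNA = "ifany")   # <NA> category appears because an NA exists
always_counts  <- table(site, useNA = "always")  # <NA> category is always shown

default_counts
ifany_counts
always_counts
```

With "ifany", the extra category only shows up when at least one value is missing, which is why the AOSI output above looked unchanged.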

Now let’s add row and column labels to the gender by study site table. For a table object, these labels are referred to as “dimnames” (i.e., dimension names), which can be accessed using the dimnames() function. Note that this is similar to the names() function with lists, except that now our table has multiple dimensions, each of which can have its own set of names. For a table, dimnames are stored as a list, with each list entry holding the group labels for the variable corresponding to that dimension. The name of each list entry specifies the actual label to be used in the table. By default, these names are blank, which is why the default table has no row and column labels. We can change this by specifying these names, using names() with dimnames().

dimnames(table_ex)
## [[1]]
## [1] "Female" "Male"  
## 
## [[2]]
## [1] "PHI" "SEA" "STL" "UNC"

# We see the group labels. Note that each set of group labels is unnamed (blanks next to [[1]] and [[2]]).
# This is more clearly seen by accessing these names explicitly using names()
names(dimnames(table_ex))
## [1] "" ""

# Now, let's change these names and see how the table changes
names(dimnames(table_ex)) <- c("Gender", "Site")
names(dimnames(table_ex))
## [1] "Gender" "Site"

table_ex
##         Site
## Gender   PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88

# Now the row and column labels appear, making the table easier to understand
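The labels can also be set at the moment the table is created, rather than afterwards. The sketch below shows two ways that base R's table() supports: naming the arguments, or passing the dnn argument. The vectors gender and site are hypothetical toy data, not the AOSI variables.

```r
# Hypothetical toy vectors (not from the AOSI data)
gender <- c("Female", "Male", "Male", "Female", "Male")
site   <- c("PHI", "PHI", "SEA", "SEA", "SEA")

# Option 1: name the arguments; the names become the dimension labels
tab_named <- table(Gender = gender, Site = site)

# Option 2: supply the labels through the dnn argument
tab_dnn <- table(gender, site, dnn = c("Gender", "Site"))

names(dimnames(tab_named))
tab_named
```

Either option avoids the separate names(dimnames()) assignment, which can be convenient when many tables are built in a row.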

It is also common to view these tabulations as proportions or percentages. This can be done by using prop.table(), which, unlike table(), takes in a table object as its argument and not the actual variables of interest. Note that any changes to dimnames that are made to the table object are kept when applying prop.table(). The output from prop.table() is also stored as an object of type table.

# 2 Way Proportion Table
prop_table_ex <- prop.table(table_ex)
prop_table_ex
##         Site
## Gender          PHI        SEA        STL        UNC
##   Female 0.09369676 0.11413969 0.10221465 0.09028961
##   Male   0.16013629 0.14480409 0.14480409 0.14991482
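The proportions above are relative to the whole sample. prop.table() also has a margin argument for proportions within rows (margin = 1) or within columns (margin = 2). The sketch below rebuilds the gender by site counts shown earlier as a standalone table so that it runs without the data file; the object names counts, row_props and col_props are illustrative.

```r
# Counts copied from the gender-by-site table shown earlier
counts <- as.table(matrix(c(55, 67, 60, 53,
                            94, 85, 85, 88),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(Gender = c("Female", "Male"),
                                          Site = c("PHI", "SEA", "STL", "UNC"))))

row_props <- prop.table(counts, margin = 1)  # each row sums to 1
col_props <- prop.table(counts, margin = 2)  # each column sums to 1

# Percentage of each gender within each site
round(100 * col_props, 1)
```

Column proportions are the natural scale for asking whether the gender split looks the same at every site.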

A second way of creating contingency tables is using the xtabs() function from the stats package. This package is included with R and loaded by default, so calling library(stats) is optional. The function xtabs() creates an object of type xtabs, and you will notice that the output of xtabs() and table() is nearly identical. xtabs() has the following advantages: 1) row and column labels are included automatically, set to the variable names, and 2) there is a data= argument, which means you just have to reference the variable names. With xtabs(), you do not list out the variables of interest separated by commas. Instead you use formula notation, which is ~variable1+variable2+… where variable1 and variable2 are the names of the variables of interest. You can add more than two variables (hence the …). See below for the two-way gender and site example.

library(stats)
table_ex_xtabs <- xtabs(~Gender+Study_Site, data=aosi_data)
table_ex_xtabs
##         Study_Site
## Gender   PHI SEA STL UNC
##   Female  55  67  60  53
##   Male    94  85  85  88
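To illustrate adding a third variable to the formula, here is a small sketch on a made-up data frame. The data frame toy and its Group column are hypothetical and not part of the AOSI data. ftable() from the stats package is used only to print the three-way table in a flat layout.

```r
# Hypothetical toy data frame (not the AOSI data)
toy <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female"),
  Site   = c("PHI", "PHI", "SEA", "SEA", "SEA", "PHI"),
  Group  = c("A", "B", "A", "A", "B", "B")
)

# Three-way table: one Gender x Site slice per level of Group
tab3 <- xtabs(~ Gender + Site + Group, data = toy)

dim(tab3)     # one entry per variable in the formula
ftable(tab3)  # flattened view of the three-way table
```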

To create a table of proportions using xtabs(), you first create the table of counts using xtabs(), and then use the prop.table() function on this table object. This is exactly what was done when using table().

One useful function when creating tables of proportions is round(). As seen with the previous table of proportions, R will not round decimals by default. The round() function can be used with any numeric R object, including tables. The first argument is the object of values you want to round and the second argument is the number of decimal places to round to.

prop_table_ex_xtabs <- prop.table(table_ex_xtabs)
prop_table_ex_xtabs
##         Study_Site
## Gender          PHI        SEA        STL        UNC
##   Female 0.09369676 0.11413969 0.10221465 0.09028961
##   Male   0.16013629 0.14480409 0.14480409 0.14991482

prop_table_ex_xtabs <- round(prop_table_ex_xtabs, 2)
prop_table_ex_xtabs
##         Study_Site
## Gender    PHI  SEA  STL  UNC
##   Female 0.09 0.11 0.10 0.09
##   Male   0.16 0.14 0.14 0.15

prop_table_ex <- round(prop_table_ex, 2)
prop_table_ex
##         Site
## Gender    PHI  SEA  STL  UNC
##   Female 0.09 0.11 0.10 0.09
##   Male   0.16 0.14 0.14 0.15

Lastly, we discuss how to add margin totals to your table. Whether using table() or xtabs(), a simple way to add all margin totals to your table is with the function addmargins() from the stats package. Simply pass your table or xtabs object as the first argument to addmargins(), and a new table will be returned which includes these margin totals. This also works with tables of proportions. Note that if the proportions were rounded first, as they were here, the margin totals are sums of rounded values and may not equal exactly 1.

table_ex <- addmargins(table_ex)
table_ex_xtabs <- addmargins(table_ex_xtabs)
prop_table_ex <- addmargins(prop_table_ex)
prop_table_ex_xtabs <- addmargins(prop_table_ex_xtabs)

table_ex
##         Site
## Gender   PHI SEA STL UNC Sum
##   Female  55  67  60  53 235
##   Male    94  85  85  88 352
##   Sum    149 152 145 141 587

table_ex_xtabs
##         Study_Site
## Gender   PHI SEA STL UNC Sum
##   Female  55  67  60  53 235
##   Male    94  85  85  88 352
##   Sum    149 152 145 141 587

prop_table_ex
##         Site
## Gender    PHI  SEA  STL  UNC  Sum
##   Female 0.09 0.11 0.10 0.09 0.39
##   Male   0.16 0.14 0.14 0.15 0.59
##   Sum    0.25 0.25 0.24 0.24 0.98

prop_table_ex_xtabs
##         Study_Site
## Gender    PHI  SEA  STL  UNC  Sum
##   Female 0.09 0.11 0.10 0.09 0.39
##   Male   0.16 0.14 0.14 0.15 0.59
##   Sum    0.25 0.25 0.24 0.24 0.98
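The grand total of 0.98 in the proportion tables comes from the order of operations: the proportions were rounded before the margins were added. One possible way around this, sketched below with the same counts rebuilt as a standalone table, is to add the margins first and round last. The object names are illustrative.

```r
# Counts copied from the gender-by-site table shown earlier
counts <- as.table(matrix(c(55, 67, 60, 53,
                            94, 85, 85, 88),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(Gender = c("Female", "Male"),
                                          Site = c("PHI", "SEA", "STL", "UNC"))))
props <- prop.table(counts)

# Order used in the text: round first, then sum the rounded values
round_then_sum <- addmargins(round(props, 2))

# Alternative: sum the unrounded values, then round everything once
sum_then_round <- round(addmargins(props), 2)

round_then_sum["Sum", "Sum"]
sum_then_round["Sum", "Sum"]
```

The second ordering keeps the displayed totals consistent with the underlying data, at the cost that the displayed cells no longer add up exactly to the displayed margins.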

There are many packages which you can install with more advanced tools for creating and customizing contingency tables. We will cover some in Chapter 9, though table() and xtabs() should suffice for exploratory analyses.

6.3 Tabular Data Analysis

In this section, we detail some common statistical methods used to analyze contingency table data and show how to implement them in R. Each method is first defined and the statistics behind it explained; its implementation in R is then shown through examples.

6.3.1 Tests for Independence

6.3.2 Defining Independence

Suppose we have two categorical variables, denoted \(X\) and \(Y\). Denote the joint distribution of \(X\) and \(Y\) by \(f_{x,y}\), the distribution of \(X\) by \(f_x\) and the distribution of \(Y\) by \(f_y\). Denote the distribution of \(X\) conditional on \(Y\) by \(f_{x|y}\) and the distribution of \(Y\) conditional on \(X\) by \(f_{y|x}\).

In statistics, \(X\) and \(Y\) are independent if \(f_{x,y}=f_{x} \cdot f_{y}\) (i.e., if the distribution of \(X\) and \(Y\) as a pair is equal to the distribution of \(X\) times the distribution of \(Y\)). This criterion is equivalent to \(f_{x|y}=f_{x}\) and \(f_{y|x}=f_{y}\) (i.e., the distribution of \(X\) in the whole population is the same as the distribution of \(X\) in the sub-population defined by any specific value of \(Y\), and vice versa). As an example, suppose we were interested in seeing if a person voting in an election (\(X\)) is independent of their sex at birth (\(Y\)). If these variables were independent, we would expect the percentage of women in the total population to be similar to the percentage of women among the people who vote in the election. This matches our definition of independence in statistics.

6.3.3 Chi-Square Test for Independence

To motivate the concept of testing for independence, let’s consider the AOSI dataset. Let’s see if study site and gender are independent. Recall the contingency table for these variables in the data was the following.

table_ex_xtabs
##         Study_Site
## Gender   PHI SEA STL UNC Sum
##   Female  55  67  60  53 235
##   Male    94  85  85  88 352
##   Sum    149 152 145 141 587

prop_table_ex_xtabs
##         Study_Site
## Gender    PHI  SEA  STL  UNC  Sum
##   Female 0.09 0.11 0.10 0.09 0.39
##   Male   0.16 0.14 0.14 0.15 0.59
##   Sum    0.25 0.25 0.24 0.24 0.98

From our definition of independence, gender and site appear to be independent: the split between genders within each site is close to the split in the sample as a whole. Let’s conduct a formal test to see if there is evidence against independence. First, we cover the Chi-Square test. For all of the tests covered here, the null hypothesis is that the variables are independent. Under this null hypothesis, we would expect the contingency table computed below.

expected_table <- table_ex_xtabs
sex_sums <- expected_table[,5]
site_sums <- expected_table[3,]

# Expected count = column (site) total * (row (gender) total / sample size)
expected_table[1,1] <- 149*(235/587)
expected_table[2,1] <- 149*(352/587)
expected_table[1,2] <- 152*(235/587)
expected_table[2,2] <- 152*(352/587)
expected_table[1,3] <- 145*(235/587)
expected_table[2,3] <- 145*(352/587)
expected_table[1,4] <- 141*(235/587)
expected_table[2,4] <- 141*(352/587)
expected_table <- round(expected_table,2)

Where did these values come from? Take the Philadelphia study site column as an example (labeled PHI). As explained before, under independence, we would expect the percentage of female participants in Philadelphia to be the same as the percentage in the total sample. There are 149 participants from Philadelphia, 235 females, and 587 total subjects in the sample. The total sample is about 40% female (235/587), so we would expect there to be approximately 0.40*149 or 59.6 females from the Philadelphia site and thus approximately 89.4 males. That is, the expected count is equal to (row total * column total)/sample size. All entries are calculated using this equation. Let’s look at the differences between the expected counts and the counts from the AOSI data.

expected_table[-3,-5]-table_ex_xtabs[-3,-5]
##         Study_Site
## Gender     PHI   SEA   STL   UNC
##   Female  4.65 -6.15 -1.95  3.45
##   Male   -4.65  6.15  1.95 -3.45
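The cell-by-cell assignments above can be written more compactly. As a sketch, the expected counts under independence are the outer product of the row and column totals divided by the sample size. The block below rebuilds the observed counts as a plain matrix so that it runs on its own; the object names observed and expected are illustrative.

```r
# Observed counts copied from the gender-by-site table
observed <- matrix(c(55, 67, 60, 53,
                     94, 85, 85, 88),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Study_Site = c("PHI", "SEA", "STL", "UNC")))

row_totals <- rowSums(observed)  # 235 females, 352 males
col_totals <- colSums(observed)  # site totals
n <- sum(observed)               # 587 participants

# Expected count in each cell = (row total * column total) / sample size
expected <- outer(row_totals, col_totals) / n

round(expected - observed, 2)
```

Because the expected table is built from the margins, its row and column totals match those of the observed table, so the differences in each row and each column sum to zero.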

We can see that the differences are small relative to the study site totals, so there is no evidence to suggest dependence. However, let’s do this more rigorously using a formal hypothesis test. For any hypothesis test, we create a test statistic and then calculate a p-value from this test statistic. Informally, the p-value measures the probability that you would observe a test statistic value as extreme or more extreme than the value observed in the dataset if the null hypothesis is true. For the Chi-Square test, the test statistic is the sum, over all cells, of the squared difference between the observed and expected counts divided by the expected count: \(\chi^2=\sum_{i,j}(O_{ij}-E_{ij})^2/E_{ij}\), where \(O_{ij}\) and \(E_{ij}\) are the observed and expected counts in row \(i\) and column \(j\). The distribution of this test statistic is approximately Chi-Square with \((r-1)(c-1)\) degrees of freedom, where \(r\) is the number of row categories and \(c\) is the number of column categories. The approximation becomes more accurate as the sample size grows larger and larger (to infinity). Thus, if the sample size is “large enough”, we can accurately approximate the test statistic’s distribution with this Chi-Square distribution. This is what is referred to by “large sample” or “asymptotic” statistics.

Let’s conduct the Chi-Square test on the AOSI dataset. This is done by using summary() with the contingency table object (created by table() or xtabs()). Note that you cannot have the row and column margins in the table object when conducting this Chi-Square test, as R will treat these margins as row and column categories.

table_ex_xtabs <- xtabs(~Gender+Study_Site, data=aosi_data)
summary(table_ex_xtabs)
## Call: xtabs(formula = ~Gender + Study_Site, data = aosi_data)
## Number of cases in table: 587 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 2.1011, df = 3, p-value = 0.5517
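To connect this output to the definition of the test statistic, the sketch below computes the statistic, degrees of freedom and p-value by hand from the observed counts, and compares them with chisq.test() from the stats package, which is a common alternative to summary() for this test. The counts are copied from the table above so that the block runs without the data file; the object names are illustrative.

```r
# Observed counts copied from the gender-by-site table (no margins)
observed <- matrix(c(55, 67, 60, 53,
                     94, 85, 85, 88),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Study_Site = c("PHI", "SEA", "STL", "UNC")))

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the Chi-Square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Same test through chisq.test()
test <- chisq.test(observed)

c(statistic = chi_sq, df = df, p.value = p_value)
test
```

The hand calculation and chisq.test() should agree with the values reported by summary() above. Note that chisq.test() applies a continuity correction by default only for 2 by 2 tables, so it does not affect this 2 by 4 example.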

We see that the p-value is 0.55, which is far above the usual significance threshold of 0.05. Thus, we do not have evidence to reject the null hypothesis that gender and study site are independent.

6.3.4 Fisher’s Exact Test

An alternative to the Chi-Square test is Fisher’s Exact Test. This hypothesis test has the same null and alternative hypotheses as the Chi-Square test. However, its test statistic has a known distribution for any finite sample size. Thus, unlike the Chi-Square test, no distributional approximation is required, and it produces exact p-values for any sample size. To conduct Fisher’s Exact Test, use the function fisher.test() from the stats package with the table or xtabs object. The drawback to Fisher’s Exact Test is that it has a high computation time if the data has a large sample size; in that case, the Chi-Square approximation is likely accurate and the Chi-Square test should be used instead. Let’s run Fisher’s Exact Test on the gender by site contingency table.

fisher.test(table_ex_xtabs)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  table_ex_xtabs
## p-value = 0.553
## alternative hypothesis: two.sided
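The two tests can disagree noticeably when counts are small. As a rough sketch of that situation, the block below uses a made-up 2 by 2 table with only eight observations (the table small and its labels are hypothetical, not AOSI data) and runs both tests on it.

```r
# Hypothetical small-sample 2 x 2 table (not from the AOSI data)
small <- matrix(c(3, 1,
                  1, 3),
                nrow = 2, byrow = TRUE,
                dimnames = list(Group = c("A", "B"),
                                Outcome = c("Yes", "No")))

# Exact test: valid for any sample size
exact <- fisher.test(small)

# Chi-Square approximation without continuity correction;
# R warns that the approximation may be incorrect with such small expected counts
approx <- suppressWarnings(chisq.test(small, correct = FALSE))

exact$p.value
approx$p.value
```

With every expected count equal to 2, the Chi-Square approximation is not reliable here, which is the situation where the exact test is preferable.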

Due to the large sample size, we see that the p-value is very close to the p-value from the Chi-Square test, as expected.
