Notification texts go here Contact Us Buy Now!

Creating side by side table for longitudinal analysis in R

On the surface, a solution to your question seems relatively straightforward. However, your data break the rules of tidy data. I encourage you to read the information in that link, particularly if you need to adapt this code in any way.

One way to produce your desired output is to first create a "tidy" version of your dataset. For the dataset in this example, that means converting (pivoting) your data into 'long' format, then back to 'wide' format. You have two groups of 100 (n=200) and three numeric variables worth of 'long' data. In other words, 100 rows becomes 600. Your desired output is further complicated by the fact that you want three different variables in a single column. I have added "age" columns to your sample data for completeness.

There will be more elegant ways to achieve this, and using a for() loop to populate an empty df would definitely require less code. But I was curious about a tidyverse solution. Based on the data you have provided, it works. It's inelegant because it would be quite clunky (not impossible) to edit if you want to include another variable for instance. But there is a lot of information on SO to help you achieve this if necessary. One further consideration, this has only been tested on study groups of equal length. Some modifications to the code that creates "df1" may be be required to accommodate groups of varying length. Again, SO is your friend.

Credit to @Mark for assistance with the imap() solution

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(stringr)

    # Generate sample data
    set.seed(1)
    df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5),
                     weight_study2 = rnorm(100, mean = 62, sd = 6),
                     gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE),
                     gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE),
                     age1 = sample(18:65, 100, replace = TRUE), # Added column
                     age2 = sample(18:65, 100, replace = TRUE)) # Added column

    # Step 1: get gender counts by study; get all numeric variables into a single column;
    # create "study" column and define study groups.
    df1 <- df %>%
      group_by(gender_study1) %>%
      mutate(count1 = n()) %>%
      group_by(gender_study2) %>%
      mutate(count2 = n()) %>%
      pivot_longer(cols = starts_with(c("weight", "age", "count"))) %>%
      mutate(study = paste0("Pilot_study", str_sub(name, start = -1))) %>%
      ungroup()

    # Step 2: use imap() to create a single (tidy) "gender" column;
    # select only necessary columns;
    # group by study and derive "age" and "count" strings for final output;
    # group by study and gender and derive "weight" strings for final output;
    # ungroup, select only necessary columns, and 'tidy' the "Category" names;
    # get distinct/unique rows, and pivot data to wide format.
    df2 <- df1 %>%
      # If number at end of the gender column's name matches the end number of values
      # in "study" column, get the corresponding gender value
      mutate(gender = unlist(imap(study,
                                  ~df1[.y,
                                    str_replace(.x, "Pilot_study", "gender_study")] %>%
                                  ifelse(is.null(.), NA, .)))) %>%
      select(-starts_with("gender_")) %> # Not "tidy" and no longer needed
      group_by(study) %>%
      mutate(temp = ifelse(str_detect(name, "age"),
                           paste0(round(mean(value), 1),
                                  "(",
                                  round(sd(value), 1),
                                  ")"), NA),
             # Workaround needed to get count % per study because study group
             # count gets multiplied by the number of variables. n_distinct()
             # returns count of unqiue values
             temp = ifelse(str_detect(name, "count"),
                           paste0(as.integer(value),
                                  "(",
                                  round(100 / (n() / n_distinct(name)) * value , 1),
                                  ")"), temp))  %>%
      group_by(study, gender) %>%
      mutate(temp = ifelse(str_detect(name, "weight"),
                           paste0(round(mean(value), 1),
                                  "(",
                                  round(sd(value), 1),
                                  ")"), temp)) %>%
      ungroup() %>%
      select(-value) %>%
      mutate(name = str_sub(name, end = -2),
             name = ifelse(str_detect(name, "weight"), "weight mean(sd)", name),
             name = ifelse(str_detect(name, "count"), "count(%)", name),
             gender = ifelse(name == "age", "", gender),
             name = ifelse(name == "age", "age mean(sd)", name)) %>%
      distinct() %>%
      pivot_wider(names_from = study,
                  values_from = temp) %>%
      arrange(name) %>%
      rename(Category = "name")

    df2
    # A tibble: 5 × 4
    Category gender Pilot_study1 Pilot_study2
    <chr> <chr> <chr> <chr>
    1 age mean(sd) "" 51.1(12.2) 50.9(12.1)
    2 count(%) Male 55(55) 52(52)
    3 count(%) Female 45(45) 48(48)
    4 weight mean(sd) Male 52.3(11.9) 51.2(12.2)
    5 weight mean(sd) Female 49.6(12.4) 50.5(12.1)
  

Post a Comment

Cookie Consent
We serve cookies on this site to analyze traffic, remember your preferences, and optimize your experience.
Oops!
It seems there is something wrong with your internet connection. Please connect to the internet and start browsing again.
AdBlock Detected!
We have detected that you are using adblocking plugin in your browser.
The revenue we earn by the advertisements is used to manage this website, we request you to whitelist our website in your adblocking plugin.
Site is Blocked
Sorry! This site is not available in your country.