On the surface, a solution to your question seems relatively straightforward. However, your data break the rules of tidy data. I encourage you to read the information in that link, particularly if you need to adapt this code in any way.
One way to produce your desired output is to first create a "tidy" version of your dataset. For the dataset in this example, that means converting (pivoting) your data into 'long' format, then back to 'wide' format. You have two groups of 100 (n=200) and three numeric variables worth of 'long' data. In other words, 100 rows becomes 600. Your desired output is further complicated by the fact that you want three different variables in a single column. I have added "age" columns to your sample data for completeness.
There will be more elegant ways to achieve this, and using a for()
loop to populate an empty df would
definitely require less code. But I was curious about a tidyverse
solution. Based on the data you have
provided, it works. It's inelegant because it would be quite clunky (not impossible) to edit if you want to include
another variable for instance. But there is a lot of information on SO to help you achieve this if necessary. One
further consideration, this has only been tested on study groups of equal length. Some modifications to the code
that creates "df1" may be be required to accommodate groups of varying length. Again, SO is your friend.
Credit to @Mark for assistance with the imap()
solution
library(dplyr) library(tidyr) library(purrr) library(stringr) # Generate sample data set.seed(1) df <- data.frame(weight_study1 = rnorm(100, mean = 60, sd = 5), weight_study2 = rnorm(100, mean = 62, sd = 6), gender_study1 = sample(c("Male", "Female"), 100, replace = TRUE), gender_study2 = sample(c("Male", "Female"), 100, replace = TRUE), age1 = sample(18:65, 100, replace = TRUE), # Added column age2 = sample(18:65, 100, replace = TRUE)) # Added column # Step 1: get gender counts by study; get all numeric variables into a single column; # create "study" column and define study groups. df1 <- df %>% group_by(gender_study1) %>% mutate(count1 = n()) %>% group_by(gender_study2) %>% mutate(count2 = n()) %>% pivot_longer(cols = starts_with(c("weight", "age", "count"))) %>% mutate(study = paste0("Pilot_study", str_sub(name, start = -1))) %>% ungroup() # Step 2: use imap() to create a single (tidy) "gender" column; # select only necessary columns; # group by study and derive "age" and "count" strings for final output; # group by study and gender and derive "weight" strings for final output; # ungroup, select only necessary columns, and 'tidy' the "Category" names; # get distinct/unique rows, and pivot data to wide format. df2 <- df1 %>% # If number at end of the gender column's name matches the end number of values # in "study" column, get the corresponding gender value mutate(gender = unlist(imap(study, ~df1[.y, str_replace(.x, "Pilot_study", "gender_study")] %>% ifelse(is.null(.), NA, .)))) %>% select(-starts_with("gender_")) %> # Not "tidy" and no longer needed group_by(study) %>% mutate(temp = ifelse(str_detect(name, "age"), paste0(round(mean(value), 1), "(", round(sd(value), 1), ")"), NA), # Workaround needed to get count % per study because study group # count gets multiplied by the number of variables. n_distinct() # returns count of unqiue values temp = ifelse(str_detect(name, "count"), paste0(as.integer(value), "(", round(100 / (n() / n_distinct(name)) * value , 1), ")"), temp)) %>% group_by(study, gender) %>% mutate(temp = ifelse(str_detect(name, "weight"), paste0(round(mean(value), 1), "(", round(sd(value), 1), ")"), temp)) %>% ungroup() %>% select(-value) %>% mutate(name = str_sub(name, end = -2), name = ifelse(str_detect(name, "weight"), "weight mean(sd)", name), name = ifelse(str_detect(name, "count"), "count(%)", name), gender = ifelse(name == "age", "", gender), name = ifelse(name == "age", "age mean(sd)", name)) %>% distinct() %>% pivot_wider(names_from = study, values_from = temp) %>% arrange(name) %>% rename(Category = "name") df2 # A tibble: 5 × 4 Category gender Pilot_study1 Pilot_study2 <chr> <chr> <chr> <chr> 1 age mean(sd) "" 51.1(12.2) 50.9(12.1) 2 count(%) Male 55(55) 52(52) 3 count(%) Female 45(45) 48(48) 4 weight mean(sd) Male 52.3(11.9) 51.2(12.2) 5 weight mean(sd) Female 49.6(12.4) 50.5(12.1)