Average by group for 'n' number of rows per group - R

Question

I have a data frame with three columns: entity, date, and value (in the data below these are 'Wells', 'Date', and 'Index').
First, I need to sort the values by date in descending order within each entity.
Then I need two averages per entity, based on user-defined numbers of rows. For example, if the user enters 3 and 6, it means 'give me the average of the first 3 values, then the average of the next 6 values' for each entity.
For the dataset below, the result would be a data frame like this:
    Entity    Avg3 Avg6
        A     110   65 
        B     220  130

I can use the 'aggregate' function to get the mean by entity, but I cannot work out how to extract specific rows per entity.
Also, sorting the data frame by entity and then by date does not seem to work.
# order data by date, descending (I tried adding entity here, but it does not work)
df_new <- df[rev(order(as.Date(df$Date))), ]
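For the ordering problem on its own, a minimal base R sketch: order() accepts several sort keys, and with method = "radix" the 'decreasing' argument can be a vector with one entry per key, so entity can ascend while date descends. The data frame here is a plain data.frame rebuilt from the dput values below (same Wells, Date and Index values), so the snippet runs standalone.

```r
# Plain data.frame rebuilt from the dput values in the question
df <- data.frame(
  Wells = rep(c("A", "B"), each = 12),
  Date  = rep(as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:11), times = 2),
  Index = c(seq(10, 120, by = 10), seq(20, 240, by = 20))
)

# Sort by Wells ascending, then Date descending within each well.
# A per-key 'decreasing' vector is only allowed with method = "radix".
ord <- order(df$Wells, df$Date, decreasing = c(FALSE, TRUE), method = "radix")
df_new <- df[ord, ]
head(df_new, 4)
```

Negating the numeric date, as in df[order(df$Wells, -as.numeric(df$Date)), ], gives the same ordering without needing the radix method.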

Here is the dput output:
structure(list(Wells = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B"), Date = structure(c(1577836800, 1577923200, 1578009600, 
1578096000, 1578182400, 1578268800, 1578355200, 1578441600, 1578528000, 
1578614400, 1578700800, 1578787200, 1577836800, 1577923200, 1578009600, 
1578096000, 1578182400, 1578268800, 1578355200, 1578441600, 1578528000, 
1578614400, 1578700800, 1578787200), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Index = c(10, 20, 30, 40, 50, 60, 70, 80, 
90, 100, 110, 120, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 
220, 240)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", 
"data.frame"))
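As a quick check of the expected table, the averages can be worked out by hand. Sorted by descending Date, the Index values in the dput run from 120 down to 10 in steps of 10 for well A, and from 240 down to 20 in steps of 20 for well B; the first window is rows 1 to 3 and the second window is the next six rows, 4 to 9.

```r
# Index values in descending Date order, taken from the dput above
a_desc <- seq(120, 10, by = -10)  # well A: 120, 110, ..., 10
b_desc <- seq(240, 20, by = -20)  # well B: 240, 220, ..., 20

# First window = rows 1-3, second window = the next 6 rows (4-9)
c(Avg3 = mean(a_desc[1:3]), Avg6 = mean(a_desc[4:9]))  # 110 and 65
c(Avg3 = mean(b_desc[1:3]), Avg6 = mean(b_desc[4:9]))  # 220 and 130
```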

akrun · Accepted Answer

Here is an option with tidyverse. Assuming dynamic inputs ('n1', 'n2'), arrange the data by 'Wells' and in descending order of 'Date', group by 'Wells', use slice_head to get the first (n1 + n2) rows, then summarise to create the 'Avg' mean columns by taking the head and tail of 'Index' based on 'n1' and 'n2', respectively.

library(dplyr)
library(stringr)

n1 <- 3
n2 <- 6

df %>%
  arrange(Wells, desc(Date)) %>%
  group_by(Wells) %>%
  slice_head(n = n1 + n2) %>%
  summarise(!! str_c('Avg', n1) := mean(head(Index, n1)),
            !! str_c('Avg', n2) := mean(tail(Index, n2)),
            .groups = 'drop')

-output

# A tibble: 2 x 3
#  Wells  Avg3  Avg6
#  <chr> <dbl> <dbl>
#1 A       110    65
#2 B       220   130

Or using base R

df1 <- df[order(df$Wells, -as.numeric(df$Date)), ]
out <- do.call(data.frame,
  aggregate(Index ~ Wells,
            subset(df1, ave(seq_along(Wells), Wells, FUN = seq_along) <= (n1 + n2)),
            FUN = function(x) c(Avg3 = mean(head(x, n1)), Avg6 = mean(tail(x, n2)))))
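Both versions above hard-code two windows, and tail(Index, n2) overlaps the first window whenever a well has fewer than n1 + n2 rows. A hypothetical base R helper (window_means is an illustrative name, not from either answer) that accepts any vector of window sizes and returns NA for a window that does not fit might look like this. It is a sketch that assumes the column names Wells, Date and Index from the question and rebuilds the data as a plain data.frame.

```r
# Plain data.frame rebuilt from the dput values in the question
df <- data.frame(
  Wells = rep(c("A", "B"), each = 12),
  Date  = rep(as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:11), times = 2),
  Index = c(seq(10, 120, by = 10), seq(20, 240, by = 20))
)

# Hypothetical helper: mean of consecutive, non-overlapping windows of
# sizes `n` per group, counted from the most recent date. A window that
# runs past the end of a group yields NA instead of reusing rows.
window_means <- function(data, n, group = "Wells", date = "Date", value = "Index") {
  # group ascending, date descending
  data <- data[order(data[[group]], -as.numeric(data[[date]])), ]
  ends   <- cumsum(n)
  starts <- ends - n + 1
  # split() keeps the sorted row order inside each group
  per_group <- lapply(split(data[[value]], data[[group]]), function(x) {
    vapply(seq_along(n), function(i) {
      if (ends[i] > length(x)) NA_real_ else mean(x[starts[i]:ends[i]])
    }, numeric(1))
  })
  out <- data.frame(names(per_group), do.call(rbind, per_group),
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out) <- c(group, paste0("Avg", n))
  out
}

window_means(df, c(3, 6))
window_means(df, c(3, 6, 5))  # third window needs rows 10-14, so it is NA
```

Returning NA rather than averaging a shorter window makes a well with too few rows visible instead of silently mixing it with the earlier window.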

Ronak Shah · Answer

You can use cut/findInterval to divide the data into groups, take the mean of each group, and reshape to wide format using pivot_wider.

library(dplyr)

n <- c(3, 6)

df %>%
  arrange(Wells, desc(Date)) %>%
  group_by(Wells) %>%
  group_by(grp = findInterval(row_number(), cumsum(n), left.open = TRUE), .add = TRUE) %>%
  # For older dplyr versions, use add = TRUE instead:
  # group_by(grp = findInterval(row_number(), cumsum(n), left.open = TRUE), add = TRUE) %>%
  summarise(Index = mean(Index)) %>%
  slice(seq_along(n)) %>%
  mutate(grp = paste0('avg', n)) %>%
  tidyr::pivot_wider(names_from = grp, values_from = Index)

#  Wells  avg3  avg6
#  <chr> <dbl> <dbl>
#1 A       110    65
#2 B       220   130
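To see what the grouping step does, findInterval() with left.open = TRUE maps each within-well row position to a window id: positions 1 to 3 fall in window 0, positions 4 to 9 in window 1, and anything later in window 2, which slice(seq_along(n)) then drops. cut() with explicit breaks gives the same split, numbered from 1. The 12 positions below assume a well with 12 rows, as in the question's data.

```r
n <- c(3, 6)
pos <- 1:12  # within-group row positions for a well with 12 rows

# Window ids with findInterval(): breaks at cumsum(n) = c(3, 9),
# intervals closed on the right because of left.open = TRUE
grp_fi <- findInterval(pos, cumsum(n), left.open = TRUE)
grp_fi
# [1] 0 0 0 1 1 1 1 1 1 2 2 2

# Same split with cut(): intervals (0, 3], (3, 9], (9, Inf]
grp_cut <- cut(pos, breaks = c(0, cumsum(n), Inf), labels = FALSE)
grp_cut
# [1] 1 1 1 2 2 2 2 2 2 3 3 3
```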
