Stack Overflow Asked on November 26, 2020
I have a data frame in R with expenditure for many groups over the years. It basically looks like this (the grey columns):
I want to add the mean spending for each year, as shown in the yellow column, based on the spending in the previous and following years.
I have tried using this code:
expenditures %>%
  group_by(id) %>%
  mutate(
    avg_exp = ifelse(year != 2011 && year != 2008,
                     mean(c(
                       Spending[Year %in% (Year-1)],
                       Spending[Year %in% (Year+1)])),
                     NA)) %>%
  View()
However, I keep getting all sorts of weird numbers. First of all, the ifelse only applies the else condition, even though the Year column is set as integer. Second, even if I compute the average in the else branch as well, all rows (in each group) are filled with the same number, and I don't know where it comes from (it is close to the overall average of the group, but not the same).
Is there any simple way to do this?
Thanks
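A likely explanation for both symptoms, sketched in base R and assuming each group looks like ID 1 in the sample data used in the answers: `%in%` compares the whole Year vector against the whole shifted vector, so each subset returns several values rather than a single neighbour, and `mean()` then collapses them into one number that is recycled to every row. The variable names `prev_vals`, `next_vals`, `single_value` and `group_mean` are illustrative only.

```r
# Group 1 from the sample data (assumed sorted by Year)
Year     <- c(2008L, 2009L, 2010L, 2011L)
Spending <- c(55, 57, 65, 70)

# Each subset picks up three values, not one neighbour per row
prev_vals <- Spending[Year %in% (Year - 1)]   # 55 57 65
next_vals <- Spending[Year %in% (Year + 1)]   # 57 65 70

# One number for the whole group, close to but not equal to the group mean
single_value <- mean(c(prev_vals, next_vals))  # 61.5
group_mean   <- mean(Spending)                 # 61.75
```

The condition has a similar problem: `&&` is not vectorised. In older R versions it only looked at the first element (2008 in a sorted group, so the else branch was always taken), and from R 4.3.0 on it raises an error for inputs longer than one. The element-wise operator is `&`.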
We could use the + of lag and lead and divide by 2 after grouping by 'ID'. The default option in both lead and lag is NA, so the first and last 'Year' of each group will be NA in the 'Mean' column
library(dplyr)
expenditures %>%
  group_by(ID) %>%
  mutate(Mean = (lead(Spending) + lag(Spending))/2)
-output
# A tibble: 12 x 4
# Groups:   ID [3]
#       ID  Year Spending  Mean
#    <int> <int>    <dbl> <dbl>
#  1     1  2008       55  NA
#  2     1  2009       57  60
#  3     1  2010       65  63.5
#  4     1  2011       70  NA
#  5     2  2008       80  NA
#  6     2  2009       87  85
#  7     2  2010       90  91
#  8     2  2011       95  NA
#  9     3  2008      120  NA
# 10     3  2009      123 125
# 11     3  2010      130 129
# 12     3  2011      135  NA
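To see what lag and lead contribute, here is a small sketch on the Spending values of ID 1 alone. It also shows the `order_by` argument of `dplyr::lag` and `dplyr::lead`, which may help if the rows are not already sorted by Year (the approach above assumes they are, and that the years are consecutive). The vectors `x`, `yr` and `sp` are made up for illustration.

```r
library(dplyr)

x <- c(55, 57, 65, 70)            # Spending for ID 1, sorted by Year

prev_x <- dplyr::lag(x)           # NA 55 57 65  (previous value)
next_x <- dplyr::lead(x)          # 57 65 70 NA  (next value)
neighbour_mean <- (next_x + prev_x) / 2   # NA 60.0 63.5 NA

# Same values in shuffled row order: order_by makes lag/lead follow Year
yr <- c(2010L, 2008L, 2011L, 2009L)
sp <- c(65, 55, 70, 57)
unsorted_mean <- (dplyr::lead(sp, order_by = yr) +
                  dplyr::lag(sp, order_by = yr)) / 2   # 63.5 NA NA 60.0
```

An alternative to `order_by` would be to call `arrange(ID, Year)` before `group_by`.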
Or another option is to cbind the lead/lag output and then use rowMeans
expenditures %>%
  group_by(ID) %>%
  mutate(Mean = rowMeans(cbind(lead(Spending), lag(Spending))))
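One practical difference of the rowMeans form is that it accepts `na.rm`. The sketch below, again on the ID 1 values only, contrasts the default with `na.rm = TRUE`, which would fill the first and last year with the single available neighbour instead of NA. Whether that is desirable depends on the analysis; the question as asked wants NA there. The names `strict` and `lenient` are illustrative.

```r
library(dplyr)

x <- c(55, 57, 65, 70)   # Spending for ID 1

# Two columns: next value and previous value for each row
neighbours <- cbind(dplyr::lead(x), dplyr::lag(x))

strict  <- rowMeans(neighbours)                 # NA 60.0 63.5 NA
lenient <- rowMeans(neighbours, na.rm = TRUE)   # 57.0 60.0 63.5 65.0
```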
expenditures <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L,
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65,
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame",
row.names = c(NA,
-12L))
Correct answer by akrun on November 26, 2020
For completeness, here is a data.table answer with shift:
library(data.table)
setDT(expenditures)
expenditures[, Mean := (shift(Spending) + shift(Spending, type = 'lead'))/2, ID]
expenditures
#     ID Year Spending  Mean
#  1:  1 2008       55    NA
#  2:  1 2009       57  60.0
#  3:  1 2010       65  63.5
#  4:  1 2011       70    NA
#  5:  2 2008       80    NA
#  6:  2 2009       87  85.0
#  7:  2 2010       90  91.0
#  8:  2 2011       95    NA
#  9:  3 2008      120    NA
# 10:  3 2009      123 125.0
# 11:  3 2010      130 129.0
# 12:  3 2011      135    NA
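For readers less familiar with shift, a minimal sketch on a plain vector (the ID 1 values) shows that its default is a lag and that `type = 'lead'` gives the next value, so the expression above is the data.table counterpart of lag plus lead.

```r
library(data.table)

x <- c(55, 57, 65, 70)   # Spending for ID 1

prev_x <- shift(x)                  # NA 55 57 65  (default type is "lag")
next_x <- shift(x, type = "lead")   # 57 65 70 NA
neighbour_mean <- (prev_x + next_x) / 2   # NA 60.0 63.5 NA
```

Note that `setDT` and `:=` modify `expenditures` by reference, so no reassignment is needed in the answer's code.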
Answered by Ronak Shah on November 26, 2020
Here is a base R option using embed within ave
transform(
  expenditures,
  Mean = ave(Spending, ID, FUN = function(x) c(NA, rowMeans(embed(x, 3)[, -2]), NA))
)
which gives
   ID Year Spending  Mean
1   1 2008       55    NA
2   1 2009       57  60.0
3   1 2010       65  63.5
4   1 2011       70    NA
5   2 2008       80    NA
6   2 2009       87  85.0
7   2 2010       90  91.0
8   2 2011       95    NA
9   3 2008      120    NA
10  3 2009      123 125.0
11  3 2010      130 129.0
12  3 2011      135    NA
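As a rough illustration of what embed does here, using the ID 1 values only: `embed(x, 3)` builds one row per window of three consecutive values, with the most recent value first, so dropping the middle column leaves the next and previous values for each inner row. The names `windows`, `inner` and `one_window` are illustrative.

```r
x <- c(55, 57, 65, 70)   # Spending for ID 1

windows <- embed(x, 3)
#      [,1] [,2] [,3]
# [1,]   65   57   55
# [2,]   70   65   57

# Drop the middle column (the current year) and average the two neighbours
inner <- rowMeans(windows[, -2])   # 60.0 63.5

# With exactly three values there is a single window; drop = FALSE keeps
# the result a matrix so that rowMeans still works
one_window <- rowMeans(embed(c(1, 2, 3), 3)[, -2, drop = FALSE])   # 2
```

Two caveats that follow from this: a group with exactly three rows needs `drop = FALSE` as shown, and `embed(x, 3)` stops with an error for groups with fewer than three rows.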
Data
> dput(expenditures)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L,
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65,
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame", row.names = c(NA,
-12L))
Answered by ThomasIsCoding on November 26, 2020