Stack Overflow Asked by youtube on January 27, 2021
I have a long data table that provides cumulative values only. What would be the best way to add another column that holds the individual (non-cumulative) value for each row? Here is a short data table you can use as an example:
   ContractID       Date Cum_Sum_1M
1:          1 2018-02-01         10
2:          1 2018-02-20         30
3:          1 2018-03-12         50
4:          2 2018-02-01         10
5:          2 2018-02-12         30
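Try this solution with diff() and a vector to recover the values before the cumulative sum. Note that this version ignores ContractID, so the first row of contract 2 is differenced against the last row of contract 1. Here is the code: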
#Code
df$Var <- c(df$Cum_Sum_1M[1],diff(df$Cum_Sum_1M))
df$CumVar2 <- cumsum(df$Var)
Output:
   ContractID       Date Cum_Sum_1M Var CumVar2
1:          1 2018-02-01         10  10      10
2:          1 2018-02-20         30  20      30
3:          1 2018-03-12         50  20      50
4:          2 2018-02-01         10 -40      10
5:          2 2018-02-12         30  20      30
Some data used:
#Data
df <- structure(list(ContractID = c(1L, 1L, 1L, 2L, 2L), Date = c("2018-02-01",
"2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Cum_Sum_1M = c(10L,
30L, 50L, 10L, 30L)), row.names = c("1:", "2:", "3:", "4:", "5:"
), class = "data.frame")
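The -40 in row 4 of the output above comes from diff() running straight across the boundary between contracts 1 and 2. A minimal sketch on bare vectors, assuming the same five cumulative values (the names cum, grp, per_period and boundary are illustrative and not from the original post), shows both the round trip and the boundary effect:

```r
# Cumulative values for two contracts, stacked as in the example data
cum <- c(10L, 30L, 50L, 10L, 30L)
grp <- c(1L, 1L, 1L, 2L, 2L)

# Ungrouped: keep the first value, then take successive differences
per_period <- c(cum[1], diff(cum))
per_period
# 10 20 20 -40 20

# cumsum() inverts the operation, so the round trip reproduces the input
identical(cumsum(per_period), cum)
# TRUE

# The negative value sits exactly where the contract changes
boundary <- which(diff(grp) != 0) + 1
per_period[boundary]
# -40
```

So the ungrouped version is only safe when the whole column is a single running total.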
If the differences should restart within each contract, a grouped operation is required. We could use dplyr for that:
library(dplyr)
#Code
df %>% group_by(ContractID) %>%
mutate(NewVar=c(Cum_Sum_1M[1],diff(Cum_Sum_1M)))
Output:
# A tibble: 5 x 4
# Groups:   ContractID [2]
  ContractID Date       Cum_Sum_1M NewVar
       <int> <chr>           <int>  <int>
1          1 2018-02-01         10     10
2          1 2018-02-20         30     20
3          1 2018-03-12         50     20
4          2 2018-02-01         10     10
5          2 2018-02-12         30     20
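For readers who would rather not load a package, the same grouped computation can be sketched in base R with ave(). This is an alternative to the dplyr code, not part of the original answer; it assumes the rows are already sorted by date within each ContractID, and it rebuilds the example data so the snippet runs on its own:

```r
# Example data, rebuilt here so the snippet is self-contained
df <- data.frame(
  ContractID = c(1L, 1L, 1L, 2L, 2L),
  Date = c("2018-02-01", "2018-02-20", "2018-03-12",
           "2018-02-01", "2018-02-12"),
  Cum_Sum_1M = c(10L, 30L, 50L, 10L, 30L)
)

# ave() applies FUN to the values of each ContractID separately and
# returns the results in the original row order
df$NewVar <- ave(df$Cum_Sum_1M, df$ContractID,
                 FUN = function(x) c(x[1], diff(x)))
df$NewVar
# 10 20 20 10 20
```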
Correct answer by Duck on January 27, 2021
As it is a data.table, the best option would be data.table methods. We group by 'ContractID' and take the difference between the current and lagged values of the 'Cum_Sum_1M' column, keeping the first value of each group as it is:
library(data.table)
dt[, Var := c(first(Cum_Sum_1M), (Cum_Sum_1M - shift(Cum_Sum_1M))[-1]), by = ContractID]
dt
#   ContractID       Date Cum_Sum_1M Var
#1:          1 2018-02-01         10  10
#2:          1 2018-02-20         30  20
#3:          1 2018-03-12         50  20
#4:          2 2018-02-01         10  10
#5:          2 2018-02-12         30  20
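Data (run this before the code above):
dt <- structure(list(ContractID = c(1L, 1L, 1L, 2L, 2L), Date = c("2018-02-01",
"2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Cum_Sum_1M = c(10L,
30L, 50L, 10L, 30L)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
# A data.table built by hand with structure() lacks its internal self-reference;
# setDT() restores it so that := can add the column by reference without a warning
setDT(dt)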
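A slightly shorter variant of the same idea, offered as a sketch rather than as the answerer's code: because a cumulative sum implicitly starts from zero, the lag can be filled with 0 instead of treating the first row of each group separately. The example assumes the data.table package is installed and that sorting by ContractID and Date gives the intended order:

```r
library(data.table)

# Example data, rebuilt here so the snippet is self-contained
dt <- data.table(
  ContractID = c(1L, 1L, 1L, 2L, 2L),
  Date = c("2018-02-01", "2018-02-20", "2018-03-12",
           "2018-02-01", "2018-02-12"),
  Cum_Sum_1M = c(10L, 30L, 50L, 10L, 30L)
)

# Make sure rows are in chronological order within each contract
# (ISO-formatted date strings sort correctly as text)
setorder(dt, ContractID, Date)

# shift() lags the column by one row; fill = 0L supplies the value
# "before" the first row of each group
dt[, Var := Cum_Sum_1M - shift(Cum_Sum_1M, fill = 0L), by = ContractID]
dt$Var
# 10 20 20 10 20
```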
Answered by akrun on January 27, 2021