Stack Overflow Asked on January 19, 2021
I have a dataframe that looks like this:
dataframe:
Date Revenue
2009 15
dec 15
2010 450
jan 13
feb 14
mar 14
apr 10
may 10
jun 31
jul 99
aug 43
sep 87
oct 32
nov 54
dec 43
2011 67
It continues in the same pattern for several years, up to 2019. Each row that contains a year holds the aggregate revenue for that year. 2009 is the only year with a single data point (December).
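Because each year row is described as the total of the months beneath it, it can be worth confirming that before discarding those rows. This is a minimal base-R sketch. It assumes that a year row is any Date entry made of exactly four digits. The names `is_year`, `block` and `check` are illustrative and do not come from the original post.

```r
# Sample data from the question: Date mixes year rows and month rows
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15, 15, 450, 13, 14, 14, 10, 10, 31, 99, 43, 87, 32, 54, 43, 67),
  stringsAsFactors = FALSE
)

# Assumption: a year row is any Date made of exactly four digits
is_year <- grepl("^[0-9]{4}$", df$Date)

# Running count of year rows: each month row gets the block number of the year above it
block <- cumsum(is_year)

# For every year, count its month rows and add up their revenue
n_blocks <- sum(is_year)
n_months <- vapply(seq_len(n_blocks),
                   function(i) sum(!is_year & block == i), integer(1))
month_sum <- vapply(seq_len(n_blocks),
                    function(i) sum(df$Revenue[!is_year & block == i]), numeric(1))

check <- data.frame(year = df$Date[is_year], total = df$Revenue[is_year],
                    n_months = n_months, month_sum = month_sum,
                    stringsAsFactors = FALSE)
print(check)
```

With the sample rows, the month sums for 2009 and 2010 equal their totals (15 and 450). The 2011 row has no month rows yet, so it cannot be checked.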
The dataframe comes from a pivot table imported from Excel, in which the months were subgrouped under each year.
Each month is in the same column as the year, and months from different years are not distinguished. I need to plot a line graph of monthly revenue for each year (that is, one line per year showing the revenue month by month), but because I can't tell months from different years apart, I can't do that.
How can I group the months by year? Or, alternatively, how can I assign a new column with the year for fixed intervals (that is, every 12 rows) while excluding the year rows?
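A fixed 12-row interval would misalign with this data, because 2009 contributes only one month. One alternative is to carry each year down to the month rows beneath it by using a running count of year rows, and then to drop the year rows. This is a base-R sketch that uses the same assumption: year rows are the four-digit Date entries. It also assumes that the first row is a year row. The column name `Year` and the object `monthly` are illustrative.

```r
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15, 15, 450, 13, 14, 14, 10, 10, 31, 99, 43, 87, 32, 54, 43, 67),
  stringsAsFactors = FALSE
)

# Assumption: a year row is any Date made of exactly four digits
is_year <- grepl("^[0-9]{4}$", df$Date)
# The indexing below requires the data to start with a year row
stopifnot(is_year[1])

# cumsum(is_year) counts the year rows seen so far, which is the index
# of the current year in `years`, so every row is labelled with its year
years <- as.integer(df$Date[is_year])
df$Year <- years[cumsum(is_year)]

# Keep only the month rows and order the months chronologically
monthly <- df[!is_year, ]
monthly$Date <- factor(monthly$Date, levels = tolower(month.abb), ordered = TRUE)
rownames(monthly) <- NULL
print(monthly)
```

Unlike a row-count rule, this approach does not depend on each year having exactly 12 months.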
Thank you!
I would suggest the following approach: reshape your data and fill in the year for every month. Your data (I have defined the output you included as df) has a Date variable that mixes numeric and character values. The code creates a new variable, Var, that holds the year wherever Date is numeric. The missing values in Var are then filled downward, so that every month is tagged with its year group. Finally, the plot is drawn. You only have one value for 2009, so its line cannot be seen, and for 2011 there is only the total. With your full data you will get the complete picture for all years. Here is a tidyverse approach:
library(tidyverse)
#Data
df <- structure(list(Date = c("2009", "dec", "2010", "jan", "feb",
"mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov",
"dec", "2011"), Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L,
10L, 31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L)), class = "data.frame", row.names = c(NA,
-16L))
The code:
#Code
df %>%
  #Year rows parse as numbers, month rows become NA (coercion warnings suppressed)
  mutate(Var = suppressWarnings(as.numeric(Date))) %>%
  #Carry each year down onto the month rows below it
  fill(Var) %>%
  #Drop the year rows, which hold the annual totals
  filter(is.na(suppressWarnings(as.numeric(Date)))) %>%
  #Order the months chronologically (jan, feb, ..., dec)
  mutate(Date = factor(Date, levels = tolower(month.abb), ordered = TRUE)) %>%
  #Finally plot
  ggplot(aes(x = Date, y = Revenue, group = factor(Var), color = factor(Var))) +
  geom_line() +
  theme_bw()
Correct answer by Duck on January 19, 2021
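If you would rather not depend on ggplot2, here is a base-R variant. It also assumes that four-digit Date entries are the year rows and that the data starts with one. The code reshapes the monthly values into a month-by-year matrix with tapply(), so each column holds one year. Months that a year does not cover are left as NA. The name `wide` is illustrative.

```r
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15, 15, 450, 13, 14, 14, 10, 10, 31, 99, 43, 87, 32, 54, 43, 67),
  stringsAsFactors = FALSE
)

# Assumption: a year row is any Date made of exactly four digits
is_year <- grepl("^[0-9]{4}$", df$Date)
# Label every row with the year above it, then drop the year (total) rows
df$Year <- as.integer(df$Date[is_year])[cumsum(is_year)]
monthly <- df[!is_year, ]
monthly$Date <- factor(monthly$Date, levels = tolower(month.abb))

# Rows are months (jan..dec), columns are years; empty cells become NA
wide <- tapply(monthly$Revenue, list(monthly$Date, monthly$Year), sum)
print(wide)
```

To draw one line per column (that is, one per year), call `matplot(wide, type = "l", lty = 1, xaxt = "n")`, then label the x axis with `axis(1, at = 1:12, labels = rownames(wide))`. As with the ggplot2 version, a year with a single month (2009 here) shows no visible line.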