Data Science Asked by DataGuy23 on July 4, 2021
I’m trying to come up with a function in R that gives the mode value of a column along with the number of times (or frequency) that the value occurs. I want it to exclude missing (or blank) values, and treat ties by showing both values. When there are no repeating values I want it to return the first-appearing value that is found along with its frequency 1.
Name Color
Drew Blue
Drew Green
Drew Red
Bob Green
Bob Green
Bob Green
Bob Blue
Jim Red
Jim Red
Jim Blue
Jim Blue
mode of Drew = Blue, 1
mode of Bob = Green, 3
mode of Jim = Red, Blue, 2
Here’s the function code I have so far. It excludes NAs, but it does not show both values when there is a tie and does not show the frequency. Any help appreciated!
mode <- function(x) {
  if ( anyNA(x) ) x = x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
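For context, the tie is lost in the function above because which.max() returns only the position of the first maximum. A minimal sketch of the difference, using a made-up vector with two tied values (the vector is an illustration, not data from the question):

```r
# Two values that each occur twice, in order of first appearance
x  <- c("Red", "Red", "Blue", "Blue")
ux <- unique(x)
counts <- tabulate(match(x, ux))   # one count per unique value

first_only <- which.max(counts)              # position of the first maximum only
all_ties   <- which(counts == max(counts))   # every position tied for the maximum

ux[first_only]   # a single value, even though there is a tie
ux[all_ties]     # all tied values
```

Comparing the counts against max(counts) instead of calling which.max() is one way to keep every tied value.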
You do not need a custom function to do this. Let dplyr handle it. Assuming your data is in a data frame named df, here is what it might look like:
library(dplyr)

df %>%                                        # Set up the pipe
  subset(complete.cases(df)) %>%              # Removes rows with NA values
  group_by(Name) %>%                          # Groups by the Name column
  count(Color) %>%                            # Counts each Color by Name, creates a new column n
  mutate(max = max(n)) %>%                    # Creates a new column for the max(n) by Name
  subset(n == max) %>%                        # Keeps only those rows where n equals the group maximum
  mutate(Keep = case_when(                    # Creates a dummy logical column named 'Keep'
    n > 1 ~ TRUE,                             # That is TRUE for n > 1 to keep ties
    n == 1 & Color == head(Color, 1) ~ TRUE,  # That is TRUE for the first row of n = 1
    TRUE ~ FALSE)) %>%                        # That is FALSE for all other cases
  subset(Keep) %>%                            # Keeps only those rows where Keep is TRUE
  select(Name, Mode = Color, Count = n)       # Keeps only the Name, Color, and n columns and
                                              # renames Color as Mode and n as Count
Here is the output:
# A tibble: 4 x 3
# Groups:   Name [3]
  Name  Mode  Count
  <fct> <fct> <int>
1 Bob   Green     3
2 Drew  Blue      1
3 Jim   Blue      2
4 Jim   Red       2
If you want a function, then wrap this up in a function definition:
my_mode_func <- function(df){
  df %>%
    subset(complete.cases(df)) %>%
    group_by(Name) %>%
    count(Color) %>%
    mutate(max = max(n)) %>%
    subset(n == max) %>%
    mutate(Keep = case_when(
      n > 1 ~ TRUE,
      n == 1 & Color == head(Color,1) ~ TRUE,
      TRUE ~ FALSE)) %>%
    subset(Keep) %>%
    select(Name, Mode = Color, Count = n)
}
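If a version without dplyr is preferred, the vector-based approach from the question could be extended along these lines. This is only a sketch under a few assumptions: mode_with_freq is a hypothetical name, blank values are taken to mean empty strings, and the sample data is re-typed here with Jim's color written as Blue so that it matches the expected result in the question.

```r
# Hypothetical helper (not part of the answer above): returns every tied
# mode of a vector together with the number of times it occurs.
mode_with_freq <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x) & x != ""]        # drop missing and blank values
  if (length(x) == 0) {
    return(list(mode = character(0), freq = 0L))
  }
  ux     <- unique(x)                # unique values in order of first appearance
  counts <- tabulate(match(x, ux))   # frequency of each unique value
  top    <- max(counts)
  # With no repeats, keep only the first-appearing value; otherwise keep all ties
  modes  <- if (top == 1L) ux[1] else ux[counts == top]
  list(mode = modes, freq = top)
}

# Sample data from the question (assumed capitalization: "Blue" for Jim)
df <- data.frame(
  Name  = c(rep("Drew", 3), rep("Bob", 4), rep("Jim", 4)),
  Color = c("Blue", "Green", "Red",
            "Green", "Green", "Green", "Blue",
            "Red", "Red", "Blue", "Blue")
)

# Apply the helper to the colors of each Name
result <- lapply(split(df$Color, df$Name), mode_with_freq)
result
```

Unlike the count()-based pipeline, which orders colors within each group before picking the first one, this sketch keeps values in their order of first appearance, so the no-repeats case returns the value that actually comes first in the data.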
Answered by Ben Norris on July 4, 2021