Stack Overflow Asked by Triss on February 11, 2021
I need a simple dplyr solution to filter a data.frame. For example, I have:
set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))
z = as.data.frame(cbind(x,y))
z
x y
1 1 6
2 2 7
3 2 7
4 2 8
5 2 8
6 3 8
7 3 9
8 4 9
9 4 9
10 5 9
First, I need an output with the duplicated rows removed, like this:
one = rbind(c(1,6),c(2,7), c(2,8), c(3,8), c(3,9), c(4,9), c(5,9))
one
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 2 8
[4,] 3 8
[5,] 3 9
[6,] 4 9
[7,] 5 9
Then I want unique values of x, keeping for example the first entry for each, like this:
two = rbind(c(1,6),c(2,7),c(3,8),c(4,9),c(5,9))
two
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 9
As a last step, I need the number of different values of y for each x:
three = rbind(c(1,1),c(2,2),c(3,2),c(4,2),c(5,1))
three
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 2
[4,] 4 2
[5,] 5 1
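Hi, you can achieve this with the following code:
library(dplyr)
# 1. Remove duplicated rows
z %>% distinct(.keep_all=TRUE)
# 2. Keep the first row for each x
z %>%
  group_by(x) %>%
  mutate(row=row_number()) %>%
  filter(row==1) %>%
  select(-row)
# 3. Count the different values of y for each x
z %>%
  group_by(x) %>%
  summarise(values=n_distinct(y,na.rm=TRUE))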
Answered by Gerardo Flores on February 11, 2021
Both of these can be achieved using distinct and group_by. For the first, group by both x and y and run distinct to get all unique combinations of x and y.
For the second, we group only by x and run distinct again. We have to include .keep_all to make sure y stays in the resulting dataframe because it isn't contained in the group_by. This works because distinct keeps the first occurring record for x. To clarify, check ?distinct.
library(dplyr)
set.seed(100)
x = sort(sample(1:5,10,1))
y = sort(sample(6:10,10,1))
z = as.data.frame(cbind(x,y))
# First scenario
z1 <- z %>%
  group_by(x, y) %>%
  distinct(x)
z1 output:
# A tibble: 7 x 2
# Groups: x, y [7]
x y
<int> <int>
1 1 6
2 2 7
3 2 8
4 3 8
5 3 9
6 4 9
7 5 9
# Second scenario
z2 <- z %>%
  group_by(x) %>%
  distinct(x, .keep_all = TRUE)
z2 output:
# A tibble: 5 x 2
# Groups: x [5]
x y
<int> <int>
1 1 6
2 2 7
3 3 8
4 4 9
5 5 9
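The point that distinct keeps the first occurring record can be checked by changing the row order before calling it. The sketch below is only an illustration: the data frame is typed in from the output printed in the question instead of being regenerated with sample(), and the names first_y and largest_y are made up for this example.

```r
library(dplyr)

# Data typed in from the printed output in the question (an assumption:
# sample() may draw different values depending on the R version)
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

# Original row order: the first y encountered for each x is kept
first_y <- z %>%
  distinct(x, .keep_all = TRUE)

# Sorted by descending y first: the largest y for each x is kept instead
largest_y <- z %>%
  arrange(desc(y)) %>%
  distinct(x, .keep_all = TRUE) %>%
  arrange(x)

first_y$y    # 6 7 8 9 9
largest_y$y  # 6 8 9 9 9
```

So which y survives for each x depends entirely on how the rows are ordered when distinct runs; sorting first is a simple way to control that.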
Answered by TTS on February 11, 2021
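# 1. Unique (x, y) combinations
distinct(z, x, y)
# 2. First row for each x
group_by(z, x) %>% slice(1) %>% ungroup()
# 3. Number of different y values for each x
group_by(z, x) %>% summarize(count = n_distinct(y))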
Answered by bcarlsen on February 11, 2021
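For the last step it is worth keeping in mind that n() and n_distinct() answer different questions: n() counts the rows in each group, while n_distinct(y) counts the different values of y in each group. A small sketch of the difference, again with the data typed in from the printed output in the question (an assumption, since sample() can give other values) and with column names n_rows and n_values chosen only for illustration:

```r
library(dplyr)

# Data typed in from the printed output in the question
z <- data.frame(
  x = c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
  y = c(6, 7, 7, 8, 8, 8, 9, 9, 9, 9)
)

counts <- z %>%
  group_by(x) %>%
  summarise(
    n_rows   = n(),            # number of rows for each x
    n_values = n_distinct(y)   # number of different y values for each x
  )

counts$n_rows    # 1 4 2 2 1
counts$n_values  # 1 2 2 1 1
```

For x = 2 there are four rows but only two different y values (7 and 8), and for x = 4 there are two rows that both have y = 9, so the two counts disagree in both groups.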