Data Science Asked by cpumar on February 6, 2021
This is something I can’t achieve with the reshape2 library for R. I have the following data:
zone code literal
1: A 14 bicl
2: B 14 bicl
3: B 24 calso
4: A 51 mara
5: B 51 mara
6: A 125 gan
7: A 143 carc
8: B 143 carc
i.e.: each zone has 4 codes with its corresponding literal. I would like to transform it to a dataset with one column for each of the four codes and one column for each of the four literals:
zone code1 literal1 code2 literal2 code3 literal3 code4 literal4
1: A 14 bicl 51 mara 125 gan 143 carc
2: B 14 bicl 24 calso 51 mara 143 carc
Any easy way to achieve this in R? If not, I would also be comfortable with a solution in Python.
Using pandas in python you can transpose the rows and columns with .T
Answered by tormond on February 6, 2021
Here is a python solution, given a dataframe (df
) containing the data you have above:
>>> from itertools import chain
>>> data = []
>>> for zone in df.zone.unique():
... codetuples = [(row[2], row[3]) for row in df[df['zone']==zone].itertuples()]
... data.append([zone] + list(chain.from_iterable(codetuples)))
...
>>> df = pandas.DataFrame(data, columns=['zone', 'code1', 'literal1', 'code2', 'literal2', 'code3', 'literal3', 'code4', 'literal4'])
>>> df
zone code1 literal1 code2 literal2 code3 literal3 code4 literal4
0 A 14 bicl 51 mara 125 gan 143 carc
1 B 14 bicl 24 calso 51 mara 143 carc
df.itertuples()
returns an iterator through the rows of a dataframe as tuples. The first entry (0 indexed in the tuple) will be the index, so the 2nd and 3rd columns of the df will be the two you are interested in.
There is no guarantee of order for code1 vs code2; I stored the data from the df in a variable codetuples
so you can sort or something. There is also no guarantee that you will have exactly 4 pairs of code and literal, so you could put error checking in there, if you needed to.
Once you have an acceptable list of four tuples, from_iterable()
flattens this list. Then append the zone number to the front and store it as another dataframe.
Answered by kingledion on February 6, 2021
Below is one way to achieve this in R using Tidyverse :
data %>%
mutate(group = rep(1:4, each = 2)) %>% # in this example such is the rule # data-dependent step
gather("key", "value", c("code", "literal")) %>%
mutate(key = paste0(key, group)) %>%
dplyr::select(-group) %>%
spread(key, value) %>%
dplyr::select(zone, ends_with("1"), ends_with("2"), ends_with("3"), ends_with("4")) # to arrange the column names, because otherwise all the 'code' columns are together and 'literal' columns come after # data-dependent step
Answered by eg-r on February 6, 2021
Just two lines in plain R
X <- read.table(header = TRUE, text = "
zone code literal
A 14 bicl
B 14 bicl
B 24 calso
A 51 mara
B 51 mara
A 125 gan
A 143 carc
B 143 carc")
X$time <- ave(X$code, X$zone, FUN = seq_along)
reshape(X, direction = "wide", timevar = "time", idvar = "zone", sep = "")
# output
zone code1 literal1 code2 literal2 code3 literal3 code4 literal4
1 A 14 bicl 51 mara 125 gan 143 carc
2 B 14 bicl 24 calso 51 mara 143 carc
Answered by Michael M on February 6, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP