TransWikia.com

rows to columns in data.table R (or Python)

Data Science Asked by cpumar on February 6, 2021

This is something I can’t achieve with the reshape2 library for R. I have the following data:

 zone       code        literal
 1: A         14           bicl
 2: B         14           bicl
 3: B         24          calso
 4: A         51           mara
 5: B         51           mara
 6: A         125           gan
 7: A         143          carc
 8: B         143          carc

i.e.: each zone has 4 codes with its corresponding literal. I would like to transform it to a dataset with one column for each of the four codes and one column for each of the four literals:

 zone  code1 literal1   code2 literal2   code3 literal3   code4 literal4    
 1: A    14     bicl      51     mara     125      gan      143   carc
 2: B    14     bicl      24    calso      51     mara      143   carc 

Any easy way to achieve this in R? If not, I would also be comfortable with a solution in Python.

4 Answers

Using pandas in python you can transpose the rows and columns with .T

Answered by tormond on February 6, 2021

Here is a python solution, given a dataframe (df) containing the data you have above:

>>> from itertools import chain
>>> data = []
>>> for zone in df.zone.unique():
...    codetuples = [(row[2], row[3]) for row in df[df['zone']==zone].itertuples()]
...    data.append([zone] + list(chain.from_iterable(codetuples)))
...
>>> df = pandas.DataFrame(data, columns=['zone', 'code1', 'literal1', 'code2', 'literal2', 'code3', 'literal3', 'code4', 'literal4'])
>>> df
  zone   code1 literal1   code2 literal2 code3 literal3  code4 literal4
0    A      14     bicl      51     mara   125      gan    143     carc
1    B      14     bicl      24    calso    51     mara    143     carc

Explanation

df.itertuples() returns an iterator through the rows of a dataframe as tuples. The first entry (0 indexed in the tuple) will be the index, so the 2nd and 3rd columns of the df will be the two you are interested in.

There is no guarantee of order for code1 vs code2; I stored the data from the df in a variable codetuples so you can sort or something. There is also no guarantee that you will have exactly 4 pairs of code and literal, so you could put error checking in there, if you needed to.

Once you have an acceptable list of four tuples, from_iterable() flattens this list. Then append the zone number to the front and store it as another dataframe.

Answered by kingledion on February 6, 2021

Below is one way to achieve this in R using Tidyverse :

data %>% 
mutate(group = rep(1:4, each = 2)) %>% # in this example such is the rule # data-dependent step
gather("key", "value", c("code", "literal")) %>% 
mutate(key = paste0(key, group)) %>% 
dplyr::select(-group) %>% 
spread(key, value) %>% 
dplyr::select(zone, ends_with("1"), ends_with("2"), ends_with("3"), ends_with("4")) # to arrange the column names, because otherwise all the 'code' columns are together and 'literal' columns come after # data-dependent step

Answered by eg-r on February 6, 2021

Just two lines in plain R

X <- read.table(header = TRUE, text = "
zone       code        literal
 A         14           bicl
 B         14           bicl
 B         24          calso
 A         51           mara
 B         51           mara
 A         125           gan
 A         143          carc
 B         143          carc")

X$time <- ave(X$code, X$zone, FUN = seq_along)
reshape(X, direction = "wide", timevar = "time", idvar = "zone", sep = "")

# output
  zone code1 literal1 code2 literal2 code3 literal3 code4 literal4
1    A    14     bicl    51     mara   125      gan   143     carc
2    B    14     bicl    24    calso    51     mara   143     carc

Answered by Michael M on February 6, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP