TransWikia.com

pivot_wider for Datasets

Mathematica Asked by Stephan on December 1, 2020

I’d like to transform this dataset:

Dataset[{
  <|"Name" -> "Steven","Born" -> 1980, "Year" -> 2017, "Score" -> 115|>,
  <|"Name" -> "Steven", "Born" -> 1980, "Year" -> 2018, "Score" -> 230|>,
  <|"Name" -> "Joe", "Born" -> 1981, "Year" -> 2017, "Score" -> 70|>,
  <|"Name" -> "Joe", "Born" -> 1981, "Year" -> 2018, "Score" -> 300|>
}]

into this dataset:

Dataset[{
  <|"Name" -> "Steven", "Born" -> 1980, "Score_2017" -> 115, "Score_2018" -> 230|>,
  <|"Name" -> "Joe", "Born" -> 1981, "Score_2017" -> 70, "Score_2018" -> 300|>
}]

In R’s tidyr this operation is called pivot_wider.

Any pointers?

I know that it requires a GroupBy["Name"]

3 Answers

This is the simplest way I could think off:

ds = Dataset[{
  <|"Name" -> "Steven","Born" -> 1980, "Year" -> 2017, "Score" -> 115|>,
  <|"Name" -> "Steven", "Born" -> 1980, "Year" -> 2018, "Score" -> 230|>,
  <|"Name" -> "Joe", "Born" -> 1981, "Year" -> 2017, "Score" -> 70|>,
  <|"Name" -> "Joe", "Born" -> 1981, "Year" -> 2018, "Score" -> 300|>
}];
ds[
  GroupBy[#Name &],
  ResourceFunction["MergeByKey"][{{"Name", "Born"} -> First}]
][
  Values,
  Append[KeyDrop[#, {"Year", "Score"}],
    AssociationThread["Score_" <> ToString[#]& /@ #Year, #Score]
  ]&
]

enter image description here

If different people have different years, you might want to use KeyUnion to make the dataset square again. Documentation for MergeByKey is here

Correct answer by Sjoerd Smit on December 1, 2020

You can use the package "DataReshape.m" as shown below. (It is in my ToDo list to submit the corresponding functions to Wolfram Function Repository very soon.)

Load the package:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/DataReshape.m"]

Here is the original dataset:

dsData = Dataset[{<|"Name" -> "Steven", "Born" -> 1980, 
     "Year" -> 2017, "Score" -> 115|>, <|"Name" -> "Steven", 
     "Born" -> 1980, "Year" -> 2018, "Score" -> 230|>, <|
     "Name" -> "Joe", "Born" -> 1981, "Year" -> 2017, 
     "Score" -> 70|>, <|"Name" -> "Joe", "Born" -> 1981, 
     "Year" -> 2018, "Score" -> 300|>}];
dsData

enter image description here

Let us convert it to a “proper” long form with ID variables all except “Score”:

dsLong = ToLongForm[dsData, {"Name", "Born", "Year"}, Automatic]

enter image description here

Here we change the values of the column “Variable” to have the values of the column “Year” as suffixes:

dsLong2 = 
 dsLong[All, 
  Join[KeyTake[#, {"Name", "Born", "Value"}], <|
     "Variable" -> 
      StringRiffle[{#Variable, ToString[#Year]}, "_"]|>] &]

enter image description here

Here we convert to wide form with ID variables “Name” and “Born”:

ToWideForm[dsLong2, {"Name", "Born"}, "Variable", "Value"]

enter image description here

Answered by Anton Antonov on December 1, 2020

Both Sjoerd's and Anton's answers work nicely. Upon Sjoerd's solution using MergeByKey from the Wolfram Function Repository, I just wanted to post here for completeness that one can find almost the same solution with Merge from the Wolfram Language:

ds = Dataset[{
  <|"Name" -> "Steven", "Born" -> 1980, "Year" -> 2017, "Score" -> 115|>,
  <|"Name" -> "Steven", "Born" -> 1980, "Year" -> 2018, "Score" -> 230|>,
  <|"Name" -> "Joe",    "Born" -> 1981, "Year" -> 2017, "Score" -> 70 |>,
  <|"Name" -> "Joe",    "Born" -> 1981, "Year" -> 2018, "Score" -> 300|>
}];
ds[GroupBy["Name"], Merge[#, Identity] &][Values,
  Append[Map[First, KeyDrop[#, {"Year", "Score"}]], 
    AssociationThread["Score_" <> ToString[#] & /@ #Year, #Score]] &]

Answered by Stephan on December 1, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP