Mathematica Asked by IntroductionToProbability on January 24, 2021
Given a dataset as such
If "letter"
is the header that is chosen, how do I convert it into an indexed dataset / association-of-associations?
i.e. How do I define f
such that f[dataset_,columnHeader_]
produces the following?
Please note GroupBy
is close but fails as you are unable to use Part
to work with the result to extract column data. eg:
data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <|"letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>};
dataDS = Dataset[data];
dataDSg= GroupBy[dataDS, Key["letter"]];
dataDSg[All, "foo"] (* <- produces an error *)
Where as data in the format of an association-of-association works fine
data2 = <|"a" -> <|"foo" -> 1, "bar" -> 2|>, "b" -> <|"foo" -> 3, "bar" -> 4|>, "c" -> <|"foo" -> 5, "bar" -> 6|>|>;
data2DS = data2 // Dataset;
data2DS [All, "foo"] (* <- returns a dataset with 1,3,5 *)
Update
Some timing comparisons
(* make dataset to test *)
colHeader = CharacterRange["a", "z"];
colHeader[[1]] = "letter";
data = RandomReal[{-1, 1}, {100000, 26}];
table = Insert[data, colHeader, 1];
dataDS = Dataset[AssociationThread[table[[1]], #] & /@ table[[2 ;;]]];
Anton Antonov answer
f[ds_Dataset, ch_] := Dataset@Association@Normal@ds[All, #[ch] -> KeyDrop[#, ch] &]
fAns = f[dataDS, "letter"]; // RepeatedTiming (* 0.934 *)
kglr answer
f0 = GroupBy[##, Association@*KeyDrop[#2]] &;
f0ans = f0[dataDS, "letter"]; // RepeatedTiming (* 1.85 *)
f1 = #[GroupBy[#2] /* Map[Association@*KeyDrop[#2]]] &;
f1ans = f1[dataDS, "letter"]; // RepeatedTiming (* 1.714 *)
Sjoerd Smit answer
groupByKey[ds_, key_String] := GroupBy[ds, Function[Slot[key]] -> KeyDrop[key], First];
groupByKeyAns = groupByKey[dataDS, "letter"]; // RepeatedTiming (* 1.2 *)
some other timings that don’t produce an answer but help to put the times above into context
GroupBy[dataDS, "letter"]; // RepeatedTiming (* 0.25 *)
Dataset[Normal[dataDS]]; // RepeatedTiming (* 0.38 *)
data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <|
"letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c",
"foo" -> 5, "bar" -> 6|>};
dataDS = Dataset[data];
ClearAll[f];
f[ds_Dataset, ch_] := ds[Apply[Association], #[ch] -> KeyDrop[#, ch] &];
f[dataDS, "letter"]
(Using the definition suggested by @WReach in the comments.)
data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <|
"letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c",
"foo" -> 5, "bar" -> 6|>};
dataDS = Dataset[data];
ClearAll[f];
f[ds_Dataset, ch_] := Dataset@Association@Normal@ds[All, #[ch] -> KeyDrop[#, ch] &];
f[dataDS, "letter"]
Correct answer by Anton Antonov on January 24, 2021
ClearAll[f0]
f0 = GroupBy[##, Association @* KeyDrop[#2]] &;
Examples:
ds = Dataset @ {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>,
<|"letter" -> "b", "foo" -> 3, "bar" -> 4|>,
<|"letter" -> "c", "foo" -> 5, "bar" -> 6|>};
Row[{ds, f0[ds, "letter"], f0[ds, "foo"], f0[ds, "bar"]}, Spacer[10]]
You can also do:
ClearAll[f1]
f1 = #[GroupBy[#2] /* Map[Association @* KeyDrop[#2]]] &;
Row[{ds, f1[ds, "letter"], f1[ds, "foo"], f1[ds, "bar"]}, Spacer[10]]
and
ClearAll[f2]
f2 = #[GroupBy @ #2, All, First @ Normal @ Keys @ KeyDrop @ ##] &;
Row[{ds, f2[ds, "letter"], f2[ds, "foo"], f2[ds, "bar"]}, Spacer[10]]
Answered by kglr on January 24, 2021
Related to kglr's answer, here's a slight variation:
ds = Dataset @ {
<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>,
<|"letter" -> "b", "foo" -> 3, "bar" -> 4|>,
<|"letter" -> "c", "foo" -> 5, "bar" -> 6|>
};
groupByKey[ds_, key_String] := GroupBy[ds, Function[Slot[key]] -> KeyDrop[key], First];
groupByKey[ds, "letter"]
Of course, you have to be confident that the key values you're grouping by is actually unique, otherwise you'll be dropping rows.
Answered by Sjoerd Smit on January 24, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP