How to convert a Dataset into an indexed dataset / association-of-associations given a column header?

Question

Given a dataset as such If "letter" is the header that is chosen, how do I convert it into an indexed dataset / association-of-associations? i.e. How do I define f such that f[dataset_,columnHeader_] produces the following? Please note GroupBy is close but fails as you are unable to use Part to work with the result to extract column data. eg: data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <|"letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>}; dataDS = Dataset[data]; dataDSg= GroupBy[dataDS, Key["letter"]]; dataDSg[All, "foo"] (* <- produces an error *) Where as data in the format of an association-of-association works fine data2 = <|"a" -> <|"foo" -> 1, "bar" -> 2|>, "b" -> <|"foo" -> 3, "bar" -> 4|>, "c" -> <|"foo" -> 5, "bar" -> 6|>|>; data2DS = data2 // Dataset; data2DS [All, "foo"] (* <- returns a dataset with 1,3,5 *) Update Some timing comparisons (* make dataset to test *) colHeader = CharacterRange["a", "z"]; colHeader[[1]] = "letter"; data = RandomReal[{-1, 1}, {100000, 26}]; table = Insert[data, colHeader, 1]; dataDS = Dataset[AssociationThread[table[[1]], #] & /@ table[[2 ;;]]]; Anton Antonov answer f[ds_Dataset, ch_] := Dataset@Association@Normal@ds[All, #[ch] -> KeyDrop[#, ch] &] fAns = f[dataDS, "letter"]; // RepeatedTiming (* 0.934 *) kglr answer f0 = GroupBy[##, Association@*KeyDrop[#2]] &; f0ans = f0[dataDS, "letter"]; // RepeatedTiming (* 1.85 *) f1 = #[GroupBy[#2] /* Map[Association@*KeyDrop[#2]]] &; f1ans = f1[dataDS, "letter"]; // RepeatedTiming (* 1.714 *) Sjoerd Smit answer groupByKey[ds_, key_String] := GroupBy[ds, Function[Slot[key]] -> KeyDrop[key], First]; groupByKeyAns = groupByKey[dataDS, "letter"]; // RepeatedTiming (* 1.2 *) some other timings that don't produce an answer but help to put the times above into context GroupBy[dataDS, "letter"]; // RepeatedTiming (* 0.25 *) Dataset[Normal[dataDS]]; // RepeatedTiming (* 0.38 *)

Anton Antonov · Accepted Answer

data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <| "letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>}; dataDS = Dataset[data]; ClearAll[f]; f[ds_Dataset, ch_] := ds[Apply[Association], #[ch] -> KeyDrop[#, ch] &]; f[dataDS, "letter"] (Using the definition suggested by @WReach in the comments.) First answer data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <| "letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>}; dataDS = Dataset[data]; ClearAll[f]; f[ds_Dataset, ch_] := Dataset@Association@Normal@ds[All, #[ch] -> KeyDrop[#, ch] &]; f[dataDS, "letter"]

kglr · Answer

ClearAll[f0] f0 = GroupBy[##, Association @* KeyDrop[#2]] &; Examples: ds = Dataset @ {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, <|"letter" -> "b", "foo" -> 3, "bar" -> 4|>, <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>}; Row[{ds, f0[ds, "letter"], f0[ds, "foo"], f0[ds, "bar"]}, Spacer[10]] You can also do: ClearAll[f1] f1 = #[GroupBy[#2] /* Map[Association @* KeyDrop[#2]]] &; Row[{ds, f1[ds, "letter"], f1[ds, "foo"], f1[ds, "bar"]}, Spacer[10]] and ClearAll[f2] f2 = #[GroupBy @ #2, All, First @ Normal @ Keys @ KeyDrop @ ##] &; Row[{ds, f2[ds, "letter"], f2[ds, "foo"], f2[ds, "bar"]}, Spacer[10]]

Sjoerd Smit · Answer

Related to kglr's answer, here's a slight variation:

ds = Dataset @ {
    <|"letter" -> "a", "foo" -> 1, "bar" -> 2|>, 
    <|"letter" -> "b",  "foo" -> 3, "bar" -> 4|>,
    <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>
};
groupByKey[ds_, key_String] := GroupBy[ds, Function[Slot[key]] -> KeyDrop[key], First];
groupByKey[ds, "letter"]

Of course, you have to be confident that the key values you're grouping by is actually unique, otherwise you'll be dropping rows.

How to convert a Dataset into an indexed dataset / association-of-associations given a column header?

3 Answers

First answer

Add your own answers!

Ask a Question