Apply a function to each row of a Dataset

Mathematica Asked by user6546 on February 11, 2021

I have a Dataset derived from the output of another program, and I have written some functions to retrieve and format this data.
They work as intended when I use Table to apply the function to each row of the Dataset, but I cannot get the same result with the built-in query capabilities of Dataset.
Can someone point me in the right direction?

Below is the statement that works with Table, followed by the alternate syntax that doesn’t work.
Both are intended to apply the function dsGetValueList to each row of dsAllApples.

dsAllAppleParamValues = Table[dsGetValueList[dsAllApples[i], dsApplesAllParams],
                              {i, 1, Length@dsAllApples}]; 

dsAllAppleParamValues2 =
dsAllApples[All, dsGetValueList[#, dsApplesAllParams] &] // Normal;

The structure of the Dataset might be non-standard, but it is derived from another program and that can’t be changed. Further background: the source is a JSON file, which can be Import-ed with the "RawJSON" format to get nested lists and associations that can then be wrapped in Dataset.
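
As a minimal sketch of that import step: the JSON string below is a made-up stand-in for the real file, which is not shown here, and the names json, raw and dsFromJSON are only illustrative. ImportString with the "RawJSON" format returns plain lists and associations, and wrapping the result in Dataset gives a structure like the one used in the test case.

```mathematica
(* Hypothetical JSON text standing in for the real source file *)
json = "[{\"name\": \"item01\", \"class\": \"apples\", \"params\": [{\"name\": \"TYPE\", \"value\": \"fuji\"}]}]";

(* "RawJSON" maps JSON objects to Associations and JSON arrays to Lists *)
raw = ImportString[json, "RawJSON"];

(* Wrap the nested expression in Dataset so it can be queried *)
dsFromJSON = Dataset[raw];
dsFromJSON[1, "params", 1, "value"]
```

For a file on disk, Import with the same "RawJSON" format should behave the same way.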

Code for a test case is below. In summary, the code changes data like this:

[image: Dataset content]

to this:

[image: Results after processing]

(*sample data*)
item01 = <| "name" -> "item01", "class" -> "apples" , 
  "params" -> {<| "name" -> "TYPE", "value" -> "fuji"|>
    , <| "name" -> "WEIGHT", "value" -> "0.5"|>
    , <| "name" -> "COLOR", "value" -> "red"|>
     }|>;
item02 = <| "name" -> "item02", "class" -> "apples" , 
   "params" -> {<| "name" -> "TYPE", "value" -> "gala"|>
     , <| "name" -> "COLOR", "value" -> "red"|>
     , <| "name" -> "EXP_DATE", "value" -> "10/10/20"|>
     , <| "name" -> "WEIGHT", "value" -> "1.5"|>
      }|>;
item03 = <| "name" -> "item03", "class" -> "apples" , 
   "params" -> {<| "name" -> "TYPE", "value" -> "granny"|>
     , <| "name" -> "COLOR", "value" -> "green"|>
      }|>;
item04 =  <| "name" -> "item04", "class" -> "oranges" , 
   "params" -> {<| "name" -> "TYPE", "value" -> "navel"|>
     , <| "name" -> "WEIGHT", "value" -> "3.5"|>
     , <| "name" -> "EXP_DATE", "value" -> "09/10/20"|>
      }|>;
item05 =  <| "name" -> "item05", "class" -> "oranges" , 
   "params" -> {<| "name" -> "TYPE", "value" -> "seville"|>
     , <| "name" -> "WEIGHT", "value" -> "1.5"|>
     , <| "name" -> "EXP_DATE", "value" -> "09/10/20"|>
      }|>;
dsAll = Dataset[{item01, item02, item03, item04, item05}];

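(*useful functions*)
(*name of a single row*)
dsGetName[ds_] := ds["name"]
(*value of parameter pName in a row, or "-" if the row lacks it;
  ds must be a one-row Dataset such as dsAll[1]*)
dsGetValue[ds_, pName_] :=  Module[{paramDS, valueList},
  paramDS =  ds["params"] ;
  valueList = Normal@paramDS[Select[#name == pName &] , "value"];
  If[Length[valueList] > 0, First[valueList], "-"]
  ]
(*values of every parameter in pList, in the order given*)
dsGetValueList[ds_, pList_List] := dsGetValue[ds, #] & /@ pList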

(*retrieve metadata about apples: their names and parameters*)
dsAllApples =  dsAll[Select[#class == "apples" &] ]
dsAllAppleNames = dsAllApples[All, dsGetName]  // Normal;
dsApplesAllParams = 
  dsAllApples[All, "params", All, "name"] // Normal // Flatten // 
   Union;
(*retrieve parameter values for each apple; there may be missing values*)
(* -- the first statement works as intended*)
(* -- the second statement does not*)
dsAllAppleParamValues = 
  Table[dsGetValueList[dsAllApples[i], dsApplesAllParams], {i, 1, 
    Length@dsAllApples}];
dsAllAppleParamValues2 = 
  dsAllApples[All, dsGetValueList[#, dsApplesAllParams] &] // Normal;
Equal[dsAllAppleParamValues2, dsAllAppleParamValues]
(*format results*)
r1 = Prepend[Transpose[dsAllAppleParamValues], dsAllAppleNames] // 
   Transpose ;
TableForm[r1, 
 TableHeadings -> {None, Prepend[dsApplesAllParams, "Name"]}]

3 Answers

This is quite a bit awkward, but perhaps you can use this as a starting point:

dsApples = dsAll[Select[#class === "apples" &], {"name", "params"}];

tmp = Join[dsApples[All, Key["name"] /* <|"Name" -> Identity|>], 
           Dataset[KeyUnion[(Apply[AssociationThread] @* Transpose) /@ 
                            Normal[dsApples[All, Lookup["params"] /* Values]],
                            Missing[] &]], 2];

tmp[All, {"Name", "COLOR", "EXP_DATE", "TYPE", "WEIGHT"}]

I'll leave the reformatting to a TableForm[] object up to you.

Answered by J. M.'s ennui on February 11, 2021

The difference between your two approaches is in what the function receives. In the first version, extracting a part with dsAllApples[i] returns the row wrapped in Dataset. In the second version, the query applies the function to the row itself, which is not wrapped in Dataset. So, you can just add the Dataset wrapper yourself with:

dsAllAppleParamValues2 = dsAllApples[
    All,
    dsGetValueList[Dataset@#, dsApplesAllParams]&
] //Normal;

dsAllAppleParamValues == dsAllAppleParamValues2

True
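
To see the difference directly, one could inspect the head of what each approach hands to the function. This is only a sketch: ds is a small stand-in dataset with the same row structure as in the question, not the original data.

```mathematica
(* Small stand-in dataset; rows mirror the structure used in the question *)
ds = Dataset[{
    <|"name" -> "item01", "params" -> {<|"name" -> "TYPE", "value" -> "fuji"|>}|>,
    <|"name" -> "item02", "params" -> {<|"name" -> "TYPE", "value" -> "gala"|>}|>}];

(* Part extraction, as in the Table version: the row comes back wrapped in Dataset *)
Head[ds[1]]

(* Query operator, as in the second version: Head is applied to each underlying row *)
ds[All, Head] // Normal
```

The first expression gives Dataset; the second should list the heads of the unwrapped rows, which here are associations.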

That being said, the version without the Dataset head is probably easier to work with, so I would modify your dsGetValueList function to work with non-Dataset objects (in this case, just an Association).
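
A minimal sketch of such a modification follows. The names getValueA and getValueListA are hypothetical and are not part of the question or of this answer, and the code assumes each row is an Association whose "params" entry is a list of associations with "name" and "value" keys.

```mathematica
(* Sample row with the same structure as item01 in the question *)
row01 = <|"name" -> "item01", "class" -> "apples",
   "params" -> {<|"name" -> "TYPE", "value" -> "fuji"|>,
     <|"name" -> "WEIGHT", "value" -> "0.5"|>,
     <|"name" -> "COLOR", "value" -> "red"|>}|>;

(* Hypothetical helper: value of the first parameter named pName, or "-" if absent *)
getValueA[row_Association, pName_String] :=
  With[{hit = SelectFirst[row["params"], #["name"] === pName &]},
   If[MissingQ[hit], "-", hit["value"]]]

(* Hypothetical helper: values for every name in pList, in the order given *)
getValueListA[row_Association, pList_List] := getValueA[row, #] & /@ pList

getValueListA[row01, {"COLOR", "EXP_DATE", "TYPE", "WEIGHT"}]
(* {"red", "-", "fuji", "0.5"} *)
```

With helpers like these, a query of the form dsAllApples[All, getValueListA[#, dsApplesAllParams] &] // Normal should work without the extra Dataset wrapper, since the function then expects the unwrapped row.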

Answered by Carl Woll on February 11, 2021

Here is a way that generates the columns in the order that they occur in the original dataset:

dsAll[
  Select[#class==="apples"&] /* KeyUnion
, <| "Name" -> #name, #name -> #value& /@ #params |>&
]
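
For readers who want to see what the query does step by step, here is a rough equivalent on a plain list of associations, without Dataset. It is a sketch on made-up rows in the same shape as the question's data; rows, apples, flat and wide are illustrative names.

```mathematica
(* Stand-in rows with the same structure as the question's items *)
rows = {
   <|"name" -> "item01", "class" -> "apples",
    "params" -> {<|"name" -> "TYPE", "value" -> "fuji"|>,
      <|"name" -> "WEIGHT", "value" -> "0.5"|>}|>,
   <|"name" -> "item03", "class" -> "apples",
    "params" -> {<|"name" -> "TYPE", "value" -> "granny"|>,
      <|"name" -> "COLOR", "value" -> "green"|>}|>,
   <|"name" -> "item04", "class" -> "oranges",
    "params" -> {<|"name" -> "TYPE", "value" -> "navel"|>}|>};

(* Step 1: keep only the apples *)
apples = Select[rows, #class === "apples" &];

(* Step 2: turn each row into one flat association, with one key per parameter *)
flat = <|"Name" -> #name, (#name -> #value &) /@ #params|> & /@ apples;

(* Step 3: give every row the same keys; absent ones become Missing["KeyAbsent", key] *)
wide = KeyUnion[flat];
Dataset[wide]
```

In the Dataset query, Select runs on the way down and KeyUnion on the way back up, which is why the two are composed with /* in the first argument.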

If the exact order of the columns is important, an additional re-ordering stage can be added:

dsAll[
  Select[#class==="apples"&] /* KeyUnion
, <| "Name" -> #name, #name -> #value& /@ #params |>&
][All, {"Name", "COLOR", "EXP_DATE", "TYPE", "WEIGHT"}]

Answered by WReach on February 11, 2021
