Mathematica Asked by George Ellis on April 3, 2021
I want to create a Query on a Dataset that selects three columns, converts the date column in this subset to months, groups by month and by a second column (location), takes the group means of the third column (temperature), and produces a single DateListPlot with one line of monthly mean temperatures per location.
I am able to do this in multiple steps, but I can't work out how to write one query that combines a "descending" subquery and an "ascending" summary query. A very reduced Dataset is included below, along with my current code attempt. The first query returns a Dataset without named columns, so the next Query has to refer to the values by position.
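(* Step 1: keep the date (as a month DateObject), the location and the temperature *)
ds1 = Query[
All, {Replace[#date, #date -> DateObject[#date, "Month"]] &,
"temperature", "temp_value"}][energyDS];
(* Step 2: group by location (position 2), then by month (position 1), and average each group *)
ds2b = Query[GroupBy[#[[2]] &], GroupBy[#[[1]] &], Mean][ds1];
(* Step 3: one line per location, using the mean temperature (position 3) *)
DateListPlot[ds2b[#, All, 3] & /@ Keys[ds2b],
PlotLegends -> Normal[Keys[ds2b]]]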
I would appreciate help with the code and, if possible, some insight into combining subqueries into queries beyond what the sparse examples in the Help documentation provide. I should also add that this code runs as slow as molasses on the actual dataset of 1.6 million records.
energyDS =
Dataset[{<|"date" -> "2016-01-11 17:00:00", "Appliances" -> 60,
"lights" -> 30, "T_out" -> 6.6`, "Press_mm_hg" -> 733.5`,
"RH_out" -> 92, "Windspeed" -> 7, "Visibility" -> 63,
"Tdewpoint" -> 5.3`, "rv1" -> 13.275433157104999`,
"rv2" -> 13.275433157104999`, "temperature" -> "kitchen",
"temp_value" -> 19.89`, "humidity" -> "kitchen",
"hum_value" -> 47.5966666666667`|>, <|"date" ->
"2016-01-15 13:30:00", "Appliances" -> 190, "lights" -> 0,
"T_out" -> 4.05`, "Press_mm_hg" -> 755, "RH_out" -> 86.5`,
"Windspeed" -> 8, "Visibility" -> 45, "Tdewpoint" -> 2,
"rv1" -> 19.12523116916418`, "rv2" -> 19.12523116916418`,
"temperature" -> "laundry", "temp_value" -> 20.5`,
"humidity" -> "living",
"hum_value" -> 38.015`|>, <|"date" -> "2016-01-19 10:10:00",
"Appliances" -> 50, "lights" -> 0, "T_out" -> -3.48333333333333`,
"Press_mm_hg" -> 757.316666666667`,
"RH_out" -> 89.3333333333333`, "Windspeed" -> 1,
"Visibility" -> 62.3333333333333`, "Tdewpoint" -> -5.05`,
"rv1" -> 34.51393429422751`, "rv2" -> 34.51393429422751`,
"temperature" -> "kitchen", "temp_value" -> 18.6`,
"humidity" -> "laundry",
"hum_value" -> 39.79`|>, <|"date" -> "2016-01-23 06:40:00",
"Appliances" -> 40, "lights" -> 0, "T_out" -> 5.9`,
"Press_mm_hg" -> 767.133333333333`,
"RH_out" -> 99.3333333333333`, "Windspeed" -> 4,
"Visibility" -> 29, "Tdewpoint" -> 5.76666666666667`,
"rv1" -> 14.32367623783648`, "rv2" -> 14.32367623783648`,
"temperature" -> "laundry", "temp_value" -> 17.6`,
"humidity" -> "office",
"hum_value" -> 42.59`|>, <|"date" -> "2016-01-27 03:20:00",
"Appliances" -> 20, "lights" -> 0, "T_out" -> 10.1`,
"Press_mm_hg" -> 758.4`, "RH_out" -> 80.3333333333333`,
"Windspeed" -> 10, "Visibility" -> 40, "Tdewpoint" -> 6.8`,
"rv1" -> 37.99445059848949`, "rv2" -> 37.99445059848949`,
"temperature" -> "kitchen", "temp_value" -> 20.6`,
"humidity" -> "bathroom",
"hum_value" -> 58.5733333333333`|>, <|"date" ->
"2016-01-30 23:50:00", "Appliances" -> 40, "lights" -> 0,
"T_out" -> 4.41666666666667`, "Press_mm_hg" -> 754.7`,
"RH_out" -> 87.1666666666667`, "Windspeed" -> 5,
"Visibility" -> 28.8333333333333`,
"Tdewpoint" -> 2.43333333333333`, "rv1" -> 6.051994732115418`,
"rv2" -> 6.051994732115418`, "temperature" -> "laundry",
"temp_value" -> 20.79`, "humidity" -> "north",
"hum_value" -> 99.3`|>, <|"date" -> "2016-02-03 20:30:00",
"Appliances" -> 130, "lights" -> 20, "T_out" -> 5,
"Press_mm_hg" -> 764.15`, "RH_out" -> 82, "Windspeed" -> 3,
"Visibility" -> 40, "Tdewpoint" -> 2.1`,
"rv1" -> 16.25068896682933`, "rv2" -> 16.25068896682933`,
"temperature" -> "kitchen", "temp_value" -> 22.6`,
"humidity" -> "ironing",
"hum_value" -> 35.3327777777778`|>, <|"date" ->
"2016-02-07 17:00:00", "Appliances" -> 100, "lights" -> 20,
"T_out" -> 8.2`, "Press_mm_hg" -> 747.3`, "RH_out" -> 66,
"Windspeed" -> 8, "Visibility" -> 40, "Tdewpoint" -> 2.2`,
"rv1" -> 5.914690508507192`, "rv2" -> 5.914690508507192`,
"temperature" -> "laundry", "temp_value" -> 21.5`,
"humidity" -> "teenager",
"hum_value" -> 46.7355555555556`|>, <|"date" ->
"2016-02-11 13:40:00", "Appliances" -> 80, "lights" -> 20,
"T_out" -> 5.06666666666667`, "Press_mm_hg" -> 749,
"RH_out" -> 85.6666666666667`, "Windspeed" -> 5,
"Visibility" -> 35, "Tdewpoint" -> 2.83333333333333`,
"rv1" -> 10.903250332921743`, "rv2" -> 10.903250332921743`,
"temperature" -> "kitchen", "temp_value" -> 20.5`,
"humidity" -> "parents",
"hum_value" -> 41.6633333333333`|>, <|"date" ->
"2016-02-15 10:20:00", "Appliances" -> 740, "lights" -> 20,
"T_out" -> 3.06666666666667`, "Press_mm_hg" -> 757.666666666667`,
"RH_out" -> 74.6666666666667`, "Windspeed" -> 6,
"Visibility" -> 40, "Tdewpoint" -> -1.06666666666667`,
"rv1" -> 1.7749762744642794`, "rv2" -> 1.7749762744642794`,
"temperature" -> "kitchen", "temp_value" -> 19.5`,
"humidity" -> "kitchen",
"hum_value" -> 42.1333333333333`|>, <|"date" ->
"2016-02-19 06:50:00", "Appliances" -> 50, "lights" -> 0,
"T_out" -> -0.9`, "Press_mm_hg" -> 759.633333333333`,
"RH_out" -> 99, "Windspeed" -> 2,
"Visibility" -> 45.6666666666667`,
"Tdewpoint" -> -1.08333333333333`, "rv1" -> 17.355701071210206`,
"rv2" -> 17.355701071210206`, "temperature" -> "laundry",
"temp_value" -> 20.1`, "humidity" -> "living",
"hum_value" -> 37.5675`|>, <|"date" -> "2016-02-23 03:30:00",
"Appliances" -> 60, "lights" -> 0, "T_out" -> 3.75`,
"Press_mm_hg" -> 753.8`, "RH_out" -> 95.5`, "Windspeed" -> 1.5`,
"Visibility" -> 26.5`, "Tdewpoint" -> 3.05`,
"rv1" -> 40.263680985663086`, "rv2" -> 40.263680985663086`,
"temperature" -> "kitchen", "temp_value" -> 21,
"humidity" -> "laundry",
"hum_value" -> 42.59`|>, <|"date" -> "2016-02-27 00:00:00",
"Appliances" -> 50, "lights" -> 0, "T_out" -> 1.7`,
"Press_mm_hg" -> 751, "RH_out" -> 85, "Windspeed" -> 2,
"Visibility" -> 20, "Tdewpoint" -> -0.6`,
"rv1" -> 22.86010766401887`, "rv2" -> 22.86010766401887`,
"temperature" -> "laundry", "temp_value" -> 20.5`,
"humidity" -> "office",
"hum_value" -> 35.2`|>, <|"date" -> "2016-03-01 20:40:00",
"Appliances" -> 80, "lights" -> 20, "T_out" -> 7,
"Press_mm_hg" -> 751.766666666667`, "RH_out" -> 96,
"Windspeed" -> 8, "Visibility" -> 55.6666666666667`,
"Tdewpoint" -> 6.4`, "rv1" -> 27.875589963514358`,
"rv2" -> 27.875589963514358`, "temperature" -> "kitchen",
"temp_value" -> 21.5`, "humidity" -> "bathroom",
"hum_value" -> 44.6633333333333`|>, <|"date" ->
"2016-03-05 17:10:00", "Appliances" -> 70, "lights" -> 0,
"T_out" -> 5.76666666666667`, "Press_mm_hg" -> 743.05`,
"RH_out" -> 67.3333333333333`, "Windspeed" -> 2.16666666666667`,
"Visibility" -> 40, "Tdewpoint" -> 0.0666666666666667`,
"rv1" -> 20.385111880023032`, "rv2" -> 20.385111880023032`,
"temperature" -> "laundry", "temp_value" -> 21.86`,
"humidity" -> "north",
"hum_value" -> 51.5666666666667`|>, <|"date" ->
"2016-03-09 13:50:00", "Appliances" -> 80, "lights" -> 10,
"T_out" -> 7.16666666666667`, "Press_mm_hg" -> 744.133333333333`,
"RH_out" -> 64.3333333333333`, "Windspeed" -> 9.83333333333333`,
"Visibility" -> 40, "Tdewpoint" -> 0.75`,
"rv1" -> 37.6603338168934`, "rv2" -> 37.6603338168934`,
"temperature" -> "kitchen", "temp_value" -> 19.4633333333333`,
"humidity" -> "ironing",
"hum_value" -> 31.2`|>, <|"date" -> "2016-03-13 10:20:00",
"Appliances" -> 100, "lights" -> 0, "T_out" -> 3.13333333333333`,
"Press_mm_hg" -> 769.7`, "RH_out" -> 76.6666666666667`,
"Windspeed" -> 6.33333333333333`,
"Visibility" -> 49.6666666666667`,
"Tdewpoint" -> -0.666666666666667`, "rv1" -> 41.63221240742132`,
"rv2" -> 41.63221240742132`, "temperature" -> "laundry",
"temp_value" -> 20, "humidity" -> "teenager",
"hum_value" -> 38.13`|>, <|"date" -> "2016-03-17 07:00:00",
"Appliances" -> 50, "lights" -> 0, "T_out" -> -0.4`,
"Press_mm_hg" -> 766.3`, "RH_out" -> 87, "Windspeed" -> 1,
"Visibility" -> 63, "Tdewpoint" -> -2.4`,
"rv1" -> 3.332387760747224`, "rv2" -> 3.332387760747224`,
"temperature" -> "kitchen", "temp_value" -> 20.6666666666667`,
"humidity" -> "parents",
"hum_value" -> 39.3266666666667`|>, <|"date" ->
"2016-03-21 03:40:00", "Appliances" -> 50, "lights" -> 0,
"T_out" -> 4.7`, "Press_mm_hg" -> 761.1`,
"RH_out" -> 95.3333333333333`, "Windspeed" -> 1,
"Visibility" -> 49.3333333333333`,
"Tdewpoint" -> 4.03333333333333`, "rv1" -> 3.2356246723793447`,
"rv2" -> 3.2356246723793447`, "temperature" -> "kitchen",
"temp_value" -> 21.7`, "humidity" -> "kitchen",
"hum_value" -> 37.4`|>, <|"date" -> "2016-03-25 00:10:00",
"Appliances" -> 60, "lights" -> 0, "T_out" -> 6.3`,
"Press_mm_hg" -> 755.666666666667`, "RH_out" -> 96,
"Windspeed" -> 3, "Visibility" -> 43.6666666666667`,
"Tdewpoint" -> 5.7`, "rv1" -> 28.789900441188365`,
"rv2" -> 28.789900441188365`, "temperature" -> "laundry",
"temp_value" -> 22, "humidity" -> "living",
"hum_value" -> 41.9333333333333`|>, <|"date" ->
"2016-03-28 20:50:00", "Appliances" -> 90, "lights" -> 0,
"T_out" -> 8.16666666666667`, "Press_mm_hg" -> 744.333333333333`,
"RH_out" -> 77.8333333333333`, "Windspeed" -> 3.33333333333333`,
"Visibility" -> 40, "Tdewpoint" -> 4.51666666666667`,
"rv1" -> 5.767669249325991`, "rv2" -> 5.767669249325991`,
"temperature" -> "kitchen", "temp_value" -> 23.39`,
"humidity" -> "laundry",
"hum_value" -> 38.5`|>, <|"date" -> "2016-04-01 17:20:00",
"Appliances" -> 50, "lights" -> 0, "T_out" -> 10.4333333333333`,
"Press_mm_hg" -> 759.933333333333`,
"RH_out" -> 59.6666666666667`, "Windspeed" -> 2.66666666666667`,
"Visibility" -> 40, "Tdewpoint" -> 2.86666666666667`,
"rv1" -> 32.87173660937697`, "rv2" -> 32.87173660937697`,
"temperature" -> "laundry", "temp_value" -> 22.39`,
"humidity" -> "office",
"hum_value" -> 36.79`|>, <|"date" -> "2016-04-05 14:00:00",
"Appliances" -> 270, "lights" -> 10, "T_out" -> 11.6`,
"Press_mm_hg" -> 751, "RH_out" -> 73, "Windspeed" -> 3,
"Visibility" -> 29, "Tdewpoint" -> 6.9`,
"rv1" -> 13.358150830026716`, "rv2" -> 13.358150830026716`,
"temperature" -> "kitchen", "temp_value" -> 22.1333333333333`,
"humidity" -> "bathroom",
"hum_value" -> 45.3`|>, <|"date" -> "2016-04-09 10:30:00",
"Appliances" -> 390, "lights" -> 0, "T_out" -> 9.8`,
"Press_mm_hg" -> 750.35`, "RH_out" -> 69, "Windspeed" -> 4.5`,
"Visibility" -> 32.5`, "Tdewpoint" -> 4.35`,
"rv1" -> 42.310866445768625`, "rv2" -> 42.310866445768625`,
"temperature" -> "laundry", "temp_value" -> 22.1`,
"humidity" -> "north",
"hum_value" -> 18.1666666666667`|>, <|"date" ->
"2016-04-13 07:10:00", "Appliances" -> 60, "lights" -> 0,
"T_out" -> 5.08333333333333`, "Press_mm_hg" -> 750.266666666667`,
"RH_out" -> 93.5`, "Windspeed" -> 1.33333333333333`,
"Visibility" -> 40, "Tdewpoint" -> 4.15`,
"rv1" -> 4.957313183695078`, "rv2" -> 4.957313183695078`,
"temperature" -> "kitchen", "temp_value" -> 22,
"humidity" -> "ironing",
"hum_value" -> 33.9`|>, <|"date" -> "2016-04-17 03:40:00",
"Appliances" -> 60, "lights" -> 0, "T_out" -> 1.46666666666667`,
"Press_mm_hg" -> 751.566666666667`, "RH_out" -> 97,
"Windspeed" -> 1, "Visibility" -> 63,
"Tdewpoint" -> 1.03333333333333`, "rv1" -> 39.543289749417454`,
"rv2" -> 39.543289749417454`, "temperature" -> "laundry",
"temp_value" -> 23.7`, "humidity" -> "teenager",
"hum_value" -> 40.53`|>, <|"date" -> "2016-04-21 00:20:00",
"Appliances" -> 60, "lights" -> 0, "T_out" -> 7.96666666666667`,
"Press_mm_hg" -> 764.5`, "RH_out" -> 65, "Windspeed" -> 4,
"Visibility" -> 40, "Tdewpoint" -> 1.7`,
"rv1" -> 36.77555826725438`, "rv2" -> 36.77555826725438`,
"temperature" -> "kitchen", "temp_value" -> 22.1`,
"humidity" -> "parents",
"hum_value" -> 37.73`|>, <|"date" -> "2016-04-24 21:00:00",
"Appliances" -> 90, "lights" -> 0, "T_out" -> 4.1`,
"Press_mm_hg" -> 758, "RH_out" -> 82, "Windspeed" -> 3,
"Visibility" -> 40, "Tdewpoint" -> 1.2`,
"rv1" -> 10.66819637781009`, "rv2" -> 10.66819637781009`,
"temperature" -> "kitchen", "temp_value" -> 21.9266666666667`,
"humidity" -> "kitchen",
"hum_value" -> 35.5`|>, <|"date" -> "2016-04-28 17:30:00",
"Appliances" -> 230, "lights" -> 0, "T_out" -> 9.85`,
"Press_mm_hg" -> 756.1`, "RH_out" -> 50.5`, "Windspeed" -> 3.5`,
"Visibility" -> 40, "Tdewpoint" -> 0, "rv1" -> 29.4617329724133`,
"rv2" -> 29.4617329724133`, "temperature" -> "laundry",
"temp_value" -> 21.5`, "humidity" -> "living",
"hum_value" -> 31.39`|>, <|"date" -> "2016-05-02 14:10:00",
"Appliances" -> 80, "lights" -> 0, "T_out" -> 16.1833333333333`,
"Press_mm_hg" -> 762.516666666667`, "RH_out" -> 34.5`,
"Windspeed" -> 3, "Visibility" -> 29.1666666666667`,
"Tdewpoint" -> 0.483333333333333`, "rv1" -> 40.099792391993105`,
"rv2" -> 40.099792391993105`, "temperature" -> "kitchen",
"temp_value" -> 22.4633333333333`, "humidity" -> "laundry",
"hum_value" -> 35.4`|>, <|"date" -> "2016-05-06 10:40:00",
"Appliances" -> 70, "lights" -> 0, "T_out" -> 17.4666666666667`,
"Press_mm_hg" -> 754.4`, "RH_out" -> 51.6666666666667`,
"Windspeed" -> 3, "Visibility" -> 40,
"Tdewpoint" -> 7.33333333333333`, "rv1" -> 2.572263346519321`,
"rv2" -> 2.572263346519321`, "temperature" -> "laundry",
"temp_value" -> 23.7`, "humidity" -> "office",
"hum_value" -> 35.79`|>, <|"date" -> "2016-05-10 07:20:00",
"Appliances" -> 50, "lights" -> 0, "T_out" -> 15.2666666666667`,
"Press_mm_hg" -> 751, "RH_out" -> 92.3333333333333`,
"Windspeed" -> 3, "Visibility" -> 40,
"Tdewpoint" -> 13.9666666666667`, "rv1" -> 5.569597787689418`,
"rv2" -> 5.569597787689418`, "temperature" -> "kitchen",
"temp_value" -> 24.89`, "humidity" -> "bathroom",
"hum_value" -> 57.2633333333333`|>, <|"date" ->
"2016-05-14 03:50:00", "Appliances" -> 60, "lights" -> 0,
"T_out" -> 8.85`, "Press_mm_hg" -> 754.25`,
"RH_out" -> 78.1666666666667`, "Windspeed" -> 3.66666666666667`,
"Visibility" -> 24.6666666666667`,
"Tdewpoint" -> 5.16666666666667`, "rv1" -> 37.84072716953233`,
"rv2" -> 37.84072716953233`, "temperature" -> "laundry",
"temp_value" -> 24.79`, "humidity" -> "north",
"hum_value" -> 21.3633333333333`|>, <|"date" ->
"2016-05-18 00:30:00", "Appliances" -> 50, "lights" -> 0,
"T_out" -> 12.4`, "Press_mm_hg" -> 756.05`, "RH_out" -> 76,
"Windspeed" -> 2, "Visibility" -> 33, "Tdewpoint" -> 8.2`,
"rv1" -> 3.8205624907277524`, "rv2" -> 3.8205624907277524`,
"temperature" -> "kitchen", "temp_value" -> 23.5`,
"humidity" -> "ironing",
"hum_value" -> 40.7`|>, <|"date" -> "2016-05-21 21:00:00",
"Appliances" -> 100, "lights" -> 10, "T_out" -> 18.8`,
"Press_mm_hg" -> 753.1`, "RH_out" -> 76, "Windspeed" -> 2,
"Visibility" -> 40, "Tdewpoint" -> 14.4`,
"rv1" -> 35.10843818075955`, "rv2" -> 35.10843818075955`,
"temperature" -> "laundry", "temp_value" -> 26.612`,
"humidity" -> "teenager",
"hum_value" -> 49.96`|>, <|"date" -> "2016-05-25 17:40:00",
"Appliances" -> 160, "lights" -> 0, "T_out" -> 16.3333333333333`,
"Press_mm_hg" -> 756.133333333333`,
"RH_out" -> 54.3333333333333`, "Windspeed" -> 1.66666666666667`,
"Visibility" -> 35.6666666666667`,
"Tdewpoint" -> 7.06666666666667`, "rv1" -> 16.66860954137519`,
"rv2" -> 16.66860954137519`, "temperature" -> "kitchen",
"temp_value" -> 24.5`, "humidity" -> "parents",
"hum_value" -> 37.3333333333333`|>}];
This does what I want (using Composition):
dsQuery =
Query[Query[GroupBy[#[[2]] &], GroupBy[#[[1]] &], Mean] @*
Query[All, {Replace[#date, #date ->
DateObject[#date, "Month"]] &, "temperature",
"temp_value"}]][energyDS];
Or, using RightComposition:
Query[Query[
All, {Replace[#date, #date -> DateObject[#date, "Month"]] &,
"temperature", "temp_value"}] /*
Query[GroupBy[#[[2]] &], GroupBy[#[[1]] &], Mean]][energyDS]
DateListPlot[dsQuery[#, All, 3] & /@ Keys[dsQuery],
PlotLegends -> Normal[Keys[dsQuery]]]
So is this how subqueries are embedded in a Query?
And it is still horribly slow. Is there a better way to handle the Replace function?
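Nesting separate Query objects with Composition is not strictly necessary. As a sketch, assuming the column names used in energyDS (monthlyMeansQuery and plotMonthlyMeans are illustrative names, not built-ins), the whole pipeline can be a single Query with one argument per level. GroupBy is a descending operator, so it is applied on the way down and the next argument then acts on each group; Mean is ascending, so it is applied to each group only after the innermost "temp_value" extraction has run on every row. Addressing columns by name also removes the positional #[[1]] and #[[2]] references. Note that this version still calls DateObject on every row, so on its own it does not fix the speed problem.

```mathematica
(* Hypothetical name. One Query, one argument per level of the result:
   level 1  GroupBy (descending): rows -> <|location -> rows|>
   level 2  GroupBy (descending): each location's rows -> <|month -> rows|>
   level 3  Mean (ascending): applied to each group after level 4 has run
   level 4  "temp_value": extract the temperature from each row *)
monthlyMeansQuery = Query[
   GroupBy[#temperature &],
   GroupBy[DateObject[#date, "Month"] &],
   Mean,
   "temp_value"
   ];

(* Hypothetical helper: Normal gives <|location -> <|month -> mean|>|>;
   each inner association becomes a list of {date, value} pairs *)
plotMonthlyMeans[ds_Dataset] := With[{series = Normal[ds]},
  DateListPlot[KeyValueMap[List, #] & /@ Values[series],
   PlotLegends -> Keys[series]]]
```

Usage: plotMonthlyMeans[monthlyMeansQuery[energyDS]].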
Answered by George Ellis on April 3, 2021
I'll try to give a more detailed answer. From the comments I understood what the main problem is, so here I'll give efficient code for handling the string dates.
A special function for getting the month as a DateObject, with memoization:
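(* Memoized: each distinct "YYYY-MM" string is converted to a DateObject only once *)
toMonthMem[s_] := toMonthMem[s] =
DateObject[Map[ToExpression] @ StringSplit[s, "-"]];
(* Keep only the "YYYY-MM" prefix so all timestamps in a month share one cache entry *)
toMonth[s_] :=
toMonthMem[StringTake[s, 7]];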
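A quick check of what the memoization does (sampleMonth is only an illustrative variable, and the helper definitions are repeated so the snippet runs on its own): every timestamp in the same month maps to the same cached DateObject, and toMonthMem gains exactly one stored rule per distinct "YYYY-MM" prefix.

```mathematica
(* Same definitions as above, repeated so this block is self-contained *)
ClearAll[toMonthMem, toMonth];
toMonthMem[s_] := toMonthMem[s] =
   DateObject[Map[ToExpression] @ StringSplit[s, "-"]];
toMonth[s_] := toMonthMem[StringTake[s, 7]];

sampleMonth = toMonth["2016-01-11 17:00:00"];  (* illustrative variable *)
{DateValue[sampleMonth, "Year"], DateValue[sampleMonth, "Month"]}
(* {2016, 1} *)

(* A different day in the same month reuses the cached DateObject *)
toMonth["2016-01-30 23:50:00"] === sampleMonth
(* True *)

(* One general rule plus one cached rule for "2016-01" *)
Length[DownValues[toMonthMem]]
(* 2 *)
```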
Now apply this function to the "date" column of the dataset:
AbsoluteTiming[Query[All, {"date" -> toMonth}] @ energyDS;]
(*Out[..] := {0.0026185, Null}*)
For performance testing, we can create a dataset with a large number of random dates:
randDateString :=
DateString[
RandomInteger[Round[AbsoluteTime[]]],
{"Year", "-", "Month", "-", "Day", " ", "Hour", ":", "Minute", ":", "Second"}
]
(* 16000 rows: copies of the first energyDS row, each with a random "date" string *)
datasetDatesTest =
Table[Prepend[Rest @ First @ Normal @ energyDS, "date" -> randDateString], {16000}];
AbsoluteTiming[Query[All, {"date" -> toMonth}] @ datasetDatesTest;]
(* Out[..] := {0.0838846, Null}*)
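Going one step further than memoization, as an untimed sketch rather than a measured result: the grouping itself only needs a cheap string key, so the date strings need not be turned into DateObjects per row at all. monthKey, keyToMonth and monthlyMeansFast are illustrative names. Rows are grouped and averaged on the "YYYY-MM" prefix, and only the resulting group keys (at most one per location and month) are converted to DateObjects with KeyMap, which makes a cache unnecessary.

```mathematica
(* Hypothetical helper: "2016-01-11 17:00:00" -> "2016-01", no date parsing *)
monthKey[s_String] := StringTake[s, 7];

(* Hypothetical helper: "2016-01" -> month-granularity DateObject;
   called once per group key, not once per row *)
keyToMonth[k_String] := DateObject[FromDigits /@ StringSplit[k, "-"], "Month"];

(* Step 1: group by location, then by the string month key, and average "temp_value".
   Step 2: replace the "YYYY-MM" keys of each inner association by DateObjects. *)
monthlyMeansFast =
  Query[
    GroupBy[#temperature &],
    GroupBy[monthKey[#date] &],
    Mean,
    "temp_value"
    ] /* Query[All, KeyMap[keyToMonth]];
```

monthlyMeansFast[energyDS] returns a Dataset of the form <|location -> <|month -> mean temperature|>|>. For plotting, each inner association of its Normal form can be turned into {date, value} pairs, for example with KeyValueMap[List, #] &, and the resulting list passed to DateListPlot.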
Answered by Kirill Belov on April 3, 2021