TransWikia.com

XML, select several values

Mathematica Asked by Michiel van Mens on December 29, 2020

I import an XML file using:

xmldatatest = 
 Import["C:......name.xml", {"XML", "XMLObject"}, 
  "ReadDTD" -> False]

The file looks like this (the real file is much longer):

> <?xml version="1.0" encoding="UTF-8"?>
> <TimeSeries xmlns="http://www.wldelft.nl/fews/PI"
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="http://www.wldelft.nl/fews/PI http://fews.wldelft.nl/schemas/version1.0/pi-schemas/pi_timeseries.xsd"
>     version="1.23" xmlns:fs="http://www.wldelft.nl/fews/fs">
>     <timeZone>1.0</timeZone>
>     <series>
>         <header>
>             <type>instantaneous</type>
>             <moduleInstanceId>ImportGWNET</moduleInstanceId>
>             <locationId>20793_WaterDiepte_WdH</locationId>
>             <parameterId>WaterDiepte_WdH</parameterId>
>             <timeStep unit="nonequidistant"/>
>             <startDate date="2020-06-30" time="12:00:00"/>
>             <endDate date="2020-07-30" time="12:03:00"/>
>             <missVal>NaN</missVal>
>             <stationName>A010008A_WaterDiepte_WdH</stationName>
>             <lat>52.41827906418918</lat>
>             <lon>4.736055394401724</lon>
>             <x>110702.58</x>
>             <y>492473.03</y>
>             <units>m/nap</units>
>         </header>
>         <event date="2020-06-30" time="12:00:00" value="-3.72" flag="0" fs:V1="V"/>
>         <event date="2020-07-30" time="12:03:00" value="-2.57" flag="0" fs:V1="V"/>
>     </series>
>     <series>
>         <header>
>             <type>instantaneous</type>
>             <moduleInstanceId>ImportGWNET</moduleInstanceId>
>             <locationId>20794_WaterDiepte_WdH</locationId>
>             <parameterId>WaterDiepte_WdH</parameterId>
>             <timeStep unit="nonequidistant"/>
>             <startDate date="2020-06-30" time="12:00:00"/>
>             <endDate date="2020-07-30" time="12:09:00"/>
>             <missVal>NaN</missVal>
>             <stationName>A010009A_WaterDiepte_WdH</stationName>
>             <lat>52.417894313462774</lat>
>             <lon>4.73618879136875</lon>
>             <x>110711.27</x>
>             <y>492430.14</y>
>             <units>m/nap</units>
>         </header>
>         <event date="2020-06-30" time="12:00:00" value="-3.56" flag="0" fs:V1="V"/>
>         <event date="2020-07-30" time="12:09:00" value="-2.59" flag="0" fs:V1="V"/>
>     </series>
>     <series>
>         <header>
>             <type>instantaneous</type>
>             <moduleInstanceId>ImportGWNET</moduleInstanceId>
>             <locationId>20795_WaterDiepte_WdH</locationId>
>             <parameterId>WaterDiepte_WdH</parameterId>
>             <timeStep unit="nonequidistant"/>
>             <startDate date="2020-06-30" time="12:00:00"/>
>             <endDate date="2020-07-30" time="12:20:00"/>
>             <missVal>NaN</missVal>
>             <stationName>A010010A_WaterDiepte_WdH</stationName>
>             <lat>52.417453753682466</lat>
>             <lon>4.736404882357068</lon>
>             <x>110725.53</x>
>             <y>492380.99</y>
>             <units>m/nap</units>
>         </header>
>         <event date="2020-06-30" time="12:00:00" value="-3.38" flag="0" fs:V1="V"/>
>         <event date="2020-07-30" time="12:20:00" value="-2.28" flag="0" fs:V1="V"/>
>     </series>
>     <series>
>         <header>
>             <type>instantaneous</type>
>             <moduleInstanceId>ImportGWNET</moduleInstanceId>
>             <locationId>12134_WaterDiepte_WdH</locationId>
>             <parameterId>WaterDiepte_WdH</parameterId>
>             <timeStep unit="nonequidistant"/>
>             <startDate date="2016-02-15" time="12:00:00"/>
>             <endDate date="2020-07-21" time="12:55:00"/>
>             <missVal>NaN</missVal>
>             <stationName>M2-0609O_WaterDiepte_WdH</stationName>
>             <lat>52.34412530887633</lat>
>             <lon>4.863795961286827</lon>
>             <x>119333.0</x>
>             <y>484152.0</y>
>             <units>m/nap</units>
>         </header>
>         <event date="2016-02-15" time="12:00:00" value="-0.41" flag="0" fs:V1="V"/>
>         <event date="2016-03-10" time="12:00:00" value="-0.44" flag="0" fs:V1="V"/>
>         <event date="2016-04-20" time="13:00:00" value="-0.51" flag="0" fs:V1="V"/>
>         <event date="2016-05-27" time="13:00:00" value="-0.46" flag="0" fs:V1="V"/>
>         <event date="2016-07-08" time="13:00:00" value="-0.43" flag="0" fs:V1="V"/>
>         <event date="2016-08-23" time="13:00:00" value="-0.46" flag="0" fs:V1="V"/>
>         <event date="2016-09-29" time="13:00:00" value="-0.48" flag="0" fs:V1="V"/>
>         <event date="2016-11-17" time="12:00:00" value="-0.52" flag="0" fs:V1="V"/>
>         <event date="2017-01-05" time="12:00:00" value="-0.44" flag="0" fs:V1="V"/>
>         <event date="2017-02-27" time="12:00:00" value="-0.47" flag="0" fs:V1="V"/>
>         <event date="2017-07-03" time="13:00:00" value="-0.44" flag="0" fs:V1="V"/>
>         <event date="2018-10-12" time="13:40:00" value="-0.38" flag="1"/>
>         <event date="2019-02-07" time="13:04:00" value="-0.38" flag="1"/>
>         <event date="2019-07-09" time="10:07:00" value="-0.39" flag="1"/>
>         <event date="2019-08-15" time="10:17:00" value="-0.39" flag="1" fs:V1="V"/>
>         <event date="2019-09-24" time="12:45:00" value="-0.38" flag="1" fs:V1="V"/>
>         <event date="2020-04-06" time="11:40:47" value="-0.38" flag="1" fs:V1="V"/>
>         <event date="2020-05-11" time="13:32:10" value="-0.34" flag="1" fs:V1="V"/>
>         <event date="2020-07-21" time="12:55:00" value="-0.37" flag="0" fs:V1="V"/>
>     </series>
> </TimeSeries>
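Before writing positional patterns against this file, it can help to look at its symbolic XML form first. A minimal sketch, assuming a cut-down, namespace-free sample string and `ImportString` in place of the file `Import`; `sample`, `firstEventAttrs` and `seriesTags` are illustrative names, not from the question. In the real file the `fs:V1` attribute should appear as one more entry in each event's attribute list, which is why the patterns below end the attribute list with `___`.

```mathematica
(* hypothetical cut-down sample in the same layout as the file above;
   single-quoted attributes avoid escaping inside the Mathematica string *)
sample = StringJoin[
   "<TimeSeries><series><header>",
   "<type>instantaneous</type>",
   "<startDate date='2020-06-30' time='12:00:00'/>",
   "</header>",
   "<event date='2020-06-30' time='12:00:00' value='-3.72' flag='0'/>",
   "<event date='2020-07-30' time='12:03:00' value='-2.57' flag='0'/>",
   "</series></TimeSeries>"];

xml = ImportString[sample, {"XML", "XMLObject"}];

(* attributes of the first <event>: a plain list of rules *)
firstEventAttrs =
 FirstCase[xml, XMLElement["event", attrs_, _] :> attrs, Missing[], Infinity]
(* {"date" -> "2020-06-30", "time" -> "12:00:00", "value" -> "-3.72", "flag" -> "0"} *)

(* tags of the children of <series>, in document order *)
seriesTags =
 FirstCase[xml, XMLElement["series", _, kids_] :> kids[[All, 1]], Missing[], Infinity]
(* {"header", "event", "event"} *)
```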

For analysis purposes I want to create a flat file.
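Once the data is a list of equal-length rows, writing the flat file itself should be a single `Export` call. A minimal sketch, assuming such rows already exist; the column names, row values and file name below are hypothetical placeholders.

```mathematica
(* hypothetical column names for a cut-down row layout *)
columns = {"locationId", "date", "time", "value", "flag"};

(* one row per event; values kept as strings, as they come out of the XML *)
rows = {
   {"20793_WaterDiepte_WdH", "2020-06-30", "12:00:00", "-3.72", "0"},
   {"20793_WaterDiepte_WdH", "2020-07-30", "12:03:00", "-2.57", "0"}};

(* hypothetical output location in the system temp directory *)
file = FileNameJoin[{$TemporaryDirectory, "timeseries_flat.csv"}];
Export[file, Prepend[rows, columns], "CSV"];

(* read it back: one header line plus one line per event *)
Length[Import[file, "CSV"]]
(* 3 *)
```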

xmldata1 = Cases[xmldatatest,
  XMLElement["series", {}, {XMLElement["header", {}
      , {
       XMLElement["type", {}, {aa_}]
       , XMLElement["moduleInstanceId", {}, {bb_}]
       , XMLElement["locationId", {}, {cc_}]
       , XMLElement["parameterId", {}, {dd_}]
       , XMLElement["timeStep", {"unit" -> ee_}, {}]
       , XMLElement["startDate", {"date" -> ff_, "time" -> gg_}, {}]
       , XMLElement["endDate", {"date" -> hh_, "time" -> ii_}, {}]
       , XMLElement["missVal", {}, {jj_}]
       , XMLElement["stationName", {}, {kk_}]
       , XMLElement["lat", {}, {ll_}]
       , XMLElement["lon", {}, {mm_}]
       , XMLElement["x", {}, {nn_}]
       , XMLElement["y", {}, {oo_}]
       , XMLElement["units", {}, {pp_}]
       }], 
     XMLElement[
      "event", {"date" -> a_, "time" -> b_, "value" -> c_, 
       "flag" -> d_ , ___}, {}], ___}] -> {aa, bb, cc, dd, ee, ff, gg,
     hh, ii, jj, kk, ll, mm, nn, oo, pp, a, b, c, d}, Infinity]

The output is a list like:

{{"instantaneous", "ImportGWNET", "20793_WaterDiepte_WdH", 
  "WaterDiepte_WdH", "nonequidistant", "2020-06-30", "12:00:00", 
  "2020-07-30", "12:03:00", "NaN", "A010008A_WaterDiepte_WdH", 
  "52.41827906418918", "4.736055394401724", "110702.58", "492473.03", 
  "m/nap", "2020-06-30", "12:00:00", "-3.72", "0"}, {"instantaneous", 
  "ImportGWNET", "20794_WaterDiepte_WdH", "WaterDiepte_WdH", 
  "nonequidistant", "2020-06-30", "12:00:00", "2020-07-30", 
  "12:09:00", "NaN", "A010009A_WaterDiepte_WdH", "52.417894313462774",
   "4.73618879136875", "110711.27", "492430.14", "m/nap", 
  "2020-06-30", "12:00:00", "-3.56", "0"}, {"instantaneous", 
  "ImportGWNET", "20795_WaterDiepte_WdH", "WaterDiepte_WdH", 
  "nonequidistant", "2020-06-30", "12:00:00", "2020-07-30", 
  "12:20:00", "NaN", "A010010A_WaterDiepte_WdH", "52.417453753682466",
   "4.736404882357068", "110725.53", "492380.99", "m/nap", 
  "2020-06-30", "12:00:00", "-3.38", "0"}, {"instantaneous", 
  "ImportGWNET", "12134_WaterDiepte_WdH", "WaterDiepte_WdH", 
  "nonequidistant", "2016-02-15", "12:00:00", "2020-07-21", 
  "12:55:00", "NaN", "M2-0609O_WaterDiepte_WdH", "52.34412530887633", 
  "4.863795961286827", "119333.0", "484152.0", "m/nap", "2016-02-15", 
  "12:00:00", "-0.41", "0"}}

The issue is that it only takes the first XMLElement["event"] of each series. I want a file with every XMLElement["event"], each combined with the values from its "header".

Any suggestions on how to fix this?
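The symptom comes from how sequence patterns match: in the pattern above, the single `XMLElement["event", ...]` term matches exactly one child of `<series>`, and the trailing `___` silently absorbs all remaining events. A toy reproduction, using placeholder symbols `s`, `h` and `e` that do not appear in the question:

```mathematica
(* one explicit element pattern: matches only the first e[...], ___ eats the rest *)
oneEvent = Cases[{s[h, e[1], e[2], e[3]]}, s[h, e[x_], ___] :> x]
(* {1} *)

(* a named, Longest repeated pattern captures every consecutive e[...] *)
allEvents = Cases[{s[h, e[1], e[2], e[3]]}, s[h, evs : Longest[e[_] ..], ___] :> {evs}]
(* {{e[1], e[2], e[3]}} *)
```

Binding the repeated part to a name and wrapping it in `Longest` is the same idea the answer relies on.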

One Answer

You can capture all consecutive events with a named, Longest repeated pattern like this:

xmldata1 = Cases[xmldatatest, 
  XMLElement[
    "series", {}, {XMLElement[
      "header", {}, {XMLElement["type", {}, {aa_}], 
       XMLElement["moduleInstanceId", {}, {bb_}], 
       XMLElement["locationId", {}, {cc_}], 
       XMLElement["parameterId", {}, {dd_}], 
       XMLElement["timeStep", {"unit" -> ee_}, {}], 
       XMLElement["startDate", {"date" -> ff_, "time" -> gg_}, {}], 
       XMLElement["endDate", {"date" -> hh_, "time" -> ii_}, {}], 
       XMLElement["missVal", {}, {jj_}], 
       XMLElement["stationName", {}, {kk_}], 
       XMLElement["lat", {}, {ll_}], XMLElement["lon", {}, {mm_}], 
       XMLElement["x", {}, {nn_}], XMLElement["y", {}, {oo_}], 
       XMLElement["units", {}, {pp_}]}], 
     events : Longest[XMLElement["event", ___] ..], ___}] :> 
   Flatten @ {
     aa, bb, cc, dd, ee, ff, gg, hh, ii, jj, kk, ll, mm, nn, oo, pp, 
     Lookup[{events}[[All, 2]], {"date", "time", "value", "flag"}]
     },
  Infinity
 ]
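If you keep the `Flatten`, each row holds the header fields followed by four fields (date, time, value, flag) per event, so it can be split back into one row per event afterwards. A minimal sketch; `toEventRows`, `nHeader` and the shortened `flatRow` are hypothetical, with only two header fields to keep it short.

```mathematica
(* a flattened row: header fields first, then 4 fields per event;
   header shortened to two hypothetical fields *)
nHeader = 2;
flatRow = {"instantaneous", "20793_WaterDiepte_WdH",
   "2020-06-30", "12:00:00", "-3.72", "0",
   "2020-07-30", "12:03:00", "-2.57", "0"};

(* repeat the header in front of every 4-field event chunk *)
toEventRows[row_List, n_Integer] :=
 Join[Take[row, n], #] & /@ Partition[Drop[row, n], 4]

eventRows = toEventRows[flatRow, nHeader]
(* {{"instantaneous", "20793_WaterDiepte_WdH", "2020-06-30", "12:00:00", "-3.72", "0"},
    {"instantaneous", "20793_WaterDiepte_WdH", "2020-07-30", "12:03:00", "-2.57", "0"}} *)
```

With the 16 header fields of the real pattern, something like `Flatten[toEventRows[#, 16] & /@ xmldata1, 1]` should give one 20-column row per event.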

You may not want to use Flatten, though: it produces rows of variable length, because each series contributes four extra values per event.
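An alternative that yields one fixed-length row per event, without relying on the exact order of the header children, is to pick the header and the events out of each `<series>` separately. A sketch under the assumption that every header child has either text content or only attributes (true for the file in the question); `headerFields`, `eventFields` and the cut-down `sample` are hypothetical names.

```mathematica
(* hypothetical cut-down sample: two <series>, with two and one <event> *)
sample = StringJoin[
   "<TimeSeries>",
   "<series><header><type>instantaneous</type>",
   "<locationId>20793_WaterDiepte_WdH</locationId>",
   "<startDate date='2020-06-30' time='12:00:00'/></header>",
   "<event date='2020-06-30' time='12:00:00' value='-3.72' flag='0'/>",
   "<event date='2020-07-30' time='12:03:00' value='-2.57' flag='0'/></series>",
   "<series><header><type>instantaneous</type>",
   "<locationId>12134_WaterDiepte_WdH</locationId>",
   "<startDate date='2016-02-15' time='12:00:00'/></header>",
   "<event date='2016-02-15' time='12:00:00' value='-0.41' flag='0'/></series>",
   "</TimeSeries>"];
xml = ImportString[sample, {"XML", "XMLObject"}];

(* header values in document order: text content if present, else attribute values *)
headerFields[XMLElement["header", _, children_]] := Flatten @ Cases[children,
   XMLElement[_, attrs_, content_] :> If[content === {}, attrs[[All, 2]], content]]

(* the four event attributes kept in the question; absent ones become Missing[...] *)
eventFields[XMLElement["event", attrs_, _]] :=
 Lookup[attrs, {"date", "time", "value", "flag"}]

(* one row per event: that series' header values, then the event's values *)
rows = Flatten[
   Cases[xml,
    XMLElement["series", _, children_] :>
     With[{h = headerFields[FirstCase[children, XMLElement["header", __]]]},
      Join[h, eventFields[#]] & /@ Cases[children, XMLElement["event", __]]],
    Infinity],
   1];
```

On the real file, using `xmldatatest` in place of `xml` should give the same 16 header values as `aa` through `pp` in the question, followed by date, time, value and flag for each event.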

Correct answer by Sjoerd Smit on December 29, 2020
