Super User Asked by user7758051 on November 12, 2021
I have about 50 XML files that are structured the same. I want to extract multiple values from each one, with the values being in different “rows” of the XML table.
Right now, I’m able to use Power Query to pull in one “row” of the XML table for one file.
This doesn’t work for me because:
Can I use Power Query for this or do I have to use VBA? Are there any resources available for this type of project?
The magic you're looking for is Table.Combine(MyTable[ColumnOfTables])
instead of MyTable{0}[ColumnOfTables]
.
For a given XML file (assuming they are all the same structure), try building a query to get all you need from just one file. I recommend not clicking the word "Table" within a table cell to navigate. Instead, type each step by hand and name the row you want rather than assume it's always going to be in the same order. Start with a query of
let
Source = Xml.Tables(File.Contents("C:TempYourFile.xml"))
in
Source
Then see what that looks like. In the XML file I was playing with, I saw a table with columns of "Name" and "Table". I only wanted the one row where "Name" column value was "Body", so I changed my query to
let
Source = Xml.Tables(File.Contents("C:TempYourFile.xml")),
Body = Source{[Name="Body"]}[Table]
in
Body
For some reason I don't understand yet, what I saw was a similar table with a single row containing the namespace. Easy enough. Repeat the same step.
let
Source = Xml.Tables(File.Contents("C:TempYourFile.xml")),
Body = Source{[Name="Body"]}[Table],
#"namespace url here" = Body{[Name="namespace url here"]}[Table]
in
#"namespace url here"
Let's say that gave me a table with a column called "Cases", each containing a table with a column of "Participants" and what I wanted was a list of all cases' participants
let
Source = Xml.Tables(File.Contents("C:TempYourFile.xml")),
Body = Source{[Name="Body"]}[Table],
#"namespace url here" = Body{[Name="namespace url here"]}[Table],
Cases = Table.Combine(#"namespace url here"[Cases]),
Participants = Table.Combine(Cases[Participants])
in
Participants
Once you get something like this for a particular file, duplicate the query, rename it to "fGetParticipants" (or whatever name makes sense to you), and replace the first two lines with the following:
(record as record) as table =>
let
FilePath = record[File Path],
Source = Xml.Tables(File.Contents(FilePath)),
This query will return a function that takes a table row as an argument, gets file path, then does the same as before.
Assuming you have or can create a query of file paths (either start with a query of a folder and add a new "File Path" column of [Folder Name] & [Name]
, or whatever, you just need to have a query that contains a column called "File Path" that contains the filepaths you want), now you can have a query like
let
Source = #"Name of your query containing a column called File Path",
#"Added Participants" = Table.AddColumn(Source, "Participants", fGetParticipants),
// Not all had participants, so some of the files resulted in errors...
#"Removed Errors" = Table.RemoveRowsWithErrors("#Added Participants", {"Participants"}),
// Combine them all into one huge table
#"All Participants" = Table.Combine(#"Removed Errors"[Participants])
in
#"All Participants"
I'm sure this won't get you all you need. You mentioned there was some nesting, you didn't give much detail, and there are probably some edge cases in your data that the above doesn't touch. However, this should help you move forward if you wanted to do this via PowerQuery.
Answered by JSmart523 on November 12, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP