Data Science Asked on March 29, 2021
I have created the following function, which converts an XML file to a DataFrame. It works well for files smaller than 1 GB, but for anything larger the session runs out of RAM and crashes (13 GB of RAM on Google Colab). The same happens if I run it locally in a Jupyter Notebook (laptop with 4 GB of RAM). Is there a way to optimize the code?
Code
# Libraries
import pandas as pd
import xml.etree.cElementTree as ET

# Function to convert XML file to Pandas DataFrame
def xml2df(file_path):
    # Parsing XML file and obtaining root
    tree = ET.parse(file_path)
    root = tree.getroot()

    dict_list = []
    for _, elem in ET.iterparse(file_path, events=("end",)):
        if elem.tag == "row":
            dict_list.append(elem.attrib)  # PARSE ALL ATTRIBUTES
            elem.clear()

    df = pd.DataFrame(dict_list)
    return df
Part of an XML File (‘Badges.xml’)
<badges>
<row Id="82946" UserId="3718" Name="Teacher" Date="2008-09-15T08:55:03.923" Class="3" TagBased="False" />
<row Id="82947" UserId="994" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
<row Id="82949" UserId="3893" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
<row Id="82950" UserId="4591" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
<row Id="82951" UserId="5196" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
<row Id="82952" UserId="2635" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
<row Id="82953" UserId="1113" Name="Teacher" Date="2008-09-15T08:55:03.957" Class="3" TagBased="False" />
This conversion is needed so that I can perform further data analysis.
I have asked this on StackOverflow (Link) but the answers did not solve my query. I hope to find a solution here.
import dask
import dask.bag as db
import dask.dataframe as dd
from dask.dot import dot_graph
from dask.diagnostics import ProgressBar
# dask.set_options was removed in later Dask releases; dask.config.set replaces it
dask.config.set(scheduler='processes')
tags_xml = db.read_text('data/Tags.xml', encoding='utf-8')
tags_xml.take(10)
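`take(10)` returns raw lines of text, so each line still has to be turned into a record before it can become a DataFrame. Here is a minimal sketch of that step, assuming every `<row ... />` element sits on a single line, as in the Badges.xml excerpt from the question. `parse_row` is a hypothetical helper name, not part of Dask.

```python
import xml.etree.ElementTree as ET


def parse_row(line):
    """Return the attributes of a single-line <row ... /> element, else None."""
    line = line.strip()
    if not line.startswith("<row"):
        # XML declaration, <badges>, </badges>, or a blank line
        return None
    # Copy the attributes into a plain dict so nothing refers back to the element
    return dict(ET.fromstring(line).attrib)


print(parse_row('<row Id="82946" UserId="3718" Name="Teacher" />'))
```

With the bag above, this could be applied as `tags_xml.map(parse_row).filter(lambda r: r is not None).to_dataframe()`, which builds a Dask DataFrame lazily instead of one large in-memory list. For a single large file, passing a `blocksize` to `db.read_text` is probably needed as well, so that the file is split into several partitions.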
Refer to this link for the complete tutorial: Dask with XML
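If Dask is not an option, the standard library alone may be enough. Note that the function in the question calls `ET.parse(file_path)` before the `iterparse` loop, which builds the entire tree in memory, and then collects every row in `dict_list`. The sketch below streams rows straight to a CSV file instead. It assumes that `row` elements are direct children of the root, as in Badges.xml, and that the first row carries every column name unless `columns` is passed. `xml_rows_to_csv` and its parameters are hypothetical names chosen for illustration.

```python
import csv
import xml.etree.ElementTree as ET


def xml_rows_to_csv(xml_path, csv_path, row_tag="row", columns=None):
    """Stream row attributes from an XML file into a CSV file.

    Returns the number of rows written. Processed rows are discarded
    immediately, so memory use does not grow with file size.
    """
    count = 0
    context = ET.iterparse(xml_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = None
        for event, elem in context:
            if event != "end" or elem.tag != row_tag:
                continue
            if writer is None:
                # Assumption: without explicit columns, the first row names them all.
                # Attributes missing from a row become "", unknown ones are skipped.
                fieldnames = columns or list(elem.attrib)
                writer = csv.DictWriter(out, fieldnames=fieldnames,
                                        restval="", extrasaction="ignore")
                writer.writeheader()
            writer.writerow(elem.attrib)
            count += 1
            root.clear()  # drop the processed row from the tree
    return count
```

The resulting CSV can then be read with `pd.read_csv`, either in pieces via its `chunksize` parameter or with explicit `dtype` values, since a DataFrame made only of strings tends to take far more memory than one with numeric columns.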
Answered by Syenix on March 29, 2021