Stack Overflow Asked by TendaiM on February 24, 2021
Is there a way to read data from a pandas DataFrame and construct a tree using anytree?
Parent Child
A A1
A A2
A2 A21
I can do it with static values as follows. However, I want to automate this by reading the data from a pandas DataFrame with anytree.
>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)
Output is
A
├── A1
└── A2
    └── A21
This question, and especially the answer, have been adapted (copied, really) from:
Read data from a file and create a tree using anytree in python
Many thanks to @Fabien N
Create each node only if it does not already exist, and store the references in a dictionary, nodes, for later use. Set or change a child's parent as each row is processed. The roots of the forest of trees are the Parent values that never appear in the Child column, since a root is not a child of any node.
import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    # Create the parent and child nodes on first sight, then link them.
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]

data = pd.DataFrame(columns=["Parent", "Child"], data=[["A", "A1"], ["A", "A2"], ["A2", "A21"], ["B", "B1"]])
nodes = {}  # store references to created nodes
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"], data["Child"]):
    add_nodes(nodes, parent, child)

# Roots are the Parent values that never appear as a Child.
roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:  # you can use roots[0] directly if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))
Result:
A
├── A1
└── A2
    └── A21
B
└── B1
Update: printing a specific root:
root = 'A'  # change according to your use case
for pre, _, node in RenderTree(nodes[root]):
    print("%s%s" % (pre, node.name))
Correct answer by รยקคгรђשค on February 24, 2021
Please refer to @Fabian N's answer at "Read data from a file and create a tree using anytree in python" for details.
Below is an adaptation of his answer, which reads from an external file, so that it works with a pandas DataFrame:
df['Parent_child'] = df['Parent'] + ',' + df['child']  # column of comma-separated Parent,child
nodes = {}  # created once, so earlier nodes are not discarded when a new root row is reached
for index, row in df.iterrows():
    if row['child'] == row['Parent']:
        # Root row. I modified the DataFrame by concatenating a DataFrame of
        # all the roots in my data, with each root copied into both the
        # Parent and child columns. This can be skipped by statically setting
        # the roots, as long as the assumption highlighted by @Fabian in the
        # answer quoted above still holds (this assumes that the entries are
        # in such an order that a parent node was always introduced as a
        # child of another node beforehand).
        root = Node(row['Parent'])
        nodes[root.name] = root
    else:
        line = row['Parent_child'].split(",")
        name = "".join(line[1:]).strip()
        nodes[name] = Node(name, parent=nodes[line[0]])
# Prints the tree under the last root row encountered.
for pre, _, node in RenderTree(root):
    print("%s%s" % (pre, node.name))
If there is a better way to achieve the above, kindly post an answer and I will accept it as the solution.
Many thanks @Fabian N.
Answered by TendaiM on February 24, 2021