Stack Overflow Asked by SuperAnnuated on December 27, 2020
I’ve been searching for a while and I think I may have built the block wrong, but I’m hoping there is a simple solution. I need to break apart a list, and every solution I could think of has failed (limited knowledge). My code looks for specific words within the text and pulls the section that the text is in; I am also adding the name of the file the text was found in. However, this all goes into the same list!
for filename in os.scandir(directory):
    if filename.path.endswith(".txt"):
        # DirEntry objects are path-like, so open() accepts them directly
        with open(filename, encoding='utf-8') as f:
            for line in f:
                if pattern.search(line) is not None:
                    # each entry is a (file name, matching line) tuple
                    list.append((filename.name, line.rstrip('\n')))
When this prints, it looks like:
[('AEE_0000018654_10Q_20200331_Item1A_excerpt.txt', 'In 2019, Ameren Missouri entered into a build-transfer agreement to acquire, after construction, an up-to 300-megawatt wind generation facility. In 2018, Ameren Missouri entered into a build-transfer agreement to acquire, after construction, an up-to 400-megawatt wind generation facility. Unless relevant regulations are modified by the IRS or applicable legislation is enacted by Congress to include an extension of the December 31, 2020 in-service date criteria, if any portion of these facilities is completed '), ('AEE_0000018654_10Q_20200331_Item2_excerpt.txt', 'an up-to 400-megawatt wind generation facility. These two agreements are subject to customary contract terms and conditions. The two build-transfer acquisitions collectively represent $1.2 billion of capital expenditures and would support Ameren Missouri’s compliance with the Missouri renewable energy standard. Ameren Missouri and the developers continue to monitor the impact to each project schedule. To date, neither developer has reported to Ameren Missouri that the projects will not be completed in 2020. Ameren Missouri expects the up-to 400-megawatt project to be placed in-service by the end of 2020. However, at this time, due to manufacturing, shipping, and other supply chain issues, and based on Ameren Missouri’s discussions with the developer, Ameren Missouri expects that a portion of the up-to 300-megawatt project, representing approximately $100 million of investment, could be placed in-service in the first quarter of 2021.')]
So, is there a way I can split this up so that the file name is in a separate list? I would like to use:
import pandas
df = pandas.DataFrame(data={"col1": filename, "col2": list})
df.to_csv("./SECParse.csv", sep=',',index=False)
but so far I am unable to break up this list I’ve created.
Any help?
Since you already have a list of tuples in the form (filename, text), I think you can just call
pd.DataFrame(ls, columns=['filename', 'text'])
where ls is the list you generated from your for loop and pd is pandas imported as pd (import pandas as pd).
Output should look like this:
                                         filename                                               text
0  AEE_0000018654_10Q_20200331_Item1A_excerpt.txt  In 2019, Ameren Missouri entered into a build-...
1   AEE_0000018654_10Q_20200331_Item2_excerpt.txt  an up-to 400-megawatt wind generation facility...
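To tie this back to the CSV export the question was after, here is a minimal sketch, not a definitive implementation. The two-row list is a hypothetical stand-in for the real results, and the CSV is written to an in-memory buffer only so the snippet runs anywhere. The last line shows one way to get two separate lists, if those are still wanted.

```python
import io

import pandas as pd

# Hypothetical stand-in for the list of (filename, text) tuples built by the loop.
ls = [
    ("file_a.txt", "first matching line"),
    ("file_b.txt", "second matching line"),
]

# Each tuple becomes one row; `columns` names the two tuple positions.
df = pd.DataFrame(ls, columns=["filename", "text"])

# Written to a buffer here; pass a path such as "./SECParse.csv" to write a file instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# zip(*ls) transposes the pairs into one sequence of names and one of texts.
# This assumes ls is non-empty; an empty list would fail to unpack.
filenames, texts = (list(column) for column in zip(*ls))
```

With the DataFrame approach there is usually no need to split the list first, since each tuple position already maps to its own column.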
Answered by Jeff on December 27, 2020