Stack Overflow Asked by Michael H on January 1, 2021
Thank you so much for taking your time. Please see the code below. The code works, but instead of searching for one word, I need to search for several words. I've tried:
`search_word = ['python', 'aws', 'sql']`
but this doesn't work. Any ideas on how to make this work?
Any suggestions for improving the code are welcome!
Code:
```python
import os

import PyPDF2

directory = r"/Users/resumes_for_testing/"
# define keywords
search_word = 'python'
# Loop through all PDFs in specified directory:
for filename in os.listdir(directory):
    if filename.endswith(".pdf"):
        # open the pdf file (joined with directory so this works from any working directory)
        with open(os.path.join(directory, filename), 'rb') as f:
            # PyPDF2 1.x API; newer releases use PdfReader, reader.pages and page.extract_text()
            reader = PyPDF2.PdfFileReader(f)
            # search for keywords
            for i in range(reader.numPages):
                page = reader.getPage(i)
                text = page.extractText()
                search_text = text.lower().split()
                for word in search_text:
                    if search_word in word:
                        print("The word '{}' was found in '{}'".format(search_word, filename))
```
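Why the list attempt fails: once `search_word` is a list, `search_word in word` asks whether a list is a substring of a string, which raises `TypeError: 'in <string>' requires string as left operand, not list`. The usual fix is to loop over the keywords (or use `any`). Below is a minimal sketch of the per-page check, kept independent of any PDF library so it can be tried on plain strings; the helper name `keywords_in_text` is hypothetical, not part of PyPDF2.

```python
def keywords_in_text(text, keywords):
    """Return the keywords that occur as a substring of any whitespace-separated token.

    Mirrors the question's logic (lowercase, split, substring test),
    but checks every keyword instead of a single one.
    """
    tokens = text.lower().split()
    found = []
    for keyword in keywords:
        needle = keyword.lower()
        # any() stops at the first token that contains the keyword
        if any(needle in token for token in tokens):
            found.append(keyword)
    return found


page_text = "Experience: Python scripting, AWS Lambda, PostgreSQL"
print(keywords_in_text(page_text, ["python", "aws", "sql"]))
# ['python', 'aws', 'sql']
```

Inside the question's loop you would call it with the output of `page.extractText()`. Note that the substring test means "sql" also matches "postgresql", exactly as the original single-word code would.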
Try pdfreader to extract the text:
```python
import os

from pdfreader import SimplePDFViewer, PageDoesNotExist


def search_in_file(fname, search_words):
    with open(fname, "rb") as fd:
        viewer = SimplePDFViewer(fd)
        try:
            while True:
                viewer.render()
                text = "".join(viewer.canvas.strings)
                # case-sensitive substring test on the page text
                for word in search_words:
                    if word in text:
                        print("The word '{}' was found in '{}' on page {}".format(word, fname, viewer.current_page_number))
                viewer.next()
        except PageDoesNotExist:
            # raised once we move past the last page
            pass


# define keywords
search_words = ['python', 'aws', 'sql']
# define directory
directory = "./"
# Loop through all PDFs in specified directory:
for fname in os.listdir(directory):
    if fname.endswith(".pdf"):
        search_in_file(os.path.join(directory, fname), search_words)
```
Answered by Maksym Polshcha on January 1, 2021
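A note on the pdfreader answer above: its test is case-sensitive, so "Python" on a page will not match 'python', and it prints as it goes rather than collecting results. A minimal sketch of a case-insensitive variant that gathers page numbers per keyword, assuming you already have one string per page (for example `"".join(viewer.canvas.strings)` for each rendered page); the helper `keyword_pages` is hypothetical.

```python
def keyword_pages(page_texts, keywords):
    """Map each keyword to the 1-based page numbers whose text contains it.

    page_texts: iterable of strings, one per page.
    Matching is a case-insensitive substring test.
    """
    hits = {keyword: [] for keyword in keywords}
    for page_number, text in enumerate(page_texts, start=1):
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword].append(page_number)
    return hits


pages = ["Senior Python developer", "Cloud: AWS, Python", "References available"]
print(keyword_pages(pages, ["python", "aws", "sql"]))
# {'python': [1, 2], 'aws': [2], 'sql': []}
```

Collecting results this way makes it easy to, say, keep only résumés where every keyword has a non-empty page list.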
You could try a small change in approach: instead of looping over search_text, loop over your list of search_words and use an if statement to check whether each one is an element of search_text (the list of words on the page), e.g.
```python
import os

import PyPDF2

directory = r"/Users/resumes_for_testing/"  # same directory as in the question
# define keywords
search_words = ['python', 'aws', 'sql']
# Loop through all PDFs in specified directory:
for filename in os.listdir(directory):
    if filename.endswith(".pdf"):
        # open the pdf file
        with open(os.path.join(directory, filename), 'rb') as f:
            reader = PyPDF2.PdfFileReader(f)
            # search for keywords
            for i in range(reader.numPages):
                page = reader.getPage(i)
                text = page.extractText()
                search_text = text.lower().split()
                # check each keyword against the page's list of words
                for word in search_words:
                    if word in search_text:
                        print("The word '{}' was found in '{}'".format(word, filename))
```
Answered by Matthew King on January 1, 2021
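One difference from the question's code: `word in search_text` is an exact match against list elements, so a token such as "python," or "(sql)" with attached punctuation is missed, while the question's substring test caught it. A minimal sketch of whole-word matching that strips surrounding punctuation first; the helper `whole_word_matches` is hypothetical, and it uses only `str.strip` with `string.punctuation` from the standard library.

```python
import string


def whole_word_matches(text, keywords):
    """Return the keywords that appear as whole words in text (case-insensitive).

    Tokens are split on whitespace and have leading/trailing punctuation removed,
    so "Python," and "(SQL)" still match, but "postgresql" does not match "sql".
    """
    tokens = {token.strip(string.punctuation) for token in text.lower().split()}
    return [keyword for keyword in keywords if keyword.lower() in tokens]


print(whole_word_matches("Skills: Python, AWS (EC2), PostgreSQL", ["python", "aws", "sql"]))
# ['python', 'aws']
```

Building a set of tokens once per page also makes each keyword lookup constant time, which helps if the keyword list grows.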