Stack Overflow Asked by UndefinedKid01 on January 1, 2022
I'm creating a directory where users can drop CSV files. I want a Python script that looks in that folder every day at a given time (let's say noon) and picks up the latest file placed there, provided it is not more than a day old. I'm not sure whether that's possible.
This is the code I would like to run when the app finds a new file in the desired directory:
def better_Match(results, best_percent = "Will need to get the match %"):
    result = {}
    result_list = [{item.name:item.text for item in result} for result in results]
    if result_list:
        score_list = [float(item['score']) for item in result_list]
        match_index = max(enumerate(score_list),key=lambda x: x[1])[0]
        logger.debug('MRCs:{}, Chosen MRC:{}'.format(score_list,score_list[match_index]))
        logger.debug(result_list[match_index])
        above_threshold = float(result_list[match_index]['score']) >= float(best_percent)
        if above_threshold:
            result = result_list[match_index]
    return result

def clean_plate_code(platecode):
    return str(platecode).lstrip('0').zfill(5)[:5]

def re_ch(file_path, orig_data, return_columns = ['ex_opbin']):
    list_of_chunk_files = list(file_path.glob('*.csv'))
    cb_ch = [pd.read_csv(f, sep=None, dtype=object, engine='python') for f in tqdm(list_of_chunk_files, desc='Combining ch', unit='chunk')]
    cb_ch = pd.concat(cb_ch)
    shared_columns = [column_name.replace('req_','') for column_name in cb_ch.columns if column_name.startswith('req_')]
    cb_ch.columns = cb_ch.columns.str.replace("req_", "")
    return_columns = return_columns + shared_columns
    cb_ch = cb_ch[return_columns]
    for column in shared_columns:
        cb_ch[column] = cb_ch[column].astype(str)
        orig_data[column] = orig_data[column].astype(str)
    final = orig_data.merge(cb_ch, how='left', on=shared_columns)
    return final
This will do the job!
import os
import time
import threading
import pandas as pd

DIR_PATH = 'DIR_PATH_HERE'

def create_csv_file():
    # Create files.csv, which records all files currently in DIR_PATH.
    # This only happens on the first run.
    if not os.path.exists('files.csv'):
        list_of_files = os.listdir(DIR_PATH)
        list_of_files.append('files.csv')
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv')

def check_for_new_files():
    create_csv_file()
    files = pd.read_csv('files.csv')
    list_of_files = os.listdir(DIR_PATH)
    if len(files.files) != len(list_of_files):
        print('New file added')
        # do what you want
        # save your Excel file with the name sample.xlsx
        # append it to the list of files and take the set, so sample.xlsx is not listed twice on the next run
        list_of_files.append('sample.xlsx')
        list_of_files = list(set(list_of_files))
        # save the current list of files again
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv')
    print('Finished for the day!')

ticker = threading.Event()
# Run the check every 86400 seconds = 24 h
while not ticker.wait(86400):
    check_for_new_files()
It uses a threading.Event to check for new files every 86400 s (24 h). The list of current files is saved to files.csv in the directory the script runs from. On each run, the script compares the number of files in the directory with the number recorded in files.csv, and if they differ it treats that as a new file and writes the updated list back to files.csv.
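Comparing only the lengths of the two lists misses the case where one file is removed and another is added between runs. A minimal sketch of the same idea using a set difference follows. The helper name find_new_files and the plain-text state file (one file name per line) are assumptions for illustration, not part of the answer above.

```python
import os


def find_new_files(dir_path, seen_path):
    """Return names in dir_path that are not yet recorded in seen_path.

    Hypothetical helper: seen_path is a text file holding one file name
    per line. It is rewritten on every call with the current listing.
    """
    seen = set()
    if os.path.exists(seen_path):
        with open(seen_path, encoding="utf-8") as fh:
            seen = {line.strip() for line in fh if line.strip()}

    # Only regular files count; subdirectories are ignored.
    current = {
        name for name in os.listdir(dir_path)
        if os.path.isfile(os.path.join(dir_path, name))
    }
    new_files = sorted(current - seen)

    # Record the current listing so the next run reports only later additions.
    with open(seen_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(sorted(current)))
    return new_files
```

Note that on the very first run, when the state file does not exist yet, every file already in the directory is reported as new. Keeping the state file outside the watched directory avoids it being listed itself.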
Answered by JaniniRami on January 1, 2022
To run the script at a certain time:
You can use cron on Linux. On Windows you can use Task Scheduler.
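If an external scheduler is not an option, the script can also compute how long to sleep until the next noon by itself. This is only a sketch: the function names seconds_until and run_daily_at are made up for illustration, and it uses naive local datetimes, so it ignores daylight-saving shifts.

```python
import time
from datetime import datetime, timedelta


def seconds_until(hour, minute=0, now=None):
    """Seconds from `now` until the next occurrence of hour:minute (local time)."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # The time has already passed today, so aim for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily_at(job, hour, minute=0):
    """Call job() once a day at hour:minute. Runs forever."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()
```

For example, run_daily_at(check, 12) would call a function named check (whatever processing you need) every day at noon. Unlike cron or Task Scheduler, this only works while the Python process stays alive.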
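Here is an example for getting the latest file in a directory:
import os

# output_folder is the path of the directory to scan
files = os.listdir(output_folder)
files = [os.path.join(output_folder, file) for file in files]
files = [file for file in files if os.path.isfile(file)]
# Note: on Unix, getctime is the last metadata change time, not the creation time
latest_file = max(files, key=os.path.getctime)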
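The question also asks that the file be no more than a day old. One way to add that check is sketched below. It swaps in os.path.getmtime (last modification time) in place of getctime, restricts the search to .csv files, and the function name latest_recent_csv and its parameters are assumptions for illustration.

```python
import os
import time


def latest_recent_csv(folder, max_age_seconds=86400, now=None):
    """Return the most recently modified .csv in folder.

    Returns None if the folder has no .csv files or if the newest one
    was modified more than max_age_seconds ago.
    """
    now = time.time() if now is None else now
    candidates = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".csv")
    ]
    candidates = [path for path in candidates if os.path.isfile(path)]
    if not candidates:
        return None

    latest = max(candidates, key=os.path.getmtime)
    # Reject the file if it is older than the allowed age.
    if now - os.path.getmtime(latest) > max_age_seconds:
        return None
    return latest
```

A daily job could call latest_recent_csv(output_folder) and only run the processing code when the result is not None.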
Answered by DD_N0p on January 1, 2022