Is it possible to create a Python script that looks for files in a directory at a given time daily?

Stack Overflow Asked by UndefinedKid01 on January 1, 2022

So basically, I'm creating a directory where users can put CSV files. I want to create a Python script that looks in that folder every day at a given time (let's say noon) and picks up the latest file that was placed there, as long as it's not over a day old. But I'm not sure if that's possible.

It's this chunk of code that I would like to run if the app finds a new file in the desired directory:

def better_Match(results, best_percent = "Will need to get the match %"):
    result = {}
    result_list = [{ for item in result} for result in results]
    if result_list:
        score_list = [float(item['score']) for item in result_list]
        match_index = max(enumerate(score_list),key=lambda x: x[1])[0]
        logger.debug('MRCs:{}, Chosen MRC:{}'.format(score_list,score_list[match_index]))
        above_threshold = float(result_list[match_index]['score']) >= float(best_percent)
        if above_threshold:
            result = result_list[match_index]
    return result

def clean_plate_code(platecode):
    return str(platecode).lstrip('0').zfill(5)[:5]

def re_ch(file_path, orig_data, return_columns = ['ex_opbin']):
    list_of_chunk_files = list(file_path.glob('*.csv'))
    cb_ch = [pd.read_csv(f, sep=None, dtype=object, engine='python') for f in tqdm(list_of_chunk_files, desc='Combining ch', unit='chunk')]
    cb_ch = pd.concat(cb_ch)
    shared_columns = [column_name.replace('req_','') for column_name in cb_ch.columns if column_name.startswith('req_')]
    cb_ch.columns = cb_ch.columns.str.replace("req_", "")
    return_columns = return_columns + shared_columns
    cb_ch = cb_ch[return_columns]
    for column in shared_columns:
        cb_ch[column] = cb_ch[column].astype(str)
        orig_data[column] = orig_data[column].astype(str)
    final= orig_data.merge(cb_ch, how='left', on=shared_columns)
    return final

2 Answers

This will do the job!

import os
import threading
import pandas as pd

# Placeholder: set this to the directory you want to watch
DIR_PATH = 'path/to/watched/folder'


def create_csv_file():
    # Create files.csv, which will contain all the current files.
    # This only does anything on the first run.
    if not os.path.exists('files.csv'):
        list_of_files = os.listdir(DIR_PATH)
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv', index=False)


def check_for_new_files():
    files = pd.read_csv('files.csv')
    list_of_files = os.listdir(DIR_PATH)
    # Files present in the directory but not yet recorded in files.csv
    new_files = sorted(set(list_of_files) - set(files['files']))
    if new_files:
        print('New file added')
        # do what you want with new_files here

        # save the current list of files again
        pd.DataFrame({'files': list_of_files}).to_csv('files.csv', index=False)
    print('Finished for the day!')


create_csv_file()
ticker = threading.Event()
# Run the check every 86400 seconds = 24h
while not ticker.wait(86400):
    check_for_new_files()

It uses a threading.Event wait to check for new files every 86400 seconds (24 hours). The list of current files is saved to files.csv in the directory where the .py file lives; each day the script checks for files that are not yet listed in files.csv and adds them to it.
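
One caveat with a fixed 24-hour wait is that the check runs relative to when the script was started, not at a set clock time such as noon. Below is a minimal sketch of one way to compute the delay until the next noon using only the standard library. The helper name `seconds_until` and its parameters are hypothetical and not part of the answer above.

```python
from datetime import datetime, time, timedelta


def seconds_until(target, now=None):
    """Return seconds from `now` until the next occurrence of `target` (a datetime.time)."""
    now = now or datetime.now()
    next_run = datetime.combine(now.date(), target)
    if next_run <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        next_run += timedelta(days=1)
    return (next_run - now).total_seconds()


# Hypothetical usage: delay until the next 12:00 local time.
delay = seconds_until(time(12, 0))
```

The result could be passed to `ticker.wait(...)` in place of the fixed 86400, recomputed on each pass of the loop.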

Answered by JaniniRami on January 1, 2022

To run the script at a certain time:

You can use cron on Linux. On Windows, you can use Task Scheduler.

Here is an example of getting the latest file in a directory:

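import os

# output_folder is the directory to search
files = os.listdir(output_folder)
files = [os.path.join(output_folder, file) for file in files]
files = [file for file in files if os.path.isfile(file)]
# Note: on Unix, getctime is the last metadata change, not the creation time
latest_file = max(files, key=os.path.getctime)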
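
The question also asks to skip the file when it is more than a day old. Below is a minimal sketch of that check, assuming the modification time (`os.path.getmtime`) is an acceptable stand-in for when the file was placed in the folder. The function name `latest_recent_csv` and the `max_age_seconds` parameter are hypothetical.

```python
import os
import time


def latest_recent_csv(folder, max_age_seconds=86400, now=None):
    """Return the newest .csv path in `folder`, or None if none is recent enough."""
    now = time.time() if now is None else now
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    csv_files = [p for p in paths
                 if os.path.isfile(p) and p.lower().endswith('.csv')]
    if not csv_files:
        return None
    newest = max(csv_files, key=os.path.getmtime)
    # Reject the file if it was last modified more than max_age_seconds ago.
    if now - os.path.getmtime(newest) > max_age_seconds:
        return None
    return newest
```

A scheduled job (cron or Task Scheduler) could call this once a day and only run the processing code when the result is not None.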

Answered by DD_N0p on January 1, 2022
