Stack Overflow Asked on December 27, 2021
I have an AWS SageMaker instance, with full access to the s3 bucket, running a loop that loads files from an s3 folder as they appear using boto3, reads them, does some processing, then deletes each file.
The process below works fine for any number of files already in the s3 folder. However, if a new file is created while the loop is running, then when it later tries to load that file (`dataframe = read_csv(filepath, header=None)`) I get a permission denied error. `is_file_available` spots that the file is there, but the error occurs when trying to open it.
Is there something I am missing, e.g. closing a connection?
I have to restart the kernel and the process to fix the issue.
import time
import boto3
from pandas import read_csv

s3 = boto3.resource('s3')

# Check if a file is available to predict and return its file id (int)
def is_file_available():
    my_bucket = s3.Bucket('processing-ml')
    id = -1
    for obj in my_bucket.objects.filter(Prefix='to-process/acc'):
        filename = obj.key
        id = mk_int(filename)
        print('acc.csv found id = ', id)
    return id

# Load a single file as a numpy array
def load_file(filepath):
    dataframe = read_csv(filepath, header=None)
    return dataframe.values

# Load data
def load_dataset_group(id):
    filepath = 's3://processing-ml/to-process/acc' + str(id) + '.csv'
    print('filepath', filepath)
    data = load_file(filepath)
    loaded = list()
    loaded.append(data)
    print(data.shape)
    return loaded

# Run forever
while True:
    file_id = is_file_available()
    if file_id != -1:
        data = load_dataset_group(file_id)
        ... # do stuff with data ...
        # Delete the file in s3 now we are finished with it
        s3.Object('processing-ml', 'to-process/acc' + str(file_id) + '.csv').delete()
    time.sleep(1)
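The helper `mk_int` is not shown in the question. A minimal sketch of what it presumably does, assuming keys like `to-process/acc7.csv` carry the id in the filename, and a bare `acc.csv` maps to id 0 (as implied by the `if id == 0` branch in the answer below) — this is a hypothetical reconstruction, not the asker's actual code:

```python
import re

def mk_int(filename):
    # Hypothetical reconstruction: pull the numeric id out of keys like
    # 'to-process/acc7.csv'. A bare 'to-process/acc.csv' is treated as
    # id 0; anything that doesn't match the pattern returns -1, the
    # "no file" sentinel used by is_file_available().
    match = re.search(r'acc(\d*)\.csv$', filename)
    if match is None:
        return -1
    return int(match.group(1)) if match.group(1) else 0
```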
I found the issue is related to pandas `read_csv()`. For some reason (or some mistake in my implementation), it hits permission denied for files that appear in the s3 bucket after the while loop has started. My solution is the code below, which reads the csv via `obj.get()` instead:
import boto3
import numpy as np

def load_dataset_group(id):
    s3 = boto3.resource('s3')
    bucket = 'processing-ml'
    if id == 0:
        key = 'to-process/acc.csv'
    else:
        key = 'to-process/acc' + str(id) + '.csv'
    obj = s3.Object(bucket, key)
    data = obj.get()['Body'].read().decode('utf-8')
    print(len(data))
    data = data.replace('\n', ',')
    my_list = data.split(",")
    output = list()
    row = list()
    count = 0
    for x in range(len(my_list) - 1):  # after the last comma is a blank value
        row.append(int(my_list[x]))
        count += 1
        if count == 3:
            count = 0
            output.append(row)
            row = list()
    output = np.array(output)
    print(output.shape)
    return output
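The manual split-and-append loop above can also be replaced by handing the downloaded text straight to pandas, which still sidesteps `read_csv`'s handling of `s3://` paths. A minimal sketch — the parsing step is shown standalone on a sample string; in the real loop `text` would come from `obj.get()['Body'].read().decode('utf-8')` as above:

```python
import io
import pandas as pd

def parse_acc_csv(text):
    # Parse raw CSV text (three integer columns per row) into a numpy
    # array, replacing the manual split/append loop. pandas reads from
    # the in-memory buffer, so no s3:// path is ever opened.
    return pd.read_csv(io.StringIO(text), header=None).values

# In the real loop the text comes from boto3 rather than a filepath:
#   obj = boto3.resource('s3').Object('processing-ml', key)
#   text = obj.get()['Body'].read().decode('utf-8')
sample = "1,2,3\n4,5,6\n"
print(parse_acc_csv(sample).shape)  # (2, 3)
```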
Answered by Phil on December 27, 2021