Server Fault Asked by aseba on February 4, 2021
I have to copy 400 GB of files from an Elastic Block Store volume to an S3 bucket… That's about 300k files of ~1 MB each.
I've tried s3cmd and s3fuse; both of them are really, really slow. s3cmd ran for a complete day, said it had finished copying, and when I checked the bucket, nothing had happened (I suppose something went wrong, but at least s3cmd never complained about anything).
S3Fuse has been working for another complete day and has copied less than 10% of the files…
Is there a better solution for this?
I'm running Linux (Ubuntu 12.04), of course.
Another good option is peak/s5cmd:
For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli. For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmd and aws-cli can only reach 85 MB/s and 375 MB/s respectively.
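For this question's workload, a hedged sketch of an s5cmd upload (the bucket, prefix, and worker count below are placeholders; check the project's README for the exact wildcard and flag syntax):

$ s5cmd --numworkers 256 cp '/mnt/ebs-volume/*' s3://my-bucket/backup/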
Answered by Shane Brinkman-Davis on February 4, 2021
Tune AWS CLI S3 Configuration values as per http://docs.aws.amazon.com/cli/latest/topic/s3-config.html.
The values below increased an S3 sync's speed by at least 8x!
Example:
$ more ~/.aws/config
[default]
aws_access_key_id=foo
aws_secret_access_key=bar
s3 =
  max_concurrent_requests = 100
  max_queue_size = 30000
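With those values in the config, a plain aws s3 sync picks them up automatically; a hedged example (the local path and bucket name are placeholders for this question's setup):

$ aws s3 sync /mnt/ebs-volume/ s3://my-bucket/backup/

max_concurrent_requests controls how many transfers run in parallel, which is what matters most when pushing lots of small files.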
Answered by Fletcher on February 4, 2021
Try using s3-cli instead of s3cmd. I used it instead of s3cmd to upload files to my S3 bucket, and it made my deployment almost 17 minutes faster (from 21 minutes down to 4)!
Here's the link: https://github.com/andrewrk/node-s3-cli
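Its interface is modeled on s3cmd, so a sync looks roughly like this (paths and bucket are placeholders; verify the exact syntax against the README linked above):

$ s3-cli sync ./local-folder/ s3://my-bucket/remote-prefix/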
Answered by Yahya on February 4, 2021
Try s4cmd instead; it's much faster than s3cmd. Its address: https://github.com/bloomreach/s4cmd
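s4cmd keeps s3cmd-style commands but runs them in parallel; a hedged sketch (paths and bucket are placeholders, and the thread option should be double-checked in s4cmd --help):

$ s4cmd put -r --num-threads=64 /mnt/ebs-volume s3://my-bucket/backup/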
Answered by mcsrainbow on February 4, 2021
There is also s3funnel, which seems very old (2008) and has some open bugs, but it is still listed by Amazon itself: amzn-lnk
Answered by math on February 4, 2021
I wrote an optimized console application in C# (CopyFasterToS3) to do this. I used it on an EBS volume; in my case it had 5 folders with more than 2 million files amounting to about 20 GB. The application ran in less than 30 minutes.
In this article I showed how to use a recursive function with parallelism. You can transcribe it to another language.
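The C# code itself isn't reproduced here, but the same idea (recursively collect the files, then upload them with a pool of parallel workers) can be sketched in Python with boto3; the bucket, prefix, source path, and worker count are placeholders, not the author's CopyFasterToS3 code:

import os
from concurrent.futures import ThreadPoolExecutor

import boto3

BUCKET = "my-bucket"        # placeholder
PREFIX = "backup/"          # placeholder
SOURCE = "/mnt/ebs-volume"  # placeholder

s3 = boto3.client("s3")

def upload(path):
    # key = path relative to SOURCE, placed under PREFIX
    key = PREFIX + os.path.relpath(path, SOURCE)
    s3.upload_file(path, BUCKET, key)

# recursive walk to collect every file, then a 64-worker thread pool
files = [os.path.join(root, name)
         for root, _, names in os.walk(SOURCE)
         for name in names]

with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(upload, files))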
Good luck!
Answered by André Agostinho on February 4, 2021
There are several key factors that determine throughput from EC2 to S3: the size of the instance (which sets its network uplink), what appears to be a per-thread cap on upload speed, and the number of parallel uploads you run.
In cases of transferring large amounts of data, it may be economically practical to use a cluster compute instance, as the effective gain in throughput (>10x) is more than the difference in cost (2-3x).
While the above ideas are fairly logical (although the per-thread cap may not be), it is quite easy to find benchmarks backing them up. One particularly detailed one can be found here.
Using between 64 and 128 parallel (simultaneous) uploads of 1MB objects should saturate the 1Gbps uplink that an m1.xlarge has and should even saturate the 10Gbps uplink of a cluster compute (cc1.4xlarge) instance.
While it is fairly easy to change instance size, the other two factors may be harder to manage.
Answered by cyberx86 on February 4, 2021
So, after a lot of testing, s3-parallel-put did the trick awesomely. It's clearly the solution if you need to upload a lot of files to S3. Thanks to cyberx86 for the comments.
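For reference, an s3-parallel-put invocation looks something like the line below; the flag names are from memory of the tool's README and should be verified with its --help output, and the bucket and path are placeholders:

$ s3-parallel-put --bucket=my-bucket --prefix=backup/ --processes=30 /mnt/ebs-volume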
Answered by aseba on February 4, 2021