here is how I did it when I needed to copy 40 TB of data from one raidset to another while the server was still online, serving files to everybody in the company:

===== Before we get started =====
one important note right at the beginning: while parallelizing is certainly nice, we have to consider that spinning harddisks don't like concurrent file access. so be prepared to never see your harddisks' theoretical throughput reached when you copy lots of small files.
make sure you don't run too many parallel rsyncs by checking your cpu load with top. if you see the "wa" (I/O wait) value climb while throughput drops, you have too many processes competing for the disks.
besides ''top'', ''iostat -x'' from the sysstat package shows per-device utilisation, which helps to spot a saturated disk.
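for a quick look at these numbers while the copies are running, something like this works (the tools and intervals here are just examples, not part of the original recipe):
  # "wa" is the share of cpu time spent waiting for I/O; if it stays high,
  # reduce the number of parallel rsyncs
  top -b -n 1 | head -n 3   # one-shot snapshot; the %Cpu(s) line shows wa
  vmstat 5                  # "wa" column, refreshed every 5 seconds
  iostat -x 5               # per-device utilisation (sysstat package)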
===== Step 1: create an incremental file list =====
after waiting too long for Option 1 to finish on a system that carried tons of backups of other systems, I tried this option: \\
if you have tons of files and want to skip the lengthy process of producing a file list via rsync, you can create a list of directories using find and then simply run one rsync per directory. this gives you full parallelism at the beginning, but it might end with a few everlasting rsyncs if you don't dig deep enough when building your initial directory list. still, this can save a lot of time.
  # list directories up to 5 levels deep; the perl strips the leading /source/./
  # so the list holds paths relative to the source root (/tmp/dirlist is a placeholder)
  find /source/./ -mindepth 1 -maxdepth 5 -type d | perl -pe 's|^/source/\./||' > /tmp/dirlist
with the ''-maxdepth'' parameter you can define how deep the directory list goes: the deeper you dig, the more (and smaller) rsync jobs you get to parallelize, at the price of a longer find run.
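to illustrate how such a list can be fed to parallel rsyncs, here is a minimal sketch using ''xargs'' (one possible approach; /tmp/dirlist and the job count of 8 are placeholders):
  # run one recursive rsync per listed directory, at most 8 at a time.
  # --relative plus the /./ anchor recreates each path below /target.
  xargs -d '\n' -I% -P8 rsync -aHx --relative "/source/./%" /target/ < /tmp/dirlist
  # note: with -maxdepth alone the listed directories nest, so subtrees get
  # visited by several jobs; build the list with -mindepth 5 -maxdepth 5 for
  # a clean split and let the final rsync in step 3 catch anything above it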
===== Step 3: make sure we didn't miss anything =====
probably the best feature of rsync is that it resumes aborted jobs nicely and that it can be run several times across the same source and target with no harm. so let's use this property to fix anything we missed or got wrong by simply running a single-threaded rsync at the end. this can take some time, and I know of no way around that.
  # --delete also removes files from /target that no longer exist in /source,
  # so double-check both paths before running this
  rsync -aHvx --delete /source/ /target/
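if you only want to see whether anything is left over without touching the target, a dry run does that (a small sketch; ''-n'' is rsync's --dry-run flag):
  # lists what would still be transferred or deleted; nothing is changed
  rsync -aHvxn --delete /source/ /target/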