parallel_rsync

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
parallel_rsync [20.05.2020 19:41] – [the code] Pascal Suterparallel_rsync [20.05.2020 19:44] (current) – [Before we get startet] Pascal Suter
Line 214: Line 214:
   rm -rf /tmp/target/* /tmp/testdirlist /tmp/progressfile   rm -rf /tmp/target/* /tmp/testdirlist /tmp/progressfile
  
-===== Before we get startet =====+===== Doing it manually ===== 
 +Initially i did this manually to copy data from an old storage to a new one. when I later had to write a script to archive large directories with lots of small files, I decided to writhe the above function. So for those who are interested in reading more about the basic method and don't like my bash script, here is the manual way this all originated from :)  
 + 
 +==== Before we get startet ====
 one important note right at the begining: while parallelizing is certainly nice we have to consider, that spinning harddisks don't like concurrent file access. so be prepared to never ever see your harddisks theoretical throughput reached if you copy lots of small files. one important note right at the begining: while parallelizing is certainly nice we have to consider, that spinning harddisks don't like concurrent file access. so be prepared to never ever see your harddisks theoretical throughput reached if you copy lots of small files.
 make sure you don't run too many parallel rsyncs by checking your cpu load with top. if you see the "wa" (waiting) load increase, it means you have too many processes. On the sytem i did this all for, first tried with 80 parallel rsyncs using option 2 below and i had a waiting load of about 50% and a througput of about 20MB/s. i then reduced to 15 parallel rsyncs and the waiting load went down to 25% and the bandwith went up to over 100MB/s. that is on a raid set that achieves a raw throughput of over 500MB/s if streaming performance is measured. just to give you an idea.  make sure you don't run too many parallel rsyncs by checking your cpu load with top. if you see the "wa" (waiting) load increase, it means you have too many processes. On the sytem i did this all for, first tried with 80 parallel rsyncs using option 2 below and i had a waiting load of about 50% and a througput of about 20MB/s. i then reduced to 15 parallel rsyncs and the waiting load went down to 25% and the bandwith went up to over 100MB/s. that is on a raid set that achieves a raw throughput of over 500MB/s if streaming performance is measured. just to give you an idea. 
  • parallel_rsync.txt
  • Last modified: 20.05.2020 19:44
  • by Pascal Suter