parallel_rsync

Differences

This shows you the differences between two versions of the page.

parallel_rsync [20.05.2020 19:32] – [known issues] Pascal Suter
parallel_rsync [20.05.2020 19:41] – [the code] Pascal Suter
Line 21: Line 21:
 How many jobs should run in parallel, and how many directory levels deep you parallelize them, really depends on your specific situation. If you have several terabytes of data and do a complete sync, it makes sense to dive deeper into the structure than when you only want to update an existing copy of the same data. In that case it might be faster to dive only 1 to 2 levels deep, or even not to use this script at all when most of the time is spent on "creating incremental file list". Really, read what's behind the script further down to understand how to parametrize and modify it for your specific situation.
  
 +In the second version I have added the option to pass a 5th and 6th argument. A filename can be passed as $5. If the file does not exist, the initial directory list produced by the ''find'' call at the beginning of the script is saved to it. If the file exists, prsync reads its contents and uses them as the directory list instead of re-running the whole ''find'' operation.
 +A second filename can optionally be passed as $6. prsync saves its progress to that file. If prsync is re-run, this file is checked before each rsync process is started. If the directory that was about to be rsynced is already on the list, it is skipped. This avoids re-running rsync for a large number of already synced directories and speeds up resuming after an interrupted prsync run.
  
 +These two optional arguments should only be used if the source does not change between prsync runs. They are especially beneficial if the source storage is unstable and may crash after a certain period of time. Using these two files prevents unnecessary file scanning and comparing when resuming the prsync operation after a crash, so progress advances faster and the storage sees less unnecessary load.
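
The resume behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual prsync code: all function names (''load_or_build_dirlist'', ''already_synced'', ''mark_synced'', ''sync_one'', ''sync_all'') are invented here, the depth parameter and the exact ''find'' options are assumptions, and ''sync_one'' only prints what it would do instead of calling rsync. For brevity the lists are newline-delimited, whereas the version 2 notes say the real script uses a null delimiter with xargs.

```shell
# Hypothetical sketch of the resume logic; NOT the actual prsync code.
# All function names are invented for illustration.
# Simplification: lists are newline-delimited here, so directory names
# containing newlines are not supported.

# Print the directory list for source $1 down to depth $2.
# If a dirlist file $3 is given: reuse it when it exists, otherwise create it.
load_or_build_dirlist() {
    if [ -n "$3" ] && [ -f "$3" ]; then
        cat -- "$3"                                   # resume: skip the find scan
    elif [ -n "$3" ]; then
        find "$1" -mindepth 1 -maxdepth "$2" -type d | tee -- "$3"
    else
        find "$1" -mindepth 1 -maxdepth "$2" -type d
    fi
}

# Succeed if directory $1 is already recorded in progress file $2.
already_synced() {
    [ -n "$2" ] && [ -f "$2" ] && grep -Fxq -- "$1" "$2"
}

# Record directory $1 as done in progress file $2 (no-op without a file).
mark_synced() {
    if [ -n "$2" ]; then
        printf '%s\n' "$1" >> "$2"
    fi
}

# Stand-in for the real work; the actual script runs rsync here.
sync_one() {
    echo "would sync: $1"
}

# Usage: sync_all SOURCE DEPTH [DIRLIST_FILE] [PROGRESS_FILE]
sync_all() {
    load_or_build_dirlist "$1" "$2" "$3" | while IFS= read -r dir; do
        if already_synced "$dir" "$4"; then
            continue                                  # done in an earlier run
        fi
        sync_one "$dir" && mark_synced "$dir" "$4"
    done
}
```

Calling ''sync_all /data/src 2 dirlist.txt progress.log'' twice (paths are placeholders) would scan and process every directory on the first run and skip all of them on the second. Because the second run trusts the cached list, directories created in between are never seen, which is why the source must not change between runs.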
 ==== the code ====
  
Line 30: Line 33:
 #
 # version 1: initial release in 2017
-# version 2: removed the need to escape filenames by using null delimiter + xargs to run commands such as mkdir and rsync,
-#            added ability to resume without rescanning (argument $5) and to skip already synced directories (argument $6)
+# version 2: May 2020, removed the need to escape filenames by using
+#            null delimiter + xargs to run commands such as mkdir and rsync,
+#            added ability to resume without rescanning (argument $5) and to skip
+#            already synced directories (argument $6)
 #
  
Line 40: Line 45:
  # $4 = numjobs
  # $5 = dirlist file (optional) --> will allow to resume without re-scanning the entire directory structure
-    # $6 = progress log file (optional) --> will allow to skip previously synced directory when resuming with a dirlist file
+        # $6 = progress log file (optional) --> will allow to skip previously synced directory when resuming with a dirlist file
  source=$1
  destination=$2
  • parallel_rsync.txt
  • Last modified: 20.05.2020 19:44
  • by Pascal Suter