The rsync utility is a wonderfully useful tool for keeping two datasets synchronized, but it was never designed for large datasets that are separated by a long distance. Over the past couple of years, we have developed a utility called UDR at the Laboratory for Advanced Computing at the University of Chicago. UDR integrates rsync with UDT, a high-performance network protocol.
UDT is a reliable UDP-based protocol designed to move large datasets over wide-area, high-performance networks. UDT is open source and has been used as the basis for more than six commercial products.
UDR is open source and available from GitHub.
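As a rough illustration of how UDR wraps rsync, the sketch below assembles a command line in Python without running it. It assumes the invocation form described in the project's README, where `udr` is simply placed in front of an ordinary rsync command. The helper name `build_udr_command`, the paths, and the host name are hypothetical and chosen only for this example.

```python
import shlex


def build_udr_command(source, destination, rsync_options=("-a", "-v")):
    """Return the argument list for running rsync through UDR.

    Assumption: UDR is invoked by prefixing a normal rsync command
    with `udr`, as in `udr rsync [rsync options] src dest`.
    """
    return ["udr", "rsync", *rsync_options, source, destination]


# Hypothetical source directory and remote host, for illustration only.
command = build_udr_command(
    "/data/genomes/",
    "remote.example.org:/data/genomes/",
)

# Print the command instead of executing it, so the sketch runs anywhere.
print(shlex.join(command))
```

Because the rsync options are passed through unchanged, an existing rsync invocation should need little more than the added `udr` prefix, provided UDR is installed on both ends.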
Here are some results from tests conducted by Erich Weiler of the University of California, Santa Cruz, moving genomic data:
| Source | Destination | UDR | rsync |
|---|---|---|---|
| Santa Cruz | Milwaukee | 500 Mb/s | 160 Mb/s |
| Santa Cruz | Detroit | 600 Mb/s | 150 Mb/s |
| Santa Cruz | Bielefeld | 600 Mb/s | 6 Mb/s |
| Santa Cruz | Aarhus | 350 Mb/s | 6 Mb/s |
| Santa Cruz | Brisbane | 550 Mb/s | 3 Mb/s |
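
To make the gap in the table easier to see, the short sketch below computes the ratio of UDR throughput to rsync throughput for each destination. The figures are copied from the table above. The variable and function names are ours, introduced only for this example.

```python
# Throughput in Mb/s from Santa Cruz, copied from the table: (UDR, rsync).
results_mbps = {
    "Milwaukee": (500, 160),
    "Detroit": (600, 150),
    "Bielefeld": (600, 6),
    "Aarhus": (350, 6),
    "Brisbane": (550, 3),
}


def speedup(udr_mbps, rsync_mbps):
    """Return how many times faster UDR was than plain rsync."""
    return udr_mbps / rsync_mbps


speedups = {
    destination: speedup(udr, rsync)
    for destination, (udr, rsync) in results_mbps.items()
}

for destination, factor in speedups.items():
    print(f"{destination:<10} {factor:6.1f}x")
```

The ratio ranges from roughly 3x on the two domestic paths to over 180x for Brisbane, the most distant destination in these tests.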
Allison Heath is the Project Lead for UDR.