A Tool for Keeping Big Data in Sync

Mar 20, 2013

UDR is now being used by the UC Santa Cruz Gene Browser.

The rsync utility is a wonderfully useful tool for keeping two datasets synchronized, but it was never designed for large datasets that are separated by a long distance. Over the past couple of years, we at the Laboratory for Advanced Computing at the University of Chicago have developed a utility called UDR, which integrates rsync with the high-performance network protocol UDT.

UDT is a reliable UDP-based protocol that was designed to move large datasets over wide-area, high-performance networks. UDT is open source and has been used as the basis for more than six commercial products.

UDR is open source and available from GitHub.

Here are some results from tests conducted by Erich Weiler of the University of California at Santa Cruz, moving genomic data:

Source        Destination    UDR         rsync
Santa Cruz    Milwaukee      500 Mb/s    160 Mb/s
Santa Cruz    Detroit        600 Mb/s    150 Mb/s
Santa Cruz    Bielefeld      600 Mb/s    6 Mb/s
Santa Cruz    Aarhus         350 Mb/s    6 Mb/s
Santa Cruz    Brisbane       550 Mb/s    3 Mb/s
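
To make the gap easier to read, the ratio of UDR to rsync throughput can be worked out directly from the figures above. The following is only an illustrative sketch: the names RESULTS and speedups are made up for this example, and the numbers are simply those reported in the table, not new measurements.

```python
# Throughput figures in Mb/s, copied from the table above.
# Each entry is (destination, UDR throughput, rsync throughput);
# the source is Santa Cruz in every case.
RESULTS = [
    ("Milwaukee", 500, 160),
    ("Detroit", 600, 150),
    ("Bielefeld", 600, 6),
    ("Aarhus", 350, 6),
    ("Brisbane", 550, 3),
]


def speedups(results):
    """Return a dict mapping destination to UDR throughput / rsync throughput."""
    return {dest: udr / rsync for dest, udr, rsync in results}


if __name__ == "__main__":
    for dest, ratio in speedups(RESULTS).items():
        print(f"Santa Cruz -> {dest}: UDR is about {ratio:.1f}x faster than rsync")
```

By this arithmetic, the two transfers within the United States come out roughly 3 to 4 times faster with UDR, while the transfers to Europe and Australia come out roughly 58 to 183 times faster.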

Allison Heath is the Project Lead for UDR.