This has been a recurring problem for a couple of years, one that I and
others have experienced. I was free of it for a while, but after upgrading
OpenSSL/OpenSSH to avoid the recent exploit it is back and highly
repeatable. It has been persistent enough that I am going to start
with the assumption that it may be a kernel bug, or at least that it can
probably be debugged definitively only by a proficient kernel developer.
We have got to squash this once and for all; SSH is used everywhere and
it needs to be reliable. There is probably a race condition in ssh, as
mentioned below, but it must be subtle.
rsync/ssh transfers from the local system to the local system work
perfectly. Between the systems, there are nearly always long delays at
certain times and usually a complete hang. After a long period, this often
produces a timeout. These systems are on 100baseT on the same switch.
One system appears to be having mild packet loss (400 out of 400,000,
about 0.1%, on both send and receive, as frame/carrier errors). BTW,
running a cpio through the SSH connections causes a nearly immediate
hang, so it is unlikely to be a problem with rsync.
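One way to take both rsync and cpio out of the picture would be to push a known amount of raw data through the transport and count what arrives. The following is only a rough sketch, not something I have run against the failing systems; `stream_test` is a hypothetical helper name, and `otherhost` is a placeholder for the remote machine.

```shell
#!/bin/sh
# stream_test (hypothetical helper): send MEGABYTES of zeros through a
# transport command and print the number of bytes that came out the far end.
# Usage: stream_test MEGABYTES COMMAND [ARGS...]
stream_test() {
    mb=$1
    shift
    # dd generates the data, "$@" is the transport under test,
    # wc -c counts the bytes that made it through.
    dd if=/dev/zero bs=1048576 count="$mb" 2>/dev/null | "$@" | wc -c
}

# Local sanity check, no network involved:
#   stream_test 4 cat
# Across the link (otherhost is a placeholder, not a real name):
#   stream_test 100 ssh -p 22 otherhost cat
```

If the local run reports the full byte count and the run through ssh stalls or never finishes, that would point at the ssh/network path rather than at rsync or cpio.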
Both systems work fine receiving rsync/ssh from my laptop over a 400Kb
DSL connection with:
(systems are a combination of SuSE and Red Hat 7.3, upgraded variously by
My standard rsync/ssh script looks like:
brsyncndz (backup rsync no delete or compression):
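# Default to the standard SSH port unless PORT is already set.
if [ -z "$PORT" ]; then PORT=22; fi
# Pass all arguments through to rsync, preserving any that contain spaces.
rsync -vv -HpogDtSxlra --partial --progress --stats -e "ssh -p $PORT" "$@"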
On both sides:
On 'old' system:
On 'new' system:
References to past discussions: (Tried the TCP buffers tuning.)
Haven't tried this code yet:
[email protected] http://sdw.st
Stephen D. Williams 43392 Wayside Cir,Ashburn,VA 20147-4622
703-724-0118W 703-995-0407Fax Dec2001