Hello,
I have discovered a possible problem on my host. The short
story is: When downloading ISO images from this host (which
runs 2.4.3 + zerocopy and ProFTPd with sendfile()), the image is
sometimes corrupted (MD5 checksum of the downloaded file does not match).
The long story: My server is Athlon 850 on ASUS A7V, 256M RAM.
Seven IDE discs, one SCSI disc. The controllers and NIC are as follows
(output of lspci):
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
00:0a.0 SCSI storage controller: Adaptec AIC-7881U
00:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
00:11.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown device 0d30 (rev 02)
The server runs Linux 2.4.3 with zero-copy patches and ProFTPd
1.2.2rc1 compiled with --enable-sendfile.
The FTP area is on a RAID-1 volume, which is created over two LVM
logical volumes (each LV spans three physical disks). I hope RAID-1 can speed
things up for multiple simultaneous users.
Red Hat Linux 7.1 was released yesterday, and since then the server
has had about 220 anonymous FTP users and has been pushing data
at almost the full 100 Mbps Ethernet speed (the current 2-hour average is
89.7 Mbps according to MRTG). Today I received about three complaints
about corrupted ISO images. When I run md5sum on the server itself,
the MD5 checksums, of course, perfectly match. I've tried to download
the files from another machine on the same net, and MD5 sums were correct.
However, I have one report of corrupted download even from the same physical
network.
In the last 24 hours the server pushed out about 660 gigabytes
of Red Hat 7.1. Is this amount (i.e. three reports out of 660 gigabytes)
a serious problem?
Also note that I have no corrupted download report for rsync.
But I think rsyncd does not use sendfile(), and of course the vast majority
of people use FTP, not rsync, for downloading.
-Yenya
--
\ Jan "Yenya" Kasprzak <kas at fi.muni.cz> http://www.fi.muni.cz/~kas/
\\ PGP: finger kas at aisa.fi.muni.cz 0D99A7FB206605D7 8B35FCDE05B18A5E //
\\\ Czech Linux Homepage: http://www.linux.cz/ ///
Mantra: "everything is a stream of bytes". Repeat until enlightened. --Linus
On Tue, Apr 17, 2001 at 03:10:07PM +0200, Jan Kasprzak wrote:
> 00:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
IIRC the problem came up earlier. Some versions of 3com NICs seem to make
problems with the hardware checksum. There were some fixes in the driver
later; could you try it with 2.4.4pre3 (which includes zerocopy) ?
-Andi
> The long story: My server is Athlon 850 on ASUS A7V, 256M RAM.
> Seven IDE discs, one SCSI disc. The controllers and NIC are as follows
> (output of lspci):
See the VIA chipset report on http://www.theregister.co.uk about corruption problems
with VIA chipsets. The cases seen on Linux included short and also sometimes
stale/corrupted DMA transfers.
Nothing in your report says it is or isn't going to be a VIA chipset problem,
but once a fixed BIOS is out for your board that would be a good first step.
If it still does it then, it's worth digging for kernel naughties.
Alan Cox wrote:
: > The long story: My server is Athlon 850 on ASUS A7V, 256M RAM.
: > Seven IDE discs, one SCSI disc. The controllers and NIC are as follows
: > (output of lspci):
:
: See the VIA chipset report on http://www.theregister.co.uk about corruption problems
: with VIA chipsets. The cases seen on Linux included short and also sometimes
: stale/corrupted DMA transfers.
:
: Nothing in your report says it is or isn't going to be a VIA chipset problem,
: but once a fixed BIOS is out for your board that would be a good first step.
: If it still does it then, it's worth digging for kernel naughties.
:
I don't think I have 686b southbridge. I have 686 (without "b"):
00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305
00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
00:04.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
00:04.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
[...]
-Yenya
--
\ Jan "Yenya" Kasprzak <kas at fi.muni.cz> http://www.fi.muni.cz/~kas/
\\ PGP: finger kas at aisa.fi.muni.cz 0D99A7FB206605D7 8B35FCDE05B18A5E //
\\\ Czech Linux Homepage: http://www.linux.cz/ ///
///... in B its 'extrn' not 'extern'. Alan (yes I programmed in B)\\\
Andi Kleen wrote:
: On Tue, Apr 17, 2001 at 03:10:07PM +0200, Jan Kasprzak wrote:
: > 00:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
:
: IIRC the problem came up earlier. Some versions of 3com NICs seem to make
: problems with the hardware checksum. There were some fixes in the driver
: later; could you try it with 2.4.4pre3 (which includes zerocopy) ?
:
I was not able to boot 2.4.4pre3 at all: It panicked when
initializing aic7xxx. So I've changed the config to old_aic7xxx,
but it locked up on starting up RAID arrays.
BTW, patch-2.4.4pre3 does not contain any significant change
to 3c59x.c (the only change is adding some #include file).
Now I am back to 2.4.3 and I'll try to run proftpd without sendfile().
-Y.
> : but once a fixed BIOS is out for your board that would be a good first step.
> : If it still does it then, its worth digging for kernel naughties
> :
> I don't think I have 686b southbridge. I have 686 (without "b"):
Ok. What revision of 3c90x card do you have ?
Alan Cox wrote:
: > : but once a fixed BIOS is out for your board that would be a good first step.
: > : If it still does it then, its worth digging for kernel naughties
: > :
: > I don't think I have 686b southbridge. I have 686 (without "b"):
:
: Ok. What revision of 3c90x card do you have ?
:
PCI: Found IRQ 11 for device 00:0c.0
3c59x.c:LK1.1.13 27 Jan 2001 Donald Becker and others. http://www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905C Tornado at 0xa000, 00:50:da:06:95:21, IRQ 11
product code 5957 rev 00.13 date 07-17-99
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 782d.
Enabling bus-master transmits and whole-frame receives.
eth0: scatter/gather enabled. h/w checksums enabled
Some more progress: I now downgraded to proftpd without sendfile().
The CPU usage is now nearly 100% (with ~170 FTP users; with sendfile()
it was under 50% with >320 FTP users). But nevertheless, the downloaded
images now seem to be OK.
Should I try the stock 2.4.3 without zero-copy patches?
-Yenya
Andi Kleen wrote:
: I guess to debug this problem it would be useful to get some idea about the
: nature of the corruption. Could you enable sendfile() again, and when a
: user complains ask to download it again and provide a
: cmp -cl fileA fileB | head -500 listing of their differences?
Well, here it is:
$ cmp -cl seawolf-sendfile.iso seawolf-i386-SRPMS.iso
160628609 0 ^@ 276 M->
160628610 0 ^@ 32 ^Z
160628611 0 ^@ 14 ^L
160628612 0 ^@ 55 -
160628613 0 ^@ 116 N
160628614 0 ^@ 300 M-@
160628615 0 ^@ 150 h
160628616 0 ^@ 210 M-^H
160628617 0 ^@ 271 M-9
160628618 0 ^@ 307 M-G
160628619 0 ^@ 377 M-^?
[ all bytes in sendfile()d image changed to zero until: ]
160661374 0 ^@ 376 M-~
160661375 0 ^@ 231 M-^Y
160661376 0 ^@ 205 M-^E
160661377 1 ^A 364 M-t
160661378 103 C 277 M-?
160661379 104 D 13 ^K
160661380 60 0 50 (
160661381 60 0 360 M-p
160661382 61 1 77 ?
160661383 1 ^A 304 M-D
160661384 0 ^@ 133 [
160661385 114 L 131 Y
160661386 111 I 377 M-^?
160661387 116 N 123 S
160661388 125 U 234 M-^\
160661389 130 X 250 M-(
This simply means that at byte 160628609 the server started to send
the CD image again from the beginning. The original image does start with
0x8000 zero bytes, followed by the text "\001CD001\001\000LINUX".
So it probably has nothing to do with the 3c59x driver, but rather with
sendfile() or ProFTPd's use of sendfile().
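The check described above (does the bad image, from the first differing byte onward, repeat the start of the good image?) can be sketched in C roughly as follows. This is only an illustration working on in-memory buffers; the function names `first_mismatch` and `restarted_from_beginning` are made up for this sketch and are not part of cmp or ProFTPd. Note that `cmp -l` reports byte numbers starting at 1, while the sketch uses 0-based offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 0-based offset of the first differing byte,
 * or n if the two buffers are identical over n bytes. */
static size_t first_mismatch(const unsigned char *good,
                             const unsigned char *bad, size_t n)
{
    size_t i = 0;
    while (i < n && good[i] == bad[i])
        i++;
    return i;
}

/* Hypothetical helper: returns 1 if, from the first mismatch onward,
 * `bad` contains the beginning of `good` again (the "restart" pattern
 * seen in the cmp listing), 0 otherwise. *where receives the offset
 * of the first mismatch (or n if there is none). */
static int restarted_from_beginning(const unsigned char *good,
                                    const unsigned char *bad, size_t n,
                                    size_t *where)
{
    assert(good != NULL && bad != NULL && where != NULL);

    size_t off = first_mismatch(good, bad, n);
    *where = off;
    if (off == n)
        return 0;                       /* buffers are identical */
    return memcmp(bad + off, good, n - off) == 0;
}
```

For real ISO images one would read both files in chunks (or mmap them) rather than hold them in memory; that part is left out to keep the sketch short.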
If anybody wants to test it, I've left ProFTPd running with sendfile()
enabled at ftp.linux.cz, port 2121.
Thanks,
-Yenya
Jan Kasprzak wrote:
: $ cmp -cl seawolf-sendfile.iso seawolf-i386-SRPMS.iso
[...]
:
: Which simply means, that at 160628609 it started to send
: the CD image from the beginning.
Well, I did strace of proftpd, and it _may_ be a mis-interpretation
of the sendfile(2) semantics on the proftpd side. The relevant part
of strace follows:
gettimeofday({987527927, 46167}, NULL) = 0
fcntl64(12, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(12, F_SETFL, O_RDWR) = 0
sendfile(12, 9, [0], 678244352) = 138133872
--- SIGALRM (Alarm clock) ---
rt_sigaction(SIGALRM, {0x804f520, [], SA_INTERRUPT|0x4000000}, NULL, 8) = 0
rt_sigaction(SIGALRM, NULL, {0x804f520, [], SA_INTERRUPT|0x4000000}, 8) = 0
rt_sigaction(SIGALRM, {0x804f520, [], SA_INTERRUPT|0x4000000}, NULL, 8) = 0
alarm(300) = 0
sigreturn() = ? (mask now [])
fcntl64(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0
alarm(0) = 300
alarm(300) = 0
alarm(0) = 300
alarm(300) = 0
getpid() = 24482
geteuid32() = 14
getegid32() = 50
flock(6, LOCK_EX) = 0
lseek(6, 644, SEEK_SET) = 644
read(6, "\242_\0\0\16\0\0\0002\0\0\0\0\0\0\0I\10\0\0\0\0\0\0ftp"..., 644) = 644
lseek(6, 644, SEEK_SET) = 644
write(6, "\242_\0\0\16\0\0\0002\0\0\0\0\0\0\0I\10\0\0\0\0\0\0ftp"..., 644) = 644
flock(6, LOCK_UN) = 0
fcntl64(12, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(12, F_SETFL, O_RDWR) = 0
sendfile(12, 9, [0], 540110480) = 103469424
Here fd 6 is the control connection, fd 9 is the file on disk,
and fd 12 is the data connection. ProFTPd seems to set an alarm for 300
seconds (to detect stalled clients), but when sendfile() is interrupted,
something strange happens: either sendfile() does not update the offset in its
third parameter, or it fails to update the offset in the file descriptor, or
something like that. Note that both sendfile() calls in the trace pass offset 0,
although the count was reduced by the 138133872 bytes already sent.
Maybe ProFTPd should pass a non-zero value (the actual offset?) to sendfile()
the second time?
What are the expected semantics of sendfile() with respect to restarting
transfers and being interrupted by SIGALRM?
-Yenya
--
\ Jan "Yenya" Kasprzak <kas at fi.muni.cz> http://www.fi.muni.cz/~kas/
\\ PGP: finger kas at aisa.fi.muni.cz 0D99A7FB206605D7 8B35FCDE05B18A5E //
\\\ Czech Linux Homepage: http://www.linux.cz/ ///
Mantra: "everything is a stream of bytes". Repeat until enlightened. --Linus
On Tue, Apr 17, 2001 at 06:15:24PM +0200, Jan Kasprzak wrote:
> Some more progress: I now downgraded to proftpd without sendfile().
> The CPU usage is now nearly 100% (with ~170 FTP users; with sendfile()
> it was under 50% with >320 FTP users). But nevertheless, the downloaded
> images now seem to be OK.
>
> Should I try the stock 2.4.3 without zero-copy patches?
It might also be useful to try 2.4.3+zc with the dev->features |=
NETIF_F_SG; in the 3c59x driver taken out (so it won't use zero-copy)
Since it starts from the beginning instead of corrupting random packets I
doubt it's a hardware problem, though.
--
Pekka Pietikainen
On Tue, Apr 17, 2001 at 06:15:24PM +0200, Jan Kasprzak wrote:
> Alan Cox wrote:
> : > : but once a fixed BIOS is out for your board that would be a good first step.
> : > : If it still does it then, its worth digging for kernel naughties
> : > :
> : > I don't think I have 686b southbridge. I have 686 (without "b"):
> :
> : Ok. What revision of 3c90x card do you have ?
> :
> PCI: Found IRQ 11 for device 00:0c.0
> 3c59x.c:LK1.1.13 27 Jan 2001 Donald Becker and others. http://www.scyld.com/network/vortex.html
> See Documentation/networking/vortex.txt
> eth0: 3Com PCI 3c905C Tornado at 0xa000, 00:50:da:06:95:21, IRQ 11
> product code 5957 rev 00.13 date 07-17-99
> 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
> MII transceiver found at address 24, status 782d.
> Enabling bus-master transmits and whole-frame receives.
> eth0: scatter/gather enabled. h/w checksums enabled
>
> Some more progress: I now downgraded to proftpd without sendfile().
> The CPU usage is now nearly 100% (with ~170 FTP users; with sendfile()
> it was under 50% with >320 FTP users). But nevertheless, the downloaded
> images now seem to be OK.
After cursory examination of proftpd, it appears that there is a misuse of the
sendfile() call under Linux, which may be responsible for the corruption. The
code was originally based on BSD semantics. Under Linux, the offset argument
is not being used correctly to determine how much data has been sent in the
case of EINTR.
A patch will be coming out soon, as it is a fairly trivial fix.
--
"In the event of a failure, the system can be configured to automatically
restart itself. This feature of Windows NT Server provides maximum system
up-time." -- Reliability and Fault Tolerance in Windows NT Server, MSC
Jesse S Sipprell writes:
> A patch will be coming out soon, as it is a fairly trivial fix.
Thank you for tracking this down.
One more subtle note, for the case of error handling. There is a
change to sendfile() in the zerocopy patches which causes sendfile()
to act more like sendmsg() when errors occur.
Specifically, sendmsg() works roughly like the following when an
error happens:
handle_error:
if (sent_something)
return how_much_we_sent;
else
return ERROR_CODE;
So when an error happens, and the kernel was able to send some of
the data, you see something like this in the trace:
sendmsg() = N
...
sendmsg() = ERROR_CODE
sendfile() used to act differently, and this made it difficult to
directly transform a sendmsg()+local_buffer based server into a
sendfile() one because the error handling was so different.
Previously, sendfile() wouldn't give you the partial transfer length,
you'd just get the error _regardless_ of whether any data was sent
successfully during that call. Alexey, myself, and others considered
this behavior bogus and inconsistent. So it was changed.
The long and short of it is that sendfile() now acts just like
sendmsg() when errors happen mid-send.
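Seen from user space, the rule in the pseudocode above might be modelled like this. It is only a toy model of the return-value convention, not kernel code: the function name `finish_send` is invented for the sketch, and it assumes the usual libc convention that a kernel error code reaches the application as a return value of -1 with errno set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Toy model: an error `err` (a positive errno value) hit a send call
 * that had already transferred `sent` bytes. A partial transfer wins:
 * the byte count is reported now, and the error is only reported by a
 * later call that could not send anything at all. */
static ssize_t finish_send(size_t sent, int err)
{
    assert(err > 0);
    if (sent > 0)
        return (ssize_t)sent;   /* report how much we sent */
    errno = err;                /* nothing sent: report the error */
    return -1;
}
```

The consequence for applications is that a positive return value smaller than the requested count has to be treated as "call again for the rest", because the error that cut the transfer short is not visible in that call.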
Later,
David S. Miller
Jesse S Sipprell wrote:
: After cursory examination of proftpd, it appears that there is a misuse of the
: sendfile() call under Linux, which may be responsible for the corruption. The
: code was originally based on BSD semantics. Under Linux, the offset argument
: is not being used correctly to determine how much data has been sent in the
: case of EINTR.
:
: A patch will be coming out soon, as it is a fairly trivial fix.
:
FWIW, I've fixed ProFTPd on my server with the following patch.
Sorry for making noise @ linux-kernel list, it was totally unrelated
to the Linux kernel:
--- proftpd-1.2.2rc1/src/data.c.sendfile Thu Feb 15 15:24:53 2001
+++ proftpd-1.2.2rc1/src/data.c Tue Apr 17 21:35:24 2001
@@ -760,7 +760,9 @@
*
* ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
*/
- if((len = sendfile(session.d->outf->fd, retr_fd, offset, count)) == -1) {
+ len = sendfile(session.d->outf->fd, retr_fd, offset, count);
+ if (len == -1 || len > 0 && len < count) {
+ errno = EINTR;
#elif defined(HAVE_BSD_SENDFILE)
/* BSD semantics for sendfile are flexible...it'd be nice if we could
* standardize on something like it. The semantics are:
@@ -797,7 +799,9 @@
if((count -= len) <= 0)
break;
+#if !defined(HAVE_LINUX_SENDFILE)
*offset += len;
+#endif
if(TimeoutStalled)
reset_timer(TIMER_STALLED, ANY_MODULE);
-Yenya
--
\ Jan "Yenya" Kasprzak <kas at fi.muni.cz> http://www.fi.muni.cz/~kas/
\\ PGP: finger kas at aisa.fi.muni.cz 0D99A7FB206605D7 8B35FCDE05B18A5E //
\\\ Czech Linux Homepage: http://www.linux.cz/ ///
Mantra: "everything is a stream of bytes". Repeat until enlightened. --Linus
On Tue, Apr 17, 2001 at 01:23:07PM -0700, David S. Miller wrote:
> One more subtle note, for the case of error handling. There is a
> change to sendfile() in the zerocopy patches which causes sendfile()
> to act more like sendmsg() when errors occur.
How is this likely to affect applications?
Currently, the glibc2.1 sendfile interface looks like:
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
On error, -1 is returned in the usual fashion and offset is purported to be
updated to point to the next byte following the last one sent.
Will the zerocopy patches break this?
>
> Specifically, sendmsg() works roughly like the following when an
> error happens:
>
> handle_error:
> if (sent_something)
> return how_much_we_sent;
> else
> return ERROR_CODE;
>
> So when an error happens, and the kernel was able to send some of
> the data, you see something like this in the trace:
>
> sendmsg() = N
> ...
> sendmsg() = ERROR_CODE
>
> sendfile() used to act differently, and this made it difficult to
> directly transform a sendmsg()+local_buffer based server into a
> sendfile() one because the error handling was so different.
>
> Previously, sendfile() wouldn't give you the partial transfer length,
> you'd just get the error _regardless_ of whether any data was sent
> successfully during that call. Alexey, myself, and others considered
> this behavior bogus and inconsistent. So it was changed.
>
> The long and short of it is that sendfile() now acts just like
> sendmsg() when errors happen mid-send.
>
> Later,
> David S. Miller
Jesse S Sipprell writes:
> On error, -1 is returned in the usual fashion and offset is purported to be
> updated to point to the next byte following the last one sent.
>
> Will the zerocopy patches break this?
No, they should not.
Later,
David S. Miller
On Tuesday 17 April 2001 22:36, Jan Kasprzak wrote:
> +   if (len == -1 || len > 0 && len < count) {
are you sure there are no missing () ?
if ((len == -1) || (len > 0) && (len < count)) {
assuming that && has precedence over || (I believe so)
On Tue, 17 Apr 2001, Wolfgang Rohdewald wrote:
> On Tuesday 17 April 2001 22:36, Jan Kasprzak wrote:
> > +   if (len == -1 || len > 0 && len < count) {
>
> are you sure there are no missing () ?
>
> if ((len == -1) || (len > 0) && (len < count)) {
>
> assuming that && has precedence over || (I believe so)
I don't think this makes it that much cleaner.
If you want to make it clear what this does you should write it more like
this:
if (len == -1 || (len > 0 && len < count))
I don't think it's the ==, <, and > that are confusing, but the || and &&.
/Martin
Wolfgang Rohdewald wrote:
: On Tuesday 17 April 2001 22:36, Jan Kasprzak wrote:
: > +   if (len == -1 || len > 0 && len < count) {
:
: are you sure there are no missing () ?
:
: if ((len == -1) || (len > 0) && (len < count)) {
:
: assuming that && has precedence over || (I believe so)
Yes, but the precedence of ==, <, and > is even higher.
However, I've found a problem with the previous patch: The first chunk should
read:
- if((len = sendfile(session.d->outf->fd, retr_fd, offset, count)) == -1) {
+ len = sendfile(session.d->outf->fd, retr_fd, offset, count);
+ if (len == -1 || len > 0 && len < count) {
+ if (len != -1)
+ errno = EINTR;
i.e. we should not overwrite errno, when it is valid.
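For reference, the corrected condition can be written as a small stand-alone C function with the grouping spelled out. This is a sketch, not the actual ProFTPd source: the name `short_or_failed` is hypothetical. In C, == and the relational operators bind tighter than &&, which in turn binds tighter than ||, so the explicit parentheses here do not change the meaning of the patched line; they only make it visible. The cast avoids comparing a signed `ssize_t` with an unsigned `size_t` directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: given sendfile()'s return value `len` and the
 * requested `count`, return 1 if the caller should take its error/EINTR
 * path, 0 if the whole request was sent (or nothing was sent, len == 0). */
static int short_or_failed(ssize_t len, size_t count)
{
    assert(count > 0);
    if (len == -1 || (len > 0 && (size_t)len < count)) {
        if (len != -1)
            errno = EINTR;  /* short send: mark it as interrupted */
        /* when len == -1, errno is already valid and is left alone */
        return 1;
    }
    return 0;
}
```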
-Yenya
PS.: You can find the C operators precedence for example at
http://www.howstuffworks.com/c14.htm (found by Google).
--
\ Jan "Yenya" Kasprzak <kas at fi.muni.cz> http://www.fi.muni.cz/~kas/
\\ PGP: finger kas at aisa.fi.muni.cz 0D99A7FB206605D7 8B35FCDE05B18A5E //
\\\ Czech Linux Homepage: http://www.linux.cz/ ///
Mantra: "everything is a stream of bytes". Repeat until enlightened. --Linus