2006-02-14 13:50:19

by Adrian Bunk

Subject: 2.6.16-rc: CIFS reproducibly freezes the computer

Hi Steve,

I observe the following on my i386 computer:

I'm connecting to a Samba server.

Copying data to the server works without any problems.

When trying to copy some GB from the server, my computer completely
freezes after some 100 MB. This is reproducible.

"Complete freeze" is:
- no reaction to any input, even when I was in the console the magic
SysRq key is not working
- if XMMS was playing, the approx. half a second of the song that was
playing at the time when it happened is played in an endless loop by
the sound chip

I once switched to the console waiting for the crash, and I saw the
following messages:
CIFS VFS: No response to cmd 46 mid 5907
CIFS VFS: Send error in read = -11

There are no other CIFS messages in my logs, and the messages above
didn't make it into the logs (there's nothing recorded in the logs at
the time of the crashes).
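For readers decoding the two console messages above, here is a minimal sketch. It assumes the command number is printed in decimal and is an SMB1 command code, and that the return value is a negated Linux errno. The lookup tables are deliberately tiny and the function name decode_cifs_line is hypothetical.

```python
import re

# Assumption: Linux errno numbers; only a few entries are listed here.
LINUX_ERRNO = {4: "EINTR", 5: "EIO", 11: "EAGAIN"}
# Assumption: SMB1 command codes; 46 decimal is 0x2E.
SMB_COMMANDS = {0x2E: "SMB_COM_READ_ANDX", 0x2F: "SMB_COM_WRITE_ANDX"}


def decode_cifs_line(line):
    """Decode the two message shapes quoted in the report, else return None."""
    m = re.search(r"No response to cmd (\d+) mid (\d+)", line)
    if m:
        cmd = int(m.group(1))
        return {
            "cmd": cmd,
            "cmd_name": SMB_COMMANDS.get(cmd, "unknown"),
            "mid": int(m.group(2)),
        }
    m = re.search(r"Send error in (\w+) = (-?\d+)", line)
    if m:
        rc = int(m.group(2))
        return {
            "op": m.group(1),
            "rc": rc,
            "errno_name": LINUX_ERRNO.get(-rc, "unknown"),
        }
    return None


for msg in (
    "CIFS VFS: No response to cmd 46 mid 5907",
    "CIFS VFS: Send error in read = -11",
):
    print(decode_cifs_line(msg))
```

Under these assumptions, the first message refers to a read request (READ_ANDX) that never got a reply, and the second reports that the read then failed with -EAGAIN, which is consistent with a timed-out request rather than a server-side error.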

I tried kernel 2.6.16-rc2 and 2.6.16-rc3.

CIFS options in my kernel:
CONFIG_CIFS=y
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_EXPERIMENTAL is not set

I'm mounting with (slightly anonymized):
mount -t cifs -o user="foo",ip=11.22.33.44 //DAT/bar bar

I'm using the smbfs 3.0.21a-4 package from Debian.

It doesn't occur in 2.6.15.4, because with this kernel (and AFAIR also
with older kernels) my computer refuses to mount this share.

Mounting the same share with smbfs works without big problems (on some
rare occasions the connection might become stale and I have to umount
and remount the share, but this is rare and it never affects the
stability of my computer).

I'm using an e100 network card with a 10 MBit/s connection.

Any other information I can provide for helping to debug this problem?

TIA
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed


2006-02-14 15:00:46

by Marc Burkhardt

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

* Adrian Bunk <[email protected]> [2006-02-14 14:50:16 +0100]:

> Hi Steve,
>
> I observe the following on my i386 computer:
>
> I'm connecting to a Samba server.
>
> Copying data to the server works without any problems.
>
> When trying to copy some GB from the server, my computer completely
> freezes after some 100 MB. This is reproducible.
>
> "Complete freeze" is:
> - no reaction to any input, even when I was in the console the magic
> SysRq key is not working
> - if XMMS was playing, the approx. half a second of the song that was
> playing at the time when it happened is played in an endless loop by
> the sound chip
>

Adrian,

I just copied a ~2 GB file and my system is running OK. The share is
a CIFS mount from a Samba server on Mac OS X.

Marc

2006-02-14 15:59:46

by Christian

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

Am Dienstag, 14. Februar 2006 14:50 schrieb Adrian Bunk:
> Hi Steve,
>
> I observe the following on my i386 computer:
>
> I'm connecting to a Samba server.
>
> Copying data to the server works without any problems.
>
> When trying to copy some GB from the server, my computer completely
> freezes after some 100 MB. This is reproducible.
>
> "Complete freeze" is:
> - no reaction to any input, even when I was in the console the magic
> SysRq key is not working
> - if XMMS was playing, the approx. half a second of the song that was
> playing at the time when it happened is played in an endless loop by
> the sound chip
>
> I once switched to the console waiting for the crash, and I saw the
> following messages:
> CIFS VFS: No response to cmd 46 mid 5907
> CIFS VFS: Send error in read = -11
>
> There are no other CIFS messages in my logs, and the messages above
> didn't make it into the logs (there's nothing recorded in the logs at
> the time of the crashes).
>
> I tried kernel 2.6.16-rc2 and 2.6.16-rc3.
>
> CIFS options in my kernel:
> CONFIG_CIFS=y
> # CONFIG_CIFS_STATS is not set
> # CONFIG_CIFS_XATTR is not set
> # CONFIG_CIFS_EXPERIMENTAL is not set
>
> I'm mounting with (slightly anonymized):
> mount -t cifs -o user="foo",ip=11.22.33.44 //DAT/bar bar
>
> I'm using the smbfs 3.0.21a-4 package from Debian.
>
> It doesn't occur in 2.6.15.4, because with this kernel (and AFAIR also
> with older kernels) my computer refuses to mount this share.
>
> Mounting the same share with smbfs works without big problems (on some
> rare occasions the connection might become stale and I have to umount
> and remount the share, but this is rare and it never affects the
> stability of my computer).
>
> I'm using an e100 network card with a 10 MBit/s connection.
>
> Any other information I can provide for helping to debug this problem?
>
> TIA
> Adrian

I'm experiencing something similar. I can confirm it since at least
2.6.12. My current system is Ubuntu Dapper Drake. Whenever I copy a
reasonably large file (> 150 MB) from my box to a WinXP SP2 box, my system
first shows very high latencies of up to 5 seconds. After nearly a minute
of copying it freezes completely: neither NumLock nor SysRq works anymore,
and there is no output in dmesg at all. It just gets slower, with high
latencies, and freezes completely if you do not kill -9 the cp process. So far
this has only happened to me on outbound copies (e.g. cp of a local file to
the remote host). Inbound (WinXP to my Linux box) works flawlessly.
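To put numbers on "latencies of up to 5 seconds" and on how far a copy gets before the freeze, a chunked copy that times each chunk can help. The following is only a sketch: timed_copy and its parameters are hypothetical names, and the progress log is assumed to be a regular file on a local disk so that fsync can push the last line out before the machine locks up.

```python
import os
import time


def timed_copy(src, dst, chunk_size=1 << 20, log=None):
    """Copy src to dst in chunks; return (bytes_copied, worst_chunk_seconds).

    If log is an open text file, one progress line per chunk is written
    and fsync'ed, so the last line shows roughly where a freeze happened.
    """
    copied = 0
    worst = 0.0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.monotonic()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            elapsed = time.monotonic() - start
            copied += len(chunk)
            worst = max(worst, elapsed)
            if log is not None:
                log.write(f"{copied} bytes, last chunk {elapsed:.3f}s\n")
                log.flush()
                os.fsync(log.fileno())
    return copied, worst


# Example use (paths are placeholders, not taken from the report):
#   with open("/var/tmp/copy-progress.log", "w") as lg:
#       print(timed_copy("/local/big.iso", "/mnt/share/big.iso", log=lg))
```

Note that writes may be absorbed by the page cache, so the per-chunk time is an indication of stalls as seen by the copying process, not an exact measure of network latency.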


Some more Info:

cp process hangs with status D in wchan SendRe...
kill -9 not working unless you delete the file on the remote site. Then cp
exits as it should
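The "status D" observation above can be collected for all processes at once. This is a rough sketch that assumes a Linux /proc layout with a per-process stat file and a wchan file; the function names are hypothetical, and wchan may be empty or unreadable depending on kernel configuration and permissions.

```python
import os


def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return [(pid, comm, wchan)] for processes in uninterruptible sleep (D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in uninterruptible_tasks():
        print(pid, comm, wchan)
```

Running this in a loop while the copy is in progress would show whether cp stays in D state and which kernel function it is waiting in.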

Mounting a CIFS fs from a WinXP SP2 box
//xxx/DriveD on /mnt/xxx/D type cifs (rw,mand,noexec,nosuid,nodev)

10 Mbit lan
3Com Corporation 3c905B 100BaseTX


System Info:

uname -a
Linux ubuntu 2.6.15-15-686 #1 SMP PREEMPT Thu Feb 9 20:19:53 UTC 2006 i686
GNU/Linux

lspci -vvv
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host
bridge (rev 02)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
Capabilities: <available only to root>

0000:00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP
bridge (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B+

0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0

0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
(prog-if 80 [Master])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at f000 [size=16]

0000:00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev
01) (prog-if 00 [UHCI])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at e000 [size=32]

0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 9

0000:00:11.0 VGA compatible controller: nVidia Corporation NV4 [RIVA TNT] (rev
04) (prog-if 00 [VGA])
Subsystem: Diamond Multimedia Systems Viper V550 with TV out
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 248 (1250ns min, 250ns max)
Interrupt: pin A routed to IRQ 10
Region 0: Memory at e8000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e9000000 (32-bit, prefetchable) [size=16M]
Expansion ROM at ea000000 [disabled] [size=64K]
Capabilities: <available only to root>

0000:00:12.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 30)
Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 12
Region 0: I/O ports at e400 [size=128]
Region 1: Memory at ec000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at eb000000 [disabled] [size=128K]
Capabilities: <available only to root>

0000:00:14.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev
06)
Subsystem: Ensoniq Creative Sound Blaster AudioPCI64V, AudioPCI128
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
<TAbort+ <MAbort- >SERR- <PERR-
Latency: 64 (3000ns min, 32000ns max)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at e800 [size=64]
Capabilities: <available only to root>

fgrep -i CIFS /boot/config-2.6.15-15-686
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_XATTR is not set
# CONFIG_CIFS_EXPERIMENTAL is not set

-Christian

2006-02-14 16:39:54

by Marc Burkhardt

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

Is that maybe dependent on _what_ version of Samba is running on the receiving
end?

BTW, my fstab entry is like this:

//192.168.100.2/X_Server_Export /mnt/cifs_ggh_export cifs user,username=mkoschewski,password=********,uid=1000,gid=1000,dir_mode=777,file_mode=666,rw 0 0


2006-02-14 18:13:52

by Jan Engelhardt

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

>"Complete freeze" is:
>[..]
> is played in an endless loop by the sound chip

Sounds like a portion of code disabled interrupts?



Jan Engelhardt
--

2006-02-14 18:34:37

by Lee Revell

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Tue, 2006-02-14 at 19:12 +0100, Jan Engelhardt wrote:
> >"Complete freeze" is:
> >[..]
> > is played in an endless loop by the sound chip
>
> Sounds like a portion of code disabled interrupts?

Anything that locks the machine while sound is playing will cause the
last period of audio to repeat in an endless loop, because the DMA
engine keeps running but the soundcard isn't getting new data. It's not
specific to interrupt disabling.

Isn't it fortunate that network cards don't work this way? ;-)

Lee
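The looping-audio effect described above can be illustrated with a toy model. This is a deliberately simplified sketch, not how any particular sound driver works: the buffer is modeled as a ring of labeled periods, the hardware reads one period per step, and the CPU refills the slot just played for as long as it is still running. All names are hypothetical.

```python
def playback(ring, refills, periods_to_play):
    """Simulate a sound card reading a ring of periods via DMA.

    ring:     list of period labels currently in the buffer
    refills:  new periods the CPU would write; once exhausted, the CPU
              is considered frozen and the ring is never updated again
    """
    heard = []
    pos = 0
    refills = iter(refills)
    for _ in range(periods_to_play):
        heard.append(ring[pos])
        # After the hardware consumes a period, a running CPU refills that slot.
        new = next(refills, None)
        if new is not None:
            ring[pos] = new
        pos = (pos + 1) % len(ring)
    return heard


print(playback(["a", "b"], ["c", "d"], 8))
# prints ['a', 'b', 'c', 'd', 'c', 'd', 'c', 'd']
```

Once the refills stop, the hardware keeps cycling over whatever was last written, which is the endless loop heard during the freeze. The model says nothing about why the CPU stopped, which matches the point that the symptom is not specific to disabled interrupts.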

2006-02-14 18:47:10

by ross

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Tue, Feb 14, 2006 at 05:40:03PM +0100, Marc Koschewski wrote:
> Is that maybe dependent on _what_ version of Samba is running on the receiving
> end?

I've seen it copying to Windows 2k3. Only uploading large files, and
it's not every time. I'd say 50% of the time, my box freezes when
copying something around 100MiB or larger.

IIRC, my workstation at the office is running 2.6.15.1 or .4.

--
Ross Vandegrift
[email protected]

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37

2006-02-15 13:35:27

by Marc Burkhardt

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

* Ross Vandegrift <[email protected]> [2006-02-14 13:47:08 -0500]:

> On Tue, Feb 14, 2006 at 05:40:03PM +0100, Marc Koschewski wrote:
> > Is that maybe dependent on _what_ version of Samba is running on the receiving
> > end?
>
> I've seen it copying to Windows 2k3. Only uploading large files, and
> it's not every time. I'd say 50% of the time, my box freezes when
> copying something around 100MiB or larger.
>
> IIRC, my workstation at the office is running 2.6.15.1 or .4.

I moved to CIFS because neither SMB nor NFS worked well for me. Both
seem to stall in a way I could never really reproduce. But CIFS is very stable
over here. I've never had a problem with it, whereas both NFS and SMB are likely
to cause trouble at least once a week, without log records and without any
chance of recovery. Mostly hard freezes.

Marc

2006-02-15 16:47:34

by ross

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Wed, Feb 15, 2006 at 02:35:23PM +0100, Marc Koschewski wrote:
> I moved to CIFS because SMB didn't work well for me, as well as did NFS. Both
> seems to stall in a way, I could never really reproduce. But CIFS is very stable
> over here. Never ever had a problem with it, whereas both NFS and SMB are likely
> to cause trouble at least once a week. Without log records, without any chance
> of recovery. Mostly hard-freezes.

Well, any interaction with Windows 2k3 has to use CIFS. SMB doesn't
work - I know, I tried smbfs first. Of course smbclient can't really trigger
a hard lockup since it's in userspace. I try to use it for any large
uploads I have to do, since the issue seems exclusive to the cifs
code.

NFS on the other hand, I'm not sure what issues you've seen. I
haven't had a reproducible problem with NFS in probably ten years.
Well, at least not one that was Linux's fault. I'd love it if Juniper could
explain why their VPNs are currently eating my fragmented UDP packets,
but I digress...

--
Ross Vandegrift
[email protected]

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37

2006-02-16 12:11:26

by Adrian Bunk

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Wed, Feb 15, 2006 at 02:35:23PM +0100, Marc Koschewski wrote:
> * Ross Vandegrift <[email protected]> [2006-02-14 13:47:08 -0500]:
>
> > On Tue, Feb 14, 2006 at 05:40:03PM +0100, Marc Koschewski wrote:
> > > Is that maybe dependent on _what_ version of Samba is running on the receiving
> > > end?
> >
> > I've seen it copying to Windows 2k3. Only uploading large files, and
> > it's not every time. I'd say 50% of the time, my box freezes when
> > copying something around 100MiB or larger.
> >
> > IIRC, my workstation at the office is running 2.6.15.1 or .4.
>
> I moved to CIFS because SMB didn't work well for me, as well as did NFS. Both
> seems to stall in a way, I could never really reproduce. But CIFS is very stable
> over here. Never ever had a problem with it, whereas both NFS and SMB are likely
> to cause trouble at least once a week. Without log records, without any chance
> of recovery. Mostly hard-freezes.

The problems are often error paths.

I might be running into some unusual error paths in CIFS, and the same
might be true for you in the SMB and NFS cases.

The SMB file system in the kernel is unfortunately unmaintained, but NFS
is well maintained. Have you sent a bug report for your NFS problems
similar to my bug report for my CIFS problems?

> Marc

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-16 13:32:49

by Marc Burkhardt

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

* Adrian Bunk <[email protected]> [2006-02-16 13:11:23 +0100]:

> On Wed, Feb 15, 2006 at 02:35:23PM +0100, Marc Koschewski wrote:
> > * Ross Vandegrift <[email protected]> [2006-02-14 13:47:08 -0500]:
> >
> > > On Tue, Feb 14, 2006 at 05:40:03PM +0100, Marc Koschewski wrote:
> > > > Is that maybe dependent on _what_ version of Samba is running on the receiving
> > > > end?
> > >
> > > I've seen it copying to Windows 2k3. Only uploading large files, and
> > > it's not every time. I'd say 50% of the time, my box freezes when
> > > copying something around 100MiB or larger.
> > >
> > > IIRC, my workstation at the office is running 2.6.15.1 or .4.
> >
> > I moved to CIFS because SMB didn't work well for me, as well as did NFS. Both
> > seems to stall in a way, I could never really reproduce. But CIFS is very stable
> > over here. Never ever had a problem with it, whereas both NFS and SMB are likely
> > to cause trouble at least once a week. Without log records, without any chance
> > of recovery. Mostly hard-freezes.
>
> The problems are often error paths.
>
> I might be running in some unusual error paths in CIFS, and the same
> might be true for you in the SMB and NFS cases.
>
> The SMB file system in the kernel is unfortunately unmaintained, but NFS
> is well maintained. Have you sent a bug report for your NFS problems
> similar to my bug report for my CIFS problems?

No, I didn't. That was not because I was lazy (and dude, I was lazy back then
when it came to reporting bugs that were not in my software). I just could not
reproduce the phenomenon. It happened with SMB as well as NFS. No matter what
file size (we tried any size you can think of), no log entries (neither client
nor server), nothing IPv4/IPv6 specific (we tried both), no matter what serving
OS (Linux, Mac OS X), no matter whether the server was busy with other clients
or not... we tried for months, and I _mean_ months, to reproduce this. No
chance. The s**t always happened when you thought it would never come back.
Moreover, it was some sort of personal fight between me and the networking guys
over here. I'm the only one in the company developing on a Linux machine. All
the others use Windows, BSD or Mac OS X (yes, I know it's a BSD). They always
wanted me to switch to BSD or even _Windows_ (!!!), and then CIFS finally got
me back on the road again. The NFS servers are currently not running. They were
only run for me. Now that I use CIFS, there's just no need...

Maybe, when I have time, I'll try to convince them to set up the NFS thing again
and let me run tests on it with various kernels and maybe some help from you
guys (testcases, scenarios, ...).

Marc

2006-02-27 06:29:28

by Adrian Bunk

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Tue, Feb 14, 2006 at 01:47:08PM -0500, Ross Vandegrift wrote:
> On Tue, Feb 14, 2006 at 05:40:03PM +0100, Marc Koschewski wrote:
> > Is that maybe dependent on _what_ version of Samba is running on the receiving
> > end?
>
> I've seen it copying to Windows 2k3. Only uploading large files, and
> it's not every time. I'd say 50% of the time, my box freezes when
> copying something around 100MiB or larger.
>
> IIRC, my workstation at the office is running 2.6.15.1 or .4.

Christian, Ross, my freezes are fixed in 2.6.16-rc5.
Can you check whether 2.6.16-rc5 also fixes your freezes?

> Ross Vandegrift

TIA
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-02-28 03:02:44

by ross

Subject: Re: 2.6.16-rc: CIFS reproducibly freezes the computer

On Mon, Feb 27, 2006 at 07:29:26AM +0100, Adrian Bunk wrote:
> Christian, Ross, my freezes are fixed in 2.6.16-rc5.
> Can you check whether 2.6.16-rc5 also fixes your freezes?

During the last week my workstation at the office was upgraded. I
haven't been able to reproduce the freeze on the new box after
uploading a few 600MB ISO images. This certainly would've tripped the
old machine, but this one seems fine.

Similar kernel version, 2.6.15-1-686-smp from Debian etch. The
possible major difference is that the old box wasn't SMP, the new one
is.

Sorry, not much help!

--
Ross Vandegrift
[email protected]

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37