2002-02-08 15:11:40

by Denis Vlasenko

[permalink] [raw]
Subject: [BUG] 2.4.18pre6+ll: autofs+smbfs: processes in D state

Is here somebody willing to look into smbfs related bugs?

Ok, now to the bug report:

I have a multistream downloader (JetCar) on my NT box at work.
It locks files exclusively while downloading them.

I have automounter running on my Linux box. I use it to access
SMB filesystems over the network.

I wanted to verify that mp3 was good before I got all of it,
so cd'ed to /mnt/auto/smb.host.dir, paused download,
copied incomplete file to Linux and tried to resume dl.
It failed (JetCar: "can't open file").
I rebooted NT box.

Now I have some processes in D state. mc and ls hung trying to stat
/mnt/auto/smb.host.dir it seems. Ksymoopsed SysRq-T output below.

BTW, do I have to run klogd with -x? It seems to produce half-ksymoopsed
output otherwise (no module info I guess). I had to restart syslogd+klogd
to make usable trace.
--
vda

automount D C02AE060 0 1986 1 2007 1962 (NOTLB)
Call Trace: [<c0105d25>] [<c0105eb8>] [<c9056ad6>] [<c905b960>] [<c9058f72>]
[<c9058f54>] [<c014c23f>] [<c0149d06>] [<c01177d9>] [<c9058e01>] [<c90589fc>]
[<c905b2c0>] [<c90563fb>] [<c90565a4>] [<c90565d4>] [<c9058e23>] [<c9058f19>]
[<c013e406>] [<c01071ff>]
Trace; c0105d25 <__down+61/c0>
Trace; c0105eb8 <__down_failed+8/c>
Trace; c9056ad6 <[smbfs]_text_lock_proc+55/17f>
Trace; c905b960 <[smbfs]smb_sops+0/60>
Trace; c9058f72 <[smbfs]smb_delete_inode+1e/6c>
Trace; c9058f54 <[smbfs]smb_delete_inode+0/6c>
Trace; c014c23f <iput+1a3/1cc>
Trace; c0149d06 <shrink_dcache_sb+146/180>
Trace; c01177d9 <printk+131/164>
Trace; c9058e01 <[smbfs]smb_invalidate_inodes+11/1c>
Trace; c90589fc <[smbfs]smb_trans2_request+60/1fd>
Trace; c905b2c0 <[smbfs].rodata.start+1220/175f>
Trace; c90563fb <[smbfs]smb_proc_getattr_trans2+87/188>
Trace; c90565a4 <[smbfs]smb_proc_do_getattr+a8/ac>
Trace; c90565d4 <[smbfs]smb_proc_getattr+2c/44>
Trace; c9058e23 <[smbfs]smb_refresh_inode+17/b8>
Trace; c9058f19 <[smbfs]smb_revalidate_inode+55/90>
Trace; c013e406 <sys_lstat64+3e/6c>
Trace; c01071ff <system_call+33/38>

mc D C02AE060 0 2604 1 2757 2615 (NOTLB)
Call Trace: [<c0105d25>] [<c0105eb8>] [<c9056bb2>] [<c9058e23>] [<c014970c>]
[<c0140778>] [<c9058f19>] [<c013e406>] [<c01071ff>]
Trace; c0105d25 <__down+61/c0>
Trace; c0105eb8 <__down_failed+8/c>
Trace; c9056bb2 <[smbfs]_text_lock_proc+131/17f>
Trace; c9058e23 <[smbfs]smb_refresh_inode+17/b8>
Trace; c014970c <dput+1c/15c>
Trace; c0140778 <getname+70/a8>
Trace; c9058f19 <[smbfs]smb_revalidate_inode+55/90>
Trace; c013e406 <sys_lstat64+3e/6c>
Trace; c01071ff <system_call+33/38>

mc D C02AE060 0 2612 1 2615 2618 (NOTLB)
Call Trace: [<c025c050>] [<c0105d25>] [<c0105eb8>] [<c9056bb2>] [<c9058e23>]
[<c014970c>] [<c0140778>] [<c9058f19>] [<c013e406>] [<c01071ff>]
Trace; c025c050 <rb_insert_color+a8/dc>
Trace; c0105d25 <__down+61/c0>
Trace; c0105eb8 <__down_failed+8/c>
Trace; c9056bb2 <[smbfs]_text_lock_proc+131/17f>
Trace; c9058e23 <[smbfs]smb_refresh_inode+17/b8>
Trace; c014970c <dput+1c/15c>
Trace; c0140778 <getname+70/a8>
Trace; c9058f19 <[smbfs]smb_revalidate_inode+55/90>
Trace; c013e406 <sys_lstat64+3e/6c>
Trace; c01071ff <system_call+33/38>

ls D C02AE060 0 2615 1 2604 2612 (NOTLB)
Call Trace: [<c012986b>] [<c0105d25>] [<c0105eb8>] [<c9056bb2>] [<c9058e23>]
[<c0145025>] [<c0140778>] [<c9058f19>] [<c013e406>] [<c01072f0>] [<c01071ff>]
Trace; c012986b <do_generic_file_read+253/498>
Trace; c0105d25 <__down+61/c0>
Trace; c0105eb8 <__down_failed+8/c>
Trace; c9056bb2 <[smbfs]_text_lock_proc+131/17f>
Trace; c9058e23 <[smbfs]smb_refresh_inode+17/b8>
Trace; c0145025 <vfs_readdir+89/bc>
Trace; c0140778 <getname+70/a8>
Trace; c9058f19 <[smbfs]smb_revalidate_inode+55/90>
Trace; c013e406 <sys_lstat64+3e/6c>
Trace; c01072f0 <error_code+34/3c>
Trace; c01071ff <system_call+33/38>


2002-02-08 20:56:13

by Urban Widmark

[permalink] [raw]
Subject: Re: [BUG] 2.4.18pre6+ll: autofs+smbfs: processes in D state

On Fri, 8 Feb 2002, Denis Vlasenko wrote:

> Is here somebody willing to look into smbfs related bugs?

Why not read the MAINTAINERS file?

> I wanted to verify that mp3 was good before I got all of it,
> so cd'ed to /mnt/auto/smb.host.dir, paused download,
> copied incomplete file to Linux and tried to resume dl.
> It failed (JetCar: "can't open file").
> I rebooted NT box.

This has nothing to do with the automounter.

Rebooting the NT box with smbfs mounted probably made some process go to
sleep for a long time, waiting for a reply or some network timeout.

smbfs has an ugly and incorrect assumption that it will always get a reply
to it's requests. And one process will block all others from sending any
requests until it is done.

You may want to test the 2.4.17 patch on:
http://www.hojdpunkten.ac.se/054/samba/index.html

But that code is (very) experimental. It will allow multiple processes
sending data, the processes will be in interruptible sleep and capable of
timeout.

Further down on that page there is a patch for 2.4.4 to make it poll()
before reading and time-out. That is a smaller change and I have some good
feedback on the 2.2 version of it.


> Now I have some processes in D state. mc and ls hung trying to stat
> /mnt/auto/smb.host.dir it seems. Ksymoopsed SysRq-T output below.

They are all waiting for access to the smbfs "server" struct. None of them
is the real problem.

The process that is blocking the others is probably sleeping in
tcp_data_wait(), and if so it is interruptible.

> BTW, do I have to run klogd with -x? It seems to produce half-ksymoopsed

Yes, it is the recommended setting by the ksymoops maintainer.

/Urban

2002-02-09 12:21:14

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [BUG] 2.4.18pre6+ll: autofs+smbfs: processes in D state

On 8 February 2002 18:55, Urban Widmark wrote:

> > I wanted to verify that mp3 was good before I got all of it,
> > so cd'ed to /mnt/auto/smb.host.dir, paused download,
> > copied incomplete file to Linux and tried to resume dl.
> > It failed (JetCar: "can't open file").
> > I rebooted NT box.
>
> This has nothing to do with the automounter.
>
> Rebooting the NT box with smbfs mounted probably made some process go to
> sleep for a long time, waiting for a reply or some network timeout.
>
> smbfs has an ugly and incorrect assumption that it will always get a reply
> to it's requests. And one process will block all others from sending any
> requests until it is done.

Note that NT box was rebooted, not shut down. I brought it back on and even
restarted download of unlucky mp3 (via NT downloader). Since NT box was
available on the network I thought smbfs will reestablish SMB connection, as
NFS do in similar scenario (server reboot).

> You may want to test the 2.4.17 patch on:
> http://www.hojdpunkten.ac.se/054/samba/index.html
>
> But that code is (very) experimental. It will allow multiple processes
> sending data, the processes will be in interruptible sleep and capable of
> timeout.
>
> Further down on that page there is a patch for 2.4.4 to make it poll()
> before reading and time-out. That is a smaller change and I have some good
> feedback on the 2.2 version of it.
>
> > Now I have some processes in D state. mc and ls hung trying to stat
> > /mnt/auto/smb.host.dir it seems. Ksymoopsed SysRq-T output below.
>
> They are all waiting for access to the smbfs "server" struct. None of them
> is the real problem.
>
> The process that is blocking the others is probably sleeping in
> tcp_data_wait(), and if so it is interruptible.

Hmmm, maybe... I tried to kill -KILL automounter, forgot to do that to
smbmount.

I will try to find reproducible scenario and then will try that patches.
Maybe you'll have to wait till Monday.
--
vda

2002-02-09 12:43:32

by Urban Widmark

[permalink] [raw]
Subject: Re: [BUG] 2.4.18pre6+ll: autofs+smbfs: processes in D state

On Sat, 9 Feb 2002, Denis Vlasenko wrote:

> Note that NT box was rebooted, not shut down. I brought it back on and even
> restarted download of unlucky mp3 (via NT downloader). Since NT box was
> available on the network I thought smbfs will reestablish SMB connection, as
> NFS do in similar scenario (server reboot).

It does, but that reconnect is limited by the same server lock as the
others. Also, SMB has state so you do loose certain things on a server
reboot (eg open files).

> Hmmm, maybe... I tried to kill -KILL automounter, forgot to do that to
> smbmount.

That does not help. Killing smbmount does not affect smbfs until the
connection is lost, and it doesn't improve anything. Just a normal umount.

/Urban