2002-04-09 11:56:28

by Erik Inge Bolsø

Subject: 2.2.20 umount oops (probably smbfs related)

Here's an oops that appeared yesterday in umount, after 81 days of uptime
and much automated smbfs mount/umount activity:

Stock kernel 2.2.20. No charset= or other weird options to smbfs.

I seem to remember having seen this once on a 2.2.19pre series kernel as
well.

Ksymoops:

Unable to handle kernel NULL pointer dereference at virtual address 0000001c
current->tss.cr3 = 08f1f000, %cr3 = 08f1f000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0126389>]
EFLAGS: 00010286
eax: 00000000 ebx: 00000000 ecx: cb428000 edx: 0000003c
esi: cd8ef600 edi: 00000000 ebp: ce6a0004 esp: cb429f4c
ds: 0018 es: 0018 ss: 0018
Process umount (pid: 30793, process nr: 116, stackpage=cb429000)
Stack: 00000000 cd8ef644 cd8ef644 cd8ef600 00000004 c012914e cd8ef600 00000004
fffffffa c14f0004 ce6a8188 c01291f8 00000004 00000000 00000000 00000000
08050004 c14f2a00 00000000 c01292ed 00000004 00000000 cb428000 08051ea9
Call Trace: [<c012914e>] [<c01291f8>] [<c01292ed>] [<c0129308>] [<c0109144>]
Code: 8b 43 1c 48 75 35 53 e8 9f 9b 00 00 53 e8 31 ee ff ff c7 43

>>EIP: c0126389 <fput+5/48>
Trace: c012914e <do_umount+ee/144>
Trace: c01291f8 <umount_dev+54/9c>
Trace: c01292ed <sys_umount+ad/bc>
Trace: c0129308 <sys_oldumount+c/10>
Trace: c0109144 <system_call+34/38>
Code: c0126389 <fput+5/48> 00000000 <_EIP>: <===
Code: c0126389 <fput+5/48> 0: 8b 43 1c movl 0x1c(%ebx),%eax <===
Code: c012638c <fput+8/48> 3: 48 decl %eax
Code: c012638d <fput+9/48> 4: 75 35 jne c01263c4 <fput+40/48>
Code: c012638f <fput+b/48> 6: 53 pushl %ebx
Code: c0126390 <fput+c/48> 7: e8 9f 9b 00 00 call c012ff34 <locks_remove_flock+0/90>
Code: c0126395 <fput+11/48> c: 53 pushl %ebx
Code: c0126396 <fput+12/48> d: e8 31 ee ff ff call c01251cc <__fput+0/48>
Code: c012639b <fput+17/48> 12: c7 43 00 00 00 00 00 movl $0x0,0x0(%ebx)

3 warnings issued. Results may not be reliable.
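
As a quick cross-check of the decoded instruction, the faulting address can be reproduced from the register dump: `movl 0x1c(%ebx),%eax` reads from ebx + 0x1c, and ebx is 00000000 in the report. The sketch below only does that arithmetic. The helper names are hypothetical, and the 4 KiB first-page heuristic is an assumption about i386 paging, not something taken from the oops itself.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 "disp(%base)" operand: base + displacement,
 * with 32-bit wraparound as on i386. */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Heuristic (assumption): a fault inside the first 4 KiB page usually means
 * a small field offset was applied to a NULL pointer. */
static int looks_like_null_deref(uint32_t fault_addr)
{
    return fault_addr < 0x1000u;
}

/* Values copied from the oops: ebx = 00000000, displacement 0x1c,
 * reported fault address 0000001c. */
static void check_reported_fault(void)
{
    assert(effective_address(0x00000000u, 0x1c) == 0x0000001cu);
    assert(looks_like_null_deref(0x0000001cu));
}
```

So fput appears to have been handed a NULL pointer and faulted on its first field access at offset 0x1c. That field is presumably the reference count, given the decrement that follows.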

Right before the oops, I got these lines in dmesg:

ind //email.txt failed, error=-5
smb_lookup: find //email.txt failed, error=-5
smb_retry: signal failed, error=-3
smb_lookup: find //email.txt failed, error=-5
smb_get_length: recv error = 512
smb_request: result -512, setting invalid
smb_dont_catch_keepalive: did not get valid server!

Especially the last line - happened in the same second as the oops,
according to syslog.

Note that the smb share in question is mounted, alive and well as of this
moment, I can read files on it just fine - it's just the umount of it that
oopsed.

This is a production server in heavy use, so no _too_ experimental patches
please, can't reboot it very often :-/

Any fixes handy, anyone? Can't seem to find anything that would fix this
in the 2.2.21pre changelog...

Please CC: me, I'm not on either of the linux-kernel or samba lists.

--
Erik I. Bolsø, Triangel Maritech Software AS | Skybert AS
Tlf: 712 41 694 Mobil: 915 79 512


2002-04-11 20:21:39

by Urban Widmark

Subject: Re: 2.2.20 umount oops (probably smbfs related)

On Tue, 9 Apr 2002, Erik Inge Bolsø wrote:

> Process umount (pid: 30793, process nr: 116, stackpage=cb429000)
> Stack: 00000000 cd8ef644 cd8ef644 cd8ef600 00000004 c012914e cd8ef600 00000004
> fffffffa c14f0004 ce6a8188 c01291f8 00000004 00000000 00000000 00000000
> 08050004 c14f2a00 00000000 c01292ed 00000004 00000000 cb428000 08051ea9
> Call Trace: [<c012914e>] [<c01291f8>] [<c01292ed>] [<c0129308>] [<c0109144>]
> Code: 8b 43 1c 48 75 35 53 e8 9f 9b 00 00 53 e8 31 ee ff ff c7 43
>
> >>EIP: c0126389 <fput+5/48>
> Trace: c012914e <do_umount+ee/144>
> Trace: c01291f8 <umount_dev+54/9c>
> Trace: c01292ed <sys_umount+ad/bc>
> Trace: c0129308 <sys_oldumount+c/10>
> Trace: c0109144 <system_call+34/38>
> Code: c0126389 <fput+5/48> 00000000 <_EIP>: <===
> Code: c0126389 <fput+5/48> 0: 8b 43 1c movl 0x1c(%ebx),%eax <===

Your trace doesn't include any smb_ references, but I suppose the cd8ef644
ones might be. I don't see where do_umount calls fput so ...

> Right before the oops, I got these lines in dmesg:
>
> ind //email.txt failed, error=-5
> smb_lookup: find //email.txt failed, error=-5
> smb_retry: signal failed, error=-3

"signal failed, error=-3" means that smbmount is no longer with us. When
that happens smbfs can't get a new connection when the connection is lost
(which is a normal event).

This is usually bad and you may want to investigate why it died/upgrade
your samba version regardless of the patch below. Recent smbmounts can log
to file and with a suitable debuglevel you may find out what happened
(debug=4 or so).

> smb_lookup: find //email.txt failed, error=-5
> smb_get_length: recv error = 512
> smb_request: result -512, setting invalid
> smb_dont_catch_keepalive: did not get valid server!

smbfs unmount code "put_super" does:
	if (server->sock_file) {
		smb_proc_disconnect(server);
		smb_dont_catch_keepalive(server);
		fput(server->sock_file);
	}

I think what happened is that there was a server->sock_file, but that the
tcp connection behind it was actually dead. -5 is an indication of that.

When it tries to send the disconnect message in smb_proc_disconnect it
detects this, closes sock_file and sets it to NULL.

smb_dont_catch_keepalive prints that error message on a NULL sock_file.

Then when the fput is run the put_super code assumes there still is a
sock_file, because there was one when the if was evaluated ...
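
A minimal userspace sketch of that sequence, assuming the failed send really does drop the socket and clear the pointer (that part is my guess, not verified). The model_* functions and the simplified struct layouts are hypothetical stand-ins, not the kernel code; model_fput returns -1 where the real fput would oops.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the names mirror the kernel structures. */
struct file { int f_count; };

enum conn_state { CONN_VALID, CONN_INVALID };

struct smb_sb_info {
    struct file *sock_file;
    enum conn_state state;
};

/* Drop one reference. Returns -1 on NULL, where the kernel would oops. */
static int model_fput(struct file *f)
{
    if (f == NULL)
        return -1;
    f->f_count--;
    return 0;
}

/* Assumed failure path: sending on a dead connection closes the socket
 * and sets sock_file to NULL behind the caller's back. */
static int model_disconnect(struct smb_sb_info *server)
{
    assert(server != NULL);
    if (server->state != CONN_VALID) {
        if (server->sock_file != NULL) {
            model_fput(server->sock_file);
            server->sock_file = NULL;
        }
        return -5; /* -EIO, as seen in the log */
    }
    return 0;
}

/* put_super as quoted: the pointer is tested once, then used again after
 * a call that may have cleared it. */
static int model_put_super(struct smb_sb_info *server)
{
    int rc = 0;

    if (server->sock_file) {
        model_disconnect(server);
        rc = model_fput(server->sock_file);
    }
    return rc;
}
```

With state set to CONN_INVALID, model_put_super ends up passing NULL to model_fput, which matches the fput+5 fault in the trace. With CONN_VALID the reference is dropped normally.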

If that is what happened the patch below should help. It simply changes
smbfs not to try and send a disconnect message if it isn't connected.
Which makes sense anyway, no need to connect just to say goodbye. Even if
that may be the polite thing to do :)
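
The intended effect of the change, again as a userspace sketch with hypothetical model_* names and simplified structs (the locking is left out, and model_request only imitates the assumed failure path):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file { int f_count; };

enum conn_state { CONN_VALID, CONN_INVALID };

struct smb_sb_info {
    struct file *sock_file;
    enum conn_state state;
    int requests_sent;      /* bookkeeping for this model only */
};

/* Assumed behaviour of a request: on a dead connection it fails with
 * -EIO and clears sock_file. */
static int model_request(struct smb_sb_info *server)
{
    server->requests_sent++;
    if (server->state != CONN_VALID) {
        server->sock_file = NULL;
        return -EIO;
    }
    return 0;
}

/* Patched shape of the disconnect: bail out early when not connected,
 * so the request path never runs and sock_file is left alone. */
static int model_proc_disconnect(struct smb_sb_info *server)
{
    int result = -EIO;

    assert(server != NULL);
    if (server->state != CONN_VALID)
        goto out;

    result = model_request(server);

out:
    return result;
}
```

On a dead connection this returns -EIO without sending anything, so the caller's later fput still sees the sock_file it tested in the if.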


> Note that the smb share in question is mounted, alive and well as of this
> moment, I can read files on it just fine - it's just the umount of it that
> oopsed.

Sounds strange. Could that be some automounter that mounted another one
for you?

If the patch below doesn't work, try just removing the smb_proc_disconnect
line from put_super. Closing the file disconnects anyway.

/Urban


diff -urN -X exclude linux-2.2.20-orig/fs/smbfs/proc.c linux-2.2.20-smbfs/fs/smbfs/proc.c
--- linux-2.2.20-orig/fs/smbfs/proc.c Thu Apr 11 21:25:09 2002
+++ linux-2.2.20-smbfs/fs/smbfs/proc.c Thu Apr 11 22:01:48 2002
@@ -2152,10 +2152,16 @@
 int
 smb_proc_disconnect(struct smb_sb_info *server)
 {
-	int result;
+	int result = -EIO;
+
 	smb_lock_server(server);
+	if (server->state != CONN_VALID)
+		goto out;
+
 	smb_setup_header(server, SMBtdis, 0, 0);
 	result = smb_request_ok(server, SMBtdis, 0, 0);
+
+out:
 	smb_unlock_server(server);
 	return result;
 }

2002-04-12 08:29:25

by Erik Inge Bolsø

Subject: Re: 2.2.20 umount oops (probably smbfs related)

On Thu, 11 Apr 2002, Urban Widmark wrote:
> On Tue, 9 Apr 2002, Erik Inge Bolsø wrote:
> > >>EIP: c0126389 <fput+5/48>
> > Trace: c012914e <do_umount+ee/144>
> > Trace: c01291f8 <umount_dev+54/9c>
> > Trace: c01292ed <sys_umount+ad/bc>
> > Trace: c0129308 <sys_oldumount+c/10>
> > Trace: c0109144 <system_call+34/38>
> > Code: c0126389 <fput+5/48> 00000000 <_EIP>: <===
> > Code: c0126389 <fput+5/48> 0: 8b 43 1c movl 0x1c(%ebx),%eax <===
>
> Your trace doesn't include any smb_ references, but I suppose the cd8ef644
> ones might be. I don't see where do_umount calls fput so ...

Right. Seems that the somewhat ancient ksymoops (0.6e) didn't pick up the
smbfs module's symbols. Will update.

> This is usually bad and you may want to investigate why it died/upgrade
> your samba version regardless of the patch below. Recent smbmounts can log
> to file and with a suitable debuglevel you may find out what happened
> (debug=4 or so).

Thanks for the tip. Upgrading the 2.0.6 to 2.0.10 ASAP.

> > smb_lookup: find //email.txt failed, error=-5
> > smb_get_length: recv error = 512
> > smb_request: result -512, setting invalid
> > smb_dont_catch_keepalive: did not get valid server!
>
> smbfs unmount code "put_super" does:
> 	if (server->sock_file) {
> 		smb_proc_disconnect(server);
> 		smb_dont_catch_keepalive(server);
> 		fput(server->sock_file);
> 	}

<snip good explanation>

Aha! I traced it as far as these lines myself yesterday, but couldn't
figure out what nulled sock_file, and why. Thanks!

> If that is what happened the patch below should help. It simply changes
> smbfs not to try and send a disconnect message if it isn't connected.
> Which makes sense anyway, no need to connect just to say goodbye. Even if
> that may be the polite thing to do :)

Thanks, will try the patch as soon as I find time to rebuild. Looks sane
:)

> > Note that the smb share in question is mounted, alive and well as of this
> > moment, I can read files on it just fine - it's just the umount of it that
> > oopsed.
>
> Sounds strange. Could that be some automounter that mounted another one
> for you?

Could be, I suppose. No automounter running, but the script that oopsed is
run once an hour and does an umount/mount to deal with the windows server
being rebooted - we want the share to stay mounted, no matter if we reboot
the old NT4 box. (If we reboot it and don't do this, we get I/O errors on
accessing the mount point.)

--
Erik I. Bolsø, Triangel Maritech Software AS | Skybert AS
Tlf: 712 41 694 Mobil: 915 79 512