2007-05-23 00:50:27

by Dave Young

[permalink] [raw]
Subject: 2.6.22-rc1-mm1 cifs_mount oops

Hi,
when I use mount -t cifs , the kernel oops, seems break at
kthread_stop, I'm not sure.

But if I add the CONFIG_CIFS_DEBUG2=y to config file, rebuild kernel,
then the oops disappeared.

Below is the oops message:

BUG: unable to handle kernel NULL pointer dereference at virtual
address 00000008
printing eip:
c012e910
*pde = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: cifs smbfs radeon drm ipv6 snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss capability commoncap e100 mii psmouse sg evdev serio_raw
snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
agpgart i2c_i801 pcspkr
CPU: 0
EIP: 0060:[<c012e910>] Not tainted VLI
EFLAGS: 00210246 (2.6.22-rc1-mm1 #3)
EIP is at kthread_stop+0x10/0x90
eax: c051bde0 ebx: 00000000 ecx: c1fba000 edx: c1fef040
esi: 00000000 edi: 00000064 ebp: c2a36c80 esp: c1fbbd58
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
Process mount.cifs (pid: 3955, ti=c1fba000 task=c2b38540 task.ti=c1fba000)
Stack: c1fef040 ffffff90 ffffff90 f8a7a328 c285a504 f8a9a9fb 00000083 000000cf
000000dc 0000000b c2b38540 c2af5740 c292c540 00000000 00000000 c285a4c0
00000000 c411b400 c3a4f500 c3cec200 c1fef052 c291c1e0 c1fef037 c291c940
Call Trace:
[<f8a7a328>] cifs_mount+0xbe8/0xf10 [cifs]
[<c02633fe>] idr_get_new_above_int+0x3e/0x50
[<f8a6d04e>] cifs_read_super+0x4e/0x160 [cifs]
[<c0166fc0>] set_anon_super+0x0/0xd0
[<f8a6d6c0>] cifs_get_sb+0x60/0xd0 [cifs]
[<c0167491>] vfs_kern_mount+0x91/0x130
[<c017d648>] permit_mount+0x28/0xa0
[<c017e09a>] do_new_mount+0x8a/0x140
[<c017e95e>] do_mount+0x25e/0x280
[<c0440000>] schedule+0x2e0/0x680
[<c017e602>] exact_copy_from_user+0x32/0x70
[<c017e69a>] copy_mount_options+0x5a/0xc0
[<c017ec19>] sys_mount+0x79/0xc0
[<c01040bc>] syscall_call+0x7/0xb
=======================
Code: 88 d1 d3 e0 89 43 5c 83 c4 18 5b c3 eb 0d 90 90 90 90 90 90 90
90 90 90 90 90 90 53 83 ec 08 89 c3 b8 e0 bd 51 c0 e8 90 26 31 00 <ff>
43 08 31 c9 b8 f0 c1 58 c0 89 0d ec c1 58 c0 e8 3b 01 00 00
EIP: [<c012e910>] kthread_stop+0x10/0x90 SS:ESP 0068:c1fbbd58

Regards
dave


2007-05-23 02:22:55

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

On Wed, 23 May 2007 00:50:13 +0000 "young dave" <[email protected]> wrote:

> Hi,
> when I use mount -t cifs , the kernel oops, seems break at
> kthread_stop, I'm not sure.
>
> But if I add the CONFIG_CIFS_DEBUG2=y to config file, rebuild kernel,
> then the oops disappeared.
>
> Below is the oops message:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000008
> printing eip:
> c012e910
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: cifs smbfs radeon drm ipv6 snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss capability commoncap e100 mii psmouse sg evdev serio_raw
> snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
> agpgart i2c_i801 pcspkr
> CPU: 0
> EIP: 0060:[<c012e910>] Not tainted VLI
> EFLAGS: 00210246 (2.6.22-rc1-mm1 #3)
> EIP is at kthread_stop+0x10/0x90
> eax: c051bde0 ebx: 00000000 ecx: c1fba000 edx: c1fef040
> esi: 00000000 edi: 00000064 ebp: c2a36c80 esp: c1fbbd58
> ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
> Process mount.cifs (pid: 3955, ti=c1fba000 task=c2b38540 task.ti=c1fba000)
> Stack: c1fef040 ffffff90 ffffff90 f8a7a328 c285a504 f8a9a9fb 00000083 000000cf
> 000000dc 0000000b c2b38540 c2af5740 c292c540 00000000 00000000 c285a4c0
> 00000000 c411b400 c3a4f500 c3cec200 c1fef052 c291c1e0 c1fef037 c291c940
> Call Trace:
> [<f8a7a328>] cifs_mount+0xbe8/0xf10 [cifs]
> [<c02633fe>] idr_get_new_above_int+0x3e/0x50
> [<f8a6d04e>] cifs_read_super+0x4e/0x160 [cifs]
> [<c0166fc0>] set_anon_super+0x0/0xd0
> [<f8a6d6c0>] cifs_get_sb+0x60/0xd0 [cifs]
> [<c0167491>] vfs_kern_mount+0x91/0x130
> [<c017d648>] permit_mount+0x28/0xa0
> [<c017e09a>] do_new_mount+0x8a/0x140
> [<c017e95e>] do_mount+0x25e/0x280
> [<c0440000>] schedule+0x2e0/0x680
> [<c017e602>] exact_copy_from_user+0x32/0x70
> [<c017e69a>] copy_mount_options+0x5a/0xc0
> [<c017ec19>] sys_mount+0x79/0xc0
> [<c01040bc>] syscall_call+0x7/0xb
> =======================
> Code: 88 d1 d3 e0 89 43 5c 83 c4 18 5b c3 eb 0d 90 90 90 90 90 90 90
> 90 90 90 90 90 90 53 83 ec 08 89 c3 b8 e0 bd 51 c0 e8 90 26 31 00 <ff>
> 43 08 31 c9 b8 f0 c1 58 c0 89 0d ec c1 58 c0 e8 3b 01 00 00
> EIP: [<c012e910>] kthread_stop+0x10/0x90 SS:ESP 0068:c1fbbd58
>

I assume cifs_demultiplex_thread() took the SIGKILL, zeroed server->tsk
then exitted. Then, cifs_mount() did a kthread_stop() on the now-NULL
pointer.

I don't see a non-racy way of fixing this as the code stands at present.
This:

--- a/fs/cifs/connect.c~cifs-oops-fix
+++ a/fs/cifs/connect.c
@@ -2086,7 +2086,6 @@ cifs_mount(struct super_block *sb, struc
if ((temp_rc == -ESHUTDOWN) &&
(pSesInfo->server) && (pSesInfo->server->tsk)) {
send_sig(SIGKILL,pSesInfo->server->tsk,1);
- kthread_stop(pSesInfo->server->tsk);
}
} else
cFYI(1, ("No session or bad tcon"));
_

has a decent chance of fixing it. But it's now racy against thread
*startup*: if we send SIGKILL to that task before it has done its
allow_signal(), it will presumably never get shut down.

Steve, can we just pull all the signal stuff out of there and use the
kthread machinery alone?

2007-05-23 02:59:19

by Dave Young

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Hi,
maybe we can add if sentence before kthread_stop.

diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
+++ linux.new/fs/cifs/connect.c 2007-05-23 10:58:39.000000000 +0000
@@ -2070,7 +2070,8 @@
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ if(srvTcp->tsk)
+ kthread_stop(srvTcp->tsk);
}
}


Regards
dave

2007-05-23 03:21:51

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

On Wed, 23 May 2007 02:59:07 +0000 "young dave" <[email protected]> wrote:

> Hi,
> maybe we can add if sentence before kthread_stop.
>
> diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
> +++ linux.new/fs/cifs/connect.c 2007-05-23 10:58:39.000000000 +0000
> @@ -2070,7 +2070,8 @@
> spin_unlock(&GlobalMid_Lock);
> if (srvTcp->tsk) {
> send_sig(SIGKILL,srvTcp->tsk,1);
> - kthread_stop(srvTcp->tsk);
> + if(srvTcp->tsk)
> + kthread_stop(srvTcp->tsk);
> }
> }

Yeah, that's racy: once we've sent the signal, the kernel thread can write
NULL to srvTcp->tsk at any time.

2007-05-23 07:16:54

by Dave Young

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Hi,

> Yeah, that's racy: once we've sent the signal, the kernel thread can write
> NULL to srvTcp->tsk at any time.

Yes, here is another patch :

diff -ur linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
+++ linux.new/fs/cifs/connect.c 2007-05-23 15:16:11.000000000 +0000
@@ -650,6 +650,7 @@

spin_lock(&GlobalMid_Lock);
server->tcpStatus = CifsExiting;
+ kthread_stop(server->tsk);
server->tsk = NULL;
/* check if we have blocked requests that need to free */
/* Note that cifs_max_pending is normally 50, but
@@ -2070,7 +2071,6 @@
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
}
}
/* If find_unc succeeded then rc == 0 so we can not end */

Regards
dave

2007-05-23 08:37:18

by Dave Young

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Hi,
Sorry for the wrong patch in my last post.

How about save the tsk then call kthread_stop like this:

diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
--- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
+++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.000000000 +0000
@@ -2069,8 +2069,9 @@
srvTcp->tcpStatus = CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
+ struct task_struct * tsk = srvTcp->tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ kthread_stop(tsk);
}
}
/* If find_unc succeeded then rc == 0 so we can not end */

Regards
dave

2007-05-23 13:29:00

by Steven French

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Yes - this patch looks better.

I also am not sure whether the send_sig is still necessary to wake up a
thread blocked in tcp recv_msg (only do a wake_up_process vs. doing a
send_sig(SIGKILL) )

Unless someone knows for sure whether the send_sig is redundant, I would
like to merge Shaggy's version of the patch


"young dave" <[email protected]> wrote on 05/23/2007 03:37:04 AM:

> Hi,
> Sorry for the wrong patch in my last post.
>
> How about save the tsk then call kthread_stop like this:
>
> diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
> +++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.000000000 +0000
> @@ -2069,8 +2069,9 @@
> srvTcp->tcpStatus = CifsExiting;
> spin_unlock(&GlobalMid_Lock);
> if (srvTcp->tsk) {
> + struct task_struct * tsk = srvTcp->tsk;
> send_sig(SIGKILL,srvTcp->tsk,1);
> - kthread_stop(srvTcp->tsk);
> + kthread_stop(tsk);
> }
> }
> /* If find_unc succeeded then rc == 0 so we can not end
*/
>
> Regards
> dave

Shaggy's suggested patch seems slightly better:

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 216fb62..b6e2158 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct
cifs_sb_info *cifs_sb,
srvTcp->tcpStatus =
CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
+ struct
task_struct *tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ /*
srvTcp->tsk can be zeroed at any time */
+ tsk =
srvTcp->tsk;
+ if (tsk)
+ kthread_stop(tsk);
}
}
/* If find_unc succeeded then rc == 0 so
we can not end */

Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com

2007-05-23 13:56:26

by Steven French

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

I don't think it is racy against thread startup since server->tsk is not
filled in until after the demultiplex thread does allow_signal.

I looked more at each of the three send_sig calls which precede the three
places we do kthread_stop on this thread. Without the three send_sig
calls (e.g. in the umount path) umount takes 7 more seconds (presumably
because the socket does not wake up as quickly) - so at first glance it
looks like we still need a way of waking up this thread when it is stuck
in a socket - and send_sig is the obvious way to do it.
I will merge Shaggy's version (similar to Dave Young's) into the cifs-2.6
tree now.


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com



Andrew Morton <[email protected]>
05/22/2007 09:22 PM

To
"young dave" <[email protected]>
cc
"Linux Kernel Mailing List" <[email protected]>, Steven
French/Austin/IBM@IBMUS
Subject
Re: 2.6.22-rc1-mm1 cifs_mount oops






On Wed, 23 May 2007 00:50:13 +0000 "young dave"
<[email protected]> wrote:

> Hi,
> when I use mount -t cifs , the kernel oops, seems break at
> kthread_stop, I'm not sure.
>
> But if I add the CONFIG_CIFS_DEBUG2=y to config file, rebuild kernel,
> then the oops disappeared.
>
> Below is the oops message:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000008
> printing eip:
> c012e910
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: cifs smbfs radeon drm ipv6 snd_seq_dummy
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss capability commoncap e100 mii psmouse sg evdev serio_raw
> snd_hda_intel snd_pcm snd_timer snd soundcore snd_page_alloc intel_agp
> agpgart i2c_i801 pcspkr
> CPU: 0
> EIP: 0060:[<c012e910>] Not tainted VLI
> EFLAGS: 00210246 (2.6.22-rc1-mm1 #3)
> EIP is at kthread_stop+0x10/0x90
> eax: c051bde0 ebx: 00000000 ecx: c1fba000 edx: c1fef040
> esi: 00000000 edi: 00000064 ebp: c2a36c80 esp: c1fbbd58
> ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
> Process mount.cifs (pid: 3955, ti=c1fba000 task=c2b38540
task.ti=c1fba000)
> Stack: c1fef040 ffffff90 ffffff90 f8a7a328 c285a504 f8a9a9fb 00000083
000000cf
> 000000dc 0000000b c2b38540 c2af5740 c292c540 00000000 00000000
c285a4c0
> 00000000 c411b400 c3a4f500 c3cec200 c1fef052 c291c1e0 c1fef037
c291c940
> Call Trace:
> [<f8a7a328>] cifs_mount+0xbe8/0xf10 [cifs]
> [<c02633fe>] idr_get_new_above_int+0x3e/0x50
> [<f8a6d04e>] cifs_read_super+0x4e/0x160 [cifs]
> [<c0166fc0>] set_anon_super+0x0/0xd0
> [<f8a6d6c0>] cifs_get_sb+0x60/0xd0 [cifs]
> [<c0167491>] vfs_kern_mount+0x91/0x130
> [<c017d648>] permit_mount+0x28/0xa0
> [<c017e09a>] do_new_mount+0x8a/0x140
> [<c017e95e>] do_mount+0x25e/0x280
> [<c0440000>] schedule+0x2e0/0x680
> [<c017e602>] exact_copy_from_user+0x32/0x70
> [<c017e69a>] copy_mount_options+0x5a/0xc0
> [<c017ec19>] sys_mount+0x79/0xc0
> [<c01040bc>] syscall_call+0x7/0xb
> =======================
> Code: 88 d1 d3 e0 89 43 5c 83 c4 18 5b c3 eb 0d 90 90 90 90 90 90 90
> 90 90 90 90 90 90 53 83 ec 08 89 c3 b8 e0 bd 51 c0 e8 90 26 31 00 <ff>
> 43 08 31 c9 b8 f0 c1 58 c0 89 0d ec c1 58 c0 e8 3b 01 00 00
> EIP: [<c012e910>] kthread_stop+0x10/0x90 SS:ESP 0068:c1fbbd58
>

I assume cifs_demultiplex_thread() took the SIGKILL, zeroed server->tsk
then exitted. Then, cifs_mount() did a kthread_stop() on the now-NULL
pointer.

I don't see a non-racy way of fixing this as the code stands at present.
This:

--- a/fs/cifs/connect.c~cifs-oops-fix
+++ a/fs/cifs/connect.c
@@ -2086,7 +2086,6 @@ cifs_mount(struct super_block *sb, struc
if ((temp_rc == -ESHUTDOWN) &&
(pSesInfo->server) && (pSesInfo->server->tsk)) {
send_sig(SIGKILL,pSesInfo->server->tsk,1);
- kthread_stop(pSesInfo->server->tsk);
}
} else
cFYI(1, ("No session or bad tcon"));
_

has a decent chance of fixing it. But it's now racy against thread
*startup*: if we send SIGKILL to that task before it has done its
allow_signal(), it will presumably never get shut down.

Steve, can we just pull all the signal stuff out of there and use the
kthread machinery alone?



2007-05-23 14:59:19

by Steve French

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

This is what I now have in the cifs git tree. (only minor change is
that I now have since fixed the missing space after the if)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 216fb62..f6963d1 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,15 @@ cifs_mount(struct super_block *sb, struct
cifs_sb_info *cifs_sb,
srvTcp->tcpStatus = CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
+ struct task_struct *tsk;
+ /* If we could verify that kthread_stop would
+ always wake up processes blocked in
+ tcp in recv_mesg then we could remove the
+ send_sig call */
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ tsk = srvTcp->tsk;
+ if(tsk)
+ kthread_stop(srvTcp->tsk);
}
}
/* If find_unc succeeded then rc == 0 so we can not end */
@@ -2085,8 +2092,11 @@ cifs_mount(struct super_block *sb, struct
cifs_sb_info *cifs_sb,
/* if the socketUseCount is now zero */
if ((temp_rc == -ESHUTDOWN) &&
(pSesInfo->server) &&
(pSesInfo->server->tsk)) {
+ struct task_struct *tsk;

send_sig(SIGKILL,pSesInfo->server->tsk,1);
-
kthread_stop(pSesInfo->server->tsk);
+ tsk = pSesInfo->server->tsk;
+ if(tsk)
+ kthread_stop(tsk);
}
} else
cFYI(1, ("No session or bad tcon"));


--
Thanks,

Steve

2007-05-23 16:10:31

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

On Wed, 23 May 2007 08:28:47 -0500 Steven French <[email protected]> wrote:

> Yes - this patch looks better.
>
> I also am not sure whether the send_sig is still necessary to wake up a
> thread blocked in tcp recv_msg (only do a wake_up_process vs. doing a
> send_sig(SIGKILL) )
>
> Unless someone knows for sure whether the send_sig is redundant, I would
> like to merge Shaggy's version of the patch
>
>
> "young dave" <[email protected]> wrote on 05/23/2007 03:37:04 AM:
>
> > Hi,
> > Sorry for the wrong patch in my last post.
> >
> > How about save the tsk then call kthread_stop like this:
> >
> > diff -udr linux/fs/cifs/connect.c linux.new/fs/cifs/connect.c
> > --- linux/fs/cifs/connect.c 2007-05-23 10:59:13.000000000 +0000
> > +++ linux.new/fs/cifs/connect.c 2007-05-23 16:33:54.000000000 +0000
> > @@ -2069,8 +2069,9 @@
> > srvTcp->tcpStatus = CifsExiting;
> > spin_unlock(&GlobalMid_Lock);
> > if (srvTcp->tsk) {
> > + struct task_struct * tsk = srvTcp->tsk;
> > send_sig(SIGKILL,srvTcp->tsk,1);
> > - kthread_stop(srvTcp->tsk);
> > + kthread_stop(tsk);
> > }
> > }
> > /* If find_unc succeeded then rc == 0 so we can not end
> */
> >
> > Regards
> > dave
>
> Shaggy's suggested patch seems slightly better:
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 216fb62..b6e2158 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct
> cifs_sb_info *cifs_sb,
> srvTcp->tcpStatus =
> CifsExiting;
> spin_unlock(&GlobalMid_Lock);
> if (srvTcp->tsk) {
> + struct
> task_struct *tsk;
> send_sig(SIGKILL,srvTcp->tsk,1);
> - kthread_stop(srvTcp->tsk);
> + /*
> srvTcp->tsk can be zeroed at any time */
> + tsk =
> srvTcp->tsk;
> + if (tsk)
> + kthread_stop(tsk);
> }
> }
> /* If find_unc succeeded then rc == 0 so
> we can not end */

The wordwrapping made that extraordinarily hard to read. Repairing...

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct cifs_sb_info *cifs_sb,
srvTcp->tcpStatus = CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
struct task_struct *tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ /* srvTcp->tsk can be zeroed at any time */
+ tsk = srvTcp->tsk;
+ if (tsk)
+ kthread_stop(tsk);
}
}
/* If find_unc succeeded then rc == 0 so we can not end */


This can end up running kthread_stop() against an already-exited task.

2007-05-23 22:14:55

by Steven French

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

> This can end up running kthread_stop() against an already-exited task.
I don't think so since cifs_demultiplex_thread waits sufficiently long
before
exit but after setting srvTcp->tsk to zero (the wait is immediately after
waking up any processes that may be blocked on requests on this socket to
give
file requests time to exit from the cifs vfs). As long as this (mount)
process were
scheduled within 1.25 seconds it should be ok although more complicated
than I
would like (that is why this thread was the last one in cifs to switch
to kthread API).

I wish there were an obvious way to do this, perhaps without using
kthread_stop at all
for this thread (since that by itself does not seem to work for threads
blocked
in various socket calls).


--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2069,8 +2069,12 @@ cifs_mount(struct super_block *sb, struct
cifs_sb_info *cifs_sb,
srvTcp->tcpStatus =
CifsExiting;
spin_unlock(&GlobalMid_Lock);
if (srvTcp->tsk) {
struct
task_struct *tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ /*
srvTcp->tsk can be zeroed at any time */
+ tsk =
srvTcp->tsk;
+ if (tsk)
+ kthread_stop(tsk);
}


I don't think so


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com

2007-05-24 01:05:22

by Dave Young

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

Hi,

I have one problem about this: after the srvTcp->tsk is set to NULL
(maybe the thread is still there, isn't it?), is the kthread still
needed to be stopped by calling kthread_stop()? If it is true, then
the task_struct should be saved before send_sig like my patch:

if (srvTcp->tsk) {
+ struct task_struct * tsk = srvTcp->tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ kthread_stop(tsk);

Regards
dave

2007-05-24 02:38:55

by Steven French

[permalink] [raw]
Subject: Re: 2.6.22-rc1-mm1 cifs_mount oops

If srvTcp->tsk is NULL then the thread (cifs_demultiplex_thread) is
getting ready to exit and kthread_stop would not be needed.

It would probably be possible to recode this so we don't need to call
kthread_stop at all (send_sig is apparently required to wake up this
thread when blocked in certain places in the tcp stack - and in
combination with the existing flags might be good enough) - but I don't
know if it would make it simpler. Fortunately there is no race, a few
lines after srvTcp->tsk is set to zero by the cifs_demultipex_thread, it
will sleep briefly before exiting (kthread_stop won't be called on a
thread that does not exist).

That section of code in fs/cifs/connect.c now looks like:

2071 if (srvTcp->tsk) {
2072 struct task_struct *tsk;
2073
/* If we could verify that kthread_stop would
2074
always wake up processes blocked in
2075
tcp in recv_mesg then we could remove the
2076 send_sig call */
2077 send_sig(SIGKILL,srvTcp->tsk,1);
2078 tsk = srvTcp->tsk;
2079 if(tsk)
2080 kthread_stop(srvTcp->tsk);
2081 }
2082 }
2083
/* If find_unc succeeded then rc == 0 so we can not end */
2084
if (tcon) /* up accidently freeing someone elses tcon struct */
2085 tconInfoFree(tcon);
2086 if (existingCifsSes == NULL) {
2087 if (pSesInfo) {
2088 if ((pSesInfo->server) &&
2089 (pSesInfo->status == CifsGood)) {
2090 int temp_rc;
2091
temp_rc = CIFSSMBLogoff(xid, pSesInfo);
2092
/* if the socketUseCount is now zero */
2093
if ((temp_rc == -ESHUTDOWN) &&
2094
(pSesInfo->server) && (pSesInfo->server->tsk)) {
2095
struct task_struct *tsk;
2096
send_sig(SIGKILL,pSesInfo->server->tsk,1);
2097
tsk = pSesInfo->server->tsk;
2098 if(tsk)
2099
kthread_stop(tsk);


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com



"young dave" <[email protected]>
05/23/2007 08:05 PM

To
Steven French/Austin/IBM@IBMUS
cc
"Andrew Morton" <[email protected]>, David
Kleikamp/Austin/IBM@IBMUS, "Linux Kernel Mailing List"
<[email protected]>, Shirish S Pargaonkar/Austin/IBM@IBMUS
Subject
Re: 2.6.22-rc1-mm1 cifs_mount oops






Hi,

I have one problem about this: after the srvTcp->tsk is set to NULL
(maybe the thread is still there, isn't it?), is the kthread still
needed to be stopped by calling kthread_stop()? If it is true, then
the task_struct should be saved before send_sig like my patch:

if (srvTcp->tsk) {
+ struct task_struct * tsk = srvTcp->tsk;
send_sig(SIGKILL,srvTcp->tsk,1);
- kthread_stop(srvTcp->tsk);
+ kthread_stop(tsk);

Regards
dave