Jan 22 13:27:59 warsheep Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jan 22 13:27:59 warsheep printing eip:
Jan 22 13:27:59 warsheep 00000000
Jan 22 13:27:59 warsheep *pgd = cde9ddb400000000
Jan 22 13:27:59 warsheep *pmd = cde9ddb400000000
Jan 22 13:27:59 warsheep Oops: 0000 [#1]
Jan 22 13:27:59 warsheep SMP
Jan 22 13:27:59 warsheep CPU: 0
Jan 22 13:27:59 warsheep EIP: 0060:[<00000000>] Not tainted VLI
Jan 22 13:27:59 warsheep EFLAGS: 00010282 (2.6.10-hardened-r2-warsheep62)
Jan 22 13:27:59 warsheep EIP is at 0x0
Jan 22 13:27:59 warsheep eax: 00000000 ebx: de455000 ecx: c02c60e0 edx: c6b41000
Jan 22 13:27:59 warsheep esi: de455000 edi: 00000000 ebp: dd0a2680 esp: cde9de9c
Jan 22 13:27:59 warsheep ds: 007b es: 007b ss: 0068
Jan 22 13:27:59 warsheep Process pptpctrl (pid: 16689, threadinfo=cde9c000 task=d112ca20)
Jan 22 13:27:59 warsheep Stack: c02c97bc c6b41000 00000000 c02c895c de455000 04949168 c03d0106 de455000
Jan 22 13:27:59 warsheep de45500c dd0a2680 00000000 c02c4141 de455000 dd0a2680 00000000 c01c7d49
Jan 22 13:27:59 warsheep dd0a2680 00000020 00000005 00000005 c01da72f dd0a2680 00000000 00000000
Jan 22 13:27:59 warsheep Call Trace:
Jan 22 13:27:59 warsheep [<c02c97bc>] pty_chars_in_buffer+0x2c/0x50
Jan 22 13:27:59 warsheep [<c02c895c>] normal_poll+0xfc/0x16b
Jan 22 13:27:59 warsheep [<c03d0106>] schedule_timeout+0x76/0xc0
Jan 22 13:27:59 warsheep [<c02c4141>] tty_poll+0xa1/0xc0
Jan 22 13:27:59 warsheep [<c01c7d49>] fget+0x49/0x60
Jan 22 13:27:59 warsheep [<c01da72f>] do_select+0x26f/0x2e0
Jan 22 13:27:59 warsheep [<c01da2f0>] __pollwait+0x0/0xd0
Jan 22 13:27:59 warsheep [<c01daabb>] sys_select+0x2db/0x4f0
Jan 22 13:27:59 warsheep [<c0173049>] sysenter_past_esp+0x52/0x79
Jan 22 13:27:59 warsheep Code: Bad EIP value.
The oops ocures only when the kernel is build with SMP and HT support, in UP mode the oops doesn't occur!
I have a 2.6.10 kernel with SMP and HT compiled kernel, I have a P4 3GHz with HT
a have a VPN server with pppd and pptpd(poptop) and and average of 130
simultanious connections, the oops doesn't occur at a particular number
of simulationus VPN connection.I can build a kernel with debugging enabled or something to help to track th
source of the problem. Please CC as I am not subscribed to this mailing list.
--
ierdnah <[email protected]>
Interesting. That last call trace entry is the call in
pty_chars_in_buffer() to
/* The ldisc must report 0 if no characters available to be read */
count = to->ldisc.chars_in_buffer(to);
and it looks like it has jumped to address zero.
However, we _just_ compared the fn pointer to zero immediately before, and
while there could certainly have been a race that cleared it in between
the test and the call, normally we wouldn't even have re-loaded the value
at all, but kept it in a register instead.
That said, it does act like a race. Somebody clearing the ldisc and racing
with somebody using it?
Can you do a
gdb vmlinux
disassemble pty_chars_in_buffer
to show what it looks like (whether it reloads the value, and what the
registers are - it looks like either %eax or %edi is all zeroes, but I'd
like to verify that it matches your code generation).
Alan? Any ideas? The tty_select() path seems to take a ldisc reference,
but does that guarantee that the ldisc won't _change_? What happens if the
line discipline is reset from ppp to regular (or set to ppp)
asymchronously? You've been deep in this area lately..
Linus
On Sun, 23 Jan 2005, ierdnah wrote:
>
> Jan 22 13:27:59 warsheep Unable to handle kernel NULL pointer dereference at virtual address 00000000
> Jan 22 13:27:59 warsheep printing eip:
> Jan 22 13:27:59 warsheep 00000000
> Jan 22 13:27:59 warsheep *pgd = cde9ddb400000000
> Jan 22 13:27:59 warsheep *pmd = cde9ddb400000000
> Jan 22 13:27:59 warsheep Oops: 0000 [#1]
> Jan 22 13:27:59 warsheep SMP
> Jan 22 13:27:59 warsheep CPU: 0
> Jan 22 13:27:59 warsheep EIP: 0060:[<00000000>] Not tainted VLI
> Jan 22 13:27:59 warsheep EFLAGS: 00010282 (2.6.10-hardened-r2-warsheep62)
> Jan 22 13:27:59 warsheep EIP is at 0x0
> Jan 22 13:27:59 warsheep eax: 00000000 ebx: de455000 ecx: c02c60e0 edx: c6b41000
> Jan 22 13:27:59 warsheep esi: de455000 edi: 00000000 ebp: dd0a2680 esp: cde9de9c
> Jan 22 13:27:59 warsheep ds: 007b es: 007b ss: 0068
> Jan 22 13:27:59 warsheep Process pptpctrl (pid: 16689, threadinfo=cde9c000 task=d112ca20)
> Jan 22 13:27:59 warsheep Stack: c02c97bc c6b41000 00000000 c02c895c de455000 04949168 c03d0106 de455000
> Jan 22 13:27:59 warsheep de45500c dd0a2680 00000000 c02c4141 de455000 dd0a2680 00000000 c01c7d49
> Jan 22 13:27:59 warsheep dd0a2680 00000020 00000005 00000005 c01da72f dd0a2680 00000000 00000000
> Jan 22 13:27:59 warsheep Call Trace:
> Jan 22 13:27:59 warsheep [<c02c97bc>] pty_chars_in_buffer+0x2c/0x50
> Jan 22 13:27:59 warsheep [<c02c895c>] normal_poll+0xfc/0x16b
> Jan 22 13:27:59 warsheep [<c03d0106>] schedule_timeout+0x76/0xc0
> Jan 22 13:27:59 warsheep [<c02c4141>] tty_poll+0xa1/0xc0
> Jan 22 13:27:59 warsheep [<c01c7d49>] fget+0x49/0x60
> Jan 22 13:27:59 warsheep [<c01da72f>] do_select+0x26f/0x2e0
> Jan 22 13:27:59 warsheep [<c01da2f0>] __pollwait+0x0/0xd0
> Jan 22 13:27:59 warsheep [<c01daabb>] sys_select+0x2db/0x4f0
> Jan 22 13:27:59 warsheep [<c0173049>] sysenter_past_esp+0x52/0x79
> Jan 22 13:27:59 warsheep Code: Bad EIP value.
>
> The oops ocures only when the kernel is build with SMP and HT support, in UP mode the oops doesn't occur!
> I have a 2.6.10 kernel with SMP and HT compiled kernel, I have a P4 3GHz with HT
> a have a VPN server with pppd and pptpd(poptop) and and average of 130
> simultanious connections, the oops doesn't occur at a particular number
> of simulationus VPN connection.I can build a kernel with debugging enabled or something to help to track th
> source of the problem. Please CC as I am not subscribed to this mailing list.
>
> --
> ierdnah <[email protected]>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
(gdb) disassemble pty_chars_in_buffer
Dump of assembler code for function pty_chars_in_buffer:
0xc02c9790 <pty_chars_in_buffer+0>: sub $0x8,%esp
0xc02c9793 <pty_chars_in_buffer+3>: xor %eax,%eax
0xc02c9795 <pty_chars_in_buffer+5>: mov %ebx,0x4(%esp,1)
0xc02c9799 <pty_chars_in_buffer+9>: mov 0xc(%esp,1),%ebx
0xc02c979d <pty_chars_in_buffer+13>: mov 0xd0(%ebx),%edx
0xc02c97a3 <pty_chars_in_buffer+19>: test %edx,%edx
0xc02c97a5 <pty_chars_in_buffer+21>: je 0xc02c97ae
<pty_chars_in_buffer+30>
0xc02c97a7 <pty_chars_in_buffer+23>: mov 0x28(%edx),%ecx
0xc02c97aa <pty_chars_in_buffer+26>: test %ecx,%ecx
0xc02c97ac <pty_chars_in_buffer+28>: jne 0xc02c97b6
<pty_chars_in_buffer+38>
0xc02c97ae <pty_chars_in_buffer+30>: mov 0x4(%esp,1),%ebx
0xc02c97b2 <pty_chars_in_buffer+34>: add $0x8,%esp
0xc02c97b5 <pty_chars_in_buffer+37>: ret
0xc02c97b6 <pty_chars_in_buffer+38>: mov %edx,(%esp,1)
0xc02c97b9 <pty_chars_in_buffer+41>: call *0x28(%edx)
0xc02c97bc <pty_chars_in_buffer+44>: mov %eax,%edx
0xc02c97be <pty_chars_in_buffer+46>: mov 0x4(%ebx),%eax
0xc02c97c1 <pty_chars_in_buffer+49>: cmpw $0x2,0x76(%eax)
0xc02c97c6 <pty_chars_in_buffer+54>: je 0xc02c97d5
<pty_chars_in_buffer+69>
0xc02c97c8 <pty_chars_in_buffer+56>: xor %eax,%eax
0xc02c97ca <pty_chars_in_buffer+58>: cmp $0x800,%edx
0xc02c97d0 <pty_chars_in_buffer+64>: cmovge %edx,%eax
0xc02c97d3 <pty_chars_in_buffer+67>: jmp 0xc02c97ae
<pty_chars_in_buffer+30>
0xc02c97d5 <pty_chars_in_buffer+69>: mov %edx,%eax
0xc02c97d7 <pty_chars_in_buffer+71>: jmp 0xc02c97ae
<pty_chars_in_buffer+30>
0xc02c97d9 <pty_chars_in_buffer+73>: lea 0x0(%esi,1),%esi
End of assembler dump.
this is another compiled kernel, but is compiled with the same .config
file and same gcc version...because I only have the bzImage, how do I
convert it to vmlinux?
On Sat, 2005-01-22 at 22:43 -0800, Linus Torvalds wrote:
> Interesting. That last call trace entry is the call in
> pty_chars_in_buffer() to
>
> /* The ldisc must report 0 if no characters available to be read */
> count = to->ldisc.chars_in_buffer(to);
>
> and it looks like it has jumped to address zero.
>
> However, we _just_ compared the fn pointer to zero immediately before, and
> while there could certainly have been a race that cleared it in between
> the test and the call, normally we wouldn't even have re-loaded the value
> at all, but kept it in a register instead.
>
> That said, it does act like a race. Somebody clearing the ldisc and racing
> with somebody using it?
>
> Can you do a
>
> gdb vmlinux
>
> disassemble pty_chars_in_buffer
>
> to show what it looks like (whether it reloads the value, and what the
> registers are - it looks like either %eax or %edi is all zeroes, but I'd
> like to verify that it matches your code generation).
>
> Alan? Any ideas? The tty_select() path seems to take a ldisc reference,
> but does that guarantee that the ldisc won't _change_? What happens if the
> line discipline is reset from ppp to regular (or set to ppp)
> asymchronously? You've been deep in this area lately..
>
> Linus
>
> On Sun, 23 Jan 2005, ierdnah wrote:
> >
> > Jan 22 13:27:59 warsheep Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > Jan 22 13:27:59 warsheep printing eip:
> > Jan 22 13:27:59 warsheep 00000000
> > Jan 22 13:27:59 warsheep *pgd = cde9ddb400000000
> > Jan 22 13:27:59 warsheep *pmd = cde9ddb400000000
> > Jan 22 13:27:59 warsheep Oops: 0000 [#1]
> > Jan 22 13:27:59 warsheep SMP
> > Jan 22 13:27:59 warsheep CPU: 0
> > Jan 22 13:27:59 warsheep EIP: 0060:[<00000000>] Not tainted VLI
> > Jan 22 13:27:59 warsheep EFLAGS: 00010282 (2.6.10-hardened-r2-warsheep62)
> > Jan 22 13:27:59 warsheep EIP is at 0x0
> > Jan 22 13:27:59 warsheep eax: 00000000 ebx: de455000 ecx: c02c60e0 edx: c6b41000
> > Jan 22 13:27:59 warsheep esi: de455000 edi: 00000000 ebp: dd0a2680 esp: cde9de9c
> > Jan 22 13:27:59 warsheep ds: 007b es: 007b ss: 0068
> > Jan 22 13:27:59 warsheep Process pptpctrl (pid: 16689, threadinfo=cde9c000 task=d112ca20)
> > Jan 22 13:27:59 warsheep Stack: c02c97bc c6b41000 00000000 c02c895c de455000 04949168 c03d0106 de455000
> > Jan 22 13:27:59 warsheep de45500c dd0a2680 00000000 c02c4141 de455000 dd0a2680 00000000 c01c7d49
> > Jan 22 13:27:59 warsheep dd0a2680 00000020 00000005 00000005 c01da72f dd0a2680 00000000 00000000
> > Jan 22 13:27:59 warsheep Call Trace:
> > Jan 22 13:27:59 warsheep [<c02c97bc>] pty_chars_in_buffer+0x2c/0x50
> > Jan 22 13:27:59 warsheep [<c02c895c>] normal_poll+0xfc/0x16b
> > Jan 22 13:27:59 warsheep [<c03d0106>] schedule_timeout+0x76/0xc0
> > Jan 22 13:27:59 warsheep [<c02c4141>] tty_poll+0xa1/0xc0
> > Jan 22 13:27:59 warsheep [<c01c7d49>] fget+0x49/0x60
> > Jan 22 13:27:59 warsheep [<c01da72f>] do_select+0x26f/0x2e0
> > Jan 22 13:27:59 warsheep [<c01da2f0>] __pollwait+0x0/0xd0
> > Jan 22 13:27:59 warsheep [<c01daabb>] sys_select+0x2db/0x4f0
> > Jan 22 13:27:59 warsheep [<c0173049>] sysenter_past_esp+0x52/0x79
> > Jan 22 13:27:59 warsheep Code: Bad EIP value.
> >
> > The oops ocures only when the kernel is build with SMP and HT support, in UP mode the oops doesn't occur!
> > I have a 2.6.10 kernel with SMP and HT compiled kernel, I have a P4 3GHz with HT
> > a have a VPN server with pppd and pptpd(poptop) and and average of 130
> > simultanious connections, the oops doesn't occur at a particular number
> > of simulationus VPN connection.I can build a kernel with debugging enabled or something to help to track th
> > source of the problem. Please CC as I am not subscribed to this mailing list.
> >
> > --
> > ierdnah <[email protected]>
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
--
ierdnah <[email protected]>
On Sat, 22 Jan 2005 22:43:50 -0800 (PST) Linus Torvalds wrote:
> Interesting. That last call trace entry is the call in
> pty_chars_in_buffer() to
>
> /* The ldisc must report 0 if no characters available to be read */
> count = to->ldisc.chars_in_buffer(to);
>
> and it looks like it has jumped to address zero.
>
> However, we _just_ compared the fn pointer to zero immediately before, and
> while there could certainly have been a race that cleared it in between
> the test and the call, normally we wouldn't even have re-loaded the value
> at all, but kept it in a register instead.
>
> That said, it does act like a race. Somebody clearing the ldisc and racing
> with somebody using it?
>
> Can you do a
>
> gdb vmlinux
>
> disassemble pty_chars_in_buffer
>
> to show what it looks like (whether it reloads the value, and what the
> registers are - it looks like either %eax or %edi is all zeroes, but I'd
> like to verify that it matches your code generation).
>
> Alan? Any ideas? The tty_select() path seems to take a ldisc reference,
> but does that guarantee that the ldisc won't _change_?
tty_poll() grabs ldisc reference for the tty it was called with;
however, in this case pty_chars_in_buffer() accesses another ldisc
(tty->link->ldisc) without grabbing a reference to it. BTW, many other
pty_* functions do the same thing.
Is calling tty_ldisc_ref(tty->link) safe here? There is a comment
warning about possible deadlocks before pty_write().
On Sun, 23 Jan 2005, ierdnah wrote:
>
> (gdb) disassemble pty_chars_in_buffer
...
> 0xc02c97a7 <pty_chars_in_buffer+23>: mov 0x28(%edx),%ecx **
> 0xc02c97aa <pty_chars_in_buffer+26>: test %ecx,%ecx
> 0xc02c97ac <pty_chars_in_buffer+28>: jne 0xc02c97b6
> 0xc02c97ae <pty_chars_in_buffer+30>: mov 0x4(%esp,1),%ebx
> 0xc02c97b2 <pty_chars_in_buffer+34>: add $0x8,%esp
> 0xc02c97b5 <pty_chars_in_buffer+37>: ret
> 0xc02c97b6 <pty_chars_in_buffer+38>: mov %edx,(%esp,1)
> 0xc02c97b9 <pty_chars_in_buffer+41>: call *0x28(%edx) **
Ahh, indeed. When I compiled this function, it kept 0x28(%edx) in a
register. Your config/compiler combination does not, so there actually is
a race condition if somebody changes the function pointer.
> this is another compiled kernel, but is compiled with the same .config
> file and same gcc version...because I only have the bzImage, how do I
> convert it to vmlinux?
Don't worry, it clearly shows that it's at least possible, and worth
looking at.
For testing, a patch like this might get rid of the problem by hiding the
race (but for all I know, gcc ends up re-loading it anyway) - you might
want to check the disassembly.
However, this patch is just for testing, to verify that your problem
really is that particular race. It's not a proper fix.
I suspect that pty's should always lock each others line disciplines too,
not just their "own" side.
Linus
----
--- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
+++ edited/drivers/char/pty.c 2005-01-23 09:49:16 -08:00
@@ -149,13 +149,15 @@
static int pty_chars_in_buffer(struct tty_struct *tty)
{
struct tty_struct *to = tty->link;
+ ssize_t (*chars_in_buffer)(struct tty_struct *);
int count;
- if (!to || !to->ldisc.chars_in_buffer)
+ /* We should get the line discipline lock for "tty->link" */
+ if (!to || !(chars_in_buffer = to->ldisc.chars_in_buffer))
return 0;
/* The ldisc must report 0 if no characters available to be read */
- count = to->ldisc.chars_in_buffer(to);
+ count = chars_in_buffer(to);
if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
On Sun, 23 Jan 2005, Sergey Vlasov wrote:
>
> tty_poll() grabs ldisc reference for the tty it was called with;
> however, in this case pty_chars_in_buffer() accesses another ldisc
> (tty->link->ldisc) without grabbing a reference to it. BTW, many other
> pty_* functions do the same thing.
Yes, I think you put the finger on it.
> Is calling tty_ldisc_ref(tty->link) safe here? There is a comment
> warning about possible deadlocks before pty_write().
I think it's only the tty_ldisc_ref_wait() thing that can deadlock (and
even that is likely safe if you just specify an order - "masters first" or
something). Adding a nonblocking "tty_ldisc_ref()" looks safe, ie
something like the appended.
This has the problem (apart from the fact that it's obviously totally
untested ;) that it looks like every single pty function would need to do
it, so it would be nicer if "tty_ldisc_ref_wait()" would just always get
both references (ie do the ordering). Alan?
Linus
----
--- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
+++ edited/drivers/char/pty.c 2005-01-23 10:21:04 -08:00
@@ -149,13 +149,17 @@
static int pty_chars_in_buffer(struct tty_struct *tty)
{
struct tty_struct *to = tty->link;
- int count;
+ int count = 0;
- if (!to || !to->ldisc.chars_in_buffer)
- return 0;
-
- /* The ldisc must report 0 if no characters available to be read */
- count = to->ldisc.chars_in_buffer(to);
+ if (to) {
+ struct tty_ldisc *ld = tty_ldisc_ref(to);
+ if (ld) {
+ if (ld->chars_in_buffer) {
+ count = ld->chars_in_buffer(to);
+ tty_ldisc_deref(ld);
+ }
+ }
+ }
if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
On Sul, 2005-01-23 at 13:15, Sergey Vlasov wrote:
> On Sat, 22 Jan 2005 22:43:50 -0800 (PST) Linus Torvalds wrote:
> tty_poll() grabs ldisc reference for the tty it was called with;
> however, in this case pty_chars_in_buffer() accesses another ldisc
> (tty->link->ldisc) without grabbing a reference to it. BTW, many other
> pty_* functions do the same thing.
Correct - the pty code still relies on all sorts of "lock_kernel"
assumptions so its probably very very racy. It can't however easily use
tty_ldisc_()
functions that wait because the lock ordering is indeterminate.
On Sul, 2005-01-23 at 18:22, Linus Torvalds wrote:
> I think it's only the tty_ldisc_ref_wait() thing that can deadlock (and
> even that is likely safe if you just specify an order - "masters first" or
> something). Adding a nonblocking "tty_ldisc_ref()" looks safe, ie
> something like the appended.
Yes.
> This has the problem (apart from the fact that it's obviously totally
> untested ;) that it looks like every single pty function would need to do
> it, so it would be nicer if "tty_ldisc_ref_wait()" would just always get
> both references (ie do the ordering). Alan?
Almost every user of tty_ldisc_ref_* doesn't need to lock two objects
and
the code at that layer has no knowledge of pty/tty pairs. The needed
info is exposed however in order to do this since the tty knows if its a
tty/pty pair. I'll take a look - chances are it can be buried in
tty_ldisc_ref.
I'm dubious this is the actual bug although vhangup on a pty might
trigger it I guess.
Alan
On Mon, 24 Jan 2005, Alan Cox wrote:
>
> Almost every user of tty_ldisc_ref_* doesn't need to lock two objects and
> the code at that layer has no knowledge of pty/tty pairs.
Hmm.. It looks different to me - almost all users of the xxx_ref_wait()
that act on pty's _do_ seem to use tty->link->ldisc.*()". The one counter-
example seems to be pty_set_termios().
But I only did a quick grep in drivers/char, so there might be other users
out there I didn't even think of.
While doing that, I did notice that all the other tty ops do take the
kernel lock still - both read() and write() do. I wonder if we should just
make poll() consistent, regardless of other issues (obviously, the nicest
way to make it consistent would be to remove the kernel lock from the
read/write paths, but I assume nobody wants to go through the locking).
> I'm dubious this is the actual bug although vhangup on a pty might
> trigger it I guess.
ierdnah - can you verify whether either of my silly patches seems to have
made any difference?
And I don't think it can be vhangup - that wouldn't clear the
chars_in_buffer function pointer (it might change it to the
n_tty_chars_in_buffer or something by resetting to N_TTY).
I'd assume that it's when the other end is changed to using the
"ppp_[sync_]ldisc", both of which do actually have a NULL
"chars_in_buffer". (Alternatively, maybe "chars_in_buffer" isn't actually
NULL at all - the backtrace might be confused by either some really
strange memory corruption or a tail-call to a NULL function pointer by
some other line discipline thing, although quite frankly I think that
sounds extremely unlikely).
So another potential fix is to just make all tty line disciplines always
have a valid "chars_in_buffer" pointer. We could even do automatically in
"tty_set_operations()", ie just do a
driver->chars_in_buffer = op->chars_in_buffer ? : default_chars_in_buffer;
(where the default just returns zero) thing or something.
That would even speed up the normal cases (no need to test the pointer for
NULL).
Linus
On Llu, 2005-01-24 at 17:58, Linus Torvalds wrote:
> On Mon, 24 Jan 2005, Alan Cox wrote:
> While doing that, I did notice that all the other tty ops do take the
> kernel lock still - both read() and write() do. I wonder if we should just
> make poll() consistent, regardless of other issues (obviously, the nicest
> way to make it consistent would be to remove the kernel lock from the
> read/write paths, but I assume nobody wants to go through the locking).
The lock is needed for the console at least and for some driver level
bits.
> And I don't think it can be vhangup - that wouldn't clear the
> chars_in_buffer function pointer (it might change it to the
> n_tty_chars_in_buffer or something by resetting to N_TTY).
True - that would need an ldisc.close and the vhangup no longer does
that.
> So another potential fix is to just make all tty line disciplines always
> have a valid "chars_in_buffer" pointer. We could even do automatically in
> "tty_set_operations()", ie just do a
Not really. I'm in n_tty ->chars_in_buffer and the race occurs -> I'm
dead
as I'm already running the wrong chars_in_buffer function on CPU #0 when
CPU #1
comes along and flips ldisc. Holding the right locks matters here. We
also
need both locks holding really for tty/pty pairs because pty_write wants
to
output to the ldisc of the other side. Treating no ldisc as no
characters
seems very sane however and is easy to code up - if the ldisc_get fails
we
can sleep on the ldisc level wait queue then retry. Ugly enough to want
to
hide the contents in tty_io.c but doable. (When I get time - likely to
be
a couple of weeks if nobody does it first)
Alan
On Sun, 2005-01-23 at 09:51 -0800, Linus Torvalds wrote:
I applied the patch and the uptime is 1 day and no oops, i will wait 2
more days too see if the oops appears, then I will compile the kernel
with CONFIG_PREEMPT. Before, when PREEMPT was enabled, the oops appeared
immediatly after booting and more often.
> ----
> --- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
> +++ edited/drivers/char/pty.c 2005-01-23 09:49:16 -08:00
> @@ -149,13 +149,15 @@
> static int pty_chars_in_buffer(struct tty_struct *tty)
> {
> struct tty_struct *to = tty->link;
> + ssize_t (*chars_in_buffer)(struct tty_struct *);
> int count;
>
> - if (!to || !to->ldisc.chars_in_buffer)
> + /* We should get the line discipline lock for "tty->link" */
> + if (!to || !(chars_in_buffer = to->ldisc.chars_in_buffer))
> return 0;
>
> /* The ldisc must report 0 if no characters available to be read */
> - count = to->ldisc.chars_in_buffer(to);
> + count = chars_in_buffer(to);
>
> if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
>
>
--
ierdnah <[email protected]>
On Sun, 2005-01-23 at 09:51 -0800, Linus Torvalds wrote:
with this patch the oops is gone(also tested with PREEMPT and no oops)
>
> ----
> --- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
> +++ edited/drivers/char/pty.c 2005-01-23 09:49:16 -08:00
> @@ -149,13 +149,15 @@
> static int pty_chars_in_buffer(struct tty_struct *tty)
> {
> struct tty_struct *to = tty->link;
> + ssize_t (*chars_in_buffer)(struct tty_struct *);
> int count;
>
> - if (!to || !to->ldisc.chars_in_buffer)
> + /* We should get the line discipline lock for "tty->link" */
> + if (!to || !(chars_in_buffer = to->ldisc.chars_in_buffer))
> return 0;
>
> /* The ldisc must report 0 if no characters available to be read */
> - count = to->ldisc.chars_in_buffer(to);
> + count = chars_in_buffer(to);
>
> if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
>
is this patch better? should i test this too?
--- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
+++ edited/drivers/char/pty.c 2005-01-23 10:21:04 -08:00
@@ -149,13 +149,17 @@
static int pty_chars_in_buffer(struct tty_struct *tty)
{
struct tty_struct *to = tty->link;
- int count;
+ int count = 0;
- if (!to || !to->ldisc.chars_in_buffer)
- return 0;
-
- /* The ldisc must report 0 if no characters available to be read
*/
- count = to->ldisc.chars_in_buffer(to);
+ if (to) {
+ struct tty_ldisc *ld = tty_ldisc_ref(to);
+ if (ld) {
+ if (ld->chars_in_buffer) {
+ count = ld->chars_in_buffer(to);
+ tty_ldisc_deref(ld);
+ }
+ }
+ }
if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
--
ierdnah <[email protected]>
On Fri, 28 Jan 2005, ierdnah wrote:
>
> is this patch better? should i test this too?
You probably should. The patch you've tested is really ugly, and not a fix
at all - it's really just depending on the compiler generating a specific
code sequence that will hide the race. As such, it's a patch I would only
accept in the standard kernel as an absolute last resort.
In contrast, the second patch I tested may actually _fix_ the race.
The fact that the first patch makes the oops go away is a good thing,
though: it shows that your oops really was due to that small race window,
and as such it helps validate that it wasn't anything else.
Thanks,
Linus
On Thu, 2005-01-27 at 15:35 -0800, Linus Torvalds wrote:
the last patch works, but the load increases very much (normally with
200 VPN connections I have a load of maximum 10, with this patch I have
a load of 50-100 - after 30 min of uptime)
> You probably should. The patch you've tested is really ugly, and not a fix
> at all - it's really just depending on the compiler generating a specific
> code sequence that will hide the race. As such, it's a patch I would only
> accept in the standard kernel as an absolute last resort.
>
> In contrast, the second patch I tested may actually _fix_ the race.
>
> The fact that the first patch makes the oops go away is a good thing,
> though: it shows that your oops really was due to that small race window,
> and as such it helps validate that it wasn't anything else.
--- 1.32/drivers/char/pty.c 2005-01-10 17:29:36 -08:00
+++ edited/drivers/char/pty.c 2005-01-23 10:21:04 -08:00
@@ -149,13 +149,17 @@
static int pty_chars_in_buffer(struct tty_struct *tty)
{
struct tty_struct *to = tty->link;
- int count;
+ int count = 0;
- if (!to || !to->ldisc.chars_in_buffer)
- return 0;
-
- /* The ldisc must report 0 if no characters available to be read
*/
- count = to->ldisc.chars_in_buffer(to);
+ if (to) {
+ struct tty_ldisc *ld = tty_ldisc_ref(to);
+ if (ld) {
+ if (ld->chars_in_buffer) {
+ count = ld->chars_in_buffer(to);
+ tty_ldisc_deref(ld);
+ }
+ }
+ }
if (tty->driver->subtype == PTY_TYPE_SLAVE) return count;
--
ierdnah <[email protected]>
On Fri, 28 Jan 2005, ierdnah wrote:
>
> the last patch works, but the load increases very much (normally with
> 200 VPN connections I have a load of maximum 10, with this patch I have
> a load of 50-100 - after 30 min of uptime)
Yeah, that "tty_ldisc_try()" is pretty expensive, and it really would be
worth trying to get both sides of the tty at the same time, since with my
patch in place it has to get that "tty_ldisc_lock" four times (and disable
interrupts etc - twice for getting the lock, twice for releasing it).
I'm surprised that it makes _that_ much of a difference, but it sounds
like you used to be borderline on CPU usage before, and this just made it
much worse.
One option is to instead of locking both master and slave on use in the
pty side, lock both when _changing_ the ldisc instead. That's the rare
case. Much more complex, though.
Alan, any clever ideas?
Linus
On Fri, 2005-01-28 at 12:28 -0800, Linus Torvalds wrote:
> I'm surprised that it makes _that_ much of a difference, but it sounds
> like you used to be borderline on CPU usage before, and this just made it
> much worse.
it's musch worst, I had a load of 5 with 250 VPN connections, and now, I
have a load of 200 with 150 connections
--
ierdnah <[email protected]>