Hi,
I'm seeing a kernel oops with 2.4.20 which seems to be related to the
PopTop PPTP server. When certain clients connect in (seems to be
Win98) and begin large data transfers the kernel will reliably oops.
The system crashes hard; the oops doesn't make it to the logs.
The problem appears to have been around for a while, although few
people have been affected. The following posts report similar
symptoms. There were no replies to either of them on linux-kernel.
http://www.cs.helsinki.fi/linux/linux-kernel/2002-10/0407.html (2.4.17)
http://www.cs.helsinki.fi/linux/linux-kernel/2001-28/0281.html (2.4.6!)
I have been able to deal with the issue by using the workaround
suggested in the second post. That is, adding netfilter rules with
the TCPMSS target to limit the TCP MSS to PMTU - 40. Apparently the
problem is triggered by the MSS being bigger than the MTU (which is
750 in this case).
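For reference, the 40 in "PMTU - 40" is the option-free IPv4 header (20 bytes) plus the option-free TCP header (20 bytes), so a 750-byte MTU gives a clamped MSS of 710. A minimal sketch of that arithmetic, assuming no IP or TCP options; the function names are illustrative only:

```python
IPV4_MIN_HEADER = 20  # bytes, IPv4 header without options
TCP_MIN_HEADER = 20   # bytes, TCP header without options


def clamped_mss(path_mtu: int) -> int:
    """Largest TCP payload per segment that keeps a full IP packet within path_mtu.

    Assumes option-free IPv4 and TCP headers, which is where "PMTU - 40" comes from.
    """
    mss = path_mtu - IPV4_MIN_HEADER - TCP_MIN_HEADER
    if mss <= 0:
        raise ValueError(f"MTU {path_mtu} is too small to carry any TCP payload")
    return mss


def segment_fits(path_mtu: int, advertised_mss: int) -> bool:
    """True if a full-sized segment at advertised_mss fits in one packet of path_mtu."""
    return advertised_mss + IPV4_MIN_HEADER + TCP_MIN_HEADER <= path_mtu


if __name__ == "__main__":
    mtu = 750  # the PPP MTU reported in this thread
    print(clamped_mss(mtu))                      # 710
    print(segment_fits(mtu, 1460))               # False: an Ethernet-sized MSS is too big
    print(segment_fits(mtu, clamped_mss(mtu)))   # True
```

In iptables the clamp is applied to TCP SYN packets with the TCPMSS target, either via --clamp-mss-to-pmtu or with an explicit --set-mss 710.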
I have tcpdump logs for network traffic on both sides of the pptpd
server for several crash instances if that helps. I can perform other
tests and gather more information if required.
Decoded oops and module list follow. The crash is reproducible on a
variety of different hardware. PopTop version is 1.1.3-20030409. PPP
version is 2.4.1 with MPPE patches.
skput:over: c4b63442:1338 put:1338 dev:<NULL>kernel BUG at skbuff.c:92!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01ca979>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000002d ebx: c20acb20 ecx: 0000002d edx: 00000000
esi: c20acb20 edi: 0000053a ebp: c19f9380 esp: c246de00
ds: 0018 es: 0018 ss: 0018
Process pptpctrl (pid: 4681, stackpage=c246d000)
Stack: c0234480 c4b63442 0000053a 0000053a c023446b c4b6344a c20acb20 0000053a
c4b63442 c19f9380 c19f9380 c25eb4a0 c2fb1813 c4b62e1e c19f9380 c25eb4a0
c19f9380 c246c000 c058d3c0 c2fb1813 0000055b 0000055b c246df7c c4b62d9c
Call Trace: [<c4b63442>] [<c4b6344a>] [<c4b63442>] [<c4b62e1e>] [<c4b62d9c>]
[<c4b62bd6>] [<c4b6930d>] [<c4b68436>] [<c018fed6>] [<c018e7aa>] [<c018a226>]
[<c018e614>] [<c0134066>] [<c0106e33>]
Code: 0f 0b 5c 00 a0 44 23 c0 83 c4 14 c3 8d 76 00 8b 54 24 04 8b
>>EIP; c01ca979 <skb_over_panic+29/38> <=====
>>ebx; c20acb20 <_end+1def628/4814b08>
>>esi; c20acb20 <_end+1def628/4814b08>
>>ebp; c19f9380 <_end+173be88/4814b08>
>>esp; c246de00 <_end+21b0908/4814b08>
Trace; c4b63442 <END_OF_CODE+902b/????>
Trace; c4b6344a <END_OF_CODE+9033/????>
Trace; c4b63442 <END_OF_CODE+902b/????>
Trace; c4b62e1e <END_OF_CODE+8a07/????>
Trace; c4b62d9c <END_OF_CODE+8985/????>
Trace; c4b62bd6 <END_OF_CODE+87bf/????>
Trace; c4b6930d <END_OF_CODE+eef6/????>
Trace; c4b68436 <END_OF_CODE+e01f/????>
Trace; c018fed6 <pty_write+de/12c>
Trace; c018e7aa <write_chan+196/1f8>
Trace; c018a226 <tty_write+22e/2a8>
Trace; c018e614 <write_chan+0/1f8>
Trace; c0134066 <sys_write+96/118>
Trace; c0106e33 <system_call+33/40>
Code; c01ca979 <skb_over_panic+29/38>
00000000 <_EIP>:
Code; c01ca979 <skb_over_panic+29/38> <=====
0: 0f 0b ud2a <=====
Code; c01ca97b <skb_over_panic+2b/38>
2: 5c pop %esp
Code; c01ca97c <skb_over_panic+2c/38>
3: 00 a0 44 23 c0 83 add %ah,0x83c02344(%eax)
Code; c01ca982 <skb_over_panic+32/38>
9: c4 14 c3 les (%ebx,%eax,8),%edx
Code; c01ca985 <skb_over_panic+35/38>
c: 8d 76 00 lea 0x0(%esi),%esi
Code; c01ca988 <skb_under_panic+0/38>
f: 8b 54 24 04 mov 0x4(%esp,1),%edx
Code; c01ca98c <skb_under_panic+4/38>
13: 8b 00 mov (%eax),%eax
<0>Kernel panic: Aiee, killing interrupt handler!
-----
Module Size Used by Tainted: PF
ipt_MASQUERADE 2104 1 (autoclean)
ipt_REDIRECT 1368 4 (autoclean)
ipt_mark 952 8 (autoclean)
ipt_MARK 1368 2 (autoclean)
iptable_mangle 2804 1 (autoclean)
ipt_state 1048 14 (autoclean)
ppp_deflate 3800 0 (autoclean)
zlib_inflate 21316 0 (autoclean) [ppp_deflate]
zlib_deflate 20888 0 (autoclean) [ppp_deflate]
ppp_mppe 23320 0 (autoclean)
bsd_comp 4920 0 (autoclean)
ppp_async 8288 0 (autoclean)
ppp_generic 20832 0 (autoclean) [ppp_deflate ppp_mppe bsd_comp ppp_async]
slhc 5776 0 (autoclean) [ppp_generic]
e100 78244 1
8139too 16616 1
mii 3324 0 [8139too]
ipt_TCPMSS 3064 2 (autoclean)
ipt_REJECT 3576 1 (autoclean)
iptable_filter 2312 1 (autoclean)
ip_conntrack_h323 3600 1 (autoclean)
ip_nat_h323 3596 0 (unused)
ip_conntrack_irc 4080 1 (autoclean)
ip_nat_irc 3312 0 (unused)
ip_conntrack_ftp 5040 1 (autoclean)
ip_nat_ftp 4208 0 (unused)
iptable_nat 19672 4 [ipt_MASQUERADE ipt_REDIRECT ip_nat_h323 ip_nat_irc ip_nat_ftp]
ip_tables 13432 12 [ipt_MASQUERADE ipt_REDIRECT ipt_mark ipt_MARK iptable_mangle ipt_state ipt_TCPMSS ipt_REJECT iptable_filter iptable_nat]
ip_conntrack 33504 5 [ipt_MASQUERADE ipt_REDIRECT ipt_state ip_conntrack_h323 ip_nat_h323 ip_conntrack_irc ip_nat_irc ip_conntrack_ftp ip_nat_ftp iptable_nat]
keybdev 2400 0 (unused)
hid 15048 0 (unused)
input 5056 0 [keybdev hid]
usbcore 70112 1 [hid]
Regards,
Menno Smits <[email protected]>
Hi Menno,
> I'm seeing a kernel oops with 2.4.20 which seems to be related to the
> PopTop PPTP server.
> Decoded oops and module list follow. The crash is reproducible on a
> variety of different hardware. PopTop version is 1.1.3-20030409. PPP
> version is 2.4.1 with MPPE patches.
> Trace; c4b63442 <END_OF_CODE+902b/????>
> Trace; c4b6344a <END_OF_CODE+9033/????>
Unfortunately, your Oops doesn't contain symbol information for modules.
Did you really follow the steps in Documentation/oops-tracing.txt?
> Module Size Used by Tainted: PF
> ppp_mppe 23320 0 (autoclean)
However, I still have a sneaky suspicion that the bug is in ppp_mppe (why
did you have to load it using insmod -f?). IIRC from earlier discussions, a
compressor is not allowed to grow a packet, but with encryption this might
well happen. If ppp_mppe then calls skb_put() to update the len field,
it will trigger the BUG() above and cause the oops.
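To illustrate the check being described: in 2.4, skb_put() advances skb->tail and skb->len and calls skb_over_panic() if tail ends up past skb->end, which is what prints the "skput:over" line and hits BUG() in the oops above. The put size there, 1338, is 0x53a, the same value visible in edi. The following is a toy Python model, not kernel code; the 754-byte buffer (a 750-byte MRU plus a 4-byte PPP header) is an assumed size for illustration, and safe_put() is a hypothetical helper showing the kind of tailroom check a decompressor would need.

```python
class SkbOverPanic(Exception):
    """Stands in for skb_over_panic(), which BUG()s the kernel."""


class SkBuff:
    """Toy linear sk_buff using offsets: data occupies [0, tail), buffer ends at end."""

    def __init__(self, size: int):
        self.tail = 0
        self.end = size
        self.len = 0

    def tailroom(self) -> int:
        return self.end - self.tail

    def put(self, nbytes: int) -> int:
        """Like 2.4's skb_put(): advance tail and len first, then check for overrun."""
        start = self.tail
        self.tail += nbytes
        self.len += nbytes
        if self.tail > self.end:
            raise SkbOverPanic(f"skput:over: len:{self.len} put:{nbytes}")
        return start


def safe_put(skb: SkBuff, nbytes: int):
    """Hypothetical helper: refuse to grow the buffer instead of overrunning it."""
    if nbytes > skb.tailroom():
        return None  # caller should drop the packet
    return skb.put(nbytes)


if __name__ == "__main__":
    assert 0x53a == 1338       # the put size in the oops, also held in edi
    skb = SkBuff(750 + 4)      # assumed: MRU 750 plus a 4-byte PPP header
    print(safe_put(skb, 1338))  # None: not enough tailroom, packet dropped
    try:
        skb.put(1338)
    except SkbOverPanic as exc:
        print(exc)  # skput:over: len:1338 put:1338
```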
So it's worth checking why the above trace is missing the module
information and, if the trace really does show ppp_mppe in the path,
forwarding the problem to the PopTop people ;-)
--jochen
Thanks for the quick and useful reply.
On Tue, 20 May 2003 21:43:01 -0700
Frank Cusack <[email protected]> wrote:
> On Wed, May 21, 2003 at 09:14:42AM +1000, Menno Smits wrote:
> > I'm seeing a kernel oops with 2.4.20 which seems to be related to the
> > PopTop PPTP server. When certain clients connect in (seems to be
> > Win98) and begin large data transfers the kernel will reliably oops.
> > The system crashes hard; the oops doesn't make it to the logs.
> ...
> > I have been able to deal with the issue by using the workaround
> > suggested in the second post. That is, adding netfilter rules with
> > the TCPMSS target to limit the TCP MSS to PMTU - 40. Apparently the
> > problem is triggered by the MSS being bigger than the MTU (which is
> > 750 in this case).
>
> Yup. win98 ignores the negotiated MRU from the PPP peer (MTU on the win98
> side) and sends PPP packets larger than MTU. As you've discovered. :-)
>
> Linux doesn't allocate enough space for the decompressor output, and the
> mppe module doesn't properly check that enough space exists. (That's
> because PPP MPPE packets *shrink* after "decompression", and the mppe
> module assumes at least the same amount of space as the PPP packet is
> allocated for the decompressor.)
That certainly makes sense.
> Grab the latest ftp://ftp.samba.org/pub/unpacked/ppp which corrects both
> of the above problems.
I'll try this and let you know how I go.
> I'll be posting a patch to lkml to correct the decompressor allocation
> problem, shortly (a few weeks).
Look forward to it.
Regards,
Menno Smits
On Wed, May 21, 2003 at 09:14:42AM +1000, Menno Smits wrote:
> I'm seeing a kernel oops with 2.4.20 which seems to be related to the
> PopTop PPTP server. When certain clients connect in (seems to be
> Win98) and begin large data transfers the kernel will reliably oops.
> The system crashes hard; the oops doesn't make it to the logs.
...
> I have been able to deal with the issue by using the workaround
> suggested in the second post. That is, adding netfilter rules with
> the TCPMSS target to limit the TCP MSS to PMTU - 40. Apparently the
> problem is triggered by the MSS being bigger than the MTU (which is
> 750 in this case).
Yup. win98 ignores the negotiated MRU from the PPP peer (MTU on the win98
side) and sends PPP packets larger than MTU. As you've discovered. :-)
Linux doesn't allocate enough space for the decompressor output, and the
mppe module doesn't properly check that enough space exists. (That's
because PPP MPPE packets *shrink* after "decompression", and the mppe
module assumes at least the same amount of space as the PPP packet is
allocated for the decompressor.)
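A simplified model of the two problems described above, under stated assumptions: the receive path sizes the decompressor output buffer from the negotiated MRU plus a 4-byte PPP header, and MPPE "decompression" only strips a 2-byte MPPE header (RC4 is a stream cipher, so the payload length is otherwise unchanged). The fixed_alloc policy below is hypothetical, not the actual patch. A peer that ignores the MRU can send a frame larger than the buffer; without a space check that overruns, with one the frame is dropped.

```python
PPP_HDRLEN = 4   # assumed: space reserved for the PPP header on receive
MPPE_HDRLEN = 2  # assumed: MPPE flags + coherency count, stripped on receive


class BufferOverrunError(Exception):
    """Marks the case where unpatched code would write past its output buffer."""


def output_capacity(mru: int, in_len: int, fixed_alloc: bool) -> int:
    """Size of the decompressor output buffer under each (hypothetical) policy."""
    if fixed_alloc:
        # Never smaller than the incoming frame, whatever the peer negotiated.
        return max(mru, in_len) + PPP_HDRLEN
    # Sized purely from the negotiated MRU.
    return mru + PPP_HDRLEN


def receive_frame(in_len: int, mru: int, fixed_alloc: bool, check_space: bool):
    """Return the decrypted payload length, or None if the frame is dropped."""
    capacity = output_capacity(mru, in_len, fixed_alloc)
    out_len = in_len - MPPE_HDRLEN  # output shrinks only by the MPPE header
    if out_len > capacity:
        if check_space:
            return None  # a careful decompressor reports an error; frame is discarded
        raise BufferOverrunError(f"{out_len} bytes into a {capacity}-byte buffer")
    return out_len


if __name__ == "__main__":
    mru = 750
    oversized = 1342  # hypothetical frame from a peer that ignores the MRU
    print(receive_frame(700, mru, False, False))       # 698: normal frame
    print(receive_frame(oversized, mru, False, True))  # None: dropped by the check
    print(receive_frame(oversized, mru, True, True))   # 1340: fits in larger buffer
    try:
        receive_frame(oversized, mru, False, False)
    except BufferOverrunError as exc:
        print("overrun:", exc)
```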
Grab the latest ftp://ftp.samba.org/pub/unpacked/ppp which corrects both
of the above problems.
I'll be posting a patch to lkml to correct the decompressor allocation
problem, shortly (a few weeks).
/fc
> > Grab the latest ftp://ftp.samba.org/pub/unpacked/ppp which corrects both
> > of the above problems.
>
> I'll try this and let you know how I go.
Works beautifully, once I'd figured out the correct pppd options to
pass. Thank you.
Curiously, I couldn't get the server and client to use MS-CHAPv2 or
MPPE unless the "require-mschap-v2" and "require-mppe" options were
provided. I assumed the client and server would negotiate the
'highest' common authentication and encryption level. Instead I found
I had to include these two options or else standard CHAP with no
encryption was used (or the connection failed totally if Windows was
told to _require_ encryption or MS-CHAP).
Some other observations:
- pptp connections seem more reliable now. Previously it could take
several attempts before a PPTP connection could be established
(particularly from Win98?). This seems to be fixed with this
MPPE/MS-CHAP implementation or perhaps it's something else in 2.4.2.
Either way, good stuff!
- If I load the netfilter pptp connection tracking modules on the PPTP
server I can't connect at all from Win98. Win2k works fine. Will
test other clients soon. Lots of the following from poptop:
GRE: xmit failed from decaps_hdlc: Operation not permitted
...even with no firewall rules active. Unload the conntrack modules
and it works fine. Strange! Previously, with the third-party MPPE
patches, connection attempts were less reliable with the conntrack
modules loaded but still workable. This is probably out of the scope
of our discussion, but any thoughts are welcome :)
It's great to have MPPE and MS-CHAPv2 support in the main pppd dist
now... one less patch to worry about. Hopefully the MPPE kernel patch
will make it to the mainline kernel soon too.
Thanks for your help,
Menno