2002-10-21 02:57:13

by Alain Fauconnet

Subject: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

Hello,

I'm updating this information: the box that has been upgraded to
2.4.19 crashes as well, at exactly *the same* instruction.

I'm not sure what to do next: AFAIK the kernel crash dump patches are
not available for 2.4.19 yet, so I can't generate a core dump on this
box. Since it's definitely the same problem, I'd think that collecting
a full crash dump under 2.4.17 or 2.4.18 would help instead, wouldn't it?

As I wrote below, these boxes have 200+ ppp interfaces up at all
times. Such interfaces come and go many times per minute as users
connect and disconnect. I really suspect a race condition on the
net interface structures, and the comments in ./net/ipv4/route.c
imply that such race conditions have been identified and fixed in the
past.

What I can't explain is why these servers used to be quite stable.
Their load has increased, sure, but slowly. At first sight it really
looked as if some traffic was causing the crashes (a kind of DoS
attack), but the crash context doesn't quite back up this guess.

I'm Cc'ing this message to the people whose names appear in the
route.c file, as per the kernel crash "howto" docs. I hope this
won't get me flamed.

Greets,
_Alain_

----- Forwarded message from Alain Fauconnet <[email protected]> -----

Date: Fri, 18 Oct 2002 10:38:49 +0700
From: Alain Fauconnet <[email protected]>
To: [email protected]
Subject: Frequent/consistent panics in 2.4.17/.18 at ip_route_input_slow, in_dev_get(dev)

Hello all,

I've been getting frequent and consistent panics on three different
boxes running 2.4.17 and 2.4.18 kernels, always at exactly the same
instruction, an "incl 0x4(%edx)" at the beginning of
ip_route_input_slow().

The boxes are very busy PPTP (PoPToP) servers on brand-new single-CPU
(P3, P4) SCSI-based hardware (2 x IBM, 1 x Asus MB).

Example for a 2.4.17 kernel:

Unable to handle kernel paging request at virtual address f73cdc48
Oops = 0002
EIP = 0010:c0223ff9 not tainted

(gdb) x/10i 0xc0223ff9
0xc0223ff9 <ip_route_input_slow+33>: incl 0x4(%edx)

EAX = 0 EBX = 410c0100 ECX = f69a30ac EDX = f73cdc44

The indexed access through EDX is causing the fault

ESI = ccb23fc6 EDI = d71a10ac EBP = c583b0a0
ESP = d01d1e6c

DS = 0010 ES = 0018 SS = 0018

Process varies, in this case pptpctrl.

Stack: 410c0100 ccb23fcb d71a10ac c583b0ad 0000003d

Call trace:

c023f91b <raw_recvmsg+111>
c022491a <ip_route_input+338>
c0226834 <ip_rcv_finish+40>
d88251ae ?
d8825180 ?
c0117867 <update_wall_time+11>
c0117c42 <timer_bh+546>
...

Sorry, this partial data was hand-written in haste; I *really* had to
reboot this production box immediately.

Looking at the code in net/ipv4/route.c, I strongly suspect that the
offending line is:

int ip_route_input_slow(struct sk_buff *skb, u32 daddr, u32 saddr,
                        u8 tos, struct net_device *dev)
{
        struct rt_key key;
        struct fib_result res;
------> struct in_device *in_dev = in_dev_get(dev); <-------

Because that's quite close to the beginning of the function:

0xc0223fd8 <ip_route_input_slow>: sub $0x50,%esp
0xc0223fdb <ip_route_input_slow+3>: push %ebp
0xc0223fdc <ip_route_input_slow+4>: push %edi
0xc0223fdd <ip_route_input_slow+5>: push %esi
0xc0223fde <ip_route_input_slow+6>: push %ebx
0xc0223fdf <ip_route_input_slow+7>: mov 0x74(%esp,1),%edx
0xc0223fe3 <ip_route_input_slow+11>: mov 0x70(%esp,1),%eax
0xc0223fe7 <ip_route_input_slow+15>: mov %al,0x2f(%esp,1)
0xc0223feb <ip_route_input_slow+19>: mov 0xac(%edx),%edx
0xc0223ff1 <ip_route_input_slow+25>: mov %edx,0x24(%esp,1)
0xc0223ff5 <ip_route_input_slow+29>: test %edx,%edx
0xc0223ff7 <ip_route_input_slow+31>: je 0xc0223ffc <ip_route_input_slow+36>
0xc0223ff9 <ip_route_input_slow+33>: incl 0x4(%edx) <----- offending instruction

and in_dev_get() inlines to:

static __inline__ struct in_device *
in_dev_get(const struct net_device *dev)
{
        struct in_device *in_dev;

        read_lock(&inetdev_lock);
        in_dev = dev->ip_ptr;
        if (in_dev)
                atomic_inc(&in_dev->refcnt); <---- inlines to "incl %0" (with a LOCK prefix on SMP)
        read_unlock(&inetdev_lock);
        return in_dev;
}
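
(For completeness, the matching release path, as far as I can read it
from the 2.4 headers -- paraphrased, so possibly not letter-exact:)

static __inline__ void
in_dev_put(struct in_device *idev)
{
        /* last reference gone: the in_device gets freed */
        if (atomic_dec_and_test(&idev->refcnt))
                in_dev_finish_destroy(idev);
}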

So unless I am seriously misled, we fault when trying to bump the
in_dev reference count up. What could that mean? This box is a busy
PPTP (PoPToP) server usually serving 200-350 sessions at once, with a
lot of connections/disconnections per minute. Could it be a race
condition that causes the in_dev structure to go away before we
reference it? I'm not familiar enough with the Linux kernel to
investigate further.
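
To spell out how I'm decoding the oops, here is a rough, compilable
layout sketch. The names are my own and the padding is sized only to
reproduce the offsets seen in the disassembly above on a 32-bit build;
it is NOT copied from the real headers:

#include <stddef.h>
#include <stdio.h>

/* Layout sketch only -- not the real struct net_device / in_device. */
struct net_device_sketch {
        unsigned char   pad[0xac];      /* name, flags, queue pointers, ... */
        void            *ip_ptr;        /* mov 0xac(%edx),%edx => dev->ip_ptr */
};

struct in_device_sketch {
        struct net_device_sketch *dev;  /* offset 0x0                   */
        int             refcnt;         /* offset 0x4 => incl 0x4(%edx) */
};

int main(void)
{
        /* On a 32-bit build this prints 0xac and 0x4, matching the oops. */
        printf("ip_ptr at 0x%lx, refcnt at 0x%lx\n",
               (unsigned long)offsetof(struct net_device_sketch, ip_ptr),
               (unsigned long)offsetof(struct in_device_sketch, refcnt));
        return 0;
}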

I've got up to 4 crashes on these 3 boxes in a single day, all at that
precise instruction. Nothing has changed lately on the servers, except
a steadily increasing load. They had been rock solid for several
months before this started happening about a week ago.

They run quite a simple configuration: almost-vanilla hardened
Slackware with PoPToP compiled from source (no encryption). The only
modification to the kernel is in ./net/core/dev.c, where I have upped
the maximum number of instances for a device from 100 to 500 in
dev_alloc_name(); the algorithm itself is unchanged (yes, I have read
the comments).
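
To be precise, the change amounts to nothing more than raising the loop
bound in that function. Roughly, paraphrasing net/core/dev.c from memory
(not an exact quote; the "%d" sanity checks at the top are omitted):

/* Paraphrase, not an exact quote of 2.4 net/core/dev.c. */
int dev_alloc_name(struct net_device *dev, const char *name)
{
        int i;
        char buf[32];

        /* stock kernel loops while i < 100; I raised the bound to 500 */
        for (i = 0; i < 500; i++) {
                snprintf(buf, sizeof(buf), name, i);   /* "ppp%d" -> "ppp123" */
                if (__dev_get_by_name(buf) == NULL) {
                        strcpy(dev->name, buf);
                        return i;
                }
        }
        return -ENFILE;                                /* all names in use */
}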

One of the boxes has been upgraded to 2.4.19 (which broke everything;
I had to get pppd 2.4.2b1 to fix it - I don't know whether that's to be
expected) and has been running stably since then (only one day, doesn't prove
much yet). My next plan is to implement LKCD on the 2.4.17 box
(not available for 2.4.19 yet, it seems) and capture a complete
crash dump. Would that help track this down?

Thanks in advance for any help,

Greets,

_Alain_
"I've RTFM. It says: `see your system administrator'. But... *I* am
the system administrator"
(DECUS US symposium session title, author unknown, ca. 1990)


----- End forwarded message -----


2002-10-21 16:34:40

by Alexey Kuznetsov

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

Hello!

> much yet). My next plan is to implement LKCD on the 2.4.17 box
> (not available for 2.4.19 yet, it seems) and capture a complete
> crash dump. Would that help track this down?

I do not think that this will help.

Better try enabling slab poisoning in slab.h. If it is the thing
I am thinking of, it will provoke a crash.

Alexey

2002-10-22 03:19:05

by Alain Fauconnet

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

On Mon, Oct 21, 2002 at 08:40:16PM +0400, [email protected] wrote:
> Hello!
>
> > much yet). My next plan is to implement LKCD on the 2.4.17 box
> > (not available for 2.4.19 yet, it seems) and capture a complete
> > crash dump. Would that help track this down?
>
> I do not think that this will help.
>
> Better try enabling slab poisoning in slab.h. If it is the thing
> I am thinking of, it will provoke a crash.

Thanks for your reply,

Does that mean just enabling CONFIG_DEBUG_SLAB in the kernel config,
or something more?

I see:

#define SLAB_POISON 0x00000800UL /* Poison objects */

in slab.h, but there's nothing settable in there.

Anyway I've recompiled the 2.4.19 kernel with CONFIG_DEBUG_SLAB=y.
Let's see what happens.

Greets,
_Alain_

2002-10-28 10:14:08

by Alain Fauconnet

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

Hello,

On Mon, Oct 21, 2002 at 08:40:16PM +0400, [email protected] wrote:
>
> Better try enabling slab poisoning in slab.h. If it is the thing
> I am thinking of, it will provoke a crash.
>
> Alexey
>

I have got a new crash on the same box with CONFIG_DEBUG_SLAB=y
(reminder: Kernel 2.4.19)

Unable to handle kernel paging request at virtual address 5a5a5a5e

EIP : 0010:[<c02318a9>] <ip_route_input_slow+33>
EAX : 00000000 EBX : 5a5a5a5a ECX : 9c515bec
EDX : 5a5a5a5a ESI : ccb5b542 EDI : d71a10ac
EBP : c1aae690 ESP : c02f3e44

Process Swapper (pid: 0, stackpage = c02f3000)

Stack: 5a5a5a ccb5b542 d71a10ac c1aae690
(sorry, the first stack word was written down incorrectly by the
operator; one byte was lost)

Call Trace:
[<c010980d>] <get_irq_list+137>
[<c0109a71>] <do_IRQ+133>
[<c02321ca>] <ip_route_input+338>
[<c02342e4>] <ip_rcv_finish+40>

(I find this stack trace strange. Once again, I didn't write it down
myself, so I can't be sure it's 100% accurate.)


Code : ff 42 04 b8 54 24 24 89 54 24 28 c7 44 24 20 00 00 00 00 c7

Still exactly the same place in ip_route_input_slow:

(gdb) x/i 0xc02318a9
0xc02318a9 <ip_route_input_slow+33>: incl 0x4(%edx)
0xc02318ac <ip_route_input_slow+36>: mov 0x24(%esp,1),%edx
0xc02318b0 <ip_route_input_slow+40>: mov %edx,0x28(%esp,1)
0xc02318b4 <ip_route_input_slow+44>: movl $0x0,0x20(%esp,1)
0xc02318bc <ip_route_input_slow+52>: movl $0x0,0x38(%esp,1)
0xc02318c4 <ip_route_input_slow+60>: movl $0x0,0x34(%esp,1)

It looks to me like the value in EDX is "poisoned" data, isn't it?
I assume that the kernel is trying to use dynamic memory that has been
released already, right?
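
(For my own understanding: as I read it, slab poisoning fills freed
objects with a fixed byte, so any later use of a stale pointer reads
unmistakable garbage instead of plausible-looking old data. A minimal
user-space sketch of the idea -- the names and the wrapper below are
mine, not the kernel's:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0x5a        /* fill byte, matching the 0x5a5a5a5a above */

/* Illustrative wrapper: overwrite an object before freeing it, so any
 * code still holding a pointer to it reads obvious garbage. */
static void poison_then_free(void *obj, size_t size)
{
        memset(obj, POISON_BYTE, size);
        free(obj);
}

int main(void)
{
        unsigned int *p = malloc(sizeof(*p));

        if (p == NULL)
                return 1;
        *p = 42;
        poison_then_free(p, sizeof(*p));
        /* A use-after-free through a stale copy of this memory would now
         * see 0x5a5a5a5a -- the value sitting in EBX/EDX in the oops.    */
        printf("poison pattern: 0x%08x\n", 0x5a5a5a5aU);
        return 0;
}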

What's next in tracing this one down?

Thanks for any hint,
_Alain_

2002-10-29 01:24:26

by Alexey Kuznetsov

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

Hello!

> I assume that the kernel is trying to use dynamic memory that has been
> released already, right?

Right.

> What's next in tracing this one down?

To tell exactly which driver is doing this. Apparently, it continues to
inject packets into the stack even after the device has been destroyed.

If you did not see the message "Freeing alive device", it means that the
driver unregistered the device cleanly. The usual ppp driver seems to be
sane...

Alexey

2002-10-29 04:07:26

by Alain Fauconnet

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

On Tue, Oct 29, 2002 at 04:30:25AM +0300, [email protected] wrote:

> > I assume that the kernel is trying to use dynamic memory that has been
> > released already, right?
>
> Right.
>
> > What's next in tracing this one down?
>
> To tell exactly which driver is doing this. Apparently, it continues to
> inject packets into the stack even after the device has been destroyed.
>

In this case, would they be packets TO or FROM the ppp device?
(reminder: crash happens in ip_route_input_slow, called with struct
net_device *dev presumably pointing at what used to be the PPP device)

> If you did not see the message "Freeing alive device", it means that the
> driver unregistered the device cleanly. The usual ppp driver seems to be
> sane...

No such message in the logs, but "PPPIOCDETACH file->f_count=3" appears
quite often. If I understand it correctly, that would mean the ppp
interface is destroyed while it still has channels open to it... Does
that give any hint?
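
For what it's worth, from a quick look at drivers/net/ppp_generic.c
(quoting loosely from memory, so take this with a grain of salt), that
message seems to be printed when a PPPIOCDETACH request is refused
because something else still holds a reference to the ppp file:

/* Loose paraphrase, not an exact quote of 2.4 ppp_generic.c. */
if (cmd == PPPIOCDETACH) {
        err = -EINVAL;
        if (atomic_read(&file->f_count) <= 2) {
                /* only the ioctl caller holds the file: safe to detach */
                ppp_release(inode, file);
                err = 0;
        } else
                printk(KERN_DEBUG "PPPIOCDETACH file->f_count=%d\n",
                       atomic_read(&file->f_count));
}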

Let me try to summarize anything unusual about this box:

- pptp (PoPToP). But pptp is only userland software, how could it cause
such a problem? It merely forks pppd as a child.

- NAT, both SNAT and DNAT (for transparent redirection to a Squid), plus a
lot of rules used for traffic accounting, so the iptables configuration
is kind of hairy. Could it somehow cause packets to be "delayed" and
thus re-injected to the stack later than usual? (I'm probably just
talking nonsense here - sorry - trying to guess).

- asymmetrical routing: packets come from clients over the VPN, thus over the
PPP interface. They are routed back through a LAN interface that goes
to the satellite uplink. To do this, the route to the peer IP through
the PPP interface is deleted in the ip-up script.

Ah.. something else that could be relevant (??): I see that ifconfig
-a shows "duplicate" ppp devices e.g.:

ppp96 Link encap:Point-to-Point Protocol
inet addr:xxx.yyy.zzz.2 P-t-P:172.16.27.104 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:44238 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:2192741 (2.0 Mb) TX bytes:153 (153.0 b)

ppp96 Link encap:Point-to-Point Protocol
inet addr:xxx.yyy.zzz.2 P-t-P:172.16.27.104 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1

Is this "normal"?

Greets,
_Alain_

2002-10-29 04:25:24

by Alexey Kuznetsov

Subject: Re: UPD: Frequent/consistent panics in 2.4.19 at ip_route_input_slow, in_dev_get(dev)

Hello!

> In this case, would they be packets TO or FROM the ppp device?

FROM, of course. When a packet reaches IP, the device is already destroyed.
This is impossible unless the driver is broken or we have an error
in reference counting. The second possibility is ruled out by the fact
that you do not see "Freeing alive device", so the device was cleanly
unregistered.
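
(The check I mean is in net/core/dev.c, where the last reference to a
device is dropped; roughly, quoting from memory, so not letter-exact:)

/* Paraphrase: when the device is finally freed, the stack complains if
 * it was never properly marked dead, i.e. if someone dropped the last
 * reference without a clean unregister. */
if (!dev->deadbeaf) {
        printk(KERN_ERR "Freeing alive device %p, %s\n", dev, dev->name);
        return 0;
}
if (dev->destructor)
        dev->destructor(dev);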

> No such message in the logs, but "PPPIOCDETACH file->f_count=3"

I will look. It sounds like an interesting hint.

> - pptp (PoPToP). But pptp is only userland software, how could it cause
> such a problem?

In no way.

> thus re-injected to the stack later than usual? (I'm probably just
> talking nonsense here - sorry - trying to guess).

It is not nonsense. However, iptables holds the necessary reference counts.

> - asymmetrical routing:

This is not essential.

> -a shows "duplicate" ppp devices e.g.:
....
> Is this "normal"?

Well, ifconfig can be racy. If a device sitting in the list before ppp96
was deleted while ifconfig was scanning the list, you can see ppp96 twice.
If a device was added, ifconfig can miss one. Awkward, of course.

Alexey