2001-12-14 10:31:52

by Pawel Kot

Subject: [BUG()] IrDA in 2.4.16 + preempt

Hi,

I found an annoying problem with irda on 2.4.16.
When I remove the irlan module I get a segmentation fault:
root@blurp:~# rmmod irlan
Dec 14 02:27:29 blurp kernel: irlan_init()
Dec 14 02:27:29 blurp kernel: irlmp_register_client()
Dec 14 02:27:29 blurp kernel: irlan_register_netdev()
Dec 14 02:27:35 blurp kernel: kernel BUG at slab.c:1200!
Dec 14 02:27:35 blurp kernel: invalid operand: 0000
Dec 14 02:27:35 blurp kernel: CPU: 0
Dec 14 02:27:35 blurp kernel: EIP: 0010:[kmem_extra_free_checks+81/140] Not tainted
Dec 14 02:27:35 blurp kernel: EFLAGS: 00010082
Dec 14 02:27:35 blurp kernel: eax: 0000001b ebx: c14055f0 ecx: ffffffe2 edx: 000017eb
Dec 14 02:27:35 blurp kernel: esi: 00000000 edi: cf2eb974 ebp: cc4d9030 esp: cc045ebc
Dec 14 02:27:35 blurp kernel: ds: 0018 es: 0018 ss: 0018
Dec 14 02:27:35 blurp kernel: Process rmmod (pid: 110, stackpage=cc045000)
Dec 14 02:27:35 blurp kernel: Stack: c024a8ac 000004b0 cf2eb974 cc4d9030 cc4d9030 c14055f0 c012d3c2 c14055f0
Dec 14 02:27:35 blurp kernel: cf2eb974 cc4d9030 cc4d9030 cc4d9030 ffffe000 00000286 ffffe000 00000400
Dec 14 02:27:35 blurp kernel: cc4d9030 00000286 c01eab85 cc4d9030 cc4d9030 c01ead4f cc4d9030 cc4d9030
Dec 14 02:27:35 blurp kernel: Call Trace: [kfree+450/576] [netdev_finish_unregister+145/152] [unregister_netdevice+451/632] [unregister_netdev+16/40] [<d285b490>]
Dec 14 02:27:35 blurp kernel: [<d285b34c>] [hashbin_delete+107/152] [<d285b160>] [<d285b34c>] [<d28602e3>] [<d285ccdd>]
Dec 14 02:27:36 blurp kernel: [free_module+23/192] [sys_delete_module+300/732] [system_call+51/56]
Dec 14 02:27:36 blurp kernel:
Dec 14 02:27:36 blurp kernel: Code: 0f 0b 83 c4 08 8b 5f 14 83 fb ff 74 27 8d b6 00 00 00 00 39
[and segmentation fault]

And the module is not actually removed:
root@blurp:~# lsmod
Module Size Used by
irlan 0 0 (deleted)
root@blurp:~# insmod irlan
Using /lib/modules/2.4.16preempt/kernel/net/irda/irlan/irlan.o
insmod: a module named irlan already exists

Moreover, some processes go into 'D' state, e.g. ifconfig and irattach.

Additionally, when I try to log in on the other console, I get another BUG():
Dec 14 02:28:52 blurp kernel: kernel BUG at slab.c:1243!
Dec 14 02:28:52 blurp kernel: invalid operand: 0000
Dec 14 02:28:52 blurp kernel: CPU: 0
Dec 14 02:28:52 blurp kernel: EIP: 0010:[kmalloc+346/504] Not tainted
Dec 14 02:28:52 blurp kernel: EFLAGS: 00010086
Dec 14 02:28:52 blurp kernel: eax: 0000001b ebx: c14055f0 ecx: ffffffe5 edx: 00001b5b
Dec 14 02:28:52 blurp kernel: esi: cc4d9400 edi: cc4d942f ebp: cc4d942f esp: ccc41f30
Dec 14 02:28:52 blurp kernel: ds: 0018 es: 0018 ss: 0018
Dec 14 02:28:52 blurp kernel: Process bash (pid: 94, stackpage=ccc41000)
Dec 14 02:28:52 blurp kernel: Stack: c024a8ac 000004db 00000100 fffffff4 000000ff c15181ec 00000400 00000246
Dec 14 02:28:52 blurp kernel: c0149d19 00000400 000001f0 c0149e0a 00000100 000000ff c15181ec 00000001
Dec 14 02:28:52 blurp kernel: c15181ec 00000003 ccc40000 c15181ec bffffd12 c0141b46 c15181ec 000000ff
Dec 14 02:28:52 blurp kernel: Call Trace: [alloc_fd_array+25/52] [expand_fd_array+142/384] [expand_files+54/68] [sys_dup2+125/324] [system_call+51/56]
Dec 14 02:28:52 blurp kernel:
Dec 14 02:28:52 blurp kernel: Code: 0f 0b 83 c4 08 90 f6 43 1d 04 74 4e b8 a5 c2 0f 17 87 06 3d

This happens only on the first login, but I suppose it is a consequence
of the first bug.

I can reproduce it.

pkot
--
mailto:[email protected] :: mailto:[email protected]
http://kt.linuxnews.pl/ :: Kernel Traffic po polsku
http://tfuj.pl/cv.html :: http://tfuj.pl/pgp.asc


2001-12-17 09:27:05

by Martin Diehl

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt


[Jean added to CC]

On Fri, 14 Dec 2001, Pawel Kot wrote:

> I found an annoying problem with irda on 2.4.16.
> When I remove the irlan module I get a segmentation fault:
> root@blurp:~# rmmod irlan
> Dec 14 02:27:35 blurp kernel: kernel BUG at slab.c:1200!
> Dec 14 02:27:35 blurp kernel: invalid operand: 0000
> Dec 14 02:27:35 blurp kernel: CPU: 0
> Dec 14 02:27:35 blurp kernel: EIP: 0010:[kmem_extra_free_checks+81/140] Not tainted
[...]
> Dec 14 02:27:35 blurp kernel: Process rmmod (pid: 110, stackpage=cc045000)
[..]
> Dec 14 02:27:35 blurp kernel: Call Trace:
[kfree+450/576]
[netdev_finish_unregister+145/152]
[unregister_netdevice+451/632]
[unregister_netdev+16/40]

There seems to be an inconsistency in how the irlan netdev is handled:
having NETIF_F_DYNALLOC set for a netdev which is not allocated as an
independent object doesn't seem to be a good idea to me ;-)

The patch below simply removes NETIF_F_DYNALLOC just before calling
unregister_netdev() and should fix the issue. It's untested however,
since I'm unable to reproduce the Oops on UP without preempt (but it
should be there as well, due to ipfrag_time for example). At least it
compiles and doesn't do any harm to me.

IMHO, retiring dynalloc is just a band-aid, because I do believe
using it would be a good idea, but that would need some more
changes to irlan.

Btw., I'm not sure about the status of irlan - I'm only using ppp over
ircomm or irnet.

HTH
Martin

-----------------------------

--- linux-2.4.16/net/irda/irlan/irlan_common.c Fri Oct 12 22:04:30 2001
+++ v2.4.16-md/net/irda/irlan/irlan_common.c Mon Dec 17 10:01:53 2001
@@ -282,6 +282,18 @@
 	while ((skb = skb_dequeue(&self->client.txq)))
 		dev_kfree_skb(skb);
 
+	/* NETIF_F_DYNALLOC feature was set by irlan_eth_init() and would
+	 * cause the unregister_netdev() to do asynch completion _and_
+	 * kfree self->dev afterwards. Which is really bad because the
+	 * netdevice was not allocated separately but is embedded in
+	 * our control block and therefore gets freed with *self.
+	 * Probably there are better solutions, but simply removing
+	 * the feature before unregister should solve the Oops.
+	 * Note however, this may cause unregister_netdev() to block
+	 * until the refcount decreases to zero - which might take
+	 * some time, say /proc/sys/net/ipv4/ipfrag_time for example.
+	 */
+	self->dev.features &= ~NETIF_F_DYNALLOC;
 	unregister_netdev(&self->dev);
 
 	self->magic = 0;


2001-12-17 22:26:16

by Pawel Kot

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, 17 Dec 2001, Martin Diehl wrote:

> On Fri, 14 Dec 2001, Pawel Kot wrote:
>
> > I found an annoying problem with irda on 2.4.16.
> > When I remove the irlan module I get a segmentation fault:
> > root@blurp:~# rmmod irlan
> > Dec 14 02:27:35 blurp kernel: kernel BUG at slab.c:1200!
> > Dec 14 02:27:35 blurp kernel: invalid operand: 0000
> > Dec 14 02:27:35 blurp kernel: CPU: 0
> > Dec 14 02:27:35 blurp kernel: EIP: 0010:[kmem_extra_free_checks+81/140] Not tainted
> [...]
> > Dec 14 02:27:35 blurp kernel: Process rmmod (pid: 110, stackpage=cc045000)
> [..]
> > Dec 14 02:27:35 blurp kernel: Call Trace:
> [kfree+450/576]
> [netdev_finish_unregister+145/152]
> [unregister_netdevice+451/632]
> [unregister_netdev+16/40]
>
> Seems some inconsistency in the way how the irlan netdev is handled:
> having NETIF_F_DYNALLOC set for a netdev which is not allocated as an
> independent object doesn't seem to be a good idea to me ;-)
>
> The patch below simply removes NETIF_F_DYNALLOC just before calling
> unregister_netdev() and should fix the issue. It's untested however,
> since I'm unable to reproduce the Oops on UP without preempt (but it
> should be there as well, due to ipfrag_time for example). At least it
> compiles and doesn't do any harm to me.

It didn't help. Still the same BUG()s. Moreover, it seems that every
network-related process goes into D state. It happens with ifconfig and
pppd for sure.

> Btw., I'm not sure about the status of irlan - I'm only using ppp over
> ircomm or irnet.

In fact I discovered it accidentally. I had loaded irlan instead of ircomm
when trying to find the reason why connect() on /dev/ircomm0 gives me "No
route to host" every time (with no success), although discovery succeeds.

I'll give it a try without the preempt patch.

pkot

2001-12-17 22:47:06

by Jean Tourrilhes

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, Dec 17, 2001 at 10:28:45AM +0100, Martin Diehl wrote:
>
> [Jean added to CC]
>
> On Fri, 14 Dec 2001, Pawel Kot wrote:
>
> > I found an annoying problem with irda on 2.4.16.
> > When I remove the irlan module I get a segmentation fault:
> > root@blurp:~# rmmod irlan
> > Dec 14 02:27:35 blurp kernel: kernel BUG at slab.c:1200!
> > Dec 14 02:27:35 blurp kernel: invalid operand: 0000
> > Dec 14 02:27:35 blurp kernel: CPU: 0
> > Dec 14 02:27:35 blurp kernel: EIP: 0010:[kmem_extra_free_checks+81/140] Not tainted
> [...]
> > Dec 14 02:27:35 blurp kernel: Process rmmod (pid: 110, stackpage=cc045000)
> [..]
> > Dec 14 02:27:35 blurp kernel: Call Trace:


Where is this coming from? Was it sent to the IrDA mailing list?


> [kfree+450/576]
> [netdev_finish_unregister+145/152]
> [unregister_netdevice+451/632]
> [unregister_netdev+16/40]
>
> Seems some inconsistency in the way how the irlan netdev is handled:
> having NETIF_F_DYNALLOC set for a netdev which is not allocated as an
> independent object doesn't seem to be a good idea to me ;-)
>
> The patch below simply removes NETIF_F_DYNALLOC just before calling
> unregister_netdev() and should fix the issue. It's untested however,
> since I'm unable to reproduce the Oops on UP without preempt (but it
> should be there as well, due to ipfrag_time for example). At least it
> compiles and doesn't do any harm to me.

Why don't you just fix irlan_eth_init()? NETIF_F_DYNALLOC
is only used in the unregister_netdevice() functions (check your
kernel), so it's cleaner to never set the flag in the first place.

Also: I suspect that Dag added this flag as a workaround for
some refcount problem, because with it the code does one more unref
than without. So, I suspect the refcount is broken. By the way, this
flag doesn't change the behaviour as far as waiting for people that
hold a refcount on the device is concerned.

> IMHO, retiring dynalloc is just some sort of band-aid because I do
> believe, using it would be a good idea - but would need some more
> changes for irlan.

No, that's the right way. NETIF_F_DYNALLOC is only ever used for
that. On the other hand, you might need to fix the refcount.

> Btw., I'm not sure about the status of irlan - I'm only using ppp over
> ircomm or irnet.

Same for me.

> HTH
> Martin

Have fun...

Jean

2001-12-17 23:22:52

by Pawel Kot

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, 17 Dec 2001, Jean Tourrilhes wrote:

> > On Fri, 14 Dec 2001, Pawel Kot wrote:
> >
> > > I found an annoying problem with irda on 2.4.16.
> > > When I remove the irlan module I get a segmentation fault:
> > > root@blurp:~# rmmod irlan
> > > Dec 14 02:27:35 blurp kernel: kernel BUG at slab.c:1200!
> > > Dec 14 02:27:35 blurp kernel: invalid operand: 0000
> > > Dec 14 02:27:35 blurp kernel: CPU: 0
> > > Dec 14 02:27:35 blurp kernel: EIP: 0010:[kmem_extra_free_checks+81/140] Not tainted
> > [...]
> > > Dec 14 02:27:35 blurp kernel: Process rmmod (pid: 110, stackpage=cc045000)
> > [..]
> > > Dec 14 02:27:35 blurp kernel: Call Trace:
>
>
> Where is this coming from? Was it sent to the IrDA mailing list?

It was originally sent to lkml and Dag (email taken from irlan sources).
Now I reposted it also to irda-linux.

pkot


2001-12-17 23:25:12

by Pawel Kot

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, 17 Dec 2001, Pawel Kot wrote:

> I'll give it a try without the preempt patch.

The same on 2.4.16-vanilla. The bug report again (this time CCed to
the linux-irda ml):

root@blurp:~# rmmod irlan
Causes segmentation fault and:
Dec 18 00:13:44 blurp kernel: irlan_init()
Dec 18 00:13:44 blurp kernel: irlan_register_netdev()
Dec 18 00:13:48 blurp kernel: kernel BUG at slab.c:1198!
Dec 18 00:13:48 blurp kernel: invalid operand: 0000
Dec 18 00:13:48 blurp kernel: CPU: 0
Dec 18 00:13:48 blurp kernel: EIP: 0010:[sys_msync+5/256] Not tainted
Dec 18 00:13:48 blurp kernel: EFLAGS: 00010086
Dec 18 00:13:48 blurp kernel: eax: 0000001b ebx: c14055f0 ecx: c02867c8 edx: 000017c6
Dec 18 00:13:48 blurp kernel: esi: 00000000 edi: cf9b6974 ebp: ce30f030 esp: ce27decc
Dec 18 00:13:48 blurp kernel: ds: 0018 es: 0018 ss: 0018
Dec 18 00:13:48 blurp kernel: Process rmmod (pid: 110, stackpage=ce27d000)
Dec 18 00:13:48 blurp kernel: Stack: c0235e0c 000004ae cf9b6974 ce30f030 ce30f030 c14055f0 c0129552 c14055f0
Dec 18 00:13:48 blurp kernel: cf9b6974 ce30f030 ce30f030 ce30f310 ce30f000 ce30f030 ce30f000 00000400
Dec 18 00:13:48 blurp kernel: ce30f030 00000286 c01de0d5 ce30f030 ce30f030 c01de213 ce30f030 ce30f030
Dec 18 00:13:48 blurp kernel: Call Trace: [mincore_page+10/164] [fbcon_cfb8_bmove+225/612] [fbcon_cfb8_bmove+543/612] [getkeycode+12/16] [<d285b449>]
Dec 18 00:13:48 blurp kernel: [<d285b34c>] [unix_sock_destructor+127/304] [<d285b160>] [<d285b34c>] [<d28601a3>] [<d285cc5d>]
Dec 18 00:13:48 blurp kernel: [register_console+203/392] [panic+55/224] [do_signal+347/672]
Dec 18 00:13:48 blurp kernel:
Dec 18 00:13:48 blurp kernel: Code: 0f 0b 83 c4 08 8b 5f 14 83 fb ff 74 27 8d b6 00 00 00 00 39

Now I log on to the other console:
Dec 18 00:14:17 blurp kernel: kernel BUG at slab.c:1241!
Dec 18 00:14:17 blurp kernel: invalid operand: 0000
Dec 18 00:14:17 blurp kernel: CPU: 0
Dec 18 00:14:17 blurp kernel: EIP: 0010:[madvise_fixup_middle+38/404] Not tainted
Dec 18 00:14:17 blurp kernel: EFLAGS: 00010082
Dec 18 00:14:17 blurp kernel: eax: 0000001b ebx: c14055f0 ecx: c02867c8 edx: 00001b36
Dec 18 00:14:17 blurp kernel: esi: ce30f400 edi: ce30f42f ebp: ce30f42f esp: c1517ef8
Dec 18 00:14:17 blurp kernel: ds: 0018 es: 0018 ss: 0018
Dec 18 00:14:17 blurp kernel: Process bash (pid: 93, stackpage=c1517000)
Dec 18 00:14:17 blurp kernel: Stack: c0235e0c 000004d9 00000100 fffffff4 fffffff4 c1517fbc 00000400 00000246
Dec 18 00:14:17 blurp kernel: c0141dc9 00000400 000001f0 c0141e9a 00000100 ce43d57c 00000400 fffffff4
Dec 18 00:14:17 blurp kernel: c1517fbc c1517fbc 000001a0 00000246 c011469a ce43d57c 000000ff ce214000
Dec 18 00:14:17 blurp kernel: Call Trace: [sys_dup2+181/324] [setfl+30/168] [setscheduler+342/488] [sys_sched_rr_get_interval+190/248] [__switch_to+28/184]
Dec 18 00:14:17 blurp kernel: [do_signal+347/672]
Dec 18 00:14:17 blurp kernel:
Dec 18 00:14:17 blurp kernel: Code: 0f 0b 83 c4 08 90 f6 43 1d 04 74 4e b8 a5 c2 0f 17 87 06 3d

This (the second BUG()) happens only *once*. It is fully reproducible.
Additional behaviour: after rmmod, some network-related processes
(ifconfig, pppd) go into D state.

root@blurp:~# lsmod
Module Size Used by
irlan 0 0 (deleted)

.config attached.

pkot



Attachments:
.config (19.71 kB)

2001-12-18 00:22:28

by Pawel Kot

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, 17 Dec 2001, Jean Tourrilhes wrote:

> On Tue, Dec 18, 2001 at 12:22:22AM +0100, Pawel Kot wrote:
> >
> > It was originally sent to lkml and Dag (email taken from irlan sources).
> > Now I reposted it also to irda-linux.
>
> By the way, don't expect miracles, most developers don't use
> IrLAN.

I'm *NOT* using IrLAN. I just found the bug accidentally ;-)

pkot

2001-12-18 00:11:18

by Jean Tourrilhes

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Tue, Dec 18, 2001 at 12:22:22AM +0100, Pawel Kot wrote:
>
> It was originally sent to lkml and Dag (email taken from irlan sources).
> Now I reposted it also to irda-linux.
>
> pkot

By the way, don't expect miracles, most developers don't use
IrLAN.
I'm surprised Martin's fix didn't work...

Jean

2001-12-18 01:16:41

by Pawel Kot

[permalink] [raw]
Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Mon, 17 Dec 2001, Jean Tourrilhes wrote:

> I'm surprised the fix of Martin didn't work...

It did in fact. My apologies. I had applied it to the wrong kernel tree
:/

Thanks Martin.

Any chance to have this patch in 2.4.17?

pkot

2001-12-18 01:24:11

by Jean Tourrilhes

Subject: Re: [BUG()] IrDA in 2.4.16 + preempt

On Tue, Dec 18, 2001 at 02:16:16AM +0100, Pawel Kot wrote:
> On Mon, 17 Dec 2001, Jean Tourrilhes wrote:
>
> > I'm surprised the fix of Martin didn't work...
>
> It did in fact. My apologies. I had applied it to the wrong kernel tree
> :/

No comment ;-)

> Thanks Martin.
>
> Any chance to have this patch in 2.4.17?
>
> pkot

You see, I try to be a good citizen (even if sometimes I feel
very "emotional" about dropped patches). There is already a good chunk
of IrDA patches in 2.4.17, so this will wait for 2.4.18. Moreover, this is
"rc1", and I make a point of always sending my patches at the
beginning of the "pre" series and batched.
I have already received another minor IrDA cleanup from Kai
Germaschewski. I'll batch them together. It's not like it's urgent...
Regards,

Jean