2004-09-17 06:33:24

by David Gibson

Subject: [TRIVIAL] Fix recent bug in fib_semantics.c

Andrew, please apply:

When fib_create_info() allocates new hash tables, it neglects to
initialize them. This leads to an oops during boot on at least one
machine I use. This patch addresses the problem.

Signed-off-by: David Gibson <[email protected]>

Index: working-2.6/net/ipv4/fib_semantics.c
===================================================================
--- working-2.6.orig/net/ipv4/fib_semantics.c 2004-09-17 09:20:04.000000000 +1000
+++ working-2.6/net/ipv4/fib_semantics.c 2004-09-17 16:24:42.634638304 +1000
@@ -604,8 +604,12 @@
 		if (!new_info_hash || !new_laddrhash) {
 			fib_hash_free(new_info_hash, bytes);
 			fib_hash_free(new_laddrhash, bytes);
-		} else
+		} else {
+			memset(new_info_hash, 0, bytes);
+			memset(new_laddrhash, 0, bytes);
+
 			fib_hash_move(new_info_hash, new_laddrhash, new_size);
+		}
 
 		if (!fib_hash_size)
 			goto failure;
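
For readers unfamiliar with the failure mode, here is a minimal userspace sketch of why the memset() calls matter. It is not the kernel code: struct node, struct bucket, table_alloc() and table_contains() are hypothetical stand-ins, and malloc() plays the role of a kernel allocator that is assumed to return memory with arbitrary contents. A chained hash table treats a NULL bucket head as "empty chain", so a table that is never cleared can send a lookup through a garbage pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical chain element; stands in for a hashed fib_info. */
struct node {
    struct node *next;
    int key;
};

/* Hypothetical bucket head; stands in for struct hlist_head. */
struct bucket {
    struct node *first;
};

/*
 * Allocate nbuckets empty buckets. malloc() does not clear memory,
 * so every head is zeroed before anything may walk the chains.
 * (All-bits-zero is a null pointer on the platforms Linux supports.)
 */
static struct bucket *table_alloc(size_t nbuckets)
{
    size_t bytes = nbuckets * sizeof(struct bucket);
    struct bucket *tbl = malloc(bytes);

    if (tbl)
        memset(tbl, 0, bytes);  /* the step the patch adds */
    return tbl;
}

/* Walk one chain; only safe because empty buckets hold NULL. */
static int table_contains(const struct bucket *tbl, size_t nbuckets, int key)
{
    const struct node *n;

    for (n = tbl[(unsigned int)key % nbuckets].first; n; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Without the memset(), table_contains() would start each walk from whatever bytes the allocator left behind, which may or may not fault depending on the machine. That would be consistent with the bug showing up on some boxes and not others.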



--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson


2004-09-17 06:37:25

by Jeff Garzik

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

David Gibson wrote:
> Andrew, please apply:
>
> When fib_create_info() allocates new hash tables, it neglects to
> initialize them. This leads to an oops during boot on at least one
> machine I use. This patch addresses the problem.
>
> Signed-off-by: David Gibson <[email protected]>


This may be the oops in fib_xxx I just saw on my Athlon64 box...

Jeff


2004-09-17 18:31:03

by David Miller

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c


Thanks David, I'll push this upstream asap.

I can't believe in all the route testing I did I never
triggered this on my sparc64 boxes, must have been lucky :(

2004-09-18 00:22:07

by Jon Smirl

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

I'm still OOPsing at boot in fib_disable_ip+21 from
fib_netdev_event+63. Both e1000 and tg3 are affected. I have current
linus bk as of time of this message.

It only occurs when Red Hat goes through the scanning-for-new-hardware
phase during boot. Is RH loading the drivers in some special way
during this phase? If I load the drivers manually after I'm booted
they load ok. I'm running with the drivers as modules, I'll try
switching to compiled in.

The change referenced in this thread is in my kernel:
fib_semantics.c, 604
} else {
memset(new_info_hash, 0, bytes);
memset(new_laddrhash, 0, bytes);

fib_hash_move(new_info_hash, new_laddrhash, new_size);
}



On Fri, 17 Sep 2004 11:27:44 -0700, David S. Miller <[email protected]> wrote:
>
> Thanks David, I'll push this upstream asap.
>
> I can't believe in all the route testing I did I never
> triggered this on my sparc64 boxes, must have been lucky :(



--
Jon Smirl
[email protected]

2004-09-18 00:28:27

by Herbert Xu

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

Jon Smirl <[email protected]> wrote:
> I'm still OOPsing at boot in fib_disable_ip+21 from
> fib_netdev_event+63. Both e1000 and tg3 are affected. I have current
> linus bk as of time of this message.

Please post the complete error message.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2004-09-18 00:59:26

by Jon Smirl

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

I have verified that compiling the drivers in avoids the problem.

I'll boot again and get more of the error message. It's not making it
to the logs so I am copying it by hand from the screen.


On Sat, 18 Sep 2004 10:27:47 +1000, Herbert Xu
<[email protected]> wrote:
> Jon Smirl <[email protected]> wrote:
> > I'm still OOPsing at boot in fib_disable_ip+21 from
> > fib_netdev_event+63. Both e1000 and tg3 are affected. I have current
> > linus bk as of time of this message.
>
> Please post the complete error message.



--
Jon Smirl
[email protected]

2004-09-18 01:37:25

by Jon Smirl

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

Call stack at failure:
e1000_exit_module
...pci calls...
e1000_remove
unregister_netdev
unregister_netdevice
notifier_call_chain
fib_netdev_event
fib_disable_ip
error_code

Rest of the info has scrolled off the screen.

The problem occurs when RH/Fedora is doing its modprobe/rmmod to detect
what hardware is in the system, since that's the only thing that would
be rmmod'ing e1000.

On the same system if I disable networking and boot, I can
modprobe/rmmod the drivers without problem. So I'd conclude that RH is
doing something special during its probing phase, but I don't know
enough about the RH init scripts to know what it is.


On Sat, 18 Sep 2004 10:27:47 +1000, Herbert Xu
<[email protected]> wrote:
> Jon Smirl <[email protected]> wrote:
> > I'm still OOPsing at boot in fib_disable_ip+21 from
> > fib_netdev_event+63. Both e1000 and tg3 are affected. I have current
> > linus bk as of time of this message.
>
> Please post the complete error message.



--
Jon Smirl
[email protected]

2004-09-18 04:16:57

by Herbert Xu

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

On Fri, Sep 17, 2004 at 09:37:15PM -0400, Jon Smirl wrote:
> Call stack at failure:
> e1000_exit_module
> ...pci calls...
> e1000_remove
> unregister_netdev
> unregister_netdevice
> notifier_call_chain
> fib_netdev_event
> fib_disable_ip
> error_code

Thanks. The following bug is probably your problem.

> Rest of the info has scrolled off the screen.

You should be able to hit Shift-PageUp to scroll up.

There is a thinko in the allocation for the devindex hash. We're
only giving it 8 elements when it should be 1<<8 elements.
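
The attached patch is not reproduced in this archive, so the following is only a sketch of the class of mistake described, not the actual change. HASHBITS, HASHSIZE, hash_slot() and the sizing helpers are illustrative names. The point is that a bit count and an element count are easy to confuse: a hash that masks to 8 bits can return any slot from 0 to 255, so a table sized by the bit count is overrun by almost every lookup.

```c
#include <stddef.h>

#define HASHBITS 8U
#define HASHSIZE (1U << HASHBITS)   /* 256 buckets */

/* Illustrative hash: keep the low HASHBITS bits of the index. */
static unsigned int hash_slot(unsigned int ifindex)
{
    return ifindex & (HASHSIZE - 1);
}

/* Thinko: table sized by the number of bits (8 heads). */
static size_t table_bytes_wrong(size_t head_size)
{
    return HASHBITS * head_size;
}

/* Intended: table sized by the number of buckets (256 heads). */
static size_t table_bytes_right(size_t head_size)
{
    return HASHSIZE * head_size;
}

/* Does bucket number `slot` lie entirely inside a table of `bytes`? */
static int slot_fits(unsigned int slot, size_t bytes, size_t head_size)
{
    return ((size_t)slot + 1) * head_size <= bytes;
}
```

With the undersized table only slots 0 through 7 are valid, so small interface indexes may appear to work while larger ones touch memory past the end of the array.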

Signed-off-by: Herbert Xu <[email protected]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Attachments:
(No filename) (810.00 B)
p (718.00 B)

2004-09-18 05:22:39

by Jon Smirl

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

Still getting the same fault with the patch. Someone else has this
problem too. The full stack trace is in this thread....

[2.6.9-rc2-bk] Network-related panic on boot

--
Jon Smirl
[email protected]

2004-09-18 06:34:47

by David Miller

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

On Sat, 18 Sep 2004 14:16:28 +1000
Herbert Xu <[email protected]> wrote:

> Thanks. The following bug is probably your problem.

Good catch on this fix, but really he's hitting the
BUG_ON() in fib_sync_down() (I hate i386 backtraces,
it's an art to decode them properly)

So if you rmmod() a device before any routes are ever
created in ipv4, this triggers. I didn't think this
was possible, but it is.

The fix is simple enough.

===== net/ipv4/fib_semantics.c 1.16 vs edited =====
--- 1.16/net/ipv4/fib_semantics.c 2004-09-17 11:11:04 -07:00
+++ edited/net/ipv4/fib_semantics.c 2004-09-17 23:14:44 -07:00
@@ -1040,9 +1040,7 @@
 	if (force)
 		scope = -1;
 
-	BUG_ON(!fib_info_laddrhash);
-
-	if (local) {
+	if (local && fib_info_laddrhash) {
 		unsigned int hash = fib_laddr_hashfn(local);
 		struct hlist_head *head = &fib_info_laddrhash[hash];
 		struct hlist_node *node;
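
The pattern behind this fix can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel implementation: laddr_table, route_add(), sync_down() and NBUCKETS are hypothetical names, and calloc() stands in for the kernel allocation path. The table is created lazily when the first route is added, so a "device going away" event that arrives earlier must treat a missing table as "nothing to do" rather than as a bug.

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 16u    /* hypothetical table size */

struct entry {
    struct entry *next;
    unsigned int addr;
    int dead;
};

/* Stays NULL until the first route is added. */
static struct entry **laddr_table;

static unsigned int bucket_of(unsigned int addr)
{
    return addr % NBUCKETS;
}

/* Add a route; the table is allocated lazily on first use. */
static int route_add(unsigned int addr)
{
    struct entry *e;

    if (!laddr_table) {
        laddr_table = calloc(NBUCKETS, sizeof(*laddr_table));
        if (!laddr_table)
            return -1;
    }
    e = calloc(1, sizeof(*e));
    if (!e)
        return -1;
    e->addr = addr;
    e->next = laddr_table[bucket_of(addr)];
    laddr_table[bucket_of(addr)] = e;
    return 0;
}

/* Mark live routes using `local` as dead; return how many were marked. */
static int sync_down(unsigned int local)
{
    struct entry *e;
    int marked = 0;

    /* Guarded like the patch: no table yet means no routes to mark. */
    if (local && laddr_table) {
        for (e = laddr_table[bucket_of(local)]; e; e = e->next) {
            if (e->addr == local && !e->dead) {
                e->dead = 1;
                marked++;
            }
        }
    }
    return marked;
}
```

Calling sync_down() before any route_add() simply returns 0 here, which mirrors the case of a driver being removed before any IPv4 route exists.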

2004-09-18 15:31:33

by Jon Smirl

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

The last patch fixes things so that I can boot. The net is working too.

--
Jon Smirl
[email protected]

2004-09-18 20:31:12

by jamal

Subject: Re: [TRIVIAL] Fix recent bug in fib_semantics.c

On Sat, 2004-09-18 at 02:31, David S. Miller wrote:
> On Sat, 18 Sep 2004 14:16:28 +1000

> So if you rmmod() a device before any routes are ever
> created in ipv4, this triggers. I didn't think this
> was possible, but it is.

May have been exposed by LLTX. When i turned off LLTX on e1000
before seeing your fix, the oops disappeared.

cheers,
jamal