2003-07-24 17:11:00

by Doug Hunley

Subject: 2.6.0: Badness in pci_find_subsys!!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Just had my Athlon box lock up solid. Needed SysRq to reboot the thing.
Kernel info follows:
Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
drivers/pci/search.c:132
Jul 24 13:08:23 doug kernel: Call Trace:
Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67 [nvidia]
Jul 24 13:08:23 doug kernel: [<f8a3cedf>] __nvsym00015+0x1f/0x24 [nvidia]
Jul 24 13:08:23 doug kernel: [<f8b3ea56>] __nvsym04619+0xf6/0x164 [nvidia]
Jul 24 13:08:23 doug kernel: [<f8b3e82a>] __nvsym00717+0x21a/0x224 [nvidia]
Jul 24 13:08:23 doug kernel: [<f8adc8b4>] __nvsym03735+0x60/0x88 [nvidia]
Jul 24 13:08:23 doug kernel: [<f8adc465>] __nvsym00553+0x67d/0x944 [nvidia]
Jul 24 13:08:23 doug kernel: [<c015ca2b>] do_sync_read+0x8b/0xc0
Jul 24 13:08:23 doug kernel: [<c011d87a>] __wake_up_common+0x3a/0x60
Jul 24 13:08:23 doug kernel: [<f8b0691a>] __nvsym00630+0x9a/0x17c [nvidia]
Jul 24 13:08:23 doug kernel: [<f8a3f399>] __nvsym00746+0xd/0x1c [nvidia]
Jul 24 13:08:23 doug kernel: [<f8a3ffe0>] rm_isr_bh+0xc/0x10 [nvidia]
Jul 24 13:08:23 doug kernel: [<c01266a2>] tasklet_action+0x72/0xc0
Jul 24 13:08:23 doug kernel: [<c01263d3>] do_softirq+0xd3/0xe0
Jul 24 13:08:23 doug kernel: [<c010bd18>] do_IRQ+0x148/0x1a0
Jul 24 13:08:23 doug kernel: [<c0109dd8>] common_interrupt+0x18/0x20
Jul 24 13:08:23 doug kernel:

- --
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net && http://www.linux-sxs.org

Ahhh...I see the screw-up fairy has visited us again...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IBap2MO5UukaubkRArpmAKCMwD4E3SQ0mUXWKwor4qby0unejwCeNAkk
X4sGiKj7rCD9n/7yYfrQy6Y=
=DzuB
-----END PGP SIGNATURE-----


2003-07-24 17:20:31

by Greg KH

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

On Thu, Jul 24, 2003 at 01:26:01PM -0400, Douglas J Hunley wrote:
> Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> kernel info follows:
> Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> drivers/pci/search.c:132
> Jul 24 13:08:23 doug kernel: Call Trace:
> Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67 [nvidia]

You are using the nvidia driver. Go complain to them as we can do
nothing about their code, sorry.

greg k-h

2003-07-24 17:33:37

by Doug Hunley

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Greg KH shocked and awed us all by speaking:
> On Thu, Jul 24, 2003 at 01:26:01PM -0400, Douglas J Hunley wrote:
> > Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> > kernel info follows:
> > Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> > drivers/pci/search.c:132
> > Jul 24 13:08:23 doug kernel: Call Trace:
> > Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> > Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> > Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> > Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67 [nvidia]
>
> You are using the nvidia driver. Go complain to them as we can do
> nothing about their code, sorry.

Sure. I didn't know for certain that the fault was nvidia's.
- --
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net && http://www.linux-sxs.org

What am I?... Flypaper for freaks!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IBuY2MO5UukaubkRAiBrAJ9ScHHY8Z7Otu733OOhMAUgZkraDgCfb6tn
dsvrskTMQXSSWK4wejtpBes=
=vUV7
-----END PGP SIGNATURE-----

2003-07-24 22:55:33

by Kurt Wall

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

Quoth Douglas J Hunley:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Greg KH shocked and awed us all by speaking:
> > On Thu, Jul 24, 2003 at 01:26:01PM -0400, Douglas J Hunley wrote:
> > > Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> > > kernel info follows:
> > > Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> > > drivers/pci/search.c:132
> > > Jul 24 13:08:23 doug kernel: Call Trace:
> > > Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> > > Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> > > Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> > > Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67 [nvidia]
> >
> > You are using the nvidia driver. Go complain to them as we can do
> > nothing about their code, sorry.
>
> sure. I didn't know for sure that the fault was nvidia's.

Do you get the same lock up *sans* the nvidia binary-only module?

Kurt
--
I am so optimistic about beef prices that I've just leased a pot roast
with an option to buy.

2003-07-25 01:51:58

by Doug Hunley

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kurt Wall shocked and awed us all by speaking:
> Quoth Douglas J Hunley:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Greg KH shocked and awed us all by speaking:
> > > On Thu, Jul 24, 2003 at 01:26:01PM -0400, Douglas J Hunley wrote:
> > > > Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> > > > kernel info follows:
> > > > Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> > > > drivers/pci/search.c:132
> > > > Jul 24 13:08:23 doug kernel: Call Trace:
> > > > Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> > > > Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> > > > Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> > > > Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67 [nvidia]
> > >
> > > You are using the nvidia driver. Go complain to them as we can do
> > > nothing about their code, sorry.
> >
> > sure. I didn't know for sure that the fault was nvidia's.
>
> Do you get the same lock up *sans* the nvidia binary-only module?

Don't know; I haven't run 2.6 without the nvidia driver. This *never* locked up
using the same nvidia driver under 2.4.x, though.
- --
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net && http://www.linux-sxs.org

Quando omni flunkus moritati (when all else fails play dead)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IJCq2MO5UukaubkRAtXzAJ9u5U8peQ024B3kdEosilNrrNZtWACbBZX4
ieWcrn+zkd+QcD2//6imowA=
=JGSs
-----END PGP SIGNATURE-----

2003-07-25 03:30:42

by Valdis Klētnieks

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

On Thu, 24 Jul 2003 13:26:01 EDT, Douglas J Hunley <[email protected]> said:

> Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> kernel info follows:
> Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> drivers/pci/search.c:132
> Jul 24 13:08:23 doug kernel: Call Trace:
> Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67

The 'badness in pci_find_subsys' may not be related to your hang.

The NVidia messages are caused by pci_find_slot() being called in interrupt
context (here from a tasklet, per the trace), which triggers the WARN_ON in
pci_find_subsys(). The worry is that we may be walking the PCI device list on
the interrupt side while something else is hotplugging a new device into
existence, causing it to walk off the end of an inconsistent list. Unless you
actually crapped out right at 13:08:23, it's probably unrelated.
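A minimal user-space sketch of the check described above, assuming (not confirmed anywhere in this thread) that pci_find_subsys() begins with WARN_ON(in_interrupt()) before walking the global device list. It illustrates why the warning fires on every interrupt-side lookup whether or not any hotplug event is actually happening, and one hypothetical way a driver could avoid it: look the device up once in process context and reuse the cached pointer from the interrupt side. Every name here (find_subsys, lookup_from_irq, driver_init, lookup_cached_from_irq, in_irq_ctx, warn_count) is an illustrative stand-in, not a kernel API; the real function also matches subsystem IDs, takes a starting cursor, and does locking that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for kernel facilities. */
static bool in_irq_ctx;  /* models in_interrupt() */
static int warn_count;   /* how many times the warning has fired */

#define in_interrupt() (in_irq_ctx)
#define WARN_ON(cond)                                           \
    do {                                                        \
        if (cond) {                                             \
            warn_count++;                                       \
            fprintf(stderr, "Badness in %s at %s:%d\n",         \
                    __func__, __FILE__, __LINE__);              \
        }                                                       \
    } while (0)

/* Simplified device record on a singly linked global list. */
struct pci_dev {
    unsigned int vendor, device;
    struct pci_dev *next;
};

static struct pci_dev *pci_devices; /* head of the global device list */

/* Simplified analogue of pci_find_subsys(): walk the global list.
 * The check depends only on the calling context, so it fires on every
 * interrupt-side call, even if no device is ever hotplugged. */
static struct pci_dev *find_subsys(unsigned int vendor, unsigned int device)
{
    WARN_ON(in_interrupt());
    for (struct pci_dev *d = pci_devices; d != NULL; d = d->next)
        if (d->vendor == vendor && d->device == device)
            return d;
    return NULL;
}

/* Pattern seen in the trace: a bottom half (tasklet) searching the list. */
static struct pci_dev *lookup_from_irq(unsigned int vendor, unsigned int device)
{
    in_irq_ctx = true; /* pretend we are running in softirq context */
    struct pci_dev *d = find_subsys(vendor, device);
    in_irq_ctx = false;
    return d;
}

/* Hypothetical alternative: search once in process context (e.g. at
 * driver init) and let the interrupt side reuse the cached pointer. */
static struct pci_dev *cached_dev;

static void driver_init(unsigned int vendor, unsigned int device)
{
    cached_dev = find_subsys(vendor, device); /* process context: no warning */
}

static struct pci_dev *lookup_cached_from_irq(void)
{
    assert(cached_dev != NULL); /* driver_init() must have found the device */
    return cached_dev;          /* no list walk on the interrupt side */
}
```

Calling find_subsys() directly leaves warn_count unchanged, each lookup_from_irq() call increments it by one and prints a "Badness in find_subsys" line to stderr, and lookup_cached_from_irq() never walks the list, so it never warns.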

(I was getting the same NVidia traceback on a regular basis, 3-4 at every start
of the X server and 1 at X server shutdown, under 2.5.72-mm3; they stopped
when I went to 2.5.73-mm1. If you're still seeing them in 2.6.0-test1, I would
suspect something different in the -mm series is fixing them for me. The first
place to look is what got added between 72-mm3 and 73-mm1.)



2003-07-25 15:03:32

by Doug Hunley

Subject: Re: 2.6.0: Badness in pci_find_subsys!!

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[email protected] shocked and awed us all by speaking:
> On Thu, 24 Jul 2003 13:26:01 EDT, Douglas J Hunley <[email protected]> said:
> > Just had my athlon box lock-up solid. needed SysRq to reboot the thing..
> > kernel info follows:
> > Jul 24 13:08:23 doug kernel: Badness in pci_find_subsys at
> > drivers/pci/search.c:132
> > Jul 24 13:08:23 doug kernel: Call Trace:
> > Jul 24 13:08:23 doug kernel: [<c02064a1>] pci_find_subsys+0x111/0x120
> > Jul 24 13:08:23 doug kernel: [<c02064df>] pci_find_device+0x2f/0x40
> > Jul 24 13:08:23 doug kernel: [<c0206368>] pci_find_slot+0x28/0x50
> > Jul 24 13:08:23 doug kernel: [<f8a2ada4>] os_pci_init_handle+0x3a/0x67
>
> The 'badness in pci_find_subsys' may not be related to your hang.
>
> The NVidia msgs are basically caused by the fact that pci_find_slot() is
> getting called in an interrupt, so we trigger the WARN_ON in
> pci_find_subsys(). The worry here is that we may be walking the PCI list on
> the interrupt side while something else is hotplugging a new device into
> existence, causing it to walk off the end of a inconsistent list. Unless
> you actually crapped out right at 13:08:23, it's probably unrelated.

OK. But I don't have any hot-plugging enabled on this machine. Unless the
kernel is internally doing things...

It crapped out within a matter of seconds. It started chewing up all available
system RAM, then went totally unresponsive to anything but SysRq (couldn't
even kill X with CTRL-ALT-BKSP).

>
> (I was getting the same NVidia traceback on a regular basis (3-4 at every
> start of the X server, and 1 at X server shutdown) under 2.5.72-mm3, they
> stopped when I went to 2.5.73-mm1. If you're still seeing them in
> 2.6.0-test1, I would suspect something different in the -mm series is
> fixing them for me - first place to look is what got added between 72-mm3
> and 73-mm1.

I try to stick w/ Linus' tree, but I'll attempt to decipher the changelogs on
the -mm tree...
- --
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net && http://www.linux-sxs.org

It takes 47 muscles to frown, but only 4 to pull the trigger of a finely tuned
sniper rifle.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/IUms2MO5UukaubkRArquAJ9uQPVhVSXeyORENJtJxm3ROL9HxgCcDETj
5SXTjSq70hlgXz56TErFDlk=
=EKbw
-----END PGP SIGNATURE-----