2003-05-28 04:12:04

by CaT

Subject: 2.5.70: pcmcia oops (a real one! honest!)

I removed my Xircom PCMCIA RealPort card and put in another. The end result was
total loss of PS/2 keyboard functionality (everything else, including the PS/2
mouse, still works). I then removed the Xircom card. The following was in dmesg:

Unable to handle kernel paging request at virtual address 6b6b6b6f
printing eip:
c020522b
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c020522b>] Not tainted
EFLAGS: 00010282
EIP is at pci_remove_bus_device+0x47/0x74
eax: 6b6b6b6b ebx: 00000000 ecx: c04a9a64 edx: 6b6b6b6b
esi: c138291c edi: 00000080 ebp: cfdc7f40 esp: cfdc7f34
ds: 007b es: 007b ss: 0068
Process pccardd (pid: 10, threadinfo=cfdc6000 task=c136e060)
Stack: c1382968 c138210c cfe1089c cfdc7f54 c020527c c138291c cfe1089c c1305044
cfdc7f64 c02c746d cfe1089c c1305044 cfdc7f88 c02c4564 c1305044 c1305044
c1305050 c1305044 c1305044 00000080 c136e060 cfdc7f98 c02c46d8 c1305044
Call Trace:
[<c020527c>] pci_remove_behind_bridge+0x24/0x48
[<c02c746d>] cb_free+0x1d/0x30
[<c02c4564>] shutdown_socket+0x70/0xe8
[<c02c46d8>] socket_shutdown+0x38/0x40
[<c02c4ae6>] pccardd+0x10e/0x1c4
[<c02c49d8>] pccardd+0x0/0x1c4
[<c0119110>] default_wake_function+0x0/0x20
[<c0119110>] default_wake_function+0x0/0x20
[<c0107211>] kernel_thread_helper+0x5/0xc

Code: 89 50 04 89 02 8b 56 04 8b 06 83 c4 04 89 50 04 89 02 56 e8
<1>Unable to handle kernel paging request at virtual address 50601a3c
printing eip:
50601a3c
*pde = 00000000
Oops: 0000 [#2]
CPU: 0
EIP: 0060:[<50601a3c>] Not tainted
EFLAGS: 00010012
EIP is at 0x50601a3c
eax: 50601a3c ebx: 2a456029 ecx: 00000003 edx: cfdc7fe4
esi: 00000001 edi: 5374872e ebp: c12a9f48 esp: c12a9f2c
ds: 007b es: 007b ss: 0068
Process events/0 (pid: 3, threadinfo=c12a8000 task=c12acc80)
Stack: c0119163 cfdc7fd8 00000003 00000000 c12a8000 00000246 c0551218 c12a9f68
c0119198 c13051c4 00000003 00000001 00000000 c1305044 00000080 c12a9f74
c02c4bda c05511e0 c12a9f8c c02c982f c1305044 00000080 c12a8000 c02c97f4
Call Trace:
[<c0119163>] __wake_up_common+0x33/0x4c
[<c0119198>] __wake_up+0x1c/0x40
[<c02c4bda>] parse_events+0x3e/0x44
[<c02c982f>] yenta_bh+0x3b/0x44
[<c02c97f4>] yenta_bh+0x0/0x44
[<c01283ff>] worker_thread+0x1a3/0x270
[<c012825c>] worker_thread+0x0/0x270
[<c02c97f4>] yenta_bh+0x0/0x44
[<c0119110>] default_wake_function+0x0/0x20
[<c0119110>] default_wake_function+0x0/0x20
[<c0107211>] kernel_thread_helper+0x5/0xc

Code: Bad EIP value.
<6>note: events/0[3] exited with preempt_count 1

--
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
- http://www.informationclearinghouse.info/article2479.htm


2003-05-28 04:25:13

by CaT

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Wed, May 28, 2003 at 02:26:10PM +1000, CaT wrote:
> removed my xircom pcmcia realport card and put in another. End result was
> total loss of ps2 keyboard functionality (everything else, inc the ps2 mouse
> still works). I then removed the xircom card. The following was in dmesg:

A bit more info:

lspci segfaults
cat /proc/pci segfaults
cat /proc/bus/pci/devices segfaults
cat /proc/bus/pci/02/00.* segfaults, BUT only because of 00.0, which
represents the Ethernet side of the card; 00.1 is fine.

Also, processes hung on me. bash hung on exit and took screen with it. mutt
also hung on exit and took screen with it as well.

On reboot, init reported some hung processes and it all went haywire on me:
two of my partitions could not be unmounted (probably because of the hung
processes) and the e100 driver died in e100_notify_reboot (or some such
name). The laptop refused to reboot and I had to power-cycle it.


2003-05-29 07:49:01

by Russell King

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Wed, May 28, 2003 at 02:26:10PM +1000, CaT wrote:
> removed my xircom pcmcia realport card and put in another. End result was
> total loss of ps2 keyboard functionality (everything else, inc the ps2 mouse
> still works). I then removed the xircom card. The following was in dmesg:

I'm assuming that this is something Gregkh needs to look into and not
myself; my guess is that it's related to the pci device accounting stuff.

Greg?

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-05-29 21:07:06

by Greg KH

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Thu, May 29, 2003 at 09:02:09AM +0100, Russell King wrote:
> I'm assuming that this is something Gregkh needs to look into and not
> myself; my guess is that it's related to the pci device accounting stuff.
>
> Greg?

Yeah, it could be. CaT, can you revert the following patch from your
tree and let me know if it fixes your problem or not?

thanks,

greg k-h



diff -Nru a/drivers/pci/bus.c b/drivers/pci/bus.c
--- a/drivers/pci/bus.c Thu May 29 14:18:20 2003
+++ b/drivers/pci/bus.c Thu May 29 14:18:20 2003
@@ -92,7 +92,7 @@
 		if (!list_empty(&dev->global_list))
 			continue;

-		device_register(&dev->dev);
+		device_add(&dev->dev);
 		list_add_tail(&dev->global_list, &pci_devices);
 #ifdef CONFIG_PROC_FS
 		pci_proc_attach_device(dev);
diff -Nru a/drivers/pci/hotplug.c b/drivers/pci/hotplug.c
--- a/drivers/pci/hotplug.c Thu May 29 14:18:20 2003
+++ b/drivers/pci/hotplug.c Thu May 29 14:18:20 2003
@@ -275,7 +275,7 @@
 	pci_proc_detach_device(dev);
 #endif

-	kfree(dev);
+	pci_put_dev(dev);
 }

 /**
diff -Nru a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
--- a/drivers/pci/pci-driver.c Thu May 29 14:18:20 2003
+++ b/drivers/pci/pci-driver.c Thu May 29 14:18:20 2003
@@ -199,6 +199,45 @@
 	return 0;
 }

+/**
+ * pci_get_dev - increments the reference count of the pci device structure
+ * @dev: the device being referenced
+ *
+ * Each live reference to a device should be refcounted.
+ *
+ * Drivers for PCI devices should normally record such references in
+ * their probe() methods, when they bind to a device, and release
+ * them by calling pci_put_dev(), in their disconnect() methods.
+ *
+ * A pointer to the device with the incremented reference counter is returned.
+ */
+struct pci_dev *pci_get_dev (struct pci_dev *dev)
+{
+	struct device *tmp;
+
+	if (!dev)
+		return NULL;
+
+	tmp = get_device(&dev->dev);
+	if (tmp)
+		return to_pci_dev(tmp);
+	else
+		return NULL;
+}
+
+/**
+ * pci_put_dev - release a use of the pci device structure
+ * @dev: device that's been disconnected
+ *
+ * Must be called when a user of a device is finished with it. When the last
+ * user of the device calls this function, the memory of the device is freed.
+ */
+void pci_put_dev(struct pci_dev *dev)
+{
+	if (dev)
+		put_device(&dev->dev);
+}
+
 struct bus_type pci_bus_type = {
 	.name = "pci",
 	.match = pci_bus_match,
@@ -217,3 +256,5 @@
 EXPORT_SYMBOL(pci_unregister_driver);
 EXPORT_SYMBOL(pci_dev_driver);
 EXPORT_SYMBOL(pci_bus_type);
+EXPORT_SYMBOL(pci_get_dev);
+EXPORT_SYMBOL(pci_put_dev);
diff -Nru a/drivers/pci/probe.c b/drivers/pci/probe.c
--- a/drivers/pci/probe.c Thu May 29 14:18:20 2003
+++ b/drivers/pci/probe.c Thu May 29 14:18:20 2003
@@ -462,6 +462,21 @@
 	return 0;
 }

+/**
+ * pci_release_dev - free a pci device structure when all users of it are finished.
+ * @dev: device that's been disconnected
+ *
+ * Will be called only by the device core when all users of this pci device are
+ * done.
+ */
+static void pci_release_dev(struct device *dev)
+{
+	struct pci_dev *pci_dev;
+
+	pci_dev = to_pci_dev(dev);
+	kfree(pci_dev);
+}
+
 /*
  * Read the config data for a PCI device, sanity-check it
  * and fill in the dev structure...
@@ -506,6 +521,9 @@
 		kfree(dev);
 		return NULL;
 	}
+	device_initialize(&dev->dev);
+	dev->dev.release = pci_release_dev;
+	pci_get_dev(dev);

 	pci_name_device(dev);

diff -Nru a/include/linux/pci.h b/include/linux/pci.h
--- a/include/linux/pci.h Thu May 29 14:18:20 2003
+++ b/include/linux/pci.h Thu May 29 14:18:20 2003
@@ -556,6 +556,8 @@
 struct resource *pci_find_parent_resource(const struct pci_dev *dev, struct resource *res);
 int pci_setup_device(struct pci_dev *dev);
 int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
+extern struct pci_dev *pci_get_dev(struct pci_dev *dev);
+extern void pci_put_dev(struct pci_dev *dev);

 /* Generic PCI functions exported to card drivers */

2003-05-31 15:28:20

by CaT

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Thu, May 29, 2003 at 02:21:39PM -0700, Greg KH wrote:
> Yeah, it could be. Cat, can you revert the following patch from your
> tree and let me know if it fixes your problem or not?

The kernel no longer crashes on remove, and I can reinsert the card and it is
recognised without hassle. I get no messages on eject, though (about devices
being deregistered, etc.), but I do get messages on insert (about them getting
registered, etc.). One time the card wasn't recognised at all on insert. I
don't know if that was my fault or not, but on eject and reinsert all was
fine.


2003-06-02 20:50:48

by Greg KH

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Sun, Jun 01, 2003 at 01:41:42AM +1000, CaT wrote:
> The kernel no longer crashes on remove, and I can reinsert the card and it is
> recognised without hassle. I get no messages on eject, though (about devices
> being deregistered, etc.), but I do get messages on insert (about them getting
> registered, etc.). One time the card wasn't recognised at all on insert. I
> don't know if that was my fault or not, but on eject and reinsert all was
> fine.

Ok, I've duplicated this here with a PCI card containing a bridge on a
pci hotplug system, so I'll work on tracking this down...

thanks for letting me know.

greg k-h

2003-06-03 19:13:22

by Greg KH

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Mon, Jun 02, 2003 at 02:05:37PM -0700, Greg KH wrote:
> Ok, I've duplicated this here with a PCI card containing a bridge on a
> pci hotplug system, so I'll work on tracking this down...

Ah, stupid bug in the driver class code was causing this. Can you try
this patch out against a clean 2.5.70 tree? It fixes the problem for
me, and I want to make sure it fixes it for you too.

thanks,

greg k-h

#Driver Class: don't call put_device() when we never called get_device()
#
#This fixes an oops when unplugging pci network devices.
#

diff -Nru a/drivers/base/class.c b/drivers/base/class.c
--- a/drivers/base/class.c Tue Jun 3 12:24:05 2003
+++ b/drivers/base/class.c Tue Jun 3 12:24:05 2003
@@ -311,11 +311,8 @@
 		up_write(&parent->subsys.rwsem);
 	}

-	if (class_dev->dev) {
-		class_device_dev_unlink(class_dev);
-		class_device_driver_unlink(class_dev);
-		put_device(class_dev->dev);
-	}
+	class_device_dev_unlink(class_dev);
+	class_device_driver_unlink(class_dev);

 	kobject_del(&class_dev->kobj);

2003-06-05 04:10:34

by CaT

Subject: Re: 2.5.70: pcmcia oops (a real one! honest!)

On Tue, Jun 03, 2003 at 12:28:36PM -0700, Greg KH wrote:
> Ah, stupid bug in the driver class code was causing this. Can you try
> this patch out against a clean 2.5.70 tree? It fixes the problem for
> me, and I want to make sure it fixes it for you too.

It does. Whilst I get no messages telling me ttyS1 and eth1 have been
deregistered, the kernel doesn't crash on me either and everything
appears normal with lspci.

Whee. :)
