2003-07-04 01:29:54

by Michael Frank

Subject: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

modprobe yenta-socket produces the oops below _only_ after a cold boot and _only_ when e100 is loaded.
This system had no PCMCIA problems with 2.4 or 2.5 until the recent PCMCIA rework.

Reproduced with 2.5.73-mm3, 2.5.74 and 2.5.74-mm1.

2.5.73-mm2 does not oops, but hangs about 1 time in 10 at
"PCI: Enabling device 0:12.0 (0->2)" (PCMCIA). e100 was loaded; this was not tested without e100.

Conditions:
- Cold boot. No oops on warm boot + load after a successful load, or on unload + load.
- e100 loaded

The oops appears in about 1 in 4 loads and looks similar every time.

Setup:
ACPI core enabled, no usb

$ lsmod
pcmcia_core
toshiba_acpi
e100

$ lspci
00:00.0 Host bridge: Transmeta Corporation LongRun Northbridge (rev 01)
00:00.1 RAM memory: Transmeta Corporation SDRAM controller
00:00.2 RAM memory: Transmeta Corporation BIOS scratchpad
00:04.0 VGA compatible controller: S3 Inc. 86C270-294 Savage/IX-MV (rev 13)
00:06.0 Multimedia audio controller: ALi Corporation M5451 PCI AC-Link Controller Audio Device (rev 01)
00:07.0 ISA bridge: ALi Corporation M1533 PCI to ISA Bridge [Aladdin IV]
00:0e.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
00:10.0 IDE interface: ALi Corporation M5229 IDE (rev c3)
00:11.0 Bridge: ALi Corporation M7101 PMU
00:12.0 CardBus bridge: Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support (rev 32)
00:14.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)

No serial port, Oops taken from screen
unable to handle null pointer dereference at 0
oops: 0000 #1
EFLAGS 00010086
EIP is at __wake_up_common+0x13
eax ce09c9c0 ebx 286 ecx 1 edx 0
esi 1 edi 0 ebp ccc67dcc esp ccc67dc0
ds 7b es 7b ss 68
Process modprobe pid 1153 threadinfo ccc66000 task cd68e080
Stack: 286 4000001 0 ccc67de8 c011afa1 ce09c9c0 3 1
0 ce09c800 ccc67df0 cf8a3ecf cccc67e04 cf87a7ea ce09c830 80
cdffec00 ccc67e24 c010d0aa 5 ce09c800 ccc67e50 280 5
Call trace:
__wake_up+0x11
pcmcia_parse_events+0x23
yenta_interrupt+0x26
handle_IRQ_event+0x2a
do_IRQ+0x82
common_interrupt+0x18
setup_irq+0x9b
yenta_interrupt+0x0
request_irq+0x89
yenta_probe+0x137
yenta_interrupt+0x0
pci_device_probe_static+0x20
pci_device_probe+0x21
bus_match+0x38
driver_attach+0x3e
bus_add_driver+0x6e
driver_register+0x36
pci_register_driver+0x6a
yenta_socket_init+0xd
sys_init_module+0xe0
syscall_call+0x7
Code: 8b 3a 8b 45 08 83 c0 04 39 c2 74 23 8b 5a f4 8b 4d 14 51 8b
<0> Fatal exception in interrupt
In interrupt handler - not syncing

It is now running all right when pcmcia is started ahead of the network.

A typical dmesg is attached as bz2.

Regards
Michael
--
Powered by linux-2.5.74-mm1. Compiled with gcc-2.95-3 - mature and rock solid

My current linux related activities:
- 2.5 yenta_socket testing
- Test development and testing of swsusp and ACPI S3
- Everyday usage of 2.5 kernel

More info on 2.5 kernel: http://www.codemonkey.org.uk/post-halloween-2.5.txt

Attachments:
dmesg.2.5.74.bz2 (3.55 kB)

2003-07-05 09:54:04

by Michael Frank

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

On Friday 04 July 2003 09:10, Michael Frank wrote:
> modprobe yenta-socket produces oops below _only_ after cold boot and _only_
> when e100 loaded. No PCMCIA problems with this system with 2.4 and 2.5
> until recent PCMCIA rework.

Just got the same oops with 2.5.74-mm1 on cold boot without e100 loaded.

With e100 the probability is > 1 in 4.
Without e100 the probability is < 1 in 10.

Regards
Michael


2003-07-05 22:31:58

by Daniel Ritz

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

hello

problem is that an interrupt arrives before socket->thread_wait is initialized
so we crash in __wake_up_common. i think source of the interrupt is socket_init
called before the initialization. but an interrupt can still arrive before...

i think the whole init stuff should happen even before we do request_irq(). i
tried moving around pcmcia_register_socket() but then my card doesn't come up...
maybe we should add something like pcmcia_alloc_socket() which kmalloc()s
a socket struct and does all the important init stuff? russell?

michael, can you try this one?

rgds
-daniel


--- 1.50/drivers/pcmcia/cs.c Mon Jun 30 22:22:30 2003
+++ edited/cs.c Sat Jul 5 23:58:07 2003
@@ -338,13 +338,13 @@
 	socket->erase_busy.next = socket->erase_busy.prev = &socket->erase_busy;
 	INIT_LIST_HEAD(&socket->cis_cache);
 	spin_lock_init(&socket->lock);
-
-	init_socket(socket);
-
 	init_completion(&socket->thread_done);
 	init_waitqueue_head(&socket->thread_wait);
 	init_MUTEX(&socket->skt_sem);
 	spin_lock_init(&socket->thread_lock);
+
+	init_socket(socket);
+
 	ret = kernel_thread(pccardd, socket, CLONE_KERNEL);
 	if (ret < 0)
 		return ret;
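
For readers who want to see the ordering problem in isolation, here is a minimal user-space sketch, not kernel code. It assumes the socket structure starts out zeroed and that the hardware init step can raise an interrupt straight away. Every `model_*` name is hypothetical and only stands in for the corresponding step in cs.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not real kernel APIs. */
struct list_node {
	struct list_node *next, *prev;
};

struct model_socket {
	struct list_node thread_wait;	/* models the wait queue's task list */
	int irq_enabled;		/* models "the hardware may now interrupt" */
};

/* Models init_waitqueue_head(): an empty list points at itself. */
static void model_init_waitqueue(struct model_socket *s)
{
	s->thread_wait.next = s->thread_wait.prev = &s->thread_wait;
}

/* Models the hardware init step that can trigger a card-detect interrupt. */
static void model_init_socket(struct model_socket *s)
{
	s->irq_enabled = 1;
}

/*
 * Models wake_up(): walks the waiter list. Returns the number of waiters,
 * or -1 where the real code would have dereferenced a NULL list pointer.
 */
static int model_wake_up(const struct model_socket *s)
{
	const struct list_node *n;
	int waiters = 0;

	assert(s != NULL);
	n = s->thread_wait.next;
	if (n == NULL)
		return -1;
	for (; n != &s->thread_wait; n = n->next)
		waiters++;
	return waiters;
}

/* Runs the registration steps in the old (0) or patched (1) order. */
static int model_register(int patched_order)
{
	struct model_socket s;
	int result;

	memset(&s, 0, sizeof(s));		/* assumption: zeroed allocation */
	if (patched_order)
		model_init_waitqueue(&s);
	model_init_socket(&s);
	/* Assumption: the interrupt arrives right after the hardware init. */
	result = s.irq_enabled ? model_wake_up(&s) : 0;
	if (!patched_order)
		model_init_waitqueue(&s);	/* too late for that interrupt */
	return result;
}
```

With the old order, `model_register(0)` returns -1, the case where the list head is still NULL, which is consistent with the NULL dereference in `__wake_up_common` reported above. With the patched order, `model_register(1)` finds a valid, empty wait queue and returns 0.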





2003-07-06 03:15:10

by Michael Frank

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

Hello Daniel and Dominik,

I now have two patches to try, thank you.

I got the patch below (not yet tested) from Dominik.

I will test this further on Monday.

Regards
Michael

diff -ruN linux-original/drivers/pcmcia/cs.c linux/drivers/pcmcia/cs.c
--- linux-original/drivers/pcmcia/cs.c 2003-07-05 10:22:58.000000000 +0200
+++ linux/drivers/pcmcia/cs.c 2003-07-05 10:28:56.000000000 +0200
@@ -351,6 +351,10 @@

 	wait_for_completion(&socket->thread_done);
 	BUG_ON(!socket->thread);
+
+	/* ok, allow interrupts to be parsed */
+	socket->init_done = 1;
+
 	pcmcia_parse_events(socket, SS_DETECT);

 	return 0;
@@ -361,6 +365,8 @@
 	struct pcmcia_socket *socket = class_get_devdata(class_dev);
 	client_t *client;

+	socket->init_done = 0; /* block interrupts */
+
 	if (socket->thread) {
 		init_completion(&socket->thread_done);
 		socket->thread = NULL;
@@ -870,6 +876,9 @@

 void pcmcia_parse_events(struct pcmcia_socket *s, u_int events)
 {
+	if (unlikely(s->init_done == 0))
+		return;
+
 	spin_lock(&s->thread_lock);
 	s->thread_events |= events;
 	spin_unlock(&s->thread_lock);
diff -ruN linux-original/include/pcmcia/ss.h linux/include/pcmcia/ss.h
--- linux-original/include/pcmcia/ss.h 2003-07-05 10:23:00.000000000 +0200
+++ linux/include/pcmcia/ss.h 2003-07-05 10:24:22.000000000 +0200
@@ -215,6 +215,7 @@
 	wait_queue_head_t thread_wait;
 	spinlock_t thread_lock; /* protects thread_events */
 	unsigned int thread_events;
+	unsigned short init_done; /* interrupts are parsed only if this is != 0 */

 	/* pcmcia (16-bit) */
 	struct pcmcia_bus_socket *pcmcia;



2003-07-06 07:31:05

by Russell King

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

On Sun, Jul 06, 2003 at 11:26:34AM +0800, Michael Frank wrote:
> I got the patch below (not yet tested) from Dominik.

I've already thrown this one out and suggested a cleaner alternative to
Dominik.

I was busy wasting time trying to get an XScale platform up and running
yesterday, and getting nowhere fast. Going nowhere at all is a very
accurate description of yesterday's activities. I suspect the hardware
may have been messed up during transit thanks to various screws missing.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-07-06 13:03:25

by Michael Frank

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

On Sunday 06 July 2003 15:45, Russell King wrote:
> On Sun, Jul 06, 2003 at 11:26:34AM +0800, Michael Frank wrote:
> > I got the patch below (not yet tested) from Dominik.
>
> I've already thrown this one out and suggested a cleaner alternative to
> Dominik.
>

I await the new patch then,

> I was busy wasting time trying to get an XScale platform up and running
> yesterday, and getting nowhere fast. Going nowhere at all is a very
> accurate description of yesterday's activities. I suspect the hardware
> may have been messed up during transit thanks to various screws missing.

Hopefully Monday will be a better day...

Regards
Michael


2003-07-06 16:42:28

by Claus-Justus Heine

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

--- yenta_socket.c.old 2003-07-02 22:49:32.000000000 +0200
+++ yenta_socket.c 2003-07-06 16:13:55.000000000 +0200
@@ -426,7 +426,8 @@

 	events = yenta_events(socket);
 	if (events) {
-		pcmcia_parse_events(&socket->socket, events);
+		if (likely(socket->init_done))
+			pcmcia_parse_events(&socket->socket, events);
 		return IRQ_HANDLED;
 	}
 	return IRQ_NONE;
@@ -501,8 +502,8 @@
 	socket->socket.features |= SS_CAP_PAGE_REGS | SS_CAP_PCCARD | SS_CAP_CARDBUS;
 	socket->socket.map_size = 0x1000;
 	socket->socket.pci_irq = socket->cb_irq;
-	socket->socket.irq_mask = yenta_probe_irq(socket, isa_irq_mask);
 	socket->socket.cb_dev = socket->dev;
+	socket->socket.irq_mask = yenta_probe_irq(socket, isa_irq_mask);

 	printk("Yenta IRQ list %04x, PCI irq%d\n", socket->socket.irq_mask, socket->cb_irq);
 }
@@ -821,6 +822,7 @@
 {
 	struct yenta_socket *socket;
 	struct cardbus_override_struct *d;
+	int ret;

 	socket = kmalloc(sizeof(struct yenta_socket), GFP_KERNEL);
 	if (!socket)
@@ -888,12 +890,18 @@
 		add_timer(&socket->poll_timer);
 	}

+	socket->init_done = 0; /* should still be 0, paranoia ... */
+
 	/* Figure out what the dang thing can do for the PCMCIA layer... */
 	yenta_get_socket_capabilities(socket, isa_interrupts);
 	printk("Socket status: %08x\n", cb_readl(socket, CB_SOCKET_STATE));

 	/* Register it with the pcmcia layer.. */
-	return pcmcia_register_socket(&socket->socket);
+	ret = pcmcia_register_socket(&socket->socket);
+	if (ret == 0) {
+		socket->init_done = 1;
+	}
+	return ret;
 }


--- yenta_socket.h.old 2003-07-02 22:45:05.000000000 +0200
+++ yenta_socket.h 2003-07-06 16:05:40.000000000 +0200
@@ -103,6 +103,8 @@

 	struct pcmcia_socket socket;

+	unsigned int init_done:1; /* used during initialization */
+
 	/* A few words of private data for special stuff of overrides... */
 	unsigned int private[8];
 };


Attachments:
foo.diff (1.82 kB)

2003-07-06 22:01:25

by Russell King

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

On Sun, Jul 06, 2003 at 12:39:34AM +0200, Daniel Ritz wrote:
> problem is that an interrupt arrives before socket->thread_wait is
> initialized so we crash in __wake_up_common. i think source of the
> interrupt is socket_init called before the initialization. but an
> interrupt can still arrive before...

I suspect that even with your patch below, there is the possibility
of receiving an unintentional call into pcmcia_parse_events() from some
socket drivers. The all-round better fix is to make pcmcia_parse_events()
ignore socket change events until the socket thread is up and running.

Nevertheless, the patch looks correct, so I am still interested in
whether your patch helps solve Michael's problem.

> michael, can you try this one?

Daniel's patch:

--- 1.50/drivers/pcmcia/cs.c Mon Jun 30 22:22:30 2003
+++ edited/cs.c Sat Jul 5 23:58:07 2003
@@ -338,13 +338,13 @@
socket->erase_busy.next = socket->erase_busy.prev = &socket->erase_busy;
INIT_LIST_HEAD(&socket->cis_cache);
spin_lock_init(&socket->lock);
-
- init_socket(socket);
-
init_completion(&socket->thread_done);
init_waitqueue_head(&socket->thread_wait);
init_MUTEX(&socket->skt_sem);
spin_lock_init(&socket->thread_lock);
+
+ init_socket(socket);
+
ret = kernel_thread(pccardd, socket, CLONE_KERNEL);
if (ret < 0)
return ret;

and my patch (may apply with some offset, which I'm about to check
into bk anyway):

--- linux/drivers/pcmcia/cs.c.old Fri Jul 4 10:21:50 2003
+++ linux/drivers/pcmcia/cs.c Sun Jul 6 23:04:10 2003
@@ -870,11 +870,13 @@

 void pcmcia_parse_events(struct pcmcia_socket *s, u_int events)
 {
-	spin_lock(&s->thread_lock);
-	s->thread_events |= events;
-	spin_unlock(&s->thread_lock);
+	if (s->thread) {
+		spin_lock(&s->thread_lock);
+		s->thread_events |= events;
+		spin_unlock(&s->thread_lock);

-	wake_up(&s->thread_wait);
+		wake_up(&s->thread_wait);
+	}
 } /* pcmcia_parse_events */
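
The effect of the `s->thread` check can be tried out in a small user-space model. This is only a sketch under the assumption that `thread` stays NULL until the socket thread has been created. `model_socket`, `model_parse_events`, `model_early_then_late` and `MODEL_SS_DETECT` are hypothetical stand-ins rather than the real cs.c definitions, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SS_DETECT 0x80u	/* illustrative event bit, not taken from ss.h */

struct model_socket {
	void *thread;			/* NULL until the socket thread exists */
	unsigned int thread_events;	/* pending event bits */
	unsigned int wakeups;		/* counts the modelled wake_up() calls */
};

/* Models the patched function: events are ignored while there is no thread. */
static void model_parse_events(struct model_socket *s, unsigned int events)
{
	assert(s != NULL);
	if (s->thread) {
		s->thread_events |= events;
		s->wakeups++;
	}
}

/* Delivers one event before and one after the thread exists. */
static unsigned int model_early_then_late(void)
{
	struct model_socket s = { NULL, 0, 0 };
	int thread_marker;	/* only its address is used, as a non-NULL token */

	model_parse_events(&s, MODEL_SS_DETECT);	/* early: dropped */
	s.thread = &thread_marker;
	model_parse_events(&s, MODEL_SS_DETECT);	/* late: recorded */
	return s.wakeups;
}
```

`model_early_then_late()` returns 1: the early event is dropped rather than queued. In this model that is harmless only if something re-checks the socket state once the thread is running, which is what the `pcmcia_parse_events(socket, SS_DETECT)` call visible in the registration path of Dominik's patch appears to do.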




2003-07-07 01:52:28

by Michael Frank

Subject: Re: yenta-socket oops with 2.5.73-mm3, 2.5.74, 2.5.74-mm1

Applied both patches without offsets.

Did 20 halt cycles with e100 loaded - no oopses. Seems to
be fixed.

The oops without e100 seems to be fixed too, but needs more
watching, as its probability increases when power is off for
longer than in the above test, where power was switched
on again within a few seconds after the halt.
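
As a rough check on how much 20 clean cycles prove, the chance of seeing no oops by luck alone can be estimated. The sketch below assumes the loads are independent and uses the rates quoted earlier in this thread (about 1 in 4 with e100, below 1 in 10 without). `prob_all_clean` is a hypothetical helper written for this illustration.

```c
#include <assert.h>

/*
 * Chance of n consecutive clean loads when each load oopses
 * independently with probability p: (1 - p)^n.
 */
static double prob_all_clean(double p, int n)
{
	double q = 1.0;
	int i;

	assert(p >= 0.0 && p <= 1.0 && n >= 0);
	for (i = 0; i < n; i++)
		q *= 1.0 - p;
	return q;
}
```

`prob_all_clean(0.25, 20)` is about 0.003, so 20 clean cycles with e100 would be very unlikely if the bug were still present at the old rate. `prob_all_clean(0.10, 20)` is about 0.12, so the same test says much less about the case without e100, which is why that case needs more watching.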

Regards
Michael



2003-07-10 03:23:52

by Michael Frank

Subject: 2.5.74-mm3 yenta-socket oops back

2.5.74-mm3 yenta-socket oopsed on the first boot at the same spot.

I have successfully used both patches from this thread (Daniel's init_socket() reordering and Russell's pcmcia_parse_events() check) with -mm1.

Regards
Michael

--
Powered by linux-2.5.74-mm3. Compiled with gcc-2.95-3 - mature and rock solid

My current linux related activities:
- 2.5 yenta_socket testing
- Test development and testing of swsusp for 2.4/2.5 and ACPI S3 of 2.5 kernel
- Everyday usage of 2.5 kernel

More info on 2.5 kernel: http://www.codemonkey.org.uk/post-halloween-2.5.txt
More info on swsusp: http://sourceforge.net/projects/swsusp/

2003-07-10 04:20:26

by Andrew Morton

Subject: Re: 2.5.74-mm3 yenta-socket oops back

Michael Frank <[email protected]> wrote:
>
> 2.5.74-mm3 yenta-socket oopsed on the first boot at the same spot.
>
> I have successfully used both patches below with -mm1.
>
> --- 1.50/drivers/pcmcia/cs.c Mon Jun 30 22:22:30 2003
> +++ edited/cs.c Sat Jul 5 23:58:07 2003
> @@ -338,13 +338,13 @@
> socket->erase_busy.next = socket->erase_busy.prev = &socket->erase_busy;
> INIT_LIST_HEAD(&socket->cis_cache);
> spin_lock_init(&socket->lock);
> -
> - init_socket(socket);
> -
> init_completion(&socket->thread_done);
> init_waitqueue_head(&socket->thread_wait);
> init_MUTEX(&socket->skt_sem);
> spin_lock_init(&socket->thread_lock);
> +
> + init_socket(socket);
> +
> ret = kernel_thread(pccardd, socket, CLONE_KERNEL);
> if (ret < 0)
> return ret;
>

This one is clearly correct.

> and my patch (may apply with some offset, which I'm about to check
> into bk anyway):
>
> --- linux/drivers/pcmcia/cs.c.old Fri Jul 4 10:21:50 2003
> +++ linux/drivers/pcmcia/cs.c Sun Jul 6 23:04:10 2003
> @@ -870,11 +870,13 @@
>
> void pcmcia_parse_events(struct pcmcia_socket *s, u_int events)
> {
> - spin_lock(&s->thread_lock);
> - s->thread_events |= events;
> - spin_unlock(&s->thread_lock);
> + if (s->thread) {
> + spin_lock(&s->thread_lock);
> + s->thread_events |= events;
> + spin_unlock(&s->thread_lock);
>
> - wake_up(&s->thread_wait);
> + wake_up(&s->thread_wait);
> + }
> } /* pcmcia_parse_events */

This one may not be. How did we get here with no thread to handle the
event? Do you have an oops trace on this one?

Or just stick a

if (!s->thread)
dump_stack();

in there as well.

2003-07-10 07:03:37

by Michael Frank

Subject: Re: 2.5.74-mm3 yenta-socket oops back

On Thursday 10 July 2003 12:30, Andrew Morton wrote:
> This one may not be. How did we get here with no thread to handle the
> event? Do you have an oops trace on this one?
>

It is called from the interrupt handler. It seems that events occur before the
thread is created.

No serial port, Oops taken from screen
unable to handle null pointer dereference at 0
oops: 0000 #1
EFLAGS 00010086
EIP is at __wake_up_common+0x13
eax ce09c9c0 ebx 286 ecx 1 edx 0
esi 1 edi 0 ebp ccc67dcc esp ccc67dc0
ds 7b es 7b ss 68
Process modprobe pid 1153 threadinfo ccc66000 task cd68e080
Stack: 286 4000001 0 ccc67de8 c011afa1 ce09c9c0 3 1
0 ce09c800 ccc67df0 cf8a3ecf cccc67e04 cf87a7ea ce09c830 80
cdffec00 ccc67e24 c010d0aa 5 ce09c800 ccc67e50 280 5
Call trace:
__wake_up+0x11
pcmcia_parse_events+0x23
yenta_interrupt+0x26
handle_IRQ_event+0x2a
do_IRQ+0x82
common_interrupt+0x18
setup_irq+0x9b
yenta_interrupt+0x0
request_irq+0x89
yenta_probe+0x137
yenta_interrupt+0x0
pci_device_probe_static+0x20
pci_device_probe+0x21
bus_match+0x38
driver_attach+0x3e
bus_add_driver+0x6e
driver_register+0x36
pci_register_driver+0x6a
yenta_socket_init+0xd
sys_init_module+0xe0
syscall_call+0x7
Code: 8b 3a 8b 45 08 83 c0 04 39 c2 74 23 8b 5a f4 8b 4d 14 51 8b
<0> Fatal exception in interrupt
In interrupt handler - not syncing


> Or just stick a
>
> if (!s->thread)
> dump_stack();
>
> in there as well.

I have applied rmk's patch and the above to -mm3 and will send the stack trace once obtained.

Regards
Michael



2003-07-10 07:47:19

by Andrew Morton

Subject: Re: 2.5.74-mm3 yenta-socket oops back

Michael Frank <[email protected]> wrote:
>
> Is called from interrupt handler. Seems that events occur before the
> thread is created.

OK, fair enough. The interrupt probe happens before the socket
registration. It needs a /* comment */


2003-07-10 09:54:22

by Russell King

Subject: Re: 2.5.74-mm3 yenta-socket oops back

On Wed, Jul 09, 2003 at 09:30:10PM -0700, Andrew Morton wrote:
> This one may not be. How did we get here with no thread to handle the
> event? Do you have an oops trace on this one?

It's correct. Had the fan in my server not died last night, I'd have
gotten some of these fixes to Linus. God how I hate anything with fans
in. They're the number one cause of failure.

The problem is that the interrupts are claimed before pcmcia has been
properly initialised, so the cs.c-private bits of pcmcia_socket aren't
set up.
