2002-11-13 03:41:27

by Zwane Mwaikambo

Subject: [PATCH][2.5] Remove BUG in cpu_up

I think a BUG() here is a bit on the extreme side: we already have a running
processor (at boot I'd presume it's the BSP), so we can afford to limp on.
At runtime, a stopped/dead processor which refuses to come back up
shouldn't make the kernel oops.

Zwane

Index: linux-2.5.47/kernel/cpu.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.47/kernel/cpu.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 cpu.c
--- linux-2.5.47/kernel/cpu.c 11 Nov 2002 03:59:33 -0000 1.1.1.1
+++ linux-2.5.47/kernel/cpu.c 13 Nov 2002 03:37:37 -0000
@@ -35,13 +35,11 @@
 		return ret;

 	if (cpu_online(cpu)) {
-		ret = -EINVAL;
+		ret = -EBUSY;
 		goto out;
 	}
 	ret = notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu);
 	if (ret == NOTIFY_BAD) {
-		printk("%s: attempt to bring up CPU %u failed\n",
-				__FUNCTION__, cpu);
 		ret = -EINVAL;
 		goto out_notify;
 	}
@@ -50,16 +48,22 @@
 	ret = __cpu_up(cpu);
 	if (ret != 0)
 		goto out_notify;
-	if (!cpu_online(cpu))
-		BUG();
+
+	if (!cpu_online(cpu)) {
+		ret = -EIO;
+		goto out_notify;
+	}

 	/* Now call notifier in preparation. */
-	printk("CPU %u IS NOW UP!\n", cpu);
+	printk(KERN_INFO "CPU %u IS NOW UP!\n", cpu);
 	notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu);

 out_notify:
-	if (ret != 0)
+	if (ret != 0) {
+		printk(KERN_WARNING "%s: attempt to bring up CPU %u failed\n",
+			__FUNCTION__, cpu);
 		notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu);
+	}
 out:
 	up(&cpucontrol);
 	return ret;
--
function.linuxpower.ca
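
For readers who want to see the effect of the patch without a kernel tree, here is a rough user-space sketch of the control flow it proposes: a CPU that stays offline after the arch hook reports success yields -EIO and a cancel notification instead of a BUG(). Everything prefixed sim_ is a hypothetical stand-in invented for this sketch (including the bounds check and the counter that replaces the notifier chain); it is not the kernel's code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
enum { SIM_NR_CPUS = 4 };

static bool sim_online[SIM_NR_CPUS] = { true };	/* CPU 0 plays the BSP */
static bool sim_dead[SIM_NR_CPUS];		/* CPUs that never start */
static int sim_cancel_count;			/* stands in for CPU_UP_CANCELED */

/* Stand-in for the arch hook: reports success even if the CPU stays down. */
static int sim_arch_cpu_up(unsigned int cpu)
{
	if (!sim_dead[cpu])
		sim_online[cpu] = true;
	return 0;
}

/* Follows the shape of the patched cpu_up(): errors are returned, not BUG()ed. */
static int sim_cpu_up(unsigned int cpu)
{
	int ret;

	if (cpu >= SIM_NR_CPUS)		/* assumption: not part of the patch */
		return -EINVAL;
	if (sim_online[cpu])
		return -EBUSY;		/* already online, nothing to cancel */

	ret = sim_arch_cpu_up(cpu);
	if (ret == 0 && !sim_online[cpu])
		ret = -EIO;		/* the spot where the old code hit BUG() */

	if (ret != 0) {
		fprintf(stderr, "%s: attempt to bring up CPU %u failed\n",
			__func__, cpu);
		sim_cancel_count++;
		return ret;
	}

	printf("CPU %u is now up\n", cpu);
	return 0;
}
```

Marking a CPU in sim_dead before calling sim_cpu_up() on it exercises the new error path; the caller keeps running on the remaining CPUs.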


2002-11-13 09:31:52

by Rusty Russell

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

In message <Pine.LNX.4.44.0211122236270.24523-100000@montezuma.mastecende.com>
you write:
>  	ret = __cpu_up(cpu);
>  	if (ret != 0)
>  		goto out_notify;
> -	if (!cpu_online(cpu))
> -		BUG();
> +
> +	if (!cpu_online(cpu)) {
> +		ret = -EIO;
> +		goto out_notify;
> +	}

Err, no. If __cpu_up(cpu) succeeded, that means the cpu should bloody
well be online!

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-13 13:16:55

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

On Wed, 13 Nov 2002, Rusty Russell wrote:

> Err, no. If __cpu_up(cpu) succeeded, that means the cpu should bloody
> well be online!

SMP startup looks rather convoluted to me right now, but if I read it
correctly, __cpu_up should eventually be doing a wakeup_secondary_via_INIT
on vanilla i386, correct? In that case, the processor accepting the IPI
doesn't necessarily mean it will have managed to initialise itself (if at all) by
the time you do that cpu_online check; wakeup_secondary_via_INIT will
simply tell you whether you succeeded in sending the IPI. There are i386
systems which take a considerable time to complete that AP initialisation
procedure. I still reckon the most you should do there is specify
PENDING, with the CPU in question sending an ONLINE notification when it
finally finishes all its init.

Zwane

--
function.linuxpower.ca
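
The asynchronous scheme suggested in this message could be sketched roughly as follows: the request returns a pending state as soon as the CPU has been kicked, and the secondary CPU itself reports ONLINE once its initialisation is done. All names here (the async_ prefix, the three states, the event counter standing in for a notifier call) are hypothetical and chosen only for illustration; the thread does not define such an interface.

```c
#include <stdbool.h>

/* Hypothetical names throughout; the thread defines no such API. */
enum async_cpu_state { ASYNC_OFFLINE, ASYNC_PENDING, ASYNC_ONLINE };

#define ASYNC_NR_CPUS 4

static enum async_cpu_state async_state[ASYNC_NR_CPUS];
static unsigned int async_online_events;	/* ONLINE notifications sent */

/* Caller side: kick the CPU and return at once with its current state. */
static enum async_cpu_state async_request_up(unsigned int cpu)
{
	if (cpu >= ASYNC_NR_CPUS)
		return ASYNC_OFFLINE;
	if (async_state[cpu] == ASYNC_OFFLINE)
		async_state[cpu] = ASYNC_PENDING;	/* IPI sent, nothing more known */
	return async_state[cpu];
}

/* AP side: called by the secondary CPU once its own init has finished. */
static bool async_ap_init_done(unsigned int cpu)
{
	if (cpu >= ASYNC_NR_CPUS || async_state[cpu] != ASYNC_PENDING)
		return false;
	async_state[cpu] = ASYNC_ONLINE;
	async_online_events++;	/* stand-in for a CPU_ONLINE notifier call */
	return true;
}
```

In this sketch a CPU that never finishes its init simply stays in ASYNC_PENDING, so a caller would still need some policy of its own for deciding when to give up on it.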

2002-11-14 01:30:26

by Rusty Russell

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

In message <Pine.LNX.4.44.0211130804380.24523-100000@montezuma.mastecende.com>
you write:
> On Wed, 13 Nov 2002, Rusty Russell wrote:
>
> > Err, no. If __cpu_up(cpu) succeeded, that means the cpu should bloody
> > well be online!
>
> SMP startup looks rather convoluted to me right now, but if I read it
> correctly, __cpu_up should eventually be doing a wakeup_secondary_via_INIT
> on vanilla i386, correct? In that case, the processor accepting the IPI
> doesn't necessarily mean it will have managed to initialise itself (if at all) by

It is bloody convoluted. Hmm, the arch needs to wait before returning
"success" on __cpu_up.

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-14 03:21:14

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

On Thu, 14 Nov 2002, Rusty Russell wrote:

> It is bloody convoluted. Hmm, the arch needs to wait before returning
> "success" on __cpu_up.

What if the processor never comes up? What's wrong with doing this async?

Zwane
--
function.linuxpower.ca

2002-11-14 04:02:30

by Rusty Russell

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

In message <Pine.LNX.4.44.0211132217580.24523-100000@montezuma.mastecende.com>
you write:
> On Thu, 14 Nov 2002, Rusty Russell wrote:
>
> > It is bloody convoluted. Hmm, the arch needs to wait before returning
> > "success" on __cpu_up.
>
> What if the processor never comes up? What's wrong with doing this async?

What's wrong with doing it sync? Are you in a hurry? 8)

That's what the return code is *for*...
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
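
One way to read the synchronous position is that the arch hook polls for the CPU to come online, but only for a bounded time, and reports failure through its return code if the CPU never shows up. The sketch below models that with a poll counter instead of a real delay. The wait_ names, the polls_until_ready field and the choice of -EIO are assumptions made for this illustration, not anything taken from an actual __cpu_up implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define WAIT_NEVER UINT_MAX	/* hypothetical marker: CPU never comes up */

/* Hypothetical model of a secondary CPU as seen by the waiting CPU. */
struct wait_cpu {
	unsigned int polls_until_ready;	/* polls needed before it is online */
	unsigned int polls_seen;	/* polls performed so far */
	bool online;
};

/* Stand-in for "delay a little, then look at the online flag". */
static bool wait_poll_once(struct wait_cpu *c)
{
	c->polls_seen++;
	if (c->polls_seen >= c->polls_until_ready)
		c->online = true;
	return c->online;
}

/* Bounded synchronous bring-up: 0 on success, -EIO if the CPU never appears. */
static int wait_arch_cpu_up(struct wait_cpu *c, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++)
		if (wait_poll_once(c))
			return 0;

	return -EIO;	/* gave up; the caller learns this from the return code */
}
```

With a bound like max_polls, the caller never reports success for a CPU that is not online, and it also never waits forever for one that has died.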

2002-11-14 22:23:44

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] Remove BUG in cpu_up

On Thu, 14 Nov 2002, Rusty Russell wrote:

> What's wrong with doing it sync? Are you in a hurry? 8)
>
> That's what the return code is *for*...
> Rusty.

Yes, I'd rather a box limp along until I can come up with a solution
than have it sit there indefinitely waiting for a processor which has
decided to go on early retirement ;)

But I feel like I'm going round in circles. Anyone else with opinions on
this?

Zwane
--
function.linuxpower.ca