From: Rusty Russell
To: davidm@hpl.hp.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] per-cpu change
In-reply-to: Your message of "Fri, 02 May 2003 09:51:30 MST." <16050.41490.798865.517036@napali.hpl.hp.com>
Date: Sat, 03 May 2003 14:40:57 +1000
Message-Id: <20030503044617.DE0642C003@lists.samba.org>

In message <16050.41490.798865.517036@napali.hpl.hp.com> you write:
> - On NUMA, you want to allocate the per-CPU areas in node-local memory.

Thanks: I'd forgotten about this.  Even "big SMP" machines often have
different memory latencies.  I need to allow archs to override the core
allocation.

> - It's important to initialize the per-CPU area early enough.  We
>   currently do it in cpu_init().  Anything later would probably break
>   things.

Yep, I imagine you'd explicitly call setup_per_cpu_areas() in cpu_init()
(the later call will do nothing).

> I am a little concerned with supporting per-CPU storage in modules.
> What happens if the per-CPU area fills up?  With the normal kernel,
> you'll get a link-time error, but with modules, you won't know until
> you try to insert a module.

Yes.  In fact, you get a boot error, not a link error, in my
implementation.

My previous implementation used a linked list of memory chunks, like so:

 Core kernel percpu       First dynamic percpu     Second dynamic percpu
+----+----+---...-+  ->  +----+----+---...-+  ->  +----+----+---...-+  ->
 CPU0 CPU1 CPU2...

You can then use the same __per_cpu_offset[], as the spacing between each
CPU's variables in the dynamically allocated sections is the same as in
the core kernel.  You can __alloc_percpu() any size up to the initial
inter-CPU spacing (which was equal to the initial amount of static
per-CPU data, or 4k, whichever is larger).

But if each CPU's variables are not consecutive (i.e. your NUMA point),
this scheme breaks down: you'd need to allocate areas with the same
spacing between them, which won't happen.  So I think the current,
simpler choice is preferable.

> But the alternative of using something similar to the user-level TLS
> model doesn't look exactly attractive either.

Yes, I spoke with Alan Modra about what he had to do on PPC64, and ran
screaming.  We can live with more constraints in the kernel, I think,
especially when the result is so fast and simple.

Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
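
For readers following along, here is a minimal userspace sketch of the
chunk scheme described above: every chunk spaces each CPU's copy by the
same stride, so one __per_cpu_offset[] table addresses both static and
dynamic per-CPU data.  All names other than __per_cpu_offset
(pcpu_chunk, alloc_percpu_sketch, per_cpu_ptr_sketch, PCPU_STRIDE) are
hypothetical illustrations, not the actual kernel interface, and
alignment handling is omitted.

/*
 * Sketch of the linked-list-of-chunks per-CPU allocator described in
 * the mail above.  Each chunk holds NR_CPUS copies, PCPU_STRIDE bytes
 * apart; the same __per_cpu_offset[] works for every chunk because the
 * inter-CPU spacing never changes.  Illustration only, not kernel code.
 */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS		4
#define PCPU_STRIDE	4096	/* static per-CPU size, or 4k, whichever is larger */

unsigned long __per_cpu_offset[NR_CPUS];

struct pcpu_chunk {
	struct pcpu_chunk *next;
	char *base;		/* NR_CPUS * PCPU_STRIDE bytes */
	size_t used;		/* bytes handed out from each CPU's copy */
};

static struct pcpu_chunk *chunks;

static void setup_per_cpu_offsets(void)
{
	/* CPU n's copy sits n strides past CPU 0's, in every chunk. */
	for (int i = 0; i < NR_CPUS; i++)
		__per_cpu_offset[i] = (unsigned long)i * PCPU_STRIDE;
}

/* Allocate 'size' bytes of per-CPU data; must fit within one stride. */
static void *alloc_percpu_sketch(size_t size)
{
	struct pcpu_chunk *c;

	if (size == 0 || size > PCPU_STRIDE)
		return NULL;

	/* Reuse an existing chunk with room, else grab a new one. */
	for (c = chunks; c; c = c->next)
		if (PCPU_STRIDE - c->used >= size)
			goto found;

	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->base = calloc(NR_CPUS, PCPU_STRIDE);
	if (!c->base) {
		free(c);
		return NULL;
	}
	c->used = 0;
	c->next = chunks;
	chunks = c;
found:
	c->used += size;
	return c->base + (c->used - size);	/* address of CPU 0's copy */
}

/* CPU n's copy is the returned pointer plus the shared offset. */
static void *per_cpu_ptr_sketch(void *ptr, int cpu)
{
	return (char *)ptr + __per_cpu_offset[cpu];
}

int main(void)
{
	setup_per_cpu_offsets();
	int *ctr = alloc_percpu_sketch(sizeof(int));
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		++*(int *)per_cpu_ptr_sketch(ctr, cpu);
	printf("cpu0 counter = %d\n", *(int *)per_cpu_ptr_sketch(ctr, 0));
	return 0;
}

The NUMA objection is visible here: per_cpu_ptr_sketch() only works
because calloc() hands back one contiguous block, so every CPU's copy
is exactly PCPU_STRIDE from its neighbour.  Node-local allocations give
no such spacing guarantee, which is why the mail concludes the scheme
breaks down there.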
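And a sketch of the "let archs override the core allocation" idea: a
weak generic setup_per_cpu_areas() that a NUMA arch can replace with a
node-local allocator.  __attribute__((weak)) is a real GCC feature the
kernel uses for default-with-override functions, but the bodies below
are placeholder assumptions, not the actual 2.5-era code.

/*
 * Weak-default override sketch.  The generic version stands in unless
 * an arch links a strong setup_per_cpu_areas() of its own (e.g. one
 * that allocates each CPU's area from its local NUMA node).
 */
#include <stdio.h>

__attribute__((weak)) void setup_per_cpu_areas(void)
{
	/* Generic default: one contiguous block for all CPUs. */
	printf("generic: contiguous per-CPU areas\n");
}

void cpu_init(void)
{
	/* Called early on the boot CPU; a later call would do nothing. */
	setup_per_cpu_areas();
}

int main(void)
{
	cpu_init();
	return 0;
}

Compiled as-is, the weak definition runs; linking another object file
that defines a strong setup_per_cpu_areas() silently substitutes the
arch version, with no #ifdef in the generic code.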