Hello,
At the moment, CPU-specific data is stored in separate fixed-size
arrays by means of the "percpu framework". I'm working on some
changes to the way some CPUs are represented, and I'm wondering what
the rationale behind such a representation is.
At first sight, it'd seem more reasonable to have a structure holding
all the information that is CPU-specific (as is done with any object
represented within the system). After searching the mail archives, I
see that similar changes were proposed before, but those threads did
not seem to get any reply (so I'm assuming that the changes were not
desired).
Similarly, and if I understood it correctly, the PDA (Per-processor
Data Area) also aims to do the above, but at the moment it only
contains some fields and is not defined on all platforms. There are
still a lot of uses of the percpu functionality (e.g., in
kernel/sched.c).
Part of my changes introduces a new structure that is able to
represent any kind of CPU (and which each platform can extend with
new information). It is supposed to supersede the per-cpu
definitions. I bet this could also be redone by using percpu in some
way... The thing is I am willing to share my work when I've finished
it (it is still very much work-in-progress), but first I'm interested
to know if adding this new structure is a crazy idea (meaning I
should stick to percpu wherever possible) or something that can be
accepted later on.
Summarizing, my questions are:
- Why is the code currently using multiple separate arrays (percpu)
to hold CPU information instead of a structure?
- Could this structure-based approach (instead of all these separate
arrays) be considered for inclusion into the system?
As far as I can tell, the advantage of percpu is that you can define
new "fields" anywhere in the code and independently from the rest of
the system. Also, I seem to understand that there are performance
advantages related to this. But on the other hand, percpu seems like
an unnatural approach to "reimplement" regular structures.
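To make the comparison concrete, here is a minimal userspace sketch of the two styles as I currently understand them. It does not use the real percpu macros: NR_CPUS is given an arbitrary value, and the counters and function names (sched_ticks, net_packets, struct_style_tick(), ...) are hypothetical, made up only for illustration.

```c
#include <assert.h>

#define NR_CPUS 4 /* arbitrary value for this sketch */

/* Style 1: one central structure. Every subsystem that needs a new
 * per-CPU field has to edit this single definition. */
struct cpu_info {
	unsigned long sched_ticks;  /* used by the scheduler */
	unsigned long net_packets;  /* used by the network code */
};
static struct cpu_info cpus[NR_CPUS];

/* Style 2: each subsystem declares its own per-CPU storage next to the
 * code that uses it, without touching any shared definition. */
static unsigned long sched_ticks[NR_CPUS]; /* would live in the scheduler */
static unsigned long net_packets[NR_CPUS]; /* would live in the network code */

unsigned long struct_style_tick(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ++cpus[cpu].sched_ticks;
}

unsigned long struct_style_rx(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ++cpus[cpu].net_packets;
}

unsigned long separate_style_tick(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ++sched_ticks[cpu];
}

unsigned long separate_style_rx(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ++net_packets[cpu];
}
```

Both styles behave the same from the caller's side; the difference is only in where a new field has to be declared.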
Thank you very much.
--
Julio M. Merino Vidal <[email protected]>
"Julio M. Merino Vidal" <[email protected]> writes:
> Similarly, and if I understood it correctly, the PDA (Per-processor
> Data Area) also aims to do the above, but at the moment it only
> contains some fields and is not defined in all platforms. There are
> still a lot of usages of the percpu functionality (such as, e.g., in
> kernel/sched.c).
PDA is an earlier version of percpu; it can still be accessed more
efficiently, so it is kept for some low-level code.
> As far as I can tell, the advantage of percpu is that you can define
> new "fields" anywhere in the code and independently from the rest of
> the system.
- Independent maintenance as you noted
- Fast access and relatively compact code
- Avoids false sharing by keeping cache lines of different CPUs separate
- Doesn't waste a lot of memory in padding, as NR_CPUS arrays usually
need in order to avoid false sharing.
Any replacement that doesn't have these properties too will probably
not be useful.
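A rough userspace sketch of the padding point, under stated assumptions: a 64-byte cache line, 8 CPUs and three unsigned long counters are arbitrary choices, and all type and function names are hypothetical. The struct cpu_area below only models the layout of one CPU's area; with percpu that area is assembled from independent declarations rather than written out by hand.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS    8  /* arbitrary value for this sketch */
#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* NR_CPUS-array style: each element is padded to a full cache line so
 * that two CPUs never write to the same line. */
struct padded_counter {
	unsigned long value;
	char pad[CACHE_LINE - sizeof(unsigned long)];
};
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
	      "each padded element should fill one cache line");

/* Three subsystems, each with its own padded NR_CPUS array. */
struct padded_layout {
	struct padded_counter a[NR_CPUS];
	struct padded_counter b[NR_CPUS];
	struct padded_counter c[NR_CPUS];
};

/* percpu style: one CPU's variables sit together in that CPU's own
 * area, so only the area as a whole needs to be padded. */
struct cpu_area {
	unsigned long a, b, c;
	char pad[CACHE_LINE - 3 * sizeof(unsigned long)];
};

struct percpu_layout {
	struct cpu_area area[NR_CPUS];
};

size_t padded_bytes(void) { return sizeof(struct padded_layout); }
size_t percpu_bytes(void) { return sizeof(struct percpu_layout); }
```

With these numbers the padded arrays take three times the memory of the per-CPU areas, and the gap grows with every additional variable.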
-Andi
Andi Kleen wrote:
>> As far as I can tell, the advantage of percpu is that you can define
>> new "fields" anywhere in the code and independently from the rest of
>> the system.
>>
>
> - Independent maintenance as you noted
> - Fast access and relatively compact code
> - Avoids false sharing by keeping cache lines of different CPUs separate
> - Doesn't waste a lot of memory in padding, as NR_CPUS arrays usually
> need in order to avoid false sharing.
>
> Any replacement that doesn't have these properties too will probably
> not be useful.
>
Thank you for the details. I'll try to stick to per-cpu wherever
possible for now.
Anyway, what do you think about adding the above text to the code (percpu.h
maybe) as documentation? See the patch below. (Dunno if the Signed-off-by
line is appropriate as most of the text is yours.)
Signed-off-by: Julio M. Merino Vidal <[email protected]>
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 600e3d3..b8e8b8c 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -1,6 +1,21 @@
#ifndef __LINUX_PERCPU_H
#define __LINUX_PERCPU_H
+/*
+ * percpu provides a mechanism to define variables that are specific to each
+ * CPU in the system.
+ *
+ * Each variable is defined as an independent array of NR_CPUS elements.
+ * This approach is used instead of a per-CPU structure because it has the
+ * following advantages:
+ * - Independent maintenance: a source file can define new per-CPU
+ * variables without distorting others.
+ * - Fast access and relatively compact code.
+ * - Avoids false sharing by keeping cache lines of different CPUs separate.
+ * - Doesn't waste a lot of memory in padding, as NR_CPUS arrays usually
+ * need in order to avoid false sharing.
+ */
+
#include <linux/spinlock.h> /* For preempt_disable() */
#include <linux/slab.h> /* For kmalloc() */
#include <linux/smp.h>
Kind regards.
On Fri, 04 May 2007 10:36:37 +0200
"Julio M. Merino Vidal" <[email protected]> wrote:
>
> Anyway, what do you think about adding the above text to the code (percpu.h
> maybe) as documentation? See the patch below. (Dunno if the Signed-off-by
> line is appropriate as most of the text is yours.)
>
> Signed-off-by: Julio M. Merino Vidal <[email protected]>
>
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 600e3d3..b8e8b8c 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -1,6 +1,21 @@
> #ifndef __LINUX_PERCPU_H
> #define __LINUX_PERCPU_H
>
> +/*
> + * percpu provides a mechanism to define variables that are specific to each
> + * CPU in the system.
> + *
> + * Each variable is defined as an independent array of NR_CPUS elements.
> + * This approach is used instead of a per-CPU structure because it has the
> + * following advantages:
> + * - Independent maintenance: a source file can define new per-CPU
> + * variables without distorting others.
> + * - Fast access and relatively compact code.
> + * - Avoids false sharing by keeping cache lines of different CPUs separate.
> + * - Doesn't waste a lot of memory in padding, as NR_CPUS arrays usually
> + * need in order to avoid false sharing.
> + */
> +
> #include <linux/spinlock.h> /* For preempt_disable() */
> #include <linux/slab.h> /* For kmalloc() */
> #include <linux/smp.h>
Documentation is good, and percpu probably lacks it, but please add it in a Documentation/percpu.txt file, because that's the right place.
There you can have really extensive documentation, and you won't slow down kernel compiles...
I suggest you document all variants (get_cpu_var(), __get_cpu_var(), ...) with examples of use.
Also, please note that per-CPU data is not allocated as NR_CPUS copies; the allocation depends on the possible CPUs. So if you boot an SMP kernel on a one-CPU desktop, the kernel allocates only the needed space.
So per-CPU data also has a space-saving argument against structures declared with [NR_CPUS] arrays.
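A small userspace model of that space-saving point, as a sketch only: this is not the kernel's allocator, and the percpu_model type, its functions and the bitmask of possible CPUs are all hypothetical names invented for the example. It allocates one area per possible CPU instead of NR_CPUS areas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 32 /* compile-time maximum; arbitrary for this sketch */

struct percpu_model {
	void *area[NR_CPUS]; /* NULL for CPUs that are not possible */
};

/* Release every allocated area. */
void percpu_model_free(struct percpu_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		free(m->area[cpu]);
		m->area[cpu] = NULL;
	}
}

/* Allocate one zeroed area for each CPU whose bit is set in
 * possible_mask. Returns the total bytes allocated, or 0 on failure. */
size_t percpu_model_init(struct percpu_model *m, unsigned long possible_mask,
			 size_t area_size)
{
	size_t total = 0;

	assert(area_size > 0);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->area[cpu] = NULL;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(possible_mask & (1UL << cpu)))
			continue;
		m->area[cpu] = calloc(1, area_size);
		if (!m->area[cpu]) {
			percpu_model_free(m);
			return 0;
		}
		total += area_size;
	}
	return total;
}
```

With a mask of 0x1 (a single possible CPU) only one area is allocated, whereas a structure declared with [NR_CPUS] would always reserve all 32 slots.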
Thank you
> +/*
> + * percpu provides a mechanism to define variables that are specific to each
> + * CPU in the system.
> + *
> + * Each variable is defined as an independent array of NR_CPUS elements.
The term "independent array" seems misleading to me. There isn't really
an array anywhere. Perhaps explain it in more detail.
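Roughly, a per-CPU variable is better thought of as an offset into each CPU's private data area than as an array indexed by CPU. A simplified userspace sketch of that idea follows; it is an assumption-laden model, not the kernel code. Every variable is an unsigned long slot, slot numbers stand in for the offsets that the build would normally assign, and percpu_reserve() and percpu_ptr() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS    4  /* arbitrary value for this sketch */
#define AREA_SLOTS 16 /* slots in each CPU's private area */

/* One private data area per CPU. No individual variable is an array;
 * each one is just a position inside every area. */
static unsigned long cpu_area[NR_CPUS][AREA_SLOTS];
static size_t next_slot;

/* Reserve a position for a new per-CPU variable. This stands in for
 * the offset that would normally be assigned at build time. */
size_t percpu_reserve(void)
{
	assert(next_slot < AREA_SLOTS);
	return next_slot++;
}

/* Locate a given CPU's copy of a variable: that CPU's area plus the
 * variable's position. */
unsigned long *percpu_ptr(int cpu, size_t slot)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert(slot < next_slot);
	return &cpu_area[cpu][slot];
}
```

Two variables reserved one after the other end up adjacent within each CPU's area, while the copies belonging to different CPUs stay in different areas.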
If you write documentation please write it in Kerneldoc format so
that it can be automatically extracted. See
Documentation/kernel-doc-nano-HOWTO.txt
-Andi