2009-07-21 22:00:24

by Dave Hansen

[permalink] [raw]
Subject: [RFCv2][PATCH] flexible array implementation



Changes from v1:
- to vs too typo
- added __check_part_and_nr() and gave it a warning
- fixed off-by-one check on __nr_part_ptrs()
- addedFLEX_ARRAY_INIT() macro
- some kerneldoc comments about the capacity
with various sized objects
- comments to note lack of locking semantice

--

Once a structure goes over PAGE_SIZE*2, we see occasional
allocation failures. Some people have chosen to switch
over to things like vmalloc() that will let them keep
array-like access to such a large structures. But,
vmalloc() has plenty of downsides.

Here's an alternative. I think it's what Andrew was
suggesting here:

http://lkml.org/lkml/2009/7/2/518

I call it a flexible array. It does all of its work in
PAGE_SIZE bits, so never does an order>0 allocation.
The base level has PAGE_SIZE-2*sizeof(int) bytes of
storage for pointers to the second level. So, with a
32-bit arch, you get about 4MB (4183112 bytes) of total
storage when the objects pack nicely into a page. It
is half that on 64-bit because the pointers are twice
the size.

The interface is dirt simple. 4 functions:
alloc_flex_array()
free_flex_array()
flex_array_put()
flex_array_get()

put() appends an item into the array while get() takes
indexes and does array-style access.

One thought is that we should perhaps make the base
structure half the size on 32-bit arches. That will
ensure that someone testing on 32-bit will not get
bitten by the size shrinking by half when moving to
64-bit.

We could also potentially just pass the "element_size"
into each of the API functions instead of storing it
internally. That would get us one more base pointer
on 32-bit.

The last improvement that I thought about was letting
the individual array members span pages. In this
implementation, if you have a 2049-byte object, it
will only pack one of them into each "part" with
no attempt to pack them. At this point, I don't think
the added complexity would be worth it.

Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/include/linux/flex_array.h | 45 +++++
linux-2.6.git-dave/lib/Makefile | 2
linux-2.6.git-dave/lib/flex_array.c | 230 ++++++++++++++++++++++++++
3 files changed, 276 insertions(+), 1 deletion(-)

diff -puN /dev/null include/linux/flex_array.h
--- /dev/null 2008-09-02 09:40:19.000000000 -0700
+++ linux-2.6.git-dave/include/linux/flex_array.h 2009-07-21 14:55:35.000000000 -0700
@@ -0,0 +1,45 @@
+#ifndef _FLEX_ARRAY_H
+#define _FLEX_ARRAY_H
+
+#include <linux/types.h>
+#include <asm/page.h>
+
+#define FLEX_ARRAY_PART_SIZE PAGE_SIZE
+#define FLEX_ARRAY_BASE_SIZE PAGE_SIZE
+
+struct flex_array_part;
+
+/*
+ * This is meant too replace cases where an array-like
+ * structure has gotten to big to fit into kmalloc()
+ * and the developer is getting tempted to use
+ * vmalloc().
+ */
+
+struct flex_array {
+ union {
+ struct {
+ int nr_elements;
+ int element_size;
+ struct flex_array_part *parts[0];
+ };
+ /*
+ * This little trick makes sure that
+ * sizeof(flex_array) == PAGE_SIZE
+ */
+ char padding[FLEX_ARRAY_BASE_SIZE];
+ };
+};
+
+#define FLEX_ARRAY_INIT(size, total) {{{\
+ .element_size = (size), \
+ .nr_elements = 0, \
+}}}
+
+struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags);
+void flex_array_free(struct flex_array *fa);
+int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags);
+int flex_array_append(struct flex_array *fa, void *src, gfp_t flags);
+void *flex_array_get(struct flex_array *fa, int element_nr);
+
+#endif /* _FLEX_ARRAY_H */
diff -puN /dev/null lib/flex_array.c
--- /dev/null 2008-09-02 09:40:19.000000000 -0700
+++ linux-2.6.git-dave/lib/flex_array.c 2009-07-21 14:52:09.000000000 -0700
@@ -0,0 +1,230 @@
+/*
+ * Flexible array managed in PAGE_SIZE parts
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2009
+ *
+ * Author: Dave Hansen <[email protected]>
+ */
+
+#include <linux/flex_array.h>
+#include <linux/slab.h>
+#include <linux/stddef.h>
+
+struct flex_array_part {
+ char elements[FLEX_ARRAY_PART_SIZE];
+};
+
+static inline int __elements_per_part(int element_size)
+{
+ return FLEX_ARRAY_PART_SIZE / element_size;
+}
+
+static inline int __nr_part_ptrs(void)
+{
+ int element_offset = offsetof(struct flex_array, parts);
+ int bytes_left = FLEX_ARRAY_BASE_SIZE - element_offset;
+ return bytes_left / sizeof(struct flex_array_part *);
+}
+
+/**
+ * flex_array_alloc - allocate a new flexible array
+ * @element_size: the size of individual elements in the array
+ * @total: total number of elements that this should hold
+ *
+ * Note: all locking must be provided by the caller.
+ *
+ * We do not actually use @total to size the allocation at this
+ * point. It is just used to ensure that the user does not try
+ * to use this structure for something larger than it can handle
+ * later on.
+ *
+ * The maximum number of elements is defined as: the number of
+ * elements that can be stored in a page times the number of
+ * page pointers that we can fit in the base structure or (using
+ * integer math):
+ *
+ * (PAGE_SIZE/element_size) * (PAGE_SIZE-8)/sizeof(void *)
+ *
+ * Here's a table showing example capacities. Note that the maximum
+ * index that the get/put() functions is just nr_objects-1.
+ *
+ * Element size | Objects | Objects |
+ * PAGE_SIZE=4k | 32-bit | 64-bit |
+ * ----------------------------------|
+ * 1 byte | 4186112 | 2093056 |
+ * 2 bytes | 2093056 | 1046528 |
+ * 3 bytes | 1395030 | 697515 |
+ * 4 bytes | 1046528 | 523264 |
+ * 32 bytes | 130816 | 65408 |
+ * 33 bytes | 126728 | 63364 |
+ * 2048 bytes | 2044 | 10228 |
+ * 2049 bytes | 1022 | 511 |
+ * void * | 1046528 | 261632 |
+ *
+ * Since 64-bit pointers are twice the size, we lose half the
+ * capacity in the base structure. Also note that no effort is made
+ * to efficiently pack objects across page boundaries.
+ */
+struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags)
+{
+ struct flex_array *ret;
+ int max_size = __nr_part_ptrs() * __elements_per_part(element_size);
+
+ /* max_size will end up 0 if element_size > PAGE_SIZE */
+ if (total > max_size)
+ return NULL;
+ ret = kzalloc(sizeof(struct flex_array), flags);
+ if (!ret)
+ return NULL;
+ ret->element_size = element_size;
+ return ret;
+}
+
+static int fa_element_to_part_nr(struct flex_array *fa, int element_nr)
+{
+ return element_nr / __elements_per_part(fa->element_size);
+}
+
+void flex_array_free(struct flex_array *fa)
+{
+ int part_nr;
+ int max_part;
+
+ /* keeps us from getting the index of -1 below */
+ if (!fa->nr_elements)
+ goto free_base;
+
+ /* we really want the *index* of the last element, thus the -1 */
+ max_part = fa_element_to_part_nr(fa, fa->nr_elements-1);
+ for (part_nr = 0; part_nr <= max_part; part_nr++)
+ kfree(fa->parts[part_nr]);
+free_base:
+ kfree(fa);
+}
+
+static int fa_index_inside_part(struct flex_array *fa, int element_nr)
+{
+ return (element_nr % __elements_per_part(fa->element_size));
+}
+
+static int offset_inside_part(struct flex_array *fa, int element_nr)
+{
+ int part_offset = fa_index_inside_part(fa, element_nr);
+ return part_offset * fa->element_size;
+}
+
+static int __check_part_and_nr(struct flex_array *fa,
+ int part_nr, int element_nr)
+{
+ if (part_nr >= __nr_part_ptrs() ||
+ element_nr > fa->nr_elements) {
+ WARN(1, "bad flexible array element number: %d > %d\n",
+ element_nr, fa->nr_elements);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static struct flex_array_part *
+__fa_get_part(struct flex_array *fa, int part_nr, gfp_t flags)
+{
+ struct flex_array_part *part = NULL;
+ if (__check_part_and_nr(fa, part_nr, fa->nr_elements))
+ return NULL;
+ part = fa->parts[part_nr];
+ if (!part) {
+ part = kmalloc(FLEX_ARRAY_PART_SIZE, flags);
+ if (!part)
+ return NULL;
+ fa->parts[part_nr] = part;
+ }
+ return part;
+}
+
+/**
+ * flex_array_put - copy data into the array at @element_nr
+ * @src: address of data to copy into the array
+ * @element_nr: index of the position in which to insert
+ * the new element.
+ *
+ * Note that this *copies* the contents of @src into
+ * the array. If you are trying to store an array of
+ * pointers, make sure to pass in &ptr instead of ptr.
+ *
+ * Locking must be provided by the caller.
+ */
+int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags)
+{
+ int part_nr = fa_element_to_part_nr(fa, element_nr);
+ struct flex_array_part *part;
+ void *dst;
+
+ part = __fa_get_part(fa, part_nr, flags);
+ if (!part)
+ return -ENOMEM;
+ dst = &part->elements[offset_inside_part(fa, element_nr)];
+ memcpy(dst, src, fa->element_size);
+ return 0;
+}
+
+/**
+ * flex_array_append - append a new member into the array
+ * @src: address of data to copy into the array
+ *
+ * This will use the internally-remembered last position in
+ * the array to choose an insertion point.
+ *
+ * Note that this *copies* the contents of @src into
+ * the array. If you are trying to store an array of
+ * pointers, make sure to pass in &ptr instead of ptr.
+ *
+ * Locking must be provided by the caller.
+ */
+int flex_array_append(struct flex_array *fa, void *src, gfp_t flags)
+{
+ int ret = flex_array_put(fa, fa->nr_elements, src, flags);
+ if (!ret)
+ fa->nr_elements++;
+ return ret;
+}
+
+
+/**
+ * flex_array_get - pull data back out of the array
+ * @element_nr: index of the element to fetch from the array
+ *
+ * Returns a pointer to the data at index @element_nr. Note
+ * that this is a copy of the data that was passed in. If you
+ * are using this to store pointers, you'll get back &ptr.
+ *
+ * Locking must be provided by the caller.
+ */
+void *flex_array_get(struct flex_array *fa, int element_nr)
+{
+ int part_nr = fa_element_to_part_nr(fa, element_nr);
+ struct flex_array_part *part;
+ int offset;
+
+ if (__check_part_and_nr(fa, part_nr, fa->nr_elements))
+ return NULL;
+ if (!fa->parts[part_nr])
+ return NULL;
+
+ part = fa->parts[part_nr];
+ offset = offset_inside_part(fa, element_nr);
+ return &part->elements[offset_inside_part(fa, element_nr)];
+}
diff -puN lib/Makefile~fa lib/Makefile
--- linux-2.6.git/lib/Makefile~fa 2009-07-21 14:42:37.000000000 -0700
+++ linux-2.6.git-dave/lib/Makefile 2009-07-21 14:42:37.000000000 -0700
@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmd
idr.o int_sqrt.o extable.o prio_tree.o \
sha1.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o prio_heap.o ratelimit.o show_mem.o \
- is_single_threaded.o plist.o decompress.o
+ is_single_threaded.o plist.o decompress.o flex_array.o

lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
diff -puN lib/radix-tree.c~fa lib/radix-tree.c
diff -puN ./include/linux/radix-tree.h~fa ./include/linux/radix-tree.h
_


2009-07-21 22:09:10

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Tue, 2009-07-21 at 15:00 -0700, Dave Hansen wrote:
>
> The interface is dirt simple. 4 functions:
> alloc_flex_array()
> free_flex_array()
> flex_array_put()
> flex_array_get()
>
> put() appends an item into the array while get() takes
> indexes and does array-style access.

I need to update this description, but the kerneldoc comments are up to
date.

That reminds me... People will get somewhat weird behavior if they mix
flex_array_append() and flex_array_put(). Is that OK? Should
flex_array_put() modify ->nr_elements to point to the element past the
one that was just put()? Should we perhaps drop the append() function
and the ->nr_elements variable completely?

-- Dave

2009-07-21 22:36:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Tue, 21 Jul 2009 15:09:05 -0700
Dave Hansen <[email protected]> wrote:

> On Tue, 2009-07-21 at 15:00 -0700, Dave Hansen wrote:
> >
> > The interface is dirt simple. 4 functions:
> > alloc_flex_array()
> > free_flex_array()
> > flex_array_put()
> > flex_array_get()
> >
> > put() appends an item into the array while get() takes
> > indexes and does array-style access.
>
> I need to update this description, but the kerneldoc comments are up to
> date.
>
> That reminds me... People will get somewhat weird behavior if they mix
> flex_array_append() and flex_array_put(). Is that OK? Should
> flex_array_put() modify ->nr_elements to point to the element past the
> one that was just put()? Should we perhaps drop the append() function
> and the ->nr_elements variable completely?

I'd say that we can drop ->append. C arrays don't have an `append', and
callers trivially append stuff to arrays all the time. `for (i = 0; i < ....'

2009-07-22 03:26:34

by Li Zefan

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

> +/**
> + * flex_array_put - copy data into the array at @element_nr
> + * @src: address of data to copy into the array
> + * @element_nr: index of the position in which to insert
> + * the new element.

@fa and @flags are not documented.

> + *
> + * Note that this *copies* the contents of @src into
> + * the array. If you are trying to store an array of
> + * pointers, make sure to pass in &ptr instead of ptr.
> + *
> + * Locking must be provided by the caller.
> + */
> +int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags)
> +{
> + int part_nr = fa_element_to_part_nr(fa, element_nr);
> + struct flex_array_part *part;
> + void *dst;
> +
> + part = __fa_get_part(fa, part_nr, flags);
> + if (!part)
> + return -ENOMEM;

So this may allocate memory, and has disavantages:

- If flex_array_put() is called in atomic context, flags has to be GFP_ATOMIC.
- and thus it may fail.

Since we pass the total_elem to flex_array_alloc(), how about add a flag,
and if the flag is set, the alloc() will also allocate all fa_parts?

And add __flex_array_put(), which assumes fa_parts has been allocated.

> + dst = &part->elements[offset_inside_part(fa, element_nr)];
> + memcpy(dst, src, fa->element_size);
> + return 0;
> +}
> +

2009-07-22 04:35:57

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 11:25 +0800, Li Zefan wrote:
> > +/**
> > + * flex_array_put - copy data into the array at @element_nr
> > + * @src: address of data to copy into the array
> > + * @element_nr: index of the position in which to insert
> > + * the new element.
>
> @fa and @flags are not documented.

True... But one of my pet peeves are kerneldocs like this:

@fa: the flex array
@flags: GFP flags

It's so trivially obvious from looking at the types and the variable
names that I'm not sure it's worth the cost of the lines.

> > + *
> > + * Note that this *copies* the contents of @src into
> > + * the array. If you are trying to store an array of
> > + * pointers, make sure to pass in &ptr instead of ptr.
> > + *
> > + * Locking must be provided by the caller.
> > + */
> > +int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags)
> > +{
> > + int part_nr = fa_element_to_part_nr(fa, element_nr);
> > + struct flex_array_part *part;
> > + void *dst;
> > +
> > + part = __fa_get_part(fa, part_nr, flags);
> > + if (!part)
> > + return -ENOMEM;
>
> So this may allocate memory, and has disavantages:
>
> - If flex_array_put() is called in atomic context, flags has to be GFP_ATOMIC.
> - and thus it may fail.
>
> Since we pass the total_elem to flex_array_alloc(), how about add a flag,
> and if the flag is set, the alloc() will also allocate all fa_parts?
>
> And add __flex_array_put(), which assumes fa_parts has been allocated.

How about flex_array_prealloc()? It seems to work for all the radix
tree users.

-- Dave

2009-07-22 06:16:26

by Li Zefan

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

12:34, Dave Hansen wrote:
> On Wed, 2009-07-22 at 11:25 +0800, Li Zefan wrote:
>>> +/**
>>> + * flex_array_put - copy data into the array at @element_nr
>>> + * @src: address of data to copy into the array
>>> + * @element_nr: index of the position in which to insert
>>> + * the new element.
>> @fa and @flags are not documented.
>
> True... But one of my pet peeves are kerneldocs like this:
>
> @fa: the flex array
> @flags: GFP flags
>
> It's so trivially obvious from looking at the types and the variable
> names that I'm not sure it's worth the cost of the lines.
>

I'm not kernel-doc expert, but ./scripts/kernel-doc will warn
on this. And from time to time, we receive patches to fix
kernel-doc.

>>> + *
>>> + * Note that this *copies* the contents of @src into
>>> + * the array. If you are trying to store an array of
>>> + * pointers, make sure to pass in &ptr instead of ptr.
>>> + *
>>> + * Locking must be provided by the caller.
>>> + */
>>> +int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags)
>>> +{
>>> + int part_nr = fa_element_to_part_nr(fa, element_nr);
>>> + struct flex_array_part *part;
>>> + void *dst;
>>> +
>>> + part = __fa_get_part(fa, part_nr, flags);
>>> + if (!part)
>>> + return -ENOMEM;
>> So this may allocate memory, and has disavantages:
>>
>> - If flex_array_put() is called in atomic context, flags has to be GFP_ATOMIC.
>> - and thus it may fail.
>>
>> Since we pass the total_elem to flex_array_alloc(), how about add a flag,
>> and if the flag is set, the alloc() will also allocate all fa_parts?
>>
>> And add __flex_array_put(), which assumes fa_parts has been allocated.
>
> How about flex_array_prealloc()? It seems to work for all the radix
> tree users.
>

I have no strong opinion. I just want a non-fail version of
flex_array_put() (I mean "void __flex_array_put()").

2009-07-22 07:06:55

by Cong Wang

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Tue, Jul 21, 2009 at 03:00:17PM -0700, Dave Hansen wrote:
>
>
>Changes from v1:
>- to vs too typo
>- added __check_part_and_nr() and gave it a warning
>- fixed off-by-one check on __nr_part_ptrs()
>- addedFLEX_ARRAY_INIT() macro
>- some kerneldoc comments about the capacity
> with various sized objects
>- comments to note lack of locking semantice
>
>--
>
>Once a structure goes over PAGE_SIZE*2, we see occasional
>allocation failures. Some people have chosen to switch
>over to things like vmalloc() that will let them keep
>array-like access to such a large structures. But,
>vmalloc() has plenty of downsides.
>
>Here's an alternative. I think it's what Andrew was
>suggesting here:
>
> http://lkml.org/lkml/2009/7/2/518
>
>I call it a flexible array. It does all of its work in
>PAGE_SIZE bits, so never does an order>0 allocation.
>The base level has PAGE_SIZE-2*sizeof(int) bytes of
>storage for pointers to the second level. So, with a
>32-bit arch, you get about 4MB (4183112 bytes) of total
>storage when the objects pack nicely into a page. It
>is half that on 64-bit because the pointers are twice
>the size.
>
>The interface is dirt simple. 4 functions:
> alloc_flex_array()
> free_flex_array()
> flex_array_put()
> flex_array_get()
>
>put() appends an item into the array while get() takes
>indexes and does array-style access.
>
>One thought is that we should perhaps make the base
>structure half the size on 32-bit arches. That will
>ensure that someone testing on 32-bit will not get
>bitten by the size shrinking by half when moving to
>64-bit.
>
>We could also potentially just pass the "element_size"
>into each of the API functions instead of storing it
>internally. That would get us one more base pointer
>on 32-bit.
>
>The last improvement that I thought about was letting
>the individual array members span pages. In this
>implementation, if you have a 2049-byte object, it
>will only pack one of them into each "part" with
>no attempt to pack them. At this point, I don't think
>the added complexity would be worth it.
>
>Signed-off-by: Dave Hansen <[email protected]>
>---
>
> linux-2.6.git-dave/include/linux/flex_array.h | 45 +++++
> linux-2.6.git-dave/lib/Makefile | 2
> linux-2.6.git-dave/lib/flex_array.c | 230 ++++++++++++++++++++++++++
> 3 files changed, 276 insertions(+), 1 deletion(-)
>
>diff -puN /dev/null include/linux/flex_array.h
>--- /dev/null 2008-09-02 09:40:19.000000000 -0700
>+++ linux-2.6.git-dave/include/linux/flex_array.h 2009-07-21 14:55:35.000000000 -0700
>@@ -0,0 +1,45 @@
>+#ifndef _FLEX_ARRAY_H
>+#define _FLEX_ARRAY_H
>+
>+#include <linux/types.h>
>+#include <asm/page.h>
>+
>+#define FLEX_ARRAY_PART_SIZE PAGE_SIZE
>+#define FLEX_ARRAY_BASE_SIZE PAGE_SIZE
>+
>+struct flex_array_part;
>+
>+/*
>+ * This is meant too replace cases where an array-like


s/too/to/


>+ * structure has gotten to big to fit into kmalloc()


s/to big/too big/


>+ * and the developer is getting tempted to use
>+ * vmalloc().
>+ */
>+
>+struct flex_array {
>+ union {
>+ struct {
>+ int nr_elements;
>+ int element_size;
>+ struct flex_array_part *parts[0];
>+ };
>+ /*
>+ * This little trick makes sure that
>+ * sizeof(flex_array) == PAGE_SIZE
>+ */
>+ char padding[FLEX_ARRAY_BASE_SIZE];
>+ };
>+};
>+
>+#define FLEX_ARRAY_INIT(size, total) {{{\
>+ .element_size = (size), \
>+ .nr_elements = 0, \
>+}}}
>+
>+struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags);
>+void flex_array_free(struct flex_array *fa);
>+int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags);
>+int flex_array_append(struct flex_array *fa, void *src, gfp_t flags);
>+void *flex_array_get(struct flex_array *fa, int element_nr);
>+
>+#endif /* _FLEX_ARRAY_H */
>diff -puN /dev/null lib/flex_array.c
>--- /dev/null 2008-09-02 09:40:19.000000000 -0700
>+++ linux-2.6.git-dave/lib/flex_array.c 2009-07-21 14:52:09.000000000 -0700
>@@ -0,0 +1,230 @@
>+/*
>+ * Flexible array managed in PAGE_SIZE parts
>+ *
>+ * This program is free software; you can redistribute it and/or modify
>+ * it under the terms of the GNU General Public License as published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+ * GNU General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public License
>+ * along with this program; if not, write to the Free Software
>+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>+ *
>+ * Copyright IBM Corporation, 2009
>+ *
>+ * Author: Dave Hansen <[email protected]>
>+ */
>+
>+#include <linux/flex_array.h>
>+#include <linux/slab.h>
>+#include <linux/stddef.h>
>+
>+struct flex_array_part {
>+ char elements[FLEX_ARRAY_PART_SIZE];
>+};
>+
>+static inline int __elements_per_part(int element_size)
>+{
>+ return FLEX_ARRAY_PART_SIZE / element_size;
>+}
>+
>+static inline int __nr_part_ptrs(void)


How about __nr_ptrs_in_part()?


>+{
>+ int element_offset = offsetof(struct flex_array, parts);
>+ int bytes_left = FLEX_ARRAY_BASE_SIZE - element_offset;
>+ return bytes_left / sizeof(struct flex_array_part *);
>+}
>+
>+/**
>+ * flex_array_alloc - allocate a new flexible array
>+ * @element_size: the size of individual elements in the array
>+ * @total: total number of elements that this should hold
>+ *
>+ * Note: all locking must be provided by the caller.
>+ *
>+ * We do not actually use @total to size the allocation at this
>+ * point. It is just used to ensure that the user does not try
>+ * to use this structure for something larger than it can handle
>+ * later on.
>+ *
>+ * The maximum number of elements is defined as: the number of
>+ * elements that can be stored in a page times the number of
>+ * page pointers that we can fit in the base structure or (using
>+ * integer math):
>+ *
>+ * (PAGE_SIZE/element_size) * (PAGE_SIZE-8)/sizeof(void *)
>+ *
>+ * Here's a table showing example capacities. Note that the maximum
>+ * index that the get/put() functions is just nr_objects-1.
>+ *
>+ * Element size | Objects | Objects |
>+ * PAGE_SIZE=4k | 32-bit | 64-bit |
>+ * ----------------------------------|
>+ * 1 byte | 4186112 | 2093056 |
>+ * 2 bytes | 2093056 | 1046528 |
>+ * 3 bytes | 1395030 | 697515 |
>+ * 4 bytes | 1046528 | 523264 |
>+ * 32 bytes | 130816 | 65408 |
>+ * 33 bytes | 126728 | 63364 |
>+ * 2048 bytes | 2044 | 10228 |
>+ * 2049 bytes | 1022 | 511 |
>+ * void * | 1046528 | 261632 |
>+ *
>+ * Since 64-bit pointers are twice the size, we lose half the
>+ * capacity in the base structure. Also note that no effort is made
>+ * to efficiently pack objects across page boundaries.
>+ */
>+struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags)
>+{
>+ struct flex_array *ret;
>+ int max_size = __nr_part_ptrs() * __elements_per_part(element_size);
>+
>+ /* max_size will end up 0 if element_size > PAGE_SIZE */
>+ if (total > max_size)
>+ return NULL;
>+ ret = kzalloc(sizeof(struct flex_array), flags);
>+ if (!ret)
>+ return NULL;
>+ ret->element_size = element_size;
>+ return ret;
>+}
>+
>+static int fa_element_to_part_nr(struct flex_array *fa, int element_nr)
>+{
>+ return element_nr / __elements_per_part(fa->element_size);
>+}
>+
>+void flex_array_free(struct flex_array *fa)
>+{
>+ int part_nr;
>+ int max_part;
>+
>+ /* keeps us from getting the index of -1 below */
>+ if (!fa->nr_elements)
>+ goto free_base;
>+
>+ /* we really want the *index* of the last element, thus the -1 */
>+ max_part = fa_element_to_part_nr(fa, fa->nr_elements-1);
>+ for (part_nr = 0; part_nr <= max_part; part_nr++)
>+ kfree(fa->parts[part_nr]);
>+free_base:
>+ kfree(fa);
>+}
>+
>+static int fa_index_inside_part(struct flex_array *fa, int element_nr)
>+{
>+ return (element_nr % __elements_per_part(fa->element_size));
>+}
>+
>+static int offset_inside_part(struct flex_array *fa, int element_nr)



How about index_in_part()?


>+{
>+ int part_offset = fa_index_inside_part(fa, element_nr);
>+ return part_offset * fa->element_size;
>+}
>+
>+static int __check_part_and_nr(struct flex_array *fa,
>+ int part_nr, int element_nr)
>+{
>+ if (part_nr >= __nr_part_ptrs() ||
>+ element_nr > fa->nr_elements) {
>+ WARN(1, "bad flexible array element number: %d > %d\n",
>+ element_nr, fa->nr_elements);
>+ return -EINVAL;
>+ }
>+ return 0;
>+}
>+
>+static struct flex_array_part *
>+__fa_get_part(struct flex_array *fa, int part_nr, gfp_t flags)
>+{
>+ struct flex_array_part *part = NULL;
>+ if (__check_part_and_nr(fa, part_nr, fa->nr_elements))
>+ return NULL;
>+ part = fa->parts[part_nr];
>+ if (!part) {
>+ part = kmalloc(FLEX_ARRAY_PART_SIZE, flags);
>+ if (!part)
>+ return NULL;
>+ fa->parts[part_nr] = part;
>+ }
>+ return part;
>+}
>+
>+/**
>+ * flex_array_put - copy data into the array at @element_nr
>+ * @src: address of data to copy into the array
>+ * @element_nr: index of the position in which to insert
>+ * the new element.
>+ *
>+ * Note that this *copies* the contents of @src into
>+ * the array. If you are trying to store an array of
>+ * pointers, make sure to pass in &ptr instead of ptr.
>+ *
>+ * Locking must be provided by the caller.
>+ */
>+int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags)
>+{
>+ int part_nr = fa_element_to_part_nr(fa, element_nr);
>+ struct flex_array_part *part;
>+ void *dst;
>+
>+ part = __fa_get_part(fa, part_nr, flags);
>+ if (!part)
>+ return -ENOMEM;
>+ dst = &part->elements[offset_inside_part(fa, element_nr)];
>+ memcpy(dst, src, fa->element_size);
>+ return 0;
>+}
>+
>+/**
>+ * flex_array_append - append a new member into the array
>+ * @src: address of data to copy into the array
>+ *
>+ * This will use the internally-remembered last position in
>+ * the array to choose an insertion point.
>+ *
>+ * Note that this *copies* the contents of @src into
>+ * the array. If you are trying to store an array of
>+ * pointers, make sure to pass in &ptr instead of ptr.
>+ *
>+ * Locking must be provided by the caller.
>+ */
>+int flex_array_append(struct flex_array *fa, void *src, gfp_t flags)
>+{
>+ int ret = flex_array_put(fa, fa->nr_elements, src, flags);
>+ if (!ret)
>+ fa->nr_elements++;


This looks ugly...

Why not just ++nr_elements in flex_array_put()?


>+ return ret;
>+}
>+
>+
>+/**
>+ * flex_array_get - pull data back out of the array
>+ * @element_nr: index of the element to fetch from the array
>+ *
>+ * Returns a pointer to the data at index @element_nr. Note
>+ * that this is a copy of the data that was passed in. If you
>+ * are using this to store pointers, you'll get back &ptr.
>+ *
>+ * Locking must be provided by the caller.
>+ */
>+void *flex_array_get(struct flex_array *fa, int element_nr)
>+{
>+ int part_nr = fa_element_to_part_nr(fa, element_nr);
>+ struct flex_array_part *part;
>+ int offset;
>+
>+ if (__check_part_and_nr(fa, part_nr, fa->nr_elements))
>+ return NULL;
>+ if (!fa->parts[part_nr])
>+ return NULL;
>+
>+ part = fa->parts[part_nr];
>+ offset = offset_inside_part(fa, element_nr);
>+ return &part->elements[offset_inside_part(fa, element_nr)];


'->nr_element' is corrected?


>+}
>diff -puN lib/Makefile~fa lib/Makefile
>--- linux-2.6.git/lib/Makefile~fa 2009-07-21 14:42:37.000000000 -0700
>+++ linux-2.6.git-dave/lib/Makefile 2009-07-21 14:42:37.000000000 -0700
>@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmd
> idr.o int_sqrt.o extable.o prio_tree.o \
> sha1.o irq_regs.o reciprocal_div.o argv_split.o \
> proportions.o prio_heap.o ratelimit.o show_mem.o \
>- is_single_threaded.o plist.o decompress.o
>+ is_single_threaded.o plist.o decompress.o flex_array.o
>
> lib-$(CONFIG_MMU) += ioremap.o
> lib-$(CONFIG_SMP) += cpumask.o
>diff -puN lib/radix-tree.c~fa lib/radix-tree.c
>diff -puN ./include/linux/radix-tree.h~fa ./include/linux/radix-tree.h

huh?? What is this?

2009-07-22 15:03:06

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 15:09 +0800, Amerigo Wang wrote:
> On Tue, Jul 21, 2009 at 03:00:17PM -0700, Dave Hansen wrote:
> >+static inline int __nr_part_ptrs(void)
>
> How about __nr_ptrs_in_part()?

That would be fine except it is the number of part pointers in the base.
I guess you're proving that I named it horribly. :)

> >+static int fa_index_inside_part(struct flex_array *fa, int element_nr)
> >+{
> >+ return (element_nr % __elements_per_part(fa->element_size));
> >+}
> >+
> >+static int offset_inside_part(struct flex_array *fa, int element_nr)
>
> How about index_in_part()?

Yeah, I guess that's decent. I'll go see how it feels when it gets
used.

-- Dave

2009-07-22 19:45:40

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Tue, Jul 21, 2009 at 03:00:17PM -0700, Dave Hansen wrote:
>
>
> Changes from v1:
> - to vs too typo
> - added __check_part_and_nr() and gave it a warning
> - fixed off-by-one check on __nr_part_ptrs()
> - addedFLEX_ARRAY_INIT() macro
> - some kerneldoc comments about the capacity
> with various sized objects
> - comments to note lack of locking semantice
>
> --
>
> Once a structure goes over PAGE_SIZE*2, we see occasional
> allocation failures. Some people have chosen to switch
> over to things like vmalloc() that will let them keep
> array-like access to such a large structures. But,
> vmalloc() has plenty of downsides.
>
> Here's an alternative. I think it's what Andrew was
> suggesting here:
>
> http://lkml.org/lkml/2009/7/2/518
>
> I call it a flexible array. It does all of its work in
> PAGE_SIZE bits, so never does an order>0 allocation.
> The base level has PAGE_SIZE-2*sizeof(int) bytes of
> storage for pointers to the second level. So, with a
> 32-bit arch, you get about 4MB (4183112 bytes) of total
> storage when the objects pack nicely into a page. It
> is half that on 64-bit because the pointers are twice
> the size.
>
> The interface is dirt simple. 4 functions:
> alloc_flex_array()
> free_flex_array()
> flex_array_put()
> flex_array_get()
>
> put() appends an item into the array while get() takes
> indexes and does array-style access.
>
> One thought is that we should perhaps make the base
> structure half the size on 32-bit arches. That will
> ensure that someone testing on 32-bit will not get
> bitten by the size shrinking by half when moving to
> 64-bit.
>
> We could also potentially just pass the "element_size"
> into each of the API functions instead of storing it
> internally. That would get us one more base pointer
> on 32-bit.
>
> The last improvement that I thought about was letting
> the individual array members span pages. In this
> implementation, if you have a 2049-byte object, it
> will only pack one of them into each "part" with
> no attempt to pack them. At this point, I don't think
> the added complexity would be worth it.
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/include/linux/flex_array.h | 45 +++++
> linux-2.6.git-dave/lib/Makefile | 2
> linux-2.6.git-dave/lib/flex_array.c | 230 ++++++++++++++++++++++++++
> 3 files changed, 276 insertions(+), 1 deletion(-)
>
> diff -puN /dev/null include/linux/flex_array.h
> --- /dev/null 2008-09-02 09:40:19.000000000 -0700
> +++ linux-2.6.git-dave/include/linux/flex_array.h 2009-07-21 14:55:35.000000000 -0700
> @@ -0,0 +1,45 @@
> +#ifndef _FLEX_ARRAY_H
> +#define _FLEX_ARRAY_H
> +
> +#include <linux/types.h>
> +#include <asm/page.h>
> +
> +#define FLEX_ARRAY_PART_SIZE PAGE_SIZE
> +#define FLEX_ARRAY_BASE_SIZE PAGE_SIZE
> +
> +struct flex_array_part;
> +
> +/*
> + * This is meant too replace cases where an array-like
> + * structure has gotten to big to fit into kmalloc()
> + * and the developer is getting tempted to use
> + * vmalloc().
> + */
> +
> +struct flex_array {
> + union {
> + struct {
> + int nr_elements;
> + int element_size;
> + struct flex_array_part *parts[0];
> + };
> + /*
> + * This little trick makes sure that
> + * sizeof(flex_array) == PAGE_SIZE
> + */
> + char padding[FLEX_ARRAY_BASE_SIZE];
> + };
> +};
> +
> +#define FLEX_ARRAY_INIT(size, total) {{{\
> + .element_size = (size), \
> + .nr_elements = 0, \
> +}}}
> +
> +struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags);
> +void flex_array_free(struct flex_array *fa);
> +int flex_array_put(struct flex_array *fa, int element_nr, void *src, gfp_t flags);
> +int flex_array_append(struct flex_array *fa, void *src, gfp_t flags);
> +void *flex_array_get(struct flex_array *fa, int element_nr);
> +
> +#endif /* _FLEX_ARRAY_H */
> diff -puN /dev/null lib/flex_array.c
> --- /dev/null 2008-09-02 09:40:19.000000000 -0700
> +++ linux-2.6.git-dave/lib/flex_array.c 2009-07-21 14:52:09.000000000 -0700
> @@ -0,0 +1,230 @@
> +/*
> + * Flexible array managed in PAGE_SIZE parts
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright IBM Corporation, 2009
> + *
> + * Author: Dave Hansen <[email protected]>
> + */
> +
> +#include <linux/flex_array.h>
> +#include <linux/slab.h>
> +#include <linux/stddef.h>
> +
> +struct flex_array_part {
> + char elements[FLEX_ARRAY_PART_SIZE];
> +};
> +
> +static inline int __elements_per_part(int element_size)
> +{
> + return FLEX_ARRAY_PART_SIZE / element_size;
> +}
> +
> +static inline int __nr_part_ptrs(void)
> +{
> + int element_offset = offsetof(struct flex_array, parts);
> + int bytes_left = FLEX_ARRAY_BASE_SIZE - element_offset;
> + return bytes_left / sizeof(struct flex_array_part *);
> +}
> +
> +/**
> + * flex_array_alloc - allocate a new flexible array
> + * @element_size: the size of individual elements in the array
> + * @total: total number of elements that this should hold
> + *
> + * Note: all locking must be provided by the caller.
> + *
> + * We do not actually use @total to size the allocation at this
> + * point. It is just used to ensure that the user does not try
> + * to use this structure for something larger than it can handle
> + * later on.
> + *
> + * The maximum number of elements is defined as: the number of
> + * elements that can be stored in a page times the number of
> + * page pointers that we can fit in the base structure or (using
> + * integer math):
> + *
> + * (PAGE_SIZE/element_size) * (PAGE_SIZE-8)/sizeof(void *)
> + *
> + * Here's a table showing example capacities. Note that the maximum
> + * index that the get/put() functions is just nr_objects-1.
> + *
> + * Element size | Objects | Objects |
> + * PAGE_SIZE=4k | 32-bit | 64-bit |
> + * ----------------------------------|
> + * 1 byte | 4186112 | 2093056 |
> + * 2 bytes | 2093056 | 1046528 |
> + * 3 bytes | 1395030 | 697515 |
> + * 4 bytes | 1046528 | 523264 |
> + * 32 bytes | 130816 | 65408 |
> + * 33 bytes | 126728 | 63364 |
> + * 2048 bytes | 2044 | 10228 |
> + * 2049 bytes | 1022 | 511 |
> + * void * | 1046528 | 261632 |
> + *
> + * Since 64-bit pointers are twice the size, we lose half the
> + * capacity in the base structure. Also note that no effort is made
> + * to efficiently pack objects across page boundaries.
> + */
> +struct flex_array *flex_array_alloc(int element_size, int total, gfp_t flags)
> +{
> + struct flex_array *ret;
> + int max_size = __nr_part_ptrs() * __elements_per_part(element_size);
> +
> + /* max_size will end up 0 if element_size > PAGE_SIZE */
> + if (total > max_size)
> + return NULL;
> + ret = kzalloc(sizeof(struct flex_array), flags);
> + if (!ret)
> + return NULL;
> + ret->element_size = element_size;
> + return ret;
> +}
> +
> +static int fa_element_to_part_nr(struct flex_array *fa, int element_nr)
> +{
> + return element_nr / __elements_per_part(fa->element_size);
> +}
> +
> +void flex_array_free(struct flex_array *fa)
> +{
> + int part_nr;
> + int max_part;
> +
> + /* keeps us from getting the index of -1 below */
> + if (!fa->nr_elements)
> + goto free_base;
> +
> + /* we really want the *index* of the last element, thus the -1 */
> + max_part = fa_element_to_part_nr(fa, fa->nr_elements-1);
> + for (part_nr = 0; part_nr <= max_part; part_nr++)
> + kfree(fa->parts[part_nr]);
> +free_base:
> + kfree(fa);
> +}
> +
> +static int fa_index_inside_part(struct flex_array *fa, int element_nr)
> +{
> + return (element_nr % __elements_per_part(fa->element_size));
> +}
> +
> +static int offset_inside_part(struct flex_array *fa, int element_nr)
> +{
> + int part_offset = fa_index_inside_part(fa, element_nr);
> + return part_offset * fa->element_size;
> +}
> +
> +static int __check_part_and_nr(struct flex_array *fa,
> + int part_nr, int element_nr)
> +{
> + if (part_nr >= __nr_part_ptrs() ||
> + element_nr > fa->nr_elements) {
> + WARN(1, "bad flexible array element number: %d > %d\n",
> + element_nr, fa->nr_elements);
> + return -EINVAL;
> + }
> + return 0;
> +}

Should the above be inline? Does it make sense to optimize the "common"
case and penalize inappropriate access by putting an unlikely() in
there? Or is it too early for this stuff?

I wonder how the *, /, and % ops will affect things that otherwise
would have been reduced to shifts and masks -- especially on the
"smaller" embedded archs.

Cheers,
-Matt Helsley

2009-07-22 19:56:20

by Mike Waychison

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

Dave Hansen wrote:
> Changes from v1:
> - to vs too typo
> - added __check_part_and_nr() and gave it a warning
> - fixed off-by-one check on __nr_part_ptrs()
> - addedFLEX_ARRAY_INIT() macro
> - some kerneldoc comments about the capacity
> with various sized objects
> - comments to note lack of locking semantice
>
> --
>
> Once a structure goes over PAGE_SIZE*2, we see occasional
> allocation failures. Some people have chosen to switch
> over to things like vmalloc() that will let them keep
> array-like access to such a large structures. But,
> vmalloc() has plenty of downsides.
>
> Here's an alternative. I think it's what Andrew was
> suggesting here:
>
> http://lkml.org/lkml/2009/7/2/518
>
> I call it a flexible array. It does all of its work in
> PAGE_SIZE bits, so never does an order>0 allocation.
> The base level has PAGE_SIZE-2*sizeof(int) bytes of
> storage for pointers to the second level. So, with a
> 32-bit arch, you get about 4MB (4183112 bytes) of total
> storage when the objects pack nicely into a page. It
> is half that on 64-bit because the pointers are twice
> the size.
>
> The interface is dirt simple. 4 functions:
> alloc_flex_array()
> free_flex_array()
> flex_array_put()
> flex_array_get()
>
> put() appends an item into the array while get() takes
> indexes and does array-style access.
>
> One thought is that we should perhaps make the base
> structure half the size on 32-bit arches. That will
> ensure that someone testing on 32-bit will not get
> bitten by the size shrinking by half when moving to
> 64-bit.
>
> We could also potentially just pass the "element_size"
> into each of the API functions instead of storing it
> internally. That would get us one more base pointer
> on 32-bit.
>
> The last improvement that I thought about was letting
> the individual array members span pages. In this
> implementation, if you have a 2049-byte object, it
> will only pack one of them into each "part" with
> no attempt to pack them. At this point, I don't think
> the added complexity would be worth it.
>
> Signed-off-by: Dave Hansen <[email protected]>
> ---
>
> linux-2.6.git-dave/include/linux/flex_array.h | 45 +++++
> linux-2.6.git-dave/lib/Makefile | 2
> linux-2.6.git-dave/lib/flex_array.c | 230 ++++++++++++++++++++++++++
> 3 files changed, 276 insertions(+), 1 deletion(-)
>
> diff -puN /dev/null include/linux/flex_array.h
> --- /dev/null 2008-09-02 09:40:19.000000000 -0700
> +++ linux-2.6.git-dave/include/linux/flex_array.h 2009-07-21 14:55:35.000000000 -0700
> @@ -0,0 +1,45 @@
> +#ifndef _FLEX_ARRAY_H
> +#define _FLEX_ARRAY_H
> +
> +#include <linux/types.h>
> +#include <asm/page.h>
> +
> +#define FLEX_ARRAY_PART_SIZE PAGE_SIZE
> +#define FLEX_ARRAY_BASE_SIZE PAGE_SIZE
> +
> +struct flex_array_part;
> +
> +/*
> + * This is meant too replace cases where an array-like
> + * structure has gotten to big to fit into kmalloc()
> + * and the developer is getting tempted to use
> + * vmalloc().
> + */
> +
> +struct flex_array {
> + union {
> + struct {
> + int nr_elements;
> + int element_size;
> + struct flex_array_part *parts[0];
> + };
> + /*
> + * This little trick makes sure that
> + * sizeof(flex_array) == PAGE_SIZE
> + */
> + char padding[FLEX_ARRAY_BASE_SIZE];
> + };
> +};
> +
> +#define FLEX_ARRAY_INIT(size, total) {{{\
> + .element_size = (size), \
> + .nr_elements = 0, \
> +}}}
> +

It's not clear how this guy is used. It will initialize a flex_array,
but how is somebody expected to free the parts that get associated with it?

Is there a fancy way to make declaring a flex_array on stack a
compile-time error?

2009-07-22 20:57:22

by Ben Blum

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

Does the array automatically grow if you give it more elements than
you tell it it can have? How about a resize() function that can be
used to either grow or shrink the array?

2009-07-22 21:55:42

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 13:57 -0700, Benjamin Blum wrote:
> Does the array automatically grow if you give it more elements than
> you tell it it can have?

The only limits it has or enforces are the structural and architectural
ones dictated by the layout.

> How about a resize() function that can be
> used to either grow or shrink the array?

I think growing is out of the question. It has a fixed maximum size
already. As for shrinking, there's probably a use case for when
something is large, then shrinks back down. But, I think I'd want to
see a user for it, otherwise I'm just guessing at it too much.

-- Dave

2009-07-22 22:00:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 15:55 -0400, Mike Waychison wrote:
> > +#define FLEX_ARRAY_INIT(size, total) {{{\
> > + .element_size = (size), \
> > + .nr_elements = 0, \
> > +}}}
> > +
>
> It's not clear how this guy is used. It will initialize a flex_array,
> but how is somebody expected to free the parts that get associated with it?

I tried to make that a bit more clear with the kerneldocs.
flex_array_free_parts() will free just the "parts" without touching the
"base" structure. Otherwise, this macro is used in the same way as
RADIX_TREE_INIT().

> Is there a fancy way to make declaring a flex_array on stack a
> compile-time error?

Not that I know of. One thing that I did which makes this a _bit_
easier to detect is that sizeof(struct flex_array)==PAGE_SIZE instead of
just leaving a "void members[0]" at the end of the struct. That means
that scripts/checkstack.pl should spot these.

I just double-checked:

$ objdump -d ../mhp-build/i386-qemu-smp/vmlinux | scripts/checkstack.pl i386
0xc02499d8 min_free_kbytes_sysctl_handler [vmlinux]: 4096
0xc0276619 do_sys_poll [vmlinux]: 896
0xc0276b49 do_select [vmlinux]: 704
...

That's after I added a 'struct flex_array f;' in
min_free_kbytes_sysctl_handler().

I'm happy to hear of anybody else's tricks, though. I guess I could do
a WARN_ONCE() in some of the flex_array*() functions if we detect an
address that looks to be on the stack. But, I can't think of any fancy
compile-time ones.

-- Dave

2009-07-22 22:03:47

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 11:30 -0700, Matt Helsley wrote:
> > +static int __check_part_and_nr(struct flex_array *fa,
> > + int part_nr, int element_nr)
> > +{
> > + if (part_nr >= __nr_part_ptrs() ||
> > + element_nr > fa->nr_elements) {
> > + WARN(1, "bad flexible array element number: %d > %d\n",
> > + element_nr, fa->nr_elements);
> > + return -EINVAL;
> > + }
> > + return 0;
> > +}
>
> Should the above be inline? Does it make sense to optimize the "common"
> case and penalize inappropriate access by putting an unlikely() in
> there? Or is it too early for this stuff?

I think I'll leave it to the compiler for now. Since we also don't have
a single user, I don't think we have an idea how hot of a path this
might get used in.

> I wonder how the *, /, and % ops will affect things that otherwise
> would have been reduced to shifts and masks -- especially on the
> "smaller" embedded archs.

I'm generally fine with rounding all these sizes to powers-of-two. But,
I do think it's a wee bit premature at this point.

-- Dave

2009-07-22 23:21:03

by Ben Blum

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, Jul 22, 2009 at 2:51 PM, Dave Hansen<[email protected]> wrote:
> On Wed, 2009-07-22 at 13:57 -0700, Benjamin Blum wrote:
>> Does the array automatically grow if you give it more elements than
>> you tell it it can have?
>
> The only limits it has or enforces are the structural and architectural
> ones dictated by the layout.
>
>> How about a resize() function that can be
>> used to either grow or shrink the array?
>
> I think growing is out of the question. ?It has a fixed maximum size
> already. ?As for shrinking, there's probably a use case for when
> something is large, then shrinks back down. ?But, I think I'd want to
> see a user for it, otherwise I'm just guessing at it too much.
>
> -- Dave
>
>

Check out the reallocate logic in pidlist_uniq from my patch (the
series you linked).

2009-07-23 02:46:36

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On 07/21/2009 03:00 PM, Dave Hansen wrote:
>
> Here's an alternative. I think it's what Andrew was
> suggesting here:
>
> http://lkml.org/lkml/2009/7/2/518
>
> I call it a flexible array. It does all of its work in
> PAGE_SIZE bits, so never does an order>0 allocation.
> The base level has PAGE_SIZE-2*sizeof(int) bytes of
> storage for pointers to the second level. So, with a
> 32-bit arch, you get about 4MB (4183112 bytes) of total
> storage when the objects pack nicely into a page. It
> is half that on 64-bit because the pointers are twice
> the size.
>

I'm wondering if there is any use case which would require scaling below
the PAGE_SIZE level... in which case it would be nice for it to
gracefully decay to a single kmalloc allocation + some metadata.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-07-23 05:41:41

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 16:20 -0700, Benjamin Blum wrote:
>
> >> How about a resize() function that can be
> >> used to either grow or shrink the array?
> >
> > I think growing is out of the question. It has a fixed maximum size
> > already. As for shrinking, there's probably a use case for when
> > something is large, then shrinks back down. But, I think I'd want to
> > see a user for it, otherwise I'm just guessing at it too much.
>
> Check out the reallocate logic in pidlist_uniq from my patch (the
> series you linked).

To me, it seems like you should just replace the cgroup_pidlist->list
with a 'struct flex_array *'. It sounds like you're concerned that a
large cgroup_pidlist that was later shrunk would take up too much space.

That's a valid concern and it would be quite possible to make a
flex_array_clean() or truncate() or something similar. Such a function,
given an index, could clean out the array at all points past the given
index. Kinda the opposite of prealloc().

At the same time, you could get the same effect by allocating a new flex
array and doing copies like you are now with the normal arrays.

-- Dave

2009-07-23 14:46:41

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFCv2][PATCH] flexible array implementation

On Wed, 2009-07-22 at 19:45 -0700, H. Peter Anvin wrote:
> > I call it a flexible array. It does all of its work in
> > PAGE_SIZE bits, so never does an order>0 allocation.
> > The base level has PAGE_SIZE-2*sizeof(int) bytes of
> > storage for pointers to the second level. So, with a
> > 32-bit arch, you get about 4MB (4183112 bytes) of total
> > storage when the objects pack nicely into a page. It
> > is half that on 64-bit because the pointers are twice
> > the size.
>
> I'm wondering if there is any use case which would require scaling below
> the PAGE_SIZE level... in which case it would be nice for it to
> gracefully decay to a single kmalloc allocation + some metadata.

For the pid_t case that actually spawned all this, I think it makes a
ton of sense. It should be pretty easy to just cannibalize the
base->part[] pointers to use for data if (total*element_size <=
bytes_allocated - sizeof(metadata)) in the base. We'd have to store the
requested allocation total, but that's not bad.

1. Do we use this mode *only* when using a small 'total'? Or, do we
use the mode only when accessing a small range of elements, like
indexes < 511.
2. Do we just use the mode when the indexes are large, but not sparse?
Say, when someone is only using indexes 1024->1535?
3. Do we ever allocate less than 1 page (i.e. kmalloc(2048)) for the
base given a small enough 'total' requested? If so, it gets much
harder to grow the array later on if we want to support resizing.
4. Do we ever promote under the covers from this mode up to the full
two-level one?

This also gives us more of a reason to store a flex_array->total and
enforce that the array is not allowed to grow past that.

-- Dave