2010-11-12 11:28:34

by Richard Kennedy

Subject: [PATCH/RFC] MM slub: add a sysfs entry to show the calculated number of fallback slabs

Add a slub sysfs entry to show the calculated number of fallback slabs.

Using the information already available it is straightforward to
calculate the number of fallback & full size slabs. We can then track
which slabs are particularly affected by memory fragmentation and how
long they take to recover.

There is no change to the mainline code, the calculation is only
performed on request, and the value is available without having to
enable CONFIG_SLUB_STATS.

Note that this could give the wrong value if the user changes the slab
order via the sysfs interface.

Signed-off-by: Richard Kennedy <[email protected]>
---


As we already have the information needed to do this calculation, it seems
useful to expose it and provide another way to understand what is happening
inside the memory manager.
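
To make the arithmetic concrete, here is a small userspace sketch of the
calculation with made-up numbers (the cache geometry and slab counts are
hypothetical, not taken from a real system):

#include <assert.h>
#include <stdio.h>

/*
 * Worked example of the calculation done in fallback_show().
 * Hypothetical cache: full-size slabs hold 32 objects, minimum-order
 * fallback slabs hold 4 objects.
 */
int main(void)
{
	unsigned long objects_per_slab = 32;	/* oo_objects(s->oo) */
	unsigned long objects_per_min = 4;	/* oo_objects(s->min) */

	/* pretend 10 full-size slabs and 3 fallback slabs exist */
	unsigned long nr_slabs = 13;
	unsigned long total_objects = 10 * objects_per_slab + 3 * objects_per_min;

	/* solve the two equations from the patch comment for nr_normal_slabs */
	unsigned long nr_normal = (total_objects - nr_slabs * objects_per_min) /
				  (objects_per_slab - objects_per_min);
	unsigned long nr_fallback = nr_slabs - nr_normal;

	assert(nr_fallback == 3);
	printf("fallback slabs: %lu\n", nr_fallback);
	return 0;
}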

On my desktop workloads (kernel compile etc) I'm seeing surprisingly
little slab fragmentation. Do you have any suggestions for test cases
that will fragment the memory?

I copied the code to count the total objects from the slabinfo s_show
function, but as I don't need the partial count I didn't extract it into
a helper function.

regards
Richard


diff --git a/mm/slub.c b/mm/slub.c
index 8fd5401..8c79eaa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4043,6 +4043,46 @@ static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
}
SLAB_ATTR_RO(destroy_by_rcu);

+/* The number of fallback slabs can be calculated to give an
+ * indication of how fragmented this slab is.
+ * This is a snapshot of the current makeup of this cache.
+ *
+ * Given
+ *
+ * total_objects = (nr_fallback_slabs * objects_per_fallback_slab) +
+ * ( nr_normal_slabs * objects_per_slab)
+ * and
+ * nr_slabs = nr_normal_slabs + nr_fallback_slabs
+ *
+ * then we can easily calculate nr_fallback_slabs.
+ *
+ * Note that this can give the wrong answer if the user has changed the
+ * order of this slab via sysfs.
+ */
+
+static ssize_t fallback_show(struct kmem_cache *s, char *buf)
+{
+	unsigned long nr_objects = 0;
+	unsigned long nr_slabs = 0;
+	unsigned long nr_fallback = 0;
+	unsigned long acc;
+	int node;
+
+	if (oo_order(s->oo) != oo_order(s->min)) {
+		for_each_online_node(node) {
+			struct kmem_cache_node *n = get_node(s, node);
+			nr_slabs += atomic_long_read(&n->nr_slabs);
+			nr_objects += atomic_long_read(&n->total_objects);
+		}
+		acc = nr_objects - nr_slabs * oo_objects(s->min);
+		acc /= (oo_objects(s->oo) - oo_objects(s->min));
+		nr_fallback = nr_slabs - acc;
+	}
+	return sprintf(buf, "%lu\n", nr_fallback);
+}
+SLAB_ATTR_RO(fallback);
+
+
#ifdef CONFIG_SLUB_DEBUG
static ssize_t slabs_show(struct kmem_cache *s, char *buf)
{
@@ -4329,6 +4369,7 @@ static struct attribute *slab_attrs[] = {
&reclaim_account_attr.attr,
&destroy_by_rcu_attr.attr,
&shrink_attr.attr,
+ &fallback_attr.attr,
#ifdef CONFIG_SLUB_DEBUG
&total_objects_attr.attr,
&slabs_attr.attr,


by Christoph Lameter

Subject: Re: [PATCH/RFC] MM slub: add a sysfs entry to show the calculated number of fallback slabs

On Fri, 12 Nov 2010, Richard Kennedy wrote:

> On my desktop workloads (kernel compile etc) I'm seeing surprisingly
> little slab fragmentation. Do you have any suggestions for test cases
> that will fragment the memory?

Do a massive scan through huge amounts of files that triggers inode and
dentry reclaim?
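
For illustration only (the directory to walk is whatever the tester
chooses), a trivial stat-walk like the one below will populate the dentry
and inode caches; applying memory pressure afterwards should reclaim part
of them and leave those slabs fragmented:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <sys/stat.h>
#include <stdio.h>

static unsigned long count;

/* each file visited brings a dentry and an inode into the caches */
static int visit(const char *path, const struct stat *sb,
		 int type, struct FTW *ftwbuf)
{
	(void)path; (void)sb; (void)type; (void)ftwbuf;
	count++;
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}
	if (nftw(argv[1], visit, 64, FTW_PHYS) == -1) {
		perror("nftw");
		return 1;
	}
	printf("visited %lu entries\n", count);
	return 0;
}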

> + * Note that this can give the wrong answer if the user has changed the
> + * order of this slab via sysfs.

Not good. Maybe have an additional counter in kmem_cache_node instead?

by Christoph Lameter

Subject: Re: [PATCH/RFC] MM slub: add a sysfs entry to show the calculated number of fallback slabs

On Fri, 12 Nov 2010, Richard Kennedy wrote:

> I know it's not ideal. Of course there already is a counter in
> CONFIG_SLUB_STATS but it only counts the total number of fallback slabs
> issued since boot time.
> I'm not sure if I can reliably decrement a fallback counter when a slab
> gets freed. If the size was changed then we could have slabs with several
> different sizes, and off the top of my head I'm not sure if I can
> identify which ones were created as fallback slabs. I don't suppose
> there's a spare flag anywhere.

There are lots of spare page flags that can be used for slabs. We just
decommissioned the use of one for SLUB_DEBUG. Look at the patch list and
revert that change, giving the flag a new name like SLUB_FALLBACK.
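
As a userspace model of that bookkeeping (not kernel code; the names and
hook points are invented for illustration): tag each slab as a fallback
when it is created at the minimum order, so the per-node counter can be
decremented reliably when it is freed, even if the cache order has been
changed via sysfs in between:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct slab {
	bool fallback;		/* stands in for a page flag */
};

static long nr_fallback;	/* stands in for a kmem_cache_node counter */

static struct slab *alloc_slab(bool order_fallback)
{
	struct slab *slab = malloc(sizeof(*slab));

	if (!slab)
		abort();
	slab->fallback = order_fallback;	/* set the flag at creation */
	if (slab->fallback)
		nr_fallback++;
	return slab;
}

static void free_slab(struct slab *slab)
{
	if (slab->fallback)			/* flag tells us what to undo */
		nr_fallback--;
	free(slab);
}

int main(void)
{
	struct slab *a = alloc_slab(false);	/* normal-order allocation */
	struct slab *b = alloc_slab(true);	/* allocated via the fallback path */

	assert(nr_fallback == 1);
	free_slab(b);
	assert(nr_fallback == 0);
	free_slab(a);
	printf("fallback count tracked correctly\n");
	return 0;
}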

2010-11-12 17:02:41

by Richard Kennedy

Subject: Re: [PATCH/RFC] MM slub: add a sysfs entry to show the calculated number of fallback slabs

On Fri, 2010-11-12 at 09:13 -0600, Christoph Lameter wrote:
> On Fri, 12 Nov 2010, Richard Kennedy wrote:
>
> > On my desktop workloads (kernel compile etc) I'm seeing surprisingly
> > little slab fragmentation. Do you have any suggestions for test cases
> > that will fragment the memory?
>
> Do a massive scan through huge amounts of files that triggers inode and
> dentry reclaim?

thanks, I'll give it a try.

> > + * Note that this can give the wrong answer if the user has changed the
> > + * order of this slab via sysfs.
>
> Not good. Maybe have an additional counter in kmem_cache_node instead?


I know it's not ideal. Of course there already is a counter in
CONFIG_SLUB_STATS but it only counts the total number of fallback slabs
issued since boot time.
I'm not sure if I can reliably decrement a fallback counter when a slab
gets freed. If the size was changed then we could have slabs with several
different sizes, and off the top of my head I'm not sure if I can
identify which ones were created as fallback slabs. I don't suppose
there's a spare flag anywhere.

I'll give this some more thought.

regards
Richard

2010-11-12 18:47:01

by David Rientjes

Subject: Re: [PATCH/RFC] MM slub: add a sysfs entry to show the calculated number of fallback slabs

On Fri, 12 Nov 2010, Richard Kennedy wrote:

> Add a slub sysfs entry to show the calculated number of fallback slabs.
>
> Using the information already available it is straightforward to
> calculate the number of fallback & full size slabs. We can then track
> which slabs are particularly affected by memory fragmentation and how
> long they take to recover.
>
> There is no change to the mainline code, the calculation is only
> performed on request, and the value is available without having to
> enable CONFIG_SLUB_STATS.
>

I don't see why this information is generally useful unless you are
debugging an issue where statistics such as ALLOC_SLOWPATH and ALLOC_SLAB
are already needed, in which case CONFIG_SLUB_STATS can be enabled to also
track ORDER_FALLBACK. All of the stats can be cleared by writing a '0' to
their sysfs stat files, so you can collect statistics for various workloads
without rebooting (after doing a forced shrink, for example).
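
For example, a rough sketch of that workflow (the "dentry" cache name is
only an example; this assumes CONFIG_SLUB_STATS is enabled and the program
runs as root):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		exit(1);
	}
	close(fd);
}

int main(void)
{
	char buf[64];
	ssize_t n;
	int fd;

	/* force a shrink to release empty and partial slabs first */
	write_str("/sys/kernel/slab/dentry/shrink", "1");

	/* clear the ORDER_FALLBACK statistic before the workload */
	write_str("/sys/kernel/slab/dentry/order_fallback", "0");

	/* ... run the workload of interest here ... */

	/* read back how many order fallbacks the workload caused */
	fd = open("/sys/kernel/slab/dentry/order_fallback", O_RDONLY);
	if (fd < 0) {
		perror("order_fallback");
		return 1;
	}
	n = read(fd, buf, sizeof(buf) - 1);
	if (n < 0)
		return 1;
	buf[n] = '\0';
	printf("order_fallback: %s", buf);
	close(fd);
	return 0;
}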