2002-01-25 17:28:28

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

[resent because t-offline's outgoing mail server seems to be eating/
not delivering mail. sorry if it appears multiple times]

This patch fixes an reiserfs BUG() at boot time introduced by the
inode cleanups. The problem is that it passes a 20 char name to
kmem_cache_create() ("reiserfs_inode_cache") but kmem_cache_create()
only allows 19 character names and BUG()s for longer names.

The patch fixes this in a low tech approach. It's required to boot
a 2.5.3preX machine with reiserfs compiled in.

-Andi


Index: fs/reiserfs/super.c
===================================================================
RCS file: /cvs/linux/fs/reiserfs/super.c,v
retrieving revision 1.17
diff -u -u -r1.17 super.c
--- fs/reiserfs/super.c 2002/01/24 14:07:54 1.17
+++ fs/reiserfs/super.c 2002/01/24 22:03:34
@@ -153,7 +153,7 @@

static int init_inodecache(void)
{
- reiserfs_inode_cachep = kmem_cache_create("reiserfs_inode_cache",
+ reiserfs_inode_cachep = kmem_cache_create("reiser_inode_cache",
sizeof(struct reiserfs_inode_info),
0, SLAB_HWCACHE_ALIGN,
init_once, NULL);


2002-01-25 17:59:31

by John Levon

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Fri, Jan 25, 2002 at 06:28:08PM +0100, Andi Kleen wrote:

> kmem_cache_create() ("reiserfs_inode_cache") but kmem_cache_create()
> only allows 19 character names and BUG()s for longer names.

please apply this too then.

regards
john


--- mm/slab.c.old Fri Jan 25 17:54:11 2002
+++ mm/slab.c Fri Jan 25 17:55:18 2002
@@ -599,7 +599,10 @@
* @dtor: A destructor for the objects.
*
* Returns a ptr to the cache on success, NULL on failure.
- * Cannot be called within a int, but can be interrupted.
+ * Cannot be called within a interrupt, but can be interrupted.
+ *
+ * strlen(@name) must be less than %CACHE_NAMELEN.
+ *
* The @ctor is run when new pages are allocated by the cache
* and the @dtor is run before the pages are handed back.
* The flags are

2002-01-25 18:10:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time


On Fri, 25 Jan 2002, John Levon wrote:
>
> please apply this too then.

I would prefer instead just avoiding the copy altogether, and just save
the name pointer - with no length restrictions.

Right now the code has the comment

/* Copy name over so we don't have problems with unloaded modules */

but that was written before "kmem_cache_destroy()" existed, and we should
long ago have fixed any modules that don't properly destroy their caches
when they exit (and yes, I know the difference between "should" and "did",
but that's not an excuse for a bad interface).

Linus

2002-01-25 18:32:02

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Jan 25, 2002 10:08 -0800, Linus Torvalds wrote:
> I would prefer instead just avoiding the copy altogether, and just save
> the name pointer - with no length restrictions.
>
> Right now the code has the comment
>
> /* Copy name over so we don't have problems with unloaded modules */

Yes, I put that in.

> but that was written before "kmem_cache_destroy()" existed, and we should
> long ago have fixed any modules that don't properly destroy their caches
> when they exit (and yes, I know the difference between "should" and "did",
> but that's not an excuse for a bad interface).

The problem is that if, for some reason, the cache is NOT empty when you
call kmem_cache_destroy(), it will not be freed, but the module exits
anyways. Then, any access to /proc/slabinfo will OOPS.

Yes, code should be written correctly so that its slab is empty when it
exits, but I'd rather have a _bit_ of safety here so that you can at
least check slabinfo when you get a kernel message "slab is not empty"
(or whatever it is) so you can at least try and investigate the problem.

The other alternative is to BUG with enough information to figure out
the status of this cache if you try to free a non-empty cache. At
least then you would get some data at the time the real problem happens
as opposed to killing some random process later that tries to read
slabinfo.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-25 18:51:57

by Hans Reiser

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

We are preparing a large set of patches (each to be sent as a separate
email per your SOP, but all tested together in a thorough testing
procedure), which includes this patch, plus the kdev fix, plus all 2.4
fixes. We expect to send it out on monday, we just found a bug in the
patches as a set that was not present in them individually.

Hans


2002-01-25 19:49:26

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Fri, Jan 25, 2002 at 10:08:56AM -0800, Linus Torvalds wrote:
>
> On Fri, 25 Jan 2002, John Levon wrote:
> >
> > please apply this too then.
>
> I would prefer instead just avoiding the copy altogether, and just save
> the name pointer - with no length restrictions.
>
> Right now the code has the comment
>
> /* Copy name over so we don't have problems with unloaded modules */
>
> but that was written before "kmem_cache_destroy()" existed, and we should
> long ago have fixed any modules that don't properly destroy their caches
> when they exit (and yes, I know the difference between "should" and "did",
> but that's not an excuse for a bad interface).

This patch implements your suggestion. It works for me, but I haven't audited
the whole tree if they do kmem_cache_destroy as needed. It tries to avoid
oopses for unmapped or bogus names on /proc/slabinfo reading at least. Also
fixes an warning in slab.c

-Andi

Index: mm/slab.c
===================================================================
RCS file: /cvs/linux/mm/slab.c,v
retrieving revision 1.66
diff -u -u -r1.66 slab.c
--- mm/slab.c 2002/01/08 16:00:20 1.66
+++ mm/slab.c 2002/01/25 19:47:20
@@ -186,8 +186,6 @@
* manages a cache.
*/

-#define CACHE_NAMELEN 20 /* max name length for a slab cache */
-
struct kmem_cache_s {
/* 1) each alloc & free */
/* full, partial first, then free */
@@ -225,7 +223,7 @@
unsigned long failures;

/* 3) cache creation/removal */
- char name[CACHE_NAMELEN];
+ const char *name;
struct list_head next;
#ifdef CONFIG_SMP
/* 4) per-cpu data */
@@ -335,6 +333,7 @@
kmem_cache_t *cs_dmacachep;
} cache_sizes_t;

+/* These are the default caches for kmalloc. Custom caches can have other sizes. */
static cache_sizes_t cache_sizes[] = {
#if PAGE_SIZE == 4096
{ 32, NULL, NULL},
@@ -353,6 +352,26 @@
{131072, NULL, NULL},
{ 0, NULL, NULL}
};
+/* Must match cache_sizes above. Out of line to keep cache footprint low. */
+#define CN(x) { x, x ## "(DMA)" }
+static char cache_names[][2][18] = {
+#if PAGE_SIZE == 4096
+ CN("size-32"),
+#endif
+ CN("size-64"),
+ CN("size-128"),
+ CN("size-256"),
+ CN("size-512"),
+ CN("size-1024"),
+ CN("size-2048"),
+ CN("size-4096"),
+ CN("size-8192"),
+ CN("size-16384"),
+ CN("size-32768"),
+ CN("size-65536"),
+ CN("size-131072")
+};
+#undef CN

/* internal cache of cache description objs */
static kmem_cache_t cache_cache = {
@@ -437,7 +456,6 @@
void __init kmem_cache_sizes_init(void)
{
cache_sizes_t *sizes = cache_sizes;
- char name[20];
/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory.
@@ -450,9 +468,9 @@
* eliminates "false sharing".
* Note for systems short on memory removing the alignment will
* allow tighter packing of the smaller caches. */
- sprintf(name,"size-%Zd",sizes->cs_size);
if (!(sizes->cs_cachep =
- kmem_cache_create(name, sizes->cs_size,
+ kmem_cache_create(cache_names[sizes-cache_sizes][0],
+ sizes->cs_size,
0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
BUG();
}
@@ -462,9 +480,10 @@
offslab_limit = sizes->cs_size-sizeof(slab_t);
offslab_limit /= 2;
}
- sprintf(name, "size-%Zd(DMA)",sizes->cs_size);
- sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
- SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
+ sizes->cs_dmacachep = kmem_cache_create(
+ cache_names[sizes-cache_sizes][1],
+ sizes->cs_size, 0,
+ SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!sizes->cs_dmacachep)
BUG();
sizes++;
@@ -604,6 +623,11 @@
* Cannot be called within a int, but can be interrupted.
* The @ctor is run when new pages are allocated by the cache
* and the @dtor is run before the pages are handed back.
+ *
+ * @name must be valid until the cache is destroyed. This implies that
+ * the module calling this has to destroy the cache before getting
+ * unloaded.
+ *
* The flags are
*
* %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
@@ -632,7 +656,6 @@
* Sanity checks... these are all serious usage bugs.
*/
if ((!name) ||
- ((strlen(name) >= CACHE_NAMELEN - 1)) ||
in_interrupt() ||
(size < BYTES_PER_WORD) ||
(size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
@@ -797,8 +820,7 @@
cachep->slabp_cache = kmem_find_general_cachep(slab_size,0);
cachep->ctor = ctor;
cachep->dtor = dtor;
- /* Copy name over so we don't have problems with unloaded modules */
- strcpy(cachep->name, name);
+ cachep->name = name;

#ifdef CONFIG_SMP
if (g_cpucache_up)
@@ -810,11 +832,8 @@
struct list_head *p;

list_for_each(p, &cache_chain) {
- kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);
-
- /* The name field is constant - no lock needed. */
- if (!strcmp(pc->name, name))
- BUG();
+ kmem_cache_t *pc;
+ pc = list_entry(p, kmem_cache_t, next);
}
}

@@ -1878,6 +1897,7 @@
unsigned long num_objs;
unsigned long active_slabs = 0;
unsigned long num_slabs;
+ const char *name;
cachep = list_entry(p, kmem_cache_t, next);

spin_lock_irq(&cachep->spinlock);
@@ -1906,8 +1926,15 @@
num_slabs+=active_slabs;
num_objs = num_slabs*cachep->num;

+ name = cachep->name;
+ {
+ char tmp;
+ if (get_user(tmp, name))
+ name = "broken";
+ }
+
len += sprintf(page+len, "%-17s %6lu %6lu %6u %4lu %4lu %4u",
- cachep->name, active_objs, num_objs, cachep->objsize,
+ name, active_objs, num_objs, cachep->objsize,
active_slabs, num_slabs, (1<<cachep->gfporder));

#if STATS


2002-01-25 20:39:04

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Jan 25, 2002 20:49 +0100, Andi Kleen wrote:
> @@ -810,11 +832,8 @@
> struct list_head *p;
>
> list_for_each(p, &cache_chain) {
> - kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);
> -
> - /* The name field is constant - no lock needed. */
> - if (!strcmp(pc->name, name))
> - BUG();
> + kmem_cache_t *pc;
> + pc = list_entry(p, kmem_cache_t, next);
> }
> }
>

So, what exactly does the above do now (hint: p and pc are both local
so they cannot be referenced anywhere else)? It used to check that you
weren't trying to add two caches with the same name. This isn't
possible with caches from broken modules anymore as they have no name.

In the end, it is mostly irrelevant if we have duplicate names in the
slab cache, because you can't "attach" to a cache by name (you can
only "create" a cache and access it via a pointer). We may as well
just remove the whole loop above, since it doesn't do anything anymore.

> + name = cachep->name;
> + {
> + char tmp;
> + if (get_user(tmp, name))
> + name = "broken";
> + }

When calling kmem_cache_destroy() on a non-empty slab we should just
malloc some memory with the old cache name + "_leaked" for the name
pointer. At least then we have a sane chance of figuring out what caused
the problem, instead of having a bunch of "broken" entries in the table,
and remove the above "broken" check entirely (we will always have a name).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-25 22:16:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Fri, Jan 25, 2002 at 01:38:14PM -0700, Andreas Dilger wrote:
> So, what exactly does the above do now (hint: p and pc are both local
> so they cannot be referenced anywhere else)? It used to check that you
> weren't trying to add two caches with the same name. This isn't
> possible with caches from broken modules anymore as they have no name.

I have fixed the loop now to check for names again.

> When calling kmem_cache_destroy() on a non-empty slab we should just
> malloc some memory with the old cache name + "_leaked" for the name
> pointer. At least then we have a sane chance of figuring out what caused
> the problem, instead of having a bunch of "broken" entries in the table,
> and remove the above "broken" check entirely (we will always have a name).

I don't like this because it complicates the code too much.
"broken" should be enough to debug it.

New patch appended. Linus please apply if you didn't already.

-Andi


Index: mm/slab.c
===================================================================
RCS file: /cvs/linux/mm/slab.c,v
retrieving revision 1.66
diff -u -u -r1.66 slab.c
--- mm/slab.c 2002/01/08 16:00:20 1.66
+++ mm/slab.c 2002/01/25 22:14:40
@@ -186,8 +186,6 @@
* manages a cache.
*/

-#define CACHE_NAMELEN 20 /* max name length for a slab cache */
-
struct kmem_cache_s {
/* 1) each alloc & free */
/* full, partial first, then free */
@@ -225,7 +223,7 @@
unsigned long failures;

/* 3) cache creation/removal */
- char name[CACHE_NAMELEN];
+ const char *name;
struct list_head next;
#ifdef CONFIG_SMP
/* 4) per-cpu data */
@@ -335,6 +333,7 @@
kmem_cache_t *cs_dmacachep;
} cache_sizes_t;

+/* These are the default caches for kmalloc. Custom caches can have other sizes. */
static cache_sizes_t cache_sizes[] = {
#if PAGE_SIZE == 4096
{ 32, NULL, NULL},
@@ -353,6 +352,26 @@
{131072, NULL, NULL},
{ 0, NULL, NULL}
};
+/* Must match cache_sizes above. Out of line to keep cache footprint low. */
+#define CN(x) { x, x ## "(DMA)" }
+static char cache_names[][2][18] = {
+#if PAGE_SIZE == 4096
+ CN("size-32"),
+#endif
+ CN("size-64"),
+ CN("size-128"),
+ CN("size-256"),
+ CN("size-512"),
+ CN("size-1024"),
+ CN("size-2048"),
+ CN("size-4096"),
+ CN("size-8192"),
+ CN("size-16384"),
+ CN("size-32768"),
+ CN("size-65536"),
+ CN("size-131072")
+};
+#undef CN

/* internal cache of cache description objs */
static kmem_cache_t cache_cache = {
@@ -437,7 +456,6 @@
void __init kmem_cache_sizes_init(void)
{
cache_sizes_t *sizes = cache_sizes;
- char name[20];
/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory.
@@ -450,9 +468,9 @@
* eliminates "false sharing".
* Note for systems short on memory removing the alignment will
* allow tighter packing of the smaller caches. */
- sprintf(name,"size-%Zd",sizes->cs_size);
if (!(sizes->cs_cachep =
- kmem_cache_create(name, sizes->cs_size,
+ kmem_cache_create(cache_names[sizes-cache_sizes][0],
+ sizes->cs_size,
0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
BUG();
}
@@ -462,9 +480,10 @@
offslab_limit = sizes->cs_size-sizeof(slab_t);
offslab_limit /= 2;
}
- sprintf(name, "size-%Zd(DMA)",sizes->cs_size);
- sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
- SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
+ sizes->cs_dmacachep = kmem_cache_create(
+ cache_names[sizes-cache_sizes][1],
+ sizes->cs_size, 0,
+ SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!sizes->cs_dmacachep)
BUG();
sizes++;
@@ -604,6 +623,11 @@
* Cannot be called within a int, but can be interrupted.
* The @ctor is run when new pages are allocated by the cache
* and the @dtor is run before the pages are handed back.
+ *
+ * @name must be valid until the cache is destroyed. This implies that
+ * the module calling this has to destroy the cache before getting
+ * unloaded.
+ *
* The flags are
*
* %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
@@ -632,7 +656,6 @@
* Sanity checks... these are all serious usage bugs.
*/
if ((!name) ||
- ((strlen(name) >= CACHE_NAMELEN - 1)) ||
in_interrupt() ||
(size < BYTES_PER_WORD) ||
(size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
@@ -797,8 +820,7 @@
cachep->slabp_cache = kmem_find_general_cachep(slab_size,0);
cachep->ctor = ctor;
cachep->dtor = dtor;
- /* Copy name over so we don't have problems with unloaded modules */
- strcpy(cachep->name, name);
+ cachep->name = name;

#ifdef CONFIG_SMP
if (g_cpucache_up)
@@ -811,10 +833,11 @@

list_for_each(p, &cache_chain) {
kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);
-
- /* The name field is constant - no lock needed. */
- if (!strcmp(pc->name, name))
- BUG();
+ char tmp;
+ if (get_user(tmp,pc->name))
+ continue;
+ if (!strcmp(pc->name,name))
+ BUG();
}
}

@@ -1878,6 +1901,7 @@
unsigned long num_objs;
unsigned long active_slabs = 0;
unsigned long num_slabs;
+ const char *name;
cachep = list_entry(p, kmem_cache_t, next);

spin_lock_irq(&cachep->spinlock);
@@ -1906,8 +1930,15 @@
num_slabs+=active_slabs;
num_objs = num_slabs*cachep->num;

+ name = cachep->name;
+ {
+ char tmp;
+ if (get_user(tmp, name))
+ name = "broken";
+ }
+
len += sprintf(page+len, "%-17s %6lu %6lu %6u %4lu %4lu %4u",
- cachep->name, active_objs, num_objs, cachep->objsize,
+ name, active_objs, num_objs, cachep->objsize,
active_slabs, num_slabs, (1<<cachep->gfporder));

#if STATS

2002-01-25 22:34:27

by Andrea Ferraris

[permalink] [raw]
Subject: eth0: NULL pointer encountered in RX ring, skipping

After such kernel message the network interface (kernel 2.4.2) replied to
ping from other network's machines with times varying between 200 and 2000 ms
and it was almost impossible to use the network to transferring files to or
from this PC.
The card is a SIS960 on the mboard on an Acer PC with a Celeron 666 Mhz
processor.

The problem was solved with a shutdown -h, powering down and powering up
the PC.

Can somebody explain the trouble?

Best regards to all all and thx to replying people,

Andrea

2002-01-25 22:41:47

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Jan 25, 2002 23:15 +0100, Andi Kleen wrote:
> On Fri, Jan 25, 2002 at 01:38:14PM -0700, Andreas Dilger wrote:
> > When calling kmem_cache_destroy() on a non-empty slab we should just
> > malloc some memory with the old cache name + "_leaked" for the name
> > pointer. At least then we have a sane chance of figuring out what caused
> > the problem, instead of having a bunch of "broken" entries in the table,
> > and remove the above "broken" check entirely (we will always have a name).
>
> I don't like this because it complicates the code too much.
> "broken" should be enough to debug it.

Hmm, then you could just point to a static "broken" name at
kmem_cache_destroy() time and save yourself the get_user() checks
for each access to the name. This would gratuitously overwrite
the name for non-modular caches that failed to unload, but I doubt
that such things exist.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-25 23:00:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: eth0: NULL pointer encountered in RX ring, skipping

Well, the code says "this should never happen" ;-)

But anyway, it is probably a temporary memory allocation failure. The
code handles this case.

Jeff



--
Jeff Garzik | "I went through my candy like hot oatmeal
Building 1024 | through an internally-buttered weasel."
MandrakeSoft | - goats.com

2002-01-26 09:12:52

by kaih

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

[email protected] (Andi Kleen) wrote on 25.01.02 in <[email protected]>:

> +/* Must match cache_sizes above. Out of line to keep cache footprint low.
> */ +#define CN(x) { x, x ## "(DMA)" }
> +static char cache_names[][2][18] = {


> + CN("size-128"),
> + CN("size-256"),
> + CN("size-512"),


What on earth is that ## for?! If that actually works, I strongly suspect
that it is a bug in cpp.

MfG Kai

2002-01-26 10:13:50

by Andrea Ferraris

[permalink] [raw]
Subject: Re: eth0: NULL pointer encountered in RX ring, skipping

Friday 25 January 2002 23:59, Jeff Garzik scrisse:
> Well, the code says "this should never happen" ;-)
>
> But anyway, it is probably a temporary memory allocation failure. The
> code handles this case.

Yes, but I think that isn't normal to have to do a cold reboot to have the
machine again working on the network. It is, maybe that the code doesn't
handle so well this case. Do you suggest a kernel upgrade?

Andrea

2002-01-26 15:26:30

by Andrea Ferraris

[permalink] [raw]
Subject: OPS: eth0: NULL pointer encountered in RX ring, skipping

Saturday 26 January 2002 11:11, Andrea Ferraris scrisse:
> Friday 25 January 2002 23:59, Jeff Garzik scrisse:
> > Well, the code says "this should never happen" ;-)
> >
> > But anyway, it is probably a temporary memory allocation failure. The
> > code handles this case.
>
> Yes, but I think that isn't normal to have to do a cold reboot to have the
> machine again working on the network. It is, maybe that the code doesn't
> handle so well this case. Do you suggest a kernel upgrade?

Ops ... sorry. It was a 2.2.16 kernel.

Andrea

2002-01-27 23:02:31

by Alessandro Suardi

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

Andi Kleen wrote:
>
> On Fri, Jan 25, 2002 at 01:38:14PM -0700, Andreas Dilger wrote:
> > So, what exactly does the above do now (hint: p and pc are both local
> > so they cannot be referenced anywhere else)? It used to check that you
> > weren't trying to add two caches with the same name. This isn't
> > possible with caches from broken modules anymore as they have no name.
>
> I have fixed the loop now to check for names again.
>
> > When calling kmem_cache_destroy() on a non-empty slab we should just
> > malloc some memory with the old cache name + "_leaked" for the name
> > pointer. At least then we have a sane chance of figuring out what caused
> > the problem, instead of having a bunch of "broken" entries in the table,
> > and remove the above "broken" check entirely (we will always have a name).
>
> I don't like this because it complicates the code too much.
> "broken" should be enough to debug it.
>
> New patch appended. Linus please apply if you didn't already.
>
> -Andi



2.5.3-pre5 + this patch still can't boot my system. I haven't had
time to copy down oops at boot, will do if needed.

Thanks & ciao,



> Index: mm/slab.c
> ===================================================================
> RCS file: /cvs/linux/mm/slab.c,v
> retrieving revision 1.66
> diff -u -u -r1.66 slab.c
> --- mm/slab.c 2002/01/08 16:00:20 1.66
> +++ mm/slab.c 2002/01/25 22:14:40
> @@ -186,8 +186,6 @@
> * manages a cache.
> */
>
> -#define CACHE_NAMELEN 20 /* max name length for a slab cache */
> -
> struct kmem_cache_s {
> /* 1) each alloc & free */
> /* full, partial first, then free */
> @@ -225,7 +223,7 @@
> unsigned long failures;
>
> /* 3) cache creation/removal */
> - char name[CACHE_NAMELEN];
> + const char *name;
> struct list_head next;
> #ifdef CONFIG_SMP
> /* 4) per-cpu data */
> @@ -335,6 +333,7 @@
> kmem_cache_t *cs_dmacachep;
> } cache_sizes_t;
>
> +/* These are the default caches for kmalloc. Custom caches can have other sizes. */
> static cache_sizes_t cache_sizes[] = {
> #if PAGE_SIZE == 4096
> { 32, NULL, NULL},
> @@ -353,6 +352,26 @@
> {131072, NULL, NULL},
> { 0, NULL, NULL}
> };
> +/* Must match cache_sizes above. Out of line to keep cache footprint low. */
> +#define CN(x) { x, x ## "(DMA)" }
> +static char cache_names[][2][18] = {
> +#if PAGE_SIZE == 4096
> + CN("size-32"),
> +#endif
> + CN("size-64"),
> + CN("size-128"),
> + CN("size-256"),
> + CN("size-512"),
> + CN("size-1024"),
> + CN("size-2048"),
> + CN("size-4096"),
> + CN("size-8192"),
> + CN("size-16384"),
> + CN("size-32768"),
> + CN("size-65536"),
> + CN("size-131072")
> +};
> +#undef CN
>
> /* internal cache of cache description objs */
> static kmem_cache_t cache_cache = {
> @@ -437,7 +456,6 @@
> void __init kmem_cache_sizes_init(void)
> {
> cache_sizes_t *sizes = cache_sizes;
> - char name[20];
> /*
> * Fragmentation resistance on low memory - only use bigger
> * page orders on machines with more than 32MB of memory.
> @@ -450,9 +468,9 @@
> * eliminates "false sharing".
> * Note for systems short on memory removing the alignment will
> * allow tighter packing of the smaller caches. */
> - sprintf(name,"size-%Zd",sizes->cs_size);
> if (!(sizes->cs_cachep =
> - kmem_cache_create(name, sizes->cs_size,
> + kmem_cache_create(cache_names[sizes-cache_sizes][0],
> + sizes->cs_size,
> 0, SLAB_HWCACHE_ALIGN, NULL, NULL))) {
> BUG();
> }
> @@ -462,9 +480,10 @@
> offslab_limit = sizes->cs_size-sizeof(slab_t);
> offslab_limit /= 2;
> }
> - sprintf(name, "size-%Zd(DMA)",sizes->cs_size);
> - sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
> - SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
> + sizes->cs_dmacachep = kmem_cache_create(
> + cache_names[sizes-cache_sizes][1],
> + sizes->cs_size, 0,
> + SLAB_CACHE_DMA|SLAB_HWCACHE_ALIGN, NULL, NULL);
> if (!sizes->cs_dmacachep)
> BUG();
> sizes++;
> @@ -604,6 +623,11 @@
> * Cannot be called within a int, but can be interrupted.
> * The @ctor is run when new pages are allocated by the cache
> * and the @dtor is run before the pages are handed back.
> + *
> + * @name must be valid until the cache is destroyed. This implies that
> + * the module calling this has to destroy the cache before getting
> + * unloaded.
> + *
> * The flags are
> *
> * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
> @@ -632,7 +656,6 @@
> * Sanity checks... these are all serious usage bugs.
> */
> if ((!name) ||
> - ((strlen(name) >= CACHE_NAMELEN - 1)) ||
> in_interrupt() ||
> (size < BYTES_PER_WORD) ||
> (size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
> @@ -797,8 +820,7 @@
> cachep->slabp_cache = kmem_find_general_cachep(slab_size,0);
> cachep->ctor = ctor;
> cachep->dtor = dtor;
> - /* Copy name over so we don't have problems with unloaded modules */
> - strcpy(cachep->name, name);
> + cachep->name = name;
>
> #ifdef CONFIG_SMP
> if (g_cpucache_up)
> @@ -811,10 +833,11 @@
>
> list_for_each(p, &cache_chain) {
> kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);
> -
> - /* The name field is constant - no lock needed. */
> - if (!strcmp(pc->name, name))
> - BUG();
> + char tmp;
> + if (get_user(tmp,pc->name))
> + continue;
> + if (!strcmp(pc->name,name))
> + BUG();
> }
> }
>
> @@ -1878,6 +1901,7 @@
> unsigned long num_objs;
> unsigned long active_slabs = 0;
> unsigned long num_slabs;
> + const char *name;
> cachep = list_entry(p, kmem_cache_t, next);
>
> spin_lock_irq(&cachep->spinlock);
> @@ -1906,8 +1930,15 @@
> num_slabs+=active_slabs;
> num_objs = num_slabs*cachep->num;
>
> + name = cachep->name;
> + {
> + char tmp;
> + if (get_user(tmp, name))
> + name = "broken";
> + }
> +
> len += sprintf(page+len, "%-17s %6lu %6lu %6u %4lu %4lu %4u",
> - cachep->name, active_objs, num_objs, cachep->objsize,
> + name, active_objs, num_objs, cachep->objsize,
> active_slabs, num_slabs, (1<<cachep->gfporder));
>
> #if STATS

--alessandro

"this machine will, will not communicate
these thoughts and the strain I am under
be a world child, form a circle before we all go under"
(Radiohead, "Street Spirit [fade out]")

2002-01-28 00:02:05

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Mon, Jan 28, 2002 at 12:02:54AM +0100, Alessandro Suardi wrote:
>
> 2.5.3-pre5 + this patch still can't boot my system. I haven't had
> time to copy down oops at boot, will do if needed.

Please do. I cannot see anything in the patch that should prevent bootup
though, so I would also recommend a make clean and recompile first just
to make sure it isn't a broken build.

-Andi

2002-01-28 11:08:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Mon, Jan 28 2002, Andi Kleen wrote:
> On Mon, Jan 28, 2002 at 12:02:54AM +0100, Alessandro Suardi wrote:
> >
> > 2.5.3-pre5 + this patch still can't boot my system. I haven't had
> > time to copy down oops at boot, will do if needed.
>
> Please do. I cannot see anything in the patch that should prevent bootup
> though, so I would also recommend a make clean and recompile first just
> to make sure it isn't a broken build.

Probably the kmem_cache_create 'name too long' bug that Viro pointed out
to me. fs/reiserfs/super.c:init_inodecache(). Change the name passed to
kmem_cache_create to something shorter.

--
Jens Axboe

2002-01-28 14:53:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Mon, Jan 28, 2002 at 12:07:47PM +0100, Jens Axboe wrote:
> On Mon, Jan 28 2002, Andi Kleen wrote:
> > On Mon, Jan 28, 2002 at 12:02:54AM +0100, Alessandro Suardi wrote:
> > >
> > > 2.5.3-pre5 + this patch still can't boot my system. I haven't had
> > > time to copy down oops at boot, will do if needed.
> >
> > Please do. I cannot see anything in the patch that should prevent bootup
> > though, so I would also recommend a make clean and recompile first just
> > to make sure it isn't a broken build.
>
> Probably the kmem_cache_create 'name too long' bug that Viro pointed out
> to me. fs/reiserfs/super.c:init_inodecache(). Change the name passed to
> kmem_cache_create to something shorter.

The patch he tried removed the check of the name length ;)

-Andi

2002-01-28 14:55:06

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

On Mon, Jan 28 2002, Andi Kleen wrote:
> On Mon, Jan 28, 2002 at 12:07:47PM +0100, Jens Axboe wrote:
> > On Mon, Jan 28 2002, Andi Kleen wrote:
> > > On Mon, Jan 28, 2002 at 12:02:54AM +0100, Alessandro Suardi wrote:
> > > >
> > > > 2.5.3-pre5 + this patch still can't boot my system. I haven't had
> > > > time to copy down oops at boot, will do if needed.
> > >
> > > Please do. I cannot see anything in the patch that should prevent bootup
> > > though, so I would also recommend a make clean and recompile first just
> > > to make sure it isn't a broken build.
> >
> > Probably the kmem_cache_create 'name too long' bug that Viro pointed out
> > to me. fs/reiserfs/super.c:init_inodecache(). Change the name passed to
> > kmem_cache_create to something shorter.
>
> The patch he tried removed the check of the name length ;)

Heh, my cover is blown -- I didn't read it :-)

--
Jens Axboe

2002-01-29 13:13:42

by Alessandro Suardi

[permalink] [raw]
Subject: Re: [PATCH] Fix 2.5.3pre reiserfs BUG() at boot time

Andi Kleen wrote:
>
> On Mon, Jan 28, 2002 at 12:02:54AM +0100, Alessandro Suardi wrote:
> >
> > 2.5.3-pre5 + this patch still can't boot my system. I haven't had
> > time to copy down oops at boot, will do if needed.
>
> Please do. I cannot see anything in the patch that should prevent bootup
> though, so I would also recommend a make clean and recompile first just
> to make sure it isn't a broken build.

I ended up away from email for a couple of days and saw -pre6;
re-patched from 2.5.2 to -pre6, boot is okay .

Thanks & ciao,

--alessandro

"this machine will, will not communicate
these thoughts and the strain I am under
be a world child, form a circle before we all go under"
(Radiohead, "Street Spirit [fade out]")