2002-06-01 08:36:42

by Andrew Morton

[permalink] [raw]
Subject: [patch 2/16] list_head debugging



A common and very subtle bug is to use list_heads which aren't on any
lists. It causes kernel memory corruption which is observed long after
the offending code has executed.

The patch nulls out the dangling pointers so we get a nice oops at the
site of the buggy code.


=====================================

--- 2.5.19/include/linux/list.h~list-debug Sat Jun 1 01:18:05 2002
+++ 2.5.19-akpm/include/linux/list.h Sat Jun 1 01:18:05 2002
@@ -94,6 +94,11 @@ static __inline__ void __list_del(struct
static __inline__ void list_del(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
+ /*
+ * This is debug. Remove it when the kernel has no bugs ;)
+ */
+ entry->next = 0;
+ entry->prev = 0;
}

/**

-


2002-06-01 17:19:14

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

Em Sat, Jun 01, 2002 at 01:40:03AM -0700, Andrew Morton escreveu:
> A common and very subtle bug is to use list_heads which aren't on any
> lists. It causes kernel memory corruption which is observed long after
> the offending code has executed.

> The patch nulls out the dangling pointers so we get a nice oops at the
> site of the buggy code.

> =====================================
>
> --- 2.5.19/include/linux/list.h~list-debug Sat Jun 1 01:18:05 2002
> +++ 2.5.19-akpm/include/linux/list.h Sat Jun 1 01:18:05 2002
> @@ -94,6 +94,11 @@ static __inline__ void __list_del(struct
> static __inline__ void list_del(struct list_head *entry)
> {
> __list_del(entry->prev, entry->next);

#ifdef CONFIG_DEBUG_LIST_DEL_NULLIFY

> + entry->next = 0;
> + entry->prev = 0;

#endif

> }

8) And get this configured in the Debug section of make *config

The kernel will always have bugs ;)

- Arnaldo

2002-06-03 13:55:35

by Jan Harkes

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

On Sat, Jun 01, 2002 at 01:40:03AM -0700, Andrew Morton wrote:
> The patch nulls out the dangling pointers so we get a nice oops at the
> site of the buggy code.
...
> static __inline__ void list_del(struct list_head *entry)
> {
> __list_del(entry->prev, entry->next);
> + /*
> + * This is debug. Remove it when the kernel has no bugs ;)
> + */
> + entry->next = 0;
> + entry->prev = 0;
> }

We've had this before, and it breaks some code that removes items from
lists as follows,

list_for_each(p, list)
if (condition)
list_del(p);

These would have to either use __list_del, or need to do,

for(p = list.next; p != &list;) {
struct list_head *n = p->next;
if (condition)
list_del(p);
p = n;
}

I'm not sure how many places are using the list_for_each/list_del
construction, but there were a couple when this was in the tree
previously. Converting most places that use list_del to use
list_del_init should fix the same bugs, but not cause problems for
existing code.

Just did a grep for list_del and didn't find any obvious places where we
are doing the above construction, except for drivers/isdn/capi/capilib.c
and maybe drivers/hotplug/pcihp_skeleton.c, but it could be hidden in
many more places by a macro or function call (or a larger loop than my 3
line context was showing). But there are not that many places where
we're calling list_del while not immediately re-initializing or adding
the unlinked list_head to another list.

You could probably also add list_move

list_move(entry, head)
{
if (!list_empty(entry))
__list_del(entry->prev, entry->next);
list_add(entry, head);
}

And just delete list_del completely, because all existing places where
list_del is currently used should probably use either list_del_init, or
list_move.

Jan

2002-06-03 20:19:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

Jan Harkes wrote:
>
> On Sat, Jun 01, 2002 at 01:40:03AM -0700, Andrew Morton wrote:
> > The patch nulls out the dangling pointers so we get a nice oops at the
> > site of the buggy code.
> ...
> > static __inline__ void list_del(struct list_head *entry)
> > {
> > __list_del(entry->prev, entry->next);
> > + /*
> > + * This is debug. Remove it when the kernel has no bugs ;)
> > + */
> > + entry->next = 0;
> > + entry->prev = 0;
> > }
>
> We've had this before, and it breaks some code that removes items from
> lists as follows,
>
> list_for_each(p, list)
> if (condition)
> list_del(p);

hmm. I suppose that's sane.

> These would have to either use __list_del, or need to do,
>
> for(p = list.next; p != &list;) {
> struct list_head *n = p->next;
> if (condition)
> list_del(p);
> p = n;
> }

list_for_each_safe() does this.

2002-06-03 20:55:16

by Rik van Riel

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

On Mon, 3 Jun 2002, Jan Harkes wrote:

> > static __inline__ void list_del(struct list_head *entry)
> > {
> > __list_del(entry->prev, entry->next);
> > + /*
> > + * This is debug. Remove it when the kernel has no bugs ;)
> > + */
> > + entry->next = 0;
> > + entry->prev = 0;
> > }
>
> We've had this before, and it breaks some code that removes items from
> lists as follows,

Such code is probably not SMP safe anyway.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-06-10 16:36:35

by Jan Harkes

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

On Mon, Jun 03, 2002 at 05:41:39PM -0300, Rik van Riel wrote:
> > We've had this before, and it breaks some code that removes items from
> > lists as follows,
>
> Such code is probably not SMP safe anyway.

Where are you coming from with that comment?

down(&semaphore);

list_for_each(p, list)
if (condition)
list_del(p);

up(&semaphore);

Should be completely SMP safe, or we have more serious problems than I
even imagined. And it worked just fine before the 'zero pointers in
list_del'-patch was included and is used in _at least_ two places in the
existing kernel, probably more.

And list_for_each_safe might work when list_del is zero-ing pointers,
but imho has an ugly interface, it still doesn't fix SMP issues. You
still need to prevent concurrent modifications, otherwise someone could
just as well remove the curr->next object and you corrupt/crash on
dereferencing the saved next pointer.

In fact this saved next pointer will far more likely to lead to obscure
and hard to debug crashes compared to leaving prev/next as they are in
the existing list_del function, just because people will think it is a
'safe' list iterator and forget about correctly locking their lists down.

Jan

2002-06-14 09:26:49

by Rik van Riel

[permalink] [raw]
Subject: Re: [patch 2/16] list_head debugging

On Mon, 10 Jun 2002, Jan Harkes wrote:
> On Mon, Jun 03, 2002 at 05:41:39PM -0300, Rik van Riel wrote:
> > > We've had this before, and it breaks some code that removes items from
> > > lists as follows,
> >
> > Such code is probably not SMP safe anyway.
>
> Where are you coming from with that comment?
>
> down(&semaphore);
>
> list_for_each(p, list)
> if (condition)
> list_del(p);
>
> up(&semaphore);
>
> Should be completely SMP safe,

Not if 'p' comes from the slab cache.

In that case 'p' can be re-allocated on another CPU
before we dereference ->next ...

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/