2010-06-24 03:16:47

by Nick Piggin

[permalink] [raw]
Subject: [patch 02/52] fs: fix superblock iteration race

list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.

list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.

Signed-off-by: Nick Piggin <[email protected]>
---
fs/dcache.c | 2 ++
fs/super.c | 6 ++++++
include/linux/list.h | 15 +++++++++++++++
3 files changed, 23 insertions(+)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -590,6 +590,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -374,6 +374,8 @@ void sync_supers(void)
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
}
@@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -585,6 +589,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
Index: linux-2.6/include/linux/list.h
===================================================================
--- linux-2.6.orig/include/linux/list.h
+++ linux-2.6/include/linux/list.h
@@ -544,6 +544,21 @@ static inline void list_splice_tail_init
&pos->member != (head); \
pos = n, n = list_entry(n->member.prev, typeof(*n), member))

+/**
+ * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
+ * @pos: the loop cursor used in the list_for_each_entry_safe loop
+ * @n: temporary storage used in list_for_each_entry_safe
+ * @member: the name of the list_struct within the struct.
+ *
+ * list_safe_reset_next is not safe to use in general if the list may be
+ * modified concurrently (eg. the lock is dropped in the loop body). An
+ * exception to this is if the cursor element (pos) is pinned in the list,
+ * and list_safe_reset_next is called after re-taking the lock and before
+ * completing the current iteration of the loop body.
+ */
+#define list_safe_reset_next(pos, n, member) \
+ n = list_entry(pos->member.next, typeof(*pos), member)
+
/*
* Double linked lists with a single pointer list head.
* Mostly useful for hash tables where the two pointer list head is


2010-06-29 13:02:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

This should actually be on it's way to Linus for .35, shouldn't it?

On Thu, Jun 24, 2010 at 01:02:14PM +1000, [email protected] wrote:
> list_for_each_entry_safe is not suitable to protect against concurrent
> modification of the list. 6754af6 introduced a race in sb walking.
>
> list_for_each_entry can use the trick of pinning the current entry in
> the list before we drop and retake the lock because it subsequently
> follows cur->next. However list_for_each_entry_safe saves n=cur->next
> for following before entering the loop body, so when the lock is
> dropped, n may be deleted.
>
> Signed-off-by: Nick Piggin <[email protected]>
> ---
> fs/dcache.c | 2 ++
> fs/super.c | 6 ++++++
> include/linux/list.h | 15 +++++++++++++++
> 3 files changed, 23 insertions(+)
>
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -590,6 +590,8 @@ static void prune_dcache(int count)
> up_read(&sb->s_umount);
> }
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> count -= pruned;
> __put_super(sb);
> /* more work left to do? */
> Index: linux-2.6/fs/super.c
> ===================================================================
> --- linux-2.6.orig/fs/super.c
> +++ linux-2.6/fs/super.c
> @@ -374,6 +374,8 @@ void sync_supers(void)
> up_read(&sb->s_umount);
>
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> }
> @@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
> up_read(&sb->s_umount);
>
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> spin_unlock(&sb_lock);
> @@ -585,6 +589,8 @@ static void do_emergency_remount(struct
> }
> up_write(&sb->s_umount);
> spin_lock(&sb_lock);
> + /* lock was dropped, must reset next */
> + list_safe_reset_next(sb, n, s_list);
> __put_super(sb);
> }
> spin_unlock(&sb_lock);
> Index: linux-2.6/include/linux/list.h
> ===================================================================
> --- linux-2.6.orig/include/linux/list.h
> +++ linux-2.6/include/linux/list.h
> @@ -544,6 +544,21 @@ static inline void list_splice_tail_init
> &pos->member != (head); \
> pos = n, n = list_entry(n->member.prev, typeof(*n), member))
>
> +/**
> + * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
> + * @pos: the loop cursor used in the list_for_each_entry_safe loop
> + * @n: temporary storage used in list_for_each_entry_safe
> + * @member: the name of the list_struct within the struct.
> + *
> + * list_safe_reset_next is not safe to use in general if the list may be
> + * modified concurrently (eg. the lock is dropped in the loop body). An
> + * exception to this is if the cursor element (pos) is pinned in the list,
> + * and list_safe_reset_next is called after re-taking the lock and before
> + * completing the current iteration of the loop body.
> + */
> +#define list_safe_reset_next(pos, n, member) \
> + n = list_entry(pos->member.next, typeof(*pos), member)
> +
> /*
> * Double linked lists with a single pointer list head.
> * Mostly useful for hash tables where the two pointer list head is
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
---end quoted text---

2010-06-29 14:56:24

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
> This should actually be on it's way to Linus for .35, shouldn't it?

Yeah, I was waiting for Al to reappear, but I think this is
probably the nicest way to solve the problem. Linus?
--
fs: fix superblock iteration race

list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af6 introduced a race in sb walking.

list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.

Signed-off-by: Nick Piggin <[email protected]>
---
fs/dcache.c | 2 ++
fs/super.c | 6 ++++++
include/linux/list.h | 15 +++++++++++++++
3 files changed, 23 insertions(+)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -590,6 +590,8 @@ static void prune_dcache(int count)
up_read(&sb->s_umount);
}
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
count -= pruned;
__put_super(sb);
/* more work left to do? */
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -374,6 +374,8 @@ void sync_supers(void)
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
}
@@ -405,6 +407,8 @@ void iterate_supers(void (*f)(struct sup
up_read(&sb->s_umount);

spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
@@ -585,6 +589,8 @@ static void do_emergency_remount(struct
}
up_write(&sb->s_umount);
spin_lock(&sb_lock);
+ /* lock was dropped, must reset next */
+ list_safe_reset_next(sb, n, s_list);
__put_super(sb);
}
spin_unlock(&sb_lock);
Index: linux-2.6/include/linux/list.h
===================================================================
--- linux-2.6.orig/include/linux/list.h
+++ linux-2.6/include/linux/list.h
@@ -544,6 +544,21 @@ static inline void list_splice_tail_init
&pos->member != (head); \
pos = n, n = list_entry(n->member.prev, typeof(*n), member))

+/**
+ * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
+ * @pos: the loop cursor used in the list_for_each_entry_safe loop
+ * @n: temporary storage used in list_for_each_entry_safe
+ * @member: the name of the list_struct within the struct.
+ *
+ * list_safe_reset_next is not safe to use in general if the list may be
+ * modified concurrently (eg. the lock is dropped in the loop body). An
+ * exception to this is if the cursor element (pos) is pinned in the list,
+ * and list_safe_reset_next is called after re-taking the lock and before
+ * completing the current iteration of the loop body.
+ */
+#define list_safe_reset_next(pos, n, member) \
+ n = list_entry(pos->member.next, typeof(*pos), member)
+
/*
* Double linked lists with a single pointer list head.
* Mostly useful for hash tables where the two pointer list head is

2010-06-29 17:36:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 7:56 AM, Nick Piggin <[email protected]> wrote:
> On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
>> This should actually be on it's way to Linus for .35, shouldn't it?
>
> Yeah, I was waiting for Al to reappear, but I think this is
> probably the nicest way to solve the problem. Linus?

I'll apply it. We have a couple of oopses listed for the superblock
iterator, and I haven't heard from Al. And the patch looks obviously
fine, whether it's actually the cause of some of the bugs or not.

Linus

2010-06-29 17:41:28

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 10:35:47AM -0700, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 7:56 AM, Nick Piggin <[email protected]> wrote:
> > On Tue, Jun 29, 2010 at 09:02:14AM -0400, Christoph Hellwig wrote:
> >> This should actually be on it's way to Linus for .35, shouldn't it?
> >
> > Yeah, I was waiting for Al to reappear, but I think this is
> > probably the nicest way to solve the problem. Linus?
>
> I'll apply it. We have a couple of oopses listed for the superblock
> iterator, and I haven't heard from Al. And the patch looks obviously
> fine, whether it's actually the cause of some of the bugs or not.

OK. I only have managed to get it into an infininte loop but I think
it would be surely possible to oops it because the next pointer can
be uninitialised memory at that point.

2010-06-29 17:52:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 10:41 AM, Nick Piggin <[email protected]> wrote:
> On Tue, Jun 29, 2010 at 10:35:47AM -0700, Linus Torvalds wrote:
>>
>> I'll apply it. We have a couple of oopses listed for the superblock
>> iterator, and I haven't heard from Al. And the patch looks obviously
>> fine, whether it's actually the cause of some of the bugs or not.
>
> OK. I only have managed to get it into an infininte loop but I think
> it would be surely possible to oops it because the next pointer can
> be uninitialised memory at that point.

Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
guarantee that it's the same thing, but it's "iterate_supers()"
getting an oops when it does "down_read(&sb->s_umount)". Which really
looks suspiciously like "sb" just being totally bogus, most likely
because of this same issue.

So I dunno, but I asked Al to look at it, and haven't heard back.

Regardless, I think your patch is the right thing to do (modulo any
syntactic issues - and I think your final version was the best of the
lot).

Linus

2010-06-29 18:03:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 10:52 AM, Linus Torvalds
<[email protected]> wrote:
>
> Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
> guarantee that it's the same thing, but it's "iterate_supers()"
> getting an oops [..]

Also, "Oops during closedown with 2.6.35-rc3-git3" is an
iterate_supers oops (in the jpg) and Chris says it's repeatable for
him.

Chris - you could try testing current -git now that I've merged Nick's
patch. It's commit 57439f878af ("fs: fix superblock iteration race"),
and I just pushed it out (so it might take a few minutes to mirror out
to the public git trees, but it should be there shortly).

Linus

2010-06-29 20:04:42

by Chris Clayton

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tuesday 29 June 2010, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 10:52 AM, Linus Torvalds
>
> <[email protected]> wrote:
> > Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
> > guarantee that it's the same thing, but it's "iterate_supers()"
> > getting an oops [..]
>
> Also, "Oops during closedown with 2.6.35-rc3-git3" is an
> iterate_supers oops (in the jpg) and Chris says it's repeatable for
> him.
>
> Chris - you could try testing current -git now that I've merged Nick's
> patch. It's commit 57439f878af ("fs: fix superblock iteration race"),
> and I just pushed it out (so it might take a few minutes to mirror out
> to the public git trees, but it should be there shortly).
>

Well, it was repeatable this morning, but despite 30+ shutdowns this evening, I
haven't had a single oops. Perhaps I'm just not doing enough computing stuff
between startup and shutdown to create the conditions under which I got the
oopses this morning. Moreover, I've been using this kernel every day with,
maybe two or three shutdowns a day, for two weeks or so now (since just after
Linus went on vacation) with no oopses. (The kernel is -rc3 + John
Fastabend's "net: fix deliver_no_wcard regression on loopback device" patch).

I'll spend a couple of hours doing stuff and see if I can generate an oops.
Trouble is, of course, that when I pull and build the latest and greatest, I
won't know why I'm not getting oopses, assuming I don't.

I'll update y'all later.

Chris

> Linus



--
The more I see, the more I know. The more I know, the less I understand.
Changing Man - Paul Weller

2010-06-29 20:14:17

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 09:04:33PM +0100, Chris Clayton wrote:
> On Tuesday 29 June 2010, Linus Torvalds wrote:
> > On Tue, Jun 29, 2010 at 10:52 AM, Linus Torvalds
> >
> > <[email protected]> wrote:
> > > Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
> > > guarantee that it's the same thing, but it's "iterate_supers()"
> > > getting an oops [..]
> >
> > Also, "Oops during closedown with 2.6.35-rc3-git3" is an
> > iterate_supers oops (in the jpg) and Chris says it's repeatable for
> > him.
> >
> > Chris - you could try testing current -git now that I've merged Nick's
> > patch. It's commit 57439f878af ("fs: fix superblock iteration race"),
> > and I just pushed it out (so it might take a few minutes to mirror out
> > to the public git trees, but it should be there shortly).
> >
>
> Well, it was repeatable this morning, but despite 30+ shutdowns this evening, I
> haven't had a single oops. Perhaps I'm just not doing enough computing stuff
> between startup and shutdown to create the conditions under which I got the
> oopses this morning. Moreover, I've been using this kernel every day with,
> maybe two or three shutdowns a day, for two weeks or so now (since just after
> Linus went on vacation) with no oopses. (The kernel is -rc3 + John
> Fastabend's "net: fix deliver_no_wcard regression on loopback device" patch).
>
> I'll spend a couple of hours doing stuff and see if I can generate an oops.
> Trouble is, of course, that when I pull and build the latest and greatest, I
> won't know why I'm not getting oopses, assuming I don't.

Don't worry too much if you can't reproduce. I'd say it is likely to be
this bug, and if so, it is going to depend significantly on timing and
ordering of mounts/umounts.

2010-06-29 20:38:51

by Chris Clayton

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tuesday 29 June 2010, Nick Piggin wrote:
> On Tue, Jun 29, 2010 at 09:04:33PM +0100, Chris Clayton wrote:
> > On Tuesday 29 June 2010, Linus Torvalds wrote:
> > > On Tue, Jun 29, 2010 at 10:52 AM, Linus Torvalds
> > >
> > > <[email protected]> wrote:
> > > > Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
> > > > guarantee that it's the same thing, but it's "iterate_supers()"
> > > > getting an oops [..]
> > >
> > > Also, "Oops during closedown with 2.6.35-rc3-git3" is an
> > > iterate_supers oops (in the jpg) and Chris says it's repeatable for
> > > him.
> > >
> > > Chris - you could try testing current -git now that I've merged Nick's
> > > patch. It's commit 57439f878af ("fs: fix superblock iteration race"),
> > > and I just pushed it out (so it might take a few minutes to mirror out
> > > to the public git trees, but it should be there shortly).
> >
> > Well, it was repeatable this morning, but despite 30+ shutdowns this
> > evening, I haven't had a single oops. Perhaps I'm just not doing enough
> > computing stuff between startup and shutdown to create the conditions
> > under which I got the oopses this morning. Moreover, I've been using this
> > kernel every day with, maybe two or three shutdowns a day, for two weeks
> > or so now (since just after Linus went on vacation) with no oopses. (The
> > kernel is -rc3 + John Fastabend's "net: fix deliver_no_wcard regression
> > on loopback device" patch).
> >
> > I'll spend a couple of hours doing stuff and see if I can generate an
> > oops. Trouble is, of course, that when I pull and build the latest and
> > greatest, I won't know why I'm not getting oopses, assuming I don't.
>
> Don't worry too much if you can't reproduce. I'd say it is likely to be
> this bug, and if so, it is going to depend significantly on timing and
> ordering of mounts/umounts.

Yes, that makes sense. I did quite a bit of copying files to and from USB
storage devices this morning. I've got some more to do, so I'll see what that
produces.

--
The more I see, the more I know. The more I know, the less I understand.
Changing Man - Paul Weller

2010-06-30 07:13:52

by Chris Clayton

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tuesday 29 June 2010, Chris Clayton wrote:
> On Tuesday 29 June 2010, Nick Piggin wrote:
> > On Tue, Jun 29, 2010 at 09:04:33PM +0100, Chris Clayton wrote:

> > >
> > > I'll spend a couple of hours doing stuff and see if I can generate an
> > > oops. Trouble is, of course, that when I pull and build the latest and
> > > greatest, I won't know why I'm not getting oopses, assuming I don't.
> >
> > Don't worry too much if you can't reproduce. I'd say it is likely to be
> > this bug, and if so, it is going to depend significantly on timing and
> > ordering of mounts/umounts.
>
> Yes, that makes sense. I did quite a bit of copying files to and from USB
> storage devices this morning. I've got some more to do, so I'll see what
> that produces.

Well, try as I may, these oopses are like London buses - firstly you get three
in quick succession when you don't need them and then when you do, none turn
up. Oh well!

Chris

--
The more I see, the more I know. The more I know, the less I understand.
Changing Man - Paul Weller

2010-06-30 12:51:14

by Al Viro

[permalink] [raw]
Subject: Re: [patch 02/52] fs: fix superblock iteration race

On Tue, Jun 29, 2010 at 10:58:14AM -0700, Linus Torvalds wrote:
> On Tue, Jun 29, 2010 at 10:52 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Look for "2.6.35-rc3 oops trying to suspend" on lkml, for example. No
> > guarantee that it's the same thing, but it's "iterate_supers()"
> > getting an oops [..]
>
> Also, "Oops during closedown with 2.6.35-rc3-git3" is an
> iterate_supers oops (in the jpg) and Chris says it's repeatable for
> him.
>
> Chris - you could try testing current -git now that I've merged Nick's
> patch. It's commit 57439f878af ("fs: fix superblock iteration race"),
> and I just pushed it out (so it might take a few minutes to mirror out
> to the public git trees, but it should be there shortly).

ACK. I've several cleanups on top of that, but that's not a .35 fodder.
I'll put them into for-next tonight. But yes, the hole caught by
Nick is definitely real and very likely to be responsible for all that
pile of reports. Mea culpa.