2005-09-10 00:44:13

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

Code reading uncovered a potential deadlock on the global cpuset
semaphore, cpuset_sem.

==> This patch is only useful in the 2.6.13-stable series.

(It's harmless, and useless, in the pre 2.6.14 fork)

The pre-2.6.14 fork has already diverged, with an additional patch
that further aggrevated this problem, and a more thorough overhaul
of the cpuset locking, to fix the problems.

All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
grab cpuset_sem and then allocate memory _must_ call the routine
'refresh_mems()', after getting cpuset_sem, before any possible
allocation.

If this refresh_mems() call is not done, then there is a risk that one
of the cpuset_zone_allowed() calls made from within the page allocator
(__alloc_pages) will find that the mems_generation of the current task
doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
Since it already held cpuset_sem, this deadlocks that task, and any
subsequent task wanting cpuset_sem.

==> The code paths leading to the kmalloc in check_for_release(), from
cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
fail to invoke refresh_mems() as required.

The fix is easy enough - add the requisite refresh_mems() call.

Unless someone is rapidly creating, modifying and destroying cpusets,
they are unlikely to have any chance of encountering this deadlock.
And even then, it is apparently difficult to do so.

In the case we got here from cpuset_exit(), we have already torn
down the tasks connection to this cpuset and current->cpuset is NULL.
Don't call refresh_mems() in that case - it oops the kernel.

Signed-off-by: Paul Jackson <[email protected]>

Index: linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
===================================================================
--- linux-2.6.13-mem_exclusive_oom.orig/kernel/cpuset.c
+++ linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
@@ -458,7 +458,10 @@ static void check_for_release(struct cpu
if (notify_on_release(cs) && atomic_read(&cs->count) == 0 &&
list_empty(&cs->children)) {
char *buf;
+ static void refresh_mems(void);

+ if (current->cpuset)
+ refresh_mems();
buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!buf)
return;

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2005-09-10 01:56:07

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

On Fri, 9 Sep 2005 17:44:03 -0700 (PDT) Paul Jackson wrote:

> Code reading uncovered a potential deadlock on the global cpuset
> semaphore, cpuset_sem.
>
> ==> This patch is only useful in the 2.6.13-stable series.

Paul,

stable patch candidates should be sent to [email protected]

Chris/Greg,
maybe that should be listed in MAINTAINERS as well as in
Documentation/stable_kernel_rules.txt (it's here already).


> (It's harmless, and useless, in the pre 2.6.14 fork)
>
> The pre-2.6.14 fork has already diverged, with an additional patch
> that further aggrevated this problem, and a more thorough overhaul
> of the cpuset locking, to fix the problems.
>
> All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
> grab cpuset_sem and then allocate memory _must_ call the routine
> 'refresh_mems()', after getting cpuset_sem, before any possible
> allocation.
>
> If this refresh_mems() call is not done, then there is a risk that one
> of the cpuset_zone_allowed() calls made from within the page allocator
> (__alloc_pages) will find that the mems_generation of the current task
> doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
> Since it already held cpuset_sem, this deadlocks that task, and any
> subsequent task wanting cpuset_sem.
>
> ==> The code paths leading to the kmalloc in check_for_release(), from
> cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
> fail to invoke refresh_mems() as required.
>
> The fix is easy enough - add the requisite refresh_mems() call.
>
> Unless someone is rapidly creating, modifying and destroying cpusets,
> they are unlikely to have any chance of encountering this deadlock.
> And even then, it is apparently difficult to do so.
>
> In the case we got here from cpuset_exit(), we have already torn
> down the tasks connection to this cpuset and current->cpuset is NULL.
> Don't call refresh_mems() in that case - it oops the kernel.
>
> Signed-off-by: Paul Jackson <[email protected]>
>
> Index: linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
> ===================================================================
> --- linux-2.6.13-mem_exclusive_oom.orig/kernel/cpuset.c
> +++ linux-2.6.13-mem_exclusive_oom/kernel/cpuset.c
> @@ -458,7 +458,10 @@ static void check_for_release(struct cpu
> if (notify_on_release(cs) && atomic_read(&cs->count) == 0 &&
> list_empty(&cs->children)) {
> char *buf;
> + static void refresh_mems(void);
>
> + if (current->cpuset)
> + refresh_mems();
> buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> if (!buf)
> return;
>

---
~Randy

2005-09-10 02:51:19

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

> Documentation/stable_kernel_rules.txt

Aha - that's the stable documentation that I missed. Thanks, Randy.

I figured that there was an explanation somewhere. And there it is,
right in plain view.

Reading ...

My patch does pass the following criteria:

- No "theoretical race condition" issues, unless an explanation of how
the race can be exploited.

The conditions to trigger the race are too delicate to exploit
quickly. I don't have a coded exploit.

Apparently I did the right thing by _not_ sending this patch to
[email protected] <grin>.

This patch is dead.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-09-10 03:02:08

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

Thanks Paul. As Randy mentioned, please send these to [email protected]
in the future.

* Paul Jackson ([email protected]) wrote:
> Code reading uncovered a potential deadlock on the global cpuset
> semaphore, cpuset_sem.

Another 'by inspection' patch, perhaps we'll need to update the stable
rules, since these can be quite valid fixes, yet typically trigger
review replies asking if it's necessary for -stable.

> ==> This patch is only useful in the 2.6.13-stable series.
>
> (It's harmless, and useless, in the pre 2.6.14 fork)
>
> The pre-2.6.14 fork has already diverged, with an additional patch
> that further aggrevated this problem, and a more thorough overhaul
> of the cpuset locking, to fix the problems.
>
> All code paths in kernel/cpuset.c (2.6.13 or earlier) that first
> grab cpuset_sem and then allocate memory _must_ call the routine
> 'refresh_mems()', after getting cpuset_sem, before any possible
> allocation.
>
> If this refresh_mems() call is not done, then there is a risk that one
> of the cpuset_zone_allowed() calls made from within the page allocator
> (__alloc_pages) will find that the mems_generation of the current task
> doesn't match that of its cpuset, causing it to try to grab cpuset_sem.
> Since it already held cpuset_sem, this deadlocks that task, and any
> subsequent task wanting cpuset_sem.
>
> ==> The code paths leading to the kmalloc in check_for_release(), from
> cpuset_exit, cpuset_rmdir and attach_task (for the detached cpuset),
> fail to invoke refresh_mems() as required.
>
> The fix is easy enough - add the requisite refresh_mems() call.
>
> Unless someone is rapidly creating, modifying and destroying cpusets,
> they are unlikely to have any chance of encountering this deadlock.
> And even then, it is apparently difficult to do so.

How unlikely? So unlikely that it's more a theoreitical race, or did
you find ways to trigger? If it's purely theoretical then it's not a
good candidiate for -stable.

> In the case we got here from cpuset_exit(), we have already torn
> down the tasks connection to this cpuset and current->cpuset is NULL.
> Don't call refresh_mems() in that case - it oops the kernel.

Is this one well-tested, since the fix diverges from upstream? And one
minor nit, let's just do a real forward declaration of refresh_mems()
instead of local to check_for_release().

thanks,
-chris

2005-09-10 03:20:45

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

Chris wrote:
> Another 'by inspection' patch, perhaps we'll need to update the stable
> rules, since these can be quite valid fixes, yet typically trigger
> review replies asking if it's necessary for -stable.

I'm scratching my head here, trying to figure out what is the
bottom line of this comment.

I'm guessing you're saying:

"By inspection" patches, unless they have something further
to recommend their inclusion, are not candidates for -stable.

But intent of your phrase "yet typically trigger review replies ..."
went right past me ...


> How unlikely? So unlikely that it's more a theoreitical race, or did
> you find ways to trigger?

I don't have a way to trigger it. My guess is that someday, some
customer will find the right combination of calls, and be able to
trigger this once every few hours or days. The odds are quite good
that 2.6.13.* will live out its life before that happens. When it
happens, it will be a customer doing some serious cpuset manipulations
on serious big iron.


> Is this one well-tested, since the fix diverges from upstream?

Yes - I have a stress test, and some custom kernel tracing, that
could see the conditions needed to trigger this occurring, just not
all simultaneously in the necessary small window of vulnerability.


> And one minor nit, let's just do a real forward declaration of
> refresh_mems() instead of local to check_for_release().

Normally, yes. In this case, I was coding to keep the patch as
localized as possible, and less to optimize the resulting organization
of the kernel/cpuset.c source file after the patch was applied.

Given that this patch is unlikely to ever have a life beyond a briefly
discussed patch today, I guessed right ;) Coding for clarity of the
patch, not of the source, was arguably the right choice this time.

Thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-09-10 03:29:24

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

* Paul Jackson ([email protected]) wrote:
> Chris wrote:
> > Another 'by inspection' patch, perhaps we'll need to update the stable
> > rules, since these can be quite valid fixes, yet typically trigger
> > review replies asking if it's necessary for -stable.
>
> I'm scratching my head here, trying to figure out what is the
> bottom line of this comment.
>
> I'm guessing you're saying:
>
> "By inspection" patches, unless they have something further
> to recommend their inclusion, are not candidates for -stable.

Yes.

> But intent of your phrase "yet typically trigger review replies ..."
> went right past me ...

Sorry, I was thinking outloud, it wasn't a direct comment on this patch.
During the -stable review period, patches like these usually get some
squawks. And there are cases where 'found by inspection' are valid.

> > How unlikely? So unlikely that it's more a theoreitical race, or did
> > you find ways to trigger?
>
> I don't have a way to trigger it. My guess is that someday, some
> customer will find the right combination of calls, and be able to
> trigger this once every few hours or days. The odds are quite good
> that 2.6.13.* will live out its life before that happens. When it
> happens, it will be a customer doing some serious cpuset manipulations
> on serious big iron.

OK, we can hold until you find a good case for triggering ;-)

thanks,
-chris

2005-09-10 03:41:04

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 2.6.13-stable] cpuset semaphore double trip fix

Chris wrote:
> OK, we can hold until you find a good case for triggering ;-)

Agreed.

By the way, you or Randy mentioned something about leaving a mention of
-stable in the MAINTAINERS file.

That wouldn't have helped me this week, but if you had put the string:

Documentation/stable_kernel_rules.txt

somewhere in the opening text of the messages that announce stable
reviews, that would have saved me some time. I have in mind the
lkml messages such as:

From: Chris Wright <[email protected]>
To: [email protected], [email protected]
Subject: [PATCH 0/9] -stable review

This is the start of the stable review cycle for the 2.6.13.1 release.

Just a thought ...

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401