2009-12-23 00:33:25

by Smith, GeoffX

[permalink] [raw]
Subject: [PATCH] prctl: return MCE process flags through pointer

This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
to pass the return value through *arg2.

With this change, the option now follows the same conventions as the
other "get" options added since 2.6.0, and also brings it into
conformance with the advice in chapter 16 of Documentation/CodingStyle.

This prctl() option was only added within the last month, so there are
not any production applications to break. This patch applies cleanly
to mainline and to 2.6.32.2 for backporting.

Signed-off-by: Geoff Smith <[email protected]>


diff --git a/kernel/sys.c b/kernel/sys.c
index 26a6b73..347021a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1570,13 +1570,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = 0;
break;
case PR_MCE_KILL_GET:
- if (arg2 | arg3 | arg4 | arg5)
+ if (arg3 | arg4 | arg5)
return -EINVAL;
if (current->flags & PF_MCE_PROCESS)
- error = (current->flags & PF_MCE_EARLY) ?
- PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE;
+ error = put_user(
+ (current->flags & PF_MCE_EARLY) ?
+ PR_MCE_KILL_EARLY : PR_MCE_KILL_LATE,
+ (unsigned long __user *)arg2);
else
- error = PR_MCE_KILL_DEFAULT;
+ error = put_user(PR_MCE_KILL_DEFAULT,
+ (unsigned long __user *)arg2);
break;
default:
error = -EINVAL;
return -EINVAL;


2009-12-23 01:15:05

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

"Smith, GeoffX" <[email protected]> writes:

> This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
> to pass the return value through *arg2.
>
> With this change, the option now follows the same conventions as the
> other "get" options added since 2.6.0, and also brings it into
> conformance with the advice in chapter 16 of Documentation/CodingStyle.
>
> This prctl() option was only added within the last month, so there are
> not any production applications to break. This patch applies cleanly
> to mainline and to 2.6.32.2 for backporting.

It breaks the test suite, the man pages, qemu and one slide deck at least.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 01:34:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec 2009 02:14:51 +0100 Andi Kleen <[email protected]> wrote:

> "Smith, GeoffX" <[email protected]> writes:
>
> > This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
> > to pass the return value through *arg2.
> >
> > With this change, the option now follows the same conventions as the
> > other "get" options added since 2.6.0, and also brings it into
> > conformance with the advice in chapter 16 of Documentation/CodingStyle.
> >
> > This prctl() option was only added within the last month, so there are
> > not any production applications to break. This patch applies cleanly
> > to mainline and to 2.6.32.2 for backporting.
>
> It breaks the test suite, the man pages, qemu and one slide deck at least.
>

Should've got it right the first time.

What goes wrong with qemu?

2009-12-23 09:52:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Tue, Dec 22, 2009 at 05:34:24PM -0800, Andrew Morton wrote:
> On Wed, 23 Dec 2009 02:14:51 +0100 Andi Kleen <[email protected]> wrote:
>
> > "Smith, GeoffX" <[email protected]> writes:
> >
> > > This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
> > > to pass the return value through *arg2.
> > >
> > > With this change, the option now follows the same conventions as the
> > > other "get" options added since 2.6.0, and also brings it into
> > > conformance with the advice in chapter 16 of Documentation/CodingStyle.
> > >
> > > This prctl() option was only added within the last month, so there are
> > > not any production applications to break. This patch applies cleanly
> > > to mainline and to 2.6.32.2 for backporting.
> >
> > It breaks the test suite, the man pages, qemu and one slide deck at least.
> >
>
> Should've got it right the first time.

To be honest it's not fully clear to me what is "wrong" with returning
the return value with "return".

If there was a security bug or something maybe such a radical step
as changing a published API would be justified, but for this I'm sceptical.

>
> What goes wrong with qemu?

It's the main current user of this interface currently.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 10:18:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec 2009 10:52:23 +0100 Andi Kleen <[email protected]> wrote:

> On Tue, Dec 22, 2009 at 05:34:24PM -0800, Andrew Morton wrote:
> > On Wed, 23 Dec 2009 02:14:51 +0100 Andi Kleen <[email protected]> wrote:
> >
> > > "Smith, GeoffX" <[email protected]> writes:
> > >
> > > > This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
> > > > to pass the return value through *arg2.
> > > >
> > > > With this change, the option now follows the same conventions as the
> > > > other "get" options added since 2.6.0, and also brings it into
> > > > conformance with the advice in chapter 16 of Documentation/CodingStyle.
> > > >
> > > > This prctl() option was only added within the last month, so there are
> > > > not any production applications to break. This patch applies cleanly
> > > > to mainline and to 2.6.32.2 for backporting.
> > >
> > > It breaks the test suite, the man pages, qemu and one slide deck at least.
> > >
> >
> > Should've got it right the first time.
>
> To be honest it's not fully clear to me what is "wrong" with returning
> the return value with "return".

Just a convention thing.

> If there was a security bug or something maybe such a radical step
> as changing a published API would be justified, but for this I'm sceptical.

hm, OK, shrug.

Regarding "prctl: return timerslack through pointer": are there any
known users of PR_GET_TIMERSLACK yet?

Why are task_struct.timer_slack_ns and
task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we could
make them unsigned ints.

2009-12-23 17:56:12

by Smith, GeoffX

[permalink] [raw]
Subject: RE: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec Andrew Morton wrote:
>On Wed, 23 Dec 2009 10:52:23 +0100 Andi Kleen <[email protected]> wrote:
>
>> On Tue, Dec 22, 2009 at 05:34:24PM -0800, Andrew Morton wrote:
>> > On Wed, 23 Dec 2009 02:14:51 +0100 Andi Kleen <[email protected]>
>wrote:
>> >
>> > > "Smith, GeoffX" <[email protected]> writes:
>> > >
>> > > > This patch fixes the semantics of prctl() option PR_MCE_KILL_GET
>> > > > to pass the return value through *arg2.
>> > > >
>> > > > With this change, the option now follows the same conventions as
>the
>> > > > other "get" options added since 2.6.0, and also brings it into
>> > > > conformance with the advice in chapter 16 of
>Documentation/CodingStyle.
>> > > >
>> > > > This prctl() option was only added within the last month, so there
>are
>> > > > not any production applications to break. This patch applies
>cleanly
>> > > > to mainline and to 2.6.32.2 for backporting.
>> > >
>> > > It breaks the test suite, the man pages, qemu and one slide deck at
>least.
>> > >
>> >
>> > Should've got it right the first time.
>>
>> To be honest it's not fully clear to me what is "wrong" with returning
>> the return value with "return".
>
>Just a convention thing.
>
>> If there was a security bug or something maybe such a radical step
>> as changing a published API would be justified, but for this I'm
>sceptical.
>
>hm, OK, shrug.
>

Sorry, I wasn't aware of any published API, PR_MCE_KILL and KILL_GET just got accepted into mainline. (And why aren't they called PR_SET/GET_MCE_KILL???)

I just noticed that these new options didn't follow the conventions of most prctl() options. There is only one option newer than 2.3.48 that uses return-as-function-result, and that one is essentially a no-op.

>Regarding "prctl: return timerslack through pointer": are there any
>known users of PR_GET_TIMERSLACK yet?

The feature is only 3 months old, so no. We are finding it beneficial in my group at Intel, which is the only reason I am looking at prctl().

>Why are task_struct.timer_slack_ns and
>task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we could
>make them unsigned ints.

Timer slack is not a Boolean or enum, and we want the greatest range possible. (Actually, I'd like to talk Arjan into using the same time structure as select(), but that's another discussion.) Internally hrtimer uses unsigned long. I know long and unsigned long are the same on some architectures, but let's not introduce an unnatural restriction -- recall that arg2 is unsigned long.

2009-12-23 18:55:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec 2009 09:56:04 -0800 "Smith, GeoffX" <[email protected]> wrote:

> >Why are task_struct.timer_slack_ns and
> >task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we could
> >make them unsigned ints.
>
> Timer slack is not a Boolean or enum, and we want the greatest range possible. (Actually, I'd like to talk Arjan into using the same time structure as select(), but that's another discussion.) Internally hrtimer uses unsigned long. I know long and unsigned long are the same on some architectures, but let's not introduce an unnatural restriction -- recall that arg2 is unsigned long.

Using unsigned ints will reduce the size of the task_struct.

Is there any conceivable case for a timer_slack which exceeds four
seconds? If so, what is it, and if so, why this:

#define MAX_SLACK (100 * NSEC_PER_MSEC)

?

2009-12-23 19:31:15

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

> Sorry, I wasn't aware of any published API, PR_MCE_KILL and KILL_GET just got accepted into mainline.

An API is published as soon as the kernel including it is released.

>
> I just noticed that these new options didn't follow the conventions of most prctl() options. There is only one option newer than 2.3.48 that uses return-as-function-result, and that one is essentially a no-op.

We could have fixed it if you had mentioned that during code review before 2.6.32, but
now it's too late (at least unless it would have been fundamentally broken or something like
that, but I don't think that's the case)

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 20:24:34

by Smith, GeoffX

[permalink] [raw]
Subject: RE: [PATCH] prctl: return MCE process flags through pointer

Earlier today, Andrew Morton <[email protected]> wrote:
>>
>> >Why are task_struct.timer_slack_ns and
>> >task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we could
>> >make them unsigned ints.
>>
>> Timer slack is not a Boolean or enum, and we want the greatest range
>possible. (Actually, I'd like to talk Arjan into using the same time
>structure as select(), but that's another discussion.) Internally hrtimer
>uses unsigned long. I know long and unsigned long are the same on some
>architectures, but let's not introduce an unnatural restriction -- recall
>that arg2 is unsigned long.
>
>Using unsigned ints will reduce the size of the task_struct.
>
>Is there any conceivable case for a timer_slack which exceeds four
>seconds? If so, what is it, and if so, why this:
>
>#define MAX_SLACK (100 * NSEC_PER_MSEC)
>
>

WRT to times longer than 4.2 sec -- So far my works suggests there is a good case, but it's moot because that is all I get with unsigned long on my platform. (Both int and long are 32-bit for me; x86)

WRT MAX_SLACK, select() always imposes a little slack (for non-real-time threads). The name MAX_SLACK is not actually the absolute maximum, it is only the maximum that the kernel will impose by default.

2009-12-25 08:28:15

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec 2009 02:18:07 -0800
Andrew Morton <[email protected]> wrote:
>
> Regarding "prctl: return timerslack through pointer": are there any
> known users of PR_GET_TIMERSLACK yet?

I only have a SET_ user, not a GET_; I added the GET_ for API symmetry.

>
> Why are task_struct.timer_slack_ns and
> task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we
> could make them unsigned ints.

I can't remember any particular reason at this point.



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-12-25 08:32:42

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] prctl: return MCE process flags through pointer

On Wed, 23 Dec 2009 10:54:50 -0800
Andrew Morton <[email protected]> wrote:

> On Wed, 23 Dec 2009 09:56:04 -0800 "Smith, GeoffX"
> <[email protected]> wrote:
>
> > >Why are task_struct.timer_slack_ns and
> > >task_struct.default_timer_slack_ns unsigned long, btw? AFACIT we
> > >could make them unsigned ints.
> >
> > Timer slack is not a Boolean or enum, and we want the greatest
> > range possible. (Actually, I'd like to talk Arjan into using the
> > same time structure as select(), but that's another discussion.)
> > Internally hrtimer uses unsigned long. I know long and unsigned
> > long are the same on some architectures, but let's not introduce an
> > unnatural restriction -- recall that arg2 is unsigned long.
>
> Using unsigned ints will reduce the size of the task_struct.
>
> Is there any conceivable case for a timer_slack which exceeds four
> seconds?

the largest I've seen asked in practice is 2 seconds.
But even at 100 msec or more you're talking about very special
applications; rounding up timers otherwise really impacts the
functionality of the app in a bad way.


> If so, what is it, and if so, why this:
>
> #define MAX_SLACK (100 * NSEC_PER_MSEC)

this is the max for "automatic slack"

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org