2012-02-07 00:15:04

by Josh Hunt

Subject: [RFC PATCH] poll() in 32-bit applications does not handle timeout of -1 properly on 64-bit kernels

We've hit an issue where our 32-bit applications, when running on a
64-bit kernel and calling poll() with a timeout of -1, return after
~49 days (2^32 msec) instead of waiting indefinitely as they should.
Reproducing the issue is trivial. I've instrumented the kernel and found
we are hitting the case where poll() believes we've passed in a positive
number and thus creates a timespec, etc. Currently poll() is defined in
userspace as:

int poll(struct pollfd *ufds, nfds_t nfds, int timeout);

but in the kernel the timeout argument is of type long (64 bits on a
64-bit kernel).
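To make the mismatch concrete, here is a small user-space sketch. It assumes int64_t stands in for the 64-bit kernel's long, and the helper names are hypothetical. If the upper 32 bits of the 64-bit argument are simply zero, -1 arrives as 0xffffffff, which matches the "positive number" seen above; an ordinary C int-to-long conversion would sign-extend it instead.

```c
#include <stdint.h>

/* Hypothetical helpers modeling how a 32-bit int timeout can reach a 64-bit
 * "long" parameter (int64_t stands in for the 64-bit kernel's long). */

/* Zero-extension: upper 32 bits are 0, so the bit pattern reads as unsigned. */
static int64_t widen_zero_extend(int32_t timeout)
{
	return (int64_t)(uint32_t)timeout;
}

/* Sign-extension: the ordinary C integer conversion preserves the value. */
static int64_t widen_sign_extend(int32_t timeout)
{
	return (int64_t)timeout;
}

/* Usage:
 *   widen_zero_extend(-1) == 4294967295  (~49.7 days in milliseconds)
 *   widen_sign_extend(-1) == -1          (poll()'s "wait forever")
 */
```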

I can think of a few ways to solve this. One, which is the patch I've
attached, is to change the type of timeout to int in the kernel. I'm not
certain of the ramifications this may have, since it changes a syscall's
arguments, which may be a big no-no :) Another approach I'd propose is
bounds checking. Currently we do the following:

	if (timeout_msecs >= 0) {
		to = &end_time;
		poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC,
			NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC));
	}
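Feeding a zero-extended -1 through that check produces a very long but finite deadline. The sketch below models the seconds/nanoseconds split with a hypothetical struct standing in for the arguments passed to poll_select_set_timeout(); it is an illustration, not kernel code. For 0xffffffff ms it yields 4294967 s plus 295000000 ns, i.e. roughly 49.7 days.

```c
#include <stdint.h>

#define MSEC_PER_SEC  1000L
#define NSEC_PER_MSEC 1000000L

/* Hypothetical stand-in for the (sec, nsec) pair handed to
 * poll_select_set_timeout(). */
struct timeout_split {
	int64_t sec;
	int64_t nsec;
};

static struct timeout_split split_timeout(int64_t timeout_msecs)
{
	/* -1/-1 is this sketch's marker for "no deadline, wait forever" */
	struct timeout_split ts = { -1, -1 };

	/* Mirrors the kernel check: any non-negative value gets a deadline */
	if (timeout_msecs >= 0) {
		ts.sec = timeout_msecs / MSEC_PER_SEC;
		ts.nsec = NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC);
	}
	return ts;
}
```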

We could add an upper bound on timeout_msecs, such as < 0xffffffff. I'm
not sure if either approach is acceptable, though.
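A minimal sketch of the proposed bound, written as a hypothetical predicate over the 64-bit value the kernel sees (int64_t again standing in for long):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate for the bounds-check idea: only values in
 * [0, 0xffffffff) get a finite deadline, so a zero-extended -1 (0xffffffff)
 * is treated like a negative value, i.e. "wait forever". */
static bool poll_has_deadline(int64_t timeout_msecs)
{
	return timeout_msecs >= 0 && timeout_msecs < INT64_C(0xffffffff);
}
```

One caveat with this approach: other negative 32-bit values, e.g. -2, zero-extend to 0xfffffffe and would still pass the check, so the bound only catches -1 exactly.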

Josh



Attachments:
poll-timeout-try1.patch (1.64 kB)

2012-02-07 00:38:22

by Al Viro

Subject: Re: [RFC PATCH] poll() in 32-bit applications does not handle timeout of -1 properly on 64-bit kernels

On Mon, Feb 06, 2012 at 06:05:30PM -0600, Josh Hunt wrote:
> We could add an upper bound on timeout_msecs to say < 0xffffffff. I'm
> not sure if either is acceptable though.

Or just add compat_sys_poll() with that argument being int and have it call
sys_poll(). The value will be sign-extended...
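A user-space model of what the compat wrapper buys, assuming hypothetical model_sys_poll() and model_compat_sys_poll() names; the real kernel wrapper and its syscall table wiring are not reproduced here. Because the wrapper declares the timeout as a 32-bit int, the implicit conversion when it calls the 64-bit entry point sign-extends -1 to -1 instead of leaving it as 4294967295.

```c
#include <stdint.h>

/* Hypothetical stand-in for the native entry point, which takes the timeout
 * as a 64-bit long (modeled as int64_t). Returns 1 if a finite deadline would
 * be set, 0 if poll would wait forever. */
static int model_sys_poll(int64_t timeout_msecs)
{
	return timeout_msecs >= 0;
}

/* Hypothetical compat wrapper: the argument is a 32-bit int, so passing it
 * to model_sys_poll() performs an ordinary sign-extending conversion. */
static int model_compat_sys_poll(int32_t timeout_msecs)
{
	return model_sys_poll(timeout_msecs);
}
```

For contrast, calling model_sys_poll((int64_t)(uint32_t)-1) directly, which models the zero-extended path, reports a finite deadline.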

2012-02-07 17:52:01

by Josh Hunt

Subject: Re: [RFC PATCH] poll() in 32-bit applications does not handle timeout of -1 properly on 64-bit kernels

On 02/06/2012 06:38 PM, Al Viro wrote:
> Or just add compat_sys_poll() with that argument being int and have it call
> sys_poll(). The value will be sign-extended...

Al

I've implemented what you suggested by adding compat_sys_poll() with an
int timeout argument, so the value is sign-extended when it is passed on
to sys_poll(). I wanted to point out that an almost identical patch was
submitted last year, which appears to have gotten lost in the wash:
https://lkml.org/lkml/2011/9/18/19

I suspect other architectures are affected by this bug as well. This
patch only fixes x86.

Josh


Attachments:
compat-sys-poll.patch (2.28 kB)