2006-10-12 02:58:07

by Vadim Lobanov

[permalink] [raw]
Subject: [PATCH] fdtable: Eradicate fdarray overflow.

Andrew,

If you want it, here is the "actual patch format" fix for the random kernel
bug issue that has been discovered. This patch is functionally identical to
the one you grabbed, but contains comments and sign-offs.

Fix the computation of the length of an allocated fdarray, when we decide to
grow the fdtable. The rationale behind this fix is as follows:
=> The 'nr' variable is the requested fd, so will be one less than the minimum
allowable fdarray size.
=> Due to the above fact, when we divide 'nr' by a fourth-of-a-page block, we
will always be exactly one block short of the size we need.
=> Incrementing before the division is wrong, because the division will discard
a non-zero modulo, possibly leaving us one fourth-of-a-page block short.

Signed-off-by: Vadim Lobanov <[email protected]>

diff -Npru old/fs/file.c new/fs/file.c
--- old/fs/file.c 2006-10-10 18:58:21.000000000 -0700
+++ new/fs/file.c 2006-10-11 19:37:23.000000000 -0700
@@ -164,9 +164,8 @@ static struct fdtable * alloc_fdtable(un
* the fdarray into page-sized chunks: starting at a quarter of a page,
* and growing in powers of two from there on.
*/
- nr++;
nr /= (PAGE_SIZE / 4 / sizeof(struct file *));
- nr = roundup_pow_of_two(nr);
+ nr = roundup_pow_of_two(nr + 1);
nr *= (PAGE_SIZE / 4 / sizeof(struct file *));
if (nr > NR_OPEN)
nr = NR_OPEN;


2006-10-12 05:19:18

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] fdtable: Eradicate fdarray overflow.

Vadim Lobanov a ?crit :
>
> diff -Npru old/fs/file.c new/fs/file.c
> --- old/fs/file.c 2006-10-10 18:58:21.000000000 -0700
> +++ new/fs/file.c 2006-10-11 19:37:23.000000000 -0700
> @@ -164,9 +164,8 @@ static struct fdtable * alloc_fdtable(un
> * the fdarray into page-sized chunks: starting at a quarter of a page,
> * and growing in powers of two from there on.
> */
> - nr++;
> nr /= (PAGE_SIZE / 4 / sizeof(struct file *));
> - nr = roundup_pow_of_two(nr);
> + nr = roundup_pow_of_two(nr + 1);
> nr *= (PAGE_SIZE / 4 / sizeof(struct file *));
> if (nr > NR_OPEN)
> nr = NR_OPEN;

Hi Vadim

I find your PAGE_SIZE/4 minimum allocation quite unjustified.

For architectures with 64K PAGE_SIZE, we endup allocating 16K, for poor tasks
that happen to touch a not so high (>= 64) file descriptor...

I would vote for a fixed size, like 1024

Eric

2006-10-12 06:07:40

by Vadim Lobanov

[permalink] [raw]
Subject: Re: [PATCH] fdtable: Eradicate fdarray overflow.

On Wednesday 11 October 2006 22:19, Eric Dumazet wrote:
> Hi Vadim
>
> I find your PAGE_SIZE/4 minimum allocation quite unjustified.
>
> For architectures with 64K PAGE_SIZE, we endup allocating 16K, for poor
> tasks that happen to touch a not so high (>= 64) file descriptor...
>
> I would vote for a fixed size, like 1024

In my opinion, always picking 1024 would be highly suboptimal for some
architectures (x86-64 in particular -- that's a whole page, just for the
fdarray!). If anything, I'd prefer something similar to this pseudo-code:

#define FDTABLE_MIN min_t(uint, PAGE_SIZE / 4 / sizeof(struct file *), 1024)
...
nr /= FDTABLE_MIN;
nr = roundup_pow_of_two(nr + 1);
nr *= FDTABLE_MIN;

gcc should be smart enough to optimize that expression into a single constant.
At least it did (version 4.1.0) in my quick test here.

> Eric

Let me know what you think. Please don't just go radio-silent on me. ;)

-- Vadim Lobanov

2006-10-12 06:32:41

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] fdtable: Eradicate fdarray overflow.

Vadim Lobanov a ?crit :
> On Wednesday 11 October 2006 22:19, Eric Dumazet wrote:
>> Hi Vadim
>>
>> I find your PAGE_SIZE/4 minimum allocation quite unjustified.
>>
>> For architectures with 64K PAGE_SIZE, we endup allocating 16K, for poor
>> tasks that happen to touch a not so high (>= 64) file descriptor...
>>
>> I would vote for a fixed size, like 1024
>
> In my opinion, always picking 1024 would be highly suboptimal for some
> architectures (x86-64 in particular -- that's a whole page, just for the
> fdarray!). If anything, I'd prefer something similar to this pseudo-code:

I was speaking of 1024 bytes.

I was the guy who made fdset going from PAGE_SIZE to 64 bytes (L1_CACHE_BYTES
if you dare), I wont be the guy responsible for a reverse path on fdtable :)

That is replace your (PAGE_SIZE/4) by 1024, wich was you probably meant
No archi has a smaller page, so no need to play with min_t() macro...

>
> #define FDTABLE_MIN min_t(uint, PAGE_SIZE / 4 / sizeof(struct file *), 1024)
> ...
> nr /= FDTABLE_MIN;
> nr = roundup_pow_of_two(nr + 1);
> nr *= FDTABLE_MIN;
>
> gcc should be smart enough to optimize that expression into a single constant.
> At least it did (version 4.1.0) in my quick test here.
>
>> Eric
>
> Let me know what you think. Please don't just go radio-silent on me. ;)
>

radio-silent ? well, it seems I already sent you many mails about your patches :)

Eric

2006-10-12 07:16:43

by Vadim Lobanov

[permalink] [raw]
Subject: Re: [PATCH] fdtable: Eradicate fdarray overflow.

On Wednesday 11 October 2006 23:32, Eric Dumazet wrote:
> Vadim Lobanov a ?crit :
> > On Wednesday 11 October 2006 22:19, Eric Dumazet wrote:
> >> Hi Vadim
> >>
> >> I find your PAGE_SIZE/4 minimum allocation quite unjustified.
> >>
> >> For architectures with 64K PAGE_SIZE, we endup allocating 16K, for poor
> >> tasks that happen to touch a not so high (>= 64) file descriptor...
> >>
> >> I would vote for a fixed size, like 1024
> >
> > In my opinion, always picking 1024 would be highly suboptimal for some
> > architectures (x86-64 in particular -- that's a whole page, just for the
> > fdarray!). If anything, I'd prefer something similar to this pseudo-code:
>
> I was speaking of 1024 bytes.

Oh, well, that does make a difference! :) I misread your email: I thought you
were suggesting 1024 fdarray entries, not 1024 bytes. I'd agree with you
completely then.

> That is replace your (PAGE_SIZE/4) by 1024, wich was you probably meant
> No archi has a smaller page, so no need to play with min_t() macro...

Good point. Code can then simply become:
nr /= (1024 / sizeof(struct file *));
nr = roundup_pow_of_two(nr + 1);
nr *= (1024 / sizeof(struct file *));

> > Let me know what you think. Please don't just go radio-silent on me. ;)
>
> radio-silent ? well, it seems I already sent you many mails about your
> patches :)

Your feedback is very much appreciated, believe me! :) Both you (for the
ideas) and akpm (for his assistance and seemingly-infinite patience, even
when I accidentally broke his tree (I owe him a few beers/other-goodies for
that)) have helped out tremendously with this.

So far, I have two comments from you that I need to take care of:
1) allocate at least L1_CACHE_BYTES for fdsets
2) change PAGE_SIZE/4 to 1024
Both are performance tweaks, not crash fixes, and are easy to do. Could you
please look through the rest of the code in fs/file.c and point out any other
issues or code tweaks you can see? My plan is to prepare patches for these
things and buffer them up, and when we're done tweaking, I'll forward the
whole batch on to akpm.

Then, once everything has settled down for a long while, I'll send out the
final fs/file.c cleanups patch -- mostly it introduces extensive comments
into that file about why the code does what it does.

> Eric

-- Vadim Lobanov