I posted this to the newsgroup news:linux-kernel but got no response...
I'll try here.
While going thru the kernel code for 2.4.17 (uClinux patch) I noticed a
problem with functions in 3 files:
dcache_init in fs/dcache.c, inode_init in fs/inode.c and in mnt_init in
fs/namespace.c
At least on my Redhat 7.3 2.4.18-10 source, in the function
static void __init dcache_init(unsigned long mempages)
{
struct list_head *d;
unsigned long order;
unsigned int nr_hash;
int i;
/*
* A constructor could be added for stable state like the lists,
* but it is probably not worth it because of the cache nature
* of the dcache.
* If fragmentation is too bad then the SLAB_HWCACHE_ALIGN
* flag could be removed here, to hint to the allocator that
* it should not try to get multiple page regions.
*/
dentry_cache = kmem_cache_create("dentry_cache",
sizeof(struct dentry),
0,
SLAB_HWCACHE_ALIGN,
NULL, NULL);
if (!dentry_cache)
panic("Cannot create dentry cache");
#if PAGE_SHIFT < 13
mempages >>= (13 - PAGE_SHIFT);
#endif
mempages *= sizeof(struct list_head);
for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
;
do {
u32 tmp;
nr_hash = (1UL << order) * PAGE_SIZE /
sizeof(struct list_head);
d_hash_mask = (nr_hash - 1);
tmp = nr_hash;
d_hash_shift = 0;
while ((tmp >>= 1UL) != 0UL)
d_hash_shift++;
dentry_hashtable = (struct list_head *)
__get_free_pages(GFP_ATOMIC, order);
} while (dentry_hashtable == NULL && --order >= 0);
........... Notice the --order >=0 in the do while test... since order
is declared as an unsigned long, this test is a pointless comparison and
never fails, causing an infinite loop if the hashtable is empty.
My fix to this is to change the test to have
while (dentry_hashtable == NULL && order-- != 0);
Could someone check me on this?
On 27 Aug 2002 at 17:23, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 17:24:19 -0700
>
> ........... Notice the --order >=0 in the do while test... since order is declared as an unsigned
> long, this test is a pointless comparison and never fails, causing an infinite loop if the
> hashtable is empty.
>
> My fix to this is to change the test to have
> while (dentry_hashtable == NULL && order-- != 0);
>
> Could someone check me on this?
>
> Your analysis is right but the fix is wrong, we do want to
> try order == 0 on very small memory systems, and you should
> not decrement the order at the beginning of the loop. We
> want to use the "order" calculated.
>
> Just make 'order' signed long to fix this bug.
>
NO! That won't work either. This is a "do while" loop so the first test is always done and if
order is 0, the check will be done AFTER the decrement, so this works. Changing it to a signed
long loses you a bit.
From: "Stephen C. Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 17:32:58 -0700
This is a "do while" loop so the first test is always done
No, a "do while" loop always executes the first iteration.
What version of the C language do you think we are using?
On 27 Aug 2002 at 17:29, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 17:32:58 -0700
>
> This is a "do while" loop so the first test is always done
>
> No, a "do while" loop always executes the first iteration.
> What version of the C language do you think we are using?
>
You're misunderstanding me, I meant that the first test is done AFTER the first iteration is
executed, so my fix is correct since, even if order is 0 because at least one iteration of the loop
is done, and the post decrement makes sure that the test succeeds if order was 0 going into the
loop.
Changing order to a signed long loses half of the cache entries due to the number being divided by
2.
From: "Stephen C. Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 17:40:41 -0700
You're misunderstanding me, I meant that the first test is done
AFTER the first iteration is executed, so my fix is correct
since, even if order is 0 because at least one iteration of the
loop is done, and the post decrement makes sure that the test
succeeds if order was 0 going into the loop.
Order is always >= 0 when we enter the loop.
If we actually get the table allocated then the decrement of 'order'
is not executed if we allocate the table successfully.
I don't understand what the problem is with my fix.
On 27 Aug 2002 at 17:42, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 17:40:41 -0700
>
> You're misunderstanding me, I meant that the first test is done
> AFTER the first iteration is executed, so my fix is correct
> since, even if order is 0 because at least one iteration of the
> loop is done, and the post decrement makes sure that the test
> succeeds if order was 0 going into the loop.
>
> Order is always >= 0 when we enter the loop.
>
> If we actually get the table allocated then the decrement of 'order'
> is not executed if we allocate the table successfully.
>
> I don't understand what the problem is with my fix.
>
[Trying again - my mail client is messing up/sending empty posts]
As I look at it, it can only be 0 <= order < 32, so your fix works, as mine does. Yours is simpler
and doesn't involve changing code, just declarations.
There need to be some sanity checks in this code: what if mempages is passed as some insanely huge
number, e.g.
From: "Stephen C. Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 18:09:46 -0700
There need to be some sanity checks in this code: what if mempages is passed as some insanely huge
number, e.g.
And then your mail ends.... let us know when you've fixed
your email client, this isn't rocket science :-)
On 27 Aug 2002 at 18:09, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 18:09:46 -0700
>
> There need to be some sanity checks in this code: what if mempages is passed as some insanely huge
> number, e.g.
>
> And then your mail ends.... let us know when you've fixed
> your email client, this isn't rocket science :-)
>
What are you talking about "And then your mail ends..." That's all I wanted to say...
Oh, you want me to post a complete patch.... I get it.
From: "Stephen C. Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 18:24:13 -0700
On 27 Aug 2002 at 18:09, David S. Miller wrote:
> And then your mail ends.... let us know when you've fixed
> your email client, this isn't rocket science :-)
>
What are you talking about "And then your mail ends..." That's all I wanted to say...
No, I thought you were going to give an example of a huge number.
:-)
On 27 Aug 2002 at 18:23, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 18:24:13 -0700
>
> On 27 Aug 2002 at 18:09, David S. Miller wrote:
>
> > And then your mail ends.... let us know when you've fixed
> > your email client, this isn't rocket science :-)
> >
>
> What are you talking about "And then your mail ends..." That's all I wanted to say...
>
> No, I thought you were going to give an example of a huge number.
> :-)
>
How about (unsigned long)(~0)?
This code never finishes:
#include <stdio.h>
#define PAGE_SHIFT 12
struct list_head {
struct list_head *next, *prev;
};
int main(void)
{
unsigned long order;
unsigned long mempages = ~0;
#if PAGE_SHIFT < 13
mempages >>= (13 - PAGE_SHIFT);
#endif
mempages *= sizeof(struct list_head);
for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
;
printf("%d\n",order);
return 0;
}
On 27 Aug 2002 at 18:23, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 18:24:13 -0700
>
> On 27 Aug 2002 at 18:09, David S. Miller wrote:
>
> > And then your mail ends.... let us know when you've fixed
> > your email client, this isn't rocket science :-)
> >
>
> What are you talking about "And then your mail ends..." That's all I wanted to say...
>
> No, I thought you were going to give an example of a huge number.
> :-)
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Argh. I meant to post THIS code:
#include <stdio.h>
#define PAGE_SHIFT 12
struct list_head {
struct list_head *next, *prev;
};
int main(void)
{
unsigned long order;
unsigned long mempages = ~0;
#if PAGE_SHIFT < 13
mempages >>= (13 - PAGE_SHIFT);
#endif
mempages *= sizeof(struct list_head);
printf("mempages: %lu\n", mempages);
for (order = 0; ((1UL << order) << PAGE_SHIFT) < mempages; order++)
;
printf("%lu\n",order);
return 0;
}
From: "Stephen C. Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 19:42:36 -0700
How about (unsigned long)(~0)?
Realistically possible with any known configuration?
On Wed, 2002-08-28 at 05:39, David S. Miller wrote:
> From: "Stephen C. Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 19:42:36 -0700
>
> How about (unsigned long)(~0)?
>
> Realistically possible with any known configuration?
How about replacing the loop with ffs/__ffs that would be more elegant,
shorter and avoid any problem or doubt?
From: Luca Barbieri <[email protected]>
Date: 28 Aug 2002 05:57:50 +0200
On Wed, 2002-08-28 at 05:39, David S. Miller wrote:
> Realistically possible with any known configuration?
How about replacing the loop with ffs/__ffs that would be more
elegant, shorter and avoid any problem or doubt?
This is getting rediculious. You can pursue this further
if you want, but I think we've wasted enough time with
this non-bug.
Who is getting bootup failures due to this problem?
Answer: nobody
On Wed, 2002-08-28 at 05:58, David S. Miller wrote:
> From: Luca Barbieri <[email protected]>
> Date: 28 Aug 2002 05:57:50 +0200
>
> On Wed, 2002-08-28 at 05:39, David S. Miller wrote:
> > Realistically possible with any known configuration?
>
> How about replacing the loop with ffs/__ffs that would be more
> elegant, shorter and avoid any problem or doubt?
>
> This is getting rediculious. You can pursue this further
> if you want, but I think we've wasted enough time with
> this non-bug.
>
> Who is getting bootup failures due to this problem?
> Answer: nobody
I'm not saying that it's serious bug, just that using __ffs is more
appropriate than reimplementing it incorrectly and inefficiently.
From: Luca Barbieri <[email protected]>
Date: 28 Aug 2002 06:10:39 +0200
I'm not saying that it's serious bug, just that using __ffs is more
appropriate than reimplementing it incorrectly and inefficiently.
ffs won't find the smallest power of 2 >= some_arbitrary_value.
That is what this code is doing.
I've personally wasted enough typing on this thread, so enough....
On 27 Aug 2002 at 20:39, David S. Miller wrote:
> How about (unsigned long)(~0)?
>
> Realistically possible with any known configuration?
>
You tell me. You're saying a billion pages (((unsigned long)(~0)) >> 2) also crashes) is never
going to be realistically possible? Sounds like Bill Gates when he said (and I don't know the word-
for-word quote) "Who's ever going to need more than 640K??" What if we get into 64 bit addressing?
What if there is some sort of bug that passes all 1s on the stack for just this one instance?
Never could "realistically" happen? Yeah, right; I've seen weirder things than that.
It's a question of mandatory paranoid sanity checking in an OS wherever possible. Linux is trying
to be known as robust. Are you saying that a supposedly robust kernel should have a chance to
crash in an infinite loop during initialization because there isn't code doing input validation
when there isn't an optimization or speed issue?
On Wed, 2002-08-28 at 06:07, David S. Miller wrote:
> From: Luca Barbieri <[email protected]>
> Date: 28 Aug 2002 06:10:39 +0200
>
> I'm not saying that it's serious bug, just that using __ffs is more
> appropriate than reimplementing it incorrectly and inefficiently.
>
> ffs won't find the smallest power of 2 >= some_arbitrary_value.
> That is what this code is doing.
Yes you are right, fls should be used there, not ffs.
From: "Stephen Biggs" <[email protected]>
Date: Tue, 27 Aug 2002 21:29:06 -0700
You tell me. You're saying a billion pages (((unsigned long)(~0)) >> 2) also crashes) is never
going to be realistically possible?
On a 32-bit system? No. x86 cpus are architectually limited
to 64GB of memory, shift that right by PAGE_SIZE (13) and we're
still within bounds.
Larger memory will be then be found on 64-bit systems, and hey
then the limit becomes significantly higher.
So that is exactly what I am saying, it is not realistically
possible.
Followup to: <[email protected]>
By author: "David S. Miller" <[email protected]>
In newsgroup: linux.dev.kernel
>
> From: "Stephen Biggs" <[email protected]>
> Date: Tue, 27 Aug 2002 21:29:06 -0700
>
> You tell me. You're saying a billion pages (((unsigned long)(~0)) >> 2) also crashes) is never
> going to be realistically possible?
>
> On a 32-bit system? No. x86 cpus are architectually limited
> to 64GB of memory, shift that right by PAGE_SIZE (13) and we're
> still within bounds. ^^^^^^^^^^^^^^
>
PAGE_SHIFT, 12.
The maximum possible page count of any known 32-bit system (64 GB @ 4K
for x86) is 2^24 pages.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>