2003-06-15 12:45:24

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] Fix undefined/miscompiled construct in kernel parameters


The parse_args call in init/main.c does pointer arithmetic between two
different external symbols. This is undefined in C (only pointer arthmetic
in the same symbol is defined) and gets miscompiled on AMD64 with gcc 3.2,
resulting in a triple fault at boot when an driver with new style
parameters is compiled in. In my case it was triggered by oprofile.

This patch works around this by passing the end pointer directly
to the low level function and comparing it there. Strictly this
is still undefined, but should be ok because gcc has to handle these
pointers in another function correctly.

It is also possible to use an empty asm() with dummy input/out to work
around this, but I didn't do that for now.

The construct may appear in other cases too, but I didn't see any
miscompilation so far.

Please apply,

-Andi

diff -burpN -X ../KDIFX linux/include/linux/moduleparam.h linux-2.5.71-amd64/include/linux/moduleparam.h
--- linux/include/linux/moduleparam.h 2003-05-27 03:00:39.000000000 +0200
+++ linux-2.5.71-amd64/include/linux/moduleparam.h 2003-06-15 14:49:45.000000000 +0200
@@ -67,7 +67,7 @@ struct kparam_string {
extern int parse_args(const char *name,
char *args,
struct kernel_param *params,
- unsigned num,
+ struct kernel_param *end,
int (*unknown)(char *param, char *val));

/* All the helper functions */
diff -burpN -X ../KDIFX linux/init/main.c linux-2.5.71-amd64/init/main.c
--- linux/init/main.c 2003-06-14 23:43:06.000000000 +0200
+++ linux-2.5.71-amd64/init/main.c 2003-06-15 13:16:41.000000000 +0200
@@ -404,7 +404,7 @@ asmlinkage void __init start_kernel(void
page_alloc_init();
printk("Kernel command line: %s\n", saved_command_line);
parse_args("Booting kernel", command_line, &__start___param,
- &__stop___param - &__start___param,
+ &__stop___param,
&unknown_bootoption);
trap_init();
rcu_init();
diff -burpN -X ../KDIFX linux/kernel/module.c linux-2.5.71-amd64/kernel/module.c
--- linux/kernel/module.c 2003-06-14 23:43:06.000000000 +0200
+++ linux-2.5.71-amd64/kernel/module.c 2003-06-15 13:18:49.000000000 +0200
@@ -929,7 +929,7 @@ static int obsolete_params(const char *n
kp[i].arg = &obsparm[i];
}

- ret = parse_args(name, args, kp, num, NULL);
+ ret = parse_args(name, args, kp, &kp[num], NULL);
out:
kfree(kp);
return ret;
@@ -1622,11 +1622,11 @@ static struct module *load_module(void _
(char *)sechdrs[strindex].sh_addr);
} else {
/* Size of section 0 is 0, so this works well if no params */
- err = parse_args(mod->name, mod->args,
- (struct kernel_param *)
- sechdrs[setupindex].sh_addr,
- sechdrs[setupindex].sh_size
- / sizeof(struct kernel_param),
+ struct kernel_param *parm = (struct kernel_param *)
+ sechdrs[setupindex].sh_addr;
+ err = parse_args(mod->name, mod->args, parm,
+ &parm[sechdrs[setupindex].sh_size
+ / sizeof(struct kernel_param)],
NULL);
}
if (err < 0)
diff -burpN -X ../KDIFX linux/kernel/params.c linux-2.5.71-amd64/kernel/params.c
--- linux/kernel/params.c 2003-05-27 03:00:46.000000000 +0200
+++ linux-2.5.71-amd64/kernel/params.c 2003-06-15 13:39:16.000000000 +0200
@@ -46,13 +46,13 @@ static inline int parameq(const char *in
static int parse_one(char *param,
char *val,
struct kernel_param *params,
- unsigned num_params,
+ struct kernel_param *end,
int (*handle_unknown)(char *param, char *val))
{
unsigned int i;

/* Find parameter */
- for (i = 0; i < num_params; i++) {
+ for (i = 0; &params[i] < end; i++) {
if (parameq(param, params[i].name)) {
DEBUGP("They are equal! Calling %p\n",
params[i].set);
@@ -109,7 +109,7 @@ static char *next_arg(char *args, char *
int parse_args(const char *name,
char *args,
struct kernel_param *params,
- unsigned num,
+ struct kernel_param *end,
int (*unknown)(char *param, char *val))
{
char *param, *val;
@@ -120,7 +120,7 @@ int parse_args(const char *name,
int ret;

args = next_arg(args, &param, &val);
- ret = parse_one(param, val, params, num, unknown);
+ ret = parse_one(param, val, params, end, unknown);
switch (ret) {
case -ENOENT:
printk(KERN_ERR "%s: Unknown parameter `%s'\n",



2003-06-15 16:39:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters


On Sun, 15 Jun 2003, Andi Kleen wrote:
>
> The parse_args call in init/main.c does pointer arithmetic between two
> different external symbols. This is undefined in C (only pointer arthmetic
> in the same symbol is defined) and gets miscompiled on AMD64 with gcc 3.2,

That's silly. You're making the code less readable, with some silly
parameters. Why does it get miscompiled on amd64, and let's just fix that
one.

Linus

2003-06-15 17:06:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

Linus Torvalds <[email protected]> writes:

> On Sun, 15 Jun 2003, Andi Kleen wrote:
>>
>> The parse_args call in init/main.c does pointer arithmetic between two
>> different external symbols. This is undefined in C (only pointer arthmetic
>> in the same symbol is defined) and gets miscompiled on AMD64 with gcc 3.2,
>
> That's silly. You're making the code less readable, with some silly
> parameters. Why does it get miscompiled on amd64, and let's just fix that
> one.

Because &arbitary_symbol_a - &arbitary_symbol_b is undefined in C and
the amd64 gcc 3.2 choses to miscompile it (it results in a very big
number because it converts the 56/40 division to an inversed multiplication
in a wrong way). I actually wrote a compiler bug report first, but the
compiler developers rightly pointed out that it is undefined.

Note this is not the first time we had such problems, it happened
with __pa(&symbol) too (amd64 has a special macro to work around that).
I don't know why it happens more often on amd64 than on i386, perhaps
it has something to do with the RIP relative addressing or the
negative kernel code model. ppc seems to have similar problems.

Passing an end pointer is not too different from passing an
max index in my opinion.

Another way would be to use the anonymous asm trick proposed by rth
some time ago for a similar problem (asm("" : "=g" (output) : "0" (input))
to hide the symbols from the compiler), but I didn't do that because
it looked more ugly to me.

If you prefer to do it with the anonymous asm I can do it, but I don't
see a big problem with passing an end pointer.

-Andi

2003-06-15 17:11:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters


On Sun, 15 Jun 2003, Andi Kleen wrote:
>
> Because &arbitary_symbol_a - &arbitary_symbol_b is undefined in C and
> the amd64 gcc 3.2 choses to miscompile it (it results in a very big
> number because it converts the 56/40 division to an inversed multiplication
> in a wrong way). I actually wrote a compiler bug report first, but the
> compiler developers rightly pointed out that it is undefined.

They are not arbitrary symbols. They are symbols in the same data
structure, set up by the linker script. Gcc doesn't know that, but the
fact that gcc doesn't know doesn't mean that gcc should be lazy and
doesn't really excuse buggy code.

The gcc developers you talked to are picking their legalistic noses, and
it's sad that this isn't exactly the first time it has happened.

Linus

2003-06-15 17:19:10

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

> They are not arbitrary symbols. They are symbols in the same data
> structure, set up by the linker script. Gcc doesn't know that, but the
> fact that gcc doesn't know doesn't mean that gcc should be lazy and
> doesn't really excuse buggy code.
>
> The gcc developers you talked to are picking their legalistic noses, and
> it's sad that this isn't exactly the first time it has happened.

I tend to agree, feel free to flame them. But it doesn't help me right now
when I want to get a booting kernel. Could you merge that change or if you
prefer I can rewrite it to anonymous asm (but it will be probably more ugly).
I just need some workaround.

Thanks.

-Andi

2003-06-15 17:35:31

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

Hi,

On 15 Jun 2003, Andi Kleen wrote:

> I tend to agree, feel free to flame them. But it doesn't help me right now
> when I want to get a booting kernel. Could you merge that change or if you
> prefer I can rewrite it to anonymous asm (but it will be probably more ugly).
> I just need some workaround.

Does the patch below work better?

bye, Roman

--- linux/init/main.c 14 Jun 2003 23:01:48 -0000 1.1.1.41
+++ linux/init/main.c 15 Jun 2003 17:46:16 -0000
@@ -383,7 +383,7 @@ asmlinkage void __init start_kernel(void
{
char * command_line;
extern char saved_command_line[];
- extern struct kernel_param __start___param, __stop___param;
+ extern struct kernel_param __start___param[], __stop___param[];
/*
* Interrupts are still disabled. Do necessary setups, then
* enable them
@@ -403,8 +403,8 @@ asmlinkage void __init start_kernel(void
build_all_zonelists();
page_alloc_init();
printk("Kernel command line: %s\n", saved_command_line);
- parse_args("Booting kernel", command_line, &__start___param,
- &__stop___param - &__start___param,
+ parse_args("Booting kernel", command_line, __start___param,
+ __stop___param - __start___param,
&unknown_bootoption);
trap_init();
rcu_init();


2003-06-15 18:14:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

On Sun, Jun 15, 2003 at 07:48:56PM +0200, Roman Zippel wrote:
> Hi,
>
> On 15 Jun 2003, Andi Kleen wrote:
>
> > I tend to agree, feel free to flame them. But it doesn't help me right now
> > when I want to get a booting kernel. Could you merge that change or if you
> > prefer I can rewrite it to anonymous asm (but it will be probably more ugly).
> > I just need some workaround.
>
> Does the patch below work better?

Nope, generates exactly the same code.

-Andi

2003-06-15 22:42:49

by Richard Henderson

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

On Sun, Jun 15, 2003 at 07:48:56PM +0200, Roman Zippel wrote:
> Does the patch below work better?
>
> bye, Roman
>
> --- linux/init/main.c 14 Jun 2003 23:01:48 -0000 1.1.1.41
> +++ linux/init/main.c 15 Jun 2003 17:46:16 -0000
> @@ -383,7 +383,7 @@ asmlinkage void __init start_kernel(void
> {
> char * command_line;
> extern char saved_command_line[];
> - extern struct kernel_param __start___param, __stop___param;
> + extern struct kernel_param __start___param[], __stop___param[];
> /*
> * Interrupts are still disabled. Do necessary setups, then
> * enable them
> @@ -403,8 +403,8 @@ asmlinkage void __init start_kernel(void
> build_all_zonelists();
> page_alloc_init();
> printk("Kernel command line: %s\n", saved_command_line);
> - parse_args("Booting kernel", command_line, &__start___param,
> - &__stop___param - &__start___param,
> + parse_args("Booting kernel", command_line, __start___param,
> + __stop___param - __start___param,
> &unknown_bootoption);
> trap_init();
> rcu_init();

Linus, I'd REALLY prefer this patch be applied, even though
the problem turned out to be one of amd64 alignment, which
can be worked around.

Even if struct kernel_param doesn't suffer from .sdata problems,
this formulation is closer to Correct. I'd really prefer that
all such linker-script generated arrays used the [] form, and
not worry about the size of the data object involved.


r~

2003-06-16 00:11:16

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

In message <[email protected]> you wri
te:
>
> On Sun, 15 Jun 2003, Andi Kleen wrote:
> >
> > The parse_args call in init/main.c does pointer arithmetic between two
> > different external symbols. This is undefined in C (only pointer arthmetic
> > in the same symbol is defined) and gets miscompiled on AMD64 with gcc 3.2,
>
> That's silly. You're making the code less readable, with some silly
> parameters. Why does it get miscompiled on amd64, and let's just fix that
> one.

AFAICT, Roman's fix is correct; Richard admonished me in the past for
such code, IIRC, but this one slipped through.

Since Andi reports that even that doesn't work for x86-64, I'd say
apply this patch based on his: it's an arbitrary change anyway.

The compiler should be fixed, too, but if this is the only compiler
bug which effects the kernel, we should consider ourselves lucky.

(Andi: your module.c patch was a little convoluted).
Cheers,
Rusty.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.71-bk1/include/linux/moduleparam.h working-2.5.71-bk1-param/include/linux/moduleparam.h
--- linux-2.5.71-bk1/include/linux/moduleparam.h 2003-02-07 19:20:01.000000000 +1100
+++ working-2.5.71-bk1-param/include/linux/moduleparam.h 2003-06-16 10:18:25.000000000 +1000
@@ -67,7 +67,7 @@ struct kparam_string {
extern int parse_args(const char *name,
char *args,
struct kernel_param *params,
- unsigned num,
+ struct kernel_param *end,
int (*unknown)(char *param, char *val));

/* All the helper functions */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.71-bk1/init/main.c working-2.5.71-bk1-param/init/main.c
--- linux-2.5.71-bk1/init/main.c 2003-06-15 11:30:10.000000000 +1000
+++ working-2.5.71-bk1-param/init/main.c 2003-06-16 10:18:25.000000000 +1000
@@ -404,7 +404,7 @@ asmlinkage void __init start_kernel(void
page_alloc_init();
printk("Kernel command line: %s\n", saved_command_line);
parse_args("Booting kernel", command_line, &__start___param,
- &__stop___param - &__start___param,
+ &__stop___param,
&unknown_bootoption);
trap_init();
rcu_init();
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.71-bk1/kernel/module.c working-2.5.71-bk1-param/kernel/module.c
--- linux-2.5.71-bk1/kernel/module.c 2003-06-15 11:30:11.000000000 +1000
+++ working-2.5.71-bk1-param/kernel/module.c 2003-06-16 10:19:53.000000000 +1000
@@ -929,7 +929,7 @@ static int obsolete_params(const char *n
kp[i].arg = &obsparm[i];
}

- ret = parse_args(name, args, kp, num, NULL);
+ ret = parse_args(name, args, kp, kp + num, NULL);
out:
kfree(kp);
return ret;
@@ -1623,10 +1623,9 @@ static struct module *load_module(void _
} else {
/* Size of section 0 is 0, so this works well if no params */
err = parse_args(mod->name, mod->args,
- (struct kernel_param *)
- sechdrs[setupindex].sh_addr,
- sechdrs[setupindex].sh_size
- / sizeof(struct kernel_param),
+ (void *)sechdrs[setupindex].sh_addr,
+ (void *)sechdrs[setupindex].sh_addr
+ + sechdrs[setupindex].sh_size,
NULL);
}
if (err < 0)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.71-bk1/kernel/params.c working-2.5.71-bk1-param/kernel/params.c
--- linux-2.5.71-bk1/kernel/params.c 2003-02-11 14:26:20.000000000 +1100
+++ working-2.5.71-bk1-param/kernel/params.c 2003-06-16 10:18:25.000000000 +1000
@@ -46,13 +46,13 @@ static inline int parameq(const char *in
static int parse_one(char *param,
char *val,
struct kernel_param *params,
- unsigned num_params,
+ struct kernel_param *end,
int (*handle_unknown)(char *param, char *val))
{
unsigned int i;

/* Find parameter */
- for (i = 0; i < num_params; i++) {
+ for (i = 0; &params[i] < end; i++) {
if (parameq(param, params[i].name)) {
DEBUGP("They are equal! Calling %p\n",
params[i].set);
@@ -109,7 +109,7 @@ static char *next_arg(char *args, char *
int parse_args(const char *name,
char *args,
struct kernel_param *params,
- unsigned num,
+ struct kernel_param *end,
int (*unknown)(char *param, char *val))
{
char *param, *val;
@@ -120,7 +120,7 @@ int parse_args(const char *name,
int ret;

args = next_arg(args, &param, &val);
- ret = parse_one(param, val, params, num, unknown);
+ ret = parse_one(param, val, params, end, unknown);
switch (ret) {
case -ENOENT:
printk(KERN_ERR "%s: Unknown parameter `%s'\n",


--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2003-06-16 00:36:13

by Richard Henderson

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

On Mon, Jun 16, 2003 at 10:23:41AM +1000, Rusty Russell wrote:
> Since Andi reports that even that doesn't work for x86-64, I'd say
> apply this patch based on his: it's an arbitrary change anyway.

No, Andi located the *real* problem. The compiler was over-aligning
these objects, which added padding, which broke the array semantics
you were looking for. The solution is to add an attribute aligned;
he's sent a patch to Linus already.


r~

2003-06-16 02:31:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters


On Mon, 16 Jun 2003, Rusty Russell wrote:
>
> AFAICT, Roman's fix is correct; Richard admonished me in the past for
> such code, IIRC, but this one slipped through.

Roman's fix is fine, but the fact is, the original code was also fine.
Yes, the C standard has all these rules about "within objects" for pointer
differences, but the "objects" themselves can come from outside the
compiler. As they did in this case.

(Yeah, I could see the compiler warning about cases it suspects might be
separate objects, but the end result should still be the right one).

In general, I accept _local_ uglifications to work around compiler
problems. But I do not accept non-local stuff like making for ugly calling
conventions etc, which is why Andi's original patch was not acceptable to
me.

It turns out that the real bug was somewhere in the tool chain, and the
linker should either honor alignment requirements or warn about them when
it cannot. I suspect in this case the alignment requirement wasn't
properly passed down the chain somewhere, I dunno. The problem is fixed,
but for future reference please keep this in mind when working around
compiler problems.

If worst comes to worst, we'll have notes about certain compiler versions
just not working. It's certainly happened before.

Linus

2003-06-16 06:37:12

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Fix undefined/miscompiled construct in kernel parameters

In message <[email protected]> you write:
> On Mon, Jun 16, 2003 at 10:23:41AM +1000, Rusty Russell wrote:
> > Since Andi reports that even that doesn't work for x86-64, I'd say
> > apply this patch based on his: it's an arbitrary change anyway.
>
> No, Andi located the *real* problem. The compiler was over-aligning
> these objects, which added padding, which broke the array semantics
> you were looking for. The solution is to add an attribute aligned;
> he's sent a patch to Linus already.

Thanks for the explanation. Sorry I missed it.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.