2004-11-02 20:11:40

by Chris Friesen

[permalink] [raw]
Subject: question on common error-handling idiom


There's something I've been wondering about for a while. There is a lot of code
in linux that looks something like this:


err = -ERRORCODE
if (error condition)
goto out;


While nice to read, it would seem that it might be more efficient to do the
following:

if (error condition) {
err = -ERRORCODE;
goto out;
}


Is there any particular reason why the former is preferred? Is the compiler
smart enough to optimize away the additional write in the non-error path?

Chris


2004-11-02 21:02:02

by Jan Engelhardt

[permalink] [raw]
Subject: Re: question on common error-handling idiom

>There's something I've been wondering about for a while. There is a lot of code
>in linux that looks something like this:
>
>err = -ERRORCODE
>if (error condition)
> goto out;

That's because there might something as:

err = -EPERM;
if(error) { goto out; }
do something;
if(error2) { goto out; }
do something more;
if(error3) { goto out; }

Is shorter than:

if(error) { err = -EPERM; goto out; }
do something;
if(error2) { err = -EPERM; goto out; }
do something more;
if(error3) { err = -EPERM; goto out; }


>Is there any particular reason why the former is preferred? Is the compiler

To keep it short. Because it might have been worse than just err =xxx:

if(error) {
do this and that;
and more;
even more;
more more;
goto out;
}

Repeating that over and over is not that good. So we wrap it a little bit to do
a "staircase" deinitialization:

err = -EPERM;
if(error) { goto this_didnot_work; }
...
err = -ENOSPC;
if(error) { goto that_didnot_work; }


this_didnot_work:
all uninitializations needed

that_didnot_work:
all other uninit's

return err;


So to summarize, it's done to reduce code whilst keeping the error code around
until we actually leave the function.


My € 0.02!


Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-02 21:08:45

by Jesper Juhl

[permalink] [raw]
Subject: Re: question on common error-handling idiom

On Tue, 2 Nov 2004, Chris Friesen wrote:

>
> There's something I've been wondering about for a while. There is a lot of
> code in linux that looks something like this:
>
>
> err = -ERRORCODE
> if (error condition)
> goto out;
>
>
> While nice to read, it would seem that it might be more efficient to do the
> following:
>
> if (error condition) {
> err = -ERRORCODE;
> goto out;
> }
>
>
> Is there any particular reason why the former is preferred? Is the compiler
> smart enough to optimize away the additional write in the non-error path?
>
There are some places that do

err = -SOMEERROR;
if (some_error)
goto out;
if (some_other_error)
goto out;
if (another_error)
goto out;

In that case, where there are several different conditions that need
testing, but they all need to return the same error, setting the error
just once seems the best approach.

but for the places that do

err = -SOMEERROR;
if (condition)
goto out;

err = -OTHERERROR;
if (condition)
goto out;

I would tend to agree with you that moving the setting of the error inside
the if() would make sense.

Let's see what other people think :)


--
Jesper Juhl

2004-11-02 21:14:12

by Russell Miller

[permalink] [raw]
Subject: Re: question on common error-handling idiom

On Tuesday 02 November 2004 14:58, Jan Engelhardt wrote:

> So to summarize, it's done to reduce code whilst keeping the error code
> around until we actually leave the function.
>
I understand what you're saying, the OP did raise a point that I think is
worth repeating, that it's an extra instruction in all but error cases. Is
that extra instruction worth the tradeoff?

--Russell

>
> My € 0.02!
>
>
> Jan Engelhardt

--

Russell Miller - [email protected] - Le Mars, IA
Duskglow Consulting - Helping companies just like you to succeed for ~ 10 yrs.
http://www.duskglow.com - 712-546-5886

2004-11-02 21:15:00

by linux-os

[permalink] [raw]
Subject: Re: question on common error-handling idiom

On Tue, 2 Nov 2004, Jan Engelhardt wrote:

>> There's something I've been wondering about for a while. There is a lot of code
>> in linux that looks something like this:
>>
>> err = -ERRORCODE
>> if (error condition)
>> goto out;
>
> That's because there might something as:
>
> err = -EPERM;
> if(error) { goto out; }
> do something;
> if(error2) { goto out; }
> do something more;
> if(error3) { goto out; }
>
> Is shorter than:
>
> if(error) { err = -EPERM; goto out; }
> do something;
> if(error2) { err = -EPERM; goto out; }
> do something more;
> if(error3) { err = -EPERM; goto out; }
>
>
>> Is there any particular reason why the former is preferred? Is the compiler
>
> To keep it short. Because it might have been worse than just err =xxx:
>
> if(error) {
> do this and that;
> and more;
> even more;
> more more;
> goto out;
> }
>
> Repeating that over and over is not that good. So we wrap it a little bit to do
> a "staircase" deinitialization:
>
> err = -EPERM;
> if(error) { goto this_didnot_work; }
> ...
> err = -ENOSPC;
> if(error) { goto that_didnot_work; }
>
>
> this_didnot_work:
> all uninitializations needed
>
> that_didnot_work:
> all other uninit's
>
> return err;
>
>
> So to summarize, it's done to reduce code whilst keeping the error code around
> until we actually leave the function.
>
>
> My ?? 0.02!
>
>
> Jan Engelhardt
> --
> Gesellschaft f??r Wissenschaftliche Datenverarbeitung
> Am Fassberg, 37077 G??ttingen, http://www.gwdg.de

I think it's just to get around the "uninitialized variable"
warning when the 'C' compiler doesn't know that it will
always be initialized.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.

2004-11-02 21:22:09

by Oliver Neukum

[permalink] [raw]
Subject: Re: question on common error-handling idiom


> but for the places that do
>
> err = -SOMEERROR;
> if (condition)
> goto out;
>
> err = -OTHERERROR;
> if (condition)
> goto out;
>
> I would tend to agree with you that moving the setting of the error inside
> the if() would make sense.

The intention is to allow the compiler to turn the if into a simple
conditional branch on the assumption that gcc is not smart enough
to put the if's body out of line.

Regards
Oliver

2004-11-02 21:39:06

by Jan Engelhardt

[permalink] [raw]
Subject: Re: question on common error-handling idiom

>There are some places that do
>
>err = -SOMEERROR;
>if (some_error)
> goto out;
>if (some_other_error)
> goto out;
>if (another_error)
> goto out;
>
>Let's see what other people think :)

err = -ESOME;
if(some_error || some_other_error || another_error) {
goto out;
}

Best.

Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-03 00:59:46

by Jesper Juhl

[permalink] [raw]
Subject: Re: question on common error-handling idiom

On Tue, 2 Nov 2004, Jan Engelhardt wrote:

> >There are some places that do
> >
> >err = -SOMEERROR;
> >if (some_error)
> > goto out;
> >if (some_other_error)
> > goto out;
> >if (another_error)
> > goto out;
> >
> >Let's see what other people think :)
>
> err = -ESOME;
> if(some_error || some_other_error || another_error) {
> goto out;
> }
>
> Best.
>
Agreed, but that would potentially make something like the following (from
fs/binfmt_elf.c::load_elf_binary) quite unreadable :

if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
goto out;
if (!elf_check_arch(&loc->elf_ex))
goto out;
if (!bprm->file->f_op||!bprm->file->f_op->mmap)
goto out;


But that's not even the most interresting case, the interresting one is
the one that does

err = -ERR;
if (foo)
goto out;

err = -ERROR;
if (bar)
goto out;

where there's the potential to save an instruction by moving the err= into
the error path.
But as Oliver Neukum pointed out gcc may not be smart enough for that to
actually generate the best code.

Has anyone taken a look at what recent gcc's actually do with different
variations of the constructs mentioned in this thread?


--
Jesper Juhl


2004-11-03 08:05:42

by GNicz

[permalink] [raw]
Subject: Re: question on common error-handling idiom

Some code parts is using diffirent error handling.

if(err1) {
clenup1;
return -ERR1;
}

...

if(err2) {
cleanup2;
return -ERR2;
}

Shouldn't it be converted into:

err = -ERR1;
if(err1)
goto cleanup1;

...

err = -ERR2;
if(err2)
goto cleanup2;

I think there should be one methond, which will be used in all kernel code.

Regards, GNicz

2004-11-03 10:48:49

by Ross Kendall Axe

[permalink] [raw]
Subject: Re: question on common error-handling idiom

linux-os wrote:

>>> There's something I've been wondering about for a while. There is a
>>> lot of code
>>> in linux that looks something like this:
>>>
>>> err = -ERRORCODE
>>> if (error condition)
>>> goto out;
>>
>
>
> I think it's just to get around the "uninitialized variable"
> warning when the 'C' compiler doesn't know that it will
> always be initialized.
>

gcc is smart enough to get this case right.

Ross


Attachments:
signature.asc (256.00 B)
OpenPGP digital signature

2004-11-03 16:53:09

by Chris Friesen

[permalink] [raw]
Subject: Re: question on common error-handling idiom

Jesper Juhl wrote:

> Has anyone taken a look at what recent gcc's actually do with different
> variations of the constructs mentioned in this thread?

I did, out of curiosity:

I used the following (admittedly simplistic) code, compiled with -O2.

int bbbb(int a)
{
int err = -5;
if (a == 1)
goto out;
err=0;
out:
return err;
}

int cccc(int a)
{
int err=0;
if (a == 1) {
err = -5;
goto out;
}
out:
return err;
}


With gcc 3.2.2 for x86, both constructs generated the same code:

pushl %ebp
movl %esp, %ebp
xorl %eax, %eax
cmpl $1, 8(%ebp)
setne %al
leal -5(%eax,%eax,4), %eax
leave
ret

With gcc 2.96 (Mandrake) however, the standard construct generated this:


pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl $-5, -4(%ebp)
cmpl $1, 8(%ebp)
jne .L3
jmp .L4
.p2align 4,,7
.L3:
movl $0, -4(%ebp)
.L4:
movl -4(%ebp), %eax
movl %eax, %eax
movl %ebp, %esp
popl %ebp
ret



While moving the err setting into the conditional generates the following:

pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl $0, -4(%ebp)
cmpl $1, 8(%ebp)
jne .L6
movl $-5, -4(%ebp)
jmp .L7
.p2align 4,,7
.L6:
nop
.L7:
movl -4(%ebp), %eax
movl %eax, %eax
movl %ebp, %esp
popl %ebp
ret


For PPC, gcc 3.3.3, the standard construct gave:

xori 3,3,1
addic 3,3,-1
subfe 3,3,3
rlwinm 3,3,0,30,28
blr

While moving the err setting into the conditional generates the following:

xori 3,3,1
srawi 0,3,31
xor 3,0,3
subf 3,3,0
srawi 3,3,31
andi. 3,3,5
addi 3,3,-5
blr


So, it looks like the standard construct can actually generate better code in
some cases, its almost never worse, and it's certainly nicer to read.

Chris

2004-11-04 19:58:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: question on common error-handling idiom



On Tue, 2 Nov 2004, Chris Friesen wrote:
>
> While nice to read, it would seem that it might be more efficient to do the
> following:
>
> if (error condition) {
> err = -ERRORCODE;
> goto out;
> }
>
>
> Is there any particular reason why the former is preferred? Is the compiler
> smart enough to optimize away the additional write in the non-error path?

Quite often, the additional write is _faster_ in the non-error path.

If it's a plain register, the obvious code generation for the above is

..
testl error-condition
je no_error
movl $-XXX,%eax
jmp out;
no_error:
...

which is a lot slower (for the common non-error case) than


movl $-XXX,%eax
testl error-condition
jne out

because forward branches are usually predicted not-taken when no other
prediction exists.

You can do the same thing with

if (unlikely(error condition)) {
err = -ERRORCODE;
goto out;
}

which is hopefully even better, but the fact is, the regular

err = -ERRORCODE;
if (error condition)
goto out;

is just _smaller_ and simpler and quite often more readable anyway.

It has the added advantage that it tends to stylistically match what I
consider the proper error return behaviour, ie

err = function_returns_errno(...)
if (err)
goto out;

ie it looks syntactically identical when "error condition" and the error
value happen to be one and the same. Which is, after all, _the_ most
common case.

So I personally tend to prefer the simple format for several reasons.
It's small, it's consistent, and it maps well to good code generation.

The code generation part ends up being nice when something goes wrong.
When somebody sends in an oops, I often end up having to look at the
disassembly (and no, a fancy debugger wouldn't help - I'm talking about
the disassembly of the "code" portion of the oops itself, and matching it
up with the source tree - the oops doesn't come with the whole binary),
and then having code generation match the source makes things a _lot_
easier.

Does it make sense for other projects? Dunno. But it is, as you noted, a
common idiom in the kernel. Getting used to it just makes it even more
readable.

Linus