There are many places in the kernel where a function checks whether a
pointers it has been given is NULL. Now sometimes this makes perfect
sense because the function's description explicitly says that a NULL
pointer argument is valid. But in many, many cases (maybe even the
majority) it is nothing more than paranoia: the pointer can never be NULL
in a properly functioning system.
Should these checks be made? I claim they should not.
Suppose everything is working correctly and the pointer never is NULL.
Then it really doesn't matter whether you check or not; the loss in code
speed and size is completely negligible (except maybe deep in some inner
loop). But there is a loss in code clarity; when I see a check like that
it makes me think, "What's that doing there? Can that pointer ever be
NULL, or is someone just being paranoid?" Distractions of that sort don't
help when trying to read code.
On the other hand, what if on rare occasions the pointer actually is NULL,
even though it's not supposed to be? This can only be the result of an
error somewhere else in the kernel (such as incorrect locking during a
data structure update). Detecting the NULL pointer and returning an error
code will hide the existence of the true underlying error. But if the
check _isn't_ made, then as soon as the pointer is derefenced there will
be a nice big segfault. This will immediately alert people to the
existence of a problem, something they otherwise might not be aware of at
all.
Ultimately this comes down to a question of style and taste. This
particular issue is not addressed in Documentation/CodingStyle so I'm
raising it here. My personal preference is for code that means what it
says; if a pointer is checked it should be because there is a genuine
possibility that the pointer _is_ NULL. I see no reason for pure
paranoia, particularly if it's not commented as such.
Comments, anyone?
Alan Stern
Alan Stern wrote:
[snip]
> Ultimately this comes down to a question of style and taste. This
> particular issue is not addressed in Documentation/CodingStyle so I'm
> raising it here. My personal preference is for code that means what it
> says; if a pointer is checked it should be because there is a genuine
> possibility that the pointer _is_ NULL. I see no reason for pure
> paranoia, particularly if it's not commented as such.
>
> Comments, anyone?
BUG_ON() perhaps?
Eli
--------------------. "If it ain't broke now,
Eli Carter \ it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------
Alan Stern wrote:
> On the other hand, what if on rare occasions the pointer actually is NULL,
> even though it's not supposed to be? This can only be the result of an
> error somewhere else in the kernel (such as incorrect locking during a
> data structure update). Detecting the NULL pointer and returning an error
> code will hide the existence of the true underlying error. But if the
> check _isn't_ made, then as soon as the pointer is derefenced there will
> be a nice big segfault. This will immediately alert people to the
> existence of a problem, something they otherwise might not be aware of at
> all.
The problem is IMHO code where some pretty fragile things are handled,
especially file systems. I'd say: DO the paranoia checks if some fragile
things are involved like key structures of the file system that can take
_permanent_ damage. If you check for a NULL pointer you still have the
chance to properly leave the system in a consistent state and no user
will be happy if his filesystem goes messy just because someone saved a
check to have nicer code, even if the original of the NULL pointer
wasn't his fault, even if it's a developing version. So if the check
isn't a total performace disaster, do it whenever permanent damage could
occur.
On other sections where, let's say just a user memory allocation would
crash, checks could be ommitted, because it isn't that fatal and leaves
no permanent destruction.
Just my opinion :)
TriPhoenix
Followup to: <[email protected]>
By author: Dennis Bliefernicht <[email protected]>
In newsgroup: linux.dev.kernel
>
> The problem is IMHO code where some pretty fragile things are handled,
> especially file systems. I'd say: DO the paranoia checks if some fragile
> things are involved like key structures of the file system that can take
> _permanent_ damage. If you check for a NULL pointer you still have the
> chance to properly leave the system in a consistent state and no user
> will be happy if his filesystem goes messy just because someone saved a
> check to have nicer code, even if the original of the NULL pointer
> wasn't his fault, even if it's a developing version. So if the check
> isn't a total performace disaster, do it whenever permanent damage could
> occur.
>
Actually, you have it somewhat backwards.
In most cases, checking for NULL pointers (and returning an error
whatnot) is actually *more* likely to cause permanent damage than
having the kernel bomb out. At least with the kernel bombing out you
won't keep grinding on a filesystem for which your kernel
datastructures are bad. This is *IMPORTANT*.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
Followup to: <[email protected]>
By author: Eli Carter <[email protected]>
In newsgroup: linux.dev.kernel
>
> Alan Stern wrote:
> [snip]
> > Ultimately this comes down to a question of style and taste. This
> > particular issue is not addressed in Documentation/CodingStyle so I'm
> > raising it here. My personal preference is for code that means what it
> > says; if a pointer is checked it should be because there is a genuine
> > possibility that the pointer _is_ NULL. I see no reason for pure
> > paranoia, particularly if it's not commented as such.
> >
> > Comments, anyone?
>
> BUG_ON() perhaps?
>
BUG_ON() is largely redundant if you would have a null pointer
reference anyway.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
On Thu, Jul 10, 2003 at 03:13:52PM -0700, H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: Dennis Bliefernicht <[email protected]>
> In newsgroup: linux.dev.kernel
> >
> > The problem is IMHO code where some pretty fragile things are handled,
> > especially file systems. I'd say: DO the paranoia checks if some fragile
> > things are involved like key structures of the file system that can take
> > _permanent_ damage. If you check for a NULL pointer you still have the
> > chance to properly leave the system in a consistent state and no user
> > will be happy if his filesystem goes messy just because someone saved a
> > check to have nicer code, even if the original of the NULL pointer
> > wasn't his fault, even if it's a developing version. So if the check
> > isn't a total performace disaster, do it whenever permanent damage could
> > occur.
> >
>
> Actually, you have it somewhat backwards.
>
> In most cases, checking for NULL pointers (and returning an error
> whatnot) is actually *more* likely to cause permanent damage than
> having the kernel bomb out. At least with the kernel bombing out you
> won't keep grinding on a filesystem for which your kernel
> datastructures are bad. This is *IMPORTANT*.
In BK, we have a READ_ONLY flag on each revision history file. Whenever
we get into a state where we don't understand what's going on, we set that
flag. That flag is checked in the code path which writes the file and it
will simply refuse the write the file if the flag is set.
Seems like the same idea would work here. I can imagine a lot of use for
a file system which remounts itself as read only the second it sees a
problem it can't handle. At least you can poke around and try and figure
out what is going on.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
There is an old saying in software design:
"Never check for an error condition that you do not know how to handle."
In other words: if you have identified a possible error condition (such
as a NULL pointer), until you have identified a way to meaningfully
handle that error condition, simply testing for it is useless.
Now, if you have some function that can return an error code, then
testing for NULL and returning an error condition is sensible. But if
you have no way to report the error, then what good is the test?
However, if you test for NULL, and log a report when you detect it then
deref it anyway (to force an OOPS, in other words throw an exception),
then at least there is some meaningful info in the log.
But just doing something like
void foo(void *ptr)
{
if (!ptr)
return;
....
}
just masks the problem.
On Thu, 10 Jul 2003, Eli Carter wrote:
> Alan Stern wrote:
> [snip]
> > Ultimately this comes down to a question of style and taste. This
> > particular issue is not addressed in Documentation/CodingStyle so I'm
> > raising it here. My personal preference is for code that means what it
> > says; if a pointer is checked it should be because there is a genuine
> > possibility that the pointer _is_ NULL. I see no reason for pure
> > paranoia, particularly if it's not commented as such.
> >
> > Comments, anyone?
>
> BUG_ON() perhaps?
Not really needed, since a segfault will produce almost as much
information as a BUG_ON(). Certainly it will produce enough to let a
developer know that the pointer was NULL.
Alan Stern>
On Thursday, Jul 10, 2003, at 17:54 US/Central, David D. Hagood wrote:
>
> Now, if you have some function that can return an error code, then
> testing for NULL and returning an error condition is sensible. But if
> you have no way to report the error, then what good is the test?
Then you add the test, fix your interface to be able to report the
error, and update callers as necessary... if your code can fail, you
should be able to report it.
When writing a new function you think returns void, seriously consider
having it return success instead.
--
Hollis Blanchard
IBM Linux Technology Center
> There is an old saying in software design:
>
> "Never check for an error condition that you do not know how
> to handle."
>
> In other words: if you have identified a possible error
> condition (such as a NULL pointer), until you have identified a way to
meaningfully
> handle that error condition, simply testing for it is useless.
> Now, if you have some function that can return an error code, then
> testing for NULL and returning an error condition is sensible. But if
> you have no way to report the error, then what good is the test?
Not always true. In some cases you know how to handle: just return
without doing anyting.
man 3 free
It's an example that passing a NULL is allowed by the API.
> However, if you test for NULL, and log a report when you
> detect it then
> deref it anyway (to force an OOPS, in other words throw an
> exception),
> then at least there is some meaningful info in the log.
>
> But just doing something like
>
> void foo(void *ptr)
> {
> if (!ptr)
> return;
>
> ....
> }
>
> just masks the problem.
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Hua Zhong wrote:
> Not always true. In some cases you know how to handle: just return
> without doing anyting.
That is NOT an error condition - the API specifically allows NULL to be
passed in, and specifically states that no action will be taken in that
case.
But consider the following code:
sscanf(0,0);
That IS an error condition - both the string to scan and the format
string are NULL. In this case sscanf should return EITHER 0 (no items
matched) or better still -1 (error).
As others have said - ideally, if you have any doubt about a new
function you are writing being able to succeed, you should write it to
return a success report.
However, my whole point was that simply checking for null and doing
nothing when null was not a valid value was violating the rule of "don't
check if you don't know how to handle".
Alan Stern wrote:
> On Thu, 10 Jul 2003, Eli Carter wrote:
>
>
>>Alan Stern wrote:
>>[snip]
>>
>>>Ultimately this comes down to a question of style and taste. This
>>>particular issue is not addressed in Documentation/CodingStyle so I'm
>>>raising it here. My personal preference is for code that means what it
>>>says; if a pointer is checked it should be because there is a genuine
>>>possibility that the pointer _is_ NULL. I see no reason for pure
>>>paranoia, particularly if it's not commented as such.
>>>
>>>Comments, anyone?
>>
>>BUG_ON() perhaps?
>
>
> Not really needed, since a segfault will produce almost as much
> information as a BUG_ON(). Certainly it will produce enough to let a
> developer know that the pointer was NULL.
Your first message said, "I see no reason for pure paranoia,
particularly if it's not commented as such." A BUG_ON() call makes it
clear that the condition should never happen. Dereferencing a NULL
leaves the question of whether NULL is an unhandled case or invalid
input. BUG_ON() is an explicit paranoia check, and with a bit of
preprocessing magic, you could compile out all of those checks.
So it documents invalid input conditions, allows you to eliminate the
checks in the name of speed or your personal preference, or use them to
help with debugging/testing.
Eli
--------------------. "If it ain't broke now,
Eli Carter \ it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------
On Fri, 11 Jul 2003, David D. Hagood wrote:
> Hua Zhong wrote:
>
> > Not always true. In some cases you know how to handle: just return
> > without doing anyting.
>
> That is NOT an error condition - the API specifically allows NULL to be
> passed in, and specifically states that no action will be taken in that
> case.
>
> But consider the following code:
>
> sscanf(0,0);
>
> That IS an error condition - both the string to scan and the format
> string are NULL. In this case sscanf should return EITHER 0 (no items
> matched) or better still -1 (error).
>
But it does neither. Instead, it seg-faults your code!
The problem lies with the original question. The question
referred to "Style" (look at the subject-line). It is
not a question about style, but a question about utility
and design. Style has nothing to do with it.
If you are writing code for an embedded system, the code
must always run even if RAM got trashed from alpha particles
or EMP. In that case, you trap every possible condition and
force a restart off a hardware timer, refreshing everything in
RAM from PROM or NVRAM.
If you are writing code to examine the contents of sys_errlist[],
prior to adding another error-code, then you don't check anything
and it's file-name is probably a.out, compiled from xxx.c.
So, the extent to which one checks for exceptions and provides
utility for handling exceptions depends upon the code design, not
it's style.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
On Fri, 11 Jul 2003, Eli Carter wrote:
> > Not really needed, since a segfault will produce almost as much
> > information as a BUG_ON(). Certainly it will produce enough to let a
> > developer know that the pointer was NULL.
>
> Your first message said, "I see no reason for pure paranoia,
> particularly if it's not commented as such." A BUG_ON() call makes it
> clear that the condition should never happen. Dereferencing a NULL
> leaves the question of whether NULL is an unhandled case or invalid
> input. BUG_ON() is an explicit paranoia check, and with a bit of
> preprocessing magic, you could compile out all of those checks.
>
> So it documents invalid input conditions, allows you to eliminate the
> checks in the name of speed or your personal preference, or use them to
> help with debugging/testing.
Okay, that makes sense. Particularly the debugging and testing part. And
for an excellent example of _documented_ paranoia, see the source to
schedule_timeout().
But if you look very far through the kernel sources you will see many
occurrences of code similar to this:
static void release(struct xxx *ptr)
{
if (!ptr)
return;
...
I can't see any reason for keeping something like that.
Alan Stern
On Fri, 11 Jul 2003, Richard B. Johnson wrote:
> On Fri, 11 Jul 2003, David D. Hagood wrote:
>
> > But consider the following code:
> >
> > sscanf(0,0);
> >
> > That IS an error condition - both the string to scan and the format
> > string are NULL. In this case sscanf should return EITHER 0 (no items
> > matched) or better still -1 (error).
> >
>
> But it does neither. Instead, it seg-faults your code!
>
> The problem lies with the original question. The question
> referred to "Style" (look at the subject-line). It is
> not a question about style, but a question about utility
> and design. Style has nothing to do with it.
This isn't really an example of the sort of thing I was asking about. I
was referring specifically to kernel code, not general-purpose userspace C
library routines like sscanf.
But maybe you mean the kernel's internal implementation of sscanf. If
someone in the kernel did write "sscanf(0,0);" then that would definitely
be an error in the source. It should be brought to the author's attention
as soon as possible, and a segfault (or BUG_ON) is a much more effective
way of doing so than simply returning -1.
> If you are writing code for an embedded system, the code
> must always run even if RAM got trashed from alpha particles
> or EMP. In that case, you trap every possible condition and
> force a restart off a hardware timer, refreshing everything in
> RAM from PROM or NVRAM.
Okay, but there's no way at the source level to protect against all kinds
of events like alpha particles changing a stored value. And what's the
point in checking just _some_ of them in the source if you're going to
have to check _all_ of them at a lower (hardware) level anyway?
> If you are writing code to examine the contents of sys_errlist[],
> prior to adding another error-code, then you don't check anything
> and it's file-name is probably a.out, compiled from xxx.c.
I don't understand that comment.
> So, the extent to which one checks for exceptions and provides
> utility for handling exceptions depends upon the code design, not
> it's style.
But the kernel already provides a utility for handling exceptions and has
a fairly fixed design. The extent to which one writes exception checks of
dubious value is indeed a stylistic issue, at least in this one setting.
Alan Stern
Alan Stern <[email protected]> said:
[...]
> Suppose everything is working correctly and the pointer never is NULL.
> Then it really doesn't matter whether you check or not; the loss in code
> speed and size is completely negligible (except maybe deep in some inner
> loop). But there is a loss in code clarity; when I see a check like that
> it makes me think, "What's that doing there? Can that pointer ever be
> NULL, or is someone just being paranoid?" Distractions of that sort don't
> help when trying to read code.
My personal paranoia when reading code goes the other way: How can I be
sure it won?t ever be NULL? Maybe it can't be now (and to find that out an
hour grepping around goes by), but the very next patch introduces the
possibility. Better have the function do an extra check, or make sure
somehow the assumption won't _ever_ be violated. But that is a large (huge,
even) cost, so...
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Followup to: <[email protected]>
By author: Horst von Brand <[email protected]>
In newsgroup: linux.dev.kernel
>
> Alan Stern <[email protected]> said:
>
> [...]
>
> > Suppose everything is working correctly and the pointer never is NULL.
> > Then it really doesn't matter whether you check or not; the loss in code
> > speed and size is completely negligible (except maybe deep in some inner
> > loop). But there is a loss in code clarity; when I see a check like that
> > it makes me think, "What's that doing there? Can that pointer ever be
> > NULL, or is someone just being paranoid?" Distractions of that sort don't
> > help when trying to read code.
>
> My personal paranoia when reading code goes the other way: How can I be
> sure it won?t ever be NULL? Maybe it can't be now (and to find that out an
> hour grepping around goes by), but the very next patch introduces the
> possibility. Better have the function do an extra check, or make sure
> somehow the assumption won't _ever_ be violated. But that is a large (huge,
> even) cost, so...
>
And you just shot yourself in the foot, majorly, because you tested
for NULLness and then took the action you anticipated might have been
appropriate, which really it wasn't, and you allowed a kernel with
now-corrupt data structures to continue to run.
This is bad. This is extrememly bad. And your "forward thinking"
*caused* it.
A null pointer dereference in the kernel is fatal for a reason. It
indicates that there are interfaces that aren't consistent, and your
data structures are now completely unreliable.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
Followup to: <[email protected]>
By author: Eli Carter <[email protected]>
In newsgroup: linux.dev.kernel
> >
> > Not really needed, since a segfault will produce almost as much
> > information as a BUG_ON(). Certainly it will produce enough to let a
> > developer know that the pointer was NULL.
>
> Your first message said, "I see no reason for pure paranoia,
> particularly if it's not commented as such." A BUG_ON() call makes it
> clear that the condition should never happen. Dereferencing a NULL
> leaves the question of whether NULL is an unhandled case or invalid
> input. BUG_ON() is an explicit paranoia check, and with a bit of
> preprocessing magic, you could compile out all of those checks.
>
> So it documents invalid input conditions, allows you to eliminate the
> checks in the name of speed or your personal preference, or use them to
> help with debugging/testing.
>
... but it also bloats the code, in this case, in many ways
needlessly. You don't want to compile out all BUG_ON()'s, just the
ones that wouldn't be checked for anyway.
In fact, have a macro that explicitly tests for nullness by
dereferencing a pointer might be a good idea; on most architectures it
will be a lot cheaper than BUG_ON() (which usually requires an
explicit test), and the compiler at least has a prayer at optimizing
it out.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
On Fri, 11 Jul 2003, Horst von Brand wrote:
> My personal paranoia when reading code goes the other way: How can I be
> sure it won?t ever be NULL? Maybe it can't be now (and to find that out an
> hour grepping around goes by), but the very next patch introduces the
> possibility. Better have the function do an extra check, or make sure
> somehow the assumption won't _ever_ be violated. But that is a large (huge,
> even) cost, so...
Suppose something does change and your function is passed a NULL pointer
after all. What will you do about it then? Clearly this represents a
mistake on the part of the caller; are you simply going to return without
doing anything and hope that nothing else will go wrong? Or will you
return an error code and hope that a caller careless enough to pass an
invalid argument will also be careful enough to check the return code?
Quite a lot of places in the kernel do one of these (usually the first).
Neither of those options is attractive to me. I would prefer something
that leaves a clear indication that the problem exists. A logged error
message would help; a BUG_ON or segfault would be even better. This is
especially true in situations where the function in question is part of a
relatively small subsystem where you _know_ that a NULL pointer is never
valid. (It may occur, but if it does it must represent an error.)
Alan Stern
Alan Stern <[email protected]> said:
[...]
> But if you look very far through the kernel sources you will see many
> occurrences of code similar to this:
>
> static void release(struct xxx *ptr)
> {
> if (!ptr)
> return;
> ...
>
> I can't see any reason for keeping something like that.
Just like free(3)
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
Hi,
On Thu, Jul 10, 2003 at 04:28:09PM -0400, Alan Stern wrote:
> There are many places in the kernel where a function checks whether a
> pointers it has been given is NULL. Now sometimes this makes perfect
> sense because the function's description explicitly says that a NULL
> pointer argument is valid. But in many, many cases (maybe even the
> majority) it is nothing more than paranoia: the pointer can never be NULL
> in a properly functioning system.
There are many meanings of NULL.
a) NULL -> I don't know
Reaction: Ok, then do a generic/default variant.
b) NULL -> failure in caller passed down to us.
Reaction: Pass it on, return -EINVAL or ignore the call
c) NULL -> failure in API (argument can't be NULL)
Reaction: BUG_ON()
...
So the answer isn't only taste, it's a matter of simplicity and
roboustness.
Regards
Ingo Oeser
On Sat, 12 Jul 2003, Horst von Brand wrote:
> Alan Stern <[email protected]> said:
>
> [...]
>
> > But if you look very far through the kernel sources you will see many
> > occurrences of code similar to this:
> >
> > static void release(struct xxx *ptr)
> > {
> > if (!ptr)
> > return;
> > ...
> >
> > I can't see any reason for keeping something like that.
>
> Just like free(3)
NO! Not just like free(). The documentation for free() explicitly states
that NULL pointers are valid input and result in no action. A
release()-type function, by contrast, is called back from core system code
that guarantees it will always pass a pointer to the currently-registered
owner of the corresponding resource. If the owner were NULL, the
release() function wouldn't have been called in the first place.
Alan Stern