Just did a pull for the first time since 2.6.20, and a /megaton/ of new
warnings have appeared, on Fedora Core 6/x86-64 "make allyesconfig".
All of the new warning spew appears to be the "pointers differ in
signedness" warning. Granted, we should care about these, but even a
"silent" build (make -s) now spews far more output than any human could
possibly care about, and digging important nuggets out of all this noise
is beginning to require automated tools rather than just human eyes.
Jeff
On Thu, 8 Feb 2007, Jeff Garzik wrote:
>
> Just did a pull for the first time since 2.6.20, and a /megaton/ of new
> warnings have appeared, on Fedora Core 6/x86-64 "make allyesconfig".
That was due to the Makefile rule breakage. Hopefully it now _really_ is
fixed. The
CFLAGS += $(call cc-option,-Wno-pointer-sign,)
part effectively got dropped.
> All of the new warning spew appears to be the "pointers differ in signedness"
> warning. Granted, we should care about these
The thing is, I'm _really_ sad we have to use that -Wno-pointer-sign
thing. It's a real and valid warning. It finds really ugly code issues
(unlike a lot of the other things). So it got dropped by mistake, but that
is one warning that I'd actually like to resurrect (unlike a lot of other
gcc warnings that I just think are crap to begin with).
However, we've got a metric shitload of code that passes in pointers to
the wrong sign, so a lot of it would involve mindless changes that are
likely to break real things.
And the thing is, sometimes -Wpointer-sign is just horribly
broken. For example, you cannot write a "strlen()" that doesn't complain
about "unsigned char *" vs "char *". And that is just a BUG.
I'm continually amazed at how totally clueless some gcc warnings are.
You'd think that the people writing them are competent. And then sometimes
they show just how totally incompetent they are, by making it impossible
to use "unsigned char" arrays together with standard functions.
So a _lot_ of the warnings are real. But with no way to shut up the ones
that aren't, it sadly doesn't matter one whit ;(
(And adding casts is just much worse than shutting up the warning
entirely)
There should be a rule about warnings: if you can't shut up stylistic
warnings on a site-by-site level without adding horrible syntactic stuff
that is worse than what the warning is all about, the warning shouldn't
exist.
Linus
On Thu, 8 Feb 2007, Jeff Garzik wrote:
> Just did a pull for the first time since 2.6.20, and a /megaton/ of new
> warnings have appeared, on Fedora Core 6/x86-64 "make allyesconfig".
>
> All of the new warning spew appears to be the "pointers differ in signedness"
> warning. Granted, we should care about these, but even a "silent" build
> (make -s) now spews far more output than any human could possibly care about,
> and digging important nuggets out of all this noise is beginning to require
> automated tools rather than just human eyes.
I think this is because of changes to Kbuild and the ongoing discussion
related to it. On PPC I see that we are losing options via $(call
cc-option..)
- k
On Feb 8 2007 08:33, Linus Torvalds wrote:
>
>And the thing is, sometimes -Wpointer-sign is just horribly
>broken. For example, you cannot write a "strlen()" that doesn't
>complain about "unsigned char *" vs "char *". And that is just a BUG.
>
>I'm continually amazed at how totally clueless some gcc warnings are.
>You'd think that the people writing them are competent. And then
>sometimes they show just how totally incompetent they are, by making it
>impossible to use "unsigned char" arrays together with standard
>functions.
I generally have to agree with you about the unsigned char* vs char*. It
is a problem of the C language that char can be signed and unsigned, and
that people, as a result, have used it for storing
"shorter_than_short_t"s.
What C needs is a distinction between char and int8_t, rendering "char"
an unsigned at all times basically and making "unsigned char" and
"signed char" illegal types in turn.
Jan
--
ft: http://freshmeat.net/p/chaostables/
On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>
> I generally have to agree with you about the unsigned char* vs char*. It
> is a problem of the C language that char can be signed and unsigned, and
> that people, as a result, have used it for storing
> "shorter_than_short_t"s.
>
> What C needs is a distinction between char and int8_t, rendering "char"
> an unsigned at all times basically and making "unsigned char" and
> "signed char" illegal types in turn.
No, it's really more fundamental than that.
Exactly because "char *" doesn't have a defined sign, only a TOTALLY
INCOMPETENT compiler will warn about its signedness.
The user has clearly stated "I don't care about the sign". If a compiler
complains about us passing "unsigned char *" (or, if "char" is naturally
unsigned on that platform, "signed char *") to strcmp then that compiler
IS BROKEN. Because "strcmp()" takes "char *", which simply DOES NOT HAVE a
uniquely defined sign.
That's why we can't have -Wpointer-sign on by default. The gcc warning is
simply *crap*.
If you were to have
extern int strcmp(signed char *);
and would pass *that* an "unsigned char", it would be a real and valid
sign error. But just plain "char *" isn't signed or unsigned. It's
"implementation defined", and as such, if the programmer used it, the
programmer clearly doesn't care. If he cared, he would just state the
signedness explicitly.
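A short sketch of that distinction (hypothetical prototypes, purely for
illustration):

	extern int wants_signed(signed char *p);  /* sign stated explicitly */
	extern int takes_string(char *p);         /* plain "char *": sign is
	                                             implementation-defined */

	void caller(unsigned char *buf)
	{
		wants_signed(buf);   /* a warning here points at a real sign error */
		takes_string(buf);   /* a warning here is the one being objected to */
	}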
It really is that simple. gcc is broken. The C language isn't, it's purely
a broken compiler issue.
The other things in the kernel I'd be willing to fix up. But I simply
refuse to work around a known-buggy warning.
Linus
On Feb 8 2007 11:53, Linus Torvalds wrote:
>On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>>
>Exactly because "char *" doesn't have a defined sign,
>The user has clearly stated "I don't care about the sign". If a compiler
>complains about us passing "unsigned char *" (or, if "char" is naturally
>unsigned on that platform, "signed char *") to strcmp then that compiler
>IS BROKEN. Because "strcmp()" takes "char *", which simply DOES NOT HAVE a
>uniquely defined sign.
Thank you for this insight, I don't usually read standards, only RFCs :)
Uh, does that also apply to the longer types, int, long etc.? I hope not.
>only a TOTALLY INCOMPETENT compiler will warn about its signedness.
>That's why we can't have -Wpointer-sign on by default. The gcc warning is
>simply *crap*.
>[...]
>It really is that simple. gcc is broken. The C language isn't, it's purely
>a broken compiler issue.
Maybe you could send in a patch to gcc that fixes the issue?
Jan
--
On Thu, 8 Feb 2007 11:53:38 -0800 (PST), Linus Torvalds <[email protected]> wrote:
>
>
> On Thu, 8 Feb 2007, Jan Engelhardt wrote:
> >
> > I generally have to agree with you about the unsigned char* vs char*. It
> > is a problem of the C language that char can be signed and unsigned, and
> > that people, as a result, have used it for storing
> > "shorter_than_short_t"s.
> >
> > What C needs is a distinction between char and int8_t, rendering "char"
> > an unsigned at all times basically and making "unsigned char" and
> > "signed char" illegal types in turn.
>
> No, it's really more fundamental than that.
>
> Exactly because "char *" doesn't have a defined sign, only a TOTALLY
> INCOMPETENT compiler will warn about its signedness.
>
Perhaps the problem is that gcc warns you something like 'if you care
anything about the sign behaviour of what you send to strlen() you're doomed'.
I.e., you declared the var unsigned, so you care about it. But strlen() can do
anything without any regard to the sign.
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT
On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>
> Thank you for this insight, I don't usually read standards, only RFCs :)
> Uh, does that also apply to the longer types, int, long etc.? I hope not.
No. An "int" or "long" is always signed.
But bitfields, enums or "char" can - and do - have signedness that depends
on the architecture and/or compiler writer's whims.
So if you do
enum my_enum {
	orange = 1,
	blue = 2,
	green = 3
};
you don't know if "enum my_enum" is signed or not. The compiler could
decide to use "int". Or it could decide to use "unsigned char" for it. You
simply don't know.
Same goes for
struct dummy {
	int flag:1;
} a_variable;
which could make "a_variable.d" be either signed or unsigned. In the
absense of an explicit "signed" or "unsigned" by the programmer, you
really won't even _know_ whether "flag" can contain the values {0,1} or
{-1,0}.
Normally you wouldn't ever even care. A one-bit bitfield would only ever
really be tested against 0, but it really is possible that when you assign
"1" to that value, and read it back, it will read back -1. Try it.
Those are the only three types I can think of, but the point is, they
really don't have any standard-defined sign. It's up to the compiler.
Now, passing a "pointer to bitfield" is impossible, and if you pass a
pointer to an enum you'd better *always* use the same enum anyway (because
not only is the signedness implementation-defined, the _size_ of the enum
is also implementation-defined), so in those two cases, -Wpointer-sign
doesn't hurt.
But in the case of "char", it really is insane. It's doubly insane exactly
because the C standard defines a lot of standard functions that take "char
*", and that make tons of sense with unsigned chars too, and you may have
100% good reasons to use "unsigned char" for many of them.
> >It really is that simple. gcc is broken. The C language isn't, it's purely
> >a broken compiler issue.
>
> Maybe you could send in a patch to gcc that fixes the issue?
I've butted heads with too many broken gcc decisions. I've occasionally
used to try to discuss these kinds of things on the gcc mailing lists, but
I got too fed up with language lawyers that said "the standard _allows_ us
to be total *ssholes, so stop complaining". It seemed to be an almost
universal defense.
(And yes, the standard allows any warnings at all. So technically, gcc can
warn about whatever hell it wants. It doesn't make it a good idea).
Linus
On Thu, 8 Feb 2007, J.A. Magallón wrote:
>
> Perhaps the problem is that gcc warns you something like 'if you care
> anything about the sign behaviour of what you send to strlen() you're doomed'.
But you *cannot* care. That's the point. The warning is insane. For any
function that takes "char *", you very fundamentally *cannot* care about
the signedness of the argument, because doing so would be wrong. If you
do care, you're already buggy.
> I.e., you declared the var unsigned, so you care about it. But strlen() can do
> anything without any regard to the sign.
That makes no sense.
First off, "strlen()" doesn't care about the sign of the buffer, so it's a
bad example anyway. But take something that *can* care, namely "strcmp()",
which can return different things depending on whether "char" is signed or
not (is 0xff larger than 0x00? Depends on whether char is signed or
not..).
But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is
implementation-defined, so if you call "strcmp()", you are already
basically saying: I don't care (and I _cannot_ care) what sign you are
using.
So having the compiler warn about it is insane. It's like warning about
the fact that it's Thursday today. It's not something the programmer cares
about.
Linus
On Thu, 8 Feb 2007, Linus Torvalds wrote:
>
> But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is
> implementation-defined, so if you call "strcmp()", you are already
> basically saying: I don't care (and I _cannot_ care) what sign you are
> using.
Let me explain it another way.
Say you use
	signed char *myname, *yourname;

	if (strcmp(myname,yourname) < 0)
		printf("Ha, I win!\n");
and you compile this on an architecture where "char" is signed even
without the explicit "signed".
What should happen?
Should you get a warning? The types really *are* the same, so getting a
warning sounds obviously insane. But on the other hand, if you really care
about the sign that strcmp() uses internally, the code is wrong *anyway*,
because with another compiler, or with the *same* compiler on another
architecture or some other compiler flags, the very same code is buggy.
In other words, either you should get a warning *regardless* of whether
the sign actually matches or not, or you shouldn't get a warning at all
for the above code. Either it's buggy code, or it isn't.
Warning only when the sign doesn't _happen_ to match is crap. In that
case, it's not a warning about bad code, it's a warning about a bad
*compiler*.
My suggestion is that if you *really* care about the sign so much that you
want the sign warning, make it really obvious to the compiler. Don't ever
call functions that have implicit signs. Make even "int" arguments (which
is well-defined in its sign) use "signed int", and then you can make the
compiler warn if anybody ever passes it an "unsigned int".
Never mind even a pointer - if somebody actually took the time and effort
to spell out "signed int" in a function prototype, and you pass that
function an unsigned integer, maybe a warning is perfectly fine. Clearly
the programmer really cared, and if he didn't care about the sign that
much, he could have used just "int".
Conversely, if somebody has a function with a "unsigned int" prototype,
and you pass it a regular "int", a compiler shouldn't complain, because an
"int" will just silently promote to unsigned. But perhaps the programmer
passes it something that he had _explicitly_ marked with "signed int".
Would it make sense to warn then? Makes sense to me.
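A sketch of the convention being suggested (purely illustrative: in C,
"signed int" and "int" name the same type, so this describes proposed
compiler behaviour, not anything gcc actually does):

	extern void wants_signed(signed int x);  /* author spelled out "signed" */
	extern void takes_int(int x);            /* plain "int": no opinion given */

	void caller(unsigned int u)
	{
		wants_signed(u);   /* under this idea, worth a warning */
		takes_int(u);      /* no warning: normal promotion, as today */
	}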
And no, none of this is about "strict C standards". All of it is about
"what makes sense". It simply doesn't make sense to complain about the
sign of "char", because it's not something that has a very hard
definition. Similarly, you shouldn't complain about regular "int"
conversions, because they are normal, and the standard defines them, but
maybe you can take a hint when the programmer gives you a hint by doing
something that is "obviously unnecessary", like explicitly saying that
"signed int" thing.
Just an idea.
Linus
On Thu, Feb 08, 2007 at 02:03:06PM -0800, Linus Torvalds wrote:
>
>
> On Thu, 8 Feb 2007, Linus Torvalds wrote:
> >
> > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is
> > implementation-defined, so if you call "strcmp()", you are already
> > basically saying: I don't care (and I _cannot_ care) what sign you are
> > using.
>
> Let me explain it another way.
>
> [...]
>
> Just an idea.
>
> Linus
I perfectly agree with your analysis here. I'm used to specifying "signed"
or "unsigned" when I really expect such a sign, whether it's an int or
char, and because of this, I've spent a lot of time stuffing useless
casts everywhere to shut up stupid warnings with common functions such as
strcmp(), or even with my own functions when I don't care about the sign. IOW,
the compiler's excessive warnings discourage you from being strict with
your types. That could be called "funny" if it did not make us waste
so much time.
I would really like to get the "I don't care" possibility by default.
Willy
On Thu, 8 Feb 2007, Linus Torvalds wrote:
> Same goes for
>
> struct dummy {
> int flag:1;
> } a_variable;
>
> which could make "a_variable.flag" be either signed or unsigned. In the
> absence of an explicit "signed" or "unsigned" by the programmer, you
> really won't even _know_ whether "flag" can contain the values {0,1} or
> {-1,0}.
>
> Normally you wouldn't ever even care. A one-bit bitfield would only ever
> really be tested against 0, but it really is possible that when you assign
> "1" to that value, and read it back, it will read back -1. Try it.
>
> Those are the three only types I can think of, but the point is, they
> really don't have any standard-defined sign. It's up to the compiler.
>
And a compiler that makes a_variable.flag unsigned would be brain-dead
because "int" is always signed. The signed specifier is only useful for
forcing chars to carry a sign, and it's redundant for any other type.
Using bitfields requires the knowledge of the sign bit in twos-complement
just like when you use a signed qualifier for any other definition.
Having inconsistent behavior on whether it is unsigned or signed based on
how large the bitfield is (i.e. one bit compared to multiple bits) is
inappropriate.
On Thu, 8 Feb 2007, David Rientjes wrote:
>
> And a compiler that makes a_variable.flag unsigned would be brain-dead
> because "int" is always signed.
No, making bitfields unsigned is actually usually a good idea. It allows
you to often generate better code, and it actually tends to be what
programmers _expect_. A lot of people seem to be surprised to hear that a
one-bit bitfield actually often encodes -1/0, and not 0/1.
So unsigned bitfields are not only traditional K&R, they are also usually
_faster_ (which is probably why they are traditional K&R - along with
allowing "char" to be unsigned by default). Don't knock them. It's much
better to just remember that bitfields simply don't _have_ any standard
sign unless you specify it explicitly, than saying "it should be signed
because 'int' is signed".
I will actually argue that having signed bit-fields is almost always a
bug, and that as a result you should _never_ use "int" at all. Especially
as you might as well just write it as
signed a:1;
if you really want a signed bitfield.
So I would really recommend that you never use "int a:<bits>" AT ALL,
because there really is never any good reason to do so. Do it as
unsigned a:3;
signed b:2;
but never
int c:4;
because the latter really isn't sensible.
"sparse" will actually complain about single-bit signed bitfields, and it
found a number of cases where people used that "int x:1" kind of syntax.
Just don't do it.
Linus
On Thu, 8 Feb 2007 14:03:06 -0800 (PST), Linus Torvalds <[email protected]> wrote:
>
>
> On Thu, 8 Feb 2007, Linus Torvalds wrote:
> >
> > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is
> > implementation-defined, so if you call "strcmp()", you are already
> > basically saying: I don't care (and I _cannot_ care) what sign you are
> > using.
>
> Let me explain it another way.
>
> Say you use
>
> signed char *myname, *yourname;
>
> if (strcmp(myname,yourname) < 0)
> printf("Ha, I win!\n")
>
> and you compile this on an architecture where "char" is signed even
> without the explicit "signed".
>
> What should happen?
>
> Should you get a warning? The types really *are* the same, so getting a
> warning sounds obviously insane. But on the other hand, if you really care
> about the sign that strcmp() uses internally, the code is wrong *anyway*,
That's the point. Mmmm, I think I see it the other way around. I defined
a variable as 'signed' or 'unsigned' because the sign info matters to me.
And gcc warns about using a function on it that will _ignore_ or even
misinterpret that info. Could it be a BUG? Yes.
> because with another compiler, or with the *same* compiler on another
> architecture or some other compiler flags, the very same code is buggy.
>
> In other words, either you should get a warning *regardless* of whether
> the sign actually matches or not, or you shouldn't get a warning at all
> for the above code. Either it's buggy code, or it isn't.
>
In fact, I tried to look for an arch where this gives only _one_ warning:
#include <string.h>

void f()
{
	const unsigned char *a;
	const signed char *b;
	int la, lb;

	la = strlen(a);
	lb = strlen(b);
}
and guess what, all give _both_ warnings.
Linux/x86, gcc 4.1.2-0.20070115:
werewolf:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
OSX ppc, gcc-4.0.1
belly:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Solaris5.7 Ultra, cc WorkShop Compilers 5.0
den:~> cc -Xa -c t.c
ucbcc: Warning: "-Xa" redefines compatibility mode from "SunC transition" to "ANSI"
"t.c", line 10: warning: argument #1 is incompatible with prototype:
prototype: pointer to const char : "/usr/include/string.h", line 71
argument : pointer to const uchar
"t.c", line 11: warning: argument #1 is incompatible with prototype:
prototype: pointer to const char : "/usr/include/string.h", line 71
argument : pointer to const schar
This makes sense.
Can someone try on something psychedelic, like ARM ;)?
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT
On Fri, 9 Feb 2007, J.A. Magallón wrote:
>
> That's the point. Mmmm, I think I see it the other way around. I defined
> a variable as 'signed' or 'unsigned' because the sign info matters to me.
> And gcc warns about using a function on it that will _ignore_ or even
> misinterpret that info. Could it be a BUG? Yes.
Sure. The other way of seeing it is that *anything* could be a bug.
Could adding 1 to "a" be a bug? Yes. "a" might overflow. So maybe the
compiler should warn about that too?
So do you think a compiler should warn when you do
int a = i + 1;
and say "warning: Expression on line x might overflow"?
Could it be a BUG? Hell yeah.
Is warning for things that _could_ be bugs sane? Hell NO.
> Linux/x86, gcc 4.1.2-0.20070115:
> werewolf:~> gcc -Wpointer-sign -c t.c
> t.c: In function ‘f’:
> t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
> t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Yeah, and that's what I think is crazy.
Is it consistent? Yes. Does it help people? No.
A warning that is consistent is not necessarily a good warning. It needs
to MAKE SENSE too. And this one doesn't. I'm sorry if you can't see that.
Linus
On Thu, 8 Feb 2007, Linus Torvalds wrote:
> No, making bitfields unsigned is actually usually a good idea. It allows
> you to often generate better code, and it actually tends to be what
> programmers _expect_. A lot of people seem to be surprised to hear that a
> one-bit bitfield actually often encodes -1/0, and not 0/1.
>
Your struct:
struct dummy {
int flag:1;
} a_variable;
should expect a_variable.flag to be signed, that's what the int says.
There is no special case here with regard to type. It's traditional K&R
that writing signed flag:1 here is redundant (K&R pg. 211). Now whether
that's what you wanted or not is irrelevant. As typical with C, it gives
you exactly what you asked for.
You're arguing for inconsistency in how bitfields are qualified based on
their size. If you declare a bitfield as int byte:8, then you're going
to expect this to behave exactly like a signed char, which it does.
struct {
int byte:8;
} __attribute__((packed)) b_variable;
b_variable.byte is identical to a signed char.
Just because a_variable.flag happens to be one bit, you cannot say that
programmers _expect_ it to be unsigned, or you would also expect
b_variable.byte to act as an unsigned char. signed is the default
behavior for all types besides char, so the behavior is appropriate based
on what most programmers would expect.
> So unsigned bitfields are not only traditional K&R, they are also usually
> _faster_ (which is probably why they are traditional K&R - along with
> allowing "char" to be unsigned by default). Don't knock them. It's much
> better to just remember that bitfields simply don't _have_ any standard
> sign unless you specify it explicitly, than saying "it should be signed
> because 'int' is signed".
>
Of course they're faster, it doesn't require the sign extension. But
you're adding additional semantics to the language by your interpretation
of what is expected by the programmer in the bitfield case as opposed to a
normal variable definition.
> I will actually argue that having signed bit-fields is almost always a
> bug, and that as a result you should _never_ use "int" at all. Especially
> as you might as well just write it as
>
> signed a:1;
>
> if you really want a signed bitfield.
>
How can signed bitfields almost always be a bug? I'm the programmer and I
want to store my data in a variable that I define, so I must follow the
twos-complement rule and allot a sign bit if I declare it as a signed
value. In C, you do this with "int"; otherwise, you use "unsigned".
> So I would reall yrecommend that you never use "int a:<bits>" AT ALL,
> because there really is never any good reason to do so. Do it as
>
> unsigned a:3;
> signed b:2;
>
> but never
>
> int c:4;
>
> because the latter really isn't sensible.
>
That _is_ sensible because anything you declare "int" is going to be
signed as far as gcc is concerned. "c" just happens to be 4-bits wide
instead of 32. But for everything else, it's the same as an int. I know
exactly what I'm getting by writing "int c:4", as would most programmers
who use bitfields to begin with.
David
On Thu, 8 Feb 2007, David Rientjes wrote:
>
> Your struct:
>
> struct dummy {
> int flag:1;
> } a_variable;
>
> should expect a_variable.flag to be signed, that's what the int says.
No it's not.
You just don't understand the C language.
And if you don't understand the C language, you can't say "that's what the
int says". It says no such thing.
The C language clearly says that bitfields have implementation-defined
types. So when you see
struct dummy {
int flag:1;
} a_variable;
if you don't read that as "oh, the sign of 'flag' is implementation-
defined", then you simply aren't reading it right.
Is it "intuitive"? Nobody ever really called C an _intuitive_ language. C
has a lot of rules that you simply have to know. The bitfield sign rule is
just one such rule.
> There is no special case here with regard to type.
Sure there is. Read the spec.
I don't understand why you are arguing. YOU ARE WRONG.
Bitfields simply have implementation-defined signedness. As do enums. As
does "char". It really is that simple.
The *real* special case is actually "int" and "long". In many ways, the
fact that those *do* have a well-specified signedness is actually the
exception rather than the rule.
Most C types don't, and some you can't even tell (do pointers generate
"signed" or "unsigned" comparisons? I'll argue that a compiler that
generates signed comparisons for them is broken, but it tends to be
something you can only see with a standards-conforming program if you
can allocate memory across the sign boundary, which may or may not be
true..)
Linus
On Thu, 8 Feb 2007, Linus Torvalds wrote:
>
> The C language clearly says that bitfields have implementation-defined
> types.
Btw, this isn't even some "theoretical discussion". LOTS of compilers do
unsigned bitfields. It may even be the majority. There may (or may not) be
some correlation with what the default for "char" is, I haven't looked
into it.
Linus
On Thu, 8 Feb 2007, Linus Torvalds wrote:
> No it's not.
>
> You just don't understand the C language.
>
> And if you don't understand the C language, you can't say "that's what the
> int says". It says no such thing.
>
> The C language clearly says that bitfields have implementation-defined
> types. So when you see
>
> struct dummy {
> int flag:1;
> } a_variable;
>
> if you don't read that as "oh, the sign of 'flag' is implementation-
> defined", then you simply aren't reading it right.
>
Maybe you should read my first post, we're talking about gcc's behavior
here, not the C standard. My criticism was that any compiler that makes
a_variable.flag unsigned is brain-dead and I was arguing in favor of gcc
treating plain int bitfields as signed ints (6.7.2, 6.7.2.1). This has
_nothing_ to do with the fact that the standard leaves it implementation
defined.
Naturally you should define its signedness explicitly in your code since it
is implementation defined. That's not the point.
Just because a compiler CAN consider a_variable.flag as unsigned doesn't
mean it makes sense. It makes no sense, and thus is brain-dead.
On Thu, 8 Feb 2007, David Rientjes wrote:
>
> Maybe you should read my first post, we're talking about gcc's behavior
> here, not the C standard.
Give it up, David.
Even gcc DOES DIFFERENT THINGS! Have you even read the docs?
By default it is treated as signed int but this may be changed by
the -funsigned-bitfields option.
So even for gcc, it's just a default. I think you can even change the
default in the spec file.
Linus
On Thu, 8 Feb 2007, Linus Torvalds wrote:
> Even gcc DOES DIFFERENT THINGS! Have you even read the docs?
>
> By default it is treated as signed int but this may be changed by
> the -funsigned-bitfields option.
>
Yes, I read the 4.1.1 docs:
By default, such a bit-field is signed, because this is
consistent: the basic integer types such as int are signed
types.
That is the whole basis for my argument: when you declare something "int,"
most programmers would consider that to be SIGNED regardless of whether it
is 32 bits, 13 bits, or 1 bit.
No doubt it is configurable because of the existence of brain-dead
compilers that treat them as unsigned.
On Thursday 08 February 2007 19:42, Linus Torvalds wrote:
<snip>
> Most C types don't, and some you can't even tell (do pointers generate
> "signed" or "unsigned" comparisons? I'll argue that a compiler that
> generates signed comparisons for them is broken, but it tends to be
> something you can only see with a standards-conforming program if you
> can allocate memory across the sign boundary, which may or may not be
> true..)
And this is, because as Dennis Ritchie says in "The Development of the C
Language" (http://cm.bell-labs.com/cm/cs/who/dmr/chist.html) C evolved from a
typeless language, B - which, in turn, had originated as a stripped down
version of BCPL. B and BCPL used a "cell addressing" scheme, but because of
the move off the PDP-7 and the difficulties Ken Thompson ran into while
rewriting the code for Unix in B (the original Unix was written in assembler),
types were added - 'char' and 'int'. These represented the byte and word
addressing modes of the PDP-11, and were needed because Ritchie was in the
process of having the compiler generate assembler instead of "threaded code".
This "NB" language, after also having data structures added and the rules for
defining pointers finalized, was renamed "C" by Ritchie.
So almost all the rules around the signs of types are because of a single,
historical machine. Hence the rules about "char" being unsigned by default
and "int" being signed by default are because of the nature of the PDP-11.
The implementation defined nature of bitfields is because Dennis Ritchie had
freedom there - the PDP-11 didn't have anything like them in the hardware and
they were being stuffed in to make Thompson's life easier.
Now: There is no reason for the behavior that came from the nature of the
PDP-11 to have survived, but because it was in "The White Book" it made it
through the ANSI standardization process.
Now that this history lesson is over...
DRH
On Feb 8 2007 16:42, Linus Torvalds wrote:
>
>Most C types don't, and some you can't even tell (do pointers generate
>"signed" or "unsigned" comparisons?
I'd say "neither", because both
signed void *ptr; and
unsigned void *xyz;
are impossible (and dull at that). That's why you explicitly will
have to cast one operand in a comparison to make it evident
what sort of comparison is intended, i.e.
if((unsigned long)ptr < PAGE_OFFSET)
Further, to answer again the question of whether they generate signed or
unsigned comparisons: have you ever seen a computer which addresses memory with
negative numbers? Since the answer is most likely no, signed comparisons would
not make sense to me.
> I'll argue that a compiler that
>generates signed comparisons for them is broken, but it tends to be
>something you can only see with a standards-conforming program if you
>can allocate memory across the sign boundary, which may or may not be
>true..)
Jan
--
ft: http://freshmeat.net/p/chaostables/
Linus Torvalds <[email protected]> writes:
> On Thu, 8 Feb 2007, Linus Torvalds wrote:
>>
>> But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is
>> implementation-defined, so if you call "strcmp()", you are already
>> basically saying: I don't care (and I _cannot_ care) what sign you are
>> using.
>
> Let me explain it another way.
>
> Say you use
>
> signed char *myname, *yourname;
>
> if (strcmp(myname,yourname) < 0)
> printf("Ha, I win!\n")
>
> and you compile this on an architecture where "char" is signed even
> without the explicit "signed".
As far as I can read the C99 standard, the "char", "signed char", and
"unsigned char", are all different types:
"The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned
char.(35)
....
(35) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or
SCHAR_MIN, and this can be used to distinguish the two
options. Irrespective of the choice made, char is a separate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
from the other two and is not compatible with either.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"
If so, then the code above is broken no matter what representation of
"char" is chosen for given arch, as strcmp() takes "char*" arguments,
that are not compatible with either "signed char*" or "unsigned char*".
-- Sergei.
On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>
> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>
>> Most C types don't, and some you can't even tell (do pointers generate
>> "signed" or "unsigned" comparisons?
>
> I'd say "neither", because both
>
> signed void *ptr; and
> unsigned void *xyz;
>
> are impossible (and dull at that). That's why you explicitly will
> have to cast one operand in a comparison to make it evident
> what sort of comparison is intended, i.e.
>
> if((unsigned long)ptr < PAGE_OFFSET)
>
> Further, to answer again the question of whether they generate signed or
> unsigned comparisons: have you ever seen a computer which addresses memory with
> negative numbers? Since the answer is most likely no, signed comparisons would
Yes, most all do. Indexed memory operands are signed displacements. See
page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
gets hidden by higher-level languages such as 'C'. It is also non-obvious
when using the AT&T (GNU) assembly language. However, the Intel documentation
shows it clearly:
MOV AL, [EBX+0xFFFFFFFF] ; -1
...will put the byte existing at the location just BEFORE the offset in
EBX, into register AL, while:
MOV AL, [EBX+0x00000001] ; +1
...will put the byte existing at the location just AFTER the offset in
EBX, into register AL.
Also, the documentation specifically states that sign extension is
provided for all effective address calculations.
> not make sense to me.
>
>> I'll argue that a compiler that
>> generates signed comparisons for them is broken, but it tends to be
>> something you can only see with a standards-conforming program if you
>> can allocate memory across the sign boundary, which may or may not be
>> true..)
>
> Jan
> --
> ft: http://freshmeat.net/p/chaostables/
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
Jan Engelhardt <[email protected]> writes:
> On Feb 8 2007 08:33, Linus Torvalds wrote:
>>
[...]
> What C needs is a distinction between char and int8_t, rendering "char"
> an unsigned at all times basically and making "unsigned char" and
> "signed char" illegal types in turn.
AFAIK, C already has 3 *distinct* types, "char", "signed char", and
"unsigned char". So your int8_t reads "signed char" in C, uint8_t reads
"unsigned char" in C, while "char" is yet another type that is not
compatible with those two. Yes, the names for these types are somewhat
confusing, but that's due to historical reasons, I think.
Overall,
signed char == tiny signed int == tiny int
unsigned char == tiny unsigned int
char != signed char
char != unsigned char
where
tiny == short short ;)
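A quick way to see those distinctions from the compiler's point of view
(hypothetical function; gcc's -Wpointer-sign flags both calls, since plain
"char *" is compatible with neither array type):

	void takes_plain(char *p);

	void demo(void)
	{
		signed char s[4] = "abc";
		unsigned char u[4] = "abc";

		takes_plain(s);   /* pointer targets differ in signedness */
		takes_plain(u);   /* pointer targets differ in signedness */
	}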
Rules of thumb: don't use bare char to store integers; don't use
signed/unsigned char to store characters.
[strxxx() routines take char* arguments as they work on '\0'-terminated
character arrays (strings). Using them on arrays of tiny integers is
unsafe no matter how char behaves on given platform (like signed or like
unsigned type).]
-- Sergei.
On Thu, 8 Feb 2007, David Rientjes wrote:
>
> Yes, I read the 4.1.1 docs:
>
> By default, such a bit-field is signed, because this is
> consistent: the basic integer types such as int are signed
> types.
Ok, so the gcc people have added some language. So what? The fact is, that
language has absolutely nothing to do with the C language, and doesn't
change anything at all.
> That is the whole basis for my argument: when you declare something "int,"
> most programmers would consider that to be SIGNED regardless of whether it
> is 32 bits, 13 bits, or 1 bit.
And MY argument is that YOUR argument is CRAP.
The fact is, signs of bitfields, chars and enums aren't well-defined.
That's a FACT.
Your argument makes no sense. It's like arguing against "gravity", or like
arguing against the fact that the earth circles around the sun. It is
STUPID to argue against facts.
Your argument is exactly the same argument that people used to say that
the earth is the center of the universe: "because it makes sense that
we're the center".
The fact that "most programmers" or "it makes sense" doesn't make anything
at all true. Only verifiable FACTS make something true.
There's a great saying: "Reality is that which, when you stop believing in
it, doesn't go away." - Philip K Dick.
So wake up and smell the coffee: reality is that bitfields don't have a
specified sign. It's just a FACT. Whether you _like_ it or not is simply
not relevant.
The same is true of "char". You can argue that it should be signed by
default, since "short", "int" and "long" are signed. But THAT IS NOT TRUE.
Arguing against reality really isn't very smart.
Linus
On Fri, 9 Feb 2007, Sergei Organov wrote:
>
> As far as I can read the C99 standard, the "char", "signed char", and
> "unsigned char", are all different types:
Indeed. Search for "pseudo-unsigned", and you'll see more. There are
actually cases where "char" can act differently from _both_ "unsigned
char" and "signed char".
> If so, then the code above is broken no matter what representation of
> "char" is chosen for given arch, as strcmp() takes "char*" arguments,
> that are not compatible with either "signed char*" or "unsigned char*".
..and my argument is that a warning which doesn't allow you to call
"strlen()" on a "unsigned char" array without triggering is a bogus
warning, and must be removed.
Which is all I've ever said from the beginning.
I've never disputed that "char" and "unsigned char" are different types.
They *clearly* are different types.
But that doesn't make the warning valid. There is a real need for NOT
warning about sign differences.
But this argument has apparently again broken down and become a "language
lawyer" argument instead of being about what a reasonable compiler
actually can and should do.
As long as gcc warns about idiotic things, the kernel will keep shutting
that warning off. I'd _like_ to use the warning, but if the gcc
implementation is on crack, we clearly cannot.
There's simply an issue of "quality of implementation". In the case of
this warning, gcc is just not very high quality.
Linus
On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
>On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>>
>> Further, to answer again the question of whether they generate
>> signed or unsigned comparisons: have you ever seen a computer which
>> addresses memory with negative numbers? Since the answer is most likely
>> no, signed comparisons would
>
>Yes, most all do. Indexed memory operands are signed displacements. See
>page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
I was referring to "absolute memory", not the offset magic that assembler
allows. After all, (reg+relativeOffset) will yield an absolute address.
What I was getting at: for machines that have more than 2 GB of memory, you
don't call the address that is given by 0x80000000U actually "byte
-2147483648", but "byte 2147483648".
>gets hidden by higher-level languages such as 'C'. It is also non-obvious
>when using the AT&T (GNU) assembly language. However, the Intel documentation
>shows it clearly:
> MOV AL, [EBX+0xFFFFFFFF] ; -1
It can be debated what exactly this is... negative offset or "using
overflow to get where we want".
Jan
--
ft: http://freshmeat.net/p/chaostables/
On Thu, 8 Feb 2007 22:27:43 -0500, "D. Hazelton" <[email protected]> wrote:
> So almost all the rules around the signs of types are because of a single,
> historical machine. Hence the rules about "char" being unsigned by default
> and "int" being signed by default are because of the nature of the PDP-11.
>[...]
> Now: There is no reason for the behavior that came from the nature of the
> PDP11 to have survived, but because it was in "The White Book" it made it
> through the ANSI standardization process.
>
> Now that this history lesson is over...
... we can remember that char is signed on PDP-11. Mvu hah hah hah ha,
only serious. Check how movb works.
-- Pete
On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>
> On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote:
>> On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>>> On Feb 8 2007 16:42, Linus Torvalds wrote:
>>>>
>>> Further, to answer again the question of whether they generate
>>> signed or unsigned comparisons: have you ever seen a computer which
>>> addresses memory with negative numbers? Since the answer is most likely
>>> no, signed comparisons would
>>
>> Yes, most all do. Indexed memory operands are signed displacements. See
>> page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This
>
> I was referring to "absolute memory", not the offset magic that assembler
> allows. After all, (reg+relativeOffset) will yield an absolute address.
> What I was out at: for machines that have more than 2 GB of memory, you
> don't call the address that is given by 0x80000000U actually "byte
> -2147483648", but "byte 2147483648".
Don't make any large bets on that!
char foo()
{
	volatile char *p = (char *)0x80000000;
	return *p;
}
Optimized....
.file "zzz.c"
.text
.p2align 2,,3
.globl foo
.type foo, @function
foo:
pushl %ebp
movb -2147483648, %al
movl %esp, %ebp
movsbl %al,%eax
leave
ret
.size foo, .-foo
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
Not optimized...
.file "zzz.c"
.text
.globl foo
.type foo, @function
foo:
pushl %ebp
movl %esp, %ebp
subl $4, %esp
movl $-2147483648, -4(%ebp)
movl -4(%ebp), %eax
movb (%eax), %al
movsbl %al,%eax
leave
ret
.size foo, .-foo
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
... Still negative numbers!
>
>> gets hidden by higher-level languages such as 'C'. It is also non-obvious
>> when using the AT&T (GNU) assembly language. However, the Intel documentation
>> shows it clearly:
>> MOV AL, [EBX+0xFFFFFFFF] ; -1
>
> It can be debated what exactly this is... negative offset or "using
> overflow to get where we want".
>
No debate at all, and no overflow. The construction is even used in
'C' to optimize memory access while unrolling loops:
/*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
/*
* This converts an array of integers to an array of double floats.
*/
void int2float(double *fp, int *ip, size_t len)
{
	while(len >= 0x10)
	{
		fp += 0x10;
		ip += 0x10;
		len -= 0x10;
		fp[-0x10] = (double) ip[-0x10];
		fp[-0x0f] = (double) ip[-0x0f];
		fp[-0x0e] = (double) ip[-0x0e];
		fp[-0x0d] = (double) ip[-0x0d];
		fp[-0x0c] = (double) ip[-0x0c];
		fp[-0x0b] = (double) ip[-0x0b];
		fp[-0x0a] = (double) ip[-0x0a];
		fp[-0x09] = (double) ip[-0x09];
		fp[-0x08] = (double) ip[-0x08];
		fp[-0x07] = (double) ip[-0x07];
		fp[-0x06] = (double) ip[-0x06];
		fp[-0x05] = (double) ip[-0x05];
		fp[-0x04] = (double) ip[-0x04];
		fp[-0x03] = (double) ip[-0x03];
		fp[-0x02] = (double) ip[-0x02];
		fp[-0x01] = (double) ip[-0x01];
	}
	while(len--)
		*fp++ = (double) *ip++;
}
This makes certain that each memory access uses the same type
and the address-calculation for the next memory access can be done
in parallel with the current fetch. This is one of the methods
used to save a few machine cycles. Doesn't mean much until you
end up with large arrays such as used in image processing. Then,
it may be the difference that makes or breaks the project!
>
> Jan
> --
> ft: http://freshmeat.net/p/chaostables/
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote:
>>
>> I was referring to "absolute memory", not the offset magic that assembler
>> allows. After all, (reg+relativeOffset) will yield an absolute address.
>> What I was getting at: for machines that have more than 2 GB of memory, you
>> don't call the address that is given by 0x80000000U actually "byte
>> -2147483648", but "byte 2147483648".
>
>Don't make any large bets on that!
>
>char foo()
>{
> volatile char *p = (char *)0x80000000;
> return *p;
>}
>Optimized....
> .file "zzz.c"
> .text
> .p2align 2,,3
>.globl foo
> .type foo, @function
>foo:
> pushl %ebp
> movb -2147483648, %al
> movl %esp, %ebp
> movsbl %al,%eax
> leave
> ret
> .size foo, .-foo
> .section .note.GNU-stack,"",@progbits
> .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
00000000 <foo>:
0: 55 push %ebp
1: 0f b6 05 00 00 00 80 movzbl 0x80000000,%eax
8: 89 e5 mov %esp,%ebp
a: 5d pop %ebp
b: 0f be c0 movsbl %al,%eax
e: c3 ret
You do know that there is a bijection between the set of signed [32bit]
integers and unsigned [32bit] integers, don't you?
For the CPU, it's just bits. Being signed or unsigned is not important
when just accessing memory. It will matter when a comparison is involved, but
that was not the point here. void* comparisons are unsigned. Period.
Because a compiler doing signed comparisons will "map" the memory [from
2 GB to 4 GB] as part of the signed comparison before the memory [from 0
GB to 2 GB], which collides with - let's call it - "the world view".
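A small sketch of that difference (using integer values rather than real
pointers, since comparing unrelated pointers isn't defined by the standard
anyway; the "signed" result assumes 32-bit long):

	#include <stdio.h>

	int main(void)
	{
		/* one address below 2 GB, one above */
		unsigned long low  = 0x00001000UL;
		unsigned long high = 0x80001000UL;

		/* unsigned: low memory sorts first, matching the "world view" */
		printf("unsigned: %d\n", low < high);			/* 1 */

		/* signed: 0x80001000 is negative, so the order inverts */
		printf("signed:   %d\n", (long)low < (long)high);	/* 0 on 32-bit */
		return 0;
	}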
Jan
--
ft: http://freshmeat.net/p/chaostables/
Hello!
> void* comparisons are unsigned. Period.
As far as the C standard is concerned, there is no relationship between
comparison on pointers and comparison of their values casted to uintptr_t.
The address space needn't be linear and on some machines it isn't. So
speaking about signedness of pointer comparisons doesn't make sense,
except for concrete implementations.
Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Top ten reasons to procrastinate: 1.
Linus Torvalds <[email protected]> writes:
> On Fri, 9 Feb 2007, Sergei Organov wrote:
>>
>> As far as I can read the C99 standard, the "char", "signed char", and
>> "unsigned char", are all different types:
>
> Indeed. Search for "pseudo-unsigned", and you'll see more. There are
> actually cases where "char" can act differently from _both_ "unsigned
> char" and "signed char".
>
>> If so, then the code above is broken no matter what representation of
>> "char" is chosen for given arch, as strcmp() takes "char*" arguments,
>> that are not compatible with either "signed char*" or "unsigned char*".
>
> ..and my argument is that a warning which doesn't allow you to call
> "strlen()" on a "unsigned char" array without triggering is a bogus
> warning, and must be removed.
Why should strlen() be allowed to be called with an incompatible pointer
type? My point is that gcc should issue a *different warning* -- the same
warning it issues here:
$ cat incompat.c
void foo(int *p);
void boo(long *p) { foo(p); }
$ gcc -W -Wall -c incompat.c
incompat.c:2: warning: passing argument 1 of 'foo' from incompatible pointer type
Calling strlen(char*) with an "unsigned char*" argument does pass an argument
of an incompatible pointer type due to the fact that in C "char" and
"unsigned char" are two incompatible types, and it has nothing to do
with signedness.
[...]
-- Sergei.
On Mon, 12 Feb 2007, Sergei Organov wrote:
>
> Why should strlen() be allowed to be called with an incompatible pointer
> type? My point is that gcc should issue a *different warning* -- the same
> warning it issues here:
I agree that "strlen()" per se isn't different.
The issue is not that the warning isn't "technically correct". IT IS.
Nobody should ever argue that the warning isn't "correct". I hope people
didn't think I argued that.
I've argued that the warning is STUPID. That's a totally different thing.
I can say totally idiotic things in perfectly reasonable English grammar
and spelling. Does that make the things I say "good"? No.
The same is true of this gcc warning. It's technically perfectly
reasonable both in English grammar and spelling (well, as far as
any compiler warning ever is) _and_ in "C grammar and spelling" too.
But being grammatically correct does not make it "smart". IT IS STILL
STUPID.
Can people not see the difference between "grammatically correct" and
"intelligent"? People on the internet seem to have often acquired the
understanding that "bad grammar and spelling" => "stupid", and yes, there
is definitely some kind of correlation there. But as any logician and
mathematician hopefully knows, "a => b" does NOT imply "!a => !b".
Some people think that "warnings are always good". HELL NO!
A warning is only as good as
(a) the thing it warns about
(b) the thing you can do about it
And THAT is the fundamental problem with that *idiotic* warning. Yes, it's
technically correct. Yes, it's "proper C grammar". But if you can't get
over the hump of realizing that there is a difference between "grammar"
and "intelligent speech", you shouldn't be doing compilers.
So the warning sucks because:
- the thing it warns about (passing "unsigned char" to something that
doesn't specify a sign at all!) is not something that sounds wrong in
the first place. Yes, it's unsigned. But no, the thing it is passed to
didn't specify that it wanted a "signed" thing in the first place. The
"strlen()" function literally says
"I want a char of indeterminate sign"!
which implies that strlen really doesn't care about the sign. The same
is true of *any* function that takes a "char *". Such a function
doesn't care, and fundamentally CANNOT care about the sign, since it's
not even defined!
So the warning fails the (a) criterion. The warning isn't valid,
because the thing it warns about isn't a valid problem!
- it _also_ fails the (b) criterion, because quite often there is nothing
you can do about it. Yes, you can add a cast, but adding a cast
actually causes _worse_ code (but the warning is certainly gone). But
that makes the _other_ argument for the warning totally point-less: if
the reason for the warning was "bad code", then having the warning is
actively BAD, because the end result is actually "worse code".
See? The second point is why it's important to also realize that there is
a lot of real and valid code that actually _does_ pass "strlen()" an
unsigned string. There are tons of reasons for that to happen: the part of
the program that _does_ care wants to use a "unsigned char" array, because
it ends up doing things like "isspace(array[x])", and that is not
well-defined if you use a "char *" array.
So there are lots of reasons to use "unsigned char" arrays for strings.
Look it up. Look up any half-way reasonable man-page for the "isspace()"
kind of functions, and if they don't actually explicitly say that you
should use unsigned characters for it, those man-pages are crap. Because
those functions really *are* defined in "int", but it's the same kind of
namespace that "getchar()" works in (ie "unsigned char" + EOF, where EOF
_usually_ is -1, although other values are certainly technically legal
too).
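For illustration, a minimal sketch of that situation (hypothetical helper
names): the buffer is "unsigned char" precisely so that the ctype calls are
well-defined, and it is then the strlen() call that -Wpointer-sign flags:

	#include <ctype.h>
	#include <string.h>

	size_t count_spaces(const unsigned char *s)
	{
		size_t n = 0;

		for (; *s; s++)
			if (isspace(*s))	/* well-defined: *s is already an
						   unsigned char value */
				n++;
		return n;
	}

	size_t total_len(const unsigned char *s)
	{
		return strlen(s);	/* -Wpointer-sign warns here, though
					   strlen cannot care about the sign */
	}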
So:
- in practice, a lot of "good programming" uses "unsigned char" pointers
for doing strings. There are LOTS of reasons for that, but "isspace()"
and friends is the most obvious one.
- if you can't call "strlen()" on your strings without the compiler
warning, there's two choices: the compiler warning is CRAP, or your
program is bad. But as I just showed you, "unsigned char *" is actually
often the *right* thing to use for string work, so it clearly wasn't
the program that was bad.
So *please* understand:
- yes, the warning is "correct" from a C grammatical standpoint
- the warning is STILL CRAP, because grammar isn't the only thing about a
computer language. Sane usage is MUCH MORE important than any grammar.
Thus ends the sacred teachings of Linus "always right" Torvalds. Go and
ponder these words, and please send me all your money (certified checks
only, please - sending small unmarked bills is against USPS rules) to show
your support of the holy church of good taste.
Linus
On Fri, 9 Feb 2007, Jan Engelhardt wrote:
>
> On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote:
>>>
>>> I was referring to "absolute memory", not the offset magic that assembler
>>> allows. After all, (reg+relativeOffset) will yield an absolute address.
>>> What I was getting at: for machines that have more than 2 GB of memory, you
>>> don't call the address that is given by 0x80000000U actually "byte
>>> -2147483648", but "byte 2147483648".
>>
>> Don't make any large bets on that!
>>
>> char foo()
>> {
>> volatile char *p = (char *)0x80000000;
>> return *p;
>> }
>> Optimized....
>> .file "zzz.c"
>> .text
>> .p2align 2,,3
>> .globl foo
>> .type foo, @function
>> foo:
>> pushl %ebp
>> movb -2147483648, %al
>> movl %esp, %ebp
>> movsbl %al,%eax
>> leave
>> ret
>> .size foo, .-foo
>> .section .note.GNU-stack,"",@progbits
>> .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
>
> 00000000 <foo>:
> 0: 55 push %ebp
> 1: 0f b6 05 00 00 00 80 movzbl 0x80000000,%eax
> 8: 89 e5 mov %esp,%ebp
> a: 5d pop %ebp
> b: 0f be c0 movsbl %al,%eax
> e: c3 ret
>
> You do know that there is a bijection between the set of signed [32bit]
> integers and unsigned [32bit] integers, don't you?
> For the CPU, it's just bits. Being signed or unsigned is not important
> when just accessing memory.
I would have normally just let that comment go, but it is symptomatic
of a complete misunderstanding of how many (most) processors address
memory. This may come about because modern compilers tend to isolate
programmers from the underlying mechanisms. There isn't a 1:1
correspondence between a signed displacement and an unsigned one.
Signed displacements are mandatory with processors that provide
protection. Here is the reason: assume that you could execute code
anywhere (a flat module) and you didn't have 'C' complaining about
NULL pointers. If you were to execute code at memory location 0
that told the processor to jump to location -1 in absolute memory, you
would need the processor to generate a bounds-error trap, not to wrap
to offset 0xffffffff. The same thing is true if code is executing
at or near offset 0xffffffff. If it jumps to a location that would wrap
past 0xffffffff, one needs to generate a trap as well. Also, except for
the "FAR" jumps and FAR calls, where the segment register is part of
the operand, all programed jumps or calls are relative, i.e., based
upon the current program-counter's value. This means that the program-
counter will have a value added to, or subtracted from, its current
value to control program flow. For such addition to work either as
an addition of signed numbers or as a relative displacement, the
math must use signed (twos compliment) arithmetic.
If you look at the opcode output from the compiler, when performing
conditional jumps, you will note that the jump usually doesn't
include a 32-bit address. Instead, these are mostly short jumps, using
an 8-bit signed displacement (-128 to +127 bytes). Other conditional
jumps use 32-bit signed displacements.
The exact same addressing mechanism exists for memory read and write
accesses. They are all signed displacements.
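To illustrate just the displacement arithmetic (a minimal sketch with
made-up values, not actual compiler output):

/* Sketch: applying a signed 8-bit branch displacement to a program
 * counter, two's-complement style.  E.g. pc = 0x1000, disp8 = -4
 * gives 0x0ffc; wrapping past 0 or 0xffffffff is what a bounds
 * check would have to trap. */
unsigned int next_pc(unsigned int pc, signed char disp8)
{
        return pc + (int)disp8;
}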
> It will, when a comparison is involved, but
> that was not the point here. void* comparisons are unsigned. Period.
> Because a compiler doing signed comparisons will "map" the memory [from
> 2 GB to 4 GB] as part of the signed comparison before the memory [from 0
> GB to 2 GB], which collides with - let's call it - "the world view".
>
> Jan
> --
> ft: http://freshmeat.net/p/chaostables/
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips).
New book: http://www.AbominableFirebug.com/
Jan Engelhardt <[email protected]> wrote:
| Further, giving again answer to the question whether they generate signed or
| unsigned comparisons: Have you ever seen a computer which addresses memory with
| negative numbers? Since the answer is most likely no, signed comparisons would
| not make sense for me.
The Transputer has (had?) a signed address space:
http://maven.smith.edu/~thiebaut/transputer/chapter2/chap2-3.html
--
Dick Streefland //// Altium BV
[email protected] (@ @) http://www.altium.com
--------------------------------oOO--(_)--OOo---------------------------
Linus Torvalds <[email protected]> writes:
> On Mon, 12 Feb 2007, Sergei Organov wrote:
>>
>> Why strlen() should be allowed to be called with an incompatible pointer
>> type? My point is that gcc should issue *different warning*, -- the same
>> warning it issues here:
>
> I agree that "strlen()" per se isn't different.
I agree that making strxxx() family special is not a good idea. So what
do we do for a random foo(char*) called with an 'unsigned char*'
argument? Silence? Hmmm... It's not immediately obvious that it's indeed
harmless. Yet another -Wxxx option to GCC to silence this particular
case?
> A warning is only as good as
> (a) the thing it warns about
> (b) the thing you can do about it
>
> And THAT is the fundamental problem with that *idiotic* warning. Yes, it's
> technically correct. Yes, it's "proper C grammar". But if you can't get
> over the hump of realizing that there is a difference between "grammar"
> and "intelligent speech", you shouldn't be doing compilers.
>
> So the warning sucks because:
May I suggest another definition for a warning that entirely sucks?
"The warning entirely sucks if and only if it never has true
positives." In all other cases it only more or less sucks, IMHO.
> - the thing it warns about (passing "unsigned char" to something that
> doesn't specify a sign at all!) is not something that sounds wrong in
> the first place. Yes, it's unsigned. But no, the thing it is passed to
> didn't specify that it wanted a "signed" thing in the first place. The
> "strlen()" function literally says
>
> "I want a char of indeterminate sign"!
I'm afraid I don't follow. Do we have a way to say "I want an int of
indeterminate sign" in C? The same way there doesn't seem to be a way
to say "I want a char of indeterminate sign". :( So no, strlen() doesn't
actually say that, no matter if we like it or not. It actually says "I
want a char with implementation-defined sign".
> which implies that strlen really doesn't care about the sign. The same
> is true of *any* function that takes a "char *". Such a function
> doesn't care, and fundamentally CANNOT care about the sign, since it's
> not even defined!
In fact it's implementation-defined, and this may make a difference
here. strlen(), being part of C library, could be specifically
implemented for given architecture, and as architecture is free to
define the sign of "char", strlen() could in theory rely on particular
sign of "char" as defined for given architecture. [Not that I think that
any strlen() implementation actually depends on sign.]
> So the warning fails the (a) criterion. The warning isn't valid,
> because the thing it warns about isn't a valid problem!
Can we assure that no function taking 'char*' ever cares about the sign?
I'm not sure, and I'm not a language lawyer, but if it's indeed the
case, I'd probably agree that it might be a good idea for GCC to extend
the C language so that function argument declared "char*" means either
of "char*", "signed char*", or "unsigned char*" even though there is no
precedent in the language.
BTW, the next logical step would be for "int*" argument to stop meaning
"signed int*" and become any of "int*", "signed int*" or "unsigned
int*". Isn't it cool to be able to declare a function that won't produce
warning no matter what int is passed to it? ;)
> - it _also_ fails the (b) criterion, because quite often there is nothing
> you can do about it. Yes, you can add a cast, but adding a cast
> actually causes _worse_ code (but the warning is certainly gone). But
> that makes the _other_ argument for the warning totally point-less: if
> the reason for the warning was "bad code", then having the warning is
> actively BAD, because the end result is actually "worse code".
>
> See? The second point is why it's important to also realize that there is
> a lot of real and valid code that actually _does_ pass "strlen()" an
> unsigned string.
And here we finally get to the practice...
> There are tons of reasons for that to happen: the part of the program
> that _does_ care wants to use a "unsigned char" array, because it ends
> up doing things like "isspace(array[x])", and that is not well-defined
> if you use a "char *" array.
Yes, indeed. So the real problem of the C language is inconsistency
between strxxx() and isxxx() families of functions? If so, what is
wrong with actually fixing the problem, say, by using wrappers over
isxxx()? Checking... The kernel already uses isxxx() that are macros
that do conversion to "unsigned char" themselves, and a few invocations
of isspace() I've checked pass "char" as argument. So that's not a real
problem for the kernel, right?
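For illustration, a sketch (hypothetical name, not the kernel's actual
macro) of the kind of wrapper that does the conversion itself, so that
passing a plain "char" is always safe:

#include <ctype.h>

/* Hypothetical wrapper: convert to unsigned char before calling the
 * standard classifier, so the argument is always in the 0..255 range. */
#define my_isspace(c)   isspace((int)(unsigned char)(c))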
> So there are lots of reasons to use "unsigned char" arrays for strings.
> Look it up. Look up any half-way reasonable man-page for the "isspace()"
> kind of functions, and if they don't actually explicitly say that you
> should use unsigned characters for it, those man-pages are crap.
Well, even C99 itself does say it. But the kernel already handles this
issue nicely without resorting to a requirement to convert char to
unsigned char, doesn't it?
> Because those functions really *are* defined in "int", but it's the
> same kind of namespace that "getchar()" works in (ie "unsigned char" +
> EOF, where EOF _usually_ is -1, although other values are certainly
> technically legal too).
>
> So:
>
> - in practice, a lot of "good programming" uses "unsigned char" pointers
> for doing strings. There are LOTS of reasons for that, but "isspace()"
> and friends is the most obvious one.
As the isxxx() family does not seem to be a real problem, at least in
the context of the kernel source base, I'd like to learn other reasons
to use "unsigned char" for doing strings either in general or
specifically in the Linux kernel.
> - if you can't call "strlen()" on your strings without the compiler
> warning, there's two choices: the compiler warning is CRAP, or your
> program is bad. But as I just showed you, "unsigned char *" is actually
> often the *right* thing to use for string work, so it clearly wasn't
> the program that was bad.
OK, provided there are actually sound reasons to use "unsigned char*"
for strings, isn't it safer to always use it, and re-define strxxx() for
the kernel to take that one as argument? Or are there reasons to use
"char*" (or "signed char*") for strings as well? I'm afraid that if two
or three types are allowed to be used for strings, some inconsistencies
here or there will remain no matter what.
Another option could be to always use "char*" for strings and always compile
the kernel with -funsigned-char. At least it will be consistent with the
C idea of strings being "char*".
-- Sergei.
On 2/13/07, Sergei Organov <[email protected]> wrote:
> May I suggest another definition for a warning that entirely sucks?
> "The warning entirely sucks if and only if it never has true
> positives." In all other cases it only more or less sucks, IMHO.
You're totally missing the point. False positives are not a minor
annoyance, they're actively harmful as they hide other _useful_
warnings. So, you really want warnings to be about things that can and
should be fixed. So you really should aim for _zero false positives_
even if you risk not detecting some real positives.
"Pekka Enberg" <[email protected]> writes:
> On 2/13/07, Sergei Organov <[email protected]> wrote:
>> May I suggest another definition for a warning that entirely sucks?
>> "The warning entirely sucks if and only if it never has true
>> positives." In all other cases it only more or less sucks, IMHO.
>
> You're totally missing the point.
No, I don't.
> False positives are not a minor annoyance, they're actively harmful as
> they hide other _useful_ warnings.
Every warning I'm aware of does have false positives. They are indeed
harmful, so one takes steps to get rid of them. If a warning had no
false positives, it should be an error, not a warning in the first
place.
> So, you really want warnings to be about things that can and
> should be fixed.
That's the definition of a "perfect" warning, that is actually called
"error".
> So you really should aim for _zero false positives_ even if you risk
> not detecting some real positives.
With almost any warning out there one makes more or less efforts to
suppress the warning where it gives false positives, isn't it? At least
that's my experience so far.
-- Sergei.
On Tue, 13 Feb 2007, Sergei Organov wrote:
> >
> > "I want a char of indeterminate sign"!
>
> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C? The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign".
You're wrong.
Exactly because "char" *by*definition* is "indeterminate sign" as far as
something like "strlen()" is concerned.
"char" is _special_. Char is _different_. Char is *not* "int".
> So no, strlen() doesn't actually say that, no matter if we like it or
> not. It actually says "I want a char with implementation-defined sign".
You're arguing some strange semantic difference in the English language.
I'm not really interested.
THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
That BY DEFINITION means that "strlen()" cannot care about the sign,
because the sign IS NOT DEFINED UNIVERSALLY!
And if you cannot accept that fact, it's your problem. Not mine.
The warning is CRAP. End of story.
Linus
On 2/13/07, Sergei Organov <[email protected]> wrote:
> With almost any warning out there one makes more or less efforts to
> suppress the warning where it gives false positives, isn't it?
Yes, as long it's the _compiler_ that's doing the effort. You
shouldn't need to annotate the source just to make the compiler shut
up. Once the compiler starts issuing enough false positives, it's time
to turn off that warning completely. Therefore, the only sane strategy
for a warning is to aim for zero false positives.
Pekka
Linus Torvalds <[email protected]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
>> >
>> > "I want a char of indeterminate sign"!
>>
>> I'm afraid I don't follow. Do we have a way to say "I want an int of
>> indeterminate sign" in C? The same way there doesn't seem to be a way
>> to say "I want a char of indeterminate sign".
>
> You're wrong.
Sure, I knew it from the very beginning ;)
>
> Exactly because "char" *by*definition* is "indeterminate sign" as far as
> something like "strlen()" is concerned.
Thanks, I now understand that you either don't see the difference
between "indeterminate" and "implementation-defined" in this context or
consider it non-essential, so I think I've got your point.
> "char" is _special_. Char is _different_. Char is *not* "int".
Sure.
>
>> So no, strlen() doesn't actually say that, no matter if we like it or
>> not. It actually says "I want a char with implementation-defined sign".
>
> You're arguing some strange semantic difference in the English
> language.
Didn't I further explain what I meant exactly (that you've skipped)?
OK, never mind.
> I'm not really interested.
OK, no problem.
>
> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
So just don't pass it "unsigned char*". End of story.
>
> That BY DEFINITION means that "strlen()" cannot care about the sign,
> because the sign IS NOT DEFINED UNIVERSALLY!
>
> And if you cannot accept that fact, it's your problem. Not mine.
I never had problems either with strlen() or with this warning, so I was
curious why the warning is such a problem for the kernel. I'm sorry
I took your time and didn't get an answer that I entirely agree
with. Thank you very much for your explanations anyway.
> The warning is CRAP. End of story.
Amen!
-- Sergei.
On Tue, 13 Feb 2007, Sergei Organov wrote:
> > Exactly because "char" *by*definition* is "indeterminate sign" as far as
> > something like "strlen()" is concerned.
>
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.
The thing is, "implementation-defined" does actually matter when you can
_depend_ on it.
For example, there's a *huge* difference between "undefined" and
"implementation-defined". A program can actually depend on something like
char c = 0x80;
if (c < 0)
..
always having the *same* behaviour for a particular compiler (with a
particular set of compile flags - some compilers have flags to change the
sign behaviour).
So yes, there "implementation defined" actually has a real and worthwhile
meaning. It is guaranteed to have _some_ semantics within the confines of
that program, and they are defined to be consistent (again, within the
confines of that particular program).
So I agree that "implementation defined" doesn't always mean
"meaningless".
BUT (and this is a big but) within the discussion of "strlen()", that is
no longer true. "strlen()" exists _outside_ of a single particular
implementation. As such, "implementation-defined" is no longer something
that "strlen()" can depend on.
As an example of this argument, try this:
#include <string.h>
#include <stdio.h>
int main(int argc, char **argv)
{
        char a1[] = { -1, 0 }, a2[] = { 1, 0 };

        printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
        return 0;
}
and *before* you compile it, try to guess what the output is.
And when that confuses you, try to compile it using gcc with the
"-funsigned-char" flag (or "-fsigned-char" if you started out on an
architecture where char was unsigned by default)
And when you really *really* think about it afterwards, I think you'll go
"Ahh.. Linus is right". It's more than "implementation-defined": it really
is totally indeterminate for code like this.
(Yeah, the above isn't "strlen()", but it's an even subtler issue with
EXACTLY THE SAME PROBLEM! And one where you can actually see the _effects_
of it)
Linus
"Pekka Enberg" <[email protected]> writes:
> On 2/13/07, Sergei Organov <[email protected]> wrote:
>> With almost any warning out there one makes more or less efforts to
>> suppress the warning where it gives false positives, isn't it?
>
> Yes, as long it's the _compiler_ that's doing the effort.
> You shouldn't need to annotate the source just to make the compiler
> shut up.
Sorry, what do you do with "variable 'xxx' might be used uninitialized"
warning when it's false? Turn it off? Annotate the source? Assign fake
initialization value? Change the compiler so that it does "the effort"
for you? Never encountered false positive from this warning?
> Once the compiler starts issuing enough false positives, it's
> time to turn off that warning completely.
Yes, I don't argue with that. I said "otherwise the warning only more or less
sucks", and then it's up to programmers to decide whether it sucks enough
to be turned off. The decision then depends on the importance of its true
positives. Only if a warning never has true positives is it
unconditional, total, unhelpful crap -- that was my point.
> Therefore, the only sane strategy for a warning is to aim for zero
> false positives.
Sure. But unfortunately this is an unreachable aim in most cases.
-- Sergei.
On Tuesday 13 February 2007 2:25 pm, Linus Torvalds wrote:
> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
>
> That BY DEFINITION means that "strlen()" cannot care about the sign,
> because the sign IS NOT DEFINED UNIVERSALLY!
>
> And if you cannot accept that fact, it's your problem. Not mine.
>
> The warning is CRAP. End of story.
In busybox we fed the compiler -funsigned-char to make it shut up. (And so we
had consistent bugs between arm and x86.)
Building tcc I had to feed it -fsigned-char because -funsigned-char broke
stuff. :)
Rob
--
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery
On Tue, Feb 13, 2007 at 11:29:44PM +0300, Sergei Organov wrote:
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does "the effort"
> for you? Never encountered false positive from this warning?
You pull git://git.kernel.org/...jgarzik/misc-2.6.git#gccbug
Jeff
On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
> I agree that making strxxx() family special is not a good idea. So what
> do we do for a random foo(char*) called with an 'unsigned char*'
> argument? Silence? Hmmm... It's not immediately obvious that it's indeed
> harmless. Yet another -Wxxx option to GCC to silence this particular
> case?
Silence would be good. "char *" has a special status in C, it can be:
- pointer to a char/to an array of chars (standard interpretation)
- pointer to a string
- generic pointer to memory you can read(/write)
Check the aliasing rules if you don't believe me on the third one.
And it's *way* more often cases 2 and 3 than 1 for the simple reason
that the signedness of char is unpredictable. As a result, a
signedness warning between char * and (un)signed char * is 99.99% of
the time stupid.
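To illustrate the third use, a minimal sketch (not from any particular
project) -- the aliasing rules let you examine any object's bytes
through an unsigned char pointer:

#include <stdio.h>
#include <stddef.h>

/* Dump the bytes of any object through an "unsigned char *"; the
 * aliasing rules permit character-type access to every object. */
static void dump_bytes(const void *obj, size_t len)
{
        const unsigned char *p = obj;
        size_t i;

        for (i = 0; i < len; i++)
                printf("%02x ", p[i]);
        printf("\n");
}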
> May I suggest another definition for a warning that entirely sucks?
> "The warning entirely sucks if and only if it never has true
> positives." In all other cases it only more or less sucks, IMHO.
That means a warning that triggers on every line saying "there may be
a bug there" does not entirely suck?
> I'm afraid I don't follow. Do we have a way to say "I want an int of
> indeterminate sign" in C?
Almost completely. The rules on aliasing say you can convert pointer
between signed and unsigned variants and the accesses will be
unsurprising. The only problem is that the implicit conversion of
incompatible pointer parameters to a function looks impossible in the
draft I have. Probably has been corrected in the final version.
In any case, having for instance unsigned int * in a prototype really
means in the language "I want a pointer to integers, and I'm probably
going to use them as unsigned, so beware". For the special case of
char, since the beware version would require a signed or unsigned tag,
it really means indeterminate.
C is sometimes called a high-level assembler for a reason :-)
> The same way there doesn't seem to be a way
> to say "I want a char of indeterminate sign". :( So no, strlen() doesn't
> actually say that, no matter if we like it or not. It actually says "I
> want a char with implementation-defined sign".
In this day and age it means "I want a 0-terminated string".
Everything else is explicitly signed char * or unsigned char *, often
through typedefs in the signed case.
> In fact it's implementation-defined, and this may make a difference
> here. strlen(), being part of C library, could be specifically
> implemented for given architecture, and as architecture is free to
> define the sign of "char", strlen() could in theory rely on particular
> sign of "char" as defined for given architecture. [Not that I think that
> any strlen() implementation actually depends on sign.]
That would require pointers tagged in a way or another, you can't
distinguish between pointers to differently-signed versions of the
same integer type otherwise (they're required to have same size and
alignment). You don't have that on modern architectures.
> Can we assure that no function taking 'char*' ever cares about the sign?
> I'm not sure, and I'm not a language lawyer, but if it's indeed the
> case, I'd probably agree that it might be a good idea for GCC to extend
> the C language so that function argument declared "char*" means either
> of "char*", "signed char*", or "unsigned char*" even though there is no
> precedent in the language.
It's a warning you're talking about. That means it is _legal_ in the
language (even if maybe implementation defined, but legal still).
Otherwise it would be an error.
> BTW, the next logical step would be for "int*" argument to stop meaning
> "signed int*" and become any of "int*", "signed int*" or "unsigned
> int*". Isn't it cool to be able to declare a function that won't produce
> warning no matter what int is passed to it? ;)
No, it wouldn't be logical, because char is *special*.
> Yes, indeed. So the real problem of the C language is inconsistency
> between strxxx() and isxxx() families of functions? If so, what is
> wrong with actually fixing the problem, say, by using wrappers over
> isxxx()? Checking... The kernel already uses isxxx() that are macros
> that do conversion to "unsigned char" themselves, and a few invocations
> of isspace() I've checked pass "char" as argument. So that's not a real
> problem for the kernel, right?
Because a cast to silence a warning silences every possible warning
even if the then-pointer turns for instance into an integer through an
unrelated change. Think for instance about an error_t going from
const char * (error string) to int (error code) through a patch, which
happened to be passed to a utf8_to_whatever conversion function that
takes a const unsigned char * as a parameter. Casting would hide the
impact of changing the type.
> As the isxxx() family does not seem to be a real problem, at least in
> the context of the kernel source base, I'd like to learn other reasons
> to use "unsigned char" for doing strings either in general or
> specifically in the Linux kernel.
Everybody who has ever done text manipulation in languages other than
english knows for a fact that chars must be unsigned, always. The
current utf8 support frenzy is driving that home even harder.
> OK, provided there are actually sound reasons to use "unsigned char*"
> for strings, isn't it safer to always use it, and re-define strxxx() for
> the kernel to take that one as argument? Or are there reasons to use
> "char*" (or "signed char*") for strings as well? I'm afraid that if two
> or three types are allowed to be used for strings, some inconsistencies
> here or there will remain no matter what.
"blahblahblah"'s type is const char *.
> Another option could be to always use "char*" for strings and always compile
> the kernel with -funsigned-char. At least it will be consistent with the
> C idea of strings being "char*".
The best option is to have gcc stop being stupid. -funsigned-char
creates an assumption which is actually invisible in the code itself,
making it a ticking bomb for reuse in other projects.
OG.
On Tue, 13 Feb 2007, Sergei Organov wrote:
>
> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
> warning when it's false? Turn it off? Annotate the source? Assign fake
> initialization value? Change the compiler so that it does "the effort"
> for you? Never encountered false positive from this warning?
The thing is, that warning is at least easy to shut up.
You just add a fake initialization. There isn't any real downside.
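For instance, a made-up, self-contained example (with a hypothetical
lookup() helper) just to show the shape of that fix:

#include <stdio.h>

/* Hypothetical helper: only sets *out on success. */
static int lookup(int key, int *out)
{
        if (key < 0)
                return -1;
        *out = key * 2;
        return 0;
}

int main(void)
{
        int val = 0;    /* fake initialization, purely to silence gcc */

        if (lookup(42, &val) == 0)
                printf("%d\n", val);
        return 0;
}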
In contrast, to shut up the idiotic pointer-sign warning, you have to add
a cast.
Now, some people seem to think that adding casts is a way of life, and
think that C programs always have a lot of casts. That's NOT TRUE. It's
actually possible to avoid casts, and good C practice will do that to
quite a high degree, because casts in C are *dangerous*.
A language that doesn't allow arbitrary type-casting is a horrible
language (because it doesn't allow you to "get out" of the language type
system), and typecasts in C are really important. But they are an
important feature that should be used with caution, and as little as
possible, because they (by design, of course) break the type rules.
Now, since the _only_ reason for the -Wpointer-sign warning in the first
place is to warn about breaking type rules, if the way to shut it up is to
break them EVEN MORE, then the warning is obviously totally broken. IT
CAUSES PEOPLE TO WRITE WORSE CODE! Which was against the whole point of
having the warning in the first place.
This is why certain warnings are fine. For example, the warning about
if (a=b)
..
is obviously warning about totally valid C code, but it's _still_ a fine
warning, because it's actually very easy to make that warning go away AND
IMPROVE THE CODE at the same time. Even if you actually meant to write the
assignment, you can write it as
if ((a = b) != 0)
..
or
a = b;
if (a)
..
both of which are actually more readable.
But if you have
unsigned char *mystring;
len = strlen(mystring);
then please tell me how to fix that warning without making the code
*worse* from a type-safety standpoint? You CANNOT. You'd have to cast it
explicitly (or assign it through a "void *" variable), both of which
actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
See? The warning made no sense to begin with, and it warns about something
where the alternatives are worse than what it warned about.
Ergo. It's a crap warning.
Linus
Olivier Galibert <[email protected]> writes:
> On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
[...]
>> May I suggest another definition for a warning that entirely sucks?
>> "The warning entirely sucks if and only if it never has true
>> positives." In all other cases it only more or less sucks, IMHO.
>
> That means a warning that triggers on every line saying "there may be
> a bug there" does not entirely suck?
You got me. This warning does indeed entirely suck. My definition is
poor.
-- Sergei.
Linus Torvalds <[email protected]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
>>
>> Sorry, what do you do with "variable 'xxx' might be used uninitialized"
>> warning when it's false? Turn it off? Annotate the source? Assign fake
>> initialization value? Change the compiler so that it does "the effort"
>> for you? Never encountered false positive from this warning?
>
> The thing is, that warning is at least easy to shut up.
>
> You just add a fake initialization. There isn't any real downside.
Yes, there is. There are at least 2 drawbacks. The minor one is the
initialization eating cycles. The major one is that if I later change the
code and a path appears that by mistake uses that fake value,
I won't get the warning anymore. The latter drawback is somewhat similar
to the drawbacks of explicit type casts.
[I'd, personally, try to do code reorganization instead so that it
becomes obvious for the compiler that the variable can't be used
uninitialized. Quite often it makes the code better, at least it was my
experience so far.]
> In contrast, to shut up the idiotic pointer-sign warning, you have to add
> a cast.
[Did I already say that I think it's the wrong warning to be given in this
particular case, as the problem with the code is not signedness?]
1. Don't try to hide the warning, at least not immediately, -- consider
fixing the code first. Ah, sorry, you've already decided for yourself
that the warning is idiotic, so there is no reason to try to...
2. If you add a cast, it's not in contrast, it's rather similar to the fake
initialization above, as they have somewhat similar drawbacks.
> Now, some people seem to think that adding casts is a way of life, and
> think that C programs always have a lot of casts.
Hopefully I'm not one of those.
[...]
> But if you have
>
> unsigned char *mystring;
>
> len = strlen(mystring);
>
> then please tell me how to fix that warning without making the code
> *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it
> explicitly (or assign it through a "void *" variable), both of which
> actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
Because instead of trying to fix the code, you insist on hiding the
warning. There are at least two actual fixes:
1. Do 'char *mystring' instead, as otherwise compiler can't figure out
that you are going to use 'mystring' to point to zero-terminated
array of characters.
2. Use "len = ustrlen(mystring)" if you indeed need something like C
strings, but built from "unsigned char" elements. Similar to (1), this
will explain your intent to that stupid compiler. Should I also
mention that due to the definition of strlen(), the implementation of
ustrlen() is a one-liner (thnx Andrew):
static inline size_t ustrlen(const unsigned char *s)
{
        return strlen((const char *)s);
}
Overall, describe your intent to the compiler more clearly, as reading
the programmer's mind is not the kind of business where compilers are
strong.
> See? The warning made no sense to begin with, and it warns about something
> where the alternatives are worse than what it warned about.
>
> Ergo. It's a crap warning.
No, I'm still not convinced. Well, I'll try to explain my point
differently. What would you think about the following line, should you
encounter it in real code:
len = strlen(my_tiny_integers);
I'd think that as 'my_tiny_integers' is used as an argument to strlen(),
it might be a zero-terminated array of characters. But why does it have
such a name then?! Maybe it in fact holds an array of integers where 0
is not necessarily terminating, and the intent of this code is just to
find the first occurrence of 0? It's confusing and I'd probably go check the
declaration. The declaration happens to be "unsigned char *my_tiny_integers",
which at least is rather consistent with the variable name, but doesn't
resolve my initial doubts. So I need to go check other usages of this
'my_tiny_integers' variable to figure out what's actually going on
here. Does that sound like good programming practice? I don't think so.
Now, you probably realize that in the example you gave above you've
effectively declared "mystring" to be a pointer to
"tiny_unsigned_integer". As compiler (fortunately) doesn't guess intent
from variable names, it can't give us warning at the point of
declaration, but I still think we must be thankful that it gave us a
warning (even though gcc gives a wrong kind of warning) at the point of
dubious use.
-- Sergei.
Linus Torvalds <[email protected]> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
[...]
> BUT (and this is a big but) within the discussion of "strlen()", that is
> no longer true. "strlen()" exists _outside_ of a single particular
> implementation. As such, "implementation-defined" is no longer something
> that "strlen()" can depend on.
>
> As an example of this argument, try this:
>
> #include <string.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
> char a1[] = { -1, 0 }, a2[] = { 1, 0 };
>
> printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
> return 0;
> }
>
> and *before* you compile it, try to guess what the output is.
Well, I'll try to play fair, so I didn't yet compile it. Now, strcmp()
is defined in the C standard so that its behavior doesn't depend on the
sign of char:
"The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and strncmp is determined by the sign of the
difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
compared."
[Therefore, at least I don't need to build GCC multilib'ed on
-fsigned/unsigned-char to get consistent results even if strcmp() in
fact lives in a library, that was my first thought before I referred to
the standard ;)]
Suppose the char is signed. Then a1[0] < a2[0] (= -1 < 1) should be
true. On a 2's-complement implementation with an 8-bit char, -1 converted by
strcmp() to unsigned char should be 0xFF, and 1 converted should be
1. So strcmp() should be equivalent to (0xFF < 1), which is false. So
I'd expect the result
1 0
on an implementation with signed char.
Now suppose the char is unsigned. Then on a 2's-complement implementation
with an 8-bit-byte CPU, a1[0] should be 0xFF, and a2[0] should be 1. The
result from strcmp() won't change. So I'd expect the result
0 0
on an implementation with unsigned char.
Now I'm going to compile it (I must admit I'm slightly afraid to get
surprising results, so I've re-read my reasoning above before
compiling):
osv@osv tmp$ cat strcmp.c
#include <stdio.h>
#include <string.h>
int main()
{
char a1[] = { -1, 0 }, a2[] = { 1, 0 };
printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
return 0;
}
osv@osv tmp$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
...
gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
osv@osv tmp$ gcc -O2 strcmp.c -o strcmp && ./strcmp
1 0
> And when that confuses you,
It didn't, or did I miss something? Is char unsigned by default?
> try to compile it using gcc with the
> "-funsigned-char" flag (or "-fsigned-char" if you started out on an
> architecture where char was unsigned by default)
osv@osv tmp$ gcc -O2 -fsigned-char strcmp.c -o strcmp && ./strcmp
1 0
osv@osv tmp$ gcc -O2 -funsigned-char strcmp.c -o strcmp && ./strcmp
0 0
osv@osv tmp$
From the above, apparently char is indeed signed by default, so what?
> And when you really *really* think about it afterwards, I think you'll go
> "Ahh.. Linus is right". It's more than "implementation-defined": it really
> is totally indeterminate for code like this.
The fact is that strcmp() is explicitly defined in the C standard so
that it will give the same result no matter what the sign of the "char"
type is. Therefore, it obviously can't be used to determine the sign of
"char", so from the POV of usage of strcmp(), the sign of its argument
is indeed "indeterminate". What I fail to see is how this fact could
help in proving that for any function taking a "char*" argument the sign
of char is indeterminate.
Anyway, it seems that you still miss (or ignore) my point that it's not
(only) the sign of "char" that makes it suspect to call functions expecting
a "char*" argument with an "unsigned char*" value.
-- Sergei.
On Thu, 15 Feb 2007, Sergei Organov wrote:
>
> Yes, there is. There are at least 2 drawbacks. The minor one is the
> initialization eating cycles. The major one is that if I later change the
> code and a path appears that by mistake uses that fake value,
> I won't get the warning anymore. The latter drawback is somewhat similar
> to the drawbacks of explicit type casts.
I actually agree - it would clearly be *nicer* if gcc was accurate with
the warning. I'm just saying that usually the patch to shut gcc up is
quite benign.
But yes, it basically disabled the warning, and just *maybe* the warning
was valid, and it disabled it by initializing something to the wrong
value.
That's why warnings with false positives are often much worse than not
having the warning at all: you start dismissing them, and not even
bothering to fix them "properly". That's especially true if there _isn't_
a proper fix.
I personally much prefer a compiler that doesn't warn a lot, but *when* it
warns, it's 100% on the money. I think that compilers that warn about
things that "may" be broken are bogus. It needs to be iron-clad, with no
question about whether the warning is appropriate!
Which is, of course, why we then shut up some gcc warnings, and which gets
us back to the start of the discussion.. ;)
> [I'd, personally, try to do code reorganization instead so that it
> becomes obvious for the compiler that the variable can't be used
> uninitialized. Quite often it makes the code better, at least it was my
> experience so far.]
It's true that it sometimes works that way. Not always, though.
The gcc "uninitialized variable" thing is *usually* a good sign that the
code wasn't straightforward for the compiler to follow, and if the
compiler cannot follow it, then probably neither can most programmers. So
together with the fact that it's not at least _syntactically_ ugly to shut
up, it's not a warning I personally object to most of the time (unlike,
say, the pointer-sign one ;)
> [Did I already say that I think it's the wrong warning to be given in this
> particular case, as the problem with the code is not signedness?]
>
> 1. Don't try to hide the warning, at least not immediately, -- consider
> fixing the code first. Ah, sorry, you've already decided for yourself
> that the warning is idiotic, so there is no reason to try to...
This is actually something we've tried.
The problem with that approach is that once you have tons of warnings that
are "expected", suddenly *all* warnings just become noise. Not just the
bogus one. It really *is* a matter of: warnings should either be 100%
solid, or they should not be there at all.
And this is not just about compiler warnings. We've added some warnings of
our own (well, with compiler help, of course): things like the
"deprecated" warnings etc. I have often ended up shutting them up, and one
reason for that is that yes, I have literally added bugs to the kernel
that the compiler really *did* warn about, but because there were so many
other pointless warnings, I didn't notice!
I didn't make that up. I forget what the bug was, but it literally wasn't
more than a couple of months ago that I passed in the wrong type to a
function, and the compiler warned about it in big letters. It was just
totally hidden by all the other warning crap.
> 2. If you add a cast, it's not in contrast, it's rather similar to the fake
> initialization above, as they have somewhat similar drawbacks.
I agree that it's "unnecessary code", and in many ways exactly the same
thing. I just happen to believe that casts tend to be a lot more dangerous
than extraneous initializations. Casts have this nasty tendency of hiding
*other* problems too (ie you later change the thing you cast completely,
and now the compiler doesn't warn about a *real* bug, because the cast
will just silently force it to a wrong type).
And yeah, that may well be a personal hangup of mine. A lot of people
think casts in C are normal. Me, I actually consider C to be a type-safe
language (!) but it's only type-safe if you are careful, and avoiding
casts is one of the rules.
Others claim that C isn't type-safe at all, and I think it comes from
them having been involved in projects where people didn't have the same
"casts are good, but only when you use them really, really carefully" attitude.
> > But if you have
> >
> > unsigned char *mystring;
> >
> > len = strlen(mystring);
> >
> > then please tell me how to fix that warning without making the code
> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it
> > explicitly (or assign it through a "void *" variable), both of which
> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
>
> Because instead of trying to fix the code, you insist on hiding the
> warning.
No. I'm saying that I have *reasons* that I need my string to be unsigned.
That makes your first "fix" totally bogus:
> There are at least two actual fixes:
>
> 1. Do 'char *mystring' instead, as otherwise compiler can't figure out
> that you are going to use 'mystring' to point to zero-terminated
> array of characters.
And the second fix is a fix, but it is, again, worse than the warning:
> 2. Use "len = ustrlen(mystring)" if you indeed need something like C
> strings
Sure, I can do that, but the upside is what?
Getting rid of a warning that was bogus to begin with?
Linus
Linus Torvalds <[email protected]> writes:
> On Thu, 15 Feb 2007, Sergei Organov wrote:
[...Skip things I agree with...]
>> > But if you have
>> >
>> > unsigned char *mystring;
>> >
>> > len = strlen(mystring);
>> >
>> > then please tell me how to fix that warning without making the code
>> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it
>> > explicitly (or assign it through a "void *" variable), both of which
>> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE!
>>
>> Because instead of trying to fix the code, you insist on hiding the
>> warning.
>
> No. I'm saying that I have *reasons* that I need my string to be unsigned.
> That makes your first "fix" totally bogus:
Somehow I still suppose that most of those warnings could be fixed by
using "char*" instead. Yes, I believe you, you have reasons. But so far I
have only heard about isxxx(), which turned out not to be a reason in practice
(in the context of the kernel tree). As you didn't mention other
reasons, my expectation is that they are not in fact very common. I'm
curious what the actual reasons are, besides the fact that quite a lot of
code is apparently written this way in the kernel. The latter is indeed
a very sound reason for the warning to suck for kernel developers,
but it might not be a reason for it to suck in general.
>> There are at least two actual fixes:
>>
>> 1. Do 'char *mystring' instead, as otherwise compiler can't figure out
>> that you are going to use 'mystring' to point to zero-terminated
>> array of characters.
>
> And the second fix is a fix, but it is, again, worse than the warning:
>
>> 2. Use "len = ustrlen(mystring)" if you indeed need something like C
>> strings
>
> Sure, I can do that, but the upside is what?
1. You make it explicit, in a type-safe manner, that you are sure it's
safe to call "strlen()" with an "unsigned char*" argument. We all agree
about it. Let's write it down in the program. And we can already add
"strcmp()" to the list. And, say, ustrdup() and ustrchr() returning
"unsigned char *" would be more type-safe as well (see the sketch after
this list).
2. You make it more explicit that you indeed mean to use your own
representation of strings that is different from what C idiomatically
uses for strings, along with an explicitly defined set of allowed
operations on that representation.
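A sketch of what such wrappers might look like (hypothetical names, in
the same spirit as the ustrlen() above):

#include <string.h>

static inline int ustrcmp(const unsigned char *a, const unsigned char *b)
{
        return strcmp((const char *)a, (const char *)b);
}

static inline unsigned char *ustrchr(const unsigned char *s, int c)
{
        return (unsigned char *)strchr((const char *)s, c);
}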
The actual divergence of opinion seems to be that you are sure it's safe
to call any foo(char*) with "unsigned char*" argument, and I'm not, due
to the reasons described below. If you are right, then there is indeed
little or no point in doing this work.
> Getting rid of a warning that was bogus to begin with?
I agree that if the warning has no true positives, it sucks. The problem
is that somehow I doubt it has none. And the reasons for the doubt are:
1. The C standard does explicitly say that "char" is a different type from
either "signed char" or "unsigned char". There should be some reason
for that, otherwise they'd simply write that "char" is a synonym for
one of "signed char" or "unsigned char", depending on the implementation.
2. If "char" in fact matches one of the other two types for a given
architecture, it is definitely different from the other one, so
substituting "char" for the different one might be unsafe. As a
consequence, for a portable program it might be unsafe to substitute
"char" for either the signed or the unsigned one, as "char" may happen to
differ from either of them.
While the doubts aren't resolved, I prefer to at least be careful with
things that I don't entirely understand, so I wish the compiler would give
me a warning about subtle semantic differences (if any).
-- Sergei.
On Thu, 15 Feb 2007, Sergei Organov wrote:
>
> I agree that if the warning has no true positives, it sucks. The problem
> is that somehow I doubt it has none. And the reasons for the doubt are:
Why do you harp on "no true positives"?
That's a pointless thing. You can make *any* warning have "true
positives". My personal favorite is the unconditional warning:
warning: user is an idiot
and I _guarantee_ you that it has a lot of true positives.
It's the "no false negatives" angle you should look at.
THAT is what matters. The reason we don't see a lot of warnings about
idiot users is not that people don't do stupid things, but that
*sometimes* they actually don't do something stupid.
Yeah, I know, it's far-fetched, but still.
In other words, you're barking up *exactly* the wrong tree. You're looking
at it the wrong way around.
Think of it this way: in science, a theory is proven to be bad by a single
undeniable fact just showing that it's wrong.
The same is largely true of a warning. If the warning sometimes happens
for code that is perfectly fine, the warning is bad.
Linus
Olivier Galibert <[email protected]> writes:
> On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote:
>> I agree that making strxxx() family special is not a good idea. So what
>> do we do for a random foo(char*) called with an 'unsigned char*'
>> argument? Silence? Hmmm... It's not immediately obvious that it's indeed
>> harmless. Yet another -Wxxx option to GCC to silence this particular
>> case?
>
> Silence would be good. "char *" has a special status in C, it can be:
> - pointer to a char/to an array of chars (standard interpretation)
> - pointer to a string
> - generic pointer to memory you can read(/write)
>
> Check the aliasing rules if you don't believe me on the third one.
A generic pointer still reads 'void*', doesn't it? If you don't believe me,
check memcpy() and friends. From the POV of the aliasing rules, "char*",
"unsigned char*", and "signed char*" are all the same, as the aliasing rules
mention "character type", which according to the standard is defined
like this:
"The three types char, signed char, and unsigned char are collectively
called the character types."
> And it's *way* more often cases 2 and 3 than 1 for the simple reason
> that the signedness of char is unpredictable.
Are the first two different other than by 0-termination? If not, then
differentiating 1 and 2 by signedness doesn't make sense. If we throw
away the signedness argument, then yes, 2 is more common than 1, for the
obvious reason that it's much more convenient to use 0-terminated arrays
of characters in C than plain arrays of characters.
For 3, I'd use "unsigned char*" (or, sometimes, maybe, "signed char*") to
do the actual access (due to the aliasing rules), but "void*" as the argument
type, as an arbitrary pointer is spelled "void*" in C.
So, it's 2 that I'd consider the most common usage for "char*". Strings
of characters are idiomatically "char*" in C.
> As a result, a signedness warning between char * and (un)signed char *
> is 99.99% of the time stupid.
As a result of what? Of the fact that strings in C are "char*"? Of the
fact that one can use "char*" to access arbitrary memory? If the latter,
then "char*", "signed char*", and "unsigned char*" are all equal, and
you may well argue that there should be no warning between "signed
char*" and "unsigned char*" as well, as either of them could be used to
access memory.
Anyway, I've already stated that I don't think it's signedness that matters
here. The standard explicitly says "char", "signed char", and "unsigned
char" are all distinct types, so the warning should be about
incompatible pointers instead. Anyway, I don't even dispute that 99.99%
number, I just want somebody to explain to me how dangerous the remaining
0.01% are, and whether it isn't exactly 0%.
[...]
>> I'm afraid I don't follow. Do we have a way to say "I want an int of
>> indeterminate sign" in C?
>
> Almost completely. The rules on aliasing say you can convert pointer
> between signed and unsigned variants and the accesses will be
> unsurprising.
Only from the point of view of aliasing. Aliasing rules don't disallow
other surprises.
>
> The only problem is that the implicit conversion of incompatible
> pointer parameters to a function looks impossible in the draft I have.
Why do you think it's impossible according to the draft?
> Probably has been corrected in the final version.
I doubt it's impossible either in the draft or in the final version.
> In any case, having for instance unsigned int * in a prototype really
> means in the language "I want a pointer to integers, and I'm probably
> going to use them as unsigned, so beware".
> For the special case of char, since the beware version would require a
> signed or unsigned tag, it really means indeterminate.
IMHO, it would read "I want a pointer to alphabetic characters, and I'm
probably going to use them as alphabetic characters, so beware". No
signedness involved.
For example, suppose we have 3 functions that compare two
zero-terminated arrays:
int compare1(char *c1, char *c2);
int compare2(unsigned char *c1, unsigned char *c2);
int compare3(signed char *c1, signed char *c2);
the first one might compare its arguments alphabetically, while the second
and the third compare them numerically. All 3 are different, so
passing the wrong type of argument to any of them might be dangerous.
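A small self-contained illustration of how the numeric outcome differs
with the sign (assuming an 8-bit char and two's complement; on a typical
machine this prints "1 0"):

#include <stdio.h>

int main(void)
{
        signed char   s1 = (signed char)0xE9, s2 = 'A';   /* -23 vs 65 */
        unsigned char u1 = 0xE9, u2 = 'A';                 /* 233 vs 65 */

        printf("%d %d\n", s1 < s2, u1 < u2);
        return 0;
}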
> C is sometimes called a high-level assembler for a reason :-)
I'm afraid the times when it actually was are far in the past,
for better or worse.
>> The same way there doesn't seem to be a way to say "I want a char of
>> indeterminate sign". :( So no, strlen() doesn't actually say that, no
>> matter if we like it or not. It actually says "I want a char with
>> implementation-defined sign".
>
> In this day and age it means "I want a 0-terminated string".
IMHO, you've forgotten to say "of alphabetic characters".
> Everything else is explicitly signed char * or unsigned char *, often
> through typedefs in the signed case.
Those are for signed and unsigned tiny integers, with confusing names,
that in turn are just historical baggage.
>> In fact it's implementation-defined, and this may make a difference
>> here. strlen(), being part of C library, could be specifically
>> implemented for given architecture, and as architecture is free to
>> define the sign of "char", strlen() could in theory rely on particular
>> sign of "char" as defined for given architecture. [Not that I think that
>> any strlen() implementation actually depends on sign.]
>
> That would require pointers tagged in a way or another, you can't
> distinguish between pointers to differently-signed versions of the
> same integer type otherwise (they're required to have same size and
> alignment). You don't have that on modern architectures.
How is it relevant? According to this argument, e.g., foo(int*) can't
actually rely on the sign of its argument, so any warning about the
signedness of a pointer argument is crap.
>> Can we assure that no function taking 'char*' ever cares about the sign?
>> I'm not sure, and I'm not a language lawyer, but if it's indeed the
>> case, I'd probably agree that it might be a good idea for GCC to extend
>> the C language so that function argument declared "char*" means either
>> of "char*", "signed char*", or "unsigned char*" even though there is no
>> precedent in the language.
>
> It's a warning you're talking about. That means it is _legal_ in the
> language (even if maybe implementation defined, but legal still).
> Otherwise it would be an error.
It's legal to pass "unsigned int*" to a function requesting "int*"; still,
the warning on this is useful, isn't it?
>> BTW, the next logical step would be for "int*" argument to stop meaning
>> "signed int*" and become any of "int*", "signed int*" or "unsigned
>> int*". Isn't it cool to be able to declare a function that won't produce
>> warning no matter what int is passed to it? ;)
>
> No, it wouldn't be logical, because char is *special*.
Yes, it's special, but in what manner? I fail to see in the standard
that "char" means either of "signed char" or "unsigned char".
>> Yes, indeed. So the real problem of the C language is inconsistency
>> between strxxx() and isxxx() families of functions? If so, what is
>> wrong with actually fixing the problem, say, by using wrappers over
>> isxxx()? Checking... The kernel already uses isxxx() that are macros
>> that do conversion to "unsigned char" themselves, and a few invocations
>> of isspace() I've checked pass "char" as argument. So that's not a real
>> problem for the kernel, right?
>
> Because a cast to silence a warning silences every possible warning
> even if the then-pointer turns for instance into an integer through an
> unrelated change.
The isxxx() family doesn't have pointer arguments, so this is
irrelevant. The isxxx() family, as implemented in the kernel, casts its
argument to "unsigned char", that is the right thing to do.
>> As the isxxx() family does not seem to be a real problem, at least in
>> the context of the kernel source base, I'd like to learn other reasons
>> to use "unsigned char" for doing strings either in general or
>> specifically in the Linux kernel.
>
> Everybody who has ever done text manipulation in languages other than
> english knows for a fact that chars must be unsigned, always. The
> current utf8 support frenzy is driving that home even harder.
Well, maybe, but it won't be C anymore. Strings in C are "char*", and
the sign of "char" is implementation-defined. Though you can get what you
are asking for even now by using -funsigned-char.
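For what it's worth, the difference the flag controls is trivial to
demonstrate; a minimal sketch (which line gets printed depends on the target
ABI and on whether -funsigned-char is given -- if I remember the defaults
right, gcc makes char signed on x86 and unsigned on, e.g., ARM and PowerPC):
#include <stdio.h>

int main(void)
{
    char c = '\xe9';    /* a byte above 0x7f, e.g. 'é' in Latin-1 */

    if (c < 0)
        printf("char is signed here\n");
    else
        printf("char is unsigned here\n");
    return 0;
}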
>> OK, provided there are actually sound reasons to use "unsigned char*"
>> for strings, isn't it safer to always use it, and re-define strxxx() for
>> the kernel to take that one as argument? Or are there reasons to use
>> "char*" (or "signed char*") for strings as well? I'm afraid that if two
>> or three types are allowed to be used for strings, some inconsistencies
>> here or there will remain no matter what.
>
> "blahblahblah"'s type is const char *.
Yes, how did I forget about that? So be it, strings in C are "char*".
>> Another option could be to always use "char*" for strings and always compile
>> the kernel with -funsigned-char. At least it will be consistent with the
>> C idea of strings being "char*".
>
> The best option is to have gcc stop being stupid.
Any compiler is stupid. AI hasn't matured enough yet ;)
> -funsigned-char creates an assumption which is actually invisible in
> the code itself, making it a ticking bomb for reuse in other projects.
Well, on one hand you are asking for char to always be unsigned, but on
the other hand you reject the gcc option that provides exactly that. The
only option that remains, I'm afraid, is to argue to the C committee
that char should always be unsigned.
-- Sergei.
Sergei Organov <[email protected]> wrote:
> Linus Torvalds <[email protected]> writes:
>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>> something like "strlen()" is concerned.
>
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.
If you don't code for a specific compiler with specific settings, there is
no implementation defining the signedness of char, and each part of the code
using char* will be wrong unless it handles both cases correctly.
Therefore it's either always wrong to call your char* function with char*,
unsigned char* _and_ signed char* unless you can guarantee not to overflow any
of them, or it's always correct to call char* functions with any kind of these.
Of course if you happen to code for specific compiler settings, one signedness
of char will become real and one warning will be legit. And if pigs fly, they
should wear goggles to protect their eyes ...
>> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
>
> So just don't pass it "unsigned char*". End of story.
Using signed chars for strings is wrong in most countries on earth. It was
wrong when the first IBM PC came out in 1981, and creating a compiler in
1987 defaulting to signed char is a sure sign of originating from an
isolated country and knowing nothing about this planet. Using signed chars
in comparisons is especially wrong, and casting each char to unsigned before
comparing them is likely to be forgotten. Unsigned character strings are
useless because there is no such thing as char(-23), and if these strings
weren't casted to signed inside all IO functions, they wouldn't work correctly.
Only because many programmers don't compare chars, most programs will work
outside the USA. I repeat: Thanks to using signed chars, the programs only
work /by/ /accident/! Promoting the use of signed char strings is promoting
bugs and endangering the stability of all our systems. You should stop this
bullshit now, instead of increasing the pile.
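To spell out what "casting each char to unsigned before comparing" looks like
in practice -- a minimal sketch with made-up names cmp_right()/cmp_wrong();
the standard does require strcmp() itself to compare as unsigned char:
static int cmp_right(const char *a, const char *b)
{
    while (*a && *a == *b)
        a++, b++;
    /* compare as unsigned char, which is what standard strcmp() must do */
    return (unsigned char)*a - (unsigned char)*b;
}

static int cmp_wrong(const char *a, const char *b)
{
    while (*a && *a == *b)
        a++, b++;
    /* forgot the cast: on a signed-char target any byte >= 0x80
     * compares as negative and sorts before plain ASCII */
    return *a - *b;
}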
>> That BY DEFINITION means that "strlen()" cannot care about the sign,
>> because the sign IS NOT DEFINED UNIVERSALLY!
>>
>> And if you cannot accept that fact, it's your problem. Not mine.
>
I never had problems either with strlen() or with this warning, so I was
curious why the warning is such a problem for the kernel.
The warning is a problem because either it clutters your build log, hiding
real bugs; or you typecast whenever a char* function is called, hiding
errors; or you disable it globally, hiding real signedness bugs. Whichever
way you take it, it's wrong.
--
Funny quotes:
12. He who laughs last thinks slowest.
> From: Linus Torvalds
> Newsgroups: gmane.linux.kernel
> Subject: Re: somebody dropped a (warning) bomb
> Date: Thu, 15 Feb 2007 11:02:32 -0800 (PST)
[]
> Think of it this way: in science, a theory is proven to be bad by a single
> undeniable fact just showing that it's wrong.
If a theory is very "beautiful" and it hasn't lied so far -- one may
just add one or a couple of new particles, weakly interacting, very
weakly interacting, or not interacting at all (with modern
budget$ / man-made energies ;)
So, let me stop this thread and ask those who have plenty of time to
discuss truly silly things with an undoubtedly more experienced kernel
hacker to please fix real bugs instead. Thanks!
____
On Thu, Feb 15, 2007 at 07:57:05AM -0800, Linus Torvalds wrote:
> I agree that it's "unnecessary code", and in many ways exactly the same
> thing. I just happen to believe that casts tend to be a lot more dangerous
> than extraneous initializations. Casts have this nasty tendency of hiding
> *other* problems too (ie you later change the thing you cast completely,
> and now the compiler doesn't warn about a *real* bug, because the cast
> will just silently force it to a wrong type).
Well, one way you could cast it and still have the compiler tell you if
the type ever changes is doing something stupid like:
char *unsigned_char_star_to_char_star(unsigned char *in)
{
    return (char *)in;
}
Then call your strlen with:
strlen(unsigned_char_star_to_char_star(my_unfortunate_unsigned_char_string));
If you ever changed the type, the compiler would warn you again, since the
conversion function doesn't accept anything other than unsigned char *
being passed in for "conversion". That seems safer than a direct cast, since
with a direct cast you get no type checking at all anymore.
I hope it doesn't come to this though. Hopefully gcc will change its
behaviour back or give an option of --dontcomplainaboutcharstar :)
--
Len Sorensen
On 02/15/2007 08:02 PM, Linus Torvalds wrote:
> Think of it this way: in science, a theory is proven to be bad by a
> single undeniable fact just showing that it's wrong.
>
> The same is largely true of a warning. If the warning sometimes
> happens for code that is perfectly fine, the warning is bad.
Slight difference; if a compulsory warning sometimes happens for code
that is perfectly fine, the warning is bad. I do want to be _able_ to
get as many warnings as a compiler can muster though.
Given char's special nature, shouldn't the conclusion of this thread
have long been simply that gcc needs -Wno-char-pointer-sign? (with
whatever default, as far as I'm concerned).
Rene.
Bodo Eggert <[email protected]> writes:
> Sergei Organov <[email protected]> wrote:
>> Linus Torvalds <[email protected]> writes:
[...]
> Using signed chars for strings is wrong in most countries on earth. It was
> wrong when the first IBM PC came out in 1981, and creating a compiler in
> 1987 defaulting to signed char is a sure sign of originating from an
> isolated country and knowing nothing about this planet.
Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
at that time, so "char" had to be abused for the "tiny signed integer"?
> Using signed chars in comparisons is especially wrong, and casting
> each char to unsigned before comparing them is likely to be
> forgotten.
If we start talking about the C language, my opinion is that it's a
problem of C that it allows numeric comparison of "char" variables at
all. If one actually needs to compare alphabetic characters numerically,
he should first cast to the required integer type.
> Unsigned character strings are useless because there is no such thing
> as char(-23), and if these strings weren't casted to signed inside all
> IO functions, they wouldn't work correctly.
Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or
do you indeed think that using "unsigned char*" for strings is useless?
> Only because many programmers don't compare chars, most programs will
> work outside the USA.
Comparison of characters being numeric is not a very good property of
the C language.
> I repeat: Thanks to using signed chars, the programs only
> work /by/ /accident/! Promoting the use of signed char strings is promoting
> bugs and endangering the stability of all our systems. You should stop this
> bullshit now, instead of increasing the pile.
Where did you see I promoted the use of "signed char strings"?! If you
don't like the fact that in C language characters are "char", strings
are "char*", and the sign of char is implementation-defined, please
argue with the C committee, not with me.
Or use -funsigned-char to get a dialect of C that fits your requirements
better.
-- Sergei.
Bodo Eggert <[email protected]> writes:
> Sergei Organov <[email protected]> wrote:
>> Linus Torvalds <[email protected]> writes:
>
>>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>>> something like "strlen()" is concerned.
>>
>> Thanks, I now understand that you either don't see the difference
>> between "indeterminate" and "implementation-defined" in this context or
>> consider it non-essential, so I think I've got your point.
>
> If you don't code for a specific compiler with specific settings, there is
> no implementation defining the signedness of char, and each part of the code
> using char* will be wrong unless it handles both cases correctly.
The problem here is that due to historical reasons, there could be code
out there that abuses "char" for "signed char" (not sure about "unsigned
char"). Old code and old habits are rather persistent.
> Therefore it's either always wrong to call your char* function with char*,
> unsigned char* _and_ signed char* unless you can guarantee not to overflow any
> of them, or it's always correct to call char* functions with any kind
> of these.
How are you sure that those who wrote foo(char*) agree with your opinion or
even understand all the involved issues?
> Of course if you happen to code for specific compiler settings, one signedness
> of char will become real and one warning will be legit. And if pigs fly, they
> should wear goggles to protect their eyes ...
And if people were writing perfect portable code all the time, there
would be no need for warnings in the first place...
-- Sergei.
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[email protected]> writes:
> > Sergei Organov <[email protected]> wrote:
> >> Linus Torvalds <[email protected]> writes:
> [...]
> > Using signed chars for strings is wrong in most countries on earth. It was
> > wrong when the first IBM PC came out in 1981, and creating a compiler in
> > 1987 defaulting to signed char is a sure sign of originating from an
> > isolated country and knowing nothing about this planet.
>
> Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
> that time, so "char" had to be abused for the "tiny signed integer"?
Mentioning the availability of unsigned char was a change in K&R's
book "The C Programming Language" from 1987 to 1988:
"For some time, type unsigned char has been available. The standard
introduces the signed keyword to make signedness explicit for char and
other integral objects."
So if there was no "signed char" before, the blame passes on to those who
didn't introduce it alongside "unsigned char".
> > Using signed chars in comparisons is especially wrong, and casting
> > each char to unsigned before comparing them is likely to be
> > forgotten.
>
> If we start talking about the C language, my opinion is that it's C
> problem that it allows numeric comparison of "char" variables at
> all. If one actually needs to compare alphabetic characters numerically,
> he should first cast to required integer type.
Char values are indexes in a character set. Taking the difference of
indexes is legal. Other languages have ord('x'), but it's only an
expensive typecast (byte('x') does exactly the same thing).
> > Unsigned character strings are useless because there is no such thing
> > as char(-23), and if these strings weren't casted to signed inside all
> > IO functions, they wouldn't work correctly.
>
> Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or
> are you indeed think that using "unsigned char*" for strings is useless?
ACK
> > Only because many programmers don't compare chars, most programs will
> > work outside the USA.
>
> Comparison of characters being numeric is not a very good property of
> the C language.
NACK, as above
> > I repeat: Thanks to using signed chars, the programs only
> > work /by/ /accident/! Promoting the use of signed char strings is promoting
> > bugs and endangering the stability of all our systems. You should stop this
> > bullshit now, instead of increasing the pile.
>
> Where did you see I promoted the use of "signed char strings"?!
char strings are often signed char strings, only using unsigned char
prevents this in a portable way. If you care about ordinal values of
your strings, use unsigned.
> If you
> don't like the fact that in C language characters are "char", strings
> are "char*", and the sign of char is implementation-defined, please
> argue with the C committee, not with me.
>
> Or use -funsigned-char to get dialect of C that fits your requirements
> better.
If you restrict yourself to a specific C compiler, you are doomed, and
that's not a good start.
--
What about those red balls they have on car aerials so you can spot your car
in a park. I think all cars should have them!
-- Homer Simpson
On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <[email protected]> writes:
> > Sergei Organov <[email protected]> wrote:
> >> Linus Torvalds <[email protected]> writes:
> > If you don't code for a specific compiler with specific settings, there is
> > no implementation defining the signedness of char, and each part of the code
> > using char* will be wrong unless it handles both cases correctly.
>
> The problem here is that due to historical reasons, there could be code
> out there that abuses "char" for "signed char" (not sure about "unsigned
> char"). Old code and old habits are rather persistent.
There could be code using trigraphs ... and gcc has an option for that.
If this code uses signed chars, using it on unsigned-char archs is broken
and should be warned about; but the compiler will not warn about this,
because the code does not spell out "signed char" and is therefore bug-to-bug
syntax compatible, waiting for a semantic breakdown.
I'll say it again: either the code using unspecified chars is correct, or
it isn't. If it's correct, using it with either signed or unsigned
chars is not a bug and you should not warn at all; if it's not correct,
you should always warn. Instead, gcc warns based on "code compiles for $arch".
> > Therefore it's either always wrong to call your char* function with char*,
> > unsigned char* _and_ signed char* unless you can guarantee not to overflow any
> > of them, or it's always correct to call char* functions with any kind
> > of these.
>
> How are you sure that those who wrote foo(char*) agree with your opinion or
> even understand all the involved issues?
Let's assume we have this piece of buggy code. We compile it on an unsigned
char architecture. No warning. *BOOM*
Let's assume there is correct code, and we use it as designed:
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: real issue
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
<scroll off screen/>
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
Warning: Wrong arch
You don't see "real issue". *BOOM*
What can you do about this warning? Let's assume we cast everywhere:
struct foo *p;
printf("%zu", strlen((char *)p)); *BOOM*
Let's assume we disable this warning:
int f(unsigned short x)
{
    if (!x)
        return 0;
    return (int)x + f(x - 1);
}
f(-1); *BOOM*
Therefore unless you program for one arch with one set of compiler flags,
this warning is useless, and I did not see much code explicitly designed
to be non-portable.
Warning on wrong signedness is good, but if you can't enable it on
portable code, it's useless.
--
Funny quotes:
39. Ever wonder about those people who spend $2.00 apiece on those little
bottles of Evian water? Try spelling Evian backwards: NAIVE
Bodo Eggert <[email protected]> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
>> Bodo Eggert <[email protected]> writes:
>> > Sergei Organov <[email protected]> wrote:
>> >> Linus Torvalds <[email protected]> writes:
>
[...]
>> If we start talking about the C language, my opinion is that it's C
>> problem that it allows numeric comparison of "char" variables at
>> all. If one actually needs to compare alphabetic characters numerically,
>> he should first cast to required integer type.
>
> Char values are indexes in a character set. Taking the difference of
> indexes is legal. Other languages have ord('x'), but it's only an
> expensive typecast (byte('x') does exactly the same thing).
Why should this typecast be expensive? Is
char c1, c2;
...
if((unsigned char)c1 < (unsigned char)c2)
any more expensive than if(c1 < c2)?
[...]
>> Comparison of characters being numeric is not a very good property of
>> the C language.
>
> NACK, as above
The point was not to entirely disable it or to make it any more
expensive, but to somehow force the programmer to be explicit that he indeed
needs numeric comparison of characters' indexes. Well, I do realize that
the above is not in the C "spirit", which is to allow implicit conversions
almost everywhere.
>> > I repeat: Thanks to using signed chars, the programs only
>> > work /by/ /accident/! Promoting the use of signed char strings is promoting
>> > bugs and endangering the stability of all our systems. You should stop this
>> > bullshit now, instead of increasing the pile.
>>
>> Where did you see I promoted the use of "signed char strings"?!
>
> char strings are often signed char strings, only using unsigned char
> prevents this in a portable way. If you care about ordinal values of
> your strings, use unsigned.
It means that (as I don't want to change the type of my
strings/characters in case I suddenly start to care about ordinal values
of the characters) I must always use "unsigned char" for storing
characters in practice. Right?
There are at least three problems then:
1. The type of "abc" remains "char*", so effectively I'll have two
different representations of strings.
2. I can't mix two different representations of strings in C without
compromising type safety. In C, the type for characters is "char",
which by definition is a different type than "unsigned char".
3. As I now represent characters by unsigned tiny integers, I lose the
distinct type for unsigned tiny integers. Effectively this brings me
back in time to the old days when C didn't have a distinct type for
signed tiny integers.
Therefore, your "solution" to one problem creates a bunch of
others. Worse yet, you in fact don't solve initial problem of comparing
strings. Take, for example, KOI8-R encoding. Comparing characters from
this encoding using "unsigned char" representation of character indexes
makes no more sense than comparing "signed char" representations, as
both comparisons will bring meaningless results, due to the fact that
the order of (some of) the characters in the table doesn't match their
alphabetical order.
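Just to illustrate the point: once code-point order and alphabetical order
diverge, what you really need is an explicit collation table, and then the
signedness of char stops mattering entirely. A minimal sketch -- the names
collation_rank/collate_cmp and the table contents are made up:
static const unsigned char collation_rank[256] = {
    0 /* ... one alphabetical rank per byte value of the encoding ... */
};

static int collate_cmp(unsigned char a, unsigned char b)
{
    return (int)collation_rank[a] - (int)collation_rank[b];
}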
>> If you don't like the fact that in C language characters are "char",
>> strings are "char*", and the sign of char is implementation-defined,
>> please argue with the C committee, not with me.
>>
>> Or use -funsigned-char to get dialect of C that fits your requirements
>> better.
>
> If you restrict yourself to a specific C compiler, you are doomed, and
> that's not a good start.
Isn't it you who insists that "char" should always be unsigned? It's
your problem then to use a compiler that understands the imaginary
language that meets that requirement.
-- Sergei.
Rene Herman <[email protected]> writes:
[...]
> Given char's special nature, shouldn't the conclusion of this thread
> have long been simply that gcc needs -Wno-char-pointer-sign? (with
> whatever default, as far as I'm concerned).
I entirely agree that all the char business in C is messy enough to
justify separate warning switch(es) in GCC.
However, I still insist that the problem with the code:
void foo(char *c);
unsigned char *u;
signed char *s;
...
foo(u);
foo(s);
is not (only) in signedness, as neither 'u' nor 's' has a type compatible
with "char*", no matter what the sign of "char" is, so if one cares
about type safety he needs warnings on both invocations of foo().
-- Sergei.
Bodo Eggert <[email protected]> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
[...]
> I'll say it again: Either the code using unspecified chars is correct, or
> it isn't. If it's correct, neither using with signed nor with unsigned
> chars is a bug and you should not warn at all, and if it's not correct,
> you should always warn. Instead, gcc warns on "code compiles for
> $arch".
Here is where we disagree. In my opinion, no matter what the sign of
char for a given architecture is, it's incorrect to pass either a "signed
char*" or an "unsigned char*" argument to a function expecting "char*".
>> > Therefore it's either always wrong to call your char* function with char*,
>> > unsigned char* _and_ signed char* unless you can guarantee not to overflow any
>> > of them, or it's always correct to call char* functions with any kind
>> > of these.
>>
>> How are you sure that those who wrote foo(char*) agree with your opinion or
>> even understand all the involved issues?
>
> Let's assume we have this piece of buggy code. We compile it on an unsigned
> char architecture. No warning. *BOOM*
There should be a warning -- that's my point. "char" is different. "char"
is distinct from both "signed char" and "unsigned char". Always. At
least that's how C is defined.
> Let's assume there is correct code, and we use it as designed:
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: real issue
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> <scroll off screen/>
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
> Warning: Wrong arch
>
> You don't see "real issue". *BOOM*
>
>
> What can you do about this warning? Let's assume we cast everywhere:
I already gave an answer in response to Linus:
static inline size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}
>
> struct foo *p;
> printf("%zu", strlen((char *)p)); *BOOM*
printf("%zu", ustrlen(p)); *NO BOOM* (the compiler warns about the incompatible pointer instead)
unsigned char *u;
printf("%zu", ustrlen(u)); *NO WARNING*
-- Sergei.
Linus Torvalds <[email protected]> writes:
> On Thu, 15 Feb 2007, Sergei Organov wrote:
>>
>> I agree that if the warning has no true positives, it sucks. The problem
>> is that somehow I doubt it has none. And the reasons for the doubt are:
>
> Why do you harp on "no true positives"?
Because if somebody is capable of proving that a warning has no true
positives, I'll immediately agree it's useless. I just wanted to check
whether that's indeed the case. It seems it is not.
> That's a pointless thing. You can make *any* warning have "true
> positives". My personal favorite is the unconditional warning:
>
> warning: user is an idiot
>
> and I _guarantee_ you that it has a lot of true positives.
Yes, but there is obviously zero correlation between the warning and the
user being an idiot (except that a non-idiot will probably find a way to
either turn this warning off or filter it out).
I already agreed that a warning may have true positives and still be
bad; a warning having "no false negatives" is obviously a good one,
though unfortunately not that common in practice; and a warning
that has no true positives is useless. What we are arguing about are
the intermediate cases that are most common in practice.
As programmers, we are in fact interested in the correlation between a
warning and an actual problem in the software, but for the kind of warnings
we are discussing (sign-correctness and type safety in general), the
correlation depends on the style the program is written in, making it
difficult to use as a criterion in the discussion.
> It's the "no false negatives" angle you should look at.
I'd like to, but then I'll need to classify most of (all?) the GCC
warnings as bad, I'm afraid.
> THAT is what matters. The reason we don't see a lot of warnings about
> idiot users is not that people don't do stupid things, but that
> *sometimes* they actually don't do something stupid.
>
> Yeah, I know, it's far-fetched, but still.
>
> In other words, you're barking up *exactly* the wrong tree. You're looking
> at it the wrong way around.
>
> Think of it this way: in science, a theory is proven to be bad by a single
> undeniable fact just showing that it's wrong.
>
> The same is largely true of a warning. If the warning sometimes happens
> for code that is perfectly fine, the warning is bad.
Consider:
int find_first_zero(const int *a)
{
    int index = 0;

    while (*a++ != 0)
        index++;
    return index;
}

unsigned int array[] = { 1, 2, 3, 0, 25, 14 };
int index = find_first_zero(array); *WARNING*
So, by your logic, the -Wpointer-sign warning is bad in general? If not,
then how will you "fix" the code above? And what is the upside of the
fix?
IMHO, the "problem" is that this warning is about violation of type
safety. Type safety by itself doesn't guarantee the code is perfectly
fine. Violation of type safety doesn't necessarily mean the code is
broken. A warning that warns about violation of type safety has the same
properties. Now, it's up to you to decide if you want warnings on type
safety or not.
What you are saying, on the other hand, reads to me like this: "I do
want warnings on violations of type safety in general, except for
"signed char" or "unsigned char" being used instead of "char"." I think
I do understand your reasons -- I just don't agree with them.
-- Sergei.