strcmp("\x01", "\xef") returns 18 but it should return something < 0.
The reason is that the variable holding the result of the subtraction is
too small and overflows.
As strcmp is e.g. used to access data in squashfs this might result in
not finding files.
The same problem is fixed in strncmp.
Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: Michael Buesch <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
Hello,
I didn't hit this problem in the wild, only when checking for something
else. Is this stable material anyhow?
Best regards
Uwe
lib/string.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/string.c b/lib/string.c
index b19b87a..661ff06 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL(strlcat);
#undef strcmp
int strcmp(const char *cs, const char *ct)
{
- signed char __res;
+ int __res;
while (1) {
if ((__res = *cs - *ct++) != 0 || !*cs++)
@@ -266,7 +266,7 @@ EXPORT_SYMBOL(strcmp);
*/
int strncmp(const char *cs, const char *ct, size_t count)
{
- signed char __res = 0;
+ int __res = 0;
while (count) {
if ((__res = *cs - *ct++) != 0 || !*cs++)
--
1.6.5.2
Uwe Kleine-König <[email protected]> writes:
> strcmp("\x01", "\xef") returns 18 but it should return something < 0.
> The reason is that the variable holding the result of the subtraction is
> too small and overflows.
When char is signed strcmp("\xef", "\x01") still returns something < 0.
You need to cast to unsigned char first.
Andreas.
--
Andreas Schwab, [email protected]
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."
On Tue, 17 Nov 2009, Uwe Kleine-König wrote:
>
> strcmp("\x01", "\xef") returns 18 but it should return something < 0.
> The reason is that the variable holding the result of the subtraction is
> too small and overflows.
No. The reason is that whoever wrote that function is a moron and doesn't
know the standard. And your fix is not correct _either_
The comparison should be done as *unsigned char*. As specified by POSIX
"The sign of a non-zero return value shall be determined by the sign of
the difference between the values of the first pair of bytes (both
interpreted as type unsigned char) that differ in the strings being
compared."
and both the original code and your change gets it wrong in different
ways.
> int strcmp(const char *cs, const char *ct)
> {
> - signed char __res;
> + int __res;
>
> while (1) {
> if ((__res = *cs - *ct++) != 0 || !*cs++)
So this is fundamentally incorrect both with "signed char __res" _and_
with "int __res", because '*cs' and '*ct' are both (possibly - it depends
on the compiler and architecture) signed chars.
So in the case you mention, strcmp() _should_ return a negative value,
because "\x01" is smaller than "\xef", but you have:
- *cs = 1, *ct = (char) 0xef = -17 _OR_ 239 depending on sign of 'char'
and as a result:
- signed char __res = 18 (incorrect, regardless: ct is larger)
- int __res = 18 (incorrect) or -238 (correct) depending on sign of char
so your patch doesn't actually help at all.
What would help is something like the appended, but I have not tested it
AT ALL. It may be total and utter crap too. Maybe it doesn't compile,
maybe it buggers your pet hedgehog. I just don't know.
Linus
---
lib/string.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/lib/string.c b/lib/string.c
index b19b87a..e96421a 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -246,13 +246,17 @@ EXPORT_SYMBOL(strlcat);
#undef strcmp
int strcmp(const char *cs, const char *ct)
{
- signed char __res;
+ unsigned char c1, c2;
while (1) {
- if ((__res = *cs - *ct++) != 0 || !*cs++)
+ c1 = *cs++;
+ c2 = *ct++;
+ if (c1 != c2)
+ return c1 < c2 ? -1 : 1;
+ if (!c1)
break;
}
- return __res;
+ return 0;
}
EXPORT_SYMBOL(strcmp);
#endif
@@ -266,14 +270,18 @@ EXPORT_SYMBOL(strcmp);
*/
int strncmp(const char *cs, const char *ct, size_t count)
{
- signed char __res = 0;
+ unsigned char c1, c2;
while (count) {
- if ((__res = *cs - *ct++) != 0 || !*cs++)
+ c1 = *cs++;
+ c2 = *ct++;
+ if (c1 != c2)
+ return c1 < c2 ? -1 : 1;
+ if (!c1)
break;
count--;
}
- return __res;
+ return 0;
}
EXPORT_SYMBOL(strncmp);
#endif
On Tuesday 17 November 2009 18:41:58 Linus Torvalds wrote:
>
> On Tue, 17 Nov 2009, Uwe Kleine-König wrote:
> >
> > strcmp("\x01", "\xef") returns 18 but it should return something < 0.
> > The reason is that the variable holding the result of the subtraction is
> > too small and overflows.
>
> No. The reason is that whoever wrote that function is a moron and doesn't
> know the standard. And your fix is not correct _either_
>
> The comparison should be done as *unsigned char*. As specified by POSIX
>
> "The sign of a non-zero return value shall be determined by the sign of
> the difference between the values of the first pair of bytes (both
> interpreted as type unsigned char) that differ in the strings being
> compared."
>
> and both the original code and your change gets it wrong in different
> ways.
>
> > int strcmp(const char *cs, const char *ct)
> > {
> > - signed char __res;
> > + int __res;
> >
> > while (1) {
> > if ((__res = *cs - *ct++) != 0 || !*cs++)
>
> So this is fundamentally incorrect both with "signed char __res" _and_
> with "int __res", because '*cs' and '*ct' are both (possibly - it depends
> on the compiler and architecture) signed chars.
>
> So in the case you mention, strcmp() _should_ return a negative value,
> because "\x01" is smaller than "\xef", but you have:
>
> - *cs = 1, *ct = (char) 0xef = -17 _OR_ 239 depending on sign of 'char'
>
> and as a result:
>
> - signed char __res = 18 (incorrect, regardless: ct is larger)
>
> - int __res = 18 (incorrect) or -238 (correct) depending on sign of char
>
> so your patch doesn't actually help at all.
>
> What would help is something like the appended, but I have not tested it
> AT ALL. It may be total and utter crap too. Maybe it doesn't compile,
> maybe it buggers your pet hedgehog. I just don't know.
>
> Linus
>
> ---
> lib/string.c | 20 ++++++++++++++------
> 1 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/lib/string.c b/lib/string.c
> index b19b87a..e96421a 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -246,13 +246,17 @@ EXPORT_SYMBOL(strlcat);
> #undef strcmp
> int strcmp(const char *cs, const char *ct)
> {
> - signed char __res;
> + unsigned char c1, c2;
>
> while (1) {
> - if ((__res = *cs - *ct++) != 0 || !*cs++)
> + c1 = *cs++;
> + c2 = *ct++;
> + if (c1 != c2)
> + return c1 < c2 ? -1 : 1;
> + if (!c1)
> break;
> }
> - return __res;
> + return 0;
> }
Well, that doesn't actually return the difference at all. Is that allowed?
What about this? (Manually hacked pseudo patch)
diff --git a/lib/string.c b/lib/string.c
index b19b87a..661ff06 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL(strlcat);
#undef strcmp
int strcmp(const char *cs, const char *ct)
{
- signed char __res;
+ int __res;
while (1) {
- if ((__res = *cs - *ct++) != 0 || !*cs++)
+ if ((__res = (unsigned char)(*cs) - (unsigned char)(*ct++)) != 0 || !*cs++)
@@ -266,7 +266,7 @@ EXPORT_SYMBOL(strcmp);
*/
--
Greetings, Michael.
On Tue, 17 Nov 2009, Michael Buesch wrote:
>
> Well, that doesn't actually return the difference at all. Is that allowed?
It's in fact the common implementation. Returning -1/0/1 is kind of
polite, since it means that the sign now fits in a minimal type (ie you
can save the end result in a "signed char" and it will _work_, even
though it's not guaranteed by the standard.
Returning any negative or positive number is certainly _allowed_ by the
standard, but I just checked, and glibc does the "polite" -1/0/1 thing. I
suspect many other libraries do too, and it's not like it costs you
anything more.
In fact, the written-out-with-unsigned-char-variables version is also
likely to generate better code than the "clever" one that does just one
subtract, because now the code can do all the comparisons in just
'unsigned char' and never needs to sign-extend the result to 'int'.
Not that it likely matters.
I double-checked, and the code generated from my patch looks sane.
strcmp:
pushq %rbp #
movq %rsp, %rbp #,
.L52:
movb (%rdi), %al #* cs, c1
movb (%rsi), %dl #* ct, c2
incq %rdi # cs
incq %rsi # ct
cmpb %dl, %al # c2, c1
je .L49 #,
sbbl %eax, %eax # D.13150
orl $1, %eax #, D.13150
jmp .L51 #
.L49:
testb %al, %al # c1
jne .L52 #,
xorl %eax, %eax # D.13150
.L51:
leave
ret
which is not horrible (of course, depending on whether you expect to find
differences early or late you might want to have make the "L49" case be
the fallthrough etc).
[ Using sbb+or is a standard x86 trick to get -1/1. You'll also find
"sbb+and" to get 0/value, or "sbb+add" to get value/value+1 ]
Linus
Hello Linus,
On Tue, Nov 17, 2009 at 09:41:58AM -0800, Linus Torvalds wrote:
> On Tue, 17 Nov 2009, Uwe Kleine-K?nig wrote:
> >
> > strcmp("\x01", "\xef") returns 18 but it should return something < 0.
> > The reason is that the variable holding the result of the subtraction is
> > too small and overflows.
>
> No. The reason is that whoever wrote that function is a moron and doesn't
> know the standard. And your fix is not correct _either_
OK, right.
Acked-by: Uwe Kleine-K?nig <[email protected]>
(BTW, this was already broken in 2.4.0, so I was unable to find out who
is the moron :-)
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |
On Tue, 17 Nov 2009, Uwe Kleine-K?nig wrote:
>
> (BTW, this was already broken in 2.4.0, so I was unable to find out who
> is the moron :-)
Yeah, I looked at the history too. It's so old that it might well be me.
Linus
On Tue, 17 Nov 2009, Linus Torvalds wrote:
>
> Yeah, I looked at the history too. It's so old that it might well be me.
In fact, it goes back to at least 1.2.13.
And the copyright dates do imply that they could go back way further.
At least the comment says it all:
"These are buggy as well.."
so I can at least claim that back in the _original_ 0.01 release, it was
correct:
extern inline int strcmp(const char * cs,const char * ct)
{
register int __res __asm__("ax");
__asm__("cld\n"
"1:\tlodsb\n\t"
"scasb\n\t"
"jne 2f\n\t"
"testb %%al,%%al\n\t"
"jne 1b\n\t"
"xorl %%eax,%%eax\n\t"
"jmp 3f\n"
"2:\tmovl $1,%%eax\n\t"
"jl 3f\n\t"
"negl %%eax\n"
"3:"
:"=a" (__res):"D" (cs),"S" (ct):"si","di");
return __res;
}
and as far as I can tell, the above is actually correct, even if you have
to be a bit crazy to write 'strcmp' as gcc inline asm (hey, I wrote _all_
the string routines that way, and I made gcc do some of them built-in.
Because I was a MAN, dammit!).
So the bug was apparently introduced when we went portable.
Linus
It's been like that for awhile.
http://www.linuxhq.com/kernel/v1.1/75/lib/string.c
Michael Buesch <[email protected]> writes:
> Well, that doesn't actually return the difference at all. Is that allowed?
The only requirement is that the compare functions return <0,0,>0
depending on the order.
Andreas.
--
Andreas Schwab, [email protected]
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
From: Linus Torvalds <[email protected]>
signed char __res = *cs - *ct;
is wrong for two reasons. The subtraction can overflow because __res
doesn't use a type big enough. Moreover the compared bytes should be
interpreted as unsigned char as specified by POSIX.
The same problem is fixed in strncmp.
Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: Michael Buesch <[email protected]>
Cc: Andreas Schwab <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
Hello Linus,
I resend with a (hopefully) nice commit log as this didn't appear in
your tree up to now.
Best regards
Uwe
lib/string.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/lib/string.c b/lib/string.c
index b19b87a..e96421a 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -246,13 +246,17 @@ EXPORT_SYMBOL(strlcat);
#undef strcmp
int strcmp(const char *cs, const char *ct)
{
- signed char __res;
+ unsigned char c1, c2;
while (1) {
- if ((__res = *cs - *ct++) != 0 || !*cs++)
+ c1 = *cs++;
+ c2 = *ct++;
+ if (c1 != c2)
+ return c1 < c2 ? -1 : 1;
+ if (!c1)
break;
}
- return __res;
+ return 0;
}
EXPORT_SYMBOL(strcmp);
#endif
@@ -266,14 +270,18 @@ EXPORT_SYMBOL(strcmp);
*/
int strncmp(const char *cs, const char *ct, size_t count)
{
- signed char __res = 0;
+ unsigned char c1, c2;
while (count) {
- if ((__res = *cs - *ct++) != 0 || !*cs++)
+ c1 = *cs++;
+ c2 = *ct++;
+ if (c1 != c2)
+ return c1 < c2 ? -1 : 1;
+ if (!c1)
break;
count--;
}
- return __res;
+ return 0;
}
EXPORT_SYMBOL(strncmp);
#endif
--
1.6.5.2