With CONFIG_KPROBES=y on Ubuntu 9.10 x86_64 default install, I get the
following:
Error: Your awk doesn't support charactor-class.
Please try to use gawk.
CC arch/x86/lib/inat.o
arch/x86/lib/inat.c:24:25: error: inat-tables.c: No such file or directory
arch/x86/lib/inat.c: In function "inat_get_opcode_attribute":
arch/x86/lib/inat.c:29: error: "inat_primary_table" undeclared (first use in this function)
arch/x86/lib/inat.c:29: error: (Each undeclared identifier is reported only once
arch/x86/lib/inat.c:29: error: for each function it appears in.)
arch/x86/lib/inat.c: In function "inat_get_escape_attribute":
arch/x86/lib/inat.c:44: error: "inat_escape_tables" undeclared (first use in this function)
arch/x86/lib/inat.c: In function "inat_get_group_attribute":
arch/x86/lib/inat.c:67: error: "inat_group_tables" undeclared (first use in this function)
arch/x86/lib/inat.c: In function "inat_get_avx_attribute":
arch/x86/lib/inat.c:85: error: "inat_avx_tables" undeclared (first use in this function)
make[1]: *** [arch/x86/lib/inat.o] Error 1
make: *** [arch/x86/lib] Error 2
Either CONFIG_KPROBES=n or 'apt-get install gawk' does the trick, but is
this intentional?
If it is, the following documentation patch should be applied.
-andy
---
From: Andy Isaacson <[email protected]>
Date: Wed, 16 Dec 2009 15:51:30 -0800
Subject: [PATCH] Document requirement for gawk.
Signed-off-by: Andy Isaacson <[email protected]>
---
Documentation/Changes | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/Documentation/Changes b/Documentation/Changes
index 6d0f1ef..8d55162 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -49,6 +49,7 @@ o oprofile 0.9 # oprofiled --version
o udev 081 # udevinfo -V
o grub 0.93 # grub --version
o mcelog 0.6
+o gawk 3.0 # awk --version
Kernel compilation
==================
--
1.6.5.3
On Wed, Dec 16, 2009 at 03:56:17PM -0800, Andrew Isaacson wrote:
> With CONFIG_KPROBES=y on Ubuntu 9.10 x86_64 default install, I get the
> following:
>
> Error: Your awk doesn't support charactor-class.
> Please try to use gawk.
Aside of the incorrect suggestion (you need not just gawk, you need that
thing installed as awk), the use of GNUisms in there is actually pointless
since encoding is bloody fixed.
Is there any reason not to apply the patch below, to allow more awk
implementations to be used? After all, it's not like we're going to put
non-ASCII characters into the map file...
===
x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
Not all awk implementations (including the default awk in Ubuntu 9.10)
support POSIX character classes. Since x86-opcode-map.txt is plain
ASCII, we can just use explicit ranges for lower case, alphabetic, and
alphanumeric characters instead.
Signed-off-by: Roland Dreier <[email protected]>
---
arch/x86/tools/gen-insn-attr-x86.awk | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 7a68506..eaf11f5 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -6,8 +6,6 @@
# Awk implementation sanity check
function check_awk_implement() {
- if (!match("abc", "[[:lower:]]+"))
- return "Your awk doesn't support charactor-class."
if (sprintf("%x", 0) != "0")
return "Your awk has a printf-format problem."
return ""
@@ -44,12 +42,12 @@ BEGIN {
delete gtable
delete atable
- opnd_expr = "^[[:alpha:]/]"
+ opnd_expr = "^[A-Za-z/]"
ext_expr = "^\\("
sep_expr = "^\\|$"
- group_expr = "^Grp[[:alnum:]]+"
+ group_expr = "^Grp[0-9A-Za-z]+"
- imm_expr = "^[IJAO][[:lower:]]"
+ imm_expr = "^[IJAO][a-z]"
imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
@@ -62,7 +60,7 @@ BEGIN {
imm_flag["Ob"] = "INAT_MOFFSET"
imm_flag["Ov"] = "INAT_MOFFSET"
- modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
+ modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
rex_expr = "^REX(\\.[XRWB]+)*"
fpu_expr = "^ESC" # TODO
On 12/16/2009 05:39 PM, Roland Dreier wrote:
> Is there any reason not to apply the patch below, to allow more awk
> implementations to be used? After all, it's not like we're going to put
> non-ASCII characters into the map file...
I guess the question is if it will break under any other circumstances,
but I guess we can find those when we get to them.
There was a long discussion about the use of awk on IRC today.
Apparently mawk, in particular, is actively broken, because the
maintainer believe that POSIX is crap. There are quite a few issues
with it, according to reports.
We need a sane scripting language available to the kernel build, and
given all the problems we have had with different versions or even just
sometimes different builds of sh, awk, and even bc -- plus the fact that
those utilities just don't necessarily do what we want makes it very
frustrating. Personally I think a dependency on Perl is better than the
mess we're in; I understand other people disagree. What is definitely
not acceptable, however, is the status quo. The situation is, quite
frankly, ridiculous enough that perhaps the right thing to do is to
write a small scripting engine and bundle it with the kernel. Something
that does what we need it to do, but is only one implementation and
something we can extend at will if need be.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, Dec 16, 2009 at 06:43:01PM -0800, H. Peter Anvin wrote:
> On 12/16/2009 05:39 PM, Roland Dreier wrote:
> > Is there any reason not to apply the patch below, to allow more awk
> > implementations to be used? After all, it's not like we're going to put
> > non-ASCII characters into the map file...
>
> I guess the question is if it will break under any other circumstances,
> but I guess we can find those when we get to them.
>
> There was a long discussion about the use of awk on IRC today.
> Apparently mawk, in particular, is actively broken, because the
> maintainer believe that POSIX is crap. There are quite a few issues
> with it, according to reports.
still is more compact than gawk, less loaded with GNU "extensions" and
happens to be default on debian and debian-derived userlands.
Roland Dreier wrote:
> Is there any reason not to apply the patch below, to allow more awk
> implementations to be used? After all, it's not like we're going to put
> non-ASCII characters into the map file...
Actually, the reason why I decided to use character classes is
[a-z] wasn't same as [[:lower:]] on some environment.
For example, before the POSIX standard, to match alphanumeric charac-
ters, you would have had to write /[A-Za-z0-9]/. If your character set
had other alphabetic characters in it, this would not match them, and
if your character set collated differently from ASCII, this might not
even match the ASCII alphanumeric characters. With the POSIX character
classes, you can write /[[:alnum:]]/, and this matches the alphabetic
and numeric characters in your character set, no matter what it is.
It seems that "your character set" doesn't mean "what character set are used
in the data", it means "what character set build env. is using".
So, actually, my first released script had used [a-z], but I needed to
move onto [[:lower:]].
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Al Viro wrote:
> On Wed, Dec 16, 2009 at 03:56:17PM -0800, Andrew Isaacson wrote:
>> With CONFIG_KPROBES=y on Ubuntu 9.10 x86_64 default install, I get the
>> following:
>>
>> Error: Your awk doesn't support charactor-class.
>> Please try to use gawk.
>
> Aside of the incorrect suggestion (you need not just gawk, you need that
> thing installed as awk), the use of GNUisms in there is actually pointless
> since encoding is bloody fixed.
Hmm, maybe "POSIX awk" will be a better suggestion, isn't it?
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 07:33 PM, Masami Hiramatsu wrote:
> Roland Dreier wrote:
>> Is there any reason not to apply the patch below, to allow more awk
>> implementations to be used? After all, it's not like we're going to put
>> non-ASCII characters into the map file...
>
> Actually, the reason why I decided to use character classes is
> [a-z] wasn't same as [[:lower:]] on some environment.
>
> For example, before the POSIX standard, to match alphanumeric charac-
> ters, you would have had to write /[A-Za-z0-9]/. If your character set
> had other alphabetic characters in it, this would not match them, and
> if your character set collated differently from ASCII, this might not
> even match the ASCII alphanumeric characters. With the POSIX character
> classes, you can write /[[:alnum:]]/, and this matches the alphabetic
> and numeric characters in your character set, no matter what it is.
>
> It seems that "your character set" doesn't mean "what character set are used
> in the data", it means "what character set build env. is using".
>
> So, actually, my first released script had used [a-z], but I needed to
> move onto [[:lower:]].
>
This is correct if you are not in the C locale, but I'm not sure if we
support building the kernel in a non-C locale in the first place.
Do you have a known failure case? There is also the option of
explicitly setting LC_CTYPE=C. Sigh.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
> > Is there any reason not to apply the patch below, to allow more awk
> > implementations to be used? After all, it's not like we're going to put
> > non-ASCII characters into the map file...
> I guess the question is if it will break under any other circumstances,
> but I guess we can find those when we get to them.
I should have mentioned it, but with the current Debian testing version
(mawk 1.3.3-15) and my patch, I get a byte-for-byte identical
inat-tables.c to what I get with the unpatched kernel and gawk
1:3.1.6.dfsg-4.
I'm not sure how likely all this is to change in the future, but it's
hard for me to see a sane reason why eg [:lower:] and a-z would be
different for this use.
- R.
On Wednesday 16 December 2009 20:43:01 H. Peter Anvin wrote:
> On 12/16/2009 05:39 PM, Roland Dreier wrote:
> > Is there any reason not to apply the patch below, to allow more awk
> > implementations to be used? After all, it's not like we're going to put
> > non-ASCII characters into the map file...
>
> I guess the question is if it will break under any other circumstances,
> but I guess we can find those when we get to them.
>
> There was a long discussion about the use of awk on IRC today.
> Apparently mawk, in particular, is actively broken, because the
> maintainer believe that POSIX is crap. There are quite a few issues
> with it, according to reports.
if the kernel specifies posix, and that implementation doesn't do posix, then
that implementation doesn't build the kernel. Blacklisting known broken
implementations makes a certain amount of sense.
> We need a sane scripting language available to the kernel build, and
> given all the problems we have had with different versions or even just
> sometimes different builds of sh, awk, and even bc -- plus the fact that
> those utilities just don't necessarily do what we want makes it very
> frustrating.
1) Posix exists for a reason.
2) Busybox implements what the kernel has needed to build. (I test this every
release, and I fix it where necessary.)
> Personally I think a dependency on Perl is better than the
> mess we're in; I understand other people disagree.
Vehemently.
> What is definitely
> not acceptable, however, is the status quo. The situation is, quite
> frankly, ridiculous enough that perhaps the right thing to do is to
> write a small scripting engine and bundle it with the kernel. Something
> that does what we need it to do, but is only one implementation and
> something we can extend at will if need be.
*shrug* That's one way to avoid environmental dependencies.
> -hpa
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
On Wednesday 16 December 2009 21:45:01 H. Peter Anvin wrote:
> On 12/16/2009 07:33 PM, Masami Hiramatsu wrote:
> > Roland Dreier wrote:
> >> Is there any reason not to apply the patch below, to allow more awk
> >> implementations to be used? After all, it's not like we're going to put
> >> non-ASCII characters into the map file...
> >
> > Actually, the reason why I decided to use character classes is
> > [a-z] wasn't same as [[:lower:]] on some environment.
> >
> > For example, before the POSIX standard, to match alphanumeric
> > charac- ters, you would have had to write /[A-Za-z0-9]/. If your
> > character set had other alphabetic characters in it, this would not
> > match them, and if your character set collated differently from ASCII,
> > this might not even match the ASCII alphanumeric characters. With the
> > POSIX character classes, you can write /[[:alnum:]]/, and this matches
> > the alphabetic and numeric characters in your character set, no matter
> > what it is.
> >
> > It seems that "your character set" doesn't mean "what character set are
> > used in the data", it means "what character set build env. is using".
> >
> > So, actually, my first released script had used [a-z], but I needed to
> > move onto [[:lower:]].
>
> This is correct if you are not in the C locale, but I'm not sure if we
> support building the kernel in a non-C locale in the first place.
>
> Do you have a known failure case? There is also the option of
> explicitly setting LC_CTYPE=C. Sigh.
I remember first being surprised when Ubuntu "upgraded" to utf8 locale and
"sort" became case insensitive. Took a while to track down why so much stuff
was breaking.
I've exported LC_ALL=C in all my build environments ever since.
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
> For example, before the POSIX standard, to match alphanumeric charac-
> ters, you would have had to write /[A-Za-z0-9]/. If your character set
> had other alphabetic characters in it, this would not match them, and
> if your character set collated differently from ASCII, this might not
> even match the ASCII alphanumeric characters. With the POSIX character
> classes, you can write /[[:alnum:]]/, and this matches the alphabetic
> and numeric characters in your character set, no matter what it is.
I'm not sure I understand this, although I'm not a character set expert.
But is there really some possible locale + awk implementation where an
awk script, written in pure ASCII, operating on a pure ASCII input file,
will have [A-Za-z0-9] match a different set of ASCII characters than
[[:alnum:]] will match?
- R.
On Wed, Dec 16, 2009 at 10:43:47PM -0500, Masami Hiramatsu wrote:
> Al Viro wrote:
> > On Wed, Dec 16, 2009 at 03:56:17PM -0800, Andrew Isaacson wrote:
> >> With CONFIG_KPROBES=y on Ubuntu 9.10 x86_64 default install, I get the
> >> following:
> >>
> >> Error: Your awk doesn't support charactor-class.
> >> Please try to use gawk.
> >
> > Aside of the incorrect suggestion (you need not just gawk, you need that
> > thing installed as awk), the use of GNUisms in there is actually pointless
> > since encoding is bloody fixed.
>
> Hmm, maybe "POSIX awk" will be a better suggestion, isn't it?
Not really.
Note that meaning of [:alpha:] depends on locale, and you are working
with fixed input, in fixed encoding. So you *already* have a dependency
on locale and it's no better or worse way to express the set than A-Za-z
is. With LANG=C both describe the set you are after; in other locales
neither is guaranteed to. And having it set by caller of your script
is trivial.
IOW, you are actually misusing the language feature and while it's easy
to fix, the same fix will remove any benefits compared to universally
available alternative language feature.
Roland Dreier wrote:
>
> > For example, before the POSIX standard, to match alphanumeric charac-
> > ters, you would have had to write /[A-Za-z0-9]/. If your character set
> > had other alphabetic characters in it, this would not match them, and
> > if your character set collated differently from ASCII, this might not
> > even match the ASCII alphanumeric characters. With the POSIX character
> > classes, you can write /[[:alnum:]]/, and this matches the alphabetic
> > and numeric characters in your character set, no matter what it is.
>
> I'm not sure I understand this, although I'm not a character set expert.
> But is there really some possible locale + awk implementation where an
> awk script, written in pure ASCII, operating on a pure ASCII input file,
> will have [A-Za-z0-9] match a different set of ASCII characters than
> [[:alnum:]] will match?
I assume that in utf-8 locale(nowadays default in most of distro) alphabets
may be sorted as aAbBcC...zZ(or AaBb...), and in this case a-z means
aAbBcC...z.
As Al Viro said, if we run awk with LC_ALL=C, then the characters will be
sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just before
$(AWK). (I'm not so sure whether Makefile can accept it...)
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 08:09 PM, Roland Dreier wrote:
>
> > > Is there any reason not to apply the patch below, to allow more awk
> > > implementations to be used? After all, it's not like we're going to put
> > > non-ASCII characters into the map file...
>
> > I guess the question is if it will break under any other circumstances,
> > but I guess we can find those when we get to them.
>
> I should have mentioned it, but with the current Debian testing version
> (mawk 1.3.3-15) and my patch, I get a byte-for-byte identical
> inat-tables.c to what I get with the unpatched kernel and gawk
> 1:3.1.6.dfsg-4.
>
> I'm not sure how likely all this is to change in the future, but it's
> hard for me to see a sane reason why eg [:lower:] and a-z would be
> different for this use.
>
If LC_COLL != C they can be. The solution would be to force LC_COLL=C.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
> As Al Viro said, if we run awk with LC_ALL=C, then the characters will be
> sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just before
> $(AWK). (I'm not so sure whether Makefile can accept it...)
OK, updated patch:
===
x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
Not all awk implementations (including the default awk in Ubuntu 9.10)
support POSIX character classes. Since x86-opcode-map.txt is plain
ASCII, we can just use explicit ranges for lower case, alphabetic, and
alphanumeric characters instead. We set LC_ALL=C when invoking awk so
that ASCII ranges are never misinterpreted.
I verified that the Debian version of mawk, which doesn't have POSIX
character set suport, generates the byte-for-byte the same inat-tables.c
after this patch as gawk did on an unpatched kernel.
Signed-off-by: Roland Dreier <[email protected]>
---
arch/x86/lib/Makefile | 2 +-
arch/x86/tools/gen-insn-attr-x86.awk | 10 ++++------
2 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 45b20e4..0c46d60 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -5,7 +5,7 @@
inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
quiet_cmd_inat_tables = GEN $@
- cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+ cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
$(obj)/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
$(call cmd,inat_tables)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 7a68506..eaf11f5 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -6,8 +6,6 @@
# Awk implementation sanity check
function check_awk_implement() {
- if (!match("abc", "[[:lower:]]+"))
- return "Your awk doesn't support charactor-class."
if (sprintf("%x", 0) != "0")
return "Your awk has a printf-format problem."
return ""
@@ -44,12 +42,12 @@ BEGIN {
delete gtable
delete atable
- opnd_expr = "^[[:alpha:]/]"
+ opnd_expr = "^[A-Za-z/]"
ext_expr = "^\\("
sep_expr = "^\\|$"
- group_expr = "^Grp[[:alnum:]]+"
+ group_expr = "^Grp[0-9A-Za-z]+"
- imm_expr = "^[IJAO][[:lower:]]"
+ imm_expr = "^[IJAO][a-z]"
imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
@@ -62,7 +60,7 @@ BEGIN {
imm_flag["Ob"] = "INAT_MOFFSET"
imm_flag["Ov"] = "INAT_MOFFSET"
- modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
+ modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
rex_expr = "^REX(\\.[XRWB]+)*"
fpu_expr = "^ESC" # TODO
Al Viro wrote:
> On Wed, Dec 16, 2009 at 10:43:47PM -0500, Masami Hiramatsu wrote:
>> Al Viro wrote:
>>> On Wed, Dec 16, 2009 at 03:56:17PM -0800, Andrew Isaacson wrote:
>>>> With CONFIG_KPROBES=y on Ubuntu 9.10 x86_64 default install, I get the
>>>> following:
>>>>
>>>> Error: Your awk doesn't support charactor-class.
>>>> Please try to use gawk.
>>>
>>> Aside of the incorrect suggestion (you need not just gawk, you need that
>>> thing installed as awk), the use of GNUisms in there is actually pointless
>>> since encoding is bloody fixed.
>>
>> Hmm, maybe "POSIX awk" will be a better suggestion, isn't it?
>
> Not really.
>
> Note that meaning of [:alpha:] depends on locale, and you are working
> with fixed input, in fixed encoding. So you *already* have a dependency
> on locale and it's no better or worse way to express the set than A-Za-z
> is. With LANG=C both describe the set you are after; in other locales
> neither is guaranteed to. And having it set by caller of your script
> is trivial.
>
> IOW, you are actually misusing the language feature and while it's easy
> to fix, the same fix will remove any benefits compared to universally
> available alternative language feature.
Ah, I see:)
I think that this is not so big issue as making new language in the kernel.
(wow, if someone wants to do that, I don't dare to stop him :) )
If just typing less than 100 characters can fix it, I'd like to do it; e.g.
Ualpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Lalpha = "abcdefghijklmnopqrstuvwxyz"
alpha = Ualpha Lalpha
digit = "0123456789"
alnum = alpha digit
and replace character classes with it. ("[[:alpha:]]"->"[" alpha "]")
That's so easy, isn't that? :-)
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 09:11 PM, Roland Dreier wrote:
>
> > As Al Viro said, if we run awk with LC_ALL=C, then the characters will be
> > sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just before
> > $(AWK). (I'm not so sure whether Makefile can accept it...)
>
I would strongly prefer the following solution as it solves this entire
class of problems globally, although I'd really want Michal's ACK for it:
diff --git a/Makefile b/Makefile
index 33d4732..7fefc13 100644
--- a/Makefile
+++ b/Makefile
@@ -302,6 +302,10 @@ MAKEFLAGS += --include-dir=$(srctree)
$(srctree)/scripts/Kbuild.include: ;
include $(srctree)/scripts/Kbuild.include
+# Avoid funny character set dependencies
+LC_ALL=C
+export LC_ALL
+
# Make variables (CC, etc...)
AS = $(CROSS_COMPILE)as
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
H. Peter Anvin wrote:
> On 12/16/2009 08:09 PM, Roland Dreier wrote:
>>
>> > > Is there any reason not to apply the patch below, to allow more awk
>> > > implementations to be used? After all, it's not like we're going to put
>> > > non-ASCII characters into the map file...
>>
>> > I guess the question is if it will break under any other circumstances,
>> > but I guess we can find those when we get to them.
>>
>> I should have mentioned it, but with the current Debian testing version
>> (mawk 1.3.3-15) and my patch, I get a byte-for-byte identical
>> inat-tables.c to what I get with the unpatched kernel and gawk
>> 1:3.1.6.dfsg-4.
>>
>> I'm not sure how likely all this is to change in the future, but it's
>> hard for me to see a sane reason why eg [:lower:] and a-z would be
>> different for this use.
>>
>
> If LC_COLL != C they can be. The solution would be to force LC_COLL=C.
Hmm, it's complicated :(. Peter, Roland, I don't want to be annoyed
by those environmental variables. If my last suggestion is acceptable,
could you do as below on your patch :)?
> Ualpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> Lalpha = "abcdefghijklmnopqrstuvwxyz"
> alpha = Ualpha Lalpha
> digit = "0123456789"
> alnum = alpha digit
>
> and replace character classes with it. ("[[:alpha:]]"->"[" alpha "]")
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 09:21 PM, Masami Hiramatsu wrote:
>
> Hmm, it's complicated :(. Peter, Roland, I don't want to be annoyed
> by those environmental variables. If my last suggestion is acceptable,
> could you do as below on your patch :)?
>
See my posted patch - I think it is better. We're going to have this
problem again, guaranteed, unless we do something like it anyway.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wednesday 16 December 2009 23:21:44 H. Peter Anvin wrote:
> On 12/16/2009 09:11 PM, Roland Dreier wrote:
> > > As Al Viro said, if we run awk with LC_ALL=C, then the characters will
> > > be sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just
> > > before $(AWK). (I'm not so sure whether Makefile can accept it...)
>
> I would strongly prefer the following solution as it solves this entire
> class of problems globally, although I'd really want Michal's ACK for it:
>
> diff --git a/Makefile b/Makefile
> index 33d4732..7fefc13 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -302,6 +302,10 @@ MAKEFLAGS += --include-dir=$(srctree)
> $(srctree)/scripts/Kbuild.include: ;
> include $(srctree)/scripts/Kbuild.include
>
> +# Avoid funny character set dependencies
> +LC_ALL=C
> +export LC_ALL
> +
> # Make variables (CC, etc...)
>
> AS = $(CROSS_COMPILE)as
Agreed. Otherwise we can get strange new breakabe in future just because
somebody builds in a _locale_ we've never heard of. We must specify this to
get consistent behavior.
And keep in mind, specifying this _is_ part of POSIX 2008. The standard is
online, even:
http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_08
http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_02
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds
On Wed, Dec 16, 2009 at 09:11:09PM -0800, Roland Dreier wrote:
>
> > As Al Viro said, if we run awk with LC_ALL=C, then the characters will be
> > sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just before
> > $(AWK). (I'm not so sure whether Makefile can accept it...)
>
> OK, updated patch:
>
> ===
>
> x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
>
> Not all awk implementations (including the default awk in Ubuntu 9.10)
> support POSIX character classes. Since x86-opcode-map.txt is plain
> ASCII, we can just use explicit ranges for lower case, alphabetic, and
> alphanumeric characters instead. We set LC_ALL=C when invoking awk so
> that ASCII ranges are never misinterpreted.
>
> I verified that the Debian version of mawk, which doesn't have POSIX
> character set suport, generates the byte-for-byte the same inat-tables.c
> after this patch as gawk did on an unpatched kernel.
>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> arch/x86/lib/Makefile | 2 +-
> arch/x86/tools/gen-insn-attr-x86.awk | 10 ++++------
> 2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index 45b20e4..0c46d60 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -5,7 +5,7 @@
> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
> quiet_cmd_inat_tables = GEN $@
> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
This will result in error messages in english always from awk - no?
Some people will care.
Sam
H. Peter Anvin wrote:
> On 12/16/2009 09:11 PM, Roland Dreier wrote:
>>
>> > As Al Viro said, if we run awk with LC_ALL=C, then the characters will be
>> > sorted as ASCII. So, your patch is OK if you can add LC_ALL=C just before
>> > $(AWK). (I'm not so sure whether Makefile can accept it...)
>>
>
> I would strongly prefer the following solution as it solves this entire
> class of problems globally, although I'd really want Michal's ACK for it:
Sure, agreed.
BTW, is it possible user to overwrite this setting?
Thank you,
>
> diff --git a/Makefile b/Makefile
> index 33d4732..7fefc13 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -302,6 +302,10 @@ MAKEFLAGS += --include-dir=$(srctree)
> $(srctree)/scripts/Kbuild.include: ;
> include $(srctree)/scripts/Kbuild.include
>
> +# Avoid funny character set dependencies
> +LC_ALL=C
> +export LC_ALL
> +
> # Make variables (CC, etc...)
>
> AS = $(CROSS_COMPILE)as
>
>
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 09:49 PM, Masami Hiramatsu wrote:
>>
>> I would strongly prefer the following solution as it solves this entire
>> class of problems globally, although I'd really want Michal's ACK for it:
>
> Sure, agreed.
> BTW, is it possible user to overwrite this setting?
>
Doing it this way means the user can override it on the make command line.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 12/16/2009 09:45 PM, Sam Ravnborg wrote:
>>
>> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
>> index 45b20e4..0c46d60 100644
>> --- a/arch/x86/lib/Makefile
>> +++ b/arch/x86/lib/Makefile
>> @@ -5,7 +5,7 @@
>> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
>> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
>> quiet_cmd_inat_tables = GEN $@
>> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>
> This will result in error messages in english always from awk - no?
> Some people will care.
>
True, we probably don't want to set LC_MESSAGES. Which means:
diff --git a/Makefile b/Makefile
index 33d4732..357e83b 100644
--- a/Makefile
+++ b/Makefile
@@ -302,6 +302,13 @@ MAKEFLAGS += --include-dir=$(srctree)
$(srctree)/scripts/Kbuild.include: ;
include $(srctree)/scripts/Kbuild.include
+# Avoid funny character set dependencies
+LC_ALL=
+LC_CTYPE=C
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
+
# Make variables (CC, etc...)
AS = $(CROSS_COMPILE)as
OK?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
H. Peter Anvin wrote:
> On 12/16/2009 09:49 PM, Masami Hiramatsu wrote:
>>>
>>> I would strongly prefer the following solution as it solves this entire
>>> class of problems globally, although I'd really want Michal's ACK for it:
>>
>> Sure, agreed.
>> BTW, is it possible user to overwrite this setting?
>>
>
> Doing it this way means the user can override it on the make command line.
Hmm, in that case, as Sam said, some user may do it because
they would like to understand error messages. And that can
cause other problems.
Hrm, if there is just this problem, for now, don't you think
fixing it with my workaround is better choice?
Maybe, locale problem should be discussed more widely.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 10:06 PM, Masami Hiramatsu wrote:
>
> Hmm, in that case, as Sam said, some user may do it because
> they would like to understand error messages. And that can
> cause other problems.
>
See my other patch. That doesn't override LC_MESSAGES, which is what
matters there.
> Hrm, if there is just this problem, for now, don't you think
> fixing it with my workaround is better choice?
No, because this problem is not just limited to this one script.
> Maybe, locale problem should be discussed more widely.
Well, this thread is on both LKML and linux-kbuild.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
> +# Avoid funny character set dependencies
> +LC_ALL=
> +LC_CTYPE=C
> +LC_COLLATE=C
> +LC_NUMERIC=C
> +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
Yes, this plus my first patch (only touching the awk script, not the
Makefile) make sense to me.
H. Peter Anvin wrote:
> On 12/16/2009 10:06 PM, Masami Hiramatsu wrote:
>>
>> Hmm, in that case, as Sam said, some user may do it because
>> they would like to understand error messages. And that can
>> cause other problems.
>>
>
> See my other patch. That doesn't override LC_MESSAGES, which is what
> matters there.
>
>> Hrm, if there is just this problem, for now, don't you think
>> fixing it with my workaround is better choice?
>
> No, because this problem is not just limited to this one script.
>
>> Maybe, locale problem should be discussed more widely.
>
> Well, this thread is on both LKML and linux-kbuild.
Oh, I see :-).
So, if your patch is accepted, I'm OK for Roland's patch too(thanks!).
Thank you,
>
> -hpa
>
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/16/2009 10:17 PM, Roland Dreier wrote:
>
> > +# Avoid funny character set dependencies
> > +LC_ALL=
> > +LC_CTYPE=C
> > +LC_COLLATE=C
> > +LC_NUMERIC=C
> > +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
>
> Yes, this plus my first patch (only touching the awk script, not the
> Makefile) make sense to me.
Right, of course.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 12/16/2009 10:20 PM, Masami Hiramatsu wrote:
>
> So, if your patch is accepted, I'm OK for Roland's patch too(thanks!).
>
Roland, Masami, Sam: can I add your respective Acked-by:?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
> Roland, Masami, Sam: can I add your respective Acked-by:?
Yes, please.
On Wed, Dec 16, 2009 at 09:56:46PM -0800, H. Peter Anvin wrote:
> On 12/16/2009 09:45 PM, Sam Ravnborg wrote:
> >>
> >> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> >> index 45b20e4..0c46d60 100644
> >> --- a/arch/x86/lib/Makefile
> >> +++ b/arch/x86/lib/Makefile
> >> @@ -5,7 +5,7 @@
> >> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
> >> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
> >> quiet_cmd_inat_tables = GEN $@
> >> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> >> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> >
> > This will result in error messages in english always from awk - no?
> > Some people will care.
> >
>
> True, we probably don't want to set LC_MESSAGES. Which means:
>
> diff --git a/Makefile b/Makefile
> index 33d4732..357e83b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -302,6 +302,13 @@ MAKEFLAGS += --include-dir=$(srctree)
> $(srctree)/scripts/Kbuild.include: ;
> include $(srctree)/scripts/Kbuild.include
>
> +# Avoid funny character set dependencies
> +LC_ALL=
> +LC_CTYPE=C
> +LC_COLLATE=C
> +LC_NUMERIC=C
> +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
> +
> # Make variables (CC, etc...)
>
> AS = $(CROSS_COMPILE)as
>
>
> OK?
We tried something like this some time ago.
I cannot recall the details anymore but I managed to dig out this link:
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=191ba46a4aceb72f8481c58069b61f5e8cdef1f0
Today IIRC we no longer export the variable CFLAGS_vmlinux.lds.o so this may be a non-issue.
I bring up this only to say that there be some hidden side-effects of manipulating
with LC_*.
Back then I never understood exactly what was wrong - I just did the obvious revert.
One rationale behind the whish to add LC_* back then was to speed up the build.
I recall it had a measureable effect - but memory may trick me here.
Sam
On Wed, Dec 16, 2009 at 09:56:46PM -0800, H. Peter Anvin wrote:
> On 12/16/2009 09:45 PM, Sam Ravnborg wrote:
> >>
> >> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> >> index 45b20e4..0c46d60 100644
> >> --- a/arch/x86/lib/Makefile
> >> +++ b/arch/x86/lib/Makefile
> >> @@ -5,7 +5,7 @@
> >> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
> >> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
> >> quiet_cmd_inat_tables = GEN $@
> >> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> >> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> >
> > This will result in error messages in english always from awk - no?
> > Some people will care.
> >
>
> True, we probably don't want to set LC_MESSAGES. Which means:
>
> diff --git a/Makefile b/Makefile
> index 33d4732..357e83b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -302,6 +302,13 @@ MAKEFLAGS += --include-dir=$(srctree)
> $(srctree)/scripts/Kbuild.include: ;
> include $(srctree)/scripts/Kbuild.include
>
> +# Avoid funny character set dependencies
> +LC_ALL=
> +LC_CTYPE=C
> +LC_COLLATE=C
> +LC_NUMERIC=C
> +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
> +
> # Make variables (CC, etc...)
>
> AS = $(CROSS_COMPILE)as
Took a closer look at the patch.
I think we should move this upwards in the Makefile so it take effect as soon as possible.
Around the line where we do:
MAKEFLAGS += -rR --no-print-directory
Sam
On 17.12.2009 06:56, H. Peter Anvin wrote:
> On 12/16/2009 09:45 PM, Sam Ravnborg wrote:
>>>
>>> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
>>> index 45b20e4..0c46d60 100644
>>> --- a/arch/x86/lib/Makefile
>>> +++ b/arch/x86/lib/Makefile
>>> @@ -5,7 +5,7 @@
>>> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
>>> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
>>> quiet_cmd_inat_tables = GEN $@
>>> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>>> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>>
>> This will result in error messages in english always from awk - no?
>> Some people will care.
>>
>
> True, we probably don't want to set LC_MESSAGES. Which means:
>
> diff --git a/Makefile b/Makefile
> index 33d4732..357e83b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -302,6 +302,13 @@ MAKEFLAGS += --include-dir=$(srctree)
> $(srctree)/scripts/Kbuild.include: ;
> include $(srctree)/scripts/Kbuild.include
>
> +# Avoid funny character set dependencies
> +LC_ALL=
> +LC_CTYPE=C
> +LC_COLLATE=C
> +LC_NUMERIC=C
> +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
> +
> # Make variables (CC, etc...)
>
> AS = $(CROSS_COMPILE)as
>
>
> OK?
Definitely, setting this globally makes sure we don't run into similar
issues elsewhere. (I have to read the rest of the thread now).
Acked-by: Michal Marek <[email protected]>
Michal
On 17.12.2009 09:09, Sam Ravnborg wrote:
> On Wed, Dec 16, 2009 at 09:56:46PM -0800, H. Peter Anvin wrote:
>> On 12/16/2009 09:45 PM, Sam Ravnborg wrote:
>>>>
>>>> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
>>>> index 45b20e4..0c46d60 100644
>>>> --- a/arch/x86/lib/Makefile
>>>> +++ b/arch/x86/lib/Makefile
>>>> @@ -5,7 +5,7 @@
>>>> inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
>>>> inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
>>>> quiet_cmd_inat_tables = GEN $@
>>>> - cmd_inat_tables = $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>>>> + cmd_inat_tables = LC_ALL=C $(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
>>>
>>> This will result in error messages in english always from awk - no?
>>> Some people will care.
>>>
>>
>> True, we probably don't want to set LC_MESSAGES. Which means:
>>
>> diff --git a/Makefile b/Makefile
>> index 33d4732..357e83b 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -302,6 +302,13 @@ MAKEFLAGS += --include-dir=$(srctree)
>> $(srctree)/scripts/Kbuild.include: ;
>> include $(srctree)/scripts/Kbuild.include
>>
>> +# Avoid funny character set dependencies
>> +LC_ALL=
>> +LC_CTYPE=C
>> +LC_COLLATE=C
>> +LC_NUMERIC=C
>> +export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
>> +
>> # Make variables (CC, etc...)
>>
>> AS = $(CROSS_COMPILE)as
>
> Took a closer look at the patch.
> I think we should move this upwards in the Makefile so it take effect as soon as possible.
> Around the line where we do:
>
> MAKEFLAGS += -rR --no-print-directory
There are only some variable assignments and a sed call in the SUBARCH
definition, but right, to be 100% sure that in the future there won't
appear anything locale-dependent, we should move it upwards.
Michal
H. Peter Anvin wrote:
> On 12/16/2009 10:20 PM, Masami Hiramatsu wrote:
>>
>> So, if your patch is accepted, I'm OK for Roland's patch too(thanks!).
>>
>
> Roland, Masami, Sam: can I add your respective Acked-by:?
Ah, yes, of course!
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Roland Dreier wrote:
> Is there any reason not to apply the patch below, to allow more awk
> implementations to be used? After all, it's not like we're going to put
> non-ASCII characters into the map file...
>
> ===
>
> x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
>
> Not all awk implementations (including the default awk in Ubuntu 9.10)
> support POSIX character classes. Since x86-opcode-map.txt is plain
> ASCII, we can just use explicit ranges for lower case, alphabetic, and
> alphanumeric characters instead.
>
> Signed-off-by: Roland Dreier <[email protected]>
As far as Peter's LC_* patch is accepted:), this looks good to me.
Thank you for fixing and discussing that!
Acked-by: Masami Hiramatsu <[email protected]>
> ---
> arch/x86/tools/gen-insn-attr-x86.awk | 10 ++++------
> 1 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
> index 7a68506..eaf11f5 100644
> --- a/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -6,8 +6,6 @@
>
> # Awk implementation sanity check
> function check_awk_implement() {
> - if (!match("abc", "[[:lower:]]+"))
> - return "Your awk doesn't support charactor-class."
> if (sprintf("%x", 0) != "0")
> return "Your awk has a printf-format problem."
> return ""
> @@ -44,12 +42,12 @@ BEGIN {
> delete gtable
> delete atable
>
> - opnd_expr = "^[[:alpha:]/]"
> + opnd_expr = "^[A-Za-z/]"
> ext_expr = "^\\("
> sep_expr = "^\\|$"
> - group_expr = "^Grp[[:alnum:]]+"
> + group_expr = "^Grp[0-9A-Za-z]+"
>
> - imm_expr = "^[IJAO][[:lower:]]"
> + imm_expr = "^[IJAO][a-z]"
> imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
> imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
> imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
> @@ -62,7 +60,7 @@ BEGIN {
> imm_flag["Ob"] = "INAT_MOFFSET"
> imm_flag["Ov"] = "INAT_MOFFSET"
>
> - modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
> + modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
> force64_expr = "\\([df]64\\)"
> rex_expr = "^REX(\\.[XRWB]+)*"
> fpu_expr = "^ESC" # TODO
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/17/2009 12:09 AM, Sam Ravnborg wrote:
>
> Took a closer look at the patch.
> I think we should move this upwards in the Makefile so it take effect as soon as possible.
> Around the line where we do:
>
> MAKEFLAGS += -rR --no-print-directory
>
This makes sense. I'm assuming Acked-by's I've received are still good,
and I'm going to commit this shortly.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Commit-ID: c051346b7db27aaf674b8f3b4955240580b2a58a
Gitweb: http://git.kernel.org/tip/c051346b7db27aaf674b8f3b4955240580b2a58a
Author: H. Peter Anvin <[email protected]>
AuthorDate: Thu, 17 Dec 2009 06:56:11 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 17 Dec 2009 07:03:21 -0800
Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
There are a number of common Unix constructs like character ranges in
grep/sed/awk which don't work as expected with LC_COLLATE set to other
than C. Similarly, set LC_CTYPE and LC_NUMERIC to C to avoid other
nasty surprises.
In order to make sure these actually take effect we also have to
clear LC_ALL.
Signed-off-by: H. Peter Anvin <[email protected]>
Acked-by: Michal Marek <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Acked-by: Roland Dreier <[email protected]>
Cc: Sam Ravnborg <[email protected]>
LKML-Reference: <[email protected]>
---
Makefile | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/Makefile b/Makefile
index 33d4732..6e39af1 100644
--- a/Makefile
+++ b/Makefile
@@ -16,6 +16,13 @@ NAME = Man-Eating Seals of Antiquity
# o print "Entering directory ...";
MAKEFLAGS += -rR --no-print-directory
+# Avoid funny character set dependencies
+LC_ALL=
+LC_CTYPE=C
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
+
# We are using a recursive build, so we need to do a little thinking
# to get the ordering right.
#
Commit-ID: 4beb3d6d144c41525541cce2b611858b2645c725
Gitweb: http://git.kernel.org/tip/4beb3d6d144c41525541cce2b611858b2645c725
Author: Roland Dreier <[email protected]>
AuthorDate: Wed, 16 Dec 2009 17:39:48 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 17 Dec 2009 07:03:21 -0800
x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
Not all awk implementations (including the default awk in Ubuntu 9.10)
support POSIX character classes. Since x86-opcode-map.txt is plain
ASCII, we can just use explicit ranges for lower case, alphabetic, and
alphanumeric characters instead.
Signed-off-by: Roland Dreier <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/tools/gen-insn-attr-x86.awk | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 7a68506..eaf11f5 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -6,8 +6,6 @@
# Awk implementation sanity check
function check_awk_implement() {
- if (!match("abc", "[[:lower:]]+"))
- return "Your awk doesn't support charactor-class."
if (sprintf("%x", 0) != "0")
return "Your awk has a printf-format problem."
return ""
@@ -44,12 +42,12 @@ BEGIN {
delete gtable
delete atable
- opnd_expr = "^[[:alpha:]/]"
+ opnd_expr = "^[A-Za-z/]"
ext_expr = "^\\("
sep_expr = "^\\|$"
- group_expr = "^Grp[[:alnum:]]+"
+ group_expr = "^Grp[0-9A-Za-z]+"
- imm_expr = "^[IJAO][[:lower:]]"
+ imm_expr = "^[IJAO][a-z]"
imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
@@ -62,7 +60,7 @@ BEGIN {
imm_flag["Ob"] = "INAT_MOFFSET"
imm_flag["Ov"] = "INAT_MOFFSET"
- modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
+ modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
rex_expr = "^REX(\\.[XRWB]+)*"
fpu_expr = "^ESC" # TODO
On Thu, Dec 17, 2009 at 04:18:35PM +0000, tip-bot for H. Peter Anvin wrote:
> Commit-ID: c051346b7db27aaf674b8f3b4955240580b2a58a
> Gitweb: http://git.kernel.org/tip/c051346b7db27aaf674b8f3b4955240580b2a58a
> Author: H. Peter Anvin <[email protected]>
> AuthorDate: Thu, 17 Dec 2009 06:56:11 -0800
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Thu, 17 Dec 2009 07:03:21 -0800
>
> Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
>
> There are a number of common Unix constructs like character ranges in
> grep/sed/awk which don't work as expected with LC_COLLATE set to other
> than C. Similarly, set LC_CTYPE and LC_NUMERIC to C to avoid other
> nasty surprises.
>
> In order to make sure these actually take effect we also have to
> clear LC_ALL.
>
> Signed-off-by: H. Peter Anvin <[email protected]>
> Acked-by: Michal Marek <[email protected]>
> Acked-by: Masami Hiramatsu <[email protected]>
> Acked-by: Roland Dreier <[email protected]>
> Cc: Sam Ravnborg <[email protected]>
> LKML-Reference: <[email protected]>
This is IMO -next fodder and not urgent stuff.
In the distant past something similar caused us troubles.
I assume that the patch posted by Roland fixes
the relevant troubles with the awk script and that
this patch is just for consistency.
So in other words stuff for -next and not -rc1.
Anyway - just MHO.
Sam
On 12/17/2009 02:09 PM, Sam Ravnborg wrote:
>
> This is IMO -next fodder and not urgent stuff.
> In the distant past something similar caused us troubles.
>
> I assume that the patch posted by Roland fixes
> the relevant troubles with the awk script and that
> this patch is just for consistency.
> So in other words stuff for -next and not -rc1.
>
No: the patch Roland posted fixes the problems with non-POSIX-compliant
awk (e.g. mawk), but it in turn requires this patch in order to not
break on non-C-locale-default systems (which are most systems nowadays.)
So -rc1.
-hpa
H. Peter Anvin wrote:
> On 12/17/2009 02:09 PM, Sam Ravnborg wrote:
>>
>> This is IMO -next fodder and not urgent stuff.
>> In the distant past something similar caused us troubles.
>>
>> I assume that the patch posted by Roland fixes
>> the relevant troubles with the awk script and that
>> this patch is just for consistency.
>> So in other words stuff for -next and not -rc1.
>>
>
> No: the patch Roland posted fixes the problems with non-POSIX-compliant
> awk (e.g. mawk), but it in turn requires this patch in order to not
> break on non-C-locale-default systems (which are most systems nowadays.)
Right, gawk requires locale setting.
btw, something went wrong with it...
---
$ make O=/home/mhiramat/kbin/2.6.32 all
Using /home/mhiramat/ksrc/linux-2.6-tip as source for kernel
GEN /home/mhiramat/kbin/2.6.32/Makefile
CHK include/linux/version.h
CHK include/linux/utsrelease.h
SYMLINK include/asm -> include/asm-x86
CALL /home/mhiramat/ksrc/linux-2.6-tip/scripts/checksyscalls.sh
CHK include/linux/compile.h
CHK include/linux/version.h
make[3]: `scripts/unifdef' ???????
TEST posttest
Succeed: decoded and checked 887431 instructions
Kernel: arch/x86/boot/bzImage is ready (#19)
Building modules, stage 2.
MODPOST 434 modules
WARNING: modpost: Found 2 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
---
when I ran with LC_ALL=C, it shows message correctly.
---
$ make O=/home/mhiramat/kbin/2.6.32 all LC_ALL=C
Using /home/mhiramat/ksrc/linux-2.6-tip as source for kernel
GEN /home/mhiramat/kbin/2.6.32/Makefile
CHK include/linux/version.h
CHK include/linux/utsrelease.h
SYMLINK include/asm -> include/asm-x86
CALL /home/mhiramat/ksrc/linux-2.6-tip/scripts/checksyscalls.sh
CHK include/linux/compile.h
CHK include/linux/version.h
make[3]: `scripts/unifdef' is up to date.
TEST posttest
Succeed: decoded and checked 887431 instructions
Kernel: arch/x86/boot/bzImage is ready (#19)
Building modules, stage 2.
MODPOST 434 modules
WARNING: modpost: Found 2 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
---
Maybe, "LC_ALL=" line broke it?
Thank you,
>
> So -rc1.
>
> -hpa
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/17/2009 03:34 PM, Masami Hiramatsu wrote:
> H. Peter Anvin wrote:
>> On 12/17/2009 02:09 PM, Sam Ravnborg wrote:
>>>
>>> This is IMO -next fodder and not urgent stuff.
>>> In the distant past something similar caused us troubles.
>>>
>>> I assume that the patch posted by Roland fixes
>>> the relevant troubles with the awk script and that
>>> this patch is just for consistency.
>>> So in other words stuff for -next and not -rc1.
>>>
>>
>> No: the patch Roland posted fixes the problems with non-POSIX-compliant
>> awk (e.g. mawk), but it in turn requires this patch in order to not
>> break on non-C-locale-default systems (which are most systems nowadays.)
>
> Right, gawk requires locale setting.
>
> btw, something went wrong with it...
>
> when I ran with LC_ALL=C, it shows message correctly.
>
> Maybe, "LC_ALL=" line broke it?
>
What happens if you remove that part?
-hpa
> btw, something went wrong with it...
> make[3]: `scripts/unifdef' ???????
> when I ran with LC_ALL=C, it shows message correctly.
> make[3]: `scripts/unifdef' is up to date.
Do you have LANG and/or LC_MESSAGES set? LC_ALL will override all other
locale settings, so hpa explicitly unset it so that the other LC_xxx
variables will be used. But it means your environment's LC_MESSAGES (or
LANG if LC_MESSAGES is not set) will be used.
- R.
On 12/17/2009 03:37 PM, H. Peter Anvin wrote:
>>
>> Maybe, "LC_ALL=" line broke it?
>
> What happens if you remove that part?
>
I mean, of course, the references to LC_ALL.
What would be even better, actually, would be to test the following
incremental patch:
-hpa
H. Peter Anvin wrote:
> On 12/17/2009 03:34 PM, Masami Hiramatsu wrote:
>> H. Peter Anvin wrote:
>>> On 12/17/2009 02:09 PM, Sam Ravnborg wrote:
>>>>
>>>> This is IMO -next fodder and not urgent stuff.
>>>> In the distant past something similar caused us troubles.
>>>>
>>>> I assume that the patch posted by Roland fixes
>>>> the relevant troubles with the awk script and that
>>>> this patch is just for consistency.
>>>> So in other words stuff for -next and not -rc1.
>>>>
>>>
>>> No: the patch Roland posted fixes the problems with non-POSIX-compliant
>>> awk (e.g. mawk), but it in turn requires this patch in order to not
>>> break on non-C-locale-default systems (which are most systems nowadays.)
>>
>> Right, gawk requires locale setting.
>>
>> btw, something went wrong with it...
>>
>> when I ran with LC_ALL=C, it shows message correctly.
>>
>> Maybe, "LC_ALL=" line broke it?
>>
>
> What happens if you remove that part?
Commented out LC_ALL= as below;
@@ -17,7 +17,7 @@ NAME = Man-Eating Seals of Antiquity
MAKEFLAGS += -rR --no-print-directory
# Avoid funny character set dependencies
-LC_ALL=
+#LC_ALL=
LC_CTYPE=C
LC_COLLATE=C
LC_NUMERIC=C
And I've gotten;
---
make O=/home/mhiramat/kbin/2.6.32 all
Using /home/mhiramat/ksrc/linux-2.6-tip as source for kernel
GEN /home/mhiramat/kbin/2.6.32/Makefile
CHK include/linux/version.h
CHK include/linux/utsrelease.h
SYMLINK include/asm -> include/asm-x86
CALL /home/mhiramat/ksrc/linux-2.6-tip/scripts/checksyscalls.sh
CHK include/linux/compile.h
CHK include/linux/version.h
make[3]: `scripts/unifdef' is up to date.
TEST posttest
Succeed: decoded and checked 887431 instructions
Kernel: arch/x86/boot/bzImage is ready (#19)
Building modules, stage 2.
MODPOST 434 modules
WARNING: modpost: Found 2 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'
---
It seems to be fixed. :-)
Thank you,
>
> -hpa
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Roland Dreier wrote:
>
> > btw, something went wrong with it...
>
> > make[3]: `scripts/unifdef' ???????
>
> > when I ran with LC_ALL=C, it shows message correctly.
>
> > make[3]: `scripts/unifdef' is up to date.
>
> Do you have LANG and/or LC_MESSAGES set? LC_ALL will override all other
> locale settings, so hpa explicitly unset it so that the other LC_xxx
> variables will be used. But it means your environment's LC_MESSAGES (or
> LANG if LC_MESSAGES is not set) will be used.
Actually, I haven't touched it (I just installed Fedora11).
And I think most of users don't set it too.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
On 12/17/2009 03:41 PM, Masami Hiramatsu wrote:
>>
>> What happens if you remove that part?
>
> Commented out LC_ALL= as below;
>
OK, what about the "unexport" variant?
-hpa
On 12/17/2009 03:44 PM, Masami Hiramatsu wrote:
> Roland Dreier wrote:
>>
>> > btw, something went wrong with it...
>>
>> > make[3]: `scripts/unifdef' ???????
>>
>> > when I ran with LC_ALL=C, it shows message correctly.
>>
>> > make[3]: `scripts/unifdef' is up to date.
>>
>> Do you have LANG and/or LC_MESSAGES set? LC_ALL will override all other
>> locale settings, so hpa explicitly unset it so that the other LC_xxx
>> variables will be used. But it means your environment's LC_MESSAGES (or
>> LANG if LC_MESSAGES is not set) will be used.
>
> Actually, I haven't touched it (I just installed Fedora11).
> And I think most of users don't set it too.
>
Fedora will typically set LANG. Do:
printenv | egrep '^(LANG|LC_)'
... to verify.
-hpa
H. Peter Anvin wrote:
> On 12/17/2009 03:41 PM, Masami Hiramatsu wrote:
>>>
>>> What happens if you remove that part?
>>
>> Commented out LC_ALL= as below;
>>
>
> OK, what about the "unexport" variant?
Sure, unexport variant fixed it too.
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
H. Peter Anvin wrote:
> On 12/17/2009 03:44 PM, Masami Hiramatsu wrote:
>> Roland Dreier wrote:
>>>
>>> > btw, something went wrong with it...
>>>
>>> > make[3]: `scripts/unifdef' ???????
>>>
>>> > when I ran with LC_ALL=C, it shows message correctly.
>>>
>>> > make[3]: `scripts/unifdef' is up to date.
>>>
>>> Do you have LANG and/or LC_MESSAGES set? LC_ALL will override all other
>>> locale settings, so hpa explicitly unset it so that the other LC_xxx
>>> variables will be used. But it means your environment's LC_MESSAGES (or
>>> LANG if LC_MESSAGES is not set) will be used.
>>
>> Actually, I haven't touched it (I just installed Fedora11).
>> And I think most of users don't set it too.
>>
>
> Fedora will typically set LANG. Do:
>
> printenv | egrep '^(LANG|LC_)'
>
> ... to verify.
Ah, yes. it sets the LANG.
$ env | grep LANG
LANG=ja_JP.UTF-8
Thanks,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]
Commit-ID: 06b5dc646b9479b786d77749936f25910cd82a37
Gitweb: http://git.kernel.org/tip/06b5dc646b9479b786d77749936f25910cd82a37
Author: H. Peter Anvin <[email protected]>
AuthorDate: Thu, 17 Dec 2009 15:51:37 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 17 Dec 2009 15:51:37 -0800
Makefile: Unexport LC_ALL instead of clearing it
Apparently not all versions of glibc and utilities treat an empty
LC_ALL as nonexistent, causing error messages to be garbled. Instead,
explicitly unexport it from the environment.
Reported-and-tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Roland Dreier <[email protected]>
Cc: Sam Ravnborg <[email protected]>
---
Makefile | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 6e39af1..5e92ca5 100644
--- a/Makefile
+++ b/Makefile
@@ -17,11 +17,11 @@ NAME = Man-Eating Seals of Antiquity
MAKEFLAGS += -rR --no-print-directory
# Avoid funny character set dependencies
-LC_ALL=
+unexport LC_ALL
LC_CTYPE=C
LC_COLLATE=C
LC_NUMERIC=C
-export LC_ALL LC_CTYPE LC_COLLATE LC_NUMERIC
+export LC_CTYPE LC_COLLATE LC_NUMERIC
# We are using a recursive build, so we need to do a little thinking
# to get the ordering right.