2009-12-24 11:13:09

by Simon Horman

Subject: [patch] Makefile: Unexport LANG

The recent changes to setting and unexporting various LC_ variables
produce a problem on my system (Debian sid).

$ locale
LANG=ja_JP.utf8
LANGUAGE=ja_JP.utf8
LC_CTYPE="ja_JP.utf8"
LC_NUMERIC="ja_JP.utf8"
LC_TIME="ja_JP.utf8"
LC_COLLATE="ja_JP.utf8"
LC_MONETARY="ja_JP.utf8"
LC_MESSAGES="ja_JP.utf8"
LC_PAPER="ja_JP.utf8"
LC_NAME="ja_JP.utf8"
LC_ADDRESS="ja_JP.utf8"
LC_TELEPHONE="ja_JP.utf8"
LC_MEASUREMENT="ja_JP.utf8"
LC_IDENTIFICATION="ja_JP.utf8"
LC_ALL=ja_JP.utf8

Without this patch:
$ make
make[2]: ??: ?? make ? -jN ?????????: jobserver ??????????.
make[2]: ??: ?? make ? -jN ?????????: jobserver ??????????.

With this patch:
$ make
...
make[2]: warning: -jN forced in submake: disabling jobserver mode.
make[2]: warning: -jN forced in submake: disabling jobserver mode.
...

Cc: H. Peter Anvin <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Roland Dreier <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Simon Horman <[email protected]>

Index: linux-2.6/Makefile
===================================================================
--- linux-2.6.orig/Makefile 2009-12-24 22:09:29.000000000 +1100
+++ linux-2.6/Makefile 2009-12-24 22:10:58.000000000 +1100
@@ -17,6 +17,7 @@ NAME = Man-Eating Seals of Antiquity
 MAKEFLAGS += -rR --no-print-directory

 # Avoid funny character set dependencies
+unexport LANG
 unexport LC_ALL
 LC_CTYPE=C
 LC_COLLATE=C


2009-12-26 04:31:07

by Masami Hiramatsu

Subject: Re: [patch] Makefile: Unexport LANG

Simon Horman wrote:
> The recent changes to setting and unexport various LC_ variables
> produces a problem on my system (Debian sid).
>
> $ locale
> LANG=ja_JP.utf8
> LANGUAGE=ja_JP.utf8
> LC_CTYPE="ja_JP.utf8"
> LC_NUMERIC="ja_JP.utf8"
> LC_TIME="ja_JP.utf8"
> LC_COLLATE="ja_JP.utf8"
> LC_MONETARY="ja_JP.utf8"
> LC_MESSAGES="ja_JP.utf8"
> LC_PAPER="ja_JP.utf8"
> LC_NAME="ja_JP.utf8"
> LC_ADDRESS="ja_JP.utf8"
> LC_TELEPHONE="ja_JP.utf8"
> LC_MEASUREMENT="ja_JP.utf8"
> LC_IDENTIFICATION="ja_JP.utf8"
> LC_ALL=ja_JP.utf8
>
> Without this patch:
> $ make
> make[2]: ??: ?? make ? -jN ?????????: jobserver ??????????.
> make[2]: ??: ?? make ? -jN ?????????: jobserver ??????????.
>
> With this patch:
> $ make
> ...
> make[2]: warning: -jN forced in submake: disabling jobserver mode.
> make[2]: warning: -jN forced in submake: disabling jobserver mode.
> ...
>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Michal Marek <[email protected]>
> Cc: Roland Dreier <[email protected]>
> Cc: Sam Ravnborg <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Signed-off-by: Simon Horman <[email protected]>

Tested on Fedora 11 too, and it works well. Thank you!

Tested-by: Masami Hiramatsu <[email protected]>

>
> Index: linux-2.6/Makefile
> ===================================================================
> --- linux-2.6.orig/Makefile 2009-12-24 22:09:29.000000000 +1100
> +++ linux-2.6/Makefile 2009-12-24 22:10:58.000000000 +1100
> @@ -17,6 +17,7 @@ NAME = Man-Eating Seals of Antiquity
> MAKEFLAGS += -rR --no-print-directory
>
> # Avoid funny character set dependencies
> +unexport LANG
> unexport LC_ALL
> LC_CTYPE=C
> LC_COLLATE=C

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-12-26 05:15:18

by H. Peter Anvin

Subject: Re: [patch] Makefile: Unexport LANG

On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
>>
>> # Avoid funny character set dependencies
>> +unexport LANG
>> unexport LC_ALL
>> LC_CTYPE=C
>> LC_COLLATE=C
>

At this point, it seems to me that we should just set LC_ALL=C and be done
with it (see other thread.)

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-12-26 11:20:13

by Simon Horman

Subject: Re: [patch] Makefile: Unexport LANG

On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
> On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
> >>
> >> # Avoid funny character set dependencies
> >> +unexport LANG
> >> unexport LC_ALL
> >> LC_CTYPE=C
> >> LC_COLLATE=C
> >
>
> At this point, it seems to me that we should just LC_ALL=C and be done
> with it (see other thread.)

Sure, that would also work for the case that I'm seeing.

I tested the following:

# Avoid funny character set dependencies
LC_ALL=C
export LC_ALL

Though personally I would advocate tweaking the locale as needed closer
to awk scripts and the like, rather than the high-level general change that
was made. Fall-out from a high-level change seems inevitable to me.

2010-01-08 00:41:24

by Simon Horman

Subject: Re: [patch] Makefile: Unexport LANG

On Sat, Dec 26, 2009 at 10:20:07PM +1100, Simon Horman wrote:
> On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
> > On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
> > >>
> > >> # Avoid funny character set dependencies
> > >> +unexport LANG
> > >> unexport LC_ALL
> > >> LC_CTYPE=C
> > >> LC_COLLATE=C
> > >
> >
> > At this point, it seems to me that we should just LC_ALL=C and be done
> > with it (see other thread.)
>
> Sure, that would also work for the case that I'm seeing.
>
> I tested the following:
>
> # Avoid funny character set dependencies
> LC_ALL=C
> export LC_ALL
>
> Though personally I would advocate tweaking the locale as needed closer
> to awk scripts and the like, rather than the high-level general change that
> was made. Fall-out from a high-level change seems inevitable to me.

This seems to still be broken. Can we decide on a solution?

2010-01-08 00:44:20

by H. Peter Anvin

Subject: Re: [patch] Makefile: Unexport LANG

On 01/07/2010 04:41 PM, Simon Horman wrote:
> On Sat, Dec 26, 2009 at 10:20:07PM +1100, Simon Horman wrote:
>> On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
>>> On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
>>>>>
>>>>> # Avoid funny character set dependencies
>>>>> +unexport LANG
>>>>> unexport LC_ALL
>>>>> LC_CTYPE=C
>>>>> LC_COLLATE=C
>>>>
>>>
>>> At this point, it seems to me that we should just LC_ALL=C and be done
>>> with it (see other thread.)
>>
>> Sure, that would also work for the case that I'm seeing.
>>
>> I tested the following:
>>
>> # Avoid funny character set dependencies
>> LC_ALL=C
>> export LC_ALL
>>
>> Though personally I would advocate tweaking the locale as needed closer
>> to awk scripts and the like, rather than the high-level general change that
>> was made. Fall-out from a high-level change seems inevitable to me.
>
> This seems to still be broken. Can we decide on a solution?
>

I think it's up to Michal to pick the preferred solution.

It has been pointed out that one option might also be to *not*
override LC_CTYPE, and only override LC_COLLATE.

-hpa

2010-01-08 02:45:59

by Simon Horman

Subject: Re: [patch] Makefile: Unexport LANG

On Thu, Jan 07, 2010 at 04:43:55PM -0800, H. Peter Anvin wrote:
> On 01/07/2010 04:41 PM, Simon Horman wrote:
> > On Sat, Dec 26, 2009 at 10:20:07PM +1100, Simon Horman wrote:
> >> On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
> >>> On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
> >>>>>
> >>>>> # Avoid funny character set dependencies
> >>>>> +unexport LANG
> >>>>> unexport LC_ALL
> >>>>> LC_CTYPE=C
> >>>>> LC_COLLATE=C
> >>>>
> >>>
> >>> At this point, it seems to me that we should just LC_ALL=C and be done
> >>> with it (see other thread.)
> >>
> >> Sure, that would also work for the case that I'm seeing.
> >>
> >> I tested the following:
> >>
> >> # Avoid funny character set dependencies
> >> LC_ALL=C
> >> export LC_ALL
> >>
> >> Though personally I would advocate tweaking the locale as needed closer
> >> to awk scripts and the like, rather than the high-level general change that
> >> was made. Fall-out from a high-level change seems inevitable to me.
> >
> > This seems to still be broken. Can we decide on a solution?
> >
>
> I think it's up to Michal to pick the preferred solution.
>
> It has been pointed out that one option might also to be to *not*
> override LC_CTYPE, and only override LC_COLLATE.

I've confirmed that both of the following give sane make output for me.
They are also better than my earlier suggestion in that error messages
follow the otherwise prevailing locale rather than suddenly switching
to English.

# Avoid funny character set dependencies
unexport LC_ALL
LC_COLLATE=C
export LC_NUMERIC

# Avoid funny character set dependencies
unexport LC_ALL
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC

I did not verify that they do something sensible for the awk concern
that originally introduced the locale change - but I think it is
unaffected by my locale settings.

2010-01-08 02:59:28

by Simon Horman

Subject: Re: [patch] Makefile: Unexport LANG

On Fri, Jan 08, 2010 at 01:45:56PM +1100, Simon Horman wrote:
> On Thu, Jan 07, 2010 at 04:43:55PM -0800, H. Peter Anvin wrote:
> > On 01/07/2010 04:41 PM, Simon Horman wrote:
> > > On Sat, Dec 26, 2009 at 10:20:07PM +1100, Simon Horman wrote:
> > >> On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
> > >>> On 12/25/2009 08:30 PM, Masami Hiramatsu wrote:
> > >>>>>
> > >>>>> # Avoid funny character set dependencies
> > >>>>> +unexport LANG
> > >>>>> unexport LC_ALL
> > >>>>> LC_CTYPE=C
> > >>>>> LC_COLLATE=C
> > >>>>
> > >>>
> > >>> At this point, it seems to me that we should just LC_ALL=C and be done
> > >>> with it (see other thread.)
> > >>
> > >> Sure, that would also work for the case that I'm seeing.
> > >>
> > >> I tested the following:
> > >>
> > >> # Avoid funny character set dependencies
> > >> LC_ALL=C
> > >> export LC_ALL
> > >>
> > >> Though personally I would advocate tweaking the locale as needed closer
> > >> to awk scripts and the like, rather than the high-level general change that
> > >> was made. Fall-out from a high-level change seems inevitable to me.
> > >
> > > This seems to still be broken. Can we decide on a solution?
> > >
> >
> > I think it's up to Michal to pick the preferred solution.

Is it just me or is Michal's email bouncing of late?

2010-01-08 11:57:49

by Michal Marek

Subject: Re: [patch] Makefile: Unexport LANG

On Thu, Jan 07, 2010 at 04:43:55PM -0800, H. Peter Anvin wrote:
> On 01/07/2010 04:41 PM, Simon Horman wrote:
> > On Sat, Dec 26, 2009 at 10:20:07PM +1100, Simon Horman wrote:
> >> On Fri, Dec 25, 2009 at 09:14:40PM -0800, H. Peter Anvin wrote:
> I think it's up to Michal to pick the preferred solution.
>
> It has been pointed out that one option might also to be to *not*
> override LC_CTYPE, and only override LC_COLLATE.

Yes, that's imo a good compromise. As I noted in another thread, the
only drawback of not setting LC_CTYPE is that it makes the behavior of
awk's tolower()/toupper() volatile, but that seems to be only used in a
single script. I'll post a patch in a separate email.


On Fri, Jan 08, 2010 at 01:59:24PM +1100, Simon Horman wrote:
> Is it just me or is Michal's email bouncing of late?

My email address was wrong, sues.cz does not exist, suse.cz does. Which
is why I overlooked this thread.

Michal

2010-01-08 12:16:51

by Michal Marek

Subject: [PATCH] Makefile: do not override LC_CTYPE

Setting LC_CTYPE=C breaks localized messages in some setups. With only
LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for
character classes and tolower()/toupper(), which remain locale-dependent.
The former is not a big issue, because we can assume that e.g. [:alpha:]
will always include a-zA-Z and we only ever process ASCII input. The
latter seems to only affect arch/sh/tools/gen-mach-types, which we can
handle separately.

So after this patch the meaning of ranges like [a-z], the behavior of
sort and join, etc. should be the same everywhere and at the same time
gcc should be able to print localized warning and error messages.
LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.

Reported-by: Simon Horman <[email protected]>
Reported-by: Sergei Trofimovich <[email protected]>
Signed-off-by: Michal Marek <[email protected]>
---

Note: if this still breaks for someone, we will simply set LC_ALL=C.

 Makefile               |    3 +--
 arch/sh/tools/Makefile |    2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 09a320f..a7b4351 100644
--- a/Makefile
+++ b/Makefile
@@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory

 # Avoid funny character set dependencies
 unexport LC_ALL
-LC_CTYPE=C
 LC_COLLATE=C
 LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+export LC_COLLATE LC_NUMERIC

 # We are using a recursive build, so we need to do a little thinking
 # to get the ordering right.
diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
index 558a56b..2082af1 100644
--- a/arch/sh/tools/Makefile
+++ b/arch/sh/tools/Makefile
@@ -13,4 +13,4 @@
 include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
 	@echo ' Generating $@'
 	$(Q)mkdir -p $(dir $@)
-	$(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
+	$(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
--
1.6.5.3
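
The arch/sh hunk above pins the locale for a single command instead of for the whole build. A minimal sketch of that per-command pattern follows, assuming a POSIX shell and a standard sort; the helper name sort_c is hypothetical and not part of the kernel tree.

```shell
#!/bin/sh
# Sketch only: sort_c is a hypothetical helper, not part of the kernel build.
# Prefixing the command with LC_ALL=C changes the locale for that one
# invocation and leaves the caller's environment untouched.
sort_c() {
    LC_ALL=C sort
}

# In the C locale, sort compares bytes, so uppercase ASCII letters come
# before lowercase ones.
printf 'b\nA\na\nB\n' | sort_c
```

With this approach the user's LANG and LC_MESSAGES still apply to every other tool in the build, so compiler diagnostics stay localized.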

2010-01-08 18:50:59

by H. Peter Anvin

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

On 01/08/2010 04:16 AM, Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
> so defined character classes and tolower()/toupper(). The former is not
> a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems
> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere and at the same time
> gcc should be able to print localized waring and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <[email protected]>
> Reported-by: Sergei Trofimovich <[email protected]>
> Signed-off-by: Michal Marek <[email protected]>

For what it's worth:

Acked-by: H. Peter Anvin <[email protected]>

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-01-09 00:00:13

by Simon Horman

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

Hi Michal,

sorry for messing up your email address in one of the previous threads.

On Fri, Jan 08, 2010 at 01:16:28PM +0100, Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
> so defined character classes and tolower()/toupper(). The former is not
> a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems
> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere and at the same time
> gcc should be able to print localized waring and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <[email protected]>
> Reported-by: Sergei Trofimovich <[email protected]>
> Signed-off-by: Michal Marek <[email protected]>

Tested-by: Simon Horman <[email protected]>

> ---
>
> Note: if this still breaks for someone, we will simply set LC_ALL=C.

Personally I think it would be much better to set the locale explicitly
as needed, where needed, such as the LC_ALL=C sledgehammer that you
have inserted into arch/sh/tools. Or at a slightly higher level,
offer an awk-wrapper, as it seems to be the main (only?) cause of concern.

Surely the goal isn't to alter the user-experience - to the extent that a
build has a user-experience - but to force some tools to behave as desired.

Just an opinion. The patch below seems to work fine for me.

>
> Makefile | 3 +--
> arch/sh/tools/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 09a320f..a7b4351 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory
>
> # Avoid funny character set dependencies
> unexport LC_ALL
> -LC_CTYPE=C
> LC_COLLATE=C
> LC_NUMERIC=C
> -export LC_CTYPE LC_COLLATE LC_NUMERIC
> +export LC_COLLATE LC_NUMERIC
>
> # We are using a recursive build, so we need to do a little thinking
> # to get the ordering right.
> diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
> index 558a56b..2082af1 100644
> --- a/arch/sh/tools/Makefile
> +++ b/arch/sh/tools/Makefile
> @@ -13,4 +13,4 @@
> include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
> @echo ' Generating $@'
> $(Q)mkdir -p $(dir $@)
> - $(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> + $(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> --
> 1.6.5.3

2010-01-09 00:08:14

by H. Peter Anvin

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

On 01/08/2010 04:00 PM, Simon Horman wrote:
>
> Personally I think it would be much better to set the locale explicitly
> as needed, where needed, such as the LC_ALL=C sledgehammer that you
> have inserted into arch/sh/tools. Or at a slightly higher level,
> offer an awk-wrapper, as it seems to be the main (only?) cause of concern.
>

awk, sed, shell scripts, etc. all have the same problem.

-hpa

2010-01-09 00:10:34

by Masami Hiramatsu

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

Hi Michal,

Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
> so defined character classes and tolower()/toupper(). The former is not
> a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems
> only affect arch/sh/tools/gen-mach-types, which we can handle separately.

Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
Could you also wrap it?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2010-01-09 00:16:38

by H. Peter Anvin

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
> Hi Michal,
>
> Michal Marek wrote:
>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
>> so defined character classes and tolower()/toupper(). The former is not
>> a big issue, because we can assume that e.g. [:alpha:] will always
>> include a-zA-Z and we only ever process ASCII input. The latter seems
>> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>
> Hmm, this also affects arch/x/tools/gen-insn-attr-x86.awk.
> Could you also wrap it?
>

This is tolower/toupper()? Do there exist locales where tolower/toupper
on ASCII input do weird things, or are we merely hypothesizing?

-hpa


2010-01-09 00:30:32

by Masami Hiramatsu

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
>>> so defined character classes and tolower()/toupper(). The former is not
>>> a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems
>>> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?

Doesn't it affect [A-Z] or [a-z]? If not, the patch looks good to me too.

Thank you,
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2010-01-09 00:44:35

by H. Peter Anvin

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

> H. Peter Anvin wrote:
>> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>>> Hi Michal,
>>>
>>> Michal Marek wrote:
>>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for
>>>> not
>>>> so defined character classes and tolower()/toupper(). The former is
>>>> not
>>>> a big issue, because we can assume that e.g. [:alpha:] will always
>>>> include a-zA-Z and we only ever process ASCII input. The latter seems
>>>> only affect arch/sh/tools/gen-mach-types, which we can handle
>>>> separately.
>>>
>>> Hmm, this also affects arch/x/tools/gen-insn-attr-x86.awk.
>>> Could you also wrap it?
>>>
>>
>> This is tolower/toupper()? Do there exist locales where tolower/toupper
>> on ASCII input do weird things, or are we merely hypothesizing?
>
> Isn't it affect [A-Z] or [a-z]? If not, the patch good to me too.
>

[A-Z][a-z] is what LC_COLLATE is about.

-hpa

2010-01-09 00:54:04

by Masami Hiramatsu

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
>>> so defined character classes and tolower()/toupper(). The former is not
>>> a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems
>>> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?

Ah, sorry, I was just hypothesizing.
---
#!/bin/sh
# en_US locale sorts alphabets as AaBb...
LANG=en_US
LC_ALL=
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC
awk 'BEGIN{if (match("C","[a-z]")) {print "NG"} else {print "OK"} exit;}'
---
this returns "OK". So, the patch is OK for me too.

Thanks,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2010-01-09 01:08:22

by Masami Hiramatsu

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
> so defined character classes and tolower()/toupper(). The former is not
> a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems
> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere and at the same time
> gcc should be able to print localized waring and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <[email protected]>
> Reported-by: Sergei Trofimovich <[email protected]>
> Signed-off-by: Michal Marek <[email protected]>

I checked that this change doesn't affect arch/x86/tools/gen-insn-attr-x86.awk.

Tested-by: Masami Hiramatsu <[email protected]>

Thank you!


> ---
>
> Note: if this still breaks for someone, we will simply set LC_ALL=C.
>
> Makefile | 3 +--
> arch/sh/tools/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 09a320f..a7b4351 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory
>
> # Avoid funny character set dependencies
> unexport LC_ALL
> -LC_CTYPE=C
> LC_COLLATE=C
> LC_NUMERIC=C
> -export LC_CTYPE LC_COLLATE LC_NUMERIC
> +export LC_COLLATE LC_NUMERIC
>
> # We are using a recursive build, so we need to do a little thinking
> # to get the ordering right.
> diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
> index 558a56b..2082af1 100644
> --- a/arch/sh/tools/Makefile
> +++ b/arch/sh/tools/Makefile
> @@ -13,4 +13,4 @@
> include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
> @echo ' Generating $@'
> $(Q)mkdir -p $(dir $@)
> - $(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> + $(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2010-01-11 09:53:03

by Michal Marek

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

On 9.1.2010 01:16, H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
>>> so defined character classes and tolower()/toupper(). The former is not
>>> a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems
>>> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?

In Turkish, uppercase i is İ (I with dot) and lowercase I is ı (i
without dot), see http://en.wikipedia.org/wiki/Dotted_and_dotless_I.

Michal
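
One way to sidestep the dotted/dotless I problem is to force the C locale for just the awk call that does case conversion. The sketch below assumes a POSIX awk on the PATH; the helper name upper_c is hypothetical, and the commented-out Turkish comparison assumes the tr_TR.UTF-8 locale is installed.

```shell
#!/bin/sh
# Sketch only: upper_c is a hypothetical helper name.
# With LC_ALL=C, awk's toupper() maps ASCII a-z to A-Z no matter what
# locale the user has configured.
upper_c() {
    LC_ALL=C awk -v s="$1" 'BEGIN { print toupper(s) }'
}

upper_c "machine"    # prints MACHINE

# For comparison, the same conversion under a Turkish locale. This assumes
# tr_TR.UTF-8 is installed, and the result depends on the awk implementation:
# LC_ALL=tr_TR.UTF-8 awk 'BEGIN { print toupper("machine") }'
```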

2010-01-11 10:49:18

by Alan

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?

Turkish is the famous one for this and usually causes
internationalisation chaos. So yes, they exist, and there are worse, more
esoteric cases. There are good reasons sed and friends support classes as
well as old C locale style ranges.

Alan
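
As a rough illustration of the character-class point, the sketch below uses a POSIX bracket class instead of a collation-dependent range such as [a-z]. The helper name all_lower is hypothetical, and the inputs are plain ASCII.

```shell
#!/bin/sh
# Sketch only: all_lower is a hypothetical helper name.
# [[:lower:]] is resolved through the character-class tables (LC_CTYPE),
# not through collation order, so it does not depend on how the locale
# sorts a-z.
all_lower() {
    printf '%s\n' "$1" | grep -q '^[[:lower:]][[:lower:]]*$'
}

all_lower "machtypes" && echo "lowercase"
all_lower "MachTypes" || echo "mixed case"
```

Classes still follow LC_CTYPE, so they do not by themselves fix the Turkish case-conversion issue; they only remove the dependency on collation order.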

2010-01-12 00:51:17

by H. Peter Anvin

Subject: Re: [PATCH] Makefile: do not override LC_CTYPE

On 01/11/2010 02:52 AM, Alan Cox wrote:
>> This is tolower/toupper()? Do there exist locales where tolower/toupper
>> on ASCII input do weird things, or are we merely hypothesizing?
>
> Turkish is the famous one for this and usually causes
> internationalisation chaos. So yes they exist, and there are worse more
> esoteric cases. There are good reasons sed and friends support classes as
> well as old C locale style ranges.
>

Ah yes, forgot about Turkish. Apparently Lithuanian and Azeri also have
special rules for the letters I and J. Sigh.

-hpa