2007-04-17 17:25:49

by John Anthony Kazos Jr.

Subject: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

From: John Anthony Kazos Jr. <[email protected]>

Convert the subdirectory "crypto" to UTF-8. The files changed are
<crypto/fcrypt.c> and <crypto/api.c>.

Signed-off-by: John Anthony Kazos Jr. <[email protected]>

---

I can't get my mail client to send in ISO-8859-1 instead of UTF-8, so the
actual patch is attached in octet-stream format. The below patch is just
for reference and will almost certainly NOT work.

Also, since the patch includes both encodings, whichever encoding you
select while viewing it will display one correct and one garbled line.
However, the bytes -are- correct.

diff -uprN linux-2.6.21-rc7.orig/crypto/api.c linux-2.6.21-rc7.mod/crypto/api.c
--- linux-2.6.21-rc7.orig/crypto/api.c 2007-04-17 11:41:50.000000000 -0400
+++ linux-2.6.21-rc7.mod/crypto/api.c 2007-04-17 13:21:52.000000000 -0400
@@ -6,7 +6,7 @@
* Copyright (c) 2005 Herbert Xu <[email protected]>
*
* Portions derived from Cryptoapi, by Alexander Kjeldaas <[email protected]>
- * and Nettle, by Niels M?ller.
+ * and Nettle, by Niels Möller.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free
diff -uprN linux-2.6.21-rc7.orig/crypto/fcrypt.c linux-2.6.21-rc7.mod/crypto/fcrypt.c
--- linux-2.6.21-rc7.orig/crypto/fcrypt.c 2007-04-17 11:41:50.000000000 -0400
+++ linux-2.6.21-rc7.mod/crypto/fcrypt.c 2007-04-17 13:21:52.000000000 -0400
@@ -10,7 +10,7 @@
*
* Based on code:
*
- * Copyright (c) 1995 - 2000 Kungliga Tekniska H?gskolan
+ * Copyright (c) 1995 - 2000 Kungliga Tekniska Högskolan
* (Royal Institute of Technology, Stockholm, Sweden).
* All rights reserved.
*


Attachments:
patch-crypto.bin (1.08 kB)

2007-04-18 09:33:29

by Stefan Richter

Subject: Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

John Anthony Kazos Jr. wrote:
> Convert the subdirectory "crypto" to UTF-8. The files changed are
> <crypto/fcrypt.c> and <crypto/api.c>.

Aren't we using ASCII for C sources?
--
Stefan Richter
-=====-=-=== -=-- =--=-
http://arcgraph.de/sr/

2007-04-18 09:45:25

by Alan Cox

Subject: Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

On Wed, 18 Apr 2007 11:33:29 +0200
Stefan Richter <[email protected]> wrote:

> John Anthony Kazos Jr. wrote:
> > Convert the subdirectory "crypto" to UTF-8. The files changed are
> > <crypto/fcrypt.c> and <crypto/api.c>.
>
> Aren't we using ASCII for C sources?

We haven't done that for many years - it makes it hard to spell some
authors' names properly, for one.

2007-05-02 04:44:09

by Herbert Xu

Subject: Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

On Tue, Apr 17, 2007 at 01:25:49PM -0400, John Anthony Kazos Jr. wrote:
> From: John Anthony Kazos Jr. <[email protected]>
>
> Convert the subdirectory "crypto" to UTF-8. The files changed are
> <crypto/fcrypt.c> and <crypto/api.c>.
>
> Signed-off-by: John Anthony Kazos Jr. <[email protected]>

Thanks. Could you fix up include/linux/crypto.h as well?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-05-02 09:58:14

by John Anthony Kazos Jr.

Subject: Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

> > Convert the subdirectory "crypto" to UTF-8. The files changed are
> > <crypto/fcrypt.c> and <crypto/api.c>.
> >
> > Signed-off-by: John Anthony Kazos Jr. <[email protected]>
>
> Thanks. Could you fix up include/linux/crypto.h as well?

Sure, will do. Since I've gotten almost no feedback about these patches
whatsoever, I figured either I was doing it wrong or nobody cared. I guess
I'll finish them all up and resubmit them and hope they stick. Gimme a
few days: There's a huge number of files throughout the tree to change,
and some of them (like the .map files) require non-trivial alterations.

(Were you able to successfully apply the patch attached in octet-stream
encoding?)

2007-05-02 10:01:02

by Herbert Xu

Subject: Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

John Anthony Kazos Jr. <[email protected]> wrote:
>
> (Were you able to successfully apply the patch attached in octet-stream
> encoding?)

Yes it worked fine.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-05-03 11:19:08

by John Anthony Kazos Jr.

Subject: [PATCH] crypto: convert crypto.h to UTF-8

From: John Anthony Kazos Jr. <[email protected]>

Convert the encoding of <include/linux/crypto.h> from ISO-8859-1 to UTF-8.

Signed-off-by: John Anthony Kazos Jr. <[email protected]>

---

Did this file individually, per request. Will re-do the whole tree later
as I'm still working on my handy-dandy testing and patching tools and
don't have a lot of time outside of work until the summer gets underway.

The patch works in ISO-8859-1 encoding. If I am unsuccessful in getting
this mail to send as that, or if your viewer reinterprets it from that,
you'll need to use the file attached as MIME type octet-stream. Thanks for
your time and interest.

--- linux-2.6.21-rc7-git4.orig/include/linux/crypto.h 2007-04-20 20:22:13.000000000 -0400
+++ linux-2.6.21-rc7-git4.mod/include/linux/crypto.h 2007-05-03 07:09:47.000000000 -0400
@@ -6,7 +6,7 @@
* Copyright (c) 2005 Herbert Xu <[email protected]>
*
* Portions derived from Cryptoapi, by Alexander Kjeldaas <[email protected]>
- * and Nettle, by Niels M?ller.
+ * and Nettle, by Niels Möller.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the Free


Attachments:
include-linux-crypto-h-to-utf8.patch.bin (565.00 B)

2007-05-03 12:13:36

by Arne Georg Gleditsch

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8

"John Anthony Kazos Jr." <[email protected]> writes:
> Did this file individually, per request. Will re-do the whole tree later
> as I'm still working on my handy-dandy testing and patching tools and
> don't have a lot of time outside of work until the summer gets underway.

While this is probably inevitable, it would be nice if there was some
way to determine the actual coding system used for individual files.
Especially if we're mixing latin1 and utf8 in the same tree. Has
something like adding "/* -*- coding: utf-8; -*- */" or similar to the
top of converted files been considered?

--
Arne.
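
An explicit coding comment like the one suggested above could be read mechanically. The following is a minimal Python sketch, not part of the original thread. The function name `declared_encoding` and the choice to look only at the first two lines are assumptions made for illustration; the comment format is the Emacs-style one quoted in the message.

```python
import re
from typing import Optional

# Matches an Emacs-style file-variable comment such as
#   /* -*- coding: utf-8; -*- */
# and captures the encoding name.
_CODING_RE = re.compile(r"-\*-.*?\bcoding:\s*([-\w.]+).*?-\*-")


def declared_encoding(text: str) -> Optional[str]:
    """Return the encoding named in a coding comment, or None if absent.

    Only the first two lines are inspected (an assumption of this sketch).
    """
    for line in text.splitlines()[:2]:
        match = _CODING_RE.search(line)
        if match:
            return match.group(1).lower()
    return None


header = "/* -*- coding: utf-8; -*- */\n#include <linux/crypto.h>\n"
print(declared_encoding(header))                 # utf-8
print(declared_encoding("/* no marker here */"))  # None
```

A viewer could fall back to its own heuristics whenever the function returns None.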

2007-05-03 13:27:43

by John Anthony Kazos Jr.

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8

> > Did this file individually, per request. Will re-do the whole tree later
> > as I'm still working on my handy-dandy testing and patching tools and
> > don't have a lot of time outside of work until the summer gets underway.
>
> While this is probably inevitable, it would be nice if there was some
> way to determine the actual coding system used for individual files.
> Especially if we're mixing latin1 and utf8 in the same tree. Has
> something like adding "/* -*- coding: utf-8; -*- */" or similar to the
> top of converted files been considered?

There's no reason for any non-UTF-8 to be in the tree at all, so
eventually it won't be a problem. I'm (slowly but surely) working on
converting everything in the tree. GCC handles UTF-8 just fine, and all
non-stupid/non-broken distributions of GNU/Linux and other major Un*ces
should be based on (or at least compatible with) UTF-8 in basic
operations. Files like the keymaps will be more work to convert, but they
can be as well.

I'm operating on the assumption that anything in the tree that isn't UTF-8
is ISO-8859-1. Of course, I'm also checking it by hand to make sure a
small-O-with-umlaut doesn't become the Klingon logo...

Besides, based on the actual binary representation of UTF-8, it's
extremely unlikely for any ISO-8859-1 string to be detected as UTF-8. VIm
already does this: UTF-8 it handles natively, but open up one of these
unpatched files in VIm and you'll see "[converted]" at the bottom of your
screen. Should happen if you open the attached .patch.bin file in VIm.
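
The detection argument can be illustrated with a small Python sketch (not from the original thread; `looks_like_utf8` is a hypothetical helper name). It relies only on strict UTF-8 decoding: a Latin-1 byte such as 0xF6 (small o with umlaut) followed by an ASCII letter is not a valid UTF-8 sequence, so the decode fails.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "\u00f6" is the small o with umlaut in "Möller".
line = " * and Nettle, by Niels M\u00f6ller."
latin1_bytes = line.encode("iso-8859-1")  # contains the single byte 0xF6
utf8_bytes = line.encode("utf-8")         # contains the pair 0xC3 0xB6

print(looks_like_utf8(latin1_bytes))  # False
print(looks_like_utf8(utf8_bytes))    # True
```

Pure ASCII input passes the check too, which is harmless here because ASCII is encoded identically in both encodings.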

2007-05-03 13:52:32

by Arne Georg Gleditsch

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8

"John Anthony Kazos Jr." <[email protected]> writes:
> Besides, based on the actual binary representation of UTF-8, it's
> extremely unlikely for any ISO-8859-1 string to be detected as UTF-8. VIm
> already does this: UTF-8 it handles natively, but open up one of these
> unpatched files in VIm and you'll see "[converted]" at the bottom of your
> screen. Should happen if you open the attached .patch.bin file in VIm.

Yes, I agree that heuristics can be used to determine the coding
system used with a high degree of probability. I'm just suggesting we
make the coding system explicit, in order to spare other applications
that do visual presentation of Linux source code having to perform
their own heuristics.

But hey, if I'm the only one wanting to see this particular bike shed
painted blue...

--
Arne.

2007-05-03 14:39:00

by John Anthony Kazos Jr.

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8

> > Besides, based on the actual binary representation of UTF-8, it's
> > extremely unlikely for any ISO-8859-1 string to be detected as UTF-8. VIm
> > already does this: UTF-8 it handles natively, but open up one of these
> > unpatched files in VIm and you'll see "[converted]" at the bottom of your
> > screen. Should happen if you open the attached .patch.bin file in VIm.
>
> Yes, I agree that heuristics can be used to determine the coding
> system used with a high degree of probability. I'm just suggesting we
> make the coding system explicit, in order to spare other applications
> that do visual presentation of Linux source code having to perform
> their own heuristics.
>
> But hey, if I'm the only one wanting to see this particular bike shed
> painted blue...

In any other case, I would agree completely, but like I said, the entire
tree will be one encoding anyway, which will be UTF-8. Many tools would
already have these heuristics, and even if they don't, the code itself
will be perfectly legible no matter which encoding is used to view it, and
only some things like circuit diagrams and people's names may be slightly
mangled. Any byte which is 0x7F or less is identical in both encodings, so
all normal ASCII is unchanged.
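
The claim about bytes 0x7F and below can be checked directly. This is a minimal Python sketch, not part of the original thread; `latin1_to_utf8` is a hypothetical helper, and it assumes the input really is ISO-8859-1.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Every byte from 0x00 to 0x7F survives the conversion unchanged.
ascii_bytes = bytes(range(0x80))
print(latin1_to_utf8(ascii_bytes) == ascii_bytes)  # True

# Only bytes above 0x7F change: 0xF6 becomes the two bytes 0xC3 0xB6.
print(latin1_to_utf8(b"H\xf6gskolan"))  # b'H\xc3\xb6gskolan'
```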

It's an issue only in transition. No new non-UTF-8 patches should be
accepted, of course. And I won't be doing any changes that break anything
other than a live person reading it and wondering why it looks weird if
they don't have the right tools.

A line in <Documentation/CodingStyle> explicitly requiring UTF-8 sounds
like a good idea. I shall submit one later if I remember. Many mail
clients do this conversion automatically, which causes me headaches and
requires me to attach the patches separately. Quite annoying really. I'm
exploring the possibility of hacking together a simple SMTP client if
that'll help me with these fool binary patches.

Or maybe setting up a little mini git repository with UTF-8 changes is a
good idea? Hmm, I'll research that, and try it. Then I could just cc it to
trivial and the list and they can just pull it in one go. One more week
until summer vacation! I shall have time to do this then.

2007-05-03 15:28:01

by Jan Engelhardt

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8


On May 3 2007 09:21, John Anthony Kazos Jr. wrote:
>
>There's no reason for any non-UTF-8 to be in the tree at all, so
>eventually it won't be a problem. I'm (slowly but surely) working on
>converting everything in the tree. GCC handles UTF-8 just fine, and all

In fact, GCC doesn't give a crap about comments :)
and otherwise sees things as octets, not characters.
I think GCJ is the only one that really pays attention to encoding.

>non-stupid/non-broken distributions of GNU/Linux and other major Un*ces
>should be based on (or at least compatible with) UTF-8 in basic
>operations. Files like the keymaps will be more work to convert, but they
>can be as well.
>
>I'm operating on the assumption that anything in the tree that isn't UTF-8
>is ISO-8859-1. Of course, I'm also checking it by hand to make sure a
>small-O-with-umlaut doesn't become the Klingon logo...

This is probably all you'll ever see:
http://lkml.org/lkml/2007/1/8/222


Jan
--

2007-05-03 15:41:39

by John Anthony Kazos Jr.

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8

> >There's no reason for any non-UTF-8 to be in the tree at all, so
> >eventually it won't be a problem. I'm (slowly but surely) working on
> >converting everything in the tree. GCC handles UTF-8 just fine, and all
>
> In fact, GCC doesn't give a crap about comments :)
> and otherwise sees things as octets, not characters.
> I think GCJ is the only one that really pays attention to encoding.

The changes to the keymap files and so forth aren't quite as simple
though. Have to convert the latin1 characters within character-constant
expressions to \0 notation, while the comments can be left as utf8.
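
A rough Python sketch of that escaping step follows (not from the original thread). It assumes "\0 notation" means C octal escapes of exactly three digits, and `escape_high_bytes` is a hypothetical helper. It should only be applied to the text of character constants, never to comments.

```python
def escape_high_bytes(raw: bytes) -> str:
    """Replace every byte above 0x7F with a three-digit C octal escape."""
    parts = []
    for byte in raw:
        if byte >= 0x80:
            parts.append("\\%03o" % byte)  # e.g. 0xF6 -> \366
        else:
            parts.append(chr(byte))        # ASCII passes through untouched
    return "".join(parts)


# A Latin-1 character constant holding a small o with umlaut (0xF6).
print(escape_high_bytes(b"'\xf6'"))  # '\366'
print(escape_high_bytes(b"'a'"))     # 'a'
```

Three digits is the maximum length of a C octal escape, so a following digit in the source cannot be absorbed into the escape.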

> >non-stupid/non-broken distributions of GNU/Linux and other major Un*ces
> >should be based on (or at least compatible with) UTF-8 in basic
> >operations. Files like the keymaps will be more work to convert, but they
> >can be as well.
> >
> >I'm operating on the assumption that anything in the tree that isn't UTF-8
> >is ISO-8859-1. Of course, I'm also checking it by hand to make sure a
> >small-O-with-umlaut doesn't become the Klingon logo...
>
> This is probably all you'll ever see:
> http://lkml.org/lkml/2007/1/8/222

Does this mean you're doing it and I'll be ignored, or that few people
care and I'll be ignored? I figure if I just repost my patches to LKML
once per month, they'll eventually get merged (or at least I'll get
comments on how people actually want them). Things are tough on a
high-volume list. I think the git method may have the best chance of
success. We'll see.

2007-05-03 18:37:53

by Jan Engelhardt

Subject: Re: [PATCH] crypto: convert crypto.h to UTF-8


On May 3 2007 11:35, John Anthony Kazos Jr. wrote:
>
>> >non-stupid/non-broken distributions of GNU/Linux and other major Un*ces
>> >should be based on (or at least compatible with) UTF-8 in basic
>> >operations. Files like the keymaps will be more work to convert, but they
>> >can be as well.
>> >
>> >I'm operating on the assumption that anything in the tree that isn't UTF-8
>> >is ISO-8859-1. Of course, I'm also checking it by hand to make sure a
>> >small-O-with-umlaut doesn't become the Klingon logo...
>>
>> This is probably all you'll ever see:
>> http://lkml.org/lkml/2007/1/8/222
>
>Does this mean you're doing it and I'll be ignored, or that few people
>care and I'll be ignored?

Nah. I did a walkthrough once, and my discoveries were that
iso-8859-{2 .. 14} was a real minority if not nonexistent,
leaving you with almost obvious choices to guess what a file's
encoding is. If a name looks good, it must be UTF8 already.
Else try ISO-8859-1. If it still looks odd -- perhaps because
it's a weirdo character like "1/2" or it "does not sound right",
try cp437. etc.
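
The guessing order described above could be sketched in Python roughly as follows (not from the original thread; `candidate_decodings` is a hypothetical helper). Since ISO-8859-1 and cp437 both assign a character to every byte value, decoding never fails for them, so the sketch only produces candidates and leaves the "does it look right" judgment to a person.

```python
from typing import List, Tuple


def candidate_decodings(raw: bytes) -> List[Tuple[str, str]]:
    """Return (encoding, text) candidates in the order a human should check."""
    try:
        # Valid UTF-8 is taken at face value.
        return [("utf-8", raw.decode("utf-8"))]
    except UnicodeDecodeError:
        pass
    # Otherwise offer ISO-8859-1 first, then cp437, for visual inspection.
    return [(enc, raw.decode(enc)) for enc in ("iso-8859-1", "cp437")]


# 0x94 is a control character in ISO-8859-1 but a small o with umlaut
# in cp437, so only the second candidate reads as a real word.
for encoding, text in candidate_decodings(b"H\x94gskolan"):
    print(encoding, repr(text))
```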

> I figure if I just repost my patches to LKML
>once per month, they'll eventually get merged (or at least I'll get
>comments on how people actually want them). Things are tough on a
>high-volume list. I think the git method may have the best chance of
>success. We'll see.
>

Jan
--