2017-04-01 09:42:36

by Adam Borowski

Subject: [GIT PULL] runes

Hi Linus!
Please pull from

https://github.com/kilobyte/linux.git runes

It contains fixes for OLCUC handling, expanding its functionality to what I
presume was your intention in implementing OLCUC in 1991 but couldn't be
done then.

The highlight is a fix for a bug that bisects all the way to 0.01, although
it wasn't as visible then as shells didn't output ANSI codes during normal
operation.

The rest adds proper support for the Great Runes; enabled if iutf8 is set
(default on modern terminals).


Without these fixes, even basic shell output is mangled:
================================================
kilobyte@andunie:~$ stty olcuc
[01;32MKILOBYTE@ANDUNIE[00M:[01;34M~[00M$ LS
[01;34MGOATPORN[0M
================================================

With the fixes, you can get the old-style ASCII OLCUC behaviour with stty
-iutf8; neither ANSI codes nor Unicode will be mangled anymore. But that's
quite useless -- I wonder what your reason was for implementing OLCUC back
in the day, even though it was decades obsolete even then and had already
been dropped from the relevant standards. Thus, I guess you want proper
rune support, which this pull request adds. I've chosen Elder Futhark as
the variant to implement.

You might want to consider dropping patch 3; it is required only for full
coverage of the target set (Elder Futhark) but not for regular ASCII
mapping, and it adds some code complexity.

Required userspace support is present in default GUI installs of all
distributions I checked, both on old-style terminals that use server-side X
fonts and with client-side {true,open}type -- but not on console unless
someone adds the required characters to the charset, which is tricky because
of 256/512 glyph limitations. Thus, runes support currently works only on
graphical terminals.


Meow! -- sorry, ᛗᛖᛟᚹ!

Commits up to 14f34ba1d8748f252f941b5bb87efd7b1ed55868 on top of
c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201.

----------------------------------------------------------------
Adam Borowski (4):
n_tty: don't mangle tty codes in OLCUC mode
n_tty: use runes rather than uppercase in IUTF8 OLCUC mode
n_tty: support th, ae and ng runes
n_tty: wrap all OLCUC code in a config option

drivers/tty/Kconfig | 17 ++++++++
drivers/tty/n_tty.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 120 insertions(+), 12 deletions(-)

--
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!



2017-04-01 09:43:42

by Adam Borowski

Subject: [PATCH 1/4] n_tty: don't mangle tty codes in OLCUC mode

Any terminal younger than ~40 years requires lowercase commands, thus they
need to be exempt from uppercasing. And Linux doesn't really support
ancient all-uppercase terminals anyway (no XCASE, etc) even if someone made
the effort to somehow connect them electrically.

This bug is reproducible as of Linux 0.11, and I see it in the 0.01 sources
(whose images fail to boot for me, though I didn't try very hard). It was
less of a failure then, as the shell didn't produce tty codes during normal
operation.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/n_tty.c | 58 ++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bdf0e6e89991..bbc9f07c19fa 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -87,6 +87,8 @@
# define n_tty_trace(f, args...)
#endif

+enum { ESnormal, ESesc, EScsi, ESsetG };
+
struct n_tty_data {
/* producer-published */
size_t read_head;
@@ -121,6 +123,7 @@ struct n_tty_data {
unsigned int column;
unsigned int canon_column;
size_t echo_tail;
+ unsigned int vt_state;

struct mutex atomic_read_lock;
struct mutex output_lock;
@@ -392,6 +395,40 @@ static inline int is_continuation(unsigned char c, struct tty_struct *tty)
return I_IUTF8(tty) && is_utf8_continuation(c);
}

+/* process one OLCUC char (possibly partial unicode)
+ *
+ * We need to partially parse ANSI sequences to avoid uppercasing them;
+ * only some commands require lowercase.
+ */
+
+static int do_olcuc_char(unsigned char c, struct tty_struct *tty)
+{
+ struct n_tty_data *ldata = tty->disc_data;
+
+ switch (ldata->vt_state) {
+ case ESesc:
+ ldata->vt_state = (c == '[') ? EScsi :
+ strchr("%()*+-./", c) ? ESsetG : ESnormal;
+ break;
+ case EScsi:
+ if (!strchr("?;0123456789>!c$\" \\", c))
+ ldata->vt_state = ESnormal;
+ break;
+ case ESsetG:
+ ldata->vt_state = ESnormal;
+ break;
+ default:
+ if (c == '\e')
+ ldata->vt_state = ESesc;
+ else if (c >= 'a' && c <= 'z')
+ c -= 32;
+ }
+ if (!iscntrl(c) && !is_continuation(c, tty))
+ ldata->column++;
+ tty_put_char(tty, c);
+ return 1;
+}
+
/**
* do_output_char - output one character
* @c: character (or partial unicode symbol)
@@ -462,12 +499,10 @@ static int do_output_char(unsigned char c, struct tty_struct *tty, int space)
ldata->column--;
break;
default:
- if (!iscntrl(c)) {
- if (O_OLCUC(tty))
- c = toupper(c);
- if (!is_continuation(c, tty))
- ldata->column++;
- }
+ if (O_OLCUC(tty))
+ return do_olcuc_char(c, tty);
+ if (!iscntrl(c) && !is_continuation(c, tty))
+ ldata->column++;
break;
}

@@ -568,12 +603,10 @@ static ssize_t process_output_block(struct tty_struct *tty,
ldata->column--;
break;
default:
- if (!iscntrl(c)) {
- if (O_OLCUC(tty))
- goto break_out;
- if (!is_continuation(c, tty))
- ldata->column++;
- }
+ if (O_OLCUC(tty))
+ goto break_out;
+ if (!iscntrl(c) && !is_continuation(c, tty))
+ ldata->column++;
break;
}
}
@@ -1895,6 +1928,7 @@ static int n_tty_open(struct tty_struct *tty)
ldata->num_overrun = 0;
ldata->no_room = 0;
ldata->lnext = 0;
+ ldata->vt_state = ESnormal;
tty->closing = 0;
/* indicate buffer work may resume */
clear_bit(TTY_LDISC_HALTED, &tty->flags);
--
2.11.0
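
For readers who want to try the escape-sequence handling outside the kernel, here is a minimal userspace sketch of the same state machine. It mirrors the switch in do_olcuc_char() but leaves out column tracking and tty output. The names olcuc_filter_byte() and olcuc_filter() are hypothetical and not part of the patch, and the explicit NUL guards around strchr() are an addition of this sketch.

```c
#include <string.h>

/* Parser states, named after the enum added by the patch. */
enum vt_state { ESnormal, ESesc, EScsi, ESsetG };

/*
 * Uppercase one byte unless it belongs to an escape sequence.
 * Hypothetical helper: follows the switch in do_olcuc_char(), minus
 * column tracking and tty output.  The c != '\0' guards are added here
 * because strchr() also matches the string terminator.
 */
static unsigned char olcuc_filter_byte(enum vt_state *st, unsigned char c)
{
	switch (*st) {
	case ESesc:
		if (c == '[')
			*st = EScsi;
		else if (c != '\0' && strchr("%()*+-./", c))
			*st = ESsetG;
		else
			*st = ESnormal;
		break;
	case EScsi:
		/* Bytes in this set keep us inside the CSI sequence. */
		if (c == '\0' || !strchr("?;0123456789>!c$\" \\", c))
			*st = ESnormal;
		break;
	case ESsetG:
		/* One charset designator byte, passed through untouched. */
		*st = ESnormal;
		break;
	default:
		if (c == 0x1b)
			*st = ESesc;
		else if (c >= 'a' && c <= 'z')
			c -= 32;
	}
	return c;
}

/* Filter a NUL-terminated string in place (hypothetical wrapper). */
static void olcuc_filter(char *s)
{
	enum vt_state st = ESnormal;

	for (; *s; s++)
		*s = (char)olcuc_filter_byte(&st, (unsigned char)*s);
}
```

Running olcuc_filter() over a colour prompt uppercases the visible text while the final 'm' of each SGR sequence stays lowercase.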

2017-04-01 09:43:44

by Adam Borowski

Subject: [PATCH 2/4] n_tty: use runes rather than uppercase in IUTF8 OLCUC mode

It is puzzling why OLCUC support was added to Linux, given that it had been
obsolete for a long, long time before Linux was born. The only explanation
I see, since all-caps output was often called "Great Runes" (see the Jargon
File), is that the intention was to support runes, which wasn't possible
before Unicode.

As the kernel's tty discipline knows about UTF-8, that's now possible.

Standards compliance:
Elder Futhark (2nd to 8th centuries), other than:
'y' 'c' are from Anglo-Saxon runes (5th to 11th centuries), 'c' should use
kauna rather than cen but as the former is shared with 'k', I sacrificed
period accuracy for usability.
'q' 'v' 'x' are from Medieval runes (12th to 15th centuries). 'x' could
use Anglo-Saxon eolhx but that's the same glyph (and Unicode codepoint) as
Elder Futhark algiz 'z'.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/n_tty.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index bbc9f07c19fa..c36b9114f76b 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -401,8 +401,10 @@ static inline int is_continuation(unsigned char c, struct tty_struct *tty)
* only some commands require lowercase.
*/

-static int do_olcuc_char(unsigned char c, struct tty_struct *tty)
+static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
{
+ /* 3 bytes per character */
+ static const char *runes = "ᚨᛒᚳᛞᛖᚠᚷᚺᛁᛃᚲᛚᛗᚾᛟᛈᛩᚱᛊᛏᚢᚡᚹᛪᚣᛉ";
struct n_tty_data *ldata = tty->disc_data;

switch (ldata->vt_state) {
@@ -418,10 +420,24 @@ static int do_olcuc_char(unsigned char c, struct tty_struct *tty)
ldata->vt_state = ESnormal;
break;
default:
- if (c == '\e')
+ if (c == '\e') {
ldata->vt_state = ESesc;
- else if (c >= 'a' && c <= 'z')
+ break;
+ }
+ if (c >= 'a' && c <= 'z')
c -= 32;
+ if (I_IUTF8(tty)) {
+ if (c >= 'A' && c <= 'Z') {
+ if (space < 3)
+ return -1;
+ ldata->column++;
+ c -= 'A';
+ tty_put_char(tty, runes[3 * c + 0]);
+ tty_put_char(tty, runes[3 * c + 1]);
+ tty_put_char(tty, runes[3 * c + 2]);
+ return 1;
+ }
+ }
}
if (!iscntrl(c) && !is_continuation(c, tty))
ldata->column++;
@@ -500,7 +516,7 @@ static int do_output_char(unsigned char c, struct tty_struct *tty, int space)
break;
default:
if (O_OLCUC(tty))
- return do_olcuc_char(c, tty);
+ return do_olcuc_char(c, tty, space);
if (!iscntrl(c) && !is_continuation(c, tty))
ldata->column++;
break;
--
2.11.0
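
The table lookup works because every rune in the string takes exactly 3 bytes in UTF-8, so letter N starts at byte offset 3 * N. Below is a minimal userspace sketch of that indexing, assuming the source file is saved as UTF-8. The function to_runes() and its buffer handling are hypothetical. It refuses to write when space runs out, in the same spirit as the patch's "space < 3" check.

```c
#include <stddef.h>
#include <string.h>

/* Same table as the patch: 26 runes, each 3 bytes long in UTF-8. */
static const char runes[] = "ᚨᛒᚳᛞᛖᚠᚷᚺᛁᛃᚲᛚᛗᚾᛟᛈᛩᚱᛊᛏᚢᚡᚹᛪᚣᛉ";

/*
 * Convert ASCII letters in `in` to runes, copying other bytes as-is.
 * Hypothetical helper.  Returns the number of bytes written (excluding
 * the terminating NUL), or -1 if `out` is too small.
 */
static int to_runes(const char *in, char *out, size_t space)
{
	size_t used = 0;

	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;

		if (c >= 'a' && c <= 'z')
			c -= 32;
		if (c >= 'A' && c <= 'Z') {
			if (space - used < 4)	/* 3 rune bytes + NUL */
				return -1;
			memcpy(out + used, runes + 3 * (c - 'A'), 3);
			used += 3;
		} else {
			if (space - used < 2)	/* 1 byte + NUL */
				return -1;
			out[used++] = (char)c;
		}
	}
	if (space - used < 1)
		return -1;
	out[used] = '\0';
	return (int)used;
}
```

Each rune still counts as a single column, which is why the patch increments ldata->column once per letter rather than once per byte.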

2017-04-01 09:43:48

by Adam Borowski

Subject: [PATCH 4/4] n_tty: wrap all OLCUC code in a config option

Setting it to N, besides dropping runes, also disables old-style support
for all-caps OLCUC. To get those 40-year-old terminals to work, set
CONFIG_TTY_RUNES=y, which will DTRT when stty iutf8 is off.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/Kconfig | 17 +++++++++++++++++
drivers/tty/n_tty.c | 10 ++++++++++
2 files changed, 27 insertions(+)

diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 95103054c0e4..cfeed2b196e5 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -151,6 +151,23 @@ config LEGACY_PTY_COUNT
When not in use, each legacy PTY occupies 12 bytes on 32-bit
architectures and 24 bytes on 64-bit architectures.

+config TTY_RUNES
+ bool "Runes on OLCUC"
+ default y
+ ---help---
+ Certain terminals from the days of Unix' infancy supported only
+ all-caps, so-called "Great Runes". Linux still has rudimentary
+ support for those, although how you can connect one is another
+ matter; you enable that via "stty olcuc".
+
+ On anything modern, though (as in "stty iutf8"), you do get actual
+ runes. You need userspace fonts for that; modern distributions
+ tend to install both raster (xterm/rxvt) and TrueType (most
+ terminals) fonts on their default GUI setups. Alas, this is not
+ the case on console due to sharply limited character sets.
+
+ C'mon, you know you want to say Y!
+
config BFIN_JTAG_COMM
tristate "Blackfin JTAG Communication"
depends on BLACKFIN
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 3b8b745eb7cf..14d090576f09 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -123,7 +123,9 @@ struct n_tty_data {
unsigned int column;
unsigned int canon_column;
size_t echo_tail;
+#ifdef CONFIG_TTY_RUNES
unsigned int vt_state;
+#endif

struct mutex atomic_read_lock;
struct mutex output_lock;
@@ -395,6 +397,7 @@ static inline int is_continuation(unsigned char c, struct tty_struct *tty)
return I_IUTF8(tty) && is_utf8_continuation(c);
}

+#ifdef CONFIG_TTY_RUNES
/* process one OLCUC char (possibly partial unicode)
*
* We need to partially parse ANSI sequences to avoid uppercasing them;
@@ -475,6 +478,7 @@ static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
tty_put_char(tty, c);
return 1;
}
+#endif

/**
* do_output_char - output one character
@@ -546,8 +550,10 @@ static int do_output_char(unsigned char c, struct tty_struct *tty, int space)
ldata->column--;
break;
default:
+#ifdef CONFIG_TTY_RUNES
if (O_OLCUC(tty))
return do_olcuc_char(c, tty, space);
+#endif
if (!iscntrl(c) && !is_continuation(c, tty))
ldata->column++;
break;
@@ -650,8 +656,10 @@ static ssize_t process_output_block(struct tty_struct *tty,
ldata->column--;
break;
default:
+#ifdef CONFIG_TTY_RUNES
if (O_OLCUC(tty))
goto break_out;
+#endif
if (!iscntrl(c) && !is_continuation(c, tty))
ldata->column++;
break;
@@ -1975,7 +1983,9 @@ static int n_tty_open(struct tty_struct *tty)
ldata->num_overrun = 0;
ldata->no_room = 0;
ldata->lnext = 0;
+#ifdef CONFIG_TTY_RUNES
ldata->vt_state = ESnormal;
+#endif
tty->closing = 0;
/* indicate buffer work may resume */
clear_bit(TTY_LDISC_HALTED, &tty->flags);
--
2.11.0

2017-04-01 09:44:14

by Adam Borowski

Subject: [PATCH 3/4] n_tty: support th, ae and ng runes

'th' in particular is a prominent letter in Elder Futhark, far more
widespread than 't'. It survived in English as 'þ' until a combination of
disdain from Latin-educated scribes and printing presses imported from
Germany that lacked this letter wiped it out.

Alas, we need to maintain a 1:1 relationship to keep alignment, thus you
need to write 'þ', 'æ' or 'ŋ' (or uppercase). Unless you're Icelandic,
it's easiest to use the Compose key.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/n_tty.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c36b9114f76b..3b8b745eb7cf 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -87,7 +87,7 @@
# define n_tty_trace(f, args...)
#endif

-enum { ESnormal, ESesc, EScsi, ESsetG };
+enum { ESnormal, ESesc, EScsi, ESsetG, ESxc3, ESxc5 };

struct n_tty_data {
/* producer-published */
@@ -404,7 +404,7 @@ static inline int is_continuation(unsigned char c, struct tty_struct *tty)
static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
{
/* 3 bytes per character */
- static const char *runes = "ᚨᛒᚳᛞᛖᚠᚷᚺᛁᛃᚲᛚᛗᚾᛟᛈᛩᚱᛊᛏᚢᚡᚹᛪᚣᛉ";
+ static const char *runes = "ᚨᛒᚳᛞᛖᚠᚷᚺᛁᛃᚲᛚᛗᚾᛟᛈᛩᚱᛊᛏᚢᚡᚹᛪᚣᛉᚦᛇᛜ";
struct n_tty_data *ldata = tty->disc_data;

switch (ldata->vt_state) {
@@ -419,6 +419,31 @@ static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
case ESsetG:
ldata->vt_state = ESnormal;
break;
+ case ESxc3:
+ if (c == 0xbe || c == 0x9e) { /* th */
+ c = 'A' + 26;
+ goto print_rune;
+ } else if (c == 0xa6 || c == 0x86) { /* ae */
+ c = 'A' + 27;
+ goto print_rune;
+ }
+ if (space < 2)
+ return -1;
+ tty_put_char(tty, 0xc3); /* no match, print the stolen prefix */
+ ldata->vt_state = ESnormal;
+ ldata->column++;
+ break;
+ case ESxc5:
+ if (c == 0x8b || c == 0x8a) { /* ng */
+ c = 'A' + 28;
+ goto print_rune;
+ }
+ if (space < 2)
+ return -1;
+ tty_put_char(tty, 0xc5); /* no match, print the stolen prefix */
+ ldata->vt_state = ESnormal;
+ ldata->column++;
+ break;
default:
if (c == '\e') {
ldata->vt_state = ESesc;
@@ -428,8 +453,10 @@ static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
c -= 32;
if (I_IUTF8(tty)) {
if (c >= 'A' && c <= 'Z') {
+print_rune:
if (space < 3)
return -1;
+ ldata->vt_state = ESnormal;
ldata->column++;
c -= 'A';
tty_put_char(tty, runes[3 * c + 0]);
@@ -437,6 +464,10 @@ static int do_olcuc_char(unsigned char c, struct tty_struct *tty, int space)
tty_put_char(tty, runes[3 * c + 2]);
return 1;
}
+ if (c == 0xc3)
+ return ldata->vt_state = ESxc3, 1;
+ if (c == 0xc5)
+ return ldata->vt_state = ESxc5, 1;
}
}
if (!iscntrl(c) && !is_continuation(c, tty))
--
2.11.0
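
The byte values in the ESxc3 and ESxc5 cases are the second bytes of the two-byte UTF-8 encodings of þ/Þ (U+00FE/U+00DE), æ/Æ (U+00E6/U+00C6) and ŋ/Ŋ (U+014B/U+014A). As a rough illustration, the matching can be written as a standalone function. The name extra_rune_index() is hypothetical, and the returned values are the positions of the three extra runes in the extended table.

```c
/*
 * Map a two-byte UTF-8 sequence to an index into the extended rune
 * table: 26 = th, 27 = ae, 28 = ng.  Returns -1 for anything else.
 * Hypothetical helper; the patch does the same matching inline with
 * the ESxc3/ESxc5 states.
 */
static int extra_rune_index(unsigned char lead, unsigned char cont)
{
	if (lead == 0xc3) {
		if (cont == 0xbe || cont == 0x9e)	/* U+00FE, U+00DE */
			return 26;
		if (cont == 0xa6 || cont == 0x86)	/* U+00E6, U+00C6 */
			return 27;
	} else if (lead == 0xc5) {
		if (cont == 0x8b || cont == 0x8a)	/* U+014B, U+014A */
			return 28;
	}
	return -1;
}
```

The kernel code cannot look ahead, so it holds back the 0xc3 or 0xc5 lead byte in vt_state and re-emits it when the following byte does not match.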

2017-04-01 18:09:43

by Linus Torvalds

Subject: Re: [GIT PULL] runes

On Sat, Apr 1, 2017 at 2:42 AM, Adam Borowski <[email protected]> wrote:
>
> It contains fixes for OLCUC handling, expanding its functionality to what I
> presume was your intention in implementing OLCUC in 1991 but couldn't be
> done then.

Yes, sadly, back then I was basically limited to Latin1, which is
obviously useless.

> With the fixes, you can get the old-style ASCII OLCUC behaviour with stty
> -iutf8; neither ANSI codes nor Unicode will be mangled anymore. But that's
> quite useless -- I wonder what your reason was for implementing OLCUC back
> in the day, even though it was decades obsolete even then and had already
> been dropped from the relevant standards. Thus, I guess you want proper
> rune support, which this pull request adds. I've chosen Elder Futhark as
> the variant to implement.

The question that obviously springs to mind is why you didn't make
this the default state of a tty? As it is, it seems that people have
to do extra work (and know about this feature) to get the Great Runes
enabled.

That seems user-hostile and counter-productive?

Also, sadly, gmail seems hostile to this whole thing, and marked your
email as spam. Don't ask me why.

ᛚᛁᚾᚢᛊ