2008-03-23 15:21:30

by John T.

Subject: UTF-8 and Alt key in the console

Hello,

It is understood that although the Meta-key sequences
work in an xterm with vim on UTF-8, they don't on the
linux console.

That's because vim and xterm have an understanding about
how to function in UTF-8 regarding the Meta key. Xterm
translates the would-be ISO-8859 high-bit-char to its
UTF-8 representation, and vim catches that. This is the
way to move the traditional 8th-bit Meta convention from
single-byte encodings to UTF-8.

The linux console could function that way too, so that the
Meta-key would be recognized for those not willing to make
Meta send an ESC prefix; this behavior can be toggled with
the setmetamode command. It seems to be quite a simple code
change.

I'd like to know whether it would be an accepted change.

Regards,
--
John




2008-03-23 15:29:20

by Jan Engelhardt

Subject: Re: UTF-8 and Alt key in the console


On Sunday 2008-03-23 16:15, John T. wrote:
>
> It is understood that although the Meta-key sequences
> work in an xterm with vim on UTF-8, they don't on the
> linux console.

They also seem to work on the console; I can use Alt-L in
mcedit to jump to a line.

2008-03-23 15:46:37

by John T.

Subject: Re: UTF-8 and Alt key in the console


--- Jan Engelhardt wrote:

>
> On Sunday 2008-03-23 16:15, John T. wrote:
> >
> > It is understood that although the Meta-key sequences
> > work in an xterm with vim on UTF-8, they don't on the
> > linux console.
>
> They also seem to work on the console; I can use Alt-L in
> mcedit to jump to a line.
>

That's because you are working in "meta sends ESC" mode.
Although this is OK for most applications, for some it isn't.
Thus there have always been two modes, "meta sends ESC"
and "meta sets 8th bit". (toggled with setmetamode on the
console)

Vim relies on "meta sets 8th bit". Unfortunately the code
for this option does not work in UTF-8 in the console. What
I'd like to do is make this a viable option in UTF-8.
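
The two modes mentioned above, plus the xterm-style UTF-8 variant, can be sketched as follows. This is only an illustration in Python, not console or xterm code; the function names are made up for the example and the key is assumed to be a single ASCII character.

```python
ESC = 0x1B

def meta_esc_prefix(key: str) -> bytes:
    """'Meta sends ESC': the key is preceded by an ESC byte."""
    return bytes([ESC]) + key.encode("ascii")

def meta_8bit(key: str) -> bytes:
    """'Meta sets 8th bit' on a single-byte (e.g. ISO-8859-1) terminal."""
    return bytes([ord(key) | 0x80])

def meta_8bit_utf8(key: str) -> bytes:
    """UTF-8 variant: the high-bit code point is sent as its UTF-8 encoding."""
    return chr(ord(key) | 0x80).encode("utf-8")

for k in "lL":
    print(k, meta_esc_prefix(k), meta_8bit(k), meta_8bit_utf8(k))
```

In UTF-8 mode a lone byte such as 0xEC is not a valid character, which is why the third form sends two bytes instead.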



2008-03-23 16:54:56

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

John T. wrote:
>
> That's because you are working in "meta sends ESC" mode.
> Although this is OK for most applications, for some it isn't.
> Thus there have always been two modes, "meta sends ESC"
> and "meta sets 8th bit". (toggled with setmetamode on the
> console)
>
> Vim relies on "meta sets 8th bit". Unfortunately the code
> for this option does not work in UTF-8 in the console. What
> I'd like to do is make this a viable option in UTF-8.
>

No, fix vim instead.

"Meta sets 8th bit" is so obviously and totally broken, since it maps
onto real characters, and has been doing so for at least 20 years.
Meta-L maps onto LATIN CAPITAL LETTER I WITH GRAVE, both in 8-bit mode
and in your proposed UTF-8 mode. It just becomes even more obvious how
unbelievably broken it is when you try to map it onto UTF-8.
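
The collision can be seen from the receiving side with a small sketch. This is an illustration only; `interpretations` is a hypothetical helper, and it assumes the receiver gets exactly one UTF-8 encoded character.

```python
import unicodedata

def interpretations(data: bytes) -> list:
    """List what one UTF-8 encoded character could mean to a receiver."""
    ch = data.decode("utf-8")
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    meanings = [unicodedata.name(ch, "UNNAMED")]
    if 0x80 <= ord(ch) <= 0xFF:
        # Under "meta sets 8th bit" the same code point is also
        # Meta plus the character with the 8th bit cleared.
        meanings.append("Meta-" + chr(ord(ch) & 0x7F))
    return meanings

print(interpretations(b"\xc3\x8c"))
# ['LATIN CAPITAL LETTER I WITH GRAVE', 'Meta-L']
```

Nothing in the byte stream says which of the two the user meant.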

Seriously, fix the crap.

-hpa

2008-03-23 17:47:54

by John T.

Subject: Re: UTF-8 and Alt key in the console


--- "H. Peter Anvin" wrote:

> John T. wrote:
> >
> > That's because you are working in "meta sends ESC" mode.
> > Although this is OK for most applications, for some it isn't.
> > Thus there have always been two modes, "meta sends ESC"
> > and "meta sets 8th bit". (toggled with setmetamode on the
> > console)
> >
> > Vim relies on "meta sets 8th bit". Unfortunately the code
> > for this option does not work in UTF-8 in the console. What
> > I'd like to do is make this a viable option in UTF-8.
> >
>
> No, fix vim instead.
>
> "Meta sets 8th bit" is so obviously and totally broken, since it maps
> onto real characters, and has been doing so for at least 20 years.
> Meta-L maps onto LATIN CAPITAL LETTER I WITH GRAVE, both in 8-bit mode
> and in your proposed UTF-8 mode. It just becomes even more obvious how
> unbelievably broken it is when you try to map it onto UTF-8.
>
> Seriously, fix the crap.
>
> -hpa
>

OK, let's see if I can answer this.

Vi has 32 years of ESC key use tradition which doesn't play
well with "meta sends ESC".

Even though "meta sets 8th bit" is "broken" from your point of view,
that didn't stop it from being used all these years. The fact
that it maps into real characters is not a problem if you can just
use a CTRL-V equivalent in bash or vim.

Furthermore, it is an _option_. No one is obliged to use it.
So it's a question of:

.. _forcing_ the end of "meta sets 8th bit"
.. leaving things the way they are, and have them keep working,
as xterm did.

So I guess we should fix xterm too?

I think you're exaggerating.



2008-03-23 17:55:20

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

John T. wrote:
>
> OK, let's see if I can answer this.
>
> Vi has 32 years of ESC key use tradition which doesn't play
> well with "meta sends ESC".
>
> Even though "meta sets 8th bit" is "broken" in your point-of-view,
> that didn't stop it from being used all these years. The fact
> that it maps into real characters is not a problem if you can just
> use a CTRL-V equivalent in bash or vim.
>
> Furthermore, it is an _option_. No one is obliged to use it.
> So it's a question of:
>
> .. _forcing_ the end of "meta sets 8th bit"
> .. leaving things the way they are, and have them keep working,
> as xterm did.
>
> So I guess we should fix xterm too?
>
> I think you're exaggerating.
>

Hardly. vim clearly can deal with the ESC-is-prefix issue anyway, since
otherwise it wouldn't be able to use arrow keys.

That being said, quite frankly, *both* Meta key conventions are
incredibly broken.

What I would much prefer to see is a brand new convention where
different keys (Ctrl, Meta, Super, Hyper, Alt or even in some cases
Shift) issue a unique prefix which doesn't conflict with anything else.
Emacs has tried to promote such a convention of the format
<CAN> @ <bucky> <keystroke> which is a lot better, although it's a bit
Emacs-centric (using <CAN> / ^X as the initial character is not really a
very good choice.)
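
A rough sketch of what such a prefix encoding looks like on the wire. The modifier letters are assumed here to follow Emacs's `C-x @` bindings (S, a, c, h, m, s); the function name is hypothetical and nothing in this example comes from Emacs's own code.

```python
CAN = 0x18  # ^X

# Assumed bucky letters, following Emacs's "C-x @ <letter>" bindings.
BUCKY = {
    "shift": "S",
    "alt": "a",
    "control": "c",
    "hyper": "h",
    "meta": "m",
    "super": "s",
}

def encode_modified_key(modifier: str, key: str) -> bytes:
    """Encode <CAN> @ <bucky> <keystroke> for one modifier and one key."""
    return bytes([CAN]) + b"@" + BUCKY[modifier].encode("ascii") + key.encode("utf-8")

print(encode_modified_key("meta", "l"))   # b'\x18@ml'
```

The keystroke itself travels unmodified, so it never collides with a real character the way the 8th-bit form does.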

The best probably would be to introduce an escape code, along the lines
of other escape codes in the terminal interface.

-hpa

2008-03-23 18:14:09

by John T.

Subject: Re: UTF-8 and Alt key in the console


--- "H. Peter Anvin" wrote:

> John T. wrote:
> >
> > OK, let's see if I can answer this.
> >
> > Vi has 32 years of ESC key use tradition which doesn't play
> > well with "meta sends ESC".
> >
> > Even though "meta sets 8th bit" is "broken" in your point-of-view,
> > that didn't stop it from being used all these years. The fact
> > that it maps into real characters is not a problem if you can just
> > use a CTRL-V equivalent in bash or vim.
> >
> > Furthermore, it is an _option_. No one is obliged to use it.
> > So it's a question of:
> >
> > .. _forcing_ the end of "meta sets 8th bit"
> > .. leaving things the way they are, and have them keep working,
> > as xterm did.
> >
> > So I guess we should fix xterm too?
> >
> > I think you're exaggerating.
> >
>
> Hardly. vim clearly can deal with the ESC-is-prefix issue anyway, since
> otherwise it wouldn't be able to use arrow keys.

There's always the "timeout" hack. It is all right with the
arrow and function keys because the second character in these
cases (`[' usually) is not a commonly typed vim command.
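
For illustration, the timeout idea can be sketched over a list of timestamped input characters. This is a simplified model, not vim's actual implementation; `split_keys` and the 50 ms default are assumptions made for the example.

```python
ESC = "\x1b"

def split_keys(events, timeout=0.05):
    """Group (timestamp_seconds, char) events into tokens.

    An ESC followed by another character within `timeout` seconds is
    treated as a prefix ("ESC+<char>"); otherwise it is a lone ESC key.
    """
    tokens = []
    i = 0
    while i < len(events):
        t, ch = events[i]
        if ch == ESC and i + 1 < len(events) and events[i + 1][0] - t <= timeout:
            tokens.append("ESC+" + events[i + 1][1])
            i += 2
        else:
            tokens.append("ESC" if ch == ESC else ch)
            i += 1
    return tokens

events = [(0.000, ESC), (0.001, "l"), (1.0, ESC), (1.5, "l")]
print(split_keys(events))   # ['ESC+l', 'ESC', 'l']
```

The weakness is visible in the model: a user who presses ESC and then `l` quickly enough gets the first reading, not the second.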

> That being said, quite frankly, *both* Meta key conventions are
> incredibly broken.

Indeed, I agree with you here.

> What I would much prefer to see is a brand new convention where
> different keys (Ctrl, Meta, Super, Hyper, Alt or even in some cases
> Shift) issue a unique prefix which doesn't conflict with anything else.
> Emacs has tried to promote such a convention of the format
> <CAN> @ <bucky> <keystroke> which is a lot better, although it's a bit
> Emacs-centric (using <CAN> / ^X as the initial character is not really a
> very good choice.)
>
> The best probably would be to introduce an escape code, along the lines
> of other escape codes in the terminal interface.

You're right.

Many say Unix is also broken compared to Plan 9.. sometimes it's
too late. The real fix for this issue seems like it'd be very
hard to accomplish. In the meantime, maybe we could do this easy
fix. Or not. But we have a situation.

> -hpa
>




2008-03-23 18:46:40

by Jan Engelhardt

Subject: Re: UTF-8 and Alt key in the console


On Sunday 2008-03-23 19:13, John T. wrote:
>> Hardly. vim clearly can deal with the ESC-is-prefix issue anyway, since
>> otherwise it wouldn't be able to use arrow keys.
>
> There's always the "timeout" hack. It is all right with the
> arrow and function keys because the second character in these
> cases (`[' usually) is not a commonly typed vim command.
>[...]
>> The best probably would be to introduce an escape code, along the lines
>> of other escape codes in the terminal interface.
>
> You're right.
>
> Many say Unix is also broken compared to Plan 9.. sometimes it's
> too late. The real fix for this issue seems like it'd be very
> hard to accomplish.

The idea of revamping the escape codes is not all that bad.

Thanks to terminfo, this should be easy. Change vt.c,
add corresponding terminfo entry and set TERM to something
that has not previously existed.

About the ESC key, I thought, would it suffice to replace its
current output of ^[ with ^[^[?

2008-03-28 23:27:20

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

Jan Engelhardt wrote:
>>> The best probably would be to introduce an escape code, along the lines
>>> of other escape codes in the terminal interface.
>>
>> You're right.
>>
>> Many say Unix is also broken compared to Plan 9.. sometimes it's
>> too late. The real fix for this issue seems like it'd be very
>> hard to accomplish.
>
> The idea of revamping the escape codes is not all that bad.
>
> Thanks to terminfo, this should be easy. Change vt.c,
> add corresponding terminfo entry and set TERM to something
> that has not previously existed.
>
> About the ESC key, I thought, would it suffice to replace its
> current output of ^[ with ^[^[?

It would be better to assign a CSI (ESC [) code to it, like other
function keys. Unfortunately, the terminal everyone tries to emulate,
the VT220 (Linux does so quite poorly due to its broken implementation
of ISO 2022, but that's less of an issue with UTF-8), had ESC on the
F11 key, so the CSI 2 3 ~ sequence it uses for that key is what we use
for F11. That doesn't mean we can't assign another one.

One would also like to distinguish, say, Backspace from Ctrl-H. This is
trickier, because the termios settings don't permit compound keys. The
most obvious way to deal with that is an escape code for Ctrl-H, but
that has the risk of breaking a lot of other things.

-hpa

2008-03-29 00:07:45

by Jan Engelhardt

Subject: Re: UTF-8 and Alt key in the console


On Saturday 2008-03-29 00:26, H. Peter Anvin wrote:
>>
>> About the ESC key, I thought, would it suffice to replace its
>> current output of ^[ with ^[^[?
>
> It would be better to assign a CSI (ESC [) code to it, like other function
> keys. Unfortunately, the terminal everyone tries to emulate (Linux does so
> quite poorly due to its broken implementation of ISO 2022, but that's less of
> an issue with UTF-8), VT 220, had ESC on the F11 key, so the CSI 2 3 ~
> sequence it uses for that key is what we use for F11. That doesn't mean we
> can't assign another one.

Even so, the linux term is the least broken one of all. I often had
issues with remote login programs (largely Windows ones) that had a
different idea of VTxxx whenever you wished not to have it. Despite
TERM being vt100 and the local encoding being vt100 too, actual
escape sequences were different from what programs in the shell
expected. On one occasion, F keys worked but Ins/Home did not;
on another it was reversed, etc. As soon as I learnt of putty a
few years ago I was happy to have all the mess that windows ssh
programs cause solved because it implemented the "linux" term type
and that just seemed to work out-of-the-box. So it does not seem
as broken to me as VTxxx.

> One would also like to distinguish, say, Backspace from Ctrl-H. This is
> trickier, because the termios settings don't permit compound keys. The most
> obvious way to deal with that is an escape code for Ctrl-H, but that has the
> risk of breaking a lot of other things.

Like what? I know that ^H is abused for screen effects.. not much
you can do about it, but it is not that important anyway.

As for ^H, all that I think is needed is the generation of an
appropriate escape code for Ctrl-H and Backspace at the terminal
emulator level (read: which key gets translated into which escape
code is purely an xterm matter), while the read side then interprets
"ESC CTRLH", "ESC BKSP" and the traditional "^H".

And while we are at it, I'd suggest a whole new set of escape
codes; the current sequences are particularly... bad for
stream synchronization. Right now one has to parse strings for
end-of-escape, which is awkward. I'd like to just be able to
strchr(s, '^]') for example and know where the escape code
ends. (Compat should of course be honored where necessary.)

2008-03-29 00:24:00

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

Jan Engelhardt wrote:
> And while we are at it, I'd suggest a whole new set of escape
> codes, the current sequences are particularly... bad for
> stream synchronization. Right now one has to parse strings for
> end-of-escape.. which is awkward. I'd just be able to
> strchr(s, '^]') for example and know when the escape code
> ends. (Compat should of course be honored where necessary.)

I think it would be a major loss to move away from ISO 6429 format; the
format is self-terminating and really isn't all that complex.

-hpa

2008-03-29 00:44:47

by Jan Engelhardt

Subject: Re: UTF-8 and Alt key in the console


On Saturday 2008-03-29 01:23, H. Peter Anvin wrote:
>> And while we are at it, I'd suggest a whole new set of escape
>> codes, the current sequences are particularly... bad for
>> stream synchronization. Right now one has to parse strings for
>> end-of-escape.. which is awkward. I'd just be able to
>> strchr(s, '^]') for example and know when the escape code
>> ends. (Compat should of course be honored where necessary.)
>
> I think it would be a major loss to move away from ISO 6429 format; the format
> is self-terminating and really isn't all that complex.

What do you mean by self-terminating? There is no easy
synchronization like in UTF-8. Given that you are anywhere inside
a text stream, how do you know (a) that you are already in an
escape sequence and (b) where normal text begins again?

2008-03-29 01:07:30

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

Jan Engelhardt wrote:
>>
>> I think it would be a major loss to move away from ISO 6429 format;
>> the format is self-terminating and really isn't all that complex.
>
> What do you mean by self-terminating? There is no easy
> synchronization like in UTF-8, given you are anywhere inside
> a text stream, how do you know (a) you are already in an
> escape sequence and (b) how to figure out the rebegin of
> normal text.

(a) isn't readily supported (other than scanning backwards), but (b) is
pretty easy, see ISO 6429.
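
As a sketch of point (b): in an ISO 6429 control sequence, the introducer (ESC [) is followed by parameter bytes (0x30-0x3F), then intermediate bytes (0x20-0x2F), and ends at the first final byte (0x40-0x7E). The helper below is illustrative only; `csi_end` is a made-up name and it handles just the 7-bit ESC [ form of CSI.

```python
def csi_end(data: bytes, start: int) -> int:
    """Return the index just past the CSI sequence starting at `start`.

    Returns -1 if the sequence is incomplete or malformed.
    """
    if data[start:start + 2] != b"\x1b[":
        raise ValueError("not at a CSI introducer")
    i = start + 2
    while i < len(data) and 0x30 <= data[i] <= 0x3F:   # parameter bytes
        i += 1
    while i < len(data) and 0x20 <= data[i] <= 0x2F:   # intermediate bytes
        i += 1
    if i < len(data) and 0x40 <= data[i] <= 0x7E:      # final byte
        return i + 1
    return -1

data = b"ab\x1b[23~cd"
end = csi_end(data, 2)
print(end, data[end:])   # 7 b'cd'
```

No separate terminator character is needed; the final byte's range is what ends the sequence.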

-hpa

2008-03-29 06:33:20

by David Newall

Subject: Re: UTF-8 and Alt key in the console

Jan Engelhardt wrote:
> What do you mean by self-terminating? There is no easy
> synchronization like in UTF-8, given you are anywhere inside
> a text stream, how do you know (a) you are already in an
> escape sequence and (b) how to figure out the rebegin of
> normal text.

It's not very useful being able to tell you are inside an escape sequence
unless you see that sequence from the start. You do need the complete
sequence to make sense of it.

2008-03-29 17:05:27

by H. Peter Anvin

Subject: Re: UTF-8 and Alt key in the console

David Newall wrote:
> Jan Engelhardt wrote:
>> What do you mean by self-terminating? There is no easy
>> synchronization like in UTF-8, given you are anywhere inside
>> a text stream, how do you know (a) you are already in an
>> escape sequence and (b) how to figure out the rebegin of
>> normal text.
>
> It's not very useful being able to tell you are inside an escape sequence
> unless you see that sequence from the start. You do need the complete
> sequence to make sense of it.

I think what Jan is alluding to is the property of UTF-8 text that you
can start in the middle of a string and either skip an incomplete
character or find the beginning of it. If you can search backwards, you
can find the beginning of an escape sequence, too; the "skip incomplete"
functionality is missing, but as you say, it isn't actually all
that useful in real life *for the applications which use these kinds of
escape sequences.*
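
The UTF-8 property referred to above can be sketched in a few lines: continuation bytes always have the bit pattern 10xxxxxx, so from any position one can skip forward past an incomplete character or step back to its first byte. The helper names are invented for this example.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80

def skip_incomplete(data: bytes, pos: int) -> int:
    """Move forward from `pos` to the start of the next character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos

def find_start(data: bytes, pos: int) -> int:
    """Move backward from `pos` to the first byte of its character."""
    while pos > 0 and is_continuation(data[pos]):
        pos -= 1
    return pos

data = "a\u00ccb".encode("utf-8")   # b'a\xc3\x8cb'
print(find_start(data, 2), skip_incomplete(data, 2))   # 1 3
```

Escape sequences have no such per-byte marker, which is why only the backward search is available for them.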

-hpa