2007-06-01 14:19:03

by DervishD

Subject: Kernel utf-8 handling

Hi all :)

I have a do-it-yourself Linux box, and I'm planning to move to UTF8
(currently I'm using es_ES locale, with latin1 encoding). One of my main
concerns (apart from programs with little or no utf8 support, which I
will have to suffer) is kernel handling, because I only use the console;
I only use X and a terminal emulator if I can't avoid it.

This said, I know that the console will give me no problems
regarding character representation (heck, I'm pretty sure that I will
be able to use even the same font I'm using right now in the console if
I get the proper unicode map), but it will probably give me problems when
*entering* characters. I've read that the kernel handles accented chars,
and things like 'ñ' (ntilde) because it assumes that any composed
character (composed using dead keys, for example) is in the latin1
range. While this is not a perfect behaviour, it will work for me.

Will the console work as it works now if I can live with latin1
accented characters only? Is there any terminal emulator *for the
console*, not for X, that handles utf8? Will I be sentenced to X to be
able to use my computer with utf8?

Don't get me wrong, I really would love to spend 100MiB of RAM just
to run mutt under xterm, but for the time being I prefer the console to
work even if I must run X...

Thanks a lot! :))

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!


2007-06-01 14:29:09

by CaT

Subject: Re: Kernel utf-8 handling

On Fri, Jun 01, 2007 at 04:20:58PM +0200, DervishD wrote:
> This said, I know that the console will give me no problems
> regarding character representation (heck, I'm pretty sure that I will
> be able to use even the same font I'm using right now in the console if
> I get the proper unicode map), but probably will give me problems when
> *entering* characters. I've read that the kernel handles accented chars,
> and things like 'ñ' (ntilde) because it assumes that any composed
> character (composed using dead keys, for example) is in the latin1
> range. While this is not a perfect behaviour, it will work for me.

Hmmm. I've not yet played with moving to utf-8 but would

man console_codes

be of help to you?


--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby

2007-06-01 14:37:44

by DervishD

Subject: Re: Kernel utf-8 handling

Hi CaT :)

* CaT <[email protected]> dixit:
> On Fri, Jun 01, 2007 at 04:20:58PM +0200, DervishD wrote:
> > This said, I know that the console will give me no problems
> > regarding character representation (heck, I'm pretty sure that I will
> > be able to use even the same font I'm using right now in the console if
> > I get the proper unicode map), but probably will give me problems when
> > *entering* characters. I've read that the kernel handles accented chars,
> > and things like 'ñ' (ntilde) because it assumes that any composed
> > character (composed using dead keys, for example) is in the latin1
> > range. While this is not a perfect behaviour, it will work for me.
>
> Hmmm. I've not yet played with moving to utf-8 but would
>
> man console_codes
>
> be of help to you?

No, because I already know the escape sequences to switch in and out
of utf-8 mode O:)) I'm going to carry out some tests using Ubuntu, but
since that is a distro I'm not sure if I will be able to apply the same
solutions and mechanisms to my do-it-yourself box.
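
For reference, here is a minimal sketch of those sequences in C. The two
escape strings are the ones listed in console_codes(4); the helper name
console_set_utf8 is made up for illustration. Note that this only covers
the output side of the console: the keyboard mode is a separate setting
(the kbd package's kbd_mode -u is the usual way to change it).

```c
#include <assert.h>
#include <stdio.h>

/* Escape sequences documented in console_codes(4). */
const char UTF8_ON[]  = "\033%G"; /* ESC % G: select UTF-8 */
const char UTF8_OFF[] = "\033%@"; /* ESC % @: select the default (ISO 8859-1) */

/*
 * Hypothetical helper: write the mode-switch sequence to an open stream,
 * normally the stdout of a process running on the Linux console.
 * Returns 0 on success, -1 if the write or the flush failed.
 */
int console_set_utf8(FILE *out, int enable)
{
    assert(out != NULL);
    if (fputs(enable ? UTF8_ON : UTF8_OFF, out) == EOF)
        return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```

Calling console_set_utf8(stdout, 1) from a program started on a virtual
console should switch that console's output to UTF-8; passing 0 switches
it back.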

Thanks anyway for answering :)

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-01 16:24:53

by Éric Piel

Subject: Re: Kernel utf-8 handling

06/01/2007 04:20 PM, DervishD wrote/a écrit:
> Hi all :)
Hi!

>
> Will the console work as it works now if I can live with latin1
> accented characters only?
Just tested here, it _seems_ to work right on the console with Spanish
and French accented characters.

> Is there any terminal emulator *for the
> console*, not for X, that handles utf8?
fbiterm, I never dared to try though...

See you,
Eric

2007-06-01 16:28:34

by Alexander E. Patrakov

Subject: Re: Kernel utf-8 handling

DervishD wrote:
> Hi all :)
>
> I have a do-it-yourself Linux box, and I'm planning to move to UTF8
> (currently I'm using es_ES locale, with latin1 encoding). One of my main
> concerns (apart from programs with little or no utf8 support, which I
> will have to suffer) is kernel handling, because I only use the console;
> I only use X and a terminal emulator if I can't avoid it.

The switch is possible. You could try the latest development LFS LiveCD
(http://ums.usu.ru/~patrakov/test/lfslivecd-x86-6.3-pre2-r1897.iso) and see
if it works for you (be sure to CC: me if you post any feedback). It will
automatically configure the console using a simple menu-driven interface.
Some interesting programs to try there: mutt, tin, lynx, finch.

> This said, I know that the console will give me no problems
> regarding character representation (heck, I'm pretty sure that I will
> be able to use even the same font I'm using right now in the console if
> I get the proper unicode map), but probably will give me problems when
> *entering* characters. I've read that the kernel handles accented chars,
> and things like 'ñ' (ntilde) because it assumes that any composed
> character (composed using dead keys, for example) is in the latin1
> range. While this is not a perfect behaviour, it will work for me.

Yes.

> Will the console work as it works now if I can live with latin1
> accented characters only? Is there any terminal emulator *for the
> console*, not for X, that handles utf8? Will I be sentenced to X to be
> able to use my computer with utf8?

screen, jfbterm, iterm (jfbterm and iterm are not on my CD, but you can
build them from source there - the filesystem on the CD can be written to).
Note that jfbterm and iterm expect X fonts (or unifont) to be available, and
the CD doesn't have these fonts (fully switched to Xft).

--
Alexander E. Patrakov

2007-06-01 16:41:33

by Alexander E. Patrakov

Subject: Re: Kernel utf-8 handling

I wrote:

> The switch is possible. You could try the latest development LFS LiveCD
> (http://ums.usu.ru/~patrakov/test/lfslivecd-x86-6.3-pre2-r1897.iso) and
> see if it works for you (be sure to CC: me if you post any feedback).

I was wrong. The problem is that input works, output works, but
copying-and-pasting with gpm doesn't work (the stable version of the CD,
http://ums.usu.ru/~patrakov/x86/lfslivecd-x86-6.2-5.iso, contains a
rejected-upstream kernel patch that hides the issue). This means that for
all serious Unicode work, you must use X.

--
Alexander E. Patrakov

2007-06-01 19:22:25

by Jan Engelhardt

Subject: Re: Kernel utf-8 handling


On Jun 1 2007 16:20, DervishD wrote:
>
> This said, I know that the console will give me no problems
>regarding character representation (heck, I'm pretty sure that I will
>be able to use even the same font I'm using right now in the console if
>I get the proper unicode map), but probably will give me problems when
>*entering* characters. I've read that the kernel handles accented chars,
>and things like 'ñ' (ntilde) because it assumes that any composed
>character (composed using dead keys, for example) is in the latin1
>range. While this is not a perfect behaviour, it will work for me.

(1) I can do <Compose><~><n> just fine on vt
(2) I can do <ö> just fine on vt too
(3) And copy+paste them both using GPM too, again w/o probs

so not sure where your problem is. I do however have a patch that
you could try should a problem arise. I should repost and ask
around again and beat until it's in :)



Jan
--

2007-06-01 20:51:58

by Jan Engelhardt

Subject: [PATCH] Kernel utf-8 handling

Hello to all,


the following patch still lingers in my tree, can we have it merged?


This patch fixes dead keys and copy/paste of non-ASCII characters
in UTF-8 mode on the Linux console. See more details about the
original patch at: http://chris.heathens.co.nz/linux/utf8.html

Already posted on
(Oldest) http://lkml.org/lkml/2003/5/31/148
http://lkml.org/lkml/2005/12/24/69
(Recent) http://lkml.org/lkml/2006/8/7/75

Signed-off-by: Jan Engelhardt <[email protected]>
Cc: Alexander E. Patrakov <[email protected]>

---
drivers/char/consolemap.c | 78 +++++++++++++++++++++++++++++++++++++++++----
drivers/char/keyboard.c | 26 ++++++++++-----
drivers/char/selection.c | 48 +++++++++++++++++++++++----
include/linux/consolemap.h | 5 ++
4 files changed, 134 insertions(+), 23 deletions(-)

Index: linux-2.6.22-rc3-git6/drivers/char/consolemap.c
===================================================================
--- linux-2.6.22-rc3-git6.orig/drivers/char/consolemap.c
+++ linux-2.6.22-rc3-git6/drivers/char/consolemap.c
@@ -177,6 +177,7 @@ struct uni_pagedir {
unsigned long refcount;
unsigned long sum;
unsigned char *inverse_translations[4];
+ u16 *inverse_trans_unicode;
int readonly;
};

@@ -207,6 +208,41 @@ static void set_inverse_transl(struct vc
}
}

+static void set_inverse_trans_unicode(struct vc_data *conp,
+ struct uni_pagedir *p)
+{
+ int i, j, k, glyph;
+ u16 **p1, *p2;
+ u16 *q;
+
+ if (!p) return;
+ q = p->inverse_trans_unicode;
+ if (!q) {
+ q = p->inverse_trans_unicode =
+ kmalloc(MAX_GLYPH * sizeof(u16), GFP_KERNEL);
+ if (!q)
+ return;
+ }
+ memset(q, 0, MAX_GLYPH * sizeof(u16));
+
+ for (i = 0; i < 32; i++) {
+ p1 = p->uni_pgdir[i];
+ if (!p1)
+ continue;
+ for (j = 0; j < 32; j++) {
+ p2 = p1[j];
+ if (!p2)
+ continue;
+ for (k = 0; k < 64; k++) {
+ glyph = p2[k];
+ if (glyph >= 0 && glyph < MAX_GLYPH
+ && q[glyph] < 32)
+ q[glyph] = (i << 11) + (j << 6) + k;
+ }
+ }
+ }
+}
+
unsigned short *set_translate(int m, struct vc_data *vc)
{
inv_translate[vc->vc_num] = m;
@@ -217,19 +253,29 @@ unsigned short *set_translate(int m, str
* Inverse translation is impossible for several reasons:
* 1. The font<->character maps are not 1-1.
* 2. The text may have been written while a different translation map
- * was active, or using Unicode.
+ * was active.
* Still, it is now possible to a certain extent to cut and paste non-ASCII.
*/
-unsigned char inverse_translate(struct vc_data *conp, int glyph)
+u16 inverse_translate(struct vc_data *conp, int glyph, int use_unicode)
{
struct uni_pagedir *p;
+ int m;
if (glyph < 0 || glyph >= MAX_GLYPH)
return 0;
- else if (!(p = (struct uni_pagedir *)*conp->vc_uni_pagedir_loc) ||
- !p->inverse_translations[inv_translate[conp->vc_num]])
+ else if (!(p = (struct uni_pagedir *)*conp->vc_uni_pagedir_loc))
return glyph;
- else
- return p->inverse_translations[inv_translate[conp->vc_num]][glyph];
+ else if (use_unicode) {
+ if (!p->inverse_trans_unicode)
+ return glyph;
+ else
+ return p->inverse_trans_unicode[glyph];
+ } else {
+ m = inv_translate[conp->vc_num];
+ if (!p->inverse_translations[m])
+ return glyph;
+ else
+ return p->inverse_translations[m][glyph];
+ }
}

static void update_user_maps(void)
@@ -243,6 +289,7 @@ static void update_user_maps(void)
p = (struct uni_pagedir *)*vc_cons[i].d->vc_uni_pagedir_loc;
if (p && p != q) {
set_inverse_transl(vc_cons[i].d, p, USER_MAP);
+ set_inverse_trans_unicode(vc_cons[i].d, p);
q = p;
}
}
@@ -353,6 +400,10 @@ static void con_release_unimap(struct un
kfree(p->inverse_translations[i]);
p->inverse_translations[i] = NULL;
}
+ if (p->inverse_trans_unicode) {
+ kfree(p->inverse_trans_unicode);
+ p->inverse_trans_unicode = NULL;
+ }
}

void con_free_unimap(struct vc_data *vc)
@@ -511,6 +562,7 @@ int con_set_unimap(struct vc_data *vc, u

for (i = 0; i <= 3; i++)
set_inverse_transl(vc, p, i); /* Update all inverse translations */
+ set_inverse_trans_unicode(vc, p);

return err;
}
@@ -561,6 +613,7 @@ int con_set_default_unimap(struct vc_dat

for (i = 0; i <= 3; i++)
set_inverse_transl(vc, p, i); /* Update all inverse translations */
+ set_inverse_trans_unicode(vc, p);
dflt = p;
return err;
}
@@ -617,6 +670,19 @@ void con_protect_unimap(struct vc_data *
p->readonly = rdonly;
}

+/* may be called during an interrupt */
+u32 conv_8bit_to_uni(unsigned char c)
+{
+ /*
+ * Always use USER_MAP. This function is used by the keyboard,
+ * which shouldn't be affected by G0/G1 switching, etc.
+ * If the user map still contains default values, i.e. the
+ * direct-to-font mapping, then assume user is using Latin1.
+ */
+ unsigned short uni = translations[USER_MAP][c];
+ return uni == (0xf000 | c) ? c : uni;
+}
+
int
conv_uni_to_pc(struct vc_data *conp, long ucs)
{
Index: linux-2.6.22-rc3-git6/drivers/char/keyboard.c
===================================================================
--- linux-2.6.22-rc3-git6.orig/drivers/char/keyboard.c
+++ linux-2.6.22-rc3-git6/drivers/char/keyboard.c
@@ -24,6 +24,7 @@
* 21-08-02: Converted to input API, major cleanup. (Vojtech Pavlik)
*/

+#include <linux/consolemap.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/tty.h>
@@ -308,10 +309,9 @@ static void applkey(struct vc_data *vc,
* Many other routines do put_queue, but I think either
* they produce ASCII, or they produce some user-assigned
* string, and in both cases we might assume that it is
- * in utf-8 already. UTF-8 is defined for words of up to 31 bits,
- * but we need only 16 bits here
+ * in utf-8 already.
*/
-static void to_utf8(struct vc_data *vc, ushort c)
+static void to_utf8(struct vc_data *vc, uint c)
{
if (c < 0x80)
/* 0******* */
@@ -320,11 +320,21 @@ static void to_utf8(struct vc_data *vc,
/* 110***** 10****** */
put_queue(vc, 0xc0 | (c >> 6));
put_queue(vc, 0x80 | (c & 0x3f));
- } else {
+ } else if (c < 0x10000) {
+ if (c >= 0xD800 && c < 0xE000)
+ return;
+ if (c == 0xFFFF)
+ return;
/* 1110**** 10****** 10****** */
put_queue(vc, 0xe0 | (c >> 12));
put_queue(vc, 0x80 | ((c >> 6) & 0x3f));
put_queue(vc, 0x80 | (c & 0x3f));
+ } else if (c < 0x110000) {
+ /* 11110*** 10****** 10****** 10****** */
+ put_queue(vc, 0xf0 | (c >> 18));
+ put_queue(vc, 0x80 | ((c >> 12) & 0x3f));
+ put_queue(vc, 0x80 | ((c >> 6) & 0x3f));
+ put_queue(vc, 0x80 | (c & 0x3f));
}
}

@@ -393,7 +403,7 @@ static unsigned int handle_diacr(struct
return d;

if (kbd->kbdmode == VC_UNICODE)
- to_utf8(vc, d);
+ to_utf8(vc, conv_8bit_to_uni(d));
else if (d < 0x100)
put_queue(vc, d);

@@ -407,7 +417,7 @@ static void fn_enter(struct vc_data *vc)
{
if (diacr) {
if (kbd->kbdmode == VC_UNICODE)
- to_utf8(vc, diacr);
+ to_utf8(vc, conv_8bit_to_uni(diacr));
else if (diacr < 0x100)
put_queue(vc, diacr);
diacr = 0;
@@ -617,7 +627,7 @@ static void k_unicode(struct vc_data *vc
return;
}
if (kbd->kbdmode == VC_UNICODE)
- to_utf8(vc, value);
+ to_utf8(vc, conv_8bit_to_uni(value));
else if (value < 0x100)
put_queue(vc, value);
}
@@ -775,7 +785,7 @@ static void k_shift(struct vc_data *vc,
/* kludge */
if (up_flag && shift_state != old_state && npadch != -1) {
if (kbd->kbdmode == VC_UNICODE)
- to_utf8(vc, npadch & 0xffff);
+ to_utf8(vc, npadch);
else
put_queue(vc, npadch & 0xff);
npadch = -1;
Index: linux-2.6.22-rc3-git6/drivers/char/selection.c
===================================================================
--- linux-2.6.22-rc3-git6.orig/drivers/char/selection.c
+++ linux-2.6.22-rc3-git6/drivers/char/selection.c
@@ -20,6 +20,7 @@

#include <asm/uaccess.h>

+#include <linux/kbd_kern.h>
#include <linux/vt_kern.h>
#include <linux/consolemap.h>
#include <linux/selection.h>
@@ -34,6 +35,7 @@ extern void poke_blanked_console(void);
/* Variables for selection control. */
/* Use a dynamic buffer, instead of static (Dec 1994) */
struct vc_data *sel_cons; /* must not be deallocated */
+static int use_unicode;
static volatile int sel_start = -1; /* cleared by clear_selection */
static int sel_end;
static int sel_buffer_lth;
@@ -54,10 +56,11 @@ static inline void highlight_pointer(con
complement_pos(sel_cons, where);
}

-static unsigned char
+static u16
sel_pos(int n)
{
- return inverse_translate(sel_cons, screen_glyph(sel_cons, n));
+ return inverse_translate(sel_cons, screen_glyph(sel_cons, n),
+ use_unicode);
}

/* remove the current selection highlight, if any,
@@ -86,8 +89,8 @@ static u32 inwordLut[8]={
0xFF7FFFFF /* latin-1 accented letters, not division sign */
};

-static inline int inword(const unsigned char c) {
- return ( inwordLut[c>>5] >> (c & 0x1F) ) & 1;
+static inline int inword(const u16 c) {
+ return c > 0xff || (( inwordLut[c>>5] >> (c & 0x1F) ) & 1);
}

/* set inwordLut contents. Invoked by ioctl(). */
@@ -108,13 +111,36 @@ static inline unsigned short limit(const
return (v > u) ? u : v;
}

+/* stores the char in UTF8 and returns the number of bytes used (1-3) */
+int store_utf8(u16 c, char *p)
+{
+ if (c < 0x80) {
+ /* 0******* */
+ p[0] = c;
+ return 1;
+ } else if (c < 0x800) {
+ /* 110***** 10****** */
+ p[0] = 0xc0 | (c >> 6);
+ p[1] = 0x80 | (c & 0x3f);
+ return 2;
+ } else {
+ /* 1110**** 10****** 10****** */
+ p[0] = 0xe0 | (c >> 12);
+ p[1] = 0x80 | ((c >> 6) & 0x3f);
+ p[2] = 0x80 | (c & 0x3f);
+ return 3;
+ }
+}
+
/* set the current selection. Invoked by ioctl() or by kernel code. */
int set_selection(const struct tiocl_selection __user *sel, struct tty_struct *tty)
{
struct vc_data *vc = vc_cons[fg_console].d;
int sel_mode, new_sel_start, new_sel_end, spc;
char *bp, *obp;
- int i, ps, pe;
+ int i, ps, pe, multiplier;
+ u16 c;
+ struct kbd_struct *kbd = kbd_table + fg_console;

poke_blanked_console();

@@ -158,6 +184,7 @@ int set_selection(const struct tiocl_sel
clear_selection();
sel_cons = vc_cons[fg_console].d;
}
+ use_unicode = kbd && kbd->kbdmode == VC_UNICODE;

switch (sel_mode)
{
@@ -240,7 +267,8 @@ int set_selection(const struct tiocl_sel
sel_end = new_sel_end;

/* Allocate a new buffer before freeing the old one ... */
- bp = kmalloc((sel_end-sel_start)/2+1, GFP_KERNEL);
+ multiplier = use_unicode ? 3 : 1; /* chars can take up to 3 bytes */
+ bp = kmalloc((sel_end-sel_start)/2*multiplier+1, GFP_KERNEL);
if (!bp) {
printk(KERN_WARNING "selection: kmalloc() failed\n");
clear_selection();
@@ -251,8 +279,12 @@ int set_selection(const struct tiocl_sel

obp = bp;
for (i = sel_start; i <= sel_end; i += 2) {
- *bp = sel_pos(i);
- if (!isspace(*bp++))
+ c = sel_pos(i);
+ if (use_unicode)
+ bp += store_utf8(c, bp);
+ else
+ *bp++ = c;
+ if (!isspace(c))
obp = bp;
if (! ((i + 2) % vc->vc_size_row)) {
/* strip trailing blanks from line and add newline,
Index: linux-2.6.22-rc3-git6/include/linux/consolemap.h
===================================================================
--- linux-2.6.22-rc3-git6.orig/include/linux/consolemap.h
+++ linux-2.6.22-rc3-git6/include/linux/consolemap.h
@@ -8,9 +8,12 @@
#define IBMPC_MAP 2
#define USER_MAP 3

+#include <linux/types.h>
+
struct vc_data;

-extern unsigned char inverse_translate(struct vc_data *conp, int glyph);
+extern u16 inverse_translate(struct vc_data *conp, int glyph, int use_unicode);
extern unsigned short *set_translate(int m, struct vc_data *vc);
extern int conv_uni_to_pc(struct vc_data *conp, long ucs);
+extern u32 conv_8bit_to_uni(unsigned char c);
void console_map_init(void);

2007-06-01 22:17:37

by Ken Moffat

Subject: Re: Kernel utf-8 handling

On Fri, Jun 01, 2007 at 04:20:58PM +0200, DervishD wrote:
> Hi all :)
>
> I have a do-it-yourself Linux box, and I'm planning to move to UTF8
> (currently I'm using es_ES locale, with latin1 encoding). One of my main
> concerns (apart from programs with little or no utf8 support, which I
> will have to suffer) is kernel handling, because I only use the console;
> I only use X and a terminal emulator if I can't avoid it.
>
[...]
>
> Will the console work as it works now if I can live with latin1
> accented characters only? Is there any terminal emulator *for the
> console*, not for X, that handles utf8? Will I be sentenced to X to be
> able to use my computer with utf8?
>
Sure, the console will work (don't know about a console terminal
emulator). I'm not very keen on compose keys - I find dead
diacriticals (like in X) are usually easier to enter, and I've got
all the dead latin1 accents working on my uk keymap. Other
diacriticals for normally-latin1 keymaps are a different matter
(e.g. caron, ogonek, dot above) - they could be mapped for a
specific letter on a specific key (e.g. AltGr z for ż ; z with dot
above) but the diacritical modifiers can't be mapped for non latin1,
at least in kbd-1.12. You can also alter the keymap to allow you to do
ISO 14755 input (ctrl+shift+hex_digits) - useful for occasional
characters, if they are in your font and you can remember their
value.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce

2007-06-01 22:24:26

by H. Peter Anvin

Subject: Re: Kernel utf-8 handling

Jan Engelhardt wrote:
>
> (1) I can do <Compose><~><n> just fine on vt
> (2) I can do <ö> just fine on vt too
> (3) And copy+paste them both using GPM too, again w/o probs
>

Both of those are in the 0-255 range, though. I thought the issue was
with characters 256+, like č. At least on my FC6 system that doesn't
work with gpm.

-hpa

2007-06-02 07:34:17

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Éric :)

* Éric Piel <[email protected]> dixit:
> 06/01/2007 04:20 PM, DervishD wrote/a écrit:
> > Will the console work as it works now if I can live with latin1
> >accented characters only?
> Just tested here, it _seems_ to work right on the console with Spanish
> and French accented characters.

Enough for me, thanks!

> >Is there any terminal emulator *for the
> >console*, not for X, that handles utf8?
> fbiterm, I never dared to try though...

I'll give it a try ;), thanks a lot :)

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 07:41:38

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Alexander :)

* Alexander E. Patrakov <[email protected]> dixit:
> > I have a do-it-yourself Linux box, and I'm planning to move to UTF8
> >(currently I'm using es_ES locale, with latin1 encoding). One of my main
> >concerns (apart from programs with little or no utf8 support, which I
> >will have to suffer) is kernel handling, because I only use the console;
> >I only use X and a terminal emulator if I can't avoid it.
>
> The switch is possible.

The switch to utf-8 is possible, I know, but I wasn't so sure about
the switch to utf-8 *in the console only*.

> You could try the latest development LFS LiveCD
> (http://ums.usu.ru/~patrakov/test/lfslivecd-x86-6.3-pre2-r1897.iso)
> and see if it works for you (be sure to CC: me if you post any
> feedback). It will automatically configure the console using a simple
> menu-driven interface. Some interesting programs to try there: mutt,
> tin, lynx, finch.

I was going to take a look at Linux From Scratch (and BLFS, too) for
help, but before trying anything there I wanted to be sure that the
kernel will work. Otherwise I will put my effort into building a minimal X
setup with a suitable terminal emulator.

Anyway, if everything works in the console in the LFS LiveCD, I will
be sure that I can make the change and then I'll look directly in the
LFS book for further instructions.

Thanks a lot! :)

> > Will the console work as it works now if I can live with latin1
> >accented characters only? Is there any terminal emulator *for the
> >console*, not for X, that handles utf8? Will I be sentenced to X to be
> >able to use my computer with utf8?
>
> screen,

GNU screen? I'll give it a try.

> jfbterm, iterm (jfbterm and iterm are not on my CD, but you can
> build them from source there - the filesystem on the CD can be written to).

There are more terminal emulators for the console than I had supposed.

> Note that jfbterm and iterm expect X fonts (or unifont) to be available,
> and the CD doesn't have these fonts (fully switched to Xft).

Well, I'll deal with that issue if I like them more than screen or
plain console.

Thanks again :)

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 07:43:35

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Alexander :)

* Alexander E. Patrakov <[email protected]> dixit:
> >The switch is possible. You could try the latest development LFS LiveCD
> >(http://ums.usu.ru/~patrakov/test/lfslivecd-x86-6.3-pre2-r1897.iso) and
> >see if it works for you (be sure to CC: me if you post any feedback).
>
> I was wrong. The problem is that input works, output works, but
> copying-and-pasting with gpm doesn't work (the stable version of the CD,
> http://ums.usu.ru/~patrakov/x86/lfslivecd-x86-6.2-5.iso, contains a
> rejected-upstream kernel patch that hides the issue). This means that for
> all serious Unicode work, you must use X.

Not a problem for me because I don't use GPM. I hate mice, and while
under X I'm forced (more or less) to use them, I don't use a mouse on
the console. I can touch-type, so I prefer to keep my hands on the
keyboard. I work much faster that way, and I have never needed to copy
and paste between consoles.

Thanks anyway for the warning ;)

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 07:49:29

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Ken :)

* Ken Moffat <[email protected]> dixit:
> On Fri, Jun 01, 2007 at 04:20:58PM +0200, DervishD wrote:
> > Will the console work as it works now if I can live with latin1
> > accented characters only? Is there any terminal emulator *for the
> > console*, not for X, that handles utf8? Will I be sentenced to X to be
> > able to use my computer with utf8?
> >
> Sure, the console will work (don't know about a console terminal
> emulator). I'm not very keen on compose keys - I find dead
> diacriticals (like in X) are usually easier to enter, and I've got
> all the dead latin1 accents working on my uk keymap.

I only use compose keys for a handful of characters, like © (copyright
mark) for example. For anything else I use dead diacriticals (for example,
to write the accented chars of my mother tongue).

> You can also alter the keymap to allow you to do ISO 14755 input
> (ctrl+shift+hex_digits) - useful for occasional characters, if they
> are in your font and you can remember their value.

I have my keymap modified for that, and I use it as a last resort.
If I find myself using AltGr+hex (that's the key combo I use), I usually
add a compose sequence for that character.

Thanks for the info, I'm very happy to hear that it works :)))

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 07:52:16

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Jan :)

* Jan Engelhardt <[email protected]> dixit:
> On Jun 1 2007 16:20, DervishD wrote:
> >
> > This said, I know that the console will give me no problems
> >regarding character representation (heck, I'm pretty sure that I will
> >be able to use even the same font I'm using right now in the console if
> >I get the proper unicode map), but probably will give me problems when
> >*entering* characters. I've read that the kernel handles accented chars,
> >and things like 'ñ' (ntilde) because it assumes that any composed
> >character (composed using dead keys, for example) is in the latin1
> >range. While this is not a perfect behaviour, it will work for me.
>
> (1) I can do <Compose><~><n> just fine on vt
> (2) I can do <ö> just fine on vt too
> (3) And copy+paste them both using GPM too, again w/o probs

Cool! :)))

> so not sure where your problem is.

Right now my only problem was the lack of reliable information, but that
has been fully solved by everyone who answered, so my next step is to
upgrade kbd and start running tests :))

> I do however have a patch that you could try should a problem arise. I
> should repost and ask around again and beat until it's in :)

Any patch for better utf-8 support is very welcome, and I
can help you beat anybody if that will get the patch in XD

Thanks a lot for the patch (I'll try it if I face problems) and for
your answer.

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 07:57:35

by DervishD

Subject: Re: Kernel utf-8 handling

Hi H. Peter :)

* H. Peter Anvin <[email protected]> dixit:
> Jan Engelhardt wrote:
> > (1) I can do <Compose><~><n> just fine on vt
> > (2) I can do <ö> just fine on vt too
> > (3) And copy+paste them both using GPM too, again w/o probs
>
> Both of those are in the 0-255 range, though. I thought the issue was
> with characters 256+, like č. At least on my FC6 system that doesn't
> work with gpm.

For now, to fully switch to utf-8, I only need to have áéíóúñ and
their uppercase counterparts (for people who cannot see the chars, they
are the acute accented vowels and the ntilde). I really hope that, in
the future, there will be a way to use Unicode fully (and I mean the entire
range) with the framebuffer on the console (there probably is an emulator for
that job already, I have to check), but for the time being that's
enough.

I really don't mind if I have to use a userspace program to have
unicode support on the console (the kernel shouldn't mess with
encodings, or have utf-8 only; I don't know the current status, that's
why I asked), but if the kernel does the job directly in the console
driver, that's not bad either.

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!

2007-06-02 10:53:04

by Jan Engelhardt

Subject: Re: Kernel utf-8 handling


On Jun 2 2007 09:45, DervishD wrote:
>
> Not a problem for me because I don't use GPM. I hate mice, and while
>under X I'm forced (more or less) to use them,

Use the ratpoison WM then.



Jan
--

2007-06-02 10:53:25

by Jan Engelhardt

Subject: Re: Kernel utf-8 handling


On Jun 2 2007 09:58, DervishD wrote:
>
> * H. Peter Anvin <[email protected]> dixit:
>> Jan Engelhardt wrote:
>> > (1) I can do <Compose><~><n> just fine on vt
>> > (2) I can do <ö> just fine on vt too
>> > (3) And copy+paste them both using GPM too, again w/o probs
>>
>> Both of those are in the 0-255 range, though. I thought the issue was
>> with characters 256+, like č. At least on my FC6 system that doesn't
>> work with gpm.

I've got a cp437 "DOS" font loaded[1], and giving a glance at
/usr/share/kbd/unimaps/cp437.uni tells me right away:

0x01 U+263A

So I create U+263A in joe (ESC ' x 2 6 3 a), it displays fine (the
smiley character), and I can copy/paste it using GPM without
problems. Though, I have that patch[2] in my kernel. Can you check
if it works if you apply it?


[1] https://dev.computergmbh.de/svn/hxtools/trunk/kbd/ahnv.fnt
[2] http://lkml.org/lkml/2007/6/1/339
(I also have it for 2.6.18 and 2.6.20 if you need)
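
To illustrate why a unimap line such as "0x01 U+263A" matters for
copy/paste: the selection code sees glyph numbers on the screen and has
to map them back to characters. The following is a simplified user-space
sketch of such a reverse table, loosely modelled on
set_inverse_trans_unicode() from the patch. The flat entry list, the names
unimap_entry and build_inverse, and the NGLYPHS value are assumptions for
this example; the kernel walks a three-level page directory instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NGLYPHS 512 /* assumed number of font slots for this sketch */

/* One line of a unimap file: font position and the code point it shows. */
struct unimap_entry {
    uint16_t glyph;
    uint16_t ucs;
};

/*
 * Fill inv[glyph] with a code point for every mapped glyph; 0 means
 * "no mapping". As in the patch, an entry is only overwritten while its
 * current value is below 32, so a printable mapping is never replaced.
 */
void build_inverse(const struct unimap_entry *map, size_t n,
                   uint16_t inv[NGLYPHS])
{
    size_t i;

    assert(map != NULL && inv != NULL);
    for (i = 0; i < NGLYPHS; i++)
        inv[i] = 0;
    for (i = 0; i < n; i++) {
        uint16_t g = map[i].glyph;

        if (g < NGLYPHS && inv[g] < 32)
            inv[g] = map[i].ucs;
    }
}
```

Given the entry { 0x01, 0x263A }, the table returns U+263A for glyph 1,
which the selection code can then store as UTF-8 when the console is in
Unicode mode.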



Jan
--

2007-06-02 19:52:21

by DervishD

Subject: Re: Kernel utf-8 handling

Hi Jan :)

* Jan Engelhardt <[email protected]> dixit:
> On Jun 2 2007 09:58, DervishD wrote:
> >
> > * H. Peter Anvin <[email protected]> dixit:
> >> Jan Engelhardt wrote:
> >> > (1) I can do <Compose><~><n> just fine on vt
> >> > (2) I can do <ö> just fine on vt too
> >> > (3) And copy+paste them both using GPM too, again w/o probs
> >>
> >> Both of those are in the 0-255 range, though. I thought the issue was
> >> with characters 256+, like č. At least on my FC6 system that doesn't
> >> work with gpm.
>
> I've got a cp437 "DOS" font loaded[1], and giving a glance at
> /usr/share/kbd/unimaps/cp437.uni tells me right away:
>
> 0x01 U+263A
>
> So I create U+263A in joe (ESC ' x 2 6 3 a), it displays fine (the
> smiley character), and I can copy/paste it using GPM without
> problems. Though, I have that patch[2] in my kernel. Can you check
> if it works if you apply it?

I cannot reboot my machine right now, and I don't have GPM
installed, but if you give me a few days I think I'll be able to test.
I have to upgrade the kernel too (I'm using 2.6.19.5), so make it a week
O:)) I'll test ASAP.

And thanks for the patch :))

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!