2018-06-17 19:10:01

by Nicolas Pitre

Subject: [PATCH v2 0/4] have the vt console preserve unicode characters

The vt code translates UTF-8 strings into glyph index values and stores
those glyph values directly in the screen buffer. Because there can only
be at most 512 glyphs, it is impossible to represent most unicode
characters, in which case a default glyph (often '?') is displayed
instead. The original unicode value is then lost.

This also means that the /dev/vcs* devices only provide user space with
glyph index values, and then user applications must get hold of the
unicode-to-glyph table the kernel is using in order to back-translate
those into actual characters. It is not possible to get back the original
unicode value when multiple unicode characters map to the same glyph,
especially for the vast majority that maps to the default replacement
glyph.

The 512-glyph limitation is inherent to VGA displays, but users of
/dev/vcs* shouldn't have to be restricted to a narrow unicode space
resulting from lossy screen content because of that. This is especially
true for accessibility applications such as BRLTTY that rely on /dev/vcs
to render screen content onto braille terminals.

This patch series introduces unicode support for the /dev/vcs* devices,
allowing full unicode access from userspace to the vt console. Amongst
other purposes, this lets user space appropriately translate actual
unicode screen content into braille. Memory is allocated, and possible
CPU overhead introduced, only if /dev/vcsu is read at least once.
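
To illustrate the intended usage, here is a minimal userspace sketch
(not part of the series; device name as introduced by these patches,
error handling kept to a bare minimum) that dumps the foreground
console as 32-bit code points:

	/* Hypothetical reader: dump the foreground console via /dev/vcsu. */
	#include <fcntl.h>
	#include <inttypes.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		uint32_t cell[256];
		ssize_t n;
		int fd = open("/dev/vcsu", O_RDONLY);	/* foreground console */

		if (fd < 0) {
			perror("/dev/vcsu");
			return 1;
		}
		while ((n = read(fd, cell, sizeof(cell))) > 0) {
			/* each 32-bit value is one character cell, native endianness */
			for (ssize_t i = 0; i < n / 4; i++)
				printf("U+%04" PRIX32 " ", cell[i]);
		}
		if (n < 0)
			perror("read");	/* e.g. ENODATA if the console is not in UTF-8 mode */
		close(fd);
		return 0;
	}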

I'm a prime user of this feature, as is BRLTTY maintainer Dave Mielke,
who implemented support for it in BRLTTY. There is therefore a vested
interest in maintaining this feature as necessary, and it has already
received extensive testing at this point.

Patch #4 was used for validation and is included for completeness; however,
if people think it is inappropriate for mainline then it can be dropped.

This is also available on top of v4.18-rc1 here:

git://git.linaro.org/people/nicolas.pitre/linux vt-unicode

Changes from v1:

- Rebased to v4.18-rc1.
- Dropped first patch (now in mainline as commit 4b4ecd9cb8).
- Removed a printk instance from an error path easily triggerable
from user space.
- Minor cleanup.

Diffstat:

drivers/tty/vt/vc_screen.c | 90 +++++++--
drivers/tty/vt/vt.c | 347 +++++++++++++++++++++++++++++++++--
include/linux/console_struct.h | 2 +
include/linux/selection.h | 5 +
4 files changed, 419 insertions(+), 25 deletions(-)



2018-06-17 19:08:09

by Nicolas Pitre

Subject: [PATCH v2 2/4] vt: introduce unicode mode for /dev/vcs

Now that the core vt code knows how to preserve unicode values for each
displayed character, it is possible to let user space access them via
/dev/vcs*.

Unicode characters are presented as 32-bit values in native endianness
via the /dev/vcsu* devices, mimicking the simple /dev/vcs* devices.
Unicode with attributes (similar to /dev/vcsa*) is not supported at
the moment.

Data is available only as long as the console is in UTF-8 mode. ENODATA
is returned otherwise.
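
As a further illustration of the constraints above (hypothetical helper,
not from the patch): both the file offset and the byte count must be
multiples of 4, and a reader should expect -ENODATA if the console
leaves UTF-8 mode. Fetching a single row of console N from /dev/vcsuN
could look like this, with 'cols' and 'row' obtained elsewhere (e.g. via
TIOCGWINSZ on the corresponding tty):

	#include <errno.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/types.h>
	#include <unistd.h>

	/* Hypothetical: read one row of unicode cells from an open /dev/vcsuN. */
	static ssize_t read_vcsu_row(int fd, uint32_t *buf, unsigned int cols,
				     unsigned int row)
	{
		/* offset and length are both multiples of 4, as required */
		off_t offset = (off_t)row * cols * sizeof(uint32_t);
		ssize_t n = pread(fd, buf, cols * sizeof(uint32_t), offset);

		if (n < 0 && errno == ENODATA)
			fprintf(stderr, "console is not in UTF-8 mode\n");
		return n;
	}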

This was tested with the latest development version (to become
version 5.7) of BRLTTY. Amongst other things, this allows ⠋⠕⠗ ⠞⠓⠊⠎
⠃⠗⠁⠊⠇⠇⠑⠀⠞⠑⠭⠞⠀to appear directly on braille displays regardless of the
console font being used.

Signed-off-by: Nicolas Pitre <[email protected]>
Tested-by: Dave Mielke <[email protected]>
---
drivers/tty/vt/vc_screen.c | 89 ++++++++++++++++++++++++++++++++------
drivers/tty/vt/vt.c | 61 ++++++++++++++++++++++++++
include/linux/selection.h | 5 +++
3 files changed, 141 insertions(+), 14 deletions(-)

diff --git a/drivers/tty/vt/vc_screen.c b/drivers/tty/vt/vc_screen.c
index e4a66e1fd0..9c44252e52 100644
--- a/drivers/tty/vt/vc_screen.c
+++ b/drivers/tty/vt/vc_screen.c
@@ -10,6 +10,12 @@
* Attribute/character pair is in native endianity.
* [minor: N+128]
*
+ * /dev/vcsuN: similar to /dev/vcsaN but using 4-byte unicode values
+ * instead of 1-byte screen glyph values.
+ * [minor: N+64]
+ *
+ * /dev/vcsuaN: same idea as /dev/vcsaN for unicode (not yet implemented).
+ *
* This replaces screendump and part of selection, so that the system
* administrator can control access using file system permissions.
*
@@ -51,6 +57,26 @@

#define CON_BUF_SIZE (CONFIG_BASE_SMALL ? 256 : PAGE_SIZE)

+/*
+ * Our minor space:
+ *
+ * 0 ... 63 glyph mode without attributes
+ * 64 ... 127 unicode mode without attributes
+ * 128 ... 191 glyph mode with attributes
+ * 192 ... 255 unused (reserved for unicode with attributes)
+ *
+ * This relies on MAX_NR_CONSOLES being <= 63, meaning 63 actual consoles
+ * with minors 0, 64, 128 and 192 being proxies for the foreground console.
+ */
+#if MAX_NR_CONSOLES > 63
+#warning "/dev/vcs* devices may not accommodate more than 63 consoles"
+#endif
+
+#define console(inode) (iminor(inode) & 63)
+#define use_unicode(inode) (iminor(inode) & 64)
+#define use_attributes(inode) (iminor(inode) & 128)
+
+
struct vcs_poll_data {
struct notifier_block notifier;
unsigned int cons_num;
@@ -102,7 +128,7 @@ vcs_poll_data_get(struct file *file)
poll = kzalloc(sizeof(*poll), GFP_KERNEL);
if (!poll)
return NULL;
- poll->cons_num = iminor(file_inode(file)) & 127;
+ poll->cons_num = console(file_inode(file));
init_waitqueue_head(&poll->waitq);
poll->notifier.notifier_call = vcs_notifier;
if (register_vt_notifier(&poll->notifier) != 0) {
@@ -140,7 +166,7 @@ vcs_poll_data_get(struct file *file)
static struct vc_data*
vcs_vc(struct inode *inode, int *viewed)
{
- unsigned int currcons = iminor(inode) & 127;
+ unsigned int currcons = console(inode);

WARN_CONSOLE_UNLOCKED();

@@ -164,7 +190,6 @@ static int
vcs_size(struct inode *inode)
{
int size;
- int minor = iminor(inode);
struct vc_data *vc;

WARN_CONSOLE_UNLOCKED();
@@ -175,8 +200,12 @@ vcs_size(struct inode *inode)

size = vc->vc_rows * vc->vc_cols;

- if (minor & 128)
+ if (use_attributes(inode)) {
+ if (use_unicode(inode))
+ return -EOPNOTSUPP;
size = 2*size + HEADER_SIZE;
+ } else if (use_unicode(inode))
+ size *= 4;
return size;
}

@@ -197,12 +226,10 @@ static ssize_t
vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
struct inode *inode = file_inode(file);
- unsigned int currcons = iminor(inode);
struct vc_data *vc;
struct vcs_poll_data *poll;
- long pos;
- long attr, read;
- int col, maxcol, viewed;
+ long pos, read;
+ int attr, uni_mode, row, col, maxcol, viewed;
unsigned short *org = NULL;
ssize_t ret;
char *con_buf;
@@ -218,7 +245,8 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
*/
console_lock();

- attr = (currcons & 128);
+ uni_mode = use_unicode(inode);
+ attr = use_attributes(inode);
ret = -ENXIO;
vc = vcs_vc(inode, &viewed);
if (!vc)
@@ -227,6 +255,10 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
ret = -EINVAL;
if (pos < 0)
goto unlock_out;
+ /* we enforce 32-bit alignment for pos and count in unicode mode */
+ if (uni_mode && (pos | count) & 3)
+ goto unlock_out;
+
poll = file->private_data;
if (count && poll)
poll->seen_last_update = true;
@@ -266,7 +298,27 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
con_buf_start = con_buf0 = con_buf;
orig_count = this_round;
maxcol = vc->vc_cols;
- if (!attr) {
+ if (uni_mode) {
+ unsigned int nr;
+
+ ret = vc_uniscr_check(vc);
+ if (ret)
+ break;
+ p /= 4;
+ row = p / vc->vc_cols;
+ col = p % maxcol;
+ nr = maxcol - col;
+ do {
+ if (nr > this_round/4)
+ nr = this_round/4;
+ vc_uniscr_copy_line(vc, con_buf0, row, col, nr);
+ con_buf0 += nr * 4;
+ this_round -= nr * 4;
+ row++;
+ col = 0;
+ nr = maxcol;
+ } while (this_round);
+ } else if (!attr) {
org = screen_pos(vc, p, viewed);
col = p % maxcol;
p += maxcol - col;
@@ -375,7 +427,6 @@ static ssize_t
vcs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
struct inode *inode = file_inode(file);
- unsigned int currcons = iminor(inode);
struct vc_data *vc;
long pos;
long attr, size, written;
@@ -396,7 +447,7 @@ vcs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
*/
console_lock();

- attr = (currcons & 128);
+ attr = use_attributes(inode);
ret = -ENXIO;
vc = vcs_vc(inode, &viewed);
if (!vc)
@@ -593,9 +644,15 @@ vcs_fasync(int fd, struct file *file, int on)
static int
vcs_open(struct inode *inode, struct file *filp)
{
- unsigned int currcons = iminor(inode) & 127;
+ unsigned int currcons = console(inode);
+ bool attr = use_attributes(inode);
+ bool uni_mode = use_unicode(inode);
int ret = 0;
-
+
+ /* we currently don't support attributes in unicode mode */
+ if (attr && uni_mode)
+ return -EOPNOTSUPP;
+
console_lock();
if(currcons && !vc_cons_allocated(currcons-1))
ret = -ENXIO;
@@ -628,6 +685,8 @@ void vcs_make_sysfs(int index)
{
device_create(vc_class, NULL, MKDEV(VCS_MAJOR, index + 1), NULL,
"vcs%u", index + 1);
+ device_create(vc_class, NULL, MKDEV(VCS_MAJOR, index + 65), NULL,
+ "vcsu%u", index + 1);
device_create(vc_class, NULL, MKDEV(VCS_MAJOR, index + 129), NULL,
"vcsa%u", index + 1);
}
@@ -635,6 +694,7 @@ void vcs_make_sysfs(int index)
void vcs_remove_sysfs(int index)
{
device_destroy(vc_class, MKDEV(VCS_MAJOR, index + 1));
+ device_destroy(vc_class, MKDEV(VCS_MAJOR, index + 65));
device_destroy(vc_class, MKDEV(VCS_MAJOR, index + 129));
}

@@ -647,6 +707,7 @@ int __init vcs_init(void)
vc_class = class_create(THIS_MODULE, "vc");

device_create(vc_class, NULL, MKDEV(VCS_MAJOR, 0), NULL, "vcs");
+ device_create(vc_class, NULL, MKDEV(VCS_MAJOR, 64), NULL, "vcsu");
device_create(vc_class, NULL, MKDEV(VCS_MAJOR, 128), NULL, "vcsa");
for (i = 0; i < MIN_NR_CONSOLES; i++)
vcs_make_sysfs(i);
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 7b636638b3..062ce6be79 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -481,6 +481,67 @@ static void vc_uniscr_copy_area(struct uni_screen *dst,
}
}

+/*
+ * Called from vcs_read() to make sure unicode screen retrieval is possible.
+ * This will initialize the unicode screen buffer if not already done.
+ * This returns 0 if OK, or a negative error code otherwise.
+ * In particular, -ENODATA is returned if the console is not in UTF-8 mode.
+ */
+int vc_uniscr_check(struct vc_data *vc)
+{
+ struct uni_screen *uniscr;
+ unsigned short *p;
+ int x, y, mask;
+
+ if (__is_defined(NO_VC_UNI_SCREEN))
+ return -EOPNOTSUPP;
+
+ WARN_CONSOLE_UNLOCKED();
+
+ if (!vc->vc_utf)
+ return -ENODATA;
+
+ if (vc->vc_uni_screen)
+ return 0;
+
+ uniscr = vc_uniscr_alloc(vc->vc_cols, vc->vc_rows);
+ if (!uniscr)
+ return -ENOMEM;
+
+ /*
+ * Let's populate it initially with (imperfect) reverse translation.
+ * This is the next best thing we can do short of having it enabled
+ * from the start even when no users rely on this functionality. True
+ * unicode content will be available after a complete screen refresh.
+ */
+ p = (unsigned short *)vc->vc_origin;
+ mask = vc->vc_hi_font_mask | 0xff;
+ for (y = 0; y < vc->vc_rows; y++) {
+ char32_t *line = uniscr->lines[y];
+ for (x = 0; x < vc->vc_cols; x++) {
+ u16 glyph = scr_readw(p++) & mask;
+ line[x] = inverse_translate(vc, glyph, true);
+ }
+ }
+
+ vc->vc_uni_screen = uniscr;
+ return 0;
+}
+
+/*
+ * Called from vcs_read() to get the unicode data from the screen.
+ * This must be preceded by a successful call to vc_uniscr_check() once
+ * the console lock has been taken.
+ */
+void vc_uniscr_copy_line(struct vc_data *vc, void *dest,
+ unsigned int row, unsigned int col, unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ BUG_ON(!uniscr);
+ memcpy(dest, &uniscr->lines[row][col], nr * sizeof(char32_t));
+}
+

static void con_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
enum con_scroll dir, unsigned int nr)
diff --git a/include/linux/selection.h b/include/linux/selection.h
index 5b278ce99d..2b34df9f1e 100644
--- a/include/linux/selection.h
+++ b/include/linux/selection.h
@@ -42,4 +42,9 @@ extern u16 vcs_scr_readw(struct vc_data *vc, const u16 *org);
extern void vcs_scr_writew(struct vc_data *vc, u16 val, u16 *org);
extern void vcs_scr_updated(struct vc_data *vc);

+extern int vc_uniscr_check(struct vc_data *vc);
+extern void vc_uniscr_copy_line(struct vc_data *vc, void *dest,
+ unsigned int row, unsigned int col,
+ unsigned int nr);
+
#endif
--
2.17.1


2018-06-17 19:08:25

by Nicolas Pitre

Subject: [PATCH v2 3/4] vt: unicode fallback for scrollback

There is currently no provision for scrollback content in the core code;
that is left to backend video drivers, where it can be highly optimized.
There is also no common method for those drivers to tell the core
what part of the scrollback is actually displayed, nor what size the
scrollback buffer is. Because of that, the unicode screen buffer
has no provision for any scrollback.

At least we can provide back-translated glyph values while the scrollback
is active, which should be plenty good enough for now.

Signed-off-by: Nicolas Pitre <[email protected]>
Tested-by: Dave Mielke <[email protected]>
---
drivers/tty/vt/vc_screen.c | 3 ++-
drivers/tty/vt/vt.c | 31 +++++++++++++++++++++++++++++--
include/linux/selection.h | 2 +-
3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/vt/vc_screen.c b/drivers/tty/vt/vc_screen.c
index 9c44252e52..2384ea85ff 100644
--- a/drivers/tty/vt/vc_screen.c
+++ b/drivers/tty/vt/vc_screen.c
@@ -311,7 +311,8 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
do {
if (nr > this_round/4)
nr = this_round/4;
- vc_uniscr_copy_line(vc, con_buf0, row, col, nr);
+ vc_uniscr_copy_line(vc, con_buf0, viewed,
+ row, col, nr);
con_buf0 += nr * 4;
this_round -= nr * 4;
row++;
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 062ce6be79..2d14bb195d 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -533,13 +533,40 @@ int vc_uniscr_check(struct vc_data *vc)
* This must be preceded by a successful call to vc_uniscr_check() once
* the console lock has been taken.
*/
-void vc_uniscr_copy_line(struct vc_data *vc, void *dest,
+void vc_uniscr_copy_line(struct vc_data *vc, void *dest, int viewed,
unsigned int row, unsigned int col, unsigned int nr)
{
struct uni_screen *uniscr = get_vc_uniscr(vc);
+ int offset = row * vc->vc_size_row + col * 2;
+ unsigned long pos;

BUG_ON(!uniscr);
- memcpy(dest, &uniscr->lines[row][col], nr * sizeof(char32_t));
+
+ pos = (unsigned long)screenpos(vc, offset, viewed);
+ if (pos >= vc->vc_origin && pos < vc->vc_scr_end) {
+ /*
+ * Desired position falls in the main screen buffer.
+ * However the actual row/col might be different if
+ * scrollback is active.
+ */
+ row = (pos - vc->vc_origin) / vc->vc_size_row;
+ col = ((pos - vc->vc_origin) % vc->vc_size_row) / 2;
+ memcpy(dest, &uniscr->lines[row][col], nr * sizeof(char32_t));
+ } else {
+ /*
+ * Scrollback is active. For now let's simply backtranslate
+ * the screen glyphs until the unicode screen buffer does
+ * synchronize with console display drivers for a scrollback
+ * buffer of its own.
+ */
+ u16 *p = (u16 *)pos;
+ int mask = vc->vc_hi_font_mask | 0xff;
+ char32_t *uni_buf = dest;
+ while (nr--) {
+ u16 glyph = scr_readw(p++) & mask;
+ *uni_buf++ = inverse_translate(vc, glyph, true);
+ }
+ }
}


diff --git a/include/linux/selection.h b/include/linux/selection.h
index 2b34df9f1e..067d2e99c7 100644
--- a/include/linux/selection.h
+++ b/include/linux/selection.h
@@ -43,7 +43,7 @@ extern void vcs_scr_writew(struct vc_data *vc, u16 val, u16 *org);
extern void vcs_scr_updated(struct vc_data *vc);

extern int vc_uniscr_check(struct vc_data *vc);
-extern void vc_uniscr_copy_line(struct vc_data *vc, void *dest,
+extern void vc_uniscr_copy_line(struct vc_data *vc, void *dest, int viewed,
unsigned int row, unsigned int col,
unsigned int nr);

--
2.17.1


2018-06-17 19:08:35

by Nicolas Pitre

Subject: [PATCH v2 4/4] vt: coherence validation code for the unicode screen buffer

Make sure the unicode screen buffer matches the video screen content.
This is provided for debugging convenience and disabled by default.

Signed-off-by: Nicolas Pitre <[email protected]>
---
drivers/tty/vt/vt.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 2d14bb195d..81129671c9 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -321,6 +321,8 @@ void schedule_console_callback(void)
* Code to manage unicode-based screen buffers
*/

+#define VC_UNI_SCREEN_DEBUG 0
+
#ifdef NO_VC_UNI_SCREEN
/* this disables and optimizes related code away at compile time */
#define get_vc_uniscr(vc) NULL
@@ -569,6 +571,42 @@ void vc_uniscr_copy_line(struct vc_data *vc, void *dest, int viewed,
}
}

+/* this is for validation and debugging only */
+static void vc_uniscr_debug_check(struct vc_data *vc)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+ unsigned short *p;
+ int x, y, mask;
+
+ if (!VC_UNI_SCREEN_DEBUG || !uniscr)
+ return;
+
+ WARN_CONSOLE_UNLOCKED();
+
+ /*
+ * Make sure our unicode screen translates into the same glyphs
+ * as the actual screen. This is brutal indeed.
+ */
+ p = (unsigned short *)vc->vc_origin;
+ mask = vc->vc_hi_font_mask | 0xff;
+ for (y = 0; y < vc->vc_rows; y++) {
+ char32_t *line = uniscr->lines[y];
+ for (x = 0; x < vc->vc_cols; x++) {
+ u16 glyph = scr_readw(p++) & mask;
+ char32_t uc = line[x];
+ int tc = conv_uni_to_pc(vc, uc);
+ if (tc == -4)
+ tc = conv_uni_to_pc(vc, 0xfffd);
+ if (tc == -4)
+ tc = conv_uni_to_pc(vc, '?');
+ if (tc != glyph)
+ pr_notice("%s: mismatch at %d,%d: "
+ "glyph=%#x tc=%#x\n", __func__,
+ x, y, glyph, tc);
+ }
+ }
+}
+

static void con_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
enum con_scroll dir, unsigned int nr)
@@ -2717,6 +2755,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co
do_con_trol(tty, vc, orig);
}
con_flush(vc, draw_from, draw_to, &draw_x);
+ vc_uniscr_debug_check(vc);
console_conditional_schedule();
console_unlock();
notify_update(vc);
--
2.17.1


2018-06-17 19:09:23

by Nicolas Pitre

Subject: [PATCH v2 1/4] vt: preserve unicode values corresponding to screen characters

The vt code translates UTF-8 strings into glyph index values and stores
those glyph values directly in the screen buffer. Because there can only
be at most 512 glyphs, it is impossible to represent most unicode
characters, in which case a default glyph (often '?') is displayed
instead. The original unicode value is then lost.

This patch implements the basic screen buffer handling to preserve unicode
values alongside corresponding display glyphs. It is not activated by
default, meaning that people not relying on that functionality won't get
the implied overhead.

Signed-off-by: Nicolas Pitre <[email protected]>
Tested-by: Dave Mielke <[email protected]>
---
drivers/tty/vt/vt.c | 220 +++++++++++++++++++++++++++++++--
include/linux/console_struct.h | 2 +
2 files changed, 211 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 1eb1a376a0..7b636638b3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -317,6 +317,171 @@ void schedule_console_callback(void)
schedule_work(&console_work);
}

+/*
+ * Code to manage unicode-based screen buffers
+ */
+
+#ifdef NO_VC_UNI_SCREEN
+/* this disables and optimizes related code away at compile time */
+#define get_vc_uniscr(vc) NULL
+#else
+#define get_vc_uniscr(vc) vc->vc_uni_screen
+#endif
+
+typedef uint32_t char32_t;
+
+/*
+ * Our screen buffer is preceded by an array of line pointers so that
+ * scrolling only implies some pointer shuffling.
+ */
+struct uni_screen {
+ char32_t *lines[0];
+};
+
+static struct uni_screen *vc_uniscr_alloc(unsigned int cols, unsigned int rows)
+{
+ struct uni_screen *uniscr;
+ void *p;
+ unsigned int memsize, i;
+
+ /* allocate everything in one go */
+ memsize = cols * rows * sizeof(char32_t);
+ memsize += rows * sizeof(char32_t *);
+ p = kmalloc(memsize, GFP_KERNEL);
+ if (!p)
+ return NULL;
+
+ /* initial line pointers */
+ uniscr = p;
+ p = uniscr->lines + rows;
+ for (i = 0; i < rows; i++) {
+ uniscr->lines[i] = p;
+ p += cols * sizeof(char32_t);
+ }
+ return uniscr;
+}
+
+static void vc_uniscr_set(struct vc_data *vc, struct uni_screen *new_uniscr)
+{
+ kfree(vc->vc_uni_screen);
+ vc->vc_uni_screen = new_uniscr;
+}
+
+static void vc_uniscr_putc(struct vc_data *vc, char32_t uc)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr)
+ uniscr->lines[vc->vc_y][vc->vc_x] = uc;
+}
+
+static void vc_uniscr_insert(struct vc_data *vc, unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr) {
+ char32_t *ln = uniscr->lines[vc->vc_y];
+ unsigned int x = vc->vc_x, cols = vc->vc_cols;
+
+ memmove(&ln[x + nr], &ln[x], (cols - x - nr) * sizeof(*ln));
+ memset32(&ln[x], ' ', nr);
+ }
+}
+
+static void vc_uniscr_delete(struct vc_data *vc, unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr) {
+ char32_t *ln = uniscr->lines[vc->vc_y];
+ unsigned int x = vc->vc_x, cols = vc->vc_cols;
+
+ memcpy(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));
+ memset32(&ln[cols - nr], ' ', nr);
+ }
+}
+
+static void vc_uniscr_clear_line(struct vc_data *vc, unsigned int x,
+ unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr) {
+ char32_t *ln = uniscr->lines[vc->vc_y];
+
+ memset32(&ln[x], ' ', nr);
+ }
+}
+
+static void vc_uniscr_clear_lines(struct vc_data *vc, unsigned int y,
+ unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr) {
+ unsigned int cols = vc->vc_cols;
+
+ while (nr--)
+ memset32(uniscr->lines[y++], ' ', cols);
+ }
+}
+
+static void vc_uniscr_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
+ enum con_scroll dir, unsigned int nr)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr) {
+ unsigned int s, d, rescue, clear;
+ char32_t *save[nr];
+
+ s = clear = t;
+ d = t + nr;
+ rescue = b - nr;
+ if (dir == SM_UP) {
+ swap(s, d);
+ swap(clear, rescue);
+ }
+ memcpy(save, uniscr->lines + rescue, nr * sizeof(*save));
+ memmove(uniscr->lines + d, uniscr->lines + s,
+ (b - t - nr) * sizeof(*uniscr->lines));
+ memcpy(uniscr->lines + clear, save, nr * sizeof(*save));
+ vc_uniscr_clear_lines(vc, clear, nr);
+ }
+}
+
+static void vc_uniscr_copy_area(struct uni_screen *dst,
+ unsigned int dst_cols,
+ unsigned int dst_rows,
+ struct uni_screen *src,
+ unsigned int src_cols,
+ unsigned int src_top_row,
+ unsigned int src_bot_row)
+{
+ unsigned int dst_row = 0;
+
+ if (!dst)
+ return;
+
+ while (src_top_row < src_bot_row) {
+ char32_t *src_line = src->lines[src_top_row];
+ char32_t *dst_line = dst->lines[dst_row];
+
+ memcpy(dst_line, src_line, src_cols * sizeof(char32_t));
+ if (dst_cols - src_cols)
+ memset32(dst_line + src_cols, ' ', dst_cols - src_cols);
+ src_top_row++;
+ dst_row++;
+ }
+ while (dst_row < dst_rows) {
+ char32_t *dst_line = dst->lines[dst_row];
+
+ memset32(dst_line, ' ', dst_cols);
+ dst_row++;
+ }
+}
+
+
static void con_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
enum con_scroll dir, unsigned int nr)
{
@@ -326,6 +491,7 @@ static void con_scroll(struct vc_data *vc, unsigned int t, unsigned int b,
nr = b - t - 1;
if (b > vc->vc_rows || t >= b || nr < 1)
return;
+ vc_uniscr_scroll(vc, t, b, dir, nr);
if (con_is_visible(vc) && vc->vc_sw->con_scroll(vc, t, b, dir, nr))
return;

@@ -533,6 +699,7 @@ static void insert_char(struct vc_data *vc, unsigned int nr)
{
unsigned short *p = (unsigned short *) vc->vc_pos;

+ vc_uniscr_insert(vc, nr);
scr_memmovew(p + nr, p, (vc->vc_cols - vc->vc_x - nr) * 2);
scr_memsetw(p, vc->vc_video_erase_char, nr * 2);
vc->vc_need_wrap = 0;
@@ -545,6 +712,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr)
{
unsigned short *p = (unsigned short *) vc->vc_pos;

+ vc_uniscr_delete(vc, nr);
scr_memcpyw(p, p + nr, (vc->vc_cols - vc->vc_x - nr) * 2);
scr_memsetw(p + vc->vc_cols - vc->vc_x - nr, vc->vc_video_erase_char,
nr * 2);
@@ -845,10 +1013,11 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
{
unsigned long old_origin, new_origin, new_scr_end, rlth, rrem, err = 0;
unsigned long end;
- unsigned int old_rows, old_row_size;
+ unsigned int old_rows, old_row_size, first_copied_row;
unsigned int new_cols, new_rows, new_row_size, new_screen_size;
unsigned int user;
unsigned short *newscreen;
+ struct uni_screen *new_uniscr = NULL;

WARN_CONSOLE_UNLOCKED();

@@ -875,6 +1044,14 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
if (!newscreen)
return -ENOMEM;

+ if (get_vc_uniscr(vc)) {
+ new_uniscr = vc_uniscr_alloc(new_cols, new_rows);
+ if (!new_uniscr) {
+ kfree(newscreen);
+ return -ENOMEM;
+ }
+ }
+
if (vc == sel_cons)
clear_selection();

@@ -884,6 +1061,7 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
err = resize_screen(vc, new_cols, new_rows, user);
if (err) {
kfree(newscreen);
+ kfree(new_uniscr);
return err;
}

@@ -904,18 +1082,24 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
* Cursor near the bottom, copy contents from the
* bottom of buffer
*/
- old_origin += (old_rows - new_rows) * old_row_size;
+ first_copied_row = (old_rows - new_rows);
} else {
/*
* Cursor is in no man's land, copy 1/2 screenful
* from the top and bottom of cursor position
*/
- old_origin += (vc->vc_y - new_rows/2) * old_row_size;
+ first_copied_row = (vc->vc_y - new_rows/2);
}
- }
-
+ old_origin += first_copied_row * old_row_size;
+ } else
+ first_copied_row = 0;
end = old_origin + old_row_size * min(old_rows, new_rows);

+ vc_uniscr_copy_area(new_uniscr, new_cols, new_rows,
+ get_vc_uniscr(vc), rlth/2, first_copied_row,
+ min(old_rows, new_rows));
+ vc_uniscr_set(vc, new_uniscr);
+
update_attr(vc);

while (old_origin < end) {
@@ -1013,6 +1197,7 @@ struct vc_data *vc_deallocate(unsigned int currcons)
vc->vc_sw->con_deinit(vc);
put_pid(vc->vt_pid);
module_put(vc->vc_sw->owner);
+ vc_uniscr_set(vc, NULL);
kfree(vc->vc_screenbuf);
vc_cons[currcons].d = NULL;
}
@@ -1171,15 +1356,22 @@ static void csi_J(struct vc_data *vc, int vpar)

switch (vpar) {
case 0: /* erase from cursor to end of display */
+ vc_uniscr_clear_line(vc, vc->vc_x,
+ vc->vc_cols - vc->vc_x);
+ vc_uniscr_clear_lines(vc, vc->vc_y + 1,
+ vc->vc_rows - vc->vc_y - 1);
count = (vc->vc_scr_end - vc->vc_pos) >> 1;
start = (unsigned short *)vc->vc_pos;
break;
case 1: /* erase from start to cursor */
+ vc_uniscr_clear_line(vc, 0, vc->vc_x + 1);
+ vc_uniscr_clear_lines(vc, 0, vc->vc_y);
count = ((vc->vc_pos - vc->vc_origin) >> 1) + 1;
start = (unsigned short *)vc->vc_origin;
break;
case 2: /* erase whole display */
case 3: /* (and scrollback buffer later) */
+ vc_uniscr_clear_lines(vc, 0, vc->vc_rows);
count = vc->vc_cols * vc->vc_rows;
start = (unsigned short *)vc->vc_origin;
break;
@@ -1200,25 +1392,27 @@ static void csi_J(struct vc_data *vc, int vpar)
static void csi_K(struct vc_data *vc, int vpar)
{
unsigned int count;
- unsigned short * start;
+ unsigned short *start = (unsigned short *)vc->vc_pos;
+ int offset;

switch (vpar) {
case 0: /* erase from cursor to end of line */
+ offset = 0;
count = vc->vc_cols - vc->vc_x;
- start = (unsigned short *)vc->vc_pos;
break;
case 1: /* erase from start of line to cursor */
- start = (unsigned short *)(vc->vc_pos - (vc->vc_x << 1));
+ offset = -vc->vc_x;
count = vc->vc_x + 1;
break;
case 2: /* erase whole line */
- start = (unsigned short *)(vc->vc_pos - (vc->vc_x << 1));
+ offset = -vc->vc_x;
count = vc->vc_cols;
break;
default:
return;
}
- scr_memsetw(start, vc->vc_video_erase_char, 2 * count);
+ vc_uniscr_clear_line(vc, vc->vc_x + offset, count);
+ scr_memsetw(start + offset, vc->vc_video_erase_char, 2 * count);
vc->vc_need_wrap = 0;
if (con_should_update(vc))
do_update_region(vc, (unsigned long) start, count);
@@ -1232,6 +1426,7 @@ static void csi_X(struct vc_data *vc, int vpar) /* erase the following vpar posi
vpar++;
count = (vpar > vc->vc_cols - vc->vc_x) ? (vc->vc_cols - vc->vc_x) : vpar;

+ vc_uniscr_clear_line(vc, vc->vc_x, count);
scr_memsetw((unsigned short *)vc->vc_pos, vc->vc_video_erase_char, 2 * count);
if (con_should_update(vc))
vc->vc_sw->con_clear(vc, vc->vc_y, vc->vc_x, 1, count);
@@ -2188,7 +2383,7 @@ static void con_flush(struct vc_data *vc, unsigned long draw_from,
/* acquires console_lock */
static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int count)
{
- int c, tc, ok, n = 0, draw_x = -1;
+ int c, next_c, tc, ok, n = 0, draw_x = -1;
unsigned int currcons;
unsigned long draw_from = 0, draw_to = 0;
struct vc_data *vc;
@@ -2382,6 +2577,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co
con_flush(vc, draw_from, draw_to, &draw_x);
}

+ next_c = c;
while (1) {
if (vc->vc_need_wrap || vc->vc_decim)
con_flush(vc, draw_from, draw_to,
@@ -2392,6 +2588,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co
}
if (vc->vc_decim)
insert_char(vc, 1);
+ vc_uniscr_putc(vc, next_c);
scr_writew(himask ?
((vc_attr << 8) & ~himask) + ((tc & 0x100) ? himask : 0) + (tc & 0xff) :
(vc_attr << 8) + tc,
@@ -2412,6 +2609,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co

tc = conv_uni_to_pc(vc, ' '); /* A space is printed in the second column */
if (tc < 0) tc = ' ';
+ next_c = ' ';
}
notify_write(vc, c);

diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h
index c0ec478ea5..2c8d323989 100644
--- a/include/linux/console_struct.h
+++ b/include/linux/console_struct.h
@@ -19,6 +19,7 @@

struct vt_struct;
struct uni_pagedir;
+struct uni_screen;

#define NPAR 16

@@ -140,6 +141,7 @@ struct vc_data {
struct vc_data **vc_display_fg; /* [!] Ptr to var holding fg console for this display */
struct uni_pagedir *vc_uni_pagedir;
struct uni_pagedir **vc_uni_pagedir_loc; /* [!] Location of uni_pagedir variable for this console */
+ struct uni_screen *vc_uni_screen; /* unicode screen content */
bool vc_panic_force_write; /* when oops/panic this VC can accept forced output/blanking */
/* additional information is in vt_kern.h */
};
--
2.17.1


2018-06-17 23:01:12

by Valdis Klētnieks

Subject: Re: [PATCH v2 1/4] vt: preserve unicode values corresponding to screen characters

On Sun, 17 Jun 2018 15:07:03 -0400, Nicolas Pitre said:

> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> index 1eb1a376a0..7b636638b3 100644
> --- a/drivers/tty/vt/vt.c
> +++ b/drivers/tty/vt/vt.c
> @@ -317,6 +317,171 @@ void schedule_console_callback(void)
> schedule_work(&console_work);
> }
>
> +/*
> + * Code to manage unicode-based screen buffers
> + */
> +
> +#ifdef NO_VC_UNI_SCREEN

This preprocessor variable seems to be dangling in the breeze, with
no way for it to get set? As a result, we pick up the #else define by
default.



2018-06-17 23:24:34

by Nicolas Pitre

Subject: Re: [PATCH v2 1/4] vt: preserve unicode values corresponding to screen characters

On Sun, 17 Jun 2018, [email protected] wrote:

> On Sun, 17 Jun 2018 15:07:03 -0400, Nicolas Pitre said:
>
> > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> > index 1eb1a376a0..7b636638b3 100644
> > --- a/drivers/tty/vt/vt.c
> > +++ b/drivers/tty/vt/vt.c
> > @@ -317,6 +317,171 @@ void schedule_console_callback(void)
> > schedule_work(&console_work);
> > }
> >
> > +/*
> > + * Code to manage unicode-based screen buffers
> > + */
> > +
> > +#ifdef NO_VC_UNI_SCREEN
>
> This preprocessor variable seems to be dangling in the breeze, with
> no way for it to get set? As a result, we pick up the #else define by
> default.

That's actually what's intended here.

If vc_screen.c ever becomes configurable then this could be used to
compile out this code. Or if someone wants to disable it for some
debugging reasons. For now it is just a self-explanatory placeholder
with very little chance of being set by mistake for any other purpose.


Nicolas

2018-06-17 23:25:40

by Valdis Klētnieks

Subject: Re: [PATCH v2 1/4] vt: preserve unicode values corresponding to screen characters

On Sun, 17 Jun 2018 19:17:51 -0400, Nicolas Pitre said:
> On Sun, 17 Jun 2018, [email protected] wrote:

> > This preprocessor variable seems to be dangling in the breeze, with
> > no way for it to get set? As a result, we pick up the #else define by
> > default.
>
> That's actually what's intended here.
>
> If vc_screen.c ever becomes configurable then this could be used to
> compile out this code. Or if someone wants to disable it for some
> debugging reasons. For now it is just a self-explanatory placeholder
> with very little chance of being set by mistake for any other purpose.

Oh, OK. It's code that makes sense if it's intentional - but could just as
easily have been an unintended debugging/whatever leftover...



2018-06-18 22:03:01

by Andy Shevchenko

Subject: Re: [PATCH v2 4/4] vt: coherence validation code for the unicode screen buffer

On Sun, Jun 17, 2018 at 10:07 PM, Nicolas Pitre
<[email protected]> wrote:
> Make sure the unicode screen buffer matches the video screen content.
> This is provided for debugging convenience and disabled by default.

> +#define VC_UNI_SCREEN_DEBUG 0

> + if (!VC_UNI_SCREEN_DEBUG || !uniscr)
> + return;

Hmm... Interesting. I would rather go with
#ifdef ..._DEBUG
...
#else
return;
#endif

It will relax the requirement on how to define _DEBUG. I don't recall
seeing something like what you propose in the kernel for the same purpose.

> +
> + WARN_CONSOLE_UNLOCKED();
> +
> + /*
> + * Make sure our unicode screen translates into the same glyphs
> + * as the actual screen. This is brutal indeed.
> + */
> + p = (unsigned short *)vc->vc_origin;
> + mask = vc->vc_hi_font_mask | 0xff;
> + for (y = 0; y < vc->vc_rows; y++) {
> + char32_t *line = uniscr->lines[y];
> + for (x = 0; x < vc->vc_cols; x++) {
> + u16 glyph = scr_readw(p++) & mask;
> + char32_t uc = line[x];
> + int tc = conv_uni_to_pc(vc, uc);
> + if (tc == -4)
> + tc = conv_uni_to_pc(vc, 0xfffd);
> + if (tc == -4)
> + tc = conv_uni_to_pc(vc, '?');
> + if (tc != glyph)

> + pr_notice("%s: mismatch at %d,%d: "
> + "glyph=%#x tc=%#x\n", __func__,
> + x, y, glyph, tc);

Don't split format string in printk(). checkpatch will not warn on longer lines.

> + }
> + }

--
With Best Regards,
Andy Shevchenko

2018-06-19 01:51:02

by Nicolas Pitre

Subject: Re: [PATCH v2 4/4] vt: coherence validation code for the unicode screen buffer

On Tue, 19 Jun 2018, Andy Shevchenko wrote:

> On Sun, Jun 17, 2018 at 10:07 PM, Nicolas Pitre
> <[email protected]> wrote:
> > Make sure the unicode screen buffer matches the video screen content.
> > This is provided for debugging convenience and disabled by default.
>
> > +#define VC_UNI_SCREEN_DEBUG 0
>
> > + if (!VC_UNI_SCREEN_DEBUG || !uniscr)
> > + return;
>
> Hmm... Interesting. I would rather go with
> #ifdef ..._DEBUG
> ...
> #else
> return;
> #endif
>
> It will relax requirement on how to define _DEBUG. I don't recall I
> see something like you proposing in the kernel for the same purpose.

Some random examples:

include/crypto/scatterwalk.h:106:
if (ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE && !PageSlab(page))

include/math-emu/single.h:68:
if (!FP_INHIBIT_RESULTS)

This form also allows for the compiler to parse and validate the code
whether or not the feature is enabled, while still optimizing it away in
the end if not enabled.
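
To make that concrete, a tiny standalone sketch of the pattern (names
here are illustrative, not taken from the patch):

	#define FEATURE_DEBUG 0		/* set to 1 to enable the extra checks */

	static void debug_check(void)
	{
		if (!FEATURE_DEBUG)
			return;		/* the rest is still parsed and type-checked,
					 * then discarded as dead code when disabled */

		/* ... expensive validation would go here ... */
	}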

> > +
> > + WARN_CONSOLE_UNLOCKED();
> > +
> > + /*
> > + * Make sure our unicode screen translates into the same glyphs
> > + * as the actual screen. This is brutal indeed.
> > + */
> > + p = (unsigned short *)vc->vc_origin;
> > + mask = vc->vc_hi_font_mask | 0xff;
> > + for (y = 0; y < vc->vc_rows; y++) {
> > + char32_t *line = uniscr->lines[y];
> > + for (x = 0; x < vc->vc_cols; x++) {
> > + u16 glyph = scr_readw(p++) & mask;
> > + char32_t uc = line[x];
> > + int tc = conv_uni_to_pc(vc, uc);
> > + if (tc == -4)
> > + tc = conv_uni_to_pc(vc, 0xfffd);
> > + if (tc == -4)
> > + tc = conv_uni_to_pc(vc, '?');
> > + if (tc != glyph)
>
> > + pr_notice("%s: mismatch at %d,%d: "
> > + "glyph=%#x tc=%#x\n", __func__,
> > + x, y, glyph, tc);
>
> Don't split format string in printk(). checkpatch will not warn on longer lines.

I didn't do it like that for checkpatch but to keep the code readable.
I don't particularly care either way though.


Nicolas

2018-06-19 04:53:20

by Joe Perches

Subject: Re: [PATCH v2 4/4] vt: coherence validation code for the unicode screen buffer

On Mon, 2018-06-18 at 21:50 -0400, Nicolas Pitre wrote:
> On Tue, 19 Jun 2018, Andy Shevchenko wrote:
[]
> > > + /*
> > > + * Make sure our unicode screen translates into the same glyphs
> > > + * as the actual screen. This is brutal indeed.
> > > + */
> > > + p = (unsigned short *)vc->vc_origin;
> > > + mask = vc->vc_hi_font_mask | 0xff;
> > > + for (y = 0; y < vc->vc_rows; y++) {
> > > + char32_t *line = uniscr->lines[y];
> > > + for (x = 0; x < vc->vc_cols; x++) {
> > > + u16 glyph = scr_readw(p++) & mask;
> > > + char32_t uc = line[x];
> > > + int tc = conv_uni_to_pc(vc, uc);
> > > + if (tc == -4)
> > > + tc = conv_uni_to_pc(vc, 0xfffd);
> > > + if (tc == -4)
> > > + tc = conv_uni_to_pc(vc, '?');
> > > + if (tc != glyph)
> > > + pr_notice("%s: mismatch at %d,%d: "
> > > + "glyph=%#x tc=%#x\n", __func__,
> > > + x, y, glyph, tc);
> >
> > Don't split format string in printk(). checkpatch will not warn on longer lines.
>
> I didn't do it like that for checkpatch but to keep the code readable.
> I don't particularly care either ways though.

If one glyph is off, then perhaps others are off too.
Perhaps this message should be ratelimited.


2018-06-19 12:15:53

by Nicolas Pitre

Subject: Re: [PATCH v2 4/4] vt: coherence validation code for the unicode screen buffer

On Mon, 18 Jun 2018, Joe Perches wrote:

> On Mon, 2018-06-18 at 21:50 -0400, Nicolas Pitre wrote:
> > On Tue, 19 Jun 2018, Andy Shevchenko wrote:
> []
> > > > + /*
> > > > + * Make sure our unicode screen translates into the same glyphs
> > > > + * as the actual screen. This is brutal indeed.
> > > > + */
> > > > + p = (unsigned short *)vc->vc_origin;
> > > > + mask = vc->vc_hi_font_mask | 0xff;
> > > > + for (y = 0; y < vc->vc_rows; y++) {
> > > > + char32_t *line = uniscr->lines[y];
> > > > + for (x = 0; x < vc->vc_cols; x++) {
> > > > + u16 glyph = scr_readw(p++) & mask;
> > > > + char32_t uc = line[x];
> > > > + int tc = conv_uni_to_pc(vc, uc);
> > > > + if (tc == -4)
> > > > + tc = conv_uni_to_pc(vc, 0xfffd);
> > > > + if (tc == -4)
> > > > + tc = conv_uni_to_pc(vc, '?');
> > > > + if (tc != glyph)
> > > > + pr_notice("%s: mismatch at %d,%d: "
> > > > + "glyph=%#x tc=%#x\n", __func__,
> > > > + x, y, glyph, tc);
> > >
> > > Don't split format string in printk(). checkpatch will not warn on longer lines.
> >
> > I didn't do it like that for checkpatch but to keep the code readable.
> > I don't particularly care either ways though.
>
> If one glyph is off, then perhaps others are off too.
> Perhaps this message should be ratelimited.

Remember that this is costly debugging code that is off by default.
No production kernel should have it enabled.


Nicolas

2018-06-19 13:11:51

by Adam Borowski

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Sun, Jun 17, 2018 at 03:07:02PM -0400, Nicolas Pitre wrote:
> The vt code translates UTF-8 strings into glyph index values and stores
> those glyph values directly in the screen buffer. Because there can only
> be at most 512 glyphs, it is impossible to represent most unicode
> characters, in which case a default glyph (often '?') is displayed
> instead. The original unicode value is then lost.
>
> The 512-glyph limitation is inherent to VGA displays, but users of
> /dev/vcs* shouldn't have to be restricted to a narrow unicode space from
> lossy screen content because of that. This is especially true for
> accessibility applications such as BRLTTY that rely on /dev/vcs to rander
> screen content onto braille terminals.

You're thinking small. That 256 possible values for Braille are easily
encodable within the 512-glyph space (256 char + stolen fg brightness bit,
another CGA peculiarity). Your patchset, though, can be used for proper
Unicode support for the rest of us.

The 256/512 value limitation applies only to CGA-compatible hardware; these
days this means vgacon. But most people use other drivers. Nouveau forces
graphical console, on arm* there's no such thing as VGA[1], etc.

Thus, it'd be nice to use the structure you add to implement full Unicode
range for the vast majority of people. This includes even U+2800..FF. :)

> This patch series introduces unicode support to /dev/vcs* devices,
> allowing full unicode access from userspace to the vt console which
> can, amongst other purposes, appropriately translate actual unicode
> screen content into braille. Memory is allocated, and possible CPU
> overhead introduced, only if /dev/vcsu is read at least once.

What about doing so if any updated console driver is loaded? Possibly, once
the vt in question has been switched to (>99% people never see anything but
tty1 during boot-up, all others showing nothing but getty). Or perhaps the
moment any non-ASCII character is output to the given vt.

If memory usage is a concern, it's possible to drop the old structure and
convert back only in the rare case the driver is unloaded; reads of old-
style /dev/vc{s,sa}\d* are not speed-critical and thus can use conversion on
the fly. Unicode takes only 21 bits out of the 32 you allocate, which leaves
plenty of space for attributes: they currently take 8 bits; the naive way
gives us 3 free bits that could be used for additional attributes.

Especially underline is in common use these days; efficient support for CJK
would also use one bit to mark left/right half. And it's decades overdue to
drop blink, which is not even supported by anything but vgacon anyway!
(Graphical drivers tend to show this bit as bright background, but don't
accept SGR codes other than blink[2].)

> I'm a prime user of this feature, as well as the BRLTTY maintainer Dave Mielke
> who implemented support for this in BRLTTY. There is therefore a vested
> interest in maintaining this feature as necessary. And this received
> extensive testing as well at this point.

So, you care only about people with faulty wetware. Thus, it sounds like
work that benefits sighted people would need to be done by people other than
you. So I'm only mentioning possible changes; they could possibly go after
your patchset goes in:

A) if memory is considered to be at a premium, what about storing only one
32-bit value, masked as 21 bits of char + 11 bits of attr (see the sketch
after this list)? On non-vgacon, there's no reason to keep the old
structures.
B) if being this frugal wrt memory is ridiculous today, what about instead
going for 32 bits of char (wasteful) + 32 bits of attr? This would be much
nicer: 15-bit fg color + 15-bit bg color + underline + CJK or something.
You already triple memory use; variant A) above would reduce that to 2x,
while variant B) would increase it to 4x.
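
Purely as an illustration of variant A (all names and the exact layout
here are hypothetical; nothing below is part of the patch set), the
packing amounts to a few masks and shifts:

	#include <stdint.h>

	/* Variant A sketch: one 32-bit cell = 21-bit code point + 11-bit attribute. */
	typedef uint32_t vc_cell;

	#define CELL_UC_BITS	21
	#define CELL_UC_MASK	((UINT32_C(1) << CELL_UC_BITS) - 1)	/* 0x001fffff */

	static inline vc_cell cell_pack(uint32_t uc, uint32_t attr)
	{
		return (uc & CELL_UC_MASK) | (attr << CELL_UC_BITS);
	}

	static inline uint32_t cell_uc(vc_cell cell)
	{
		return cell & CELL_UC_MASK;
	}

	static inline uint32_t cell_attr(vc_cell cell)
	{
		return cell >> CELL_UC_BITS;
	}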

Considering that modern machines can draw complex scenes of several
megapixels 60 times a second, it could be reasonable to drop the complexity
of two structures even on vgacon: converting characters on the fly during vt
switch is beyond notice on any hardware Linux can run.

> This is also available on top of v4.18-rc1 here:
>
> git://git.linaro.org/people/nicolas.pitre/linux vt-unicode



Meow!

[1]. config VGA_CONSOLE
depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC && !SUPERH && \
(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) && \
!ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390

[2]. Sounds like an easy improvement; not so long ago I added "\e[48;5;m",
"\e[48;2;m" and "\e[100m" which could be improved when on unblinking drivers.
Heck, even VGA can be switched to unblinking by flipping bit 3 of the
Attribute Mode Control Register -- like we already flip foreground
brightness when 512 glyphs are needed.
--
⢀⣴⠾⠻⢶⣦⠀ There's an easy way to tell toy operating systems from real ones.
⣾⠁⢰⠒⠀⣿⡁ Just look at how their shipped fonts display U+1F52B, this makes
⢿⡄⠘⠷⠚⠋⠀ the intended audience obvious. It's also interesting to see OSes
⠈⠳⣄⠀⠀⠀⠀ go back and forth wrt their intended target.

2018-06-19 14:02:23

by Dave Mielke

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

[quoted lines by Adam Borowski on 2018/06/19 at 15:09 +0200]

>You're thinking small. That 256 possible values for Braille are easily
>encodable within the 512-glyph space (256 char + stolen fg brightness bit,
>another CGA peculiarity).

Not at all. We braille users, especially when working with languages other than
English, need more than 256 non-braille characters. Even for those who can live
with just 256 non-braille characters, it's still a major pain having to come up
with a usable braille-capable font for every needed set of 256 non-braille
characters. I can assure you, as an actual braille user, that the limitation has been
a very long-standing problem and it's a great relief that it's finally been
resolved.

--
I believe the Bible to be the very Word of God: http://Mielke.cc/bible/
Dave Mielke | 2213 Fox Crescent | WebHome: http://Mielke.cc/
EMail: [email protected] | Ottawa, Ontario | Twitter: @Dave_Mielke
Phone: +1 613 726 0014 | Canada K2A 1H7 |

2018-06-19 15:16:01

by Adam Borowski

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Tue, Jun 19, 2018 at 09:52:13AM -0400, Dave Mielke wrote:
> [quoted lines by Adam Borowski on 2018/06/19 at 15:09 +0200]
>
> >You're thinking small. That 256 possible values for Braille are easily
> >encodable within the 512-glyph space (256 char + stolen fg brightness bit,
> >another CGA peculiarity).
>
> Not at all. We braille users, especially when working with languages other than
> English, need more than 256 non-braille characters. Even for those who can live
> with just 256 non-braille characters, it's still a major pain having to come up
> with a usable braille-capable font for every needed 256 non-braille characters
> set. I can assure you, as an actual braille user, that the limitation has been
> a very long-standing problem and it's a great relief that it's finally been
> resolved.

Ok, I thought Braille is limited to 2x3 dots, recently extended to 2x4;
thanks for the explanation!

But those of us who are sighted are greatly annoyed by characters that are
usually taken for granted being randomly missing. For example, no console
font+mapping shipped with Debian supports ░▒▓▄▀ (despite them being a
commonly used part of the BIOS charset), so unless you go out of your way to
beat them back they'll be corrupted (usually into ♦). Then Perl6 wants 「」⚛,
and so on. All these problems would instantly disappear the moment console
sheds the limit of 256/512 glyphs.

So I'm pretty happy seeing this patch set.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ There's an easy way to tell toy operating systems from real ones.
⣾⠁⢰⠒⠀⣿⡁ Just look at how their shipped fonts display U+1F52B, this makes
⢿⡄⠘⠷⠚⠋⠀ the intended audience obvious. It's also interesting to see OSes
⠈⠳⣄⠀⠀⠀⠀ go back and forth wrt their intended target.

2018-06-19 15:31:37

by Dave Mielke

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

[quoted lines by Adam Borowski on 2018/06/19 at 17:14 +0200]

>> Not at all. We braille users, especially when working with languages other than
>> English, need more than 256 non-braille characters. Even for those who can live
>> with just 256 non-braille characters, it's still a major pain having to come up
>> with a usable braille-capable font for every needed 256 non-braille characters
>> set. I can assure you, as an actual braille user, that the limitation has been
>> a very long-standing problem and it's a great relief that it's finally been
>> resolved.
>
>Ok, I thought Braille is limited to 2x3 dots, recently extended to 2x4;
>thanks for the explanation!

Yes, that's correct, but we do things like use a single braille cell to mean
more than one thing and figure out which it is by context. That, for example,
is how we handle box drawing characters. Also, for another example, we can tell
based on which language we're currently reading.

>But those of us who are sighted, are greatly annoyed by characters that are
>usually taken for granted being randomly missing. For example, no console
>font+mapping shipped with Debian supports ░▒▓▄▀ (despite them being a
>commonly used part of the BIOS charset), so unless you go out of your way to
>beat them back they'll be corrupted (usually into ♦). Then Perl6 wants 「」⚛,
>and so on. All these problems would instantly disappear the moment console
>sheds the limit of 256/512 glyphs.

Yes, it's really the very same problem. It's just a little more annoying, I
think, in braille since, if we want the 256 braille cell characters, we need to
give up 256 useful non-braille characters.

>So I'm pretty happy seeing this patch set.

So am I! :-)

--
I believe the Bible to be the very Word of God: http://Mielke.cc/bible/
Dave Mielke | 2213 Fox Crescent | WebHome: http://Mielke.cc/
EMail: [email protected] | Ottawa, Ontario | Twitter: @Dave_Mielke
Phone: +1 613 726 0014 | Canada K2A 1H7 |

2018-06-19 15:35:41

by Nicolas Pitre

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Tue, 19 Jun 2018, Adam Borowski wrote:

> On Sun, Jun 17, 2018 at 03:07:02PM -0400, Nicolas Pitre wrote:
> > The vt code translates UTF-8 strings into glyph index values and stores
> > those glyph values directly in the screen buffer. Because there can only
> > be at most 512 glyphs, it is impossible to represent most unicode
> > characters, in which case a default glyph (often '?') is displayed
> > instead. The original unicode value is then lost.
> >
> > The 512-glyph limitation is inherent to VGA displays, but users of
> > /dev/vcs* shouldn't have to be restricted to a narrow unicode space from
> > lossy screen content because of that. This is especially true for
> > accessibility applications such as BRLTTY that rely on /dev/vcs to rander
> > screen content onto braille terminals.
>
> You're thinking small. That 256 possible values for Braille are easily
> encodable within the 512-glyph space (256 char + stolen fg brightness bit,
> another CGA peculiarity).

Braille is not just about 256 possible patterns. It is often the case
that a single print character is transcoded into a sequence of braille
characters, given that there are more than 256 possible print characters.
And there are different transcoding rules for different languages, and
even different rules across different countries with the same language.
This may get complicated very quickly and you really don't want that
processing to live in the kernel.

The point is not to have a font that displays braille but to let user
space access the actual unicode character that corresponds to a given
screen position.

> Your patchset, though, can be used for proper
> Unicode support for the rest of us.

Absolutely. I think it is generic enough so that display drivers that
would benefit from it may do so already. My patchset introduces one
user: vc_screen. The selection code could be yet another easy convert.
Beyond that it is a matter of extending the kernel interface for larger
font definitions, etc. But being sight impaired myself I won't play with
actual display driver code.

> The 256/512 value limitation applies only to CGA-compatible hardware; these
> days this means vgacon. But most people use other drivers. Nouveau forces
> graphical console, on arm* there's no such thing as VGA[1], etc.

I do agree with you.

> Thus, it'd be nice to use the structure you add to implement full Unicode
> range for the vast majority of people. This includes even U+2800..FF. :)

Be my guest if you want to use this structure. As for U+2800..FF, like I
said earlier, this is not what most people use when communicating, so it
is of little interest even to blind users except for displaying native
braille documents, or showing off. ;-)

> > This patch series introduces unicode support to /dev/vcs* devices,
> > allowing full unicode access from userspace to the vt console which
> > can, amongst other purposes, appropriately translate actual unicode
> > screen content into braille. Memory is allocated, and possible CPU
> > overhead introduced, only if /dev/vcsu is read at least once.
>
> What about doing so if any updated console driver is loaded? Possibly, once
> the vt in question has been switched to (>99% people never see anything but
> tty1 during boot-up, all others showing nothing but getty). Or perhaps the
> moment any non-ASCII character is output to the given vt.

Right now it is activated only when an actual user manifests itself. I
think this is the right thing to do. If an updated console driver is
loaded then it will activate unicode handling right away as you say.

> If memory usage is a concern, it's possible to drop the old structure and
> convert back only in the rare case the driver is unloaded; reads of old-
> style /dev/vc{s,sa}\d* are not speed-critical thus can use conversion on the
> fuly. Unicode takes only 21 bits out of 32 you allocate, that's plenty of
> space for attributes: they currently take 8 bits; naive way gives us free 3
> bits that could be used for additional attributes.

If the core console code makes the switch to full unicode then yes, that
would be the way to go to maintain backward compatibility. However
vgacon users would see a performance drop when switching between VT's
and we used to brag about how fast the Linux console used to be 20 years
ago. Does it still matter today?

> > I'm a prime user of this feature, as well as the BRLTTY maintainer Dave Mielke
> > who implemented support for this in BRLTTY. There is therefore a vested
> > interest in maintaining this feature as necessary. And this received
> > extensive testing as well at this point.
>
> So, you care only about people with faulty wetware. Thus, it sounds like
> work that benefits sighted people would need to be done by people other than
> you.

Hard for me to contribute more if I can't enjoy the result.

> So I'm only mentioning possible changes; they could possibly go after
> your patchset goes in:
>
> A) if memory is considered to be at premium, what about storing only one
> 32-bit value, masked 21 bits char 11 bits attr? On non-vgacon, there's
> no reason to keep the old structures.

Absolutely. As soon as vgacon is officially relegated to second-class
citizen status, i.e. it performs the glyph translation each time it requires
a refresh instead of dictating how the core console code works, then the
central glyph buffer can go.

> B) if being this frugal wrt memory is ridiculous today, what about instead
> going for 32 bits char (wasteful) 32 bits attr? This would be much nicer
> 15 bit fg color + 15 bit bg color + underline + CJK or something.
> You already triple memory use; variant A) above would reduce that to 2x,
> variant B) to 4x.
>
> Considering that modern machines can draw complex scenes of several
> megapixels 60 times a second, it could be reasonable to drop the complexity
> of two structures even on vgacon: converting characters on the fly during vt
> switch is beyond notice on any hardware Linux can run.

You certainly won't find any objections from me.

In the mean time, both systems may work in parallel for a smooth
transition.


Nicolas

2018-06-21 01:45:29

by Adam Borowski

Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Tue, Jun 19, 2018 at 11:34:34AM -0400, Nicolas Pitre wrote:
> On Tue, 19 Jun 2018, Adam Borowski wrote:
> > Thus, it'd be nice to use the structure you add to implement full Unicode
> > range for the vast majority of people. This includes even U+2800..FF. :)
>
> Be my guest if you want to use this structure. As for U+2800..FF, like I
> said earlier, this is not what most people use when communicating, so it
> is of little interest even to blind users except for displaying native
> braille documents, or showing off. ;-)

It's meant for displaying braille to _sighted_ people. And in the real world,
the main [ab]use is a way to show images that won't get corrupted by
proportional fonts. :-þ

> If the core console code makes the switch to full unicode then yes, that
> would be the way to go to maintain backward compatibility. However
> vgacon users would see a performance drop when switching between VT's
> and we used to brag about how fast the Linux console used to be 20 years
> ago. Does it still matter today?

I've seen this slowness. A long time ago, on a server that someone gave an
_ISA_ graphics card (it was an old machine, and it was 1.5 decades ago).
Indeed, switching VTs took around a second. But this was drawing speed, not
Unicode conversion.

There are three cases when a character can enter the screen:
* being printed by the tty. This is the only case not sharply rate-limited.
It already has to do the conversion. If we eliminate the old struct, it
might even be a speed-up when lots of text gets blasted to a non-active
VT.
* VT switch
* scrollback

The last two cases are initiated by the user, and within human reaction time
you need to convert usually 2000 -- up to 20k-ish -- characters. The
conversion is done by a 3-level array. I think a ZX Spectrum can handle
this fine without a visible slowdown.
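
To put that in perspective: the conversion is conceptually just three array
indexations per character. A simplified sketch (BMP only, and deliberately
not the kernel's actual consolemap structures -- the names here are made up):

#include <stdint.h>

#define DFLT_GLYPH 0x3f                      /* fall back to '?' */

struct uni_map {
	uint16_t **dir[32];                  /* level 1: 2048-codepoint blocks */
};

static uint16_t uni_to_glyph(const struct uni_map *m, uint32_t uc)
{
	uint16_t **mid, *row;

	if (uc > 0xffff)                     /* this sketch covers the BMP only */
		return DFLT_GLYPH;
	mid = m->dir[uc >> 11];              /* level 1: top 5 bits  */
	if (!mid)
		return DFLT_GLYPH;
	row = mid[(uc >> 6) & 0x1f];         /* level 2: next 5 bits */
	if (!row)
		return DFLT_GLYPH;
	return row[uc & 0x3f];               /* level 3: low 6 bits  */
}

Even 20k characters is on the order of 60k memory reads, well below anything
a human could notice.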

> > > I'm a prime user of this feature, as well as the BRLTTY maintainer Dave Mielke
> > > who implemented support for this in BRLTTY. There is therefore a vested
> > > interest in maintaining this feature as necessary. And this received
> > > extensive testing as well at this point.
> >
> > So, you care only about people with faulty wetware. Thus, it sounds like
> > work that benefits sighted people would need to be done by people other than
> > you.
>
> Hard for me to contribute more if I can't enjoy the result.

Obviously.

The primary users would be:
* people who want symbols uncorrupted (especially if their language uses a
non-latin script)
* CJK people (as discussed below)

It could also simplify life for distros -- less required configuration:
a single font covering all currently supported charsets has a mere
~1000 glyphs; at 8x16 that's 16000 bytes (+ mapping). Obviously for CJK
that's more.

> > So I'm only mentioning possible changes; they could possibly go after
> > your patchset goes in:
> >
> > A) if memory is considered to be at premium, what about storing only one
> > 32-bit value, masked 21 bits char 11 bits attr? On non-vgacon, there's
> > no reason to keep the old structures.
>
> Absolutely. As soon as vgacon is officially relegated to second-class
> citizen, i.e. it performs the glyph translation each time it requires
> a refresh instead of dictating how the core console code works, then the
> central glyph buffer can go.

Per the analysis above, on-the-fly translation is so unobtrusive that it
shouldn't be a problem.

> > B) if being this frugal wrt memory is ridiculous today, what about instead
> > going for 32 bits char (wasteful) 32 bits attr? This would be much nicer
> > 15 bit fg color + 15 bit bg color + underline + CJK or something.
> > You already triple memory use; variant A) above would reduce that to 2x,
> > variant B) to 4x.
>
> You certainly won't find any objections from me.

Right, let's see if your patchset gets okayed before building atop it.

> In the meantime, both systems may work in parallel for a smooth
> transition.

Sounds like a good idea.


WRT support for fonts >512 glyphs: I talked to a Chinese hacker (log
starting at 15:32 on https://irclog.whitequark.org/linux-sunxi/2018-06-19),
she said there are multiple popular non-mainline patchsets implementing CJK
on console. None of them got accepted because of pretty bad code like
https://github.com/Gentoo-zh/linux-cjktty/commit/b6160f85ef5bc5c2cae460f6c0a1aba3e417464f
but getting this done cleanly would require just:
* your patchset here
* console driver using the Unicode structure
* loading such larger fonts (the one in cjktty is built-in)
* double-width characters in vt.c


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ There's an easy way to tell toy operating systems from real ones.
⣾⠁⢰⠒⠀⣿⡁ Just look at how their shipped fonts display U+1F52B, this makes
⢿⡄⠘⠷⠚⠋⠀ the intended audience obvious. It's also interesting to see OSes
⠈⠳⣄⠀⠀⠀⠀ go back and forth wrt their intended target.

2018-06-21 02:22:37

by Dave Mielke

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

[quoted lines by Adam Borowski on 2018/06/21 at 03:43 +0200]

>It's meant for displaying braille to _sighted_ people. And in the real world,
>the main [ab]use is a way to show images that won't get corrupted by
>proportional fonts. :-þ

It's not abuse at all. I often use U+28xx to show sighted people what the
braille for something looks like. I often need to do this when, for example, I
need them to compare what I'm showing them to what's on an actual braille
display. U+28xx is the only way for me to do this without a lengthy description
containing sequences of dot number combinations.

Also, U+28xx is the only reasonable way to share braille music between people
who use different languages.

>The primary users would be:
>* people who want symbols uncorrupted (especially if their language uses a
> non-latin script)
>* CJK people (as discussed below)

Again, that's not true. Why aren't braille users included in this list? After
all, it's we who motivated this enhancement. I guess actual blind people
mustn't count just because there are relatively fewer of us. :-(

--
I believe the Bible to be the very Word of God: http://Mielke.cc/bible/
Dave Mielke | 2213 Fox Crescent | WebHome: http://Mielke.cc/
EMail: [email protected] | Ottawa, Ontario | Twitter: @Dave_Mielke
Phone: +1 613 726 0014 | Canada K2A 1H7 |

2018-06-21 03:00:04

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Thu, 21 Jun 2018, Adam Borowski wrote:

> On Tue, Jun 19, 2018 at 11:34:34AM -0400, Nicolas Pitre wrote:
> > On Tue, 19 Jun 2018, Adam Borowski wrote:
> > > Thus, it'd be nice to use the structure you add to implement full Unicode
> > > range for the vast majority of people. This includes even U+2800..FF. :)
> >
> > Be my guest if you want to use this structure. As for U+2800..FF, like I
> > said earlier, this is not what most people use when communicating, so it
> > is of little interest even to blind users except for displaying native
> > braille documents, or showing off. ;-)
>
> It's meant for displaying braille to _sighted_ people. And in the real world,
> the main [ab]use is a way to show images that won't get corrupted by
> proportional fonts. :-þ

⠨⠺⠑⠇⠇⠂ ⠥⠎⠊⠝⠛ ⠇⠁⠞⠑⠎⠞ ⠨⠨⠃⠗⠇⠞⠞⠽ ⠊⠞ ⠊⠎ ⠏⠕⠎⠎⠊⠃⠇⠑ ⠞⠕ ⠞⠽⠏⠑ ⠁⠝⠙ ⠙⠊⠎⠏⠇⠁⠽ ⠝⠁⠞⠊⠧⠑
⠃⠗⠁⠊⠇⠇⠑ ⠕⠝ ⠁ ⠃⠗⠁⠊⠇⠇⠑ ⠞⠑⠗⠍⠊⠝⠁⠇ ⠕⠝⠉⠑ ⠞⠓⠊⠎ ⠏⠁⠞⠉⠓ ⠎⠑⠗⠊⠑⠎ ⠊⠎ ⠁⠏⠏⠇⠊⠑⠙ ⠞⠕ ⠞⠓⠑
⠅⠑⠗⠝⠑⠇⠂ ⠊⠗⠗⠑⠎⠏⠑⠉⠞⠊⠧⠑ ⠕⠋ ⠞⠓⠑ ⠁⠉⠞⠥⠁⠇ ⠉⠕⠝⠎⠕⠇⠑ ⠋⠕⠝⠞ ⠊⠝ ⠥⠎⠑⠲ ⠨⠁⠝⠙ ⠋⠕⠗ ⠞⠓⠁⠞ ⠺⠑
⠥⠎⠑ ⠉⠕⠗⠗⠑⠎⠏⠕⠝⠙⠊⠝⠛ ⠥⠝⠊⠉⠕⠙⠑ ⠉⠓⠁⠗⠁⠉⠞⠑⠗⠎⠲

> > If the core console code makes the switch to full unicode then yes, that
> > would be the way to go to maintain backward compatibility. However,
> > vgacon users would see a performance drop when switching between VTs,
> > and we used to brag about how fast the Linux console was 20 years
> > ago. Does it still matter today?
>
> I've seen this slowness. A long time ago, on a server that someone gave an
> _ISA_ graphics card (it was an old machine, and it was 1.5 decades ago).
> Indeed, switching VTs took around a second. But this was drawing speed, not
> Unicode conversion.
>
> There are three cases when a character can enter the screen:
> * being printed by the tty. This is the only case not sharply rate-limited.
> It already has to do the conversion. If we eliminate the old struct, it
> might even be a speed-up when lots of text gets blasted to a non-active
> VT.

That's a good point.

> * VT switch
> * scrollback
>
> The last two cases are initiated by the user, and within human reaction time
> you need to convert usually 2000 -- up to 20k-ish -- characters. The
> conversion is done by a 3-level array. I think a ZX Spectrum can handle
> this fine without a visible slowdown.

In the scrollback case, currently each driver is doing its own thing.
The vgacon driver is probably the most efficient as it only moves the
base memory register around without copying anything at all. And that
part doesn't have to change.
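
(For the record, "moving the base memory register around" amounts to little
more than repointing the CRTC display start address -- a rough
userspace-flavoured sketch, assuming the standard color-mode ports and
leaving out locking and bounds checks, not the actual vgacon code:)

#include <sys/io.h>       /* outb(); needs ioperm()/iopl() to actually run */

#define VGA_CRT_IC 0x3d4  /* CRTC index port (color) */
#define VGA_CRT_DC 0x3d5  /* CRTC data port  (color) */

static void vga_set_display_start(unsigned int pos)  /* pos in character cells */
{
	outb(0x0c, VGA_CRT_IC);               /* select "start address high" */
	outb((pos >> 8) & 0xff, VGA_CRT_DC);
	outb(0x0d, VGA_CRT_IC);               /* select "start address low"  */
	outb(pos & 0xff, VGA_CRT_DC);
}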

> The primary users would be:
> * people who want symbols uncorrupted (especially if their language uses a
> non-latin script)
> * CJK people (as discussed below)
>
> It could also simplify life for distros -- less required configuration:
> a single font covering all currently supported charsets has a mere
> ~1000 glyphs; at 8x16 that's 16000 bytes (+ mapping). Obviously for CJK
> that's more.
>
> > > So I'm only mentioning possible changes; they could possibly go after
> > > your patchset goes in:
> > >
> > > A) if memory is considered to be at premium, what about storing only one
> > > 32-bit value, masked 21 bits char 11 bits attr? On non-vgacon, there's
> > > no reason to keep the old structures.
> >
> > Absolutely. As soon as vgacon is officially relegated to second-class
> > citizen, i.e. it performs the glyph translation each time it requires
> > a refresh instead of dictating how the core console code works, then the
> > central glyph buffer can go.
>
> Per the analysis above, on-the-fly translation is so unobtrusive that it
> shouldn't be a problem.
>
> > > B) if being this frugal wrt memory is ridiculous today, what about instead
> > > going for 32 bits char (wasteful) 32 bits attr? This would be much nicer
> > > 15 bit fg color + 15 bit bg color + underline + CJK or something.
> > > You already triple memory use; variant A) above would reduce that to 2x,
> > > variant B) to 4x.
> >
> > You certainly won't find any objections from me.
>
> Right, let's see if your patchset gets okayed before building atop it.

May I add your ACK to it?


Nicolas

2018-06-21 03:04:21

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Wed, 20 Jun 2018, Dave Mielke wrote:

> [quoted lines by Adam Borowski on 2018/06/21 at 03:43 +0200]
>
> >It's meant for displaying braille to _sighted_ people. And in the real world,
> >the main [ab]use is a way to show images that won't get corrupted by
> >proportional fonts. :-þ
>
> It's not abuse at all. I often use U+28xx to show sighted people what the
> braille for something looks like. I often need to do this when, for example, I
> need them to compare what I'm showing them to what's on an actual braille
> display. U+28xx is the only way for me to do this without a lengthy description
> containing sequences of dot number combinations.
>
> Also, U+28xx is the only reasonable way to share braille music between people
> who use different languages.
>
> >The primary users would be:
> >* people who want symbols uncorrupted (especially if their language uses a
> > non-latin script)
> >* CJK people (as discussed below)
>
> Again, that's not true. Why aren't braille users included in this list? After
> all, it's we who motivated this enhancement. I guess actual blind people
> mustn't count just because there are relatively fewer of us. :-(

I'd say blind people fall in the "people who want symbols uncorrupted"
category.


Nicolas

2018-06-22 01:56:08

by Adam Borowski

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Wed, Jun 20, 2018 at 10:21:37PM -0400, Dave Mielke wrote:
> [quoted lines by Adam Borowski on 2018/06/21 at 03:43 +0200]
>
> >It's meant for displaying braille to _sighted_ people. And in the real world,
> >the main [ab]use is a way to show images that won't get corrupted by
> >proportional fonts. :-þ
>
> It's not abuse at all. I often use U+28xx to show sighted people what the
> braille for something looks like. I often need to do this when, for example, I
> need them to compare what I'm showing them to what's on an actual braille
> display. U+28xx is the only way for me to do this without a lengthy description
> containing sequences of dot number combinations.

What you describe is the intended use. Abuse is when people use these
glyphs to write text like this:

⡎⠉⠂⠠⠤⡀⣄⠤⡀⠀⠀⠀⡄⠀⡄⡠⠤⡀⡄⠀⡄⠀⠀⠀⣄⠤⡀⡠⠤⡀⠠⠤⡀⡠⠤⡇⠀⠀⠀⠤⡧⠄⣇⠤⡀⠠⡅⠀⡠⠤⠄⠎⢉⠆
⠣⠤⠂⠪⠭⠇⠇⠀⠇⠀⠀⠀⠨⠭⠃⠣⠤⠃⠣⠤⠃⠀⠀⠀⠇⠀⠀⠫⠭⠁⠪⠭⠇⠣⠤⠇⠀⠀⠀⠀⠣⠄⠇⠀⠇⠀⠣⠀⠬⠭⠂⠀⠅⠀

(Not sure if you're completely blind or merely very weakly sighted; if the
former, this is my way to show you what actual Latin letters look like,
without a lengthy description of letter shapes. :) )

or for graphs. Here's commits per UTC hour of day:

⡀⠀⠀⢀⣴⣤⣴⣶⣶⣶⣾⣦
⣿⣷⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿

git log --pretty=format:'%at'|
perl -pe 'use integer;/^(\d+)$/ or die;$_=$1/3600%24 ."\n"'|
sort -n|uniq -c|cut -c3-7|braillegraph -y 8

or arbitrary images, like my .sig in all my mails in this thread.

But your patch set doesn't special-case braille in any way, thus allowing
such abuse to work on the console is merely an unintended side effect.

> >The primary users would be:
> >* people who want symbols uncorrupted (especially if their language uses a
> > non-latin script)
> >* CJK people (as discussed below)
>
> Again, that's not true. Why aren't braille users included in this list? After
> all, it's we who motivated this enhancement. I guess actual blind people
> mustn't count just because there are relatively fewer of us. :-(

Well, I meant users of Unicode display fonts, i.e., _additional_ functionality
that's not yet coded but would rely on this patchset. What you guys want is
already included.

The reason I'm raising this issue now is that, if the Unicode struct is to
become the primary one, there's no point in keeping vc_data in addition to
uni_screen. And that would require designing the struct well from the
start, to avoid unnecessary changes in the future.

But then, taking a bitmask from that 32-bit value wouldn't be a big change
-- you already take variously 8 or 9 bits out of a 16-bit field, depending
on 256 vs 512 glyph mode.
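
Something along these lines, purely as an illustration (which attribute bit
gets borrowed as the 9th glyph bit is shown as an assumption here; the real
drivers have their own masks):

#include <stdint.h>

static inline unsigned int glyph_256(uint16_t cell)   /* 256-glyph fonts */
{
	return cell & 0xff;
}

static inline unsigned int glyph_512(uint16_t cell)   /* 512-glyph fonts */
{
	return (cell & 0xff) | ((cell & 0x0800) >> 3);    /* 9th glyph bit borrowed from an attribute bit */
}

static inline uint32_t ucell_char(uint32_t cell)      /* hypothetical 21-bit char + 11-bit attr cell */
{
	return cell & 0x001fffffu;
}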

The other point is a quite pointless assumption that existing scrollback is
"optimized". Even vgacon mostly uses software scrollback these days, as the
amount of VGA display memory is really small.

I don't know much about console display drivers in general, though, and it
looks like most of them are unmaintained (just noticed that sisusb for
example hasn't seen a maintainer action for 13 years, and that person's
domain expired in 2006).


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ There's an easy way to tell toy operating systems from real ones.
⣾⠁⢰⠒⠀⣿⡁ Just look at how their shipped fonts display U+1F52B, this makes
⢿⡄⠘⠷⠚⠋⠀ the intended audience obvious. It's also interesting to see OSes
⠈⠳⣄⠀⠀⠀⠀ go back and forth wrt their intended target.

2018-06-22 06:42:51

by Samuel Thibault

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

Adam Borowski, on Fri, 22 Jun 2018 03:54:45 +0200, wrote:
> if the former, this is my way to show you what actual Latin letters
> look like, without a lengthy description of letter shapes. :) )

That will unfortunately not work: braille displays only show one line at
a time, so they can't show the global shape of a figure unless it is
only 4 pixels tall.

Samuel

2018-06-22 16:02:51

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

> The other point is a quite pointless assumption that existing scrollback is
> "optimized". Even vgacon mostly uses software scrollback these days, as the
> amount of VGA display memory is really small.

All of our console driver code is horribly unoptimized for most of
today's hardware. Long ago I did look at what was needed, but it's a
seriously non-trivial change. In particular:

- Console I/O occurs under enough locks to keep Fort Knox safe. That
means it's very, very hard to accelerate.

- The logic is plain wrong for a lot of modern video. We shouldn't be
scrolling; we should be rendering the current backing text buffer at
video refresh rate or similar, and if the source of the updates outruns
us it doesn't matter - we don't have to draw all the glyphs, since if we
were fast enough they would have been a blur anyway.
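
A minimal userspace-flavoured sketch of that model (hypothetical shape only,
nothing here is the real console code):

#include <stdatomic.h>
#include <unistd.h>

static atomic_bool screen_dirty;

/* Writers (tty output) just update the backing text buffer and set a flag. */
static void mark_dirty(void)
{
	/* ... write characters into the backing buffer ... */
	atomic_store(&screen_dirty, true);
}

/* One repaint pass per refresh period: whatever the buffer holds *now* gets
 * drawn; intermediate states that outran us are simply never rendered. */
static void repaint_loop(void)
{
	for (;;) {
		if (atomic_exchange(&screen_dirty, false)) {
			/* redraw the whole visible buffer from its current contents */
		}
		usleep(1000000 / 60);    /* roughly one refresh period */
	}
}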

> I don't know much about console display drivers in general, though, and it
> looks like most of them are unmaintained (just noticed that sisusb for
> example hasn't seen a maintainer action for 13 years, and that person's
> domain expired in 2006).

There has been some work on them but they are not in a good state, and as
a result we have problems like these as well as the inability to nicely
support multi-console systems except in Wayland/X.

Alan


2018-06-22 16:30:13

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Fri, 22 Jun 2018, Alan Cox wrote:

> > The other point is a quite pointless assumption that existing scrollback is
> > "optimized". Even vgacon mostly uses software scrollback these days, as the
> > amount of VGA display memory is really small.
>
> All of our console driver code is horribly unoptimized for most of
> today's hardware. Long ago I did look at what was needed, but it's a
> seriously non-trivial change. In particular:
>
> - Console I/O occurs under enough locks to keep Fort Knox safe. That
> means it's very, very hard to accelerate.
>
> - The logic is plain wrong for a lot of modern video. We shouldn't be
> scrolling; we should be rendering the current backing text buffer at
> video refresh rate or similar, and if the source of the updates outruns
> us it doesn't matter - we don't have to draw all the glyphs, since if we
> were fast enough they would have been a blur anyway.

My executive summary from what you say is that there is no longer an
advantage to maintain a central vga-style glyph buffer in the core
console code, right?


Nicolas

2018-06-22 17:52:43

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Fri, 22 Jun 2018 12:28:17 -0400 (EDT)
Nicolas Pitre <[email protected]> wrote:

> On Fri, 22 Jun 2018, Alan Cox wrote:
>
> > > The other point is a quite pointless assumption that existing scrollback is
> > > "optimized". Even vgacon mostly uses software scrollback these days, as the
> > > amount of VGA display memory is really small.
> >
> > All of our console driver code is horribly unoptimized for most of
> > today's hardware. Long ago I did look at what was needed, but it's a
> > seriously non-trivial change. In particular:
> >
> > - Console I/O occurs under enough locks to keep Fort Knox safe. That
> > means it's very, very hard to accelerate.
> >
> > - The logic is plain wrong for a lot of modern video. We shouldn't be
> > scrolling; we should be rendering the current backing text buffer at
> > video refresh rate or similar, and if the source of the updates outruns
> > us it doesn't matter - we don't have to draw all the glyphs, since if we
> > were fast enough they would have been a blur anyway.
>
> My executive summary from what you say is that there is no longer an
> advantage to maintain a central vga-style glyph buffer in the core
> console code, right?

Yeah. The only driver that it suits is the VGA text mode driver, which at
2GHz+ is going to be fast enough whatever format you convert from. We
have the memory, the processor power and the fact almost all our displays
are bitmapped (or more complex still) all in favour of throwing away that
limit.

Alan

2018-06-25 00:34:12

by Adam Borowski

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] have the vt console preserve unicode characters

On Wed, Jun 20, 2018 at 10:59:08PM -0400, Nicolas Pitre wrote:
> On Thu, 21 Jun 2018, Adam Borowski wrote:
>
> > On Tue, Jun 19, 2018 at 11:34:34AM -0400, Nicolas Pitre wrote:
> > > On Tue, 19 Jun 2018, Adam Borowski wrote:
> > > > Thus, it'd be nice to use the structure you add to implement full Unicode
> > > > range for the vast majority of people. This includes even U+2800..FF. :)

> > > If the core console code makes the switch to full unicode then yes, that
> > > would be the way to go to maintain backward compatibility. However,
> > > vgacon users would see a performance drop when switching between VTs,
> > > and we used to brag about how fast the Linux console was 20 years
> > > ago. Does it still matter today?
>
> > * VT switch
> > * scrollback
> >
> > The last two cases are initiated by the user, and within human reaction time
> > you need to convert usually 2000 -- up to 20k-ish -- characters. The
> > conversion is done by a 3-level array. I think a ZX Spectrum can handle
> > this fine without a visible slowdown.
>
> In the scrollback case, currently each driver is doing its own thing.
> The vgacon driver is probably the most efficient as it only moves the
> base memory register around without copying anything at all. And that
> part doesn't have to change.

As long as the data is still in video memory, yeah. Soft scrollback is not
yet the default, because some userspace tools assume vt switch clears
scrollback and do so for security reasons. All known tools that do so have
been fixed (at least in Debian), but as you can run new kernels with
arbitrarily old userspace, it's better to wait a bit longer before switching
to something effectively identical to soft scrollback. Failure mode: after
logout, you can scroll back to the supposedly cleared content of the old
session.

Your code avoids this, at the cost of losing data about anything not
representable by the currently loaded charset for anything inside
scrollback.

But in the near future, it'd be good to have soft scrollback work the same
for all drivers.

> > Right, let's see if your patchset gets okayed before building atop it.
>
> May I add your ACK to it?

I don't believe I'm knowledgeable or active enough in this area for my ACKs
to be meaningful. On the other hand, I've analyzed your patchset long
enough to see no problems with it, so if you have a use for my tags, then
sure, you have my ACK.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ There's an easy way to tell toy operating systems from real ones.
⣾⠁⢰⠒⠀⣿⡁ Just look at how their shipped fonts display U+1F52B, this makes
⢿⡄⠘⠷⠚⠋⠀ the intended audience obvious. It's also interesting to see OSes
⠈⠳⣄⠀⠀⠀⠀ go back and forth wrt their intended target.