2018-07-18 02:07:05

by Adam Borowski

[permalink] [raw]
Subject: [PATCH 0/3] use unicode for vt mouse paste

Hi!
Based on Nicolas' nice work (in tty-next), let's avoid corrupting characters
that have been copy+pasted via mouse selection. The uniscr array holds
their original identity even if they got mangled by glyph conversion.
The glyph conversion lossily turns similar-looking characters into a
representation, and everyone else into a replacement character.

There's no proper handling for CJK (yet?) but anything of wcwidth()==1 will
work fine.

The whole thing doesn't get enabled until something reads from /dev/vcsu for
that console, but let's test this code first before enabling it widely.


Diffstat:
drivers/tty/vt/selection.c | 48 +++++++++++++++++++++++++++++-------------------
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/selection.h | 1 +
3 files changed, 40 insertions(+), 19 deletions(-)


--
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets. Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable And Non-Discriminatory prices.


2018-07-18 02:11:55

by Adam Borowski

[permalink] [raw]
Subject: [PATCH 1/3] vt: don't reinvent min()

All the helper function saved us was a cast.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/vt/selection.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c
index 90ea1cc52b7a..34e7110f310d 100644
--- a/drivers/tty/vt/selection.c
+++ b/drivers/tty/vt/selection.c
@@ -116,12 +116,6 @@ static inline int atedge(const int p, int size_row)
return (!(p % size_row) || !((p + 2) % size_row));
}

-/* constrain v such that v <= u */
-static inline unsigned short limit(const unsigned short v, const unsigned short u)
-{
- return (v > u) ? u : v;
-}
-
/* stores the char in UTF8 and returns the number of bytes used (1-3) */
static int store_utf8(u16 c, char *p)
{
@@ -167,10 +161,10 @@ int set_selection(const struct tiocl_selection __user *sel, struct tty_struct *t
if (copy_from_user(&v, sel, sizeof(*sel)))
return -EFAULT;

- v.xs = limit(v.xs - 1, vc->vc_cols - 1);
- v.ys = limit(v.ys - 1, vc->vc_rows - 1);
- v.xe = limit(v.xe - 1, vc->vc_cols - 1);
- v.ye = limit(v.ye - 1, vc->vc_rows - 1);
+ v.xs = min_t(u16, v.xs - 1, vc->vc_cols - 1);
+ v.ys = min_t(u16, v.ys - 1, vc->vc_rows - 1);
+ v.xe = min_t(u16, v.xe - 1, vc->vc_cols - 1);
+ v.ye = min_t(u16, v.ye - 1, vc->vc_rows - 1);
ps = v.ys * vc->vc_size_row + (v.xs << 1);
pe = v.ye * vc->vc_size_row + (v.xe << 1);

--
2.18.0


2018-07-18 02:11:55

by Adam Borowski

[permalink] [raw]
Subject: [PATCH 3/3] vt: selection: take screen contents from uniscr if available

This preserves whatever was written even if we can't currently display the
given glyph. Mouse paste won't corrupt any character of wcwidth() == 1
anymore.

Note that for now uniscr doesn't get allocated until something reads
/dev/vcsuN for that console, making this code dormant for most users.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/vt/selection.c | 11 +++++++----
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/selection.h | 1 +
3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c
index 69ca337d3220..07496c711d7d 100644
--- a/drivers/tty/vt/selection.c
+++ b/drivers/tty/vt/selection.c
@@ -57,11 +57,13 @@ static inline void highlight_pointer(const int where)
complement_pos(sel_cons, where);
}

-static u16
+static u32
sel_pos(int n)
{
+ if (use_unicode)
+ return screen_glyph_unicode(sel_cons, n / 2);
return inverse_translate(sel_cons, screen_glyph(sel_cons, n),
- use_unicode);
+ 0);
}

/**
@@ -90,7 +92,8 @@ static u32 inwordLut[]={
0x07FFFFFE, /* lowercase */
};

-static inline int inword(const u16 c) {
+static inline int inword(const u32 c)
+{
return c > 0x7f || (( inwordLut[c>>5] >> (c & 0x1F) ) & 1);
}

@@ -167,7 +170,7 @@ int set_selection(const struct tiocl_selection __user *sel, struct tty_struct *t
struct tiocl_selection v;
char *bp, *obp;
int i, ps, pe, multiplier;
- u16 c;
+ u32 c;
int mode;

poke_blanked_console();
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 7fcb0ff2dccf..19da4c0b4b8e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -4545,6 +4545,16 @@ u16 screen_glyph(struct vc_data *vc, int offset)
}
EXPORT_SYMBOL_GPL(screen_glyph);

+u32 screen_glyph_unicode(struct vc_data *vc, int n)
+{
+ struct uni_screen *uniscr = get_vc_uniscr(vc);
+
+ if (uniscr)
+ return uniscr->lines[n / vc->vc_cols][n % vc->vc_cols];
+ return inverse_translate(vc, screen_glyph(vc, n * 2), 1);
+}
+EXPORT_SYMBOL_GPL(screen_glyph_unicode);
+
/* used by vcs - note the word offset */
unsigned short *screen_pos(struct vc_data *vc, int w_offset, int viewed)
{
diff --git a/include/linux/selection.h b/include/linux/selection.h
index 067d2e99c79f..a8f5b97b216f 100644
--- a/include/linux/selection.h
+++ b/include/linux/selection.h
@@ -32,6 +32,7 @@ extern unsigned char default_blu[];

extern unsigned short *screen_pos(struct vc_data *vc, int w_offset, int viewed);
extern u16 screen_glyph(struct vc_data *vc, int offset);
+extern u32 screen_glyph_unicode(struct vc_data *vc, int offset);
extern void complement_pos(struct vc_data *vc, int offset);
extern void invert_screen(struct vc_data *vc, int offset, int count, int shift);

--
2.18.0


2018-07-18 02:13:03

by Adam Borowski

[permalink] [raw]
Subject: [PATCH 2/3] vt: selection: handle storing of characters above U+FFFF

Those above U+10FFFF get replaced with U+FFFD.

Signed-off-by: Adam Borowski <[email protected]>
---
drivers/tty/vt/selection.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c
index 34e7110f310d..69ca337d3220 100644
--- a/drivers/tty/vt/selection.c
+++ b/drivers/tty/vt/selection.c
@@ -116,8 +116,8 @@ static inline int atedge(const int p, int size_row)
return (!(p % size_row) || !((p + 2) % size_row));
}

-/* stores the char in UTF8 and returns the number of bytes used (1-3) */
-static int store_utf8(u16 c, char *p)
+/* stores the char in UTF8 and returns the number of bytes used (1-4) */
+static int store_utf8(u32 c, char *p)
{
if (c < 0x80) {
/* 0******* */
@@ -128,13 +128,26 @@ static int store_utf8(u16 c, char *p)
p[0] = 0xc0 | (c >> 6);
p[1] = 0x80 | (c & 0x3f);
return 2;
- } else {
+ } else if (c < 0x10000) {
/* 1110**** 10****** 10****** */
p[0] = 0xe0 | (c >> 12);
p[1] = 0x80 | ((c >> 6) & 0x3f);
p[2] = 0x80 | (c & 0x3f);
return 3;
- }
+ } else if (c < 0x110000) {
+ /* 11110*** 10****** 10****** 10****** */
+ p[0] = 0xf0 | (c >> 18);
+ p[1] = 0x80 | ((c >> 12) & 0x3f);
+ p[2] = 0x80 | ((c >> 6) & 0x3f);
+ p[3] = 0x80 | (c & 0x3f);
+ return 4;
+ } else {
+ /* outside Unicode, replace with U+FFFD */
+ p[0] = 0xef;
+ p[1] = 0xbf;
+ p[2] = 0xbd;
+ return 3;
+ }
}

/**
@@ -273,7 +286,7 @@ int set_selection(const struct tiocl_selection __user *sel, struct tty_struct *t
sel_end = new_sel_end;

/* Allocate a new buffer before freeing the old one ... */
- multiplier = use_unicode ? 3 : 1; /* chars can take up to 3 bytes */
+ multiplier = use_unicode ? 4 : 1; /* chars can take up to 4 bytes */
bp = kmalloc_array((sel_end - sel_start) / 2 + 1, multiplier,
GFP_KERNEL);
if (!bp) {
--
2.18.0


2018-07-18 03:12:20

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH 0/3] use unicode for vt mouse paste

On Wed, 18 Jul 2018, Adam Borowski wrote:

> Hi!
> Based on Nicolas' nice work (in tty-next), let's avoid corrupting characters
> that have been copy+pasted via mouse selection. The uniscr array holds
> their original identity even if they got mangled by glyph conversion.
> The glyph conversion lossily turns similar-looking characters into a
> representation, and everyone else into a replacement character.
>
> There's no proper handling for CJK (yet?) but anything of wcwidth()==1 will
> work fine.
>
> The whole thing doesn't get enabled until something reads from /dev/vcsu for
> that console, but let's test this code first before enabling it widely.

Glad to see this. For the whole set you may add:

Acked-by: Nicolas Pitre <[email protected]>


> Diffstat:
> drivers/tty/vt/selection.c | 48 +++++++++++++++++++++++++++++-------------------
> drivers/tty/vt/vt.c | 10 ++++++++++
> include/linux/selection.h | 1 +
> 3 files changed, 40 insertions(+), 19 deletions(-)
>
>
> --
> // If you believe in so-called "intellectual property", please immediately
> // cease using counterfeit alphabets. Instead, contact the nearest temple
> // of Amon, whose priests will provide you with scribal services for all
> // your writing needs, for Reasonable And Non-Discriminatory prices.
>