2017-06-26 12:53:05

by Leonard Crestez

Subject: [PATCH v2 1/2] scripts/gdb: lx-dmesg: Cast log_buf to void* for addr fetch

In some cases it is possible for the str() conversion here to throw
encoding errors because log_buf might not point to valid ascii. For
example:

(gdb) python print str(gdb.parse_and_eval("log_buf"))
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0303' in
position 24: ordinal not in range(128)

Avoid this by explicitly casting to (void *) inside the gdb expression.

Signed-off-by: Leonard Crestez <[email protected]>
Reviewed-by: Jan Kiszka <[email protected]>

---
Changes since v1:
* Fix title (use "scripts/gdb" header instead of "gdb/scripts")
* Use "void *" instead of "void*"

Link: https://lkml.org/lkml/2017/6/23/461
Signed-off-by: Leonard Crestez <[email protected]>
---
scripts/gdb/linux/dmesg.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index 5afd109..f5a0303 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -24,7 +24,7 @@ class LxDmesg(gdb.Command):

     def invoke(self, arg, from_tty):
         log_buf_addr = int(str(gdb.parse_and_eval(
-            "'printk.c'::log_buf")).split()[0], 16)
+            "(void *)'printk.c'::log_buf")).split()[0], 16)
         log_first_idx = int(gdb.parse_and_eval("'printk.c'::log_first_idx"))
         log_next_idx = int(gdb.parse_and_eval("'printk.c'::log_next_idx"))
         log_buf_len = int(gdb.parse_and_eval("'printk.c'::log_buf_len"))
--
2.7.4
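
The address fetch in the patch above takes the first whitespace-separated token of the value's string form and parses it as hex. A minimal sketch of that step follows, runnable outside gdb. The helper name address_from_gdb_str and both sample strings are hypothetical and only illustrate the shape of the text; they are not captured gdb output. The assumption is that a char * value is rendered with the pointed-to string appended, while a void * value is not.

```python
def address_from_gdb_str(text):
    """Return the pointer value from the text form of a gdb pointer.

    The first whitespace-separated token is taken to be the hex address,
    mirroring the .split()[0] step in the patch.
    """
    return int(text.split()[0], 16)


# Hypothetical renderings, for illustration only. The char * form carries
# the pointed-to string, which is where undecodable bytes can appear; the
# void * form carries only the address and symbol.
as_char_ptr = '0xffffffff81e00000 <__log_buf> "\\0017..."'
as_void_ptr = '0xffffffff81e00000 <__log_buf>'

print(hex(address_from_gdb_str(as_char_ptr)))
print(hex(address_from_gdb_str(as_void_ptr)))
```

Both forms yield the same address; the cast only removes the part of the string that str() could fail to convert.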


2017-06-26 12:53:21

by Leonard Crestez

Subject: [PATCH v2 2/2] scripts/gdb: lx-dmesg: Use explicit encoding=utf8 errors=replace

Use errors=replace because it is never desirable for lx-dmesg to fail on
string decoding errors, not even if the log buffer is corrupt and we show
incorrect info.

The kernel will sometimes print utf8, for example the copyright symbol from
jffs2. In order to make this work specify 'utf8' everywhere because python2
otherwise defaults to 'ascii'.

In theory the second errors='replace' is not required because everything
that can be decoded as utf8 should also be encodable back to utf8. But
it's better to be extra safe here. It's worth noting that this is
definitely not true for encoding='ascii': unknown characters are
replaced with U+FFFD REPLACEMENT CHARACTER and they fail to encode back
to ascii.

Signed-off-by: Leonard Crestez <[email protected]>

---
Changes since v1:
* Add encoding='utf8'
* Only do an explicit encode for python2. On python3 this returns a
bytes object which formats to b'BLAH' instead.
* Elaborate commit message explaining what's wrong. The original patch
was hacked together while debugging something else.

Link: https://lkml.org/lkml/2017/6/23/405
Signed-off-by: Leonard Crestez <[email protected]>
---
scripts/gdb/linux/dmesg.py | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index f5a0303..6d2e09a 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -12,6 +12,7 @@
 #

 import gdb
+import sys

 from linux import utils

@@ -52,13 +53,19 @@ class LxDmesg(gdb.Command):
                 continue

             text_len = utils.read_u16(log_buf[pos + 10:pos + 12])
-            text = log_buf[pos + 16:pos + 16 + text_len].decode()
+            text = log_buf[pos + 16:pos + 16 + text_len].decode(
+                encoding='utf8', errors='replace')
             time_stamp = utils.read_u64(log_buf[pos:pos + 8])

             for line in text.splitlines():
-                gdb.write("[{time:12.6f}] {line}\n".format(
+                msg = u"[{time:12.6f}] {line}\n".format(
                     time=time_stamp / 1000000000.0,
-                    line=line))
+                    line=line)
+                # With python2 gdb.write will attempt to convert unicode to
+                # ascii and might fail so pass an utf8-encoded str instead.
+                if sys.hexversion < 0x03000000:
+                    msg = msg.encode(encoding='utf8', errors='replace')
+                gdb.write(msg)

             pos += length


--
2.7.4
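
The decode and encode behaviour described in the commit message above can be checked without gdb. The following is a rough sketch, not the lx-dmesg code itself: decode_text and format_line are hypothetical helper names, and the byte strings are made-up samples rather than real log buffer contents.

```python
import sys


def decode_text(raw):
    """Decode raw log bytes as utf8; malformed input never raises."""
    return raw.decode(encoding='utf8', errors='replace')


def format_line(time_stamp, line):
    """Format one log line; time_stamp is taken to be in nanoseconds."""
    msg = u"[{time:12.6f}] {line}\n".format(
        time=time_stamp / 1000000000.0, line=line)
    # Only python2 gets a utf8-encoded byte string; python3 keeps str.
    if sys.hexversion < 0x03000000:
        msg = msg.encode(encoding='utf8', errors='replace')
    return msg


good = decode_text(b'\xc2\xa9 2001-2006 Red Hat, Inc.')  # valid utf8
bad = decode_text(b'corrupt \xff record')                # one invalid byte

# Text decoded as utf8 with errors='replace' encodes back to utf8.
bad.encode('utf8')

# Text decoded as ascii with errors='replace' contains U+FFFD, which does
# not encode back to ascii.
lossy = b'\xc2\xa9'.decode('ascii', errors='replace')
try:
    lossy.encode('ascii')
    ascii_round_trip = True
except UnicodeEncodeError:
    ascii_round_trip = False

print(format_line(1500000000, good))
```

The invalid byte in the second sample becomes U+FFFD instead of aborting the dump, which is the behaviour the patch asks for.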

2017-07-07 09:15:55

by Jan Kiszka

Subject: Re: [PATCH v2 1/2] scripts/gdb: lx-dmesg: Cast log_buf to void* for addr fetch

On 2017-06-26 14:52, Leonard Crestez wrote:
> In some cases it is possible for the str() conversion here to throw
> encoding errors because log_buf might not point to valid ascii. For
> example:
>
> (gdb) python print str(gdb.parse_and_eval("log_buf"))
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u0303' in
> position 24: ordinal not in range(128)
>
> Avoid this by explicitly casting to (void *) inside the gdb expression.
>
> Signed-off-by: Leonard Crestez <[email protected]>
> Reviewed-by: Jan Kiszka <[email protected]>
>
> ---
> Changes since v1:
> * Fix title (use "scripts/gdb" header instead of "gdb/scripts")
> * Use "void *" instead of "void*"
>
> Link: https://lkml.org/lkml/2017/6/23/461
> Signed-off-by: Leonard Crestez <[email protected]>
> ---
> scripts/gdb/linux/dmesg.py | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index 5afd109..f5a0303 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -24,7 +24,7 @@ class LxDmesg(gdb.Command):
>
>      def invoke(self, arg, from_tty):
>          log_buf_addr = int(str(gdb.parse_and_eval(
> -            "'printk.c'::log_buf")).split()[0], 16)
> +            "(void *)'printk.c'::log_buf")).split()[0], 16)
>          log_first_idx = int(gdb.parse_and_eval("'printk.c'::log_first_idx"))
>          log_next_idx = int(gdb.parse_and_eval("'printk.c'::log_next_idx"))
>          log_buf_len = int(gdb.parse_and_eval("'printk.c'::log_buf_len"))
>

Acked-by: Jan Kiszka <[email protected]>

Andrew, please pick this up.

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

2017-07-07 09:16:44

by Jan Kiszka

Subject: Re: [PATCH v2 2/2] scripts/gdb: lx-dmesg: Use explicit encoding=utf8 errors=replace

On 2017-06-26 14:52, Leonard Crestez wrote:
> Use errors=replace because it is never desirable for lx-dmesg to fail on
> string decoding errors, not even if the log buffer is corrupt and we show
> incorrect info.
>
> The kernel will sometimes print utf8, for example the copyright symbol from
> jffs2. In order to make this work specify 'utf8' everywhere because python2
> otherwise defaults to 'ascii'.
>
> In theory the second errors='replace' is not required because everything
> that can be decoded as utf8 should also be encodable back to utf8. But
> it's better to be extra safe here. It's worth noting that this is
> definitely not true for encoding='ascii': unknown characters are
> replaced with U+FFFD REPLACEMENT CHARACTER and they fail to encode back
> to ascii.
>
> Signed-off-by: Leonard Crestez <[email protected]>
>
> ---
> Changes since v1:
> * Add encoding='utf8'
> * Only do an explicit encode for python2. On python3 this returns a
> bytes object which formats to b'BLAH' instead.
> * Elaborate commit message explaining what's wrong. The original patch
> was hacked together while debugging something else.
>
> Link: https://lkml.org/lkml/2017/6/23/405
> Signed-off-by: Leonard Crestez <[email protected]>
> ---
> scripts/gdb/linux/dmesg.py | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index f5a0303..6d2e09a 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -12,6 +12,7 @@
>  #
>
>  import gdb
> +import sys
>
>  from linux import utils
>
> @@ -52,13 +53,19 @@ class LxDmesg(gdb.Command):
>                  continue
>
>              text_len = utils.read_u16(log_buf[pos + 10:pos + 12])
> -            text = log_buf[pos + 16:pos + 16 + text_len].decode()
> +            text = log_buf[pos + 16:pos + 16 + text_len].decode(
> +                encoding='utf8', errors='replace')
>              time_stamp = utils.read_u64(log_buf[pos:pos + 8])
>
>              for line in text.splitlines():
> -                gdb.write("[{time:12.6f}] {line}\n".format(
> +                msg = u"[{time:12.6f}] {line}\n".format(
>                      time=time_stamp / 1000000000.0,
> -                    line=line))
> +                    line=line)
> +                # With python2 gdb.write will attempt to convert unicode to
> +                # ascii and might fail so pass an utf8-encoded str instead.
> +                if sys.hexversion < 0x03000000:
> +                    msg = msg.encode(encoding='utf8', errors='replace')
> +                gdb.write(msg)
>
>              pos += length
>
>
>

Acked-by: Jan Kiszka <[email protected]>

Andrew, please pick this up.

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux