Subject: Re: [PATCH v2 2/2] scripts/gdb: lx-dmesg: Use explicit encoding=utf8 errors=replace
From: Jan Kiszka
To: Leonard Crestez, Kieran Bingham, Andrew Morton
Cc: linux-kernel@vger.kernel.org
Date: Fri, 7 Jul 2017 11:16:37 +0200

On 2017-06-26 14:52, Leonard Crestez wrote:
> Use errors=replace because it is never desirable for lx-dmesg to fail on
> string decoding errors, not even if the log buffer is corrupt and we show
> incorrect info.
>
> The kernel will sometimes print utf8, for example the copyright symbol from
> jffs2. To make this work, specify 'utf8' everywhere, because python2
> otherwise defaults to 'ascii'.
>
> In theory the second errors='replace' is not required, because everything
> that can be decoded as utf8 should also be encodable back to utf8. But
> it's better to be extra safe here. It's worth noting that this is
> definitely not true for encoding='ascii': unknown characters are
> replaced with U+FFFD REPLACEMENT CHARACTER, and they fail to encode back
> to ascii.
>
> Signed-off-by: Leonard Crestez
>
> ---
> Changes since v1:
> * Add encoding='utf8'
> * Only do an explicit encode for python2. On python3 this returns a
>   bytes object which formats to b'BLAH' instead.
> * Elaborate commit message explaining what's wrong. The original patch
>   was hacked together while debugging something else.
>
> Link: https://lkml.org/lkml/2017/6/23/405
> Signed-off-by: Leonard Crestez
> ---
>  scripts/gdb/linux/dmesg.py | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
> index f5a0303..6d2e09a 100644
> --- a/scripts/gdb/linux/dmesg.py
> +++ b/scripts/gdb/linux/dmesg.py
> @@ -12,6 +12,7 @@
>  #
>
>  import gdb
> +import sys
>
>  from linux import utils
>
> @@ -52,13 +53,19 @@ class LxDmesg(gdb.Command):
>                  continue
>
>              text_len = utils.read_u16(log_buf[pos + 10:pos + 12])
> -            text = log_buf[pos + 16:pos + 16 + text_len].decode()
> +            text = log_buf[pos + 16:pos + 16 + text_len].decode(
> +                encoding='utf8', errors='replace')
>              time_stamp = utils.read_u64(log_buf[pos:pos + 8])
>
>              for line in text.splitlines():
> -                gdb.write("[{time:12.6f}] {line}\n".format(
> +                msg = u"[{time:12.6f}] {line}\n".format(
>                      time=time_stamp / 1000000000.0,
> -                    line=line))
> +                    line=line)
> +                # With python2 gdb.write will attempt to convert unicode to
> +                # ascii and might fail so pass an utf8-encoded str instead.
> +                if sys.hexversion < 0x03000000:
> +                    msg = msg.encode(encoding='utf8', errors='replace')
> +                gdb.write(msg)
>
>              pos += length
>

Acked-by: Jan Kiszka

Andrew, please pick this up.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
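The decoding behaviour described in the commit message can be checked outside gdb. What follows is a minimal standalone sketch in plain Python, not part of the patch and not using any gdb API. The helpers decode_log_text and prepare_for_gdb_write are hypothetical names, and the sample bytes and timestamp are made up: a UTF-8 copyright sign plus one stray 0xff byte standing in for a corrupted buffer. It shows that a utf8 decode with errors='replace' never raises and always re-encodes to utf8, while an ascii decode yields U+FFFD characters that cannot be encoded back to ascii.

```python
import sys


def decode_log_text(raw, encoding="utf8"):
    """Decode record text without ever raising (hypothetical helper)."""
    return raw.decode(encoding=encoding, errors="replace")


def prepare_for_gdb_write(msg):
    """Hypothetical helper mirroring the patch's version check.

    On Python 2 the unicode message is encoded to a UTF-8 str; on Python 3
    it is returned unchanged, because encoding there would yield a bytes
    object that formats as b'...'.
    """
    if sys.hexversion < 0x03000000:
        return msg.encode(encoding="utf8", errors="replace")
    return msg


# Made-up record text: a UTF-8 copyright sign followed by a stray 0xff byte
# that stands in for a corrupted log buffer.
raw = b"\xc2\xa9 2001-2006 \xff"

# UTF-8 decode: the valid sequence survives, the bad byte becomes U+FFFD.
text = decode_log_text(raw)

# Anything produced by a UTF-8 decode re-encodes to UTF-8 under the default
# strict handler, so the second errors='replace' is only a safety net.
round_trip = text.encode("utf8")

# ASCII decode: every byte >= 0x80 becomes its own U+FFFD ...
ascii_text = decode_log_text(raw, encoding="ascii")

# ... and U+FFFD cannot be encoded back to ASCII.
try:
    ascii_text.encode("ascii")
    ascii_round_trip_ok = True
except UnicodeEncodeError:
    ascii_round_trip_ok = False

# Why the encode is skipped on Python 3: bytes format with a b'' prefix.
formatted_bytes = "{line}".format(line=b"BLAH")

# Same format string as lx-dmesg, with a made-up timestamp in nanoseconds.
line = u"[{time:12.6f}] {line}\n".format(time=1234567890 / 1000000000.0,
                                          line=text)
write_arg = prepare_for_gdb_write(line)
```

On Python 3, write_arg is the unchanged unicode string, which matches why the patch restricts the explicit encode to Python 2.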