From: Daniel Latypov
Date: Fri, 8 Oct 2021 16:51:29 -0700
Subject: Re: [PATCH] kunit: tool: continue past invalid utf-8 output
To: brendanhiggins@google.com, davidgow@google.com
Cc: linux-kernel@vger.kernel.org, kunit-dev@googlegroups.com, linux-kselftest@vger.kernel.org, skhan@linuxfoundation.org
In-Reply-To: <20211008210752.1109785-1-dlatypov@google.com>

On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov wrote:
>
> kunit.py currently crashes and fails to parse kernel output if it's not
> fully valid utf-8.
>
> This can come from memory corruption or just inadvertently printing
> out binary data as strings.
>
> E.g. adding this line into a kunit test
>   pr_info("\x80")
> will cause this exception
>   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
>
> We can tell Python how to handle errors, see
> https://docs.python.org/3/library/codecs.html#error-handlers
>
> Unfortunately, it doesn't seem like there's a way to specify this in
> just one location, so we need to repeat ourselves quite a bit.
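For anyone unfamiliar with the error handlers linked above, here is a minimal sketch of how `strict` (the default), `replace`, and `backslashreplace` treat the stray 0x80 byte. The `RAW` sample and the `decode_lines` helper are hypothetical, made up for illustration; they are not part of kunit.py.

```python
from typing import List

# Hypothetical kernel output as raw bytes: two valid TAP lines around the
# stray 0x80 byte that pr_info("\x80") would emit.
RAW = b"ok 1 - example_test\n\x80\nok 2 - other_test\n"


def decode_lines(raw: bytes, errors: str) -> List[str]:
    """Decode raw bytes as UTF-8 with the given codec error handler."""
    return raw.decode("utf-8", errors=errors).splitlines()


if __name__ == "__main__":
    try:
        decode_lines(RAW, "strict")  # default handler: raises on 0x80
    except UnicodeDecodeError as e:
        print("strict:", e.reason)  # invalid start byte
    # 'replace' substitutes U+FFFD, losing the original byte value.
    print("replace:", ascii(decode_lines(RAW, "replace")))
    # 'backslashreplace' keeps the byte visible as the 4-char text \x80,
    # and the surrounding TAP lines still decode normally.
    print("backslashreplace:", decode_lines(RAW, "backslashreplace"))
```

The appeal of `backslashreplace` over `replace` is that the offending byte value survives in the output, which helps when tracking down what printed it.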
>
> Specify `errors='backslashreplace'` so we instead:
> * print out the offending byte as '\x80'
> * try and continue parsing the output.
>   * as long as the TAP lines themselves are valid, we're fine.
>
> Signed-off-by: Daniel Latypov
> ---
>  tools/testing/kunit/kunit.py        | 3 ++-
>  tools/testing/kunit/kunit_kernel.py | 4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> index 9c9ed4071e9e..28ae096d4b53 100755
> --- a/tools/testing/kunit/kunit.py
> +++ b/tools/testing/kunit/kunit.py
> @@ -457,9 +457,10 @@ def main(argv, linux=None):
>  			sys.exit(1)
>  	elif cli_args.subcommand == 'parse':
>  		if cli_args.file == None:
> +			sys.stdin.reconfigure(errors='backslashreplace')

Ugh, pytype doesn't like this even though it's valid.
I can squash the error with

  sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error

I had wanted us to avoid having anything specific to pytype in the code.
But mypy (the more common typechecker, iirc) hasn't been smart enough to
typecheck our code since the QEMU support landed.
If we don't add this directive, both typecheckers will report at least
one spurious warning.

Should I go ahead and add it, Brendan/David?
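One possible way to keep the `reconfigure()` call without a pytype-specific comment, offered as a sketch rather than something checked against the pytype/mypy versions kunit uses: typeshed declares `sys.stdin` as `typing.TextIO`, which has no `reconfigure()`, hence the attribute error. Narrowing with `isinstance(..., io.TextIOWrapper)` should let a checker accept the call. The helper name `set_decode_errors` is hypothetical.

```python
import io
from typing import TextIO


def set_decode_errors(stream: TextIO, errors: str = "backslashreplace") -> TextIO:
    """Switch a text stream's decode error handler when the stream supports it.

    typing.TextIO has no reconfigure(); the isinstance() check narrows the
    type to io.TextIOWrapper, where reconfigure() exists (Python 3.7+).
    Streams of other types (e.g. an io.StringIO) are returned unchanged.
    """
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(errors=errors)
    return stream


if __name__ == "__main__":
    # In the 'parse' path this would be: kunit_output = set_decode_errors(sys.stdin)
    # Self-contained demo: a TextIOWrapper over bytes containing 0x80.
    demo = io.TextIOWrapper(io.BytesIO(b"ok 1 - t\n\x80\n"), encoding="utf-8")
    print(set_decode_errors(demo).read().splitlines())
```

The trade-off is that the call silently does nothing if `sys.stdin` has been replaced by a non-`TextIOWrapper` object, which seems acceptable since such a stream is already text and won't hit the decode error.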

>  			kunit_output = sys.stdin
>  		else:
> -			with open(cli_args.file, 'r') as f:
> +			with open(cli_args.file, 'r', errors='backslashreplace') as f:
>  				kunit_output = f.read().splitlines()
>  		request = KunitParseRequest(cli_args.raw_output,
>  					    None,
> diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> index faa6320e900e..f08c6c36a947 100644
> --- a/tools/testing/kunit/kunit_kernel.py
> +++ b/tools/testing/kunit/kunit_kernel.py
> @@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
>  					stdin=subprocess.PIPE,
>  					stdout=subprocess.PIPE,
>  					stderr=subprocess.STDOUT,
> -					text=True, shell=True)
> +					text=True, shell=True, errors='backslashreplace')
>
>  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
>  	"""An abstraction over command line operations performed on a source tree."""
> @@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
>  					stdin=subprocess.PIPE,
>  					stdout=subprocess.PIPE,
>  					stderr=subprocess.STDOUT,
> -					text=True)
> +					text=True, errors='backslashreplace')
>
>  def get_kconfig_path(build_dir) -> str:
>  	return get_file_path(build_dir, KCONFIG_PATH)
>
> base-commit: a032094fc1ed17070df01de4a7883da7bb8d5741
> --
> 2.33.0.882.g93a45727a2-goog
>
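The same `errors=` argument covers the other two places the patch touches: `open()` for `kunit.py parse <file>`, and `subprocess.Popen(..., text=True)` for the UML/QEMU runs. A minimal standalone sketch follows; `read_log`, `run_and_capture`, and the child command that fakes a kernel printing 0x80 are hypothetical stand-ins, not kunit code. Unlike the patch, it passes `encoding='utf-8'` explicitly so the result does not depend on the locale's default encoding.

```python
import os
import subprocess
import sys
import tempfile
from typing import List

# Hypothetical child process: one valid TAP line, then the invalid byte 0x80,
# standing in for a kernel that ran pr_info("\x80").
CHILD = [sys.executable, "-c",
         r"import sys; sys.stdout.buffer.write(b'ok 1 - t\n\x80\n')"]


def read_log(path: str) -> List[str]:
    """Read a saved kernel log, keeping invalid UTF-8 visible as \\xNN."""
    with open(path, "r", encoding="utf-8", errors="backslashreplace") as f:
        return f.read().splitlines()


def run_and_capture(cmd: List[str]) -> List[str]:
    """Run a command and decode its merged stdout/stderr the same way."""
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True, encoding="utf-8",
                            errors="backslashreplace")
    out, _ = proc.communicate()
    return out.splitlines()


if __name__ == "__main__":
    print(run_and_capture(CHILD))

    # Write the same bytes to a temporary file and parse it back.
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(b"ok 1 - t\n\x80\n")
    try:
        print(read_log(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Without `errors=`, both calls would raise `UnicodeDecodeError` on the 0x80 byte, which is the crash described in the commit message.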