From: Daniel Latypov
Date: Wed, 13 Oct 2021 09:51:56 -0700
Subject: Re: [PATCH] kunit: tool: continue past invalid utf-8 output
To: brendanhiggins@google.com, davidgow@google.com
Cc: linux-kernel@vger.kernel.org, kunit-dev@googlegroups.com, linux-kselftest@vger.kernel.org, skhan@linuxfoundation.org
References: <20211008210752.1109785-1-dlatypov@google.com>

On Fri, Oct 8, 2021 at 4:51 PM Daniel Latypov wrote:
>
> On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov wrote:
> >
> > kunit.py currently crashes and fails to parse kernel output if it's not
> > fully valid utf-8.
> >
> > This can come from memory corruption or just inadvertently printing
> > out binary data as strings.
> >
> > E.g. adding this line into a kunit test
> >   pr_info("\x80")
> > will cause this exception
> >   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
> >
> > We can tell Python how to handle errors, see
> > https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > Unfortunately, it doesn't seem like there's a way to specify this in
> > just one location, so we need to repeat ourselves quite a bit.
> >
> > Specify `errors='backslashreplace'` so we instead:
> > * print out the offending byte as '\x80'
> > * try and continue parsing the output.
> > * as long as the TAP lines themselves are valid, we're fine.
> >
> > Signed-off-by: Daniel Latypov
> > ---
> >  tools/testing/kunit/kunit.py | 3 ++-
> >  tools/testing/kunit/kunit_kernel.py | 4 ++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> > index 9c9ed4071e9e..28ae096d4b53 100755
> > --- a/tools/testing/kunit/kunit.py
> > +++ b/tools/testing/kunit/kunit.py
> > @@ -457,9 +457,10 @@ def main(argv, linux=None):
> >                         sys.exit(1)
> >         elif cli_args.subcommand == 'parse':
> >                 if cli_args.file == None:
> > +                       sys.stdin.reconfigure(errors='backslashreplace')
>
> Ugh, pytype doesn't like this even though it's valid.
> I can squash the error with
>   sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error
>
> I had wanted us to avoid having anything specific to pytype in the code.
> But mypy (the more common typechecker iirc) hasn't been smart enough
> to typecheck our code since the QEMU support landed.
>
> If we don't add this directive, both typecheckers will report at least
> one spurious warning.
> Should I go ahead and add it, Brendan/David?

Friendly ping.
Should we go ahead and add "# pytype: disable=attribute-error" here?

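For reference, a minimal sketch of what the reconfigure() call changes at runtime. It uses an in-memory stream in place of sys.stdin; the BytesIO stand-in and the sample TAP lines are made up for illustration and are not taken from kunit.py. It says nothing about how pytype or mypy treat the call.

```python
import io

# Hypothetical stand-in for sys.stdin: kernel output with one stray 0x80 byte.
raw = io.BytesIO(b"ok 1 - example_test\n# junk: \x80\nok 2 - other_test\n")
stream = io.TextIOWrapper(raw, encoding="utf-8")

# Switch the error handler before anything has been read from the stream.
stream.reconfigure(errors="backslashreplace")

# The invalid byte comes through as the four characters \x80 instead of
# raising UnicodeDecodeError, so the surrounding lines stay parseable.
lines = stream.read().splitlines()
print(lines)
```
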
>
> >                         kunit_output = sys.stdin
> >                 else:
> > -                       with open(cli_args.file, 'r') as f:
> > +                       with open(cli_args.file, 'r', errors='backslashreplace') as f:
> >                                 kunit_output = f.read().splitlines()
> >                 request = KunitParseRequest(cli_args.raw_output,
> >                                             None,
> > diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> > index faa6320e900e..f08c6c36a947 100644
> > --- a/tools/testing/kunit/kunit_kernel.py
> > +++ b/tools/testing/kunit/kunit_kernel.py
> > @@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> >                                         stdin=subprocess.PIPE,
> >                                         stdout=subprocess.PIPE,
> >                                         stderr=subprocess.STDOUT,
> > -                                       text=True, shell=True)
> > +                                       text=True, shell=True, errors='backslashreplace')
> >
> >  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> >         """An abstraction over command line operations performed on a source tree."""
> > @@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> >                                         stdin=subprocess.PIPE,
> >                                         stdout=subprocess.PIPE,
> >                                         stderr=subprocess.STDOUT,
> > -                                       text=True)
> > +                                       text=True, errors='backslashreplace')
> >
> >  def get_kconfig_path(build_dir) -> str:
> >         return get_file_path(build_dir, KCONFIG_PATH)
> >
> > base-commit: a032094fc1ed17070df01de4a7883da7bb8d5741
> > --
> > 2.33.0.882.g93a45727a2-goog
> >
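
To illustrate the open() half of the change, here is a small standalone sketch, assuming a log file that contains one invalid byte. The file contents and variable names are hypothetical; only the `errors='backslashreplace'` argument comes from the patch.

```python
import os
import tempfile

# Hypothetical log contents: valid TAP lines around one invalid UTF-8 byte.
data = b"TAP version 14\n1..1\n# stray byte: \x80\nok 1 - example\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Default error handling ('strict') raises on the 0x80 byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 'backslashreplace' keeps decoding and renders the byte as the text \x80.
    with open(path, "r", encoding="utf-8", errors="backslashreplace") as f:
        lines = f.read().splitlines()
finally:
    os.unlink(path)

print(strict_failed, lines)
```

The TAP lines on either side of the bad byte survive intact, which is the property the commit message relies on.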