From: Daniel Latypov
Date: Wed, 20 Oct 2021 16:22:13 -0700
Subject: Re: [PATCH] kunit: tool: continue past invalid utf-8 output
To: brendanhiggins@google.com, davidgow@google.com
Cc: linux-kernel@vger.kernel.org, kunit-dev@googlegroups.com, linux-kselftest@vger.kernel.org, skhan@linuxfoundation.org
References: <20211008210752.1109785-1-dlatypov@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 13, 2021 at 9:51 AM Daniel Latypov wrote:
>
> On Fri, Oct 8, 2021 at 4:51 PM Daniel Latypov wrote:
> >
> > On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov wrote:
> > >
> > > kunit.py currently crashes and fails to parse kernel output if it's not
> > > fully valid utf-8.
> > >
> > > This can come from memory corruption or just inadvertently printing
> > > out binary data as strings.
> > >
> > > E.g. adding this line into a kunit test
> > >   pr_info("\x80")
> > > will cause this exception
> > >   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
> > >
> > > We can tell Python how to handle errors, see
> > > https://docs.python.org/3/library/codecs.html#error-handlers
> > >
> > > Unfortunately, it doesn't seem like there's a way to specify this in
> > > just one location, so we need to repeat ourselves quite a bit.
> > >
> > > Specify `errors='backslashreplace'` so we instead:
> > > * print out the offending byte as '\x80'
> > > * try and continue parsing the output.
> > >   * as long as the TAP lines themselves are valid, we're fine.
> > >
> > > Signed-off-by: Daniel Latypov
> > > ---
> > >  tools/testing/kunit/kunit.py        | 3 ++-
> > >  tools/testing/kunit/kunit_kernel.py | 4 ++--
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> > > index 9c9ed4071e9e..28ae096d4b53 100755
> > > --- a/tools/testing/kunit/kunit.py
> > > +++ b/tools/testing/kunit/kunit.py
> > > @@ -457,9 +457,10 @@ def main(argv, linux=None):
> > >  			sys.exit(1)
> > >  	elif cli_args.subcommand == 'parse':
> > >  		if cli_args.file == None:
> > > +			sys.stdin.reconfigure(errors='backslashreplace')
> >
> > Ugh, pytype doesn't like this even though it's valid.
> > I can squash the error with
> >   sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error
> >
> > I had wanted us to avoid having anything specific to pytype in the code.
> > But mypy (the more common typechecker iirc) hasn't been smart enough
> > to typecheck our code since the QEMU support landed.
> >
> > If we don't add this directive, both typecheckers will report at least
> > one spurious warning.
> > Should I go ahead and add it, Brendan/David?
>
> Friendly ping.
> Should we go ahead and add "# pytype: disable=attribute-error" here?

I've sent out a v2 that adds this directive:
https://lore.kernel.org/linux-kselftest/20211020232121.1748376-1-dlatypov@google.com

> > >
> > >  			kunit_output = sys.stdin
> > >  		else:
> > > -			with open(cli_args.file, 'r') as f:
> > > +			with open(cli_args.file, 'r', errors='backslashreplace') as f:
> > >  				kunit_output = f.read().splitlines()
> > >  		request = KunitParseRequest(cli_args.raw_output,
> > >  					    None,
> > > diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> > > index faa6320e900e..f08c6c36a947 100644
> > > --- a/tools/testing/kunit/kunit_kernel.py
> > > +++ b/tools/testing/kunit/kunit_kernel.py
> > > @@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> > >  					   stdin=subprocess.PIPE,
> > >  					   stdout=subprocess.PIPE,
> > >  					   stderr=subprocess.STDOUT,
> > > -					   text=True, shell=True)
> > > +					   text=True, shell=True, errors='backslashreplace')
> > >
> > >  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > >  	"""An abstraction over command line operations performed on a source tree."""
> > > @@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > >  					   stdin=subprocess.PIPE,
> > >  					   stdout=subprocess.PIPE,
> > >  					   stderr=subprocess.STDOUT,
> > > -					   text=True)
> > > +					   text=True, errors='backslashreplace')
> > >
> > >  def get_kconfig_path(build_dir) -> str:
> > >  	return get_file_path(build_dir, KCONFIG_PATH)
> > >
> > > base-commit: a032094fc1ed17070df01de4a7883da7bb8d5741
> > > --
> > > 2.33.0.882.g93a45727a2-goog
> > >
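
For anyone following along, here is a minimal standalone sketch (not part of the patch) of what the `errors='backslashreplace'` handler does compared with the default strict decoding. The sample bytes and variable names are made up for illustration; the only assumption is that the kernel output contains a stray 0x80 byte, as in the pr_info("\x80") example quoted above.

```python
import io

# Hypothetical sample: TAP-like output with one stray 0x80 byte in the middle.
raw = b"ok 1 - example_test\n\x80 stray byte\nok 2 - other_test\n"

# Default (strict) decoding raises on the invalid start byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With backslashreplace, the bad byte becomes the four characters \x80
# and decoding continues with the rest of the input.
decoded = raw.decode("utf-8", errors="backslashreplace")
lines = decoded.splitlines()

# The same handler can be attached to a text stream, which is roughly what
# open(..., errors=...) and sys.stdin.reconfigure(errors=...) set up.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                          errors="backslashreplace")
stream_lines = stream.read().splitlines()
```

The valid lines before and after the bad byte come through unchanged, so a line-oriented parser could presumably keep going as long as the TAP lines themselves are intact.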