From: Brendan Higgins
Date: Mon, 25 Oct 2021 14:29:08 -0700
Subject: Re: [PATCH] kunit: tool: continue past invalid utf-8 output
To: Daniel Latypov
Cc: davidgow@google.com, linux-kernel@vger.kernel.org, kunit-dev@googlegroups.com, linux-kselftest@vger.kernel.org, skhan@linuxfoundation.org
References: <20211008210752.1109785-1-dlatypov@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 13, 2021 at 9:52 AM Daniel Latypov wrote:
>
> On Fri, Oct 8, 2021 at 4:51 PM Daniel Latypov wrote:
> >
> > On Fri, Oct 8, 2021 at 2:08 PM Daniel Latypov wrote:
> > >
> > > kunit.py currently crashes and fails to parse kernel output if it's not
> > > fully valid utf-8.
> > >
> > > This can come from memory corruption or just inadvertently printing
> > > out binary data as strings.
> > >
> > > E.g.
> > > adding this line into a kunit test
> > >   pr_info("\x80")
> > > will cause this exception
> > >   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1961: invalid start byte
> > >
> > > We can tell Python how to handle errors, see
> > > https://docs.python.org/3/library/codecs.html#error-handlers
> > >
> > > Unfortunately, it doesn't seem like there's a way to specify this in
> > > just one location, so we need to repeat ourselves quite a bit.
> > >
> > > Specify `errors='backslashreplace'` so we instead:
> > > * print out the offending byte as '\x80'
> > > * try and continue parsing the output.
> > > * as long as the TAP lines themselves are valid, we're fine.
> > >
> > > Signed-off-by: Daniel Latypov
> > > ---
> > >  tools/testing/kunit/kunit.py        | 3 ++-
> > >  tools/testing/kunit/kunit_kernel.py | 4 ++--
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> > > index 9c9ed4071e9e..28ae096d4b53 100755
> > > --- a/tools/testing/kunit/kunit.py
> > > +++ b/tools/testing/kunit/kunit.py
> > > @@ -457,9 +457,10 @@ def main(argv, linux=None):
> > >  			sys.exit(1)
> > >  	elif cli_args.subcommand == 'parse':
> > >  		if cli_args.file == None:
> > > +			sys.stdin.reconfigure(errors='backslashreplace')
> >
> > Ugh, pytype doesn't like this even though it's valid.
> > I can squash the error with
> >   sys.stdin.reconfigure(errors='backslashreplace')  # pytype: disable=attribute-error
> >
> > I had wanted us to avoid having anything specific to pytype in the code.
> > But mypy (the more common typechecker iirc) hasn't been smart enough
> > to typecheck our code since the QEMU support landed.
> >
> > If we don't add this directive, both typecheckers will report at least
> > one spurious warning.
> > Should I go ahead and add it, Brendan/David?
>
> Friendly ping.
> Should we go ahead and add "# pytype: disable=attribute-error" here?

Sorry, missed this.
Yeah, I am fine with disabling the type checkers if they fail to
understand valid code.

> > >  			kunit_output = sys.stdin
> > >  		else:
> > > -			with open(cli_args.file, 'r') as f:
> > > +			with open(cli_args.file, 'r', errors='backslashreplace') as f:
> > >  				kunit_output = f.read().splitlines()
> > >  		request = KunitParseRequest(cli_args.raw_output,
> > >  					    None,
> > > diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> > > index faa6320e900e..f08c6c36a947 100644
> > > --- a/tools/testing/kunit/kunit_kernel.py
> > > +++ b/tools/testing/kunit/kunit_kernel.py
> > > @@ -135,7 +135,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> > >  					   stdin=subprocess.PIPE,
> > >  					   stdout=subprocess.PIPE,
> > >  					   stderr=subprocess.STDOUT,
> > > -					   text=True, shell=True)
> > > +					   text=True, shell=True, errors='backslashreplace')
> > >
> > >  class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > >  	"""An abstraction over command line operations performed on a source tree."""
> > > @@ -172,7 +172,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > >  					   stdin=subprocess.PIPE,
> > >  					   stdout=subprocess.PIPE,
> > >  					   stderr=subprocess.STDOUT,
> > > -					   text=True)
> > > +					   text=True, errors='backslashreplace')
> > >
> > >  def get_kconfig_path(build_dir) -> str:
> > >  	return get_file_path(build_dir, KCONFIG_PATH)
> > >
> > > base-commit: a032094fc1ed17070df01de4a7883da7bb8d5741
> > > --
> > > 2.33.0.882.g93a45727a2-goog
> > >
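
For anyone following the thread, here is a minimal standalone sketch of the behaviour the patch relies on. It is not kunit.py code: RAW_OUTPUT, decode_strict, decode_lenient and read_lines are hypothetical names made up for illustration, and the sample log is invented. It only shows that the default strict decode raises on a stray 0x80 byte, while errors='backslashreplace' keeps going and leaves the TAP lines intact.

```python
import tempfile
from typing import List

# Hypothetical kernel log: valid TAP lines around one stray 0x80 byte,
# similar to what pr_info("\x80") would produce.
RAW_OUTPUT = b'TAP version 14\n1..1\n# junk: \x80\nok 1 - example_test\n'


def decode_strict(raw: bytes) -> str:
    """Default handling: raises UnicodeDecodeError on invalid UTF-8."""
    return raw.decode('utf-8')


def decode_lenient(raw: bytes) -> str:
    """Replace each undecodable byte with a literal escape such as \\x80."""
    return raw.decode('utf-8', errors='backslashreplace')


def read_lines(path: str) -> List[str]:
    """Read a log file the way the patched 'parse' branch does, but from a path."""
    with open(path, 'r', encoding='utf-8', errors='backslashreplace') as f:
        return f.read().splitlines()


if __name__ == '__main__':
    try:
        decode_strict(RAW_OUTPUT)
    except UnicodeDecodeError as e:
        print('strict decode failed:', e)

    print(decode_lenient(RAW_OUTPUT))

    # Round-trip through a temporary file to exercise the open() variant.
    with tempfile.NamedTemporaryFile(suffix='.log', delete=False) as tmp:
        tmp.write(RAW_OUTPUT)
    print(read_lines(tmp.name))
```

The same errors= keyword is what the patch passes to the subprocess calls and to sys.stdin.reconfigure(), which is why it has to be repeated at each place where bytes become text.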