Date: Sun, 11 Dec 2022 11:52:18 +0900
From: Masami Hiramatsu (Google)
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, LKML, bpf, Borislav Petkov, Linus Torvalds, Andrew Morton, Peter Zijlstra, Kees Cook, Josh Poimboeuf, KP Singh, Mark Rutland, Florent Revest, Greg Kroah-Hartman, Christoph Hellwig, Chris Mason
Subject: Re: [PATCH v2] panic: Taint kernel if fault injection has been used
Message-Id: <20221211115218.2e6e289bb85f8cf53c11aa97@kernel.org>
In-Reply-To: <20221208043628.el5yykpjr4j45zqx@macbook-pro-6.dhcp.thefacebook.com>
References: <167019256481.3792653.4369637751468386073.stgit@devnote3>
 <20221204223001.6wea7cgkofjsiy2z@macbook-pro-6.dhcp.thefacebook.com>
 <20221205075921.02edfe6b54abc5c2f9831875@kernel.org>
 <20221206021700.oryt26otos7vpxjh@macbook-pro-6.dhcp.thefacebook.com>
 <20221206162035.97ae19674d6d17108bed1910@kernel.org>
 <20221207040146.zhm3kyduqp7kosqa@macbook-pro-6.dhcp.thefacebook.com>
 <20221206233947.4c27cc9d@gandalf.local.home>
 <20221207074806.6f869be2@gandalf.local.home>
 <20221208043628.el5yykpjr4j45zqx@macbook-pro-6.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alexei,

On Wed, 7 Dec 2022 20:36:28 -0800
Alexei Starovoitov wrote:

> Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> And that happens all the time on lkml.
> Somebody reports a bug and kernel devs
> jump on the poor person:
> "Can you repro without taint?",
> "Can you repro with upstream kernel?"
> This is discouraging.
> The 'taint' concept makes it easier for kernel devs to ignore bug reports
> and push back on the reporter.
> Do it few times and people stop reporting bugs.

That seems off topic to me. It sounds like you are complaining about the taint flag itself.

> Say, this particular bug in rethook was found by one of our BPF CI developers.
> They're not very familiar with the kernel, but they can see plenty of 'rethook'
> references in the stack trace, lookup MAINTAINER file and ping Massami,
> but to the question "can you repro without taint?" they can only say NO,
> because this is how our CI works. So they will keep silence and the bug will be lost.

BTW, this sounds like a design issue in the BPF CI system. If the user can NOT easily identify which test caused the issue (e.g. which tests ran on the system before the bug was found), the CI system is totally useless, because once a problem is found, it must be investigated in order to fix it. Without investigation, how would you usually fix the bug?

> That's not the only reason why I'm against generalizing 'taint'.
> Tainting because HW is misbehaving makes sense, but tainting because
> of OoO module or because of live-patching does not.
> It becomes an excuse that people abuse.

Yeah, it can be abused, but that is the problem of whoever abuses it.

> Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> turns error injection on to find those allocation issues.
> If syzbot reports will start coming as tainted there will be even less
> attention to them. That will not be good.

Hmm, what kind of error injection does syzbot do? I would like to know how it is used. For example, does it use only a specific set of injection points, or all existing points?
If the latter, I feel safer, because syzbot then ensures that all current ALLOW_ERROR_INJECTION() functions work with error injection. If not, we need to consider removing ALLOW_ERROR_INJECTION() from the functions which are not well tested (or adding this taint flag).

Documentation/fault-injection/fault-injection.rst has no explanation of ALLOW_ERROR_INJECTION(), but obviously the functions marked with ALLOW_ERROR_INJECTION() and their callers MUST be designed to be safe against error injection, e.g.:

- It must return an error code (so EI_ETYPE_NONE must be removed).
- The caller must always check the return value (but I thought this was the reason why we need this test framework...).
- It should not run any 'effective' code, for example incrementing a counter or calling other functions, before checking for an error (this means it can return without any side effect).

Anything else?

[...]

> All these years we've been working on improving bpf introspection and
> debuggability. Today crash dumps look like this:
> bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
> bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
> bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
> __bpf_prog_run include/linux/filter.h:600 [inline]
> bpf_prog_run include/linux/filter.h:607 [inline]
>
> The 2nd from the top is a bpf prog. The rest are kernel functions.
> bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
> ^^ is a prog tag ^^ name of bpf prog
>
> If you do 'bpftool prog show' you can see both tag and name.
> 'bpftool prog dump jited'
> dumps x86 code mixed with source line text.
> Often enough +0x2b offset will have some C code right next to it.

This is good, but it only works when a vmcore is dumped and the bpf prog is on the stack. My concern about function error injection is that it can leave side effects which cause a problem afterwards (i.e. after the bpf prog has been unloaded).

>
> One can monitor all prog load/unload via perf or via audit.
Ah, audit is helpful :), because we can dig through the log to see what was loaded before the crash.

Thank you,

-- 
Masami Hiramatsu (Google)
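The rules listed above for ALLOW_ERROR_INJECTION() functions can be illustrated with a minimal userspace C model. This is a hedged sketch, not kernel code: it assumes an injected error behaves like an override at function entry (the function body is skipped and the error value is returned, which is roughly how bpf_override_return() and fail_function act on such functions). The names injected_error, live_objects, safe_obj_create(), unsafe_obj_create(), safe_caller(), and unsafe_caller() are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model only (hypothetical, not a kernel API). injected_error
 * stands in for an error-injection override at function entry: when it
 * is non-zero, the rest of the function body is skipped and this value
 * is returned.
 */
static int injected_error;

/* Hypothetical shared state that must stay consistent across calls. */
static int live_objects;

/* Safe shape: returns 0 or -errno, and its only side effect
 * (live_objects++) happens after the last point where it can fail. */
static int safe_obj_create(void **out)
{
	if (injected_error)
		return injected_error;	/* models the injected early return */

	void *p = malloc(32);
	if (!p)
		return -ENOMEM;		/* natural failure: no side effect yet */

	live_objects++;
	*out = p;
	return 0;
}

/* Unsafe shape: bumps live_objects before it can fail, so its natural
 * failure path leaves state behind for the caller to undo. */
static int unsafe_obj_create(void **out)
{
	if (injected_error)
		return injected_error;	/* injected return skips live_objects++ */

	live_objects++;
	void *p = malloc(32);
	if (!p)
		return -ENOMEM;		/* counter stays incremented */

	*out = p;
	return 0;
}

/* Caller always checks the return value; nothing to undo on error. */
static int safe_caller(void)
{
	void *obj;
	int err = safe_obj_create(&obj);

	if (err)
		return err;

	free(obj);
	live_objects--;
	return 0;
}

/* Caller whose error cleanup only matches the natural failure path. */
static int unsafe_caller(void)
{
	void *obj;
	int err = unsafe_obj_create(&obj);

	if (err) {
		live_objects--;		/* wrong under injection: nothing was counted */
		return err;
	}

	free(obj);
	live_objects--;
	return 0;
}
```

With injected_error set to -ENOMEM, safe_caller() returns -ENOMEM and leaves live_objects at 0, while unsafe_caller() leaves it at -1: its cleanup was written for a natural failure path that had already bumped the counter, so skipping the body leaves the state inconsistent. In this model, that is the kind of side effect that can outlive whatever injected the error.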