Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp282203ybh; Wed, 11 Mar 2020 00:49:48 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuAoK4Efvt0XNp7ahr6naKX6b3xY/N9uzhomSfxnbS4SyHQJVeS5NtJite1DonnBiQLCKF5 X-Received: by 2002:a05:6808:56:: with SMTP id v22mr1020044oic.116.1583912987993; Wed, 11 Mar 2020 00:49:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1583912987; cv=none; d=google.com; s=arc-20160816; b=aqNilCecrs5dbgUGMEELjTx7wiZTjOe3mMIBnoAhKN09W0Ois90qYhlyiEGvhhr1tj y21T8d8ZRk+lmii0ISG8hXwfeqjJHnNcLyM1SNBtTCakO2X1i8Ur9AdEZ3wRHg0zqKPK W4aycjJ2JdIdu2xAXduLTYsDg/G31EQ0TZPnKI4ChN7PeispkanD6Ymna4dbQNHhvE08 VaL2mkIkUOCS6KUV6rQSBhLK2JpedyqzgBIfkSLKJakWP4PCBSNIGUXrTF59hiKlZ3K1 wX2IEYkhEfbAqtYCBsSlN8sCifLvXFEGtuWmVI/meAWGgbo3u6DO87b7Ew2gU2sjm6V6 +xRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=/eqecdMRJZLVMRGCtqKWrVrrguycwREkZ/S7PAnza7o=; b=nXiVaBcmFVH6d1r/Eq1423PV0ou4+u6ZZoBjuhX8WwwxuMlhnFOCZpXQnx7zd03QOV o71SSIav5mPg1xGjqc5XhiMfsIfyNkjq62CeZNyc/WMCFyqgvkV0jnT3w0ARJ7zI7tDu Dts5gmMCn1w5LJO9ulk8mCddgKdoPao3sp+ZmaKIdLGM111YAbQINb3v+zuZ46dlCm7Z 7kVOyrOKpH4ClbyCRZ98pIYyVDFCTulHk9NEvpfuclUP5dNPX7AmdHNnyBXF25Ujyf2+ 7asb50BEeXuBPgFDooDxvo1EAOR0yoXolUhKApg0HoVec8lPCdbmLDWClvwBPTUh+oxc sy/A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=MPM3l4NW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e8si769193otj.25.2020.03.11.00.49.35; Wed, 11 Mar 2020 00:49:47 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=MPM3l4NW; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728377AbgCKHsc (ORCPT + 99 others); Wed, 11 Mar 2020 03:48:32 -0400 Received: from mail.kernel.org ([198.145.29.99]:48086 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726160AbgCKHsc (ORCPT ); Wed, 11 Mar 2020 03:48:32 -0400 Received: from devnote2 (NE2965lan1.rev.em-net.ne.jp [210.141.244.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 11DAC206B7; Wed, 11 Mar 2020 07:48:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1583912911; bh=h63yO/1roKktV4/aGFgyvBaApa9Kyssf1SJdq2X5Omc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=MPM3l4NWa55jBhrfZOgMq+GGJK3rPcGnuS9FxXMhgvZBXnsJA1EFTx2dTVwovOb5M poXUheXyzwgGWPTUxdLBb3P24gEDdrI/dU+3qtF93S+TELZ4DjM6Wf8Fe63JzFpCzW 5Ffy+wqm1PyyDf+maUwfPDkZchIZAPYp4T/z4Eqg= Date: Wed, 11 Mar 2020 16:48:26 +0900 From: Masami Hiramatsu To: Mathieu Desnoyers Cc: rostedt , Thomas Gleixner , linux-kernel , Peter Zijlstra , Alexei Starovoitov , paulmck , "Joel Fernandes, Google" , Frederic Weisbecker , Jason Wessel Subject: Re: Instrumentation and RCU Message-Id: <20200311164826.0e7eab4bcf7d58b922d59ae9@kernel.org> In-Reply-To: <831351096.24668.1583887061530.JavaMail.zimbra@efficios.com> References: <87mu8p797b.fsf@nanos.tec.linutronix.de> <87fteh73sp.fsf@nanos.tec.linutronix.de> <20200310170951.87c29e9c1cfbddd93ccd92b3@kernel.org> <87pndk5tb4.fsf@nanos.tec.linutronix.de> <450878559.23455.1583854311078.JavaMail.zimbra@efficios.com> <20200310114657.099122fd@gandalf.local.home> <1760242532.23694.1583857291763.JavaMail.zimbra@efficios.com> <20200311091815.fce458348bb7641b60f600d9@kernel.org> <831351096.24668.1583887061530.JavaMail.zimbra@efficios.com> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.32; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 10 Mar 2020 20:37:41 -0400 (EDT) Mathieu Desnoyers wrote: > ----- On Mar 10, 2020, at 8:18 PM, Masami Hiramatsu mhiramat@kernel.org wrote: > [...] > > >> An approach where the "in_tracer" flag is tested and set by the instrumentation > >> (function tracer, kprobes, tracepoints) would work here. Let's say the beginning > >> of the int3 ISR is part of the code which is invisible to instrumentation, and > >> before we issue rcu_nmi_enter(), we handle the in_tracer flag: > >> > >> rcu_nmi_enter(); > >> > >> (recursion_ctx->in_tracer == false) > >> set recursion_ctx->in_tracer = true > >> do_int3() { > >> rcu_nmi_enter(); > >> > >> if (recursion_ctx->in_tracer == true) > >> iret > >> > >> We can change "in_tracer" for "in_breakpoint", "in_tracepoint" and > >> "in_function_trace" if we ever want to allow different types of instrumentation > >> to nest. I'm not sure whether this is useful or not through. > > > > Kprobes already has its own "in_kprobe" flag, and the recursion path is > > not so simple. Since the int3 replaces the original instruction, we have to > > execute the original instruction with single-step and fixup. > > > > This means it involves do_debug() too. Thus, we can not do iret directly > > from do_int3 like above, but if recursion happens, we have no way to > > recover to origonal execution path (and call BUG()). > > I think that all the code involved when hitting a breakpoint which would > be the minimal subset required to act as if the kprobe was not there in the > first place (single-step, fixup) should be hidden from kprobes > instrumentation. I suspect this is the current intent today with noprobe > annotations, but Thomas' proposal brings this a step further. > > However, any other kprobe code (and tracer callbacks) beyond that > minimalistic "effect-less" kprobe could be protected by a > per-recursion-context in_kprobe flag. Would you mean "in_kprobe" flag will prevent recursive execution of kprobes but not prevent other tracer like tracepoint? If so, it is already done I think. As I pointed, kprobe itself has in_kprobe like flag for checking re-entrance. Thus the kprobe handler can call the function which has a tracepoint safely. Anyway, I agree with you to port all kprobe int3/debug handling parts to the effect-less (offlimit) area, except for its pre/post handlers. > > As my previous email, I showed a patch which is something like > > "bust_kprobes()" for oops path. That is not safe but no other way to escape > > from this recursion hell. (Maybe we can try to call it instead of calling > > BUG() so that the kernel can continue to run, but I'm not sure we can > > safely make the pagetable to readonly again.) > > As long as we provide a minimalistic "effect-less" kprobe implementation > in a non-instrumentable section which can be used whenever we are in a > recursion scenario, I think we could achieve something recursion-free without > requiring a bust_kprobes() work-around. Yeah, I hope so. The bust_kprobes() is something like an emergency escape hammer which everyone hopes never be used :) Thank you, -- Masami Hiramatsu