Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp1561759ybf; Sun, 1 Mar 2020 12:18:47 -0800 (PST) X-Google-Smtp-Source: APXvYqyBO9nWEYhAwm4wRuTpOGhOcNMsPEDuiaCm+LP2VNwyjxjqQUGLmEa7mjxGaFajf04aJe0o X-Received: by 2002:aca:3857:: with SMTP id f84mr8998478oia.150.1583093927803; Sun, 01 Mar 2020 12:18:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583093927; cv=none; d=google.com; s=arc-20160816; b=Uzr18ESVZP3K5niNtOdQQmHpUhHiQ+q2I5vmRlxSqms/Sbw6Gt7Ooto5OPLudMI/hd DRoAG1XaTZvOQAU+d4qYwmqn0OsgmThlX1vlhBuQwa6AKpwszgRO4hjsE3sIw/IBf62m IVFTVQDlV0xSsXz7DoWXQ9TetLdnp8e3Vk3CVwME4R6Kt57gFpG/7uX+MfCy2chdvlp9 LLEjkabJT4oX+z4/zfiJqmMe1x1VmB6lSRMJApDwB/cRrb0lC6aZ+rkVIPbl65NDC2Ke 5HBnEF/8PVbvudw1omw0DWuSDHQSOJ4xpg7Vssl2b2BFCkon8zyqZB0L0VhJUMytuEId kZvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=hPTFSgjCeqBD465Xkqe+4xAFaLv2pHS7w/6QGlX4qr4=; b=eUbJPBQn7WIGh1MY5F3IOFaRY89HAvncM4ysR0cBN/2t11idRx7X1RDWaJ35IogxnJ ATWwfmE0kzGpo0ISbdGSlqUlxWVKnibv2R4fHFfuP+H98ZU/xtPFeoWoG0Ak7TmIXgRk 15yj0kePv/e44oook7KHLszDC9Xjz3TpB7vAL7Eum7JC0OMSQBBS0KDq26i1NhdUoGl3 uXGrVHyhRDeoS3BTR/ehjQS2v543+nPSIKnIITcmW00fReNFgIdcVWLQDsFaTIKKRYhm xkcqDxpPi2rv6XrCdyg/aYANaHnl8odJq0cdnjtIr813Yox0HIdTs+P5zNL3/s+f4f8o BQFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=KgSpdzY6; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f18si5394103otf.166.2020.03.01.12.18.35; Sun, 01 Mar 2020 12:18:47 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=KgSpdzY6; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbgCAUSJ (ORCPT + 99 others); Sun, 1 Mar 2020 15:18:09 -0500 Received: from mail.kernel.org ([198.145.29.99]:38150 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726050AbgCAUSJ (ORCPT ); Sun, 1 Mar 2020 15:18:09 -0500 Received: from paulmck-ThinkPad-P72.home (50-39-105-78.bvtn.or.frontiernet.net [50.39.105.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DE241246BF; Sun, 1 Mar 2020 20:18:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1583093887; bh=KzkSFo/GsGp5GsjGhDDykjWPGoPV/ln8s1shF/RLaEs=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=KgSpdzY6dkTXpUsm16sYLGn7BPAcfwJmrisL9EU/ZiQ1bLGFhDfhHk3tEjZ03YTMH za7TMRiFh4zxlNI6zBegkbOowCKsCqTqMlAx4NNDhl4kfJFUk7uD4Isw/swPViu/wc 1U0977bq3kdjNnM91bVBghTmVgwaIIWwwYDiJNvw= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id A2E58352272F; Sun, 1 Mar 2020 12:18:07 -0800 (PST) Date: Sun, 1 Mar 2020 12:18:07 -0800 From: "Paul E. McKenney" To: Andy Lutomirski Cc: Andy Lutomirski , Thomas Gleixner , Steven Rostedt , Peter Zijlstra , LKML , X86 ML , Brian Gerst , Juergen Gross , Paolo Bonzini , Arnd Bergmann Subject: Re: [patch 4/8] x86/entry: Move irq tracing on syscall entry to C-code Message-ID: <20200301201807.GZ2935@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200301193034.GY2935@paulmck-ThinkPad-P72> <5BCFDB36-26B6-4881-94D9-4AB0731F8DC5@amacapital.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5BCFDB36-26B6-4881-94D9-4AB0731F8DC5@amacapital.net> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Mar 01, 2020 at 11:39:42AM -0800, Andy Lutomirski wrote: > > > > On Mar 1, 2020, at 11:30 AM, Paul E. McKenney wrote: > > > > On Sun, Mar 01, 2020 at 10:54:23AM -0800, Andy Lutomirski wrote: > >>> On Sun, Mar 1, 2020 at 10:26 AM Paul E. McKenney wrote: > >>> > >>> On Sun, Mar 01, 2020 at 07:12:25PM +0100, Thomas Gleixner wrote: > >>>> Andy Lutomirski writes: > >>>>> On Sun, Mar 1, 2020 at 7:21 AM Thomas Gleixner wrote: > >>>>>> Andy Lutomirski writes: > >>>>>>>> On Mar 1, 2020, at 2:16 AM, Thomas Gleixner wrote: > >>>>>>>> Ok, but for the time being anything before/after CONTEXT_KERNEL is unsafe > >>>>>>>> except trace_hardirq_off/on() as those trace functions do not allow to > >>>>>>>> attach anything AFAICT. > >>>>>>> > >>>>>>> Can you point to whatever makes those particular functions special? I > >>>>>>> failed to follow the macro maze. > >>>>>> > >>>>>> Those are not tracepoints and not going through the macro maze. See > >>>>>> kernel/trace/trace_preemptirq.c > >>>>> > >>>>> That has: > >>>>> > >>>>> void trace_hardirqs_on(void) > >>>>> { > >>>>> if (this_cpu_read(tracing_irq_cpu)) { > >>>>> if (!in_nmi()) > >>>>> trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > >>>>> tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1); > >>>>> this_cpu_write(tracing_irq_cpu, 0); > >>>>> } > >>>>> > >>>>> lockdep_hardirqs_on(CALLER_ADDR0); > >>>>> } > >>>>> EXPORT_SYMBOL(trace_hardirqs_on); > >>>>> NOKPROBE_SYMBOL(trace_hardirqs_on); > >>>>> > >>>>> But this calls trace_irq_enable_rcuidle(), and that's the part of the > >>>>> macro maze I got lost in. I found: > >>>>> > >>>>> #ifdef CONFIG_TRACE_IRQFLAGS > >>>>> DEFINE_EVENT(preemptirq_template, irq_disable, > >>>>> TP_PROTO(unsigned long ip, unsigned long parent_ip), > >>>>> TP_ARGS(ip, parent_ip)); > >>>>> > >>>>> DEFINE_EVENT(preemptirq_template, irq_enable, > >>>>> TP_PROTO(unsigned long ip, unsigned long parent_ip), > >>>>> TP_ARGS(ip, parent_ip)); > >>>>> #else > >>>>> #define trace_irq_enable(...) > >>>>> #define trace_irq_disable(...) > >>>>> #define trace_irq_enable_rcuidle(...) > >>>>> #define trace_irq_disable_rcuidle(...) > >>>>> #endif > >>>>> > >>>>> But the DEFINE_EVENT doesn't have the "_rcuidle" part. And that's > >>>>> where I got lost in the macro maze. I looked at the gcc asm output, > >>>>> and there is, indeed: > >>>> > >>>> DEFINE_EVENT > >>>> DECLARE_TRACE > >>>> __DECLARE_TRACE > >>>> __DECLARE_TRACE_RCU > >>>> static inline void trace_##name##_rcuidle(proto) > >>>> __DO_TRACE > >>>> if (rcuidle) > >>>> .... > >>>> > >>>>> But I also don't see why this is any different from any other tracepoint. > >>>> > >>>> Indeed. I took a wrong turn at some point in the macro jungle :) > >>>> > >>>> So tracing itself is fine, but then if you have probes or bpf programs > >>>> attached to a tracepoint these use rcu_read_lock()/unlock() which is > >>>> obviosly wrong in rcuidle context. > >>> > >>> Definitely, any such code needs to use tricks similar to that of the > >>> tracing code. Or instead use something like SRCU, which is OK with > >>> readers from idle. Or use something like Steve Rostedt's workqueue-based > >>> approach, though please be very careful with this latter, lest the > >>> battery-powered embedded guys come after you for waking up idle CPUs > >>> too often. ;-) > >> > >> Are we okay if we somehow ensure that all the entry code before > >> enter_from_user_mode() only does rcuidle tracing variants and has > >> kprobes off? Including for BPF use cases? > > > > That would work, though if BPF used SRCU instead of RCU, this would > > be unnecessary. Sadly, SRCU has full memory barriers in each of > > srcu_read_lock() and srcu_read_unlock(), but we are working on it. > > (As always, no promises!) > > > >> It would be *really* nice if we could statically verify this, as has > >> been mentioned elsewhere in the thread. It would also probably be > >> good enough if we could do it at runtime. Maybe with lockdep on, we > >> verify rcu state in tracepoints even if the tracepoint isn't active? > >> And we could plausibly have some widget that could inject something > >> into *every* kprobeable function to check rcu state. > > > > Or just have at least one testing step that activates all tracepoints, > > but with lockdep enabled? > > Also kprobe. I didn't realize we could test all kprobe points easily, but yes, that would also be good. > I don’t suppose we could make notrace imply nokprobe. Then all kprobeable functions would also have entry/exit tracepoints, right? On this, I must defer to the tracing and kprobes people. Thanx, Paul