Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp1794222ybf; Sun, 1 Mar 2020 18:19:49 -0800 (PST) X-Google-Smtp-Source: APXvYqyqyigcTHeJRWUjH6SCB4i+VQSGCOyW57QJDcIoXGV4OvYb4rG9yr1bj9hs1h/uNFWer3Zs X-Received: by 2002:aca:120e:: with SMTP id 14mr9609762ois.135.1583115589552; Sun, 01 Mar 2020 18:19:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583115589; cv=none; d=google.com; s=arc-20160816; b=iFNk/U8e/9ActrNBrBvV1ho4kHfRviBW2+sAr2xZTjLHTY4RGOmCby9+209hk0UWfY sClQ8muQ73NSWIgEofnBALHaJytgDvwOs/xcNh+J5Kuv/QmK8Wk3Cbqex/tY9i+8aH+C TEJ+Ty0kB1gFt0PGBp55jMntVhBgROQpW49cccQYelFoYQooKlK9reZL47bNg9yN0ZZU nMfX7CEKOArftEbiUtmcQkp4mFg0kjka7wbs7GiMGw9VZHFqP6POm8PubY7S3RmJiodQ tl1veCp8iAb9s6+Wz4d4dT9m1Gkj3zFUyRbYPKFCtN2Itz9nY0t8L/BzqMjqIJqcyZjr ClKg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:in-reply-to:cc:references:message-id :date:subject:mime-version:from:content-transfer-encoding :dkim-signature; bh=kyo4kbhh23iAcYk7hy9gOFWO01bzwKixpj+i9q2tgto=; b=zqvDnHqn6aj8+0nUxoS/u6tp5RKwje89Qou5nDzYicB8VBqJXM6lHyfH3QTW8sCVth Z/Dgb8H0l+Zp4Vk9gUNi80gT9wObTtAuuDhutGFJuiJ6jQuz9ArmvRsENbfaEpZvFGbY e10L83EZWALESlx1uRrzrK5c6oCS8Ud87tcJQSwv7ZxwHs1qcOgRimI1Q5chjZHurYZ4 fQJJs7rCWSKtftWsmdh3rb5RBPQ7Oi0rbI+/CqIS9C1pMFLGabHo9CQ9gl7pgjqIqLPb kyEIqKq0tCqi9GStGrnhP/qXuIum7yAF3kA4gybAV6QIzbaC/lgWUksp9O6KJgrNFVoW IhRA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b=YqoxAvSj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 7si5146809oij.29.2020.03.01.18.19.37; Sun, 01 Mar 2020 18:19:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b=YqoxAvSj; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726889AbgCBCS6 (ORCPT + 99 others); Sun, 1 Mar 2020 21:18:58 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:34446 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726720AbgCBCS5 (ORCPT ); Sun, 1 Mar 2020 21:18:57 -0500 Received: by mail-pl1-f196.google.com with SMTP id j7so3573789plt.1 for ; Sun, 01 Mar 2020 18:18:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amacapital-net.20150623.gappssmtp.com; s=20150623; h=content-transfer-encoding:from:mime-version:subject:date:message-id :references:cc:in-reply-to:to; bh=kyo4kbhh23iAcYk7hy9gOFWO01bzwKixpj+i9q2tgto=; b=YqoxAvSjjRUoND1kBmgCsMhlNsOVFG/AHQzyxjQB/iuN0//njUMMZhs674kYe0AuLZ yhkxukGBbZqpXbnWe9lE/XNvOiHqr7y5mTcPo2QrrFDjc8yXhhA2vRv2l/KuDtQPoJSS 0NdXu8OkiF4IoiCIBwL69HNiMAWis7s0xphyQi1HMPeK72FH+D+rCDFZu32zsqlzryKb 58aCb4ouAvS3ecvC2zXaGtYjbjZ+fNw2dVYy67ouw6qvyN3y9IQnN1+QoBDpFyeihiJt wLLWlW48jsUjiuUS5fvoLtvNQ/lUd3O8qDK+lhvTRD1ekCXKLLN7txBBjD6+HFKjf256 SxLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:content-transfer-encoding:from:mime-version :subject:date:message-id:references:cc:in-reply-to:to; bh=kyo4kbhh23iAcYk7hy9gOFWO01bzwKixpj+i9q2tgto=; b=AtP79kTaF6ADrs9wvCpjoRGt1xK1eMl+D/X7j9bJiX2RwG9Zf9W0sDw4LAE5LPo3BL xFLdIiIptiS8B0/OH6qcQiPyJyjs9o27Z0DoH8Jwe1U2gzdwIMLlX1V6WNuApjVSiWbA oQBYYRWLZ60rGzCOQVh3YaFSewNqxX5O1Dv7mjtMwHfO59w+cZnAjXSy1uIDWKSbrQYi 2cVxi4/gI7ozSXevRWa3VKkyIlvOnZ3lX03RhwouXvJobfmCtghZc80FMdbK87+U9B9/ XibZ5zdLo0x/bHFJRrlOrDyNsz11hFiB6jC6ZHSjwrL4hekdDpEJeawcIJGYnqvwZ+an Si9w== X-Gm-Message-State: APjAAAWGOcXLLdPenIp/QOD4ieoemYfRnjwRNKZsJCrLvTLXDHJ7dtKQ PDCTBPkB9CZs8NGCt499Ce8pDg== X-Received: by 2002:a17:902:8b81:: with SMTP id ay1mr14588830plb.275.1583115536430; Sun, 01 Mar 2020 18:18:56 -0800 (PST) Received: from ?IPv6:2600:1010:b069:8a27:f0bb:e3c9:d875:6efc? ([2600:1010:b069:8a27:f0bb:e3c9:d875:6efc]) by smtp.gmail.com with ESMTPSA id c5sm18456659pfi.10.2020.03.01.18.18.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 01 Mar 2020 18:18:55 -0800 (PST) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Andy Lutomirski Mime-Version: 1.0 (1.0) Subject: Re: [patch 4/8] x86/entry: Move irq tracing on syscall entry to C-code Date: Sun, 1 Mar 2020 18:18:51 -0800 Message-Id: References: <20200302011018.GA161499@google.com> Cc: Andy Lutomirski , "Paul E. McKenney" , Thomas Gleixner , Steven Rostedt , Peter Zijlstra , LKML , X86 ML , Brian Gerst , Juergen Gross , Paolo Bonzini , Arnd Bergmann In-Reply-To: <20200302011018.GA161499@google.com> To: Joel Fernandes X-Mailer: iPhone Mail (17D50) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Mar 1, 2020, at 5:10 PM, Joel Fernandes wrote:= >=20 > =EF=BB=BFOn Sun, Mar 01, 2020 at 10:54:23AM -0800, Andy Lutomirski wrote: >>> On Sun, Mar 1, 2020 at 10:26 AM Paul E. McKenney wr= ote: >>>=20 >>> On Sun, Mar 01, 2020 at 07:12:25PM +0100, Thomas Gleixner wrote: >>>> Andy Lutomirski writes: >>>>> On Sun, Mar 1, 2020 at 7:21 AM Thomas Gleixner wr= ote: >>>>>> Andy Lutomirski writes: >>>>>>>> On Mar 1, 2020, at 2:16 AM, Thomas Gleixner wr= ote: >>>>>>>> Ok, but for the time being anything before/after CONTEXT_KERNEL is u= nsafe >>>>>>>> except trace_hardirq_off/on() as those trace functions do not allow= to >>>>>>>> attach anything AFAICT. >>>>>>>=20 >>>>>>> Can you point to whatever makes those particular functions special? = I >>>>>>> failed to follow the macro maze. >>>>>>=20 >>>>>> Those are not tracepoints and not going through the macro maze. See >>>>>> kernel/trace/trace_preemptirq.c >>>>>=20 >>>>> That has: >>>>>=20 >>>>> void trace_hardirqs_on(void) >>>>> { >>>>> if (this_cpu_read(tracing_irq_cpu)) { >>>>> if (!in_nmi()) >>>>> trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_A= DDR1); >>>>> tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1); >>>>> this_cpu_write(tracing_irq_cpu, 0); >>>>> } >>>>>=20 >>>>> lockdep_hardirqs_on(CALLER_ADDR0); >>>>> } >>>>> EXPORT_SYMBOL(trace_hardirqs_on); >>>>> NOKPROBE_SYMBOL(trace_hardirqs_on); >>>>>=20 >>>>> But this calls trace_irq_enable_rcuidle(), and that's the part of the >>>>> macro maze I got lost in. I found: >>>>>=20 >>>>> #ifdef CONFIG_TRACE_IRQFLAGS >>>>> DEFINE_EVENT(preemptirq_template, irq_disable, >>>>> TP_PROTO(unsigned long ip, unsigned long parent_ip), >>>>> TP_ARGS(ip, parent_ip)); >>>>>=20 >>>>> DEFINE_EVENT(preemptirq_template, irq_enable, >>>>> TP_PROTO(unsigned long ip, unsigned long parent_ip), >>>>> TP_ARGS(ip, parent_ip)); >>>>> #else >>>>> #define trace_irq_enable(...) >>>>> #define trace_irq_disable(...) >>>>> #define trace_irq_enable_rcuidle(...) >>>>> #define trace_irq_disable_rcuidle(...) >>>>> #endif >>>>>=20 >>>>> But the DEFINE_EVENT doesn't have the "_rcuidle" part. And that's >>>>> where I got lost in the macro maze. I looked at the gcc asm output, >>>>> and there is, indeed: >>>>=20 >>>> DEFINE_EVENT >>>> DECLARE_TRACE >>>> __DECLARE_TRACE >>>> __DECLARE_TRACE_RCU >>>> static inline void trace_##name##_rcuidle(proto) >>>> __DO_TRACE >>>> if (rcuidle) >>>> .... >>>>=20 >>>>> But I also don't see why this is any different from any other tracepoi= nt. >>>>=20 >>>> Indeed. I took a wrong turn at some point in the macro jungle :) >>>>=20 >>>> So tracing itself is fine, but then if you have probes or bpf programs >>>> attached to a tracepoint these use rcu_read_lock()/unlock() which is >>>> obviosly wrong in rcuidle context. >>>=20 >>> Definitely, any such code needs to use tricks similar to that of the >>> tracing code. Or instead use something like SRCU, which is OK with >>> readers from idle. Or use something like Steve Rostedt's workqueue-base= d >>> approach, though please be very careful with this latter, lest the >>> battery-powered embedded guys come after you for waking up idle CPUs >>> too often. ;-) >>>=20 >>=20 >> Are we okay if we somehow ensure that all the entry code before >> enter_from_user_mode() only does rcuidle tracing variants and has >> kprobes off? Including for BPF use cases? >>=20 >> It would be *really* nice if we could statically verify this, as has >> been mentioned elsewhere in the thread. It would also probably be >> good enough if we could do it at runtime. Maybe with lockdep on, we >> verify rcu state in tracepoints even if the tracepoint isn't active? >> And we could plausibly have some widget that could inject something >> into *every* kprobeable function to check rcu state. >=20 > You are talking about verifying that a non-rcuidle tracepoint is not calle= d > into when RCU is not watching right? I think that's fine, though I feel > lockdep kernels should not be slowed down any more than they already are. I= > feel over time if we add too many checks to lockdep enabled kernels, then i= t > becomes too slow even for "debug" kernels. May be it is time for a > CONFIG_LOCKDEP_SLOW or some such? And then anyone who wants to go crazy on= > runtime checking can do so. I myself want to add a few. >=20 > Note that the checking is being added into "non rcu-idle" tracepoints many= of > which are probably always called when RCU is watching, making such checkin= g > useless for those tracepoints (and slowing them down however less). >=20 Indeed. Static analysis would help a lot here. > Also another note would be that the whole reason we are getting rid of the= > "make RCU watch when rcuidle" logic in DO_TRACE is because it is slow for > tracepoints that are frequently called into. Another reason to do it is > because tracepoint callbacks are expected to know what they are doing and > turn on RCU watching as appropriate (as consensus on the matter suggests).= Whoa there. We arch people need crystal clear rules as to what tracepoints c= an be called in what contexts and what is the responsibility of the callback= s. >=20 > thanks, >=20 > - Joel >=20