Date: Thu, 26 Apr 2018 11:13:16 -0400 (EDT)
From: Mathieu Desnoyers
To: Joel Fernandes
Cc: "Paul E.
 McKenney", rostedt, Namhyung Kim, Masami Hiramatsu, linux-kernel,
 linux-rt-users, Peter Zijlstra, Ingo Molnar, Tom Zanussi, Thomas Gleixner,
 Boqun Feng, fweisbec, Randy Dunlap, kbuild test robot, baohong liu,
 vedang patel, kernel-team
Message-ID: <2099399401.1995.1524755596613.JavaMail.zimbra@efficios.com>
References: <20180423172244.694dbc9d@gandalf.local.home>
 <849066633.939.1524612064698.JavaMail.zimbra@efficios.com>
 <68e4c123-a223-5e26-e57a-da2515041bf3@google.com>
 <20180425001049.GX26088@linux.vnet.ibm.com>
 <20180425042056.GA21412@linux.vnet.ibm.com>
 <1267842641.1791.1524692456344.JavaMail.zimbra@efficios.com>
Subject: Re: [RFC v4 3/4] irqflags: Avoid unnecessary calls to trace_ if you can
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Apr 25, 2018, at 7:13 PM, Joel Fernandes joelaf@google.com wrote:

> Hi Mathieu,
>
> On Wed, Apr 25, 2018 at 2:40 PM, Mathieu Desnoyers
> wrote:
>> ----- On Apr 25, 2018, at 5:27 PM, Joel Fernandes joelaf@google.com wrote:
>>
>>> On Tue, Apr 24, 2018 at 9:20 PM, Paul E. McKenney
>>> wrote:
>>> [..]
>>>>> >
>>>>> > Sounds good, thanks.
>>>>> >
>>>>> > Also I found the reason for my boot issue. It was because the
>>>>> > init_srcu_struct in the prototype was being done in an initcall.
>>>>> > Instead if I do it in start_kernel before the tracepoint is used, it
>>>>> > fixes it (although I don't know if this is dangerous to do like this
>>>>> > but I can get it to boot atleast..
>>>>> > Let me know if this isn't the
>>>>> > right way to do it, or if something else could go wrong)
>>>>> >
>>>>> > diff --git a/init/main.c b/init/main.c
>>>>> > index 34823072ef9e..ecc88319c6da 100644
>>>>> > --- a/init/main.c
>>>>> > +++ b/init/main.c
>>>>> > @@ -631,6 +631,7 @@ asmlinkage __visible void __init start_kernel(void)
>>>>> >         WARN(!irqs_disabled(), "Interrupts were enabled early\n");
>>>>> >         early_boot_irqs_disabled = false;
>>>>> >
>>>>> > +       init_srcu_struct(&tracepoint_srcu);
>>>>> >         lockdep_init_early();
>>>>> >
>>>>> >         local_irq_enable();
>>>>> > --
>>>>> >
>>>>> > I benchmarked it and the performance also looks quite good compared
>>>>> > to the rcu tracepoint version.
>>>>> >
>>>>> > If you, Paul and other think doing the init_srcu_struct like this
>>>>> > should be Ok, then I can try to work more on your srcu prototype and
>>>>> > roll into my series and post them in the next RFC series (or let me
>>>>> > know if you wanted to work your srcu stuff in a separate series..).
>>>>>
>>>>> That is definitely not what I was expecting, but let's see if it works
>>>>> anyway... ;-)
>>>>>
>>>>> But first, I was instead expecting something like this:
>>>>>
>>>>> DEFINE_SRCU(tracepoint_srcu);
>>>>>
>>>>> With this approach, some of the initialization happens at compile time
>>>>> and the rest happens at the first call_srcu().
>>>>>
>>>>> This will work -only- if the first call_srcu() doesn't happen until after
>>>>> workqueue_init_early() has been invoked. Which I believe must have been
>>>>> the case in your testing, because otherwise it looks like __call_srcu()
>>>>> would have complained bitterly.
>>>>>
>>>>> On the other hand, if you need to invoke call_srcu() before the call
>>>>> to workqueue_init_early(), then you need the patch that I am beating
>>>>> into shape. Plus you would need to use DEFINE_SRCU() and to avoid
>>>>> invoking init_srcu_struct().
>>>>
>>>> And here is the patch.
>>>> I do not intend to send it upstream unless it
>>>> actually proves necessary, and it appears that current SRCU does what
>>>> you need.
>>>>
>>>> You would only need this patch if you wanted to invoke call_srcu()
>>>> before workqueue_init_early() was called, which does not seem likely.
>>>
>>> Cool. So I was chatting with Paul and just to update everyone as well,
>>> I tried the DEFINE_SRCU instead of the late init_srcu_struct call and
>>> can make it past boot too (thanks Paul!). Also I don't see a reason we
>>> need the RCU callback to execute early and its fine if it runs later.
>>>
>>> Also, I was thinking of introducing a separate trace_*event*_srcu API
>>> as a replacement to the _rcuidle API. Then I can make use of it for my
>>> tracepoints, and then later can use it for the other tracepoints
>>> needing _rcuidle. After that we can finally get rid of the _rcuidle
>>> API if there are no other users of it. This is just a rough plan, but
>>> let me know if there's any issue with this plan that you can think
>>> off.
>>> IMO, I believe its simpler if the caller worries about whether it can
>>> tolerate if tracepoint probes can block or not, than making it a
>>> property of the tracepoint. That would also simplify the patch to
>>> introduce srcu and keep the tracepoint creation API simple and less
>>> confusing, but let me know if I'm missing something about this.
>>
>> One problem with your approach is that you can have multiple callers
>> for the same tracepoint name, where some could be non-preemptible and
>> others blocking. Also, there is then no clear way for the callback
>
> Shouldn't it be responsibility of the caller to make sure it calls
> correct API? So if you're wanting to allow probes to block, then you'd
> call trace*blocking, if not then you don't. So the caller side can
> just always do the right thing. That's a caller side issue.
The issue there is that tracepoint.c has APIs both for instrumentation and
for registration of probe providers (callbacks). I want tracepoint.c to
provide guarantees that it won't connect incompatible probes and callsites
together.

>
>>
>> Regarding the name, I'm OK with having something along the lines of
>> trace_*event*_blocking or such. Please don't use "srcu" or other naming
>> that is explicitly tied to the underlying mechanism used internally
>> however: what we want to convey is that this specific tracepoint probe
>
> Problem is that _blocking isn't the right word either. In my IRQ trace
> point case, it will look something like this then:
>
> local_irq_disable();
> // IRQs are now off.
> trace_irq_disable_blocking(..);
>
> This wouldn't make sense. What we really want is to use the SRCU
> implementation so that its low overhead...
>
> So it would be something like:
>
> local_irq_disable();
> // IRQs are now off.
> trace_irq_disable_srcu(..);
>
> I also Ok if, as Paul was saying in his last email, that just for
> _rcuidle, we use SRCU so that we don't have to do the rcu_enter_irq
> stuff. Or we kill the _rcuidle API completely and use _srcu for those
> users instead. We already have 1 implementation specific name anyway
> (rcuidle), we're just replacing it with another one. If in the future,
> if we want to change that name we can always do so (Also if you will,
> correcting the existing already bad naming is a different problem and
> we're not making it any worse tbh).

Using SRCU rather than the sched-rcu tracepoint synchronization in your
use-case is caused by a limitation of sched-rcu: it cannot be efficiently
used within idle code. So you don't care about the "can_sleep" property of
SRCU. You could even mix SRCU and sched-rcu callsites for the same probe
name, and it would be perfectly valid.
So even though both "can_sleep" and "rcuidle" caller variants would end up
using SRCU under the hood, each can have its own caller API, e.g.:

* trace_<event>() -> only non-sleeping probes can register to those.
  Uses sched-rcu under the hood.
* trace_<event>_can_sleep() -> both sleeping and non-sleeping probes can
  register to those. Uses SRCU under the hood.
* trace_<event>_rcuidle() -> only non-sleeping probes can register to those,
  uses SRCU under the hood.

>
>> can be preempted and block. The underlying implementation could move to
>> a different RCU flavor brand in the future, and it should not impact
>> users of the tracepoint APIs.
>>
>> In order to ensure that probes that may block only register themselves
>> to tracepoints that allow blocking, we should introduce new tracepoint
>> declaration/definition *and* registration APIs also contain the
>> "BLOCKING/blocking" keywords (or such), so we can ensure that a
>> tracepoint probe being registered to a "blocking" tracepoint is indeed
>> allowed to block.
>
> I feel this problem you're describing is slightly out of the scope of
> the issues we're talking about, I think. Even right now, someone can
> write a callback that blocks and then bad things will happen. If I
> understand correctly, all callbacks right now will execute in a
> preempt disabled section because of rcu_read_lock_sched. So we already
> have a problem (without the SRCU changes) that if a callback blocks,
> then we'll have hard to diagnose sleeping while atomic issues. Sorry
> if I missed your point.

The current situation is that no callback whatsoever can sleep. If we
introduce an API allowing some callbacks to sleep, I want to make sure we
don't end up registering sleepable callbacks to non-preemptible callsites.
Considering that the callback can be provided by a kernel module whereas
the callsite is within the kernel, having this kind of correctness
validation within tracepoint.c appears important.

Thanks!
Mathieu

>
> thanks,
>
> - Joel

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
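
To make the registration-time check argued for above concrete, here is a minimal user-space sketch in C. It is not the kernel's tracepoint.c API: struct tp_model, struct probe_model and tp_model_register() are hypothetical names invented for illustration, and the single callsites_may_sleep flag is an assumed simplification of "every callsite of this tracepoint uses the can_sleep variant".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a tracepoint: not the kernel's struct tracepoint. */
struct tp_model {
	const char *name;
	/* True only if every callsite uses a variant whose probes may block. */
	bool callsites_may_sleep;
};

/* Hypothetical model of a probe (callback) offered for registration. */
struct probe_model {
	void (*func)(void *data);
	bool may_sleep;	/* declared by whoever provides the probe */
};

/*
 * Refuse to connect a sleepable probe to a tracepoint whose callsites
 * run non-preemptible. Returns 0 on success, -EINVAL on a mismatch.
 */
int tp_model_register(const struct tp_model *tp,
		      const struct probe_model *probe)
{
	assert(tp != NULL && probe != NULL);

	if (probe->may_sleep && !tp->callsites_may_sleep)
		return -EINVAL;
	return 0;
}
```

Under this model a non-sleeping probe can attach to any tracepoint, while a sleeping probe is rejected unless the tracepoint was declared as allowing it, which is the mismatch a module-provided callback could otherwise introduce silently.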