Date: Tue, 15 May 2018 00:02:43 -0700
From: Joel Fernandes
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, Josh Triplett, Steven Rostedt,
 Mathieu Desnoyers, Lai Jiangshan, byungchul.park@lge.com,
 kernel-team@android.com
Subject: Re: [PATCH RFC 1/8] rcu: Add comment documenting how rcu_seq_snap works
Message-ID: <20180515070243.GA55557@joelaf.mtv.corp.google.com>
References: <20180514031541.67247-1-joel@joelfernandes.org>
 <20180514031541.67247-2-joel@joelfernandes.org>
 <20180514173816.GA26088@linux.vnet.ibm.com>
 <20180515015133.GH209519@joelaf.mtv.corp.google.com>
 <20180515035951.GB26088@linux.vnet.ibm.com>
In-Reply-To: <20180515035951.GB26088@linux.vnet.ibm.com>

Hi Paul,

Good morning, hope you're having a great Tuesday. I managed to find some
evening hours today to dig into this a bit more.

On Mon, May 14, 2018 at 08:59:52PM -0700, Paul E. McKenney wrote:
> On Mon, May 14, 2018 at 06:51:33PM -0700, Joel Fernandes wrote:
> > On Mon, May 14, 2018 at 10:38:16AM -0700, Paul E. McKenney wrote:
> > > On Sun, May 13, 2018 at 08:15:34PM -0700, Joel Fernandes (Google) wrote:
> > > > rcu_seq_snap may be tricky for someone looking at it for the first time.
> > > > Lets document how it works with an example to make it easier.
> > > >
> > > > Signed-off-by: Joel Fernandes (Google)
> > > > ---
> > > >  kernel/rcu/rcu.h | 24 +++++++++++++++++++++++-
> > > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > index 003671825d62..fc3170914ac7 100644
> > > > --- a/kernel/rcu/rcu.h
> > > > +++ b/kernel/rcu/rcu.h
> > > > @@ -91,7 +91,29 @@ static inline void rcu_seq_end(unsigned long *sp)
> > > >  	WRITE_ONCE(*sp, rcu_seq_endval(sp));
> > > >  }
> > > >
> > > > -/* Take a snapshot of the update side's sequence number. */
> > > > +/*
> > > > + * Take a snapshot of the update side's sequence number.
> > > > + *
> > > > + * This function predicts what the grace period number will be the next
> > > > + * time an RCU callback will be executed, given the current grace period's
> > > > + * number. This can be gp+1 if RCU is idle, or gp+2 if a grace period is
> > > > + * already in progress.
> > >
> > > How about something like this?
> > >
> > > 	This function returns the earliest value of the grace-period
> > > 	sequence number that will indicate that a full grace period has
> > > 	elapsed since the current time.  Once the grace-period sequence
> > > 	number has reached this value, it will be safe to invoke all
> > > 	callbacks that have been registered prior to the current time.
> > > 	This value is the current grace-period number plus two to the
> > > 	power of the number of low-order bits reserved for state, then
> > > 	rounded up to the next value in which the state bits are all zero.
> >
> > This makes sense too, but do you disagree with what I said?
>
> In a pedantic sense, definitely.  RCU callbacks are being executed pretty
> much all the time on a busy system, so it is only the recently queued
> ones that are guaranteed to be deferred that long.  And my experience
> indicates that someone really will get confused by that distinction,
> so I feel justified in being pedantic in this case.

Ok I agree, I'll include your comment above.
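
Just to double-check that I'm reading the arithmetic in your wording the way
you intend it, here is a tiny userspace sketch of what I think rcu_seq_snap
is doing (the RCU_SEQ_CTR_SHIFT / RCU_SEQ_STATE_MASK layout is assumed from
rcu.h, and seq_snap() below is my own reduction rather than the real
function, so please correct me if I got the rounding wrong):

#include <stdio.h>

/* Assumed layout, mirroring kernel/rcu/rcu.h: low 2 bits hold GP state. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Earliest gp_seq value at which a full GP has elapsed since "now". */
static unsigned long seq_snap(unsigned long gp_seq)
{
	return (gp_seq + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

int main(void)
{
	/* Counter 2, state 0: RCU idle, snap lands one full GP ahead. */
	printf("idle:        snap(0x8) = 0x%lx\n", seq_snap(0x8UL));
	/* Counter 2, state 1: GP in flight, snap lands two GPs ahead. */
	printf("in progress: snap(0x9) = 0x%lx\n", seq_snap(0x9UL));
	return 0;
}

This prints 0xc (counter 3) for the idle case and 0x10 (counter 4) for the
in-progress case, which matches the gp+1 / gp+2 behavior I was trying to
describe in my comment.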
> > Also just to let you know, thanks so much for elaborately providing an
> > example on the other thread where we are discussing the rcu_seq_done
> > check. I will take some time to trace this down and see if I can zero
> > in on the same understanding as yours.
> >
> > I get why we use rcu_seq_snap there in rcu_start_this_gp, but the way it
> > is used is: 'c' is the requested GP obtained from _snap, and we are
> > comparing that with the existing rnp->gp_seq in rcu_seq_done. When that
> > rnp->gp_seq reaches 'c', it only means rnp->gp_seq is done, it doesn't
> > tell us if 'c' is done, which is what we were trying to check in that
> > loop... that's why I felt that check wasn't correct - that's my (most
> > likely wrong) take on the matter, and I'll get back once I trace this a
> > bit more, hopefully today :-P
>
> If your point is that interrupts are disabled throughout, so there isn't
> much chance of the grace period completing during that time, you are
> mostly right.  The places you might not be right are the idle loop and
> offline CPUs.  And yes, call_rcu() doesn't like queuing callbacks onto
> offline CPUs, but IIRC it is just fine in the case where callbacks have
> been offloaded from that CPU.
>
> And if you instead say that "c" is the requested final ->gp_seq value
> obtained from _snap(), the thought process might go more easily.

Yes, I agree that c is the requested final value, i.e. the GP for which the
callbacks will be queued; at the end of GP c, those callbacks will have
executed.

About the rcu_seq_done check, and why I believe it is not right to use it in
that funnel-locking loop: let me try to argue my point from a different
angle.

We agreed that the way gp_seq numbers work, and are compared with each other
to decide whether a GP has elapsed, is different from the way the previous
numbers (gp_num) were compared. Most notably, before the gp_seq conversion,
in order to start a GP we did gp_num += 1, and completed had to catch up to
gp_num + 1 to mark the end. Now, with gp_seq, starting a GP doesn't do the
"+1"; we just set the state bits. To mark the end, we clear the state bits
and increment the gp_num part of gp_seq. (I put a small userspace sketch of
this arithmetic at the end of this mail.)

However, in the commit below, 12d6c129fd0a ("rcu: Convert grace-period
requests to ->gp_seq"), you did a one-to-one replacement of ULONG_CMP_GE
with rcu_seq_done, even though the gp_seq numbers work differently from the
previously used numbers (gp_num and completed).

Because of the differences above, I would argue that a one-to-one
replacement of ULONG_CMP_GE with rcu_seq_done doesn't make sense. In the
previous code, ULONG_CMP_GE made sense for the gp_num way of things because
c == gp_num meant that:

 - c started already
 - c has finished

which worked correctly, because then we have nothing to do and we can bail
without setting any flag.

Whereas now, with the gp_seq regime, c == gp_seq means:

 - c-1 finished (by -1 I mean 1 subtracted from the gp_num part of c)

This would cause us to bail without setting any flag for starting c.

I did some tracing and I could never hit the rcu_seq_done check: it never
happened in my traces that _snap returned something for which rcu_seq_done
returned true. So I'm not sure this check is needed, but you're the
expert ;)

@@ -1629,16 +1583,16 @@ static bool rcu_start_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * not be released.
 	 */
 	raw_lockdep_assert_held_rcu_node(rnp);
+	WARN_ON_ONCE(c & 0x2); /* Catch any lingering use of ->gpnum. */
+	WARN_ON_ONCE(((rnp->completed << RCU_SEQ_CTR_SHIFT) >> RCU_SEQ_CTR_SHIFT) != rcu_seq_ctr(rnp->gp_seq)); /* Catch any ->completed/->gp_seq mismatches. */
 	trace_rcu_this_gp(rnp, rdp, c, TPS("Startleaf"));
 	for (rnp_root = rnp; 1; rnp_root = rnp_root->parent) {
 		if (rnp_root != rnp)
 			raw_spin_lock_rcu_node(rnp_root);
-		WARN_ON_ONCE(ULONG_CMP_LT(rnp_root->gpnum +
-					  need_future_gp_mask(), c));
 		if (need_future_gp_element(rnp_root, c) ||
-		    ULONG_CMP_GE(rnp_root->gpnum, c) ||
+		    rcu_seq_done(&rnp_root->gp_seq, c) ||
		    ^^^^
		A direct replacement of ULONG_CMP_GE is a bit weird? It means
		we bail out if c-1 completed, and we don't set any flag for
		starting c. That could result in the cleanup never starting c?

 		    (rnp != rnp_root &&
-		     rnp_root->gpnum != rnp_root->completed)) {
+		     rcu_seq_state(rcu_seq_current(&rnp_root->gp_seq)))) {
 			trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
 			goto unlock_out;
 		}
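
And here is the little userspace sketch I mentioned above, so we can make
sure we are reasoning about the same arithmetic. The ULONG_CMP_GE definition
is copied from what I believe rcupdate.h does, and the seq_* helpers are my
own mock-ups of the rcu.h ones rather than the real code:

#include <stdio.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a >= b" comparison, as I understand ULONG_CMP_GE to work. */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))

/* Assumed gp_seq layout: low 2 bits are state, the rest is the GP counter. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Mock-ups of the rcu.h helpers under discussion. */
static void seq_start(unsigned long *sp)	/* set a state bit to start */
{
	*sp += 1;
}

static void seq_end(unsigned long *sp)		/* clear state, bump counter */
{
	*sp = (*sp | RCU_SEQ_STATE_MASK) + 1;
}

static unsigned long seq_snap(unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

static bool seq_done(unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(*sp, s);
}

int main(void)
{
	unsigned long gp_seq = 0x8;	/* counter 2, idle */
	unsigned long c;

	seq_start(&gp_seq);		/* GP 2 now in progress: 0x9 */
	c = seq_snap(&gp_seq);		/* request counter 4: 0x10 */
	printf("c = 0x%lx, done now? %d\n", c, seq_done(&gp_seq, c));

	seq_end(&gp_seq);		/* GP 2 ends: 0xc */
	seq_start(&gp_seq);		/* GP 3 starts: 0xd */
	printf("done during GP 3? %d\n", seq_done(&gp_seq, c));

	seq_end(&gp_seq);		/* GP 3 ends: 0x10 == c */
	printf("done after GP 3 ends? %d\n", seq_done(&gp_seq, c));
	return 0;
}

With this mock, seq_done(&gp_seq, c) only flips to true once gp_seq itself
has caught up to c, i.e. after the counter-3 GP ends, which is what I was
trying to express with "c-1 finished" above. If that reading is wrong, the
sketch should make it easy to point at exactly where.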