Date: Fri, 30 Jul 2021 10:18:00 -0700
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Boqun Feng
Cc: Frederic Weisbecker, rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
        kernel-team@fb.com, mingo@kernel.org, jiangshanlai@gmail.com,
        akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
        josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
        rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
        fweisbec@gmail.com, oleg@redhat.com, joel@joelfernandes.org,
        Linus Torvalds
Subject: Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates
Message-ID: <20210730171800.GH4397@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20210721202042.GA1472052@paulmck-ThinkPad-P17-Gen-1>
 <20210721202127.2129660-4-paulmck@kernel.org>
 <20210729105331.GA301667@lothringen>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 30, 2021 at 01:56:41PM +0800, Boqun Feng wrote:
> On Thu, Jul 29, 2021 at 12:53:31PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jul 29, 2021 at 03:58:04PM +0800, Boqun Feng wrote:
> > > > The following litmus test, also adapted from the one supplied off-list
> > > > by Frederic Weisbecker, models the RCU grace-period kthread detecting
> > > > a non-idle CPU that is concurrently transitioning to idle:
> > > > 
> > > > C dynticks-into-idle
> > > > 
> > > > {
> > > >         DYNTICKS=1; (* Initially non-idle. *)
> > > > }
> > > > 
> > > > P0(int *X, int *DYNTICKS)
> > > > {
> > > >         int dynticks;
> > > > 
> > > >         // Non-idle.
> > > >         WRITE_ONCE(*X, 1);
> > > >         dynticks = READ_ONCE(*DYNTICKS);
> > > >         smp_store_release(DYNTICKS, dynticks + 1);
> > > >         smp_mb();
> > > 
> > > this smp_mb() is not needed, as we rely on the release-acquire pair to
> > > provide the ordering.
> > > 
> > > This means that if we use different implementations of rcu_dynticks_inc()
> > > (one w/ smp_mb(), another w/o) for idle-to-nonidle and nonidle-to-idle,
> > > we could save an smp_mb(). Thoughts?
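
For whatever it is worth, modeling the counter the same way the litmus
test above does, I read that proposal as something like the sketch
below.  The helper names are invented here, and this is not the actual
->dynticks code, just an illustration of where the barrier would and
would not be:

        /* Idle-to-nonidle: keep the full memory barrier. */
        static int dynticks_inc_strong(int *dynticks)
        {
                int seq = READ_ONCE(*dynticks) + 1;

                smp_store_release(dynticks, seq);
                smp_mb(); /* Order the transition against later accesses. */
                return seq;
        }

        /* Nonidle-to-idle: rely on the store-release alone. */
        static int dynticks_inc_weak(int *dynticks)
        {
                int seq = READ_ONCE(*dynticks) + 1;

                smp_store_release(dynticks, seq); /* No trailing smp_mb(). */
                return seq;
        }
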
> > That's exactly what I wanted to propose but everybody was sober. Namely order
> > only the RCU read side critical sections before/after idle together:
> > 
> > READ side critical section
> > //enter idle
> > //exit idle
> > smp_mb()
> > READ side critical section
> > 
> > instead of ordering the RCU read side critical section before idle with the
> > RCU idle extended quiescent state, and the quiescent state with the RCU read
> > side critical section after idle:
> > 
> > READ side critical section
> > //enter idle
> > smp_mb();
> > //exit idle
> > smp_mb()
> > READ side critical section
> > 
> > So the side effect now is that if the write side waits for the reader to
> > report a quiescent state, scans its dynticks state, and sees it's not yet in
> > RCU idle mode, then later on when the read side enters RCU idle mode we
> > expect it to see the write side updates.
> > But after the barrier removal the reader will only see the write side updates
> > once we exit RCU idle mode.
> > 
> > So the following may happen:
> > 
> > P0(int *X, int *Y, int *DYNTICKS)
> > {
> >         int y;
> > 
> >         WRITE_ONCE(*X, 1);
> >         smp_store_release(DYNTICKS, 1); // rcu_eqs_enter
> >         //smp_mb() not there anymore
> >         y = READ_ONCE(*Y);
> >         smp_store_release(DYNTICKS, 2); // rcu_eqs_exit()
> >         smp_mb();
> > }
> > 
> > P1(int *X, int *Y, int *DYNTICKS)
> > {
> >         int x;
> >         int dynticks;
> > 
> >         WRITE_ONCE(*Y, 1);
> >         smp_mb();
> >         dynticks = smp_load_acquire(DYNTICKS);
> >         x = READ_ONCE(*X);
> > }
> > 
> > exists (1:x=0 /\ 0:y=0)
> 
> Thanks for the detailed explanation ;-)
> 
> > Theoretically it shouldn't matter because the RCU idle mode isn't
> > supposed to perform RCU reads. But theoretically again once a CPU
> 
> Right, in a LOCKDEP=y kernel, rcu_read_lock_held() requires
> rcu_is_watching(), so rcu_dereference() is not allowed in idle mode,
> unless using RCU_NONIDLE() or rcu_irq_enter_irqson() to temporarily exit
> the idle mode.
> 
> > has reported a quiescent state, any further read is expected to see
> > the latest updates from the write side.
> 
> Yes, but in your above case, doesn't P0 already reach a quiescent
> state even before WRITE_ONCE()? IOW, that case is similar to the
> following:
> 
> P0(int *X, int *Y)
> {
>         int y;
> 
>         // in QS
> 
>         WRITE_ONCE(*X, 1);
>         y = READ_ONCE(*Y);
> }
> 
> P1(int *X, int *Y)
> {
>         int x;
> 
>         WRITE_ONCE(*Y, 1);
>         synchronize_rcu();
>         x = READ_ONCE(*X);
> }
> 
> exists (1:x=0 /\ 0:y=0)
> 
> And RCU doesn't guarantee the READ_ONCE() on P0 sees the WRITE_ONCE() on
> P1.
> 
> > 
> > So I don't know what to think. In practice I believe it's not a big deal
> > because RCU idle mode code is usually a fragile path that just handles
> > cpuidle code to put the CPU in/out of low power mode. But what about
> > dragons...
> 
> My current thought is that if the cpuidle code requires some ordering
> with synchronize_rcu(), RCU_NONIDLE() should be used, and ordering can
> be guaranteed in this case (RCU_NONIDLE() has a rcu_eqs_exit() in it).
> Otherwise, it's a bug.
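
To make that usage pattern concrete, it would look something like the
sketch below.  The struct and field names are invented for illustration
(this needs <linux/rcupdate.h>); the only real API here is
RCU_NONIDLE():

        /* Illustration only: my_idle_dev and its fields are made up. */
        struct my_idle_dev {
                int quirk_flag;
                int flag_seen;
        };

        static void example_idle_loop(struct my_idle_dev *dev)
        {
                /*
                 * The CPU is in RCU's extended quiescent state here, so
                 * plain reads are not ordered against a concurrent
                 * synchronize_rcu().
                 */

                RCU_NONIDLE({
                        /*
                         * RCU is watching within this region, so this
                         * read is ordered against the write side of a
                         * concurrent grace period.
                         */
                        dev->flag_seen = READ_ONCE(dev->quirk_flag);
                });
        }
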
> So looks like we can drop that smp_mb() in rcu_eqs_enter()? At least, we
> can say something in the doc to prevent people from relying on the
> ordering between normal reads in RCU idle mode and synchronize_rcu().
> 
> Thoughts?

Is there a benchmark that can show a system-level difference?  My guess
is that the realtime interrupt-latency and scheduler-latency benchmarks
would have the best chance of seeing this.

                                                        Thanx, Paul
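
P.S.  For anyone who wants to run Frederic's example through herd7,
here it is as a complete litmus test.  The test name and the explicit
initial DYNTICKS value are my additions; everything else is Frederic's
code from above.

        C dynticks-skip-mb-on-entry

        {
                DYNTICKS=0; (* Initially non-idle. *)
        }

        P0(int *X, int *Y, int *DYNTICKS)
        {
                int y;

                WRITE_ONCE(*X, 1);
                smp_store_release(DYNTICKS, 1); // rcu_eqs_enter
                // smp_mb() not there anymore
                y = READ_ONCE(*Y);
                smp_store_release(DYNTICKS, 2); // rcu_eqs_exit()
                smp_mb();
        }

        P1(int *X, int *Y, int *DYNTICKS)
        {
                int x;
                int dynticks;

                WRITE_ONCE(*Y, 1);
                smp_mb();
                dynticks = smp_load_acquire(DYNTICKS);
                x = READ_ONCE(*X);
        }

        exists (1:x=0 /\ 0:y=0)

From a tools/memory-model tree:

        $ herd7 -conf linux-kernel.cfg dynticks-skip-mb-on-entry.litmus

If Frederic's analysis is right, herd7 should report the exists clause
as satisfiable ("Sometimes"), confirming that the bad outcome is
allowed once the smp_mb() is gone from the idle-entry path.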