Date: Sun, 12 Sep 2021 11:57:22 +0800
From: Boqun Feng
To: Peter Zijlstra
Cc: Will Deacon, tglx@linutronix.de, linux-kernel@vger.kernel.org,
	Ingo Molnar, Juri Lelli, Steven Rostedt, Davidlohr Bueso,
	Waiman Long, Sebastian Andrzej Siewior, Mike Galbraith,
	Daniel Bristot de Oliveira
Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
Message-ID:
References: <20210909105915.757320973@infradead.org>
	<20210909110203.767330253@infradead.org>
	<20210909134524.GB9722@willie-the-truck>
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > >
> > >	current_save_and_set_rtlock_wait_state();
> > >	for (;;) {
> > >		if (try_lock())
> > >			break;
> > >
> > >		raw_spin_unlock_irq(&lock->wait_lock);
> > >		schedule();
> > >		raw_spin_lock_irq(&lock->wait_lock);
> > >
> > >		set_current_state(TASK_RTLOCK_WAIT);
> > >	}
> > >	current_restore_rtlock_saved_state();
> > >
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> >
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
>
> Moo yes, so the earlier changelog I wrote was something like:
>
>	current_save_and_set_rtlock_wait_state();
>	for (;;) {
>		if (try_lock())
>			break;
>
>		raw_spin_unlock_irq(&lock->wait_lock);
>		if (!cond)
>			schedule();
>		raw_spin_lock_irq(&lock->wait_lock);
>
>		set_current_state(TASK_RTLOCK_WAIT);
>	}
>	current_restore_rtlock_saved_state();
>
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.
>
> It all sorta works in the current application because most things are
> serialized by ->wait_lock, but given the 'normal' wait pattern I got
> highly suspicious of there not being a full barrier around.

Hmm.. I think ->pi_lock actually protects us here. IIUC, a missed
wake-up would happen if try_to_wake_up() failed to observe the __state
change by the about-to-wait task, and the about-to-wait task didn't
observe the condition set by the waker task, for example:

	TASK 0					TASK 1
	======					======
						cond = 1;
						...
						try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
						  ttwu_state_match(...)
						    if (t0->__state & TASK_RTLOCK_WAIT) // false
						      ..
						    return false; // don't wake up
	...
	current->__state = TASK_RTLOCK_WAIT
	...
	if (!cond) // !cond is true because of memory reordering
		schedule(); // sleep, and may not be woken up again.

But let's add the ->pi_lock critical sections into the example:

	TASK 0					TASK 1
	======					======
						cond = 1;
						...
						try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
						  raw_spin_lock_irqsave(->pi_lock, ...);
						  ttwu_state_match(...)
						    if (t0->__state & TASK_RTLOCK_WAIT) // false
						      ..
						    return false; // don't wake up
						  raw_spin_unlock_irqrestore(->pi_lock, ...); // A
						...
	raw_spin_lock_irqsave(->pi_lock, ...); // B
	current->__state = TASK_RTLOCK_WAIT
	raw_spin_unlock_irqrestore(->pi_lock, ...);
	if (!cond)
		schedule();

Now the read of cond on TASK 0 must observe the store of cond on TASK 1,
because accesses to __state are serialized by ->pi_lock: if TASK 1's read
of __state didn't observe TASK 0's write to __state, then lock B must read
from unlock A (or from another unlock coherence-after A), and that
release-acquire pair guarantees that the read of cond on TASK 0 sees the
write of cond on TASK 1. This can be distilled into the litmus test below:

	C unlock-lock

	{ }

	P0(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		spin_lock(s);
		WRITE_ONCE(*state, 1);
		spin_unlock(s);
		r1 = READ_ONCE(*cond);
	}

	P1(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		WRITE_ONCE(*cond, 1);
		spin_lock(s);
		r1 = READ_ONCE(*state);
		spin_unlock(s);
	}

	exists (0:r1=0 /\ 1:r1=0)

and the result is:

	Test unlock-lock Allowed
	States 3
	0:r1=0; 1:r1=1;
	0:r1=1; 1:r1=0;
	0:r1=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r1=0 /\ 1:r1=0)
	Observation unlock-lock Never 0 3
	Time unlock-lock 0.01
	Hash=e1f914505f07e380405f65d3b0fb6940

In short, since we write to __state with ->pi_lock held, I don't think
we need smp_store_mb() for __state. But maybe I'm missing something
subtle here ;-)

Regards,
Boqun