From: Dan Williams
Date: Tue, 30 Mar 2021 12:43:15 -0700
Subject: Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
To: Jason Gunthorpe
Cc: linux-cxl@vger.kernel.org, Linux Kernel Mailing List, Vishal L Verma, "Weiny, Ira", "Schofield, Alison"
In-Reply-To: <20210330192608.GA1430856@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 30, 2021 at 12:26 PM Jason Gunthorpe wrote:
>
> On Tue, Mar 30, 2021 at 12:00:23PM -0700, Dan Williams wrote:
>
> > > > > IMHO this can't use 'dev->kobj.state_in_sysfs' as the RCU
> > > > > protected data.
> > > >
> > > > This usage of srcu is functionally equivalent to replacing
> > > > srcu_read_lock() with down_read() and the shutdown path with:
> > >
> > > Sort of, but the rules for load/store under RCU are different than for
> > > load/store under a normal barriered lock. All the data is unstable for
> > > instance and minimally needs READ_ONCE.
> >
> > The data is unstable under the srcu_read_lock until the end of the
> > next rcu grace period; synchronize_rcu() ensures all active
> > srcu_read_lock() sections have completed.
>
> No, that isn't how I would phrase it. *Any* write-side data under RCU
> is always unstable by definition in the read side, because the write
> side can always race with any reader. Thus one should use the RCU
> accessors to deal with that data race, and get some acquire/release
> semantics when pointer chasing (though this doesn't matter here).
>
> > Unless Paul and I misunderstood each other, this scheme of
> > synchronizing object state is also used in kill_dax(), and I put
> > that comment there the last time this question was raised. If srcu
> > were being used to synchronize the liveness of an rcu object like
> > @cxlm or a new ops object, then I would expect rcu_dereference +
> > rcu_assign_pointer around usages of that object. The liveness of the
> > object in this case is handled by a kobject reference, or an inode
> > reference in the case of kill_dax(), outside of srcu.
>
> It was probably a misunderstanding, as I don't think Paul would say
> you should read data in thread A and write to it in thread B without
> using READ_ONCE/WRITE_ONCE or a stronger atomic to manage the data
> race.
>
> The LWN articles on the "big bad compiler" are informative here. You
> don't want the compiler to do a transformation where it loads
> state_in_sysfs multiple times and gets different answers. This is what
> READ_ONCE aims to prevent.
>
> Here it is just a boolean flag, and the flag is only cleared, so the
> risks are low, but it still isn't a technically correct way to use
> RCU.
>
> (and yes, the kernel is full of examples of not following the memory
> model strictly)

Ok, so this is the disagreement: strict adherence to the model where it
does not matter in practice. In that sense this use is not idiomatic,
and the fact that we've spilled this many bytes in emails back and
forth is good enough reason to change it.

> > > > cdev_device_del(...);
> > > > down_write(...);
> > > > up_write(...);
> > >
> > > The lock would have to enclose the store to state_in_sysfs,
> > > otherwise as written it has the same data race problems.
> >
> > There's no race above. The rule is that any possible observation of
> > ->state_in_sysfs == 1, or rcu_dereference() != NULL, must be
> > flushed.
>
> It is not about the flushing.

Ok, it's not about the flushing; it's about whether the store to
state_in_sysfs can leak past up_write(). If that store can leak, then
neither arrangement of:

    cdev_device_del(...);
    down_write(...);
    up_write(...);

...or:

    down_write(...);
    cdev_device_del(...);
    up_write(...);

...is sufficient.