Received: by 2002:a05:6500:1b8f:b0:1fa:5c73:8e2d with SMTP id df15csp1301877lqb; Thu, 30 May 2024 06:38:32 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXg6BS1qkldE/1bihzf5YE6mU22d5EY77ATNDQxhXxGmrIuFvHEXJjcgihsdlAEWUUugVLAVL3LwUm/TlFgrz1kyglnrV3T/9VZDW9EKg== X-Google-Smtp-Source: AGHT+IFPxFgW1QhJ4JIE01xOYg1QknBqxntQzM9zE2xmMv/zLSxzpRHK80C1/R82YZadXhTZtmrT X-Received: by 2002:a05:622a:1883:b0:43a:f89a:6631 with SMTP id d75a77b69052e-43fe92a0b5amr28176201cf.8.1717076312305; Thu, 30 May 2024 06:38:32 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717076312; cv=pass; d=google.com; s=arc-20160816; b=YUumW/uLeA8o+u5Km4HWLjOpEH/vY3AZf7qnUqrDNRviPyF1Gjvt2lxaxQ7t+Mmpu6 77TelaF6cKXBUkgE0RNSQVMnAMENZA+LCi5jVMr1g7juRw46baYy75eIL1MXAmTtDXCc 5vpaF/6sFE2UuaqisPV/qalZ/ojd/05ncgfFu98wHTIW5HSqapFsfPsuFZWME6iqDmm8 +tDCg/SwYw0E3JUEfks43Jj39wBt+mLcj/CwIcD1albAd1JOPZUgssoq+se62jGBnSmD Wjlp5fyneWbwcvAPPrnLsvPaBuYU0RA63nXfb709raIqYlD8mqe2SP64x/phrDHieWOp mEeA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:organization:references :in-reply-to:message-id:subject:cc:to:from:date; bh=SaPLZk7u8lb3WMtjXFfpBdlyrRjfTK5gfs7wxAF2ofg=; fh=Jdengz6I4+0a+5eyogpWnmhMV6qeH5B7hVJwS49LAKY=; b=ThMQZKm5fkEAFr4A9Tc5zBuZpZtCI/laK4J775G/44nPYUUi4TyjjL6fvlDxRxbqNX DDfumQVihyMXDyups7f/wFiE+COsSce4zN1p3KTtfiX/8HC++CjVcq+uDKdFu3OKK6u6 1ydu7SKP810gLkYCovLfAkS6TbX2O/zuWI3j2MhQ3PMmLXbqWgBQI/e2D38AL/cGNsUL sG43NWzaS5OncrfbhHFTmnTF14wkUQEQt3sp4I3nWTJjzMEP2m/r6vHZ1RgqnKUQK/nG m6stkuIoKFcZ7vsp3KB9Ic30fcuo/42Zr/QeH6Hm1zLAFFT1FxaWLN15N0ZpDu6NPntJ aqQg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huawei.com dmarc=pass fromdomain=huawei.com); spf=pass (google.com: domain of linux-kernel+bounces-195432-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-195432-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id d75a77b69052e-43fb18ea32dsi169809561cf.767.2024.05.30.06.38.32 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 May 2024 06:38:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-195432-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huawei.com dmarc=pass fromdomain=huawei.com); spf=pass (google.com: domain of linux-kernel+bounces-195432-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-195432-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id F1CF71C21EA2 for ; Thu, 30 May 2024 13:38:31 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6BA7317D883; Thu, 30 May 2024 13:38:21 +0000 (UTC) Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D87D917C203; Thu, 30 May 2024 13:38:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717076300; cv=none; b=toUj4qYCt3TcP64Os3ekaBTJkEVPzMWv7pJ/Oc/Teu+BInjCupy0Y2z0Q68D2MUe8LypFAgjhk7iW5Z+iK2t5WhPU1cJldihe5M6UDR+2fHpxQHfx3tFvl/4K11KvslVWL+IFpXuNK9PAXnpJ/DmU5fszmBQWQ5SNz4yfsnsCWI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717076300; c=relaxed/simple; bh=S55mCDwB/L5bdVzNJdlQckiADHtrRU057V4YeMUyCks=; h=Date:From:To:CC:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=mBsm+uY0bq0rwFqLyE5eXJd2LThWuHFmYS2Z9k4PneqpDvtrTfCZETVFkLrPQdl7TJ2/AiqEsDW8Su5vEPpDh4LyMO8p7ykq2+1OvrUKSGKM9VoFc229/G/qGRAGluuCkGjbLND7/NP0nhvz+1i9jicjkKDXm2O8hOlxP+S77G4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4VqnKK2qwNz6J9yY; Thu, 30 May 2024 21:34:13 +0800 (CST) Received: from lhrpeml500005.china.huawei.com (unknown [7.191.163.240]) by mail.maildlp.com (Postfix) with ESMTPS id 61233140B38; Thu, 30 May 2024 21:38:15 +0800 (CST) Received: from localhost (10.202.227.76) by lhrpeml500005.china.huawei.com (7.191.163.240) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 30 May 2024 14:38:14 +0100 Date: Thu, 30 May 2024 14:38:13 +0100 From: Jonathan Cameron To: Dongsheng Yang CC: Gregory Price , Dan Williams , John Groves , , , , , , , Mark Rutland Subject: Re: [PATCH RFC 0/7] block: Introduce CBD (CXL Block Device) Message-ID: <20240530143813.00006def@Huawei.com> In-Reply-To: <5db870de-ecb3-f127-f31c-b59443b4fbb4@easystack.cn> References: <20240503105245.00003676@Huawei.com> <5b7f3700-aeee-15af-59a7-8e271a89c850@easystack.cn> <20240508131125.00003d2b@Huawei.com> <20240508164417.00006c69@Huawei.com> <3d547577-e8f2-8765-0f63-07d1700fcefc@easystack.cn> <20240509132134.00000ae9@Huawei.com> <664cead8eb0b6_add32947d@dwillia2-mobl3.amr.corp.intel.com.notmuch> <8f161b2d-eacd-ad35-8959-0f44c8d132b3@easystack.cn> <5db870de-ecb3-f127-f31c-b59443b4fbb4@easystack.cn> Organization: Huawei Technologies Research and Development (UK) Ltd. X-Mailer: Claws Mail 4.1.0 (GTK 3.24.33; x86_64-w64-mingw32) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: lhrpeml500003.china.huawei.com (7.191.162.67) To lhrpeml500005.china.huawei.com (7.191.163.240) On Thu, 30 May 2024 14:59:38 +0800 Dongsheng Yang wrote: > =E5=9C=A8 2024/5/29 =E6=98=9F=E6=9C=9F=E4=B8=89 =E4=B8=8B=E5=8D=88 11:25,= Gregory Price =E5=86=99=E9=81=93: > > On Wed, May 22, 2024 at 02:17:38PM +0800, Dongsheng Yang wrote: =20 > >> > >> > >> =E5=9C=A8 2024/5/22 =E6=98=9F=E6=9C=9F=E4=B8=89 =E4=B8=8A=E5=8D=88 2:4= 1, Dan Williams =E5=86=99=E9=81=93: =20 > >>> Dongsheng Yang wrote: > >>> > >>> What guarantees this property? How does the reader know that its local > >>> cache invalidation is sufficient for reading data that has only reach= ed > >>> global visibility on the remote peer? As far as I can see, there is > >>> nothing that guarantees that local global visibility translates to > >>> remote visibility. In fact, the GPF feature is counter-evidence of the > >>> fact that writes can be pending in buffers that are only flushed on a > >>> GPF event. =20 > >> > >> Sounds correct. From what I learned from GPF, ADR, and eADR, there wou= ld > >> still be data in WPQ even though we perform a CPU cache line flush in = the > >> OS. > >> > >> This means we don't have a explicit method to make data puncture all c= aches > >> and land in the media after writing. also it seems there isn't a expli= cit > >> method to invalidate all caches along the entire path. > >> =20 > >>> > >>> I remain skeptical that a software managed inter-host cache-coherency > >>> scheme can be made reliable with current CXL defined mechanisms. =20 > >> > >> > >> I got your point now, acorrding current CXL Spec, it seems software ma= naged > >> cache-coherency for inter-host shared memory is not working. Will the = next > >> version of CXL spec consider it? =20 > >>> =20 > >=20 > > Sorry for missing the conversation, have been out of office for a bit. > >=20 > > It's not just a CXL spec issue, though that is part of it. I think the > > CXL spec would have to expose some form of puncturing flush, and this > > makes the assumption that such a flush doesn't cause some kind of > > race/deadlock issue. Certainly this needs to be discussed. > >=20 > > However, consider that the upstream processor actually has to generate > > this flush. This means adding the flush to existing coherence protocol= s, > > or at the very least a new instruction to generate the flush explicitly. > > The latter seems more likely than the former. > >=20 > > This flush would need to ensure the data is forced out of the local WPQ > > AND all WPQs south of the PCIE complex - because what you really want to > > know is that the data has actually made it back to a place where remote > > viewers are capable of percieving the change. > >=20 > > So this means: > > 1) Spec revision with puncturing flush > > 2) Buy-in from CPU vendors to generate such a flush > > 3) A new instruction added to the architecture. > >=20 > > Call me in a decade or so. > >=20 > >=20 > > But really, I think it likely we see hardware-coherence well before thi= s. > > For this reason, I have become skeptical of all but a few memory sharing > > use cases that depend on software-controlled cache-coherency. =20 >=20 > Hi Gregory, >=20 > From my understanding, we actually has the same idea here. What I am=20 > saying is that we need SPEC to consider this issue, meaning we need to=20 > describe how the entire software-coherency mechanism operates, which=20 > includes the necessary hardware support. Additionally, I agree that if=20 > software-coherency also requires hardware support, it seems that=20 > hardware-coherency is the better path. > >=20 > > There are some (FAMFS, for example). The coherence state of these > > systems tend to be less volatile (e.g. mappings are read-only), or > > they have inherent design limitations (cacheline-sized message passing > > via write-ahead logging only). =20 >=20 > Can you explain more about this? I understand that if the reader in the=20 > writer-reader model is using a readonly mapping, the interaction will be= =20 > much simpler. However, after the writer writes data, if we don't have a=20 > mechanism to flush and invalidate puncturing all caches, how can the=20 > readonly reader access the new data? There is a mechanism for doing coarse grained flushing that is known to work on some architectures. Look at cpu_cache_invalidate_memregion(). On intel/x86 it's wbinvd_on_all_cpu_cpus() on arm64 it's a PSCI firmware call CLEAN_INV_MEMREGION (there is a public alpha specification for PSCI 1.3 with that defined but we don't yet have kernel code.) These are very big hammers and so unsuited for anything fine grained. In the extreme end of possible implementations they briefly stop all CPUs and clean and invalidate all caches of all types. So not suited to anything fine grained, but may be acceptable for a rare setup event, particularly if the main job of the writing host is to fill that memory for lots of other hosts to use. At least the ARM one takes a range so allows for a less painful implementation. I'm assuming we'll see new architecture over time but this is a different (and potentially easier) problem space to what you need. Jonathan > > ~Gregory > > =20