Received: by 2002:ab2:6991:0:b0:1f7:f6c3:9cb1 with SMTP id v17csp617423lqo; Wed, 8 May 2024 09:31:07 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXYoPQepE1vbwbq9IN+5av7o57iDhKf1x06TuUQlEOgSusdOjSF+QxXuHwasQNW6nuOdmcqKPikwhyniARQPuQ7deQUaueitM5NC5TG7w== X-Google-Smtp-Source: AGHT+IGZ/wyXk0S7rT7wNy+pXNhsFe5JKbNcHXvEen8PDjvaAwUj/x5mF2OtY13d1r0RcjnrieDf X-Received: by 2002:a17:902:fb03:b0:1e7:e7ed:cbd8 with SMTP id d9443c01a7336-1eefa574148mr1567635ad.22.1715185867152; Wed, 08 May 2024 09:31:07 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715185867; cv=pass; d=google.com; s=arc-20160816; b=zP1LXX/5HqOycA8zABlwBmDGF28FzHW1BIL0Tn61XvglatrDW1hJs2ZYUi5jbCq0eI D37b39RBIr0BD1TojPCiYbNB+ze7z26u9fPBGQajx+NoFuM3iaU2WmW8e0zQSyOlYRkr Hd/HcjBNe3ra33Ti5DBiongpLdJY4V6VCtxC6bOKYkbS1FTgRj6YpJ/yuk1Ca9D7KYPe V8ybmK4LcXAwE5oNpcFAMJgLs+FcLg1JQAgxx2OCZaYFxdgOWo8yIPh6VRWNqfmvT54o AXMUYz4H05DqQhOodrUItFqOjlhuWEJBBfamwv9zS8alx9d/P19qPiHfmZwy3+z5QyIf ptEQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=q9PR0H5Ma0VdvfQv9vA6DsIGHDUPuZqqFaaGm/DimjM=; fh=YSm8zTqQgcCnh2DBIepaX0ySySHMHDn0C3d1X+KoFBg=; b=XBx6vGlr7dob+bSyyreG2QiFgomvqI72uqXxVyDuybxDWQWITy9KqbMsLCW8DqpwQO FKrQX8tbt1eyWF2jck9LsievOQbCUm4EuNR24Jjj/jbNhwaMG0SjhUGUJcdfrOlYWPSn z9Omhec9r5vENCKq80n6gVK0onp0Mn7gEs4GmfmSgL8IbEhPrz5Fhvi/trFProUxxLX8 gz5an6I41U2aVnhU9+icUZQLHFNshcAPc2uFJxi4n7px38EQsBC8hnAo3tTFTmrfwI2U LphbseP5MxTYHjDpL8Zil7aEio+THPFRWNVaa0KK8BLQeX78EuqjiPQQTu6g1WogOpYU 8o0A==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=rm6ALsVE; arc=pass (i=1 spf=pass spfdomain=rivosinc.com dkim=pass dkdomain=rivosinc-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-173537-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-173537-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id b12-20020a170902d40c00b001e0d0b2ced0si11085863ple.328.2024.05.08.09.31.06 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 May 2024 09:31:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-173537-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=rm6ALsVE; arc=pass (i=1 spf=pass spfdomain=rivosinc.com dkim=pass dkdomain=rivosinc-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-173537-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-173537-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id A5DA6B23A4F for ; Wed, 8 May 2024 16:23:57 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2DF7D12A175; Wed, 8 May 2024 16:23:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b="rm6ALsVE" Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B7D712881C for ; Wed, 8 May 2024 16:23:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.47 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715185432; cv=none; b=umGOAk5byzPUSd/2SHzSgQoXkFNW0Dt959nK4SItMch83C8O+mL4F8qiOCaFeq2F0XtHml0u0yI4wpHbi6WUxYWRxfEJnIRvH8n/ncsJ7t4RgwXXspjxJf3SbaL9/iwhN27xVbuk5I/12tHe5rhWEdmkIc9pYlcBe0zpIwNIo+s= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715185432; c=relaxed/simple; bh=LALlFwOWTUd2XSFgFgwKb1A89IlaNVHQcrhML21oaak=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=LSKZ8iXEqrSuK/9JDYe17pvlukWyFeE6R8XJfXLvu0lg0BPuWMhn3+wme+74oH4nImmtMGySK/oAQDFjWmrEKCPmA7UcYjXC8HF5dxYKIzcXhKjiZM2c4bd12EzdgCvtls73ZECEu3N5NESZgNV1MSYHwU4i3mkHz98ZGxaoSl4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com; spf=pass smtp.mailfrom=rivosinc.com; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b=rm6ALsVE; arc=none smtp.client-ip=209.85.216.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-pj1-f47.google.com with SMTP id 98e67ed59e1d1-2b33d011e5dso782018a91.0 for ; Wed, 08 May 2024 09:23:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1715185430; x=1715790230; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=q9PR0H5Ma0VdvfQv9vA6DsIGHDUPuZqqFaaGm/DimjM=; b=rm6ALsVEUBA5vlThf+LLOrFRqDPL0plouypTYD4j64Z/QVUGal3dpFMQKGTm5iiBXJ Cjj9KHkvnA2zi0gV2/7tsQbmMoaNQ/kMtOEpT9/O5smSSigdP9JgM/t7bBAvIGfCeljN hT6NHi0YiegEs5t3LYDiz3N835D0w4uBD9fdmu6WZrnVjrhX2i1pc4eKsEF+B9NrNUKt 00MpDlfIO89Xs0Fqja43FdjcHbEw+Z6vz7AUYnYmEYnbJTIxm4+zDF4NOG5Mhy2JSUlN 70kIzrt6hVBHMum4Xwp/JgjSph7iyxRZjwjRVrGKKhcInxPXgq/KmANBhS1oBg3QmukT khTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715185430; x=1715790230; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q9PR0H5Ma0VdvfQv9vA6DsIGHDUPuZqqFaaGm/DimjM=; b=DMx8fXjVjS34hSNBZUXRLPabSGjskaCRqX4UL/7a+LQKeg2n4zsPsaJxzcwzQD+yzs oWhZMtT10VQ2KMgHxmDpgjeY6ttLauvNT201d2mjipzxsH2W4PrBv52gh1wSk/mH3LCL DIAN1bttCyhChLOGwTr51zn4eAAXxqiHCuNL+uaxqhMmf9ffWFkH1rJDmlpYOxdp3Von 8lrjPGHoE+IsrSJ2GQSdwQgeckIe1s6le6RmiWgg0gDzBtYKAEeauedDYWLHrCmW4pSs HWuOifZH2ZTyAi/ySAV+so4ty1U0Tas6BPELtGsK1WHfDF16jPVJ19fKoLm6AVODGgCR uz6w== X-Forwarded-Encrypted: i=1; AJvYcCU+RrnU0/GBx8wVn1T2AmaU+MpTEpCaivZwYjD99bMSaR1HmtSMGsqFiVbJiPVZZFbUGklGVufKTc05NXO7fYZjQKPH8K6q9RO/xphA X-Gm-Message-State: AOJu0YxHHxGsdnrXyDpH0vsGgmipsIyj21504MDjV9AReYdW9Y4bmy5K 7rMY8Z2q/Yrw43/M4gfMf4y6oRlZjNGsH39UQ7OqzAL9Ph22S2v/maLmX5RHMO5VZeC4b6lV15o LcxOLEYH406MsOqRAEwLES+V3SukOvpoy/4WVMA== X-Received: by 2002:a17:90a:e396:b0:2b6:208c:2aee with SMTP id 98e67ed59e1d1-2b65fe18df9mr188527a91.20.1715185429807; Wed, 08 May 2024 09:23:49 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240501145621.GD1723318@ziepe.ca> <20240503181059.GC901876@ziepe.ca> <20240505154639.GD901876@ziepe.ca> <20240507165156.GH4718@ziepe.ca> In-Reply-To: <20240507165156.GH4718@ziepe.ca> From: Tomasz Jeznach Date: Wed, 8 May 2024 09:23:37 -0700 Message-ID: Subject: Re: [PATCH v3 7/7] iommu/riscv: Paging domain support To: Jason Gunthorpe Cc: Joerg Roedel , Will Deacon , Robin Murphy , Paul Walmsley , Palmer Dabbelt , Albert Ou , Anup Patel , Sunil V L , Nick Kossifidis , Sebastien Boeuf , Rob Herring , Krzysztof Kozlowski , Conor Dooley , devicetree@vger.kernel.org, iommu@lists.linux.dev, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux@rivosinc.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, May 7, 2024 at 9:51=E2=80=AFAM Jason Gunthorpe wrote= : > > On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote: > > On Sun, May 5, 2024 at 8:46=E2=80=AFAM Jason Gunthorpe w= rote: > > > > > > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote: > > > > > For detach I think yes: > > > > > > > > > > Inv CPU Detach CPU > > > > > > > > > > write io_pte Update device descri= ptor > > > > > rcu_read_lock > > > > > list_for_each > > > > > > > > > > dma_wmb() dma_wmb() > > > > > > > > > > rcu_read_unlock > > > > > list_del_rcu() > > > > > > > > > > > > > > > In this case I think we never miss an invalidation, the list_del = is > > > > > always after the HW has been fully fenced, so I don't think we ca= n > > > > > have any issue. Maybe a suprious invalidation if the ASID gets > > > > > re-used, but who cares. > > > > > > > > > > Attach is different.. > > > > > > > > > > Inv CPU Attach CPU > > > > > > > > > > write io_pte > > > > > rcu_read_lock > > > > > list_for_each // empty > > > > > list_add_rcu() > > > > > Update device descri= ptor > > > > > > > > > > dma_wmb() > > > > > > > > > > rcu_read_unlock > > > > > > > > > > As above shows we can "miss" an invalidation. The issue is narrow= , the > > > > > io_pte could still be sitting in write buffers in "Inv CPU" and n= ot > > > > > yet globally visiable. "Attach CPU" could get the device descript= or > > > > > installed in the IOMMU and the IOMMU could walk an io_pte that is= in > > > > > the old state. Effectively this is because there is no release/ac= quire > > > > > barrier passing the io_pte store from the Inv CPU to the Attach C= PU to the > > > > > IOMMU. > > > > > > > > > > It seems like it should be solvable somehow: > > > > > 1) Inv CPU releases all the io ptes > > > > > 2) Attach CPU acquires the io ptes before updating the DDT > > > > > 3) Inv CPU acquires the RCU list in such a way that either attac= h > > > > > CPU will acquire the io_pte or inv CPU will acquire the RCU l= ist. > > > > > 4) Either invalidation works or we release the new iopte to the = SMMU > > > > > and don't need it. > > > > > > > > > > But #3 is a really weird statement. smb_mb() on both sides may do= the > > > > > job?? > > > > > > > > > > > > > Actual attach sequence is slightly different. > > > > > > > > Inv CPU Attach CPU > > > > > > > > write io_pte > > > > rcu_read_lock > > > > list_for_each // empty > > > > list_add_rcu() > > > > IOTLB.INVAL(PSCID) > > > > > > > > dma_wmb() > > > > > > > > rcu_read_unlock > > > > > > > > I've tried to cover this case with riscv_iommu_iotlb_inval() called > > > > before the attached domain is visible to the device. > > > > > > That invalidation shouldn't do anything. If this is the first attach > > > of a PSCID then the PSCID had better already be empty, it won't becom= e > > > non-empty until the DDT entry is installed. > > > > > > And if it is the second attach then the Inv CPU is already taking car= e > > > of things, no need to invalidate at all. > > > > > > Regardless, there is still a theortical race that the IOPTEs haven't > > > been made visible yet because there is still no synchronization with > > > the CPU writing them. > > > > > > So, I don't think this solves any problem. I belive you need the > > > appropriate kind of CPU barrier here instead of an invalidation. > > > > > > > Yes. There was a small, but still plausible race w/ IOPTEs visibility > > to the IOMMU. > > For v5 I'm adding two barriers to the inval/detach flow, I believe > > should cover it. > > > > 1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any > > pending writes to PTEs visible to the IOMMU device. This should cover > > the case when list_add_rcu() update is not yet visible in the > > _iotlb_inval() sequence, for the first time the domain is attached to > > the IOMMU. > > > > Inv CPU Attach CPU > > write io_pte > > dma_wmb (1) > > rcu_read_lock > > list_for_each // empty list_add_rcu() > > smp_wmb (2) > > Update device descriptor > > > > // PTEs are visible to the= HW (*1) > > dma_wmb() > > > > rcu_read_unlock > > > > 2) In riscv_iommu_bond_link() write memory barrier to ensure list > > update is visible before IOMMU descriptor update. If stale data has > > been fetched by the HW, inval CPU will run iotlb-invalidation > > sequence. There is a possibility that IOMMU will fetch correct PTEs > > and will receive unnecessary IOTLB inval, but I don't think anyone > > would care. > > > > Inv CPU Attach CPU > > write io_pte list_add_rcu() > > smp_wmb (2) > > Update device descriptor > > > > // HW might fetch stale PT= Es > > dma_wmb() > > > > dma_wmb (1) > > rcu_read_lock > > list_for_each // non-empty (*2) > > > > dma_wmb() > > > > rcu_read_unlock > > > > 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache > > on the last domain unlink from the IOMMU. > > > > Thank you for pointing this out. Let me know if that makes sense. > > I'm not an expert in barriers, but I think you need the more expensive > "mb" in both cases. > > The inv side is both releasing the write and acquiring the list > read. IIRC READ_ONCE is not a full acquire? > > The Attach side is both releasing the list_add_rcu() and acquiring the > iopte. > > rcu is still a benefit, there is no cache line sharing and there is > only one full barrier, not two, like a spinlock. > > And a big fat comment in both sides explaining this :) > I'm not an expert in barriers as well, but I've checked offline with one ;) And the conclusion is that we need FENCE W,W (or stronger) on the attach side, and FENCE RW,RW in the invalidation sequence. Hardware access to PTE/DC is sequential, with implied FENCE R,R barriers. As 'attach' sequence is a rare event anyway, I'll go on "mb" on both sides, as suggested. Unless there are other opinions on that I'll update barriers to match this conclusion and try to capture in long comment for bond / inval / attach synchronization assumptions. Jason, thanks again for pointing this out. > Jason Best, - Tomasz