Received: by 2002:a05:7412:d1aa:b0:fc:a2b0:25d7 with SMTP id ba42csp1174962rdb; Tue, 30 Jan 2024 09:59:00 -0800 (PST) X-Google-Smtp-Source: AGHT+IFyvpGciBrjiB44LATozxssHtJOEIphQrGvuFc1EoEBUnOrzUekpOEDRTVau3BlNUH+tmj7 X-Received: by 2002:a50:8e1e:0:b0:55f:2e1e:5b8e with SMTP id 30-20020a508e1e000000b0055f2e1e5b8emr2519934edw.7.1706637540484; Tue, 30 Jan 2024 09:59:00 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706637540; cv=pass; d=google.com; s=arc-20160816; b=wrHodcgSEKhd97RDxfFGh+wFemvt/WOj26H2lXyN4hu9Xtjhk/fnz9LkYGbFFYK90t 7IMTccGFBF60FjKsDVfXPKe6+YiNw3yTyKsjtg/QuZRrddIMmkVNGlL7w7PRB46w4Ht2 kJ+1dk2quo9amxD7H9eo02QReyR675n8na6i5ulXglUM0j23YUdrmnTKDHycEHDSkga5 VbfL8vlSKQKiod3Tk0uHralFZL3rNOtcIzJQ5k1muONm2JGlPFESx2FFydkA+CJlj79S Fa64zoA9noifhB3Yu0piUs3hNVuRYc0/wAreg5PJND9IMXUymMqUpcGSk6sJM5eX2Mt1 AliA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date; bh=nN/dWeTiwWM0SK1RsG2uknX7G82yapUCO1HdIHNwLII=; fh=uDGV37n6EnVQGZVMjZ68hgHBUSrnM8fmu+TQ3o8FtvI=; b=Ff/qerSWHoLPR34TcvK8FTFPx2E2LXExcAhG10A6r7KsYWmXn6NuOAo1OdM7xgW9AL xEqd7ljenvO7qwg/p70cWhfyZhz21vmWKroXg7RmxNwrjTCCjfeTofT7VgPq9rdP6NRu szQyjBnh5Ahy+xQMDmtvGAGFAIoylbbHQeTC5eJD1YFTZkfJ5qSZtGhj54dmYK0qA72a +B6VsvMuyw5o74vw0V143UKuUNuaa0beKUCUtz0dJ9r3Zrrd8cib2JAg/t8Ti218JiFh +wbpIO0ytGfmDqMl8G5MWdHHj+wJr/iDiO1R9Gy/kvxJMXJ9VuidgR0ZrL21jN/tCA2t TEaQ== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-45068-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-45068-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id ay22-20020a056402203600b0055f0067b15fsi2196975edb.217.2024.01.30.09.59.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jan 2024 09:59:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-45068-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-45068-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-45068-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 20DB61F24A92 for ; Tue, 30 Jan 2024 17:59:00 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 31A95151CE2; Tue, 30 Jan 2024 17:58:40 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7034B15530D; Tue, 30 Jan 2024 17:58:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706637519; cv=none; b=dARlvcghBoY05kPVmLotNUjEmZusajIBHiyTgILTREU5fwg1FlTAMNftSdSNjzVQvLuxfR/DrQFSefITy+wxfUIyGMejEyUwVBEgYLWozYOoDTi7chpmBtSpWPVoXlC+NTlPpjCKXZ8/fMUG5jU6qlcQxgApo92qQGh8Xf4PhwI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706637519; c=relaxed/simple; bh=haZtEcAtM6RAOUswKZSwgnSzcYCz0/OtRj7b44h2iHM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=LYJdIDwm4N4b9CxW4+Fn5w7NnmiP01uJ0wLtpDXgHfEEBYQytmJQuLOyRK7f1ZrxWSN70EBv+yW2Dtz0lNEd1t46gElhZJXBLUfOHH1p7XohhhAcwThMcktJky7aa83PUqgQ11PBw+Hu/1+GnA/cIa3gh293zM1tqOu/yjYfhy0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D5AB7DA7; Tue, 30 Jan 2024 09:59:18 -0800 (PST) Received: from FVFF77S0Q05N (unknown [10.57.45.140]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 820CA3F762; Tue, 30 Jan 2024 09:58:32 -0800 (PST) Date: Tue, 30 Jan 2024 17:58:24 +0000 From: Mark Rutland To: Fenghua Yu Cc: Vinod Koul , Dave Jiang , dmaengine@vger.kernel.org, linux-kernel , Nikhil Rao , Tony Zhu , Mathieu Desnoyers Subject: Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space Message-ID: References: <20240130025806.2027284-1-fenghua.yu@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240130025806.2027284-1-fenghua.yu@intel.com> This patch might be ok (it looks reasonable as an optimization), but I think the description of wmb() and smp_wmb() is incorrect. I also think that you're missing an rmb()/smp_rmb()eor equivalent on the reader side. On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote: > wmb() is used to ensure status in the completion record is written > after the rest of the completion record, making it visible to the user. > However, on SMP systems, this may not guarantee visibility across > different CPUs. > > Considering this scenario that event log handler is running on CPU1 while > user app is polling completion record (cr) status on CPU2: > > CPU1 CPU2 > event log handler user app > > 1. cr = 0 (status = 0) > 2. copy X to user cr except "status" > 3. wmb() > 4. copy Y to user cr "status" > 5. poll status value Y > 6. read rest cr which is still 0. > cr handling fails > 7. cr value X visible now > > Although wmb() ensure value Y is written and visible after X is written > on CPU1, the order is not guaranteed on CPU2. So user app may see status > value Y while cr value X is still not visible yet on CPU2. This will > cause reading 0 from the rest of cr and cr handling fails. The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2 between reading the 'status' and 'rest' parts; otherwise CPU2 (or the compiler!) is permitted to hoist the read of 'rest' early, before reading from 'status', and hence you can end up with a sequence that is effectively: CPU1 CPU2 event log handler user app 1. cr = 0 (status = 0) 6a. read rest cr which is still 0. 2. copy X to user cr except "status" 3. wmb() 4. copy Y to user cr "status" 5. poll status value Y 6b. cr handling fails 7. cr value X visible now Since this is all to regular cacheable memory, it's *sufficient* to use smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering fix. Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb() is just a compiler barrier), and IIUC loads are not reordered w.r.t. other loads (and so smp_rmb() is also just a compiler barrier). > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1 > and CPU2. This guarantees that user app can consume cr in right order. This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a barrier()). Thanks, Mark. > > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling") > Suggested-by: Nikhil Rao > Tested-by: Tony Zhu > Signed-off-by: Fenghua Yu > --- > drivers/dma/idxd/cdev.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c > index 77f8885cf407..9b7388a23cbe 100644 > --- a/drivers/dma/idxd/cdev.c > +++ b/drivers/dma/idxd/cdev.c > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr, > * Ensure that the completion record's status field is written > * after the rest of the completion record has been written. > * This ensures that the user receives the correct completion > - * record information once polling for a non-zero status. > + * record information on any CPU once polling for a non-zero > + * status. > */ > - wmb(); > + smp_wmb(); > status = *(u8 *)cr; > if (put_user(status, (u8 __user *)addr)) > left += status_size; > -- > 2.37.1 > >