Date: Mon, 19 Nov 2018 16:33:20 -0500
From: Jerome Glisse
To: Jason Gunthorpe
Cc: Tim Sell, linux-doc@vger.kernel.org, Alexander Shishkin, Zaibo Xu,
 zhangfei.gao@foxmail.com, linuxarm@huawei.com, haojian.zhuang@linaro.org,
 Christoph Lameter, Hao Fang, Gavin Schenk, Leon Romanovsky,
 RDMA mailing list, Vinod Koul, Doug Ledford, Uwe Kleine-König,
 David Kershner, Kenneth Lee, Johan Hovold, Cyrille Pitchen, Sagar Dharia,
 Jens Axboe, guodong.xu@linaro.org, linux-netdev, Randy Dunlap,
 linux-kernel@vger.kernel.org, Zhou Wang, linux-crypto@vger.kernel.org,
 Philippe Ombredanne, Sanyog Kale,
 Kenneth Lee, "David S. Miller", linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Message-ID: <20181119213320.GG4593@redhat.com>
References: <20181119164853.GA4593@redhat.com>
 <20181119182752.GA4890@ziepe.ca> <20181119184215.GB4593@redhat.com>
 <20181119185333.GC4890@ziepe.ca> <20181119191721.GC4593@redhat.com>
 <20181119192702.GD4890@ziepe.ca> <20181119194631.GE4593@redhat.com>
 <20181119201156.GG4890@ziepe.ca> <20181119202614.GF4593@redhat.com>
 <20181119212638.GI4890@ziepe.ca>
In-Reply-To: <20181119212638.GI4890@ziepe.ca>

On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> > On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> > >
> > > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > > be fine too?
> > > > >
> > > > > AFAIK the only difference is the length of the race window. You'd have
> > > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > > open.
> > > >
> > > > Well in O_DIRECT case there is only one page table, the CPU
> > > > page table and it gets updated during fork() so there is an
> > > > ordering there and the race window is small.
> > > >
> > > Not really, in O_DIRECT case there is another 'page table', we just
> > > call it a DMA scatter/gather list and it is sent directly to the block
> > > device's DMA HW. The sgl plays exactly the same role as the various HW
> > > page list data structures that underlie RDMA MRs.
> > >
> > > It is not a page table that matters here, it is if the DMA address of
> > > the page is active for DMA on HW.
> > >
> > > Like you say, the only difference is that the race is hopefully small
> > > with O_DIRECT (though that is not really small, NVMeof for instance
> > > has windows as large as connection timeouts, if you try hard enough)
> > >
> > > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > > I would call it a bug :(
> >
> > I cannot think of any scenario that would be a bug with O_DIRECT.
> > Do you have one in mind? When you fork() and do other syscalls that
> > affect the memory of your process in another thread you should
> > expect inconsistent results. Kernel is not here to provide a fully
> > safe environment to the user, user can shoot itself in the foot and
> > that's fine as long as it only affects the process itself and no one
> > else. We should not be in the business of making everything baby
> > proof :)
>
> Sure, I set up AIO with O_DIRECT and launch a read.
>
> Then I fork and dirty the READ target memory using the CPU in the
> child.
>
> As you described in this case the fork will retain the physical page
> that is undergoing O_DIRECT DMA, and the parent gets a new copied page.
>
> The DMA completes, and the child gets the DMA'd to page. The parent
> gets an unchanged copied page.
>
> The parent gets the AIO completion, but can't see the data.
>
> I'd call that a bug with O_DIRECT. The only correct outcome is that
> the parent will always see the O_DIRECT data. Fork should not cause
> the *parent* to malfunction. I agree the child cannot make any
> prediction what memory it will see.
>
> I assume the same flow is possible using threads and read().
>
> It is really no different than the RDMA bug with fork.
>

Yes and that's expected behavior :) If you fork() and have anything
still in flight at the time of fork that can change your process address
space (including data in it) then all bets are off.

At least this is my reading of the fork() syscall.

Cheers,
Jérôme
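
The AIO/O_DIRECT scenario quoted above can be illustrated with a toy model. This is only a sketch, not kernel code: `Machine`, `Process` and `run_scenario` are hypothetical names, and "DMA" is just a write to a physical frame number captured before the fork, playing the role of an sgl entry. It assumes a simplified copy-on-write rule in which whichever process writes first after fork() receives the fresh copy, and it has the parent take that fault so that the parent ends up with the copied page, which is the outcome described in the quoted text.

```python
# Toy model only: frames, page tables and "DMA" are plain Python objects.
class Machine:
    def __init__(self):
        self.frames = []  # physical frames; the list index is the frame number

    def alloc(self, data):
        self.frames.append(data)
        return len(self.frames) - 1


class Process:
    def __init__(self, machine, page_table, cow=()):
        self.machine = machine
        self.page_table = dict(page_table)  # virtual page -> frame number
        self.cow = set(cow)                 # virtual pages mapped copy-on-write

    def fork(self):
        # Parent and child keep pointing at the same frames, now write-protected.
        self.cow = set(self.page_table)
        return Process(self.machine, self.page_table, self.cow)

    def write(self, vpage, data):
        if vpage in self.cow:
            # Assumed COW rule: the writer is moved to a fresh copy of the frame.
            old = self.page_table[vpage]
            self.page_table[vpage] = self.machine.alloc(self.machine.frames[old])
            self.cow.discard(vpage)
        self.machine.frames[self.page_table[vpage]] = data

    def read(self, vpage):
        return self.machine.frames[self.page_table[vpage]]


def run_scenario():
    m = Machine()
    parent = Process(m, {0: m.alloc("old")})
    dma_target = parent.page_table[0]   # the device is handed the frame number
    child = parent.fork()
    parent.write(0, "old")              # parent touches the page and takes the COW fault
    m.frames[dma_target] = "disk data"  # DMA completes into the original frame
    return parent.read(0), child.read(0)


print(run_scenario())  # ('old', 'disk data')
```

In this model the parent receives the completion but still reads the stale contents, while the child sees the DMA'd data. Whether a real kernel behaves this way depends on its copy-on-write and page-pinning rules, which the model does not attempt to reproduce.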