Date: Mon, 19 Nov 2018 14:26:38 -0700
From: Jason Gunthorpe
To: Jerome Glisse
Cc: Tim Sell, linux-doc@vger.kernel.org, Alexander Shishkin, Zaibo Xu,
    zhangfei.gao@foxmail.com, linuxarm@huawei.com,
    haojian.zhuang@linaro.org, Christoph Lameter, Hao Fang, Gavin Schenk,
    Leon Romanovsky, RDMA mailing list, Vinod Koul, Doug Ledford,
    Uwe Kleine-König, David Kershner, Kenneth Lee, Johan Hovold,
    Cyrille Pitchen, Sagar Dharia, Jens Axboe, guodong.xu@linaro.org,
    linux-netdev, Randy Dunlap, linux-kernel@vger.kernel.org, Zhou Wang,
    linux-crypto@vger.kernel.org, Philippe Ombredanne, Sanyog Kale,
    Kenneth Lee, "David S.
    Miller", linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Message-ID: <20181119212638.GI4890@ziepe.ca>
References: <20181119104801.GF8268@mtr-leonro.mtl.com>
    <20181119164853.GA4593@redhat.com> <20181119182752.GA4890@ziepe.ca>
    <20181119184215.GB4593@redhat.com> <20181119185333.GC4890@ziepe.ca>
    <20181119191721.GC4593@redhat.com> <20181119192702.GD4890@ziepe.ca>
    <20181119194631.GE4593@redhat.com> <20181119201156.GG4890@ziepe.ca>
    <20181119202614.GF4593@redhat.com>
In-Reply-To: <20181119202614.GF4593@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> >
> > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > be fine too?
> > > >
> > > > AFAIK the only difference is the length of the race window. You'd have
> > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > open.
> > >
> > > Well in O_DIRECT case there is only one page table, the CPU
> > > page table and it gets updated during fork() so there is an
> > > ordering there and the race window is small.
> >
> > Not really, in O_DIRECT case there is another 'page table', we just
> > call it a DMA scatter/gather list and it is sent directly to the block
> > device's DMA HW. The sgl plays exactly the same role as the various HW
> > page list data structures that underlie RDMA MRs.
> >
> > It is not a page table that matters here, it is if the DMA address of
> > the page is active for DMA on HW.
> >
> > Like you say, the only difference is that the race is hopefully small
> > with O_DIRECT (though that is not really small, NVMeof for instance
> > has windows as large as connection timeouts, if you try hard enough)
> >
> > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > I would call it a bug :(
>
> I cannot think of any scenario that would be a bug with O_DIRECT.
> Do you have one in mind? When you fork() and do other syscalls that
> affect the memory of your process in another thread you should
> expect inconsistent results. Kernel is not here to provide a fully
> safe environment to user, user can shoot itself in the foot and
> that's fine as long as it only affects the process itself and no one
> else. We should not be in the business of making everything baby
> proof :)

Sure, I set up AIO with O_DIRECT and launch a read.

Then I fork and dirty the READ target memory using the CPU in the
child.

As you described, in this case the child will retain the physical page
that is undergoing O_DIRECT DMA, and the parent gets a new copied
page.

The DMA completes, and the child gets the DMA'd-to page. The parent
gets an unchanged copied page.

The parent gets the AIO completion, but can't see the data.

I'd call that a bug with O_DIRECT. The only correct outcome is that
the parent will always see the O_DIRECT data. Fork should not cause
the *parent* to malfunction. I agree the child cannot make any
prediction about what memory it will see.

I assume the same flow is possible using threads and read().

It is really no different than the RDMA bug with fork.

Jason
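
The sequence above can be sketched as a toy model. This is only an
illustration of the bookkeeping, not kernel code: "physical pages" are
bytearrays, "page tables" are dicts, and simulate() and its
parent_gets_copy flag are hypothetical names. Which process ends up with
the copied page is taken from the description above as an assumption,
not derived from actual copy-on-write behaviour.

```python
# Toy model only. Nothing here corresponds to a real kernel structure
# or API; all names are made up for illustration.

def simulate(parent_gets_copy: bool) -> dict:
    # The parent maps one physical page and launches an O_DIRECT read
    # into it. The device keeps targeting this page until completion.
    original = bytearray(b"old-data")
    parent = {"buf": original}
    dma_target = original

    # fork(): the child initially shares the same physical page.
    child = {"buf": original}

    # Copy-on-write break. Assumption from the mail: the parent ends up
    # with the new copy and the child keeps the page under DMA. The flag
    # allows the opposite outcome to be compared.
    copy = bytearray(original)
    if parent_gets_copy:
        parent["buf"] = copy
    else:
        child["buf"] = copy

    # DMA completes into the original page, whoever still maps it.
    dma_target[:] = b"DMA-data"

    return {"parent": bytes(parent["buf"]), "child": bytes(child["buf"])}


if __name__ == "__main__":
    print(simulate(parent_gets_copy=True))
    print(simulate(parent_gets_copy=False))
```

With parent_gets_copy=True the parent receives its completion but its
buffer still holds the old contents, which is the outcome called a bug
above. With the flag False the parent sees the data and only the child's
view is stale.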