Date: Mon, 19 Nov 2018 11:48:54 -0500
From: Jerome Glisse
To: Leon Romanovsky
Cc: Kenneth Lee, Tim Sell, linux-doc@vger.kernel.org, Alexander Shishkin,
    Zaibo Xu, zhangfei.gao@foxmail.com, linuxarm@huawei.com,
    haojian.zhuang@linaro.org, Christoph Lameter, Hao Fang, Gavin Schenk,
    RDMA mailing list, Zhou Wang, Jason Gunthorpe, Doug Ledford,
    Uwe Kleine-König, David Kershner, Kenneth Lee, Johan Hovold,
    Cyrille Pitchen, Sagar Dharia, Jens Axboe, guodong.xu@linaro.org,
    linux-netdev, Randy Dunlap, linux-kernel@vger.kernel.org, Vinod Koul,
    linux-crypto@vger.kernel.org, Philippe
    Ombredanne, Sanyog Kale, "David S. Miller", linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Message-ID: <20181119164853.GA4593@redhat.com>
References: <20181112075807.9291-1-nek.in.cn@gmail.com>
    <20181112075807.9291-2-nek.in.cn@gmail.com>
    <20181113002354.GO3695@mtr-leonro.mtl.com>
    <95310df4-b32c-42f0-c750-3ad5eb89b3dd@gmail.com>
    <20181114160017.GI3759@mtr-leonro.mtl.com>
    <20181115085109.GD157308@Turing-Arch-b>
    <20181115145455.GN3759@mtr-leonro.mtl.com>
    <20181119091405.GE157308@Turing-Arch-b>
    <20181119091910.GF157308@Turing-Arch-b>
    <20181119104801.GF8268@mtr-leonro.mtl.com>
In-Reply-To: <20181119104801.GF8268@mtr-leonro.mtl.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 19, 2018 at 12:48:01PM +0200, Leon Romanovsky wrote:
> On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:

[...]

> > > > memory exposed to user is properly protected from security point of view.
> > > > 3. "stop using the page for a while for the copying" - I'm not fully
> > > >    understand this claim, maybe this article will help you to better
> > > >    describe : https://lwn.net/Articles/753027/
> > >
> > > This topic was being discussed in RFCv2. The key problem here is that:
> > >
> > > The device need to hold the memory for its own calculation, but the CPU/software
> > > want to stop it for a while for synchronizing with disk or COW.
> > >
> > > If the hardware support SVM/SVA (Shared Virtual Memory/Address), it is easy, the
> > > device share page table with CPU, the device will raise a page fault when the
> > > CPU downgrade the PTE to read-only.
> > >
> > > If the hardware cannot share page table with the CPU, we then need to have
> > > some way to change the device page table. This is what happen in ODP. It
> > > invalidates the page table in device upon mmu_notifier call back. But this cannot
> > > solve the COW problem: if the user process A share a page P with device, and A
> > > forks a new process B, and it continue to write to the page. By COW, the
> > > process B will keep the page P, while A will get a new page P'. But you have
> > > no way to let the device know it should use P' rather than P.
>
> I didn't hear about such issue and we supported fork for a long time.
>

Just to comment on this: any InfiniBand driver which uses umem and does
not have ODP (here ODP for me means listening to the mmu notifier, so every
InfiniBand driver except mlx5) will be affected by the same issue, AFAICT.

AFAICT nothing special happens after fork() inside any of those drivers.
So if the parent creates a umem MR before fork() and programs the hardware
with it, then after fork() the parent might start using a new page for the
umem range while the old memory is used by the child. The reverse is also
true (parent using the old memory and child the new memory). Bottom line:
you cannot predict which memory the child or the parent will use for the
range after fork().

So no matter which one you consider, the child or the parent, what the hw
will use for the MR is unlikely to match what the CPU uses for the same
virtual address. In other words:

Before fork:
    CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE

Case 1:
    CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
    CPU child:  virtual addr ptr1 -> physical address = 0xDEAD
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE

Case 2:
    CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
    CPU child:  virtual addr ptr1 -> physical address = 0xCAFE
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE

This applies to every single page and is not predictable. It only applies
to private memory (mmap() with MAP_PRIVATE).

I am not familiar enough with the RDMA user space API contract to know
whether this is an issue or not.

Note that this cannot be fixed; no one should have implemented umem without
ODP, which mlx5 has. For this to work properly you need sane hardware like
mlx5.

Cheers,
Jérôme
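
The two cases above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not kernel or RDMA code: `Process`, `register_mr`, `fork`, `write` and `scenario` are hypothetical names made up for illustration, the device is assumed to copy the translation once at registration and to receive no mmu notifier callback, and copy-on-write is reduced to "the first writer to a still-shared page is moved to a fresh page".

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pages: dict = field(default_factory=dict)  # virtual addr -> physical page
    cow: set = field(default_factory=set)      # virtual addrs still shared copy-on-write


def register_mr(proc, vaddr):
    """Hypothetical MR setup: the device copies the translation once."""
    return {vaddr: proc.pages[vaddr]}


def fork(parent):
    """Child inherits the page table; every page becomes COW on both sides."""
    parent.cow = set(parent.pages)
    return Process(dict(parent.pages), set(parent.pages))


def write(proc, other, vaddr, fresh_page):
    """The first writer to a still-shared page is moved to a fresh physical page."""
    if vaddr in proc.cow and vaddr in other.cow:
        proc.pages[vaddr] = fresh_page   # copy: the writer leaves the old page
    proc.cow.discard(vaddr)              # sole owner afterwards, writes in place


def scenario(first_writer, fresh_page):
    """Return (parent, child, hardware) physical pages for ptr1 after one write."""
    parent = Process({"ptr1": 0xCAFE})
    device = register_mr(parent, "ptr1")  # MR programmed before fork()
    child = fork(parent)
    if first_writer == "parent":
        write(parent, child, "ptr1", fresh_page)
    else:
        write(child, parent, "ptr1", fresh_page)
    return parent.pages["ptr1"], child.pages["ptr1"], device["ptr1"]


for who, fresh in (("child", 0xDEAD), ("parent", 0xBEEF)):
    p, c, hw = scenario(who, fresh)
    print(f"{who} writes first: parent={p:#x} child={c:#x} hw={hw:#x}")
```

In this model the hardware entry never changes, so whichever process writes first ends up on a page the device does not know about. Which process that is depends only on the order of writes, which matches the point that the outcome is not predictable per page.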