Date: Wed, 23 Sep 2020 13:21:19 +0300
From: Leon Romanovsky
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jason Gunthorpe,
    Andrew Morton, Jan Kara, Michal Hocko, Kirill Tkhai, Kirill Shutemov,
    Hugh Dickins, Christoph Hellwig, Andrea Arcangeli, John Hubbard,
    Oleg Nesterov, Linus Torvalds, Jann Horn
Subject: Re: [PATCH 0/5] mm: Break COW for pinned pages during fork()
Message-ID: <20200923102119.GK1223944@unreal>
References: <20200921211744.24758-1-peterx@redhat.com>
In-Reply-To: <20200921211744.24758-1-peterx@redhat.com>

On Mon, Sep 21, 2020 at 05:17:39PM -0400, Peter Xu wrote:
> Finally I'm starting to post formal patches, because the series is growing.
> Also, since we've already discussed quite a few issues, I feel it's clearer
> what we need to do, and how.
>
> This series is mainly inspired by the previous discussion on the list [1],
> starting from Jason's report of the rdma test failure. Linus proposed the
> solution, which seems to be a very nice approach to avoid breaking
> userspace apps that didn't use MADV_DONTFORK properly before. More
> information can be found in that thread too.
>
> I believe the initial plan was to consider merging something like this for
> rc7/rc8. However, now I'm not sure, because the code change in
> copy_pte_range() is probably larger than expected, so it may carry some
> risk. I'll leave this question to the reviewers...
>
> I tested it myself with fork() after vfio pinning a bunch of device pages,
> and I verified that the new copy pte logic worked as expected, at least in
> the most general path. However, I haven't tested the thp case yet, because
> afaict vfio does not support thp-backed dma pages. Luckily, the pmd/pud thp
> patch is much more straightforward than the pte one, so hopefully it can be
> verified directly by code review plus some more heavy-weight rdma tests.
>
> Patch 1:   Introduce mm.has_pinned (as a single patch, as suggested by Jason)
> Patch 2-3: Some slight rework of the copy_page_range() path as preparation
> Patch 4:   Early cow solution for pte copy for pinned pages
> Patch 5:   Same as above, but for thp (pmd/pud).
>
> The hugetlbfs fix is still missing, but as planned, that's not urgent, so we
> can work on it later. Comments greatly welcomed.

Hi Peter,

I'm aware that this series is under ongoing review and probably not final,
but we tested it anyway and it solves our RDMA failures.

Thanks

>
> Thanks.
>
> Peter Xu (5):
>   mm: Introduce mm_struct.has_pinned
>   mm/fork: Pass new vma pointer into copy_page_range()
>   mm: Rework return value for copy_one_pte()
>   mm: Do early cow for pinned pages during fork() for ptes
>   mm/thp: Split huge pmds/puds if they're pinned when fork()
>
>  include/linux/mm.h       |   2 +-
>  include/linux/mm_types.h |  10 ++
>  kernel/fork.c            |   3 +-
>  mm/gup.c                 |   6 ++
>  mm/huge_memory.c         |  26 +++++
>  mm/memory.c              | 226 +++++++++++++++++++++++++++++++++++----
>  6 files changed, 248 insertions(+), 25 deletions(-)
>
> --
> 2.26.2