Subject: Re: [PATCH 5/5] mm/thp: Split huge pmds/puds if they're pinned when fork()
To: Peter Xu
CC: Linus Torvalds, Michal Hocko, Kirill Shutemov, Jann Horn, Oleg Nesterov,
    Kirill Tkhai, Hugh Dickins, Leon Romanovsky, Jan Kara, Christoph Hellwig,
    Andrew Morton, Jason Gunthorpe, Andrea Arcangeli
References: <20200921211744.24758-1-peterx@redhat.com>
    <20200921212031.25233-1-peterx@redhat.com>
From: John Hubbard
Message-ID: <5e594e71-537f-3e9f-85b6-034b7f5fedbe@nvidia.com>
Date: Mon, 21 Sep 2020 23:41:16 -0700
In-Reply-To: <20200921212031.25233-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/21/20 2:20 PM, Peter Xu wrote:
...
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7ff29cc3d55c..c40aac0ad87e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>
>  	src_page = pmd_page(pmd);
>  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> +
> +	/*
> +	 * If this page is a potentially pinned page, split and retry the fault
> +	 * with smaller page size. Normally this should not happen because the
> +	 * userspace should use MADV_DONTFORK upon pinned regions. This is a
> +	 * best effort that the pinned pages won't be replaced by another
> +	 * random page during the coming copy-on-write.
> +	 */
> +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> +		     page_maybe_dma_pinned(src_page))) {

This condition would make a good static inline function. It's used in 3
places, and the condition is quite special and worth documenting, and
having a separate function helps with that, because the function name adds
to the story.
I'd suggest approximately:

    page_likely_dma_pinned()

for the name.

> +		pte_free(dst_mm, pgtable);
> +		spin_unlock(src_ptl);
> +		spin_unlock(dst_ptl);
> +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> +		return -EAGAIN;
> +	}

Why wait until we are so deep into this routine to detect this and unwind?
It seems like you could do the check near the beginning of this routine,
and handle it there, with less unwinding. In fact, after taking only the
src_ptl, the check could be made, right?

thanks,
--
John Hubbard
NVIDIA