Date: Mon, 4 Mar 2019 12:13:39 -0800
From: Ira Weiny
To: John Hubbard
Cc: Artemy Kovalyov, "john.hubbard@gmail.com", "linux-mm@kvack.org",
    Andrew Morton, LKML, Jason Gunthorpe, Doug Ledford,
    "linux-rdma@vger.kernel.org"
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths
Message-ID: <20190304201338.GA28731@iweiny-DESK2.sc.intel.com>
References: <20190302032726.11769-2-jhubbard@nvidia.com>
 <20190302202435.31889-1-jhubbard@nvidia.com>
 <20190302194402.GA24732@iweiny-DESK2.sc.intel.com>
 <2404c962-8f6d-1f6d-0055-eb82864ca7fc@mellanox.com>
 <20190303165550.GB27123@iweiny-DESK2.sc.intel.com>
User-Agent: Mutt/1.11.1 (2018-12-01)
List-ID: linux-kernel@vger.kernel.org

On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
> > On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
> >>
> >>
> >> On 02/03/2019 21:44, Ira Weiny wrote:
> >>>
> >>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, john.hubbard@gmail.com wrote:
> >>>> From: John Hubbard
> >>>>
> >>>> ...
> >>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
> >>>>    is checking for a condition that can never happen,
> >>>>    because earlier:
> >>>>
> >>>>        user_virt = user_virt & page_mask;
> >>>>
> >>>>    ...so, remove that entire phrase.
> >>>>
> >>>>      bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> >>>>      mutex_lock(&umem_odp->umem_mutex);
> >>>>      for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> >>>> -        if (user_virt & ~page_mask) {
> >>>> -            p += PAGE_SIZE;
> >>>> -            if (page_to_phys(local_page_list[j]) != p) {
> >>>> -                ret = -EFAULT;
> >>>> -                break;
> >>>> -            }
> >>>> -            put_page(local_page_list[j]);
> >>>> -            continue;
> >>>> -        }
> >>>> -
> >>>
> >>> I think this is trying to account for compound pages. (i.e. page_mask could
> >>> represent more than PAGE_SIZE, which is what user_virt is being incremented
> >>> by.) But putting the page in that case seems to be the wrong thing to do?
> >>>
> >>> Yes, this was added by Artemy [1], now cc'ed.
> >>
> >> Right, this is for huge pages, please keep it.
> >> put_page() is needed to decrement the refcount of the head page.
> >
> > You mean decrement the refcount of the _non_-head pages?
> >
> > Ira
> >
>
> Actually, I'm sure Artemy means the head page, because put_page() always
> operates on the head page.
>
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page() patchset, because my approach
> so far assumed that I could just change us over to:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

What about splitting a huge page?

From Documentation/vm/transhuge.rst:

    split_huge_page internally has to distribute the refcounts in the head
    page to the tail pages before clearing all PG_head/tail bits from the
    page structures. It can be done easily for refcounts taken by page table
    entries. But we don't have enough information on how to distribute any
    additional pins (i.e. from get_user_pages). split_huge_page() fails any
    requests to split pinned huge page: it expects page count to be equal to
    sum of mapcount of all sub-pages plus one (split_huge_page caller must
    have reference for head page).

FWIW, I'm not sure why it needs to "store" the reference in the head page
for this.
I don't see any check to make sure the ref has been "stored", but I'm not
really familiar with the compound page code yet.

Ira

>
> However, it may be tricky to do this in one pass. Probably at first, I'll
> have to do this horrible approach:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024),
> MULTIPLIED by the number of tail pages. argghhh, that's ugly.
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
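
As an aside, a minimal userspace toy model of the two accounting schemes
John sketches above may make the bookkeeping easier to follow. Everything
here (toy_page, GUP_BIAS, the gup_*/put_* helpers, the refcount field) is
made up for illustration only; it is not the kernel implementation, just a
sketch of the arithmetic under the assumptions stated in the comments.

    /* Toy model of the refcount-bias idea discussed above. Scheme A adds a
     * single large bias per pin; scheme B (the interim "ugly" variant) adds
     * the bias once per sub-page returned by GUP, so the matching put has to
     * drop bias * ntail. Real kernel code uses struct page, page->_refcount
     * and compound_head(); none of that is modeled here. */
    #include <assert.h>
    #include <stdio.h>

    #define GUP_BIAS 1024           /* the "large amount (1024)" from the mail */

    struct toy_page {
            int refcount;            /* stands in for the head page's _refcount */
            int ntail;               /* number of tail pages in the compound page */
    };

    /* Scheme A: one large bias per pin, regardless of how many tail pages. */
    static void gup_bias(struct toy_page *head) { head->refcount += GUP_BIAS; }
    static void put_bias(struct toy_page *head) { head->refcount -= GUP_BIAS; }

    /* Scheme B: a bias is taken for each sub-page GUP hands back... */
    static void gup_per_subpage(struct toy_page *head)
    {
            for (int i = 0; i < head->ntail; i++)
                    head->refcount += GUP_BIAS;
    }

    /* ...so the release side must multiply by the number of tail pages. */
    static void put_per_subpage(struct toy_page *head)
    {
            head->refcount -= GUP_BIAS * head->ntail;
    }

    int main(void)
    {
            /* e.g. a 2MB huge page made of 512 4KB sub-pages, one base ref held */
            struct toy_page huge = { .refcount = 1, .ntail = 512 };

            gup_bias(&huge);
            put_bias(&huge);

            gup_per_subpage(&huge);
            put_per_subpage(&huge);

            /* Either scheme balances once the pins are dropped. */
            assert(huge.refcount == 1);
            printf("refcount balanced: %d\n", huge.refcount);
            return 0;
    }

Either way the counts balance after the pins are released; the open question
in the thread is which scheme split_huge_page(), with its "page count equals
sum of mapcounts plus one" expectation, can cope with while the page is
still pinned.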