From: Prakash Sangappa
Subject: Re: [PATCH V2 0/6] VA to numa node information
To: Michal Hocko, Steven Sistare
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, dave.hansen@intel.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org, kirill.shutemov@linux.intel.com, khandual@linux.vnet.ibm.com
Date: Fri, 9 Nov 2018 20:48:29 -0800
Message-ID: <41af45a9-c428-ccd8-ca10-c355d22c56a7@oracle.com>
In-Reply-To: <20180924171443.GI18685@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/24/18 10:14 AM, Michal Hocko wrote:
> On Fri 14-09-18 12:01:18, Steven Sistare wrote:
>> On 9/14/2018 1:56 AM, Michal Hocko wrote:
> [...]
>>> Why does this matter for something that is for analysis purposes.
>>> Reading the file for the whole address space is far from a free
>>> operation. Is the page walk optimization really essential for usability?
>>> Moreover what prevents move_pages implementation to be clever for the
>>> page walk itself? In other words why would we want to add a new API
>>> rather than make the existing one faster for everybody.
>> One could optimize move pages. If the caller passes a consecutive range
>> of small pages, and the page walk sees that a VA is mapped by a huge page,
>> then it can return the same numa node for each of the following VA's that fall
>> into the huge page range. It would be faster than 55 nsec per small page, but
>> hard to say how much faster, and the cost is still driven by the number of
>> small pages.
> This is exactly what I was arguing for. There is some room for
> improvements for the existing interface. I yet have to hear the explicit
> usecase which would required even better performance that cannot be
> achieved by the existing API.
>

The optimization to the move_pages() API mentioned above helps when
scanning mapped huge pages, but it does not help when there are large
sparse mappings with few pages mapped. Otherwise, we would have to
consider adding page walk support to the move_pages() implementation
and enhancing the API (new flag?) to return address range to NUMA node
information.

The page walk optimization would certainly make a difference for
usability. We can have applications (like Oracle DB) whose processes
have large sparse mappings (in TBs) where only some areas of the mapped
address range are accessed, so large portions have no page tables
backing them.
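To make the difference between per-page results and address range
results concrete, here is a minimal userspace sketch. It assumes a
per-page status array in the format move_pages() uses when called with
nodes == NULL: a node id on success, or a negative errno such as
-ENOENT for a page that is not present. The names VaRange and
coalesce_status are hypothetical and are not part of the patch set or
of any existing API.

```python
import errno
from typing import List, NamedTuple


class VaRange(NamedTuple):
    """Hypothetical record: a run of consecutive pages with the same status."""
    start: int  # first byte of the run
    end: int    # one past the last byte of the run
    node: int   # NUMA node id, or a negative errno such as -ENOENT


def coalesce_status(base: int, page_size: int, status: List[int]) -> List[VaRange]:
    """Collapse per-page status values into address ranges.

    status[i] describes the page at base + i * page_size, following the
    convention of the move_pages() status array in query mode.
    """
    ranges: List[VaRange] = []
    for i, node in enumerate(status):
        addr = base + i * page_size
        if ranges and ranges[-1].node == node and ranges[-1].end == addr:
            # Same status as the previous page: extend the current run.
            ranges[-1] = ranges[-1]._replace(end=addr + page_size)
        else:
            # Status changed: start a new run at this page.
            ranges.append(VaRange(addr, addr + page_size, node))
    return ranges


if __name__ == "__main__":
    # Made-up data: 2 pages on node 0, 3 pages not present, 1 page on node 1.
    sample = [0, 0, -errno.ENOENT, -errno.ENOENT, -errno.ENOENT, 1]
    for r in coalesce_status(0x100000, 4096, sample):
        print(f"{r.start:#x}-{r.end:#x} node={r.node}")
```

Note that this post-processing still needs one status entry per small
page, so its cost grows with the size of the address space even when
almost nothing is mapped. An interface that reports ranges directly can
skip over areas with no page tables.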
Sparse mappings like this can become more prevalent on newer systems
with multiple TBs of memory.

Here is some data from pmap using the move_pages() API with the
optimization. The following table compares the time pmap takes to print
the address mappings of a large process with NUMA node information,
using the move_pages() API versus the /proc numa_vamaps file.

Running the pmap command on a process with 1.3 TB of address space,
with sparse mappings:

                        ~1.3 TB sparse    250G dense segment with hugepages
move_pages              8.33s             3.14s
optimized move_pages    6.29s             0.92s
/proc numa_vamaps       0.08s             0.04s

The second column is the pmap time on a 250G address range of this
process, which maps hugepages (THP & hugetlb).
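For reference, the ratios implied by the table can be worked out with a
few lines of Python. This is only arithmetic on the timings listed
above; the dictionary layout and the speedups() helper are illustrative
names, not part of pmap or the patch set.

```python
# Timings in seconds, copied from the table above:
# (~1.3 TB sparse, 250G dense segment with hugepages).
timings = {
    "move_pages": (8.33, 3.14),
    "optimized move_pages": (6.29, 0.92),
    "/proc numa_vamaps": (0.08, 0.04),
}


def speedups(baseline: str = "/proc numa_vamaps") -> dict:
    """Return how many times slower each method is than the baseline, per column."""
    base = timings[baseline]
    return {
        name: tuple(round(t / b, 1) for t, b in zip(cols, base))
        for name, cols in timings.items()
    }


if __name__ == "__main__":
    for name, (sparse, dense) in speedups().items():
        print(f"{name:<22} sparse: {sparse:>6.1f}x   dense: {dense:>5.1f}x")
```

By the same arithmetic, the move_pages() optimization improves the
dense hugepage case by roughly 3.4x (3.14s to 0.92s) but the sparse
case by only about 1.3x (8.33s to 6.29s), while the /proc file is
roughly 79x faster than optimized move_pages() on the sparse mapping.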