Subject: Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing
To: Waiman Long, Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Will Deacon
References: <20191107190628.22667-1-longman@redhat.com>
In-Reply-To: <20191107190628.22667-1-longman@redhat.com>
From: Mike Kravetz
Date: Thu, 7 Nov 2019 11:31:56 -0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/7/19 11:06 AM, Waiman Long wrote:
> A customer with large SMP systems (up to 16 sockets) with application
> that uses large amount of static hugepages (~500-1500GB) are experiencing
> random multisecond delays. These delays was caused by the long time it
> took to scan the VMA interval tree with mmap_sem held.
>
> The sharing of huge PMD does not require changes to the i_mmap at all.
> As a result, we can just take the read lock and let other threads
> searching for the right VMA to share in parallel. Once the right
> VMA is found, either the PMD lock (2M huge page for x86-64) or the
> mm->page_table_lock will be acquired to perform the actual PMD sharing.
>
> Lock contention, if present, will happen in the spinlock. That is much
> better than contention in the rwsem where the time needed to scan the
> the interval tree is indeterminate.
>
> With this patch applied, the customer is seeing significant improvements
> over the unpatched kernel.

Thanks for getting this tested in the customer's environment!

> Signed-off-by: Waiman Long

Just a small typo in the comment, otherwise.
Reviewed-by: Mike Kravetz

> ---
>  mm/hugetlb.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b45a95363a84..087e7ff00137 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4842,7 +4842,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	if (!vma_shareable(vma, addr))
>  		return (pte_t *)pmd_alloc(mm, pud, addr);
>
> -	i_mmap_lock_write(mapping);
> +	/*
> +	 * PMD sharing does not require changes to i_mmap. So a read lock
> +	 * is enuogh.

s/enuogh/enough/

> +	 */
> +	i_mmap_lock_read(mapping);
>  	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
>  		if (svma == vma)
>  			continue;
> @@ -4872,7 +4876,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	spin_unlock(ptl);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> -	i_mmap_unlock_write(mapping);
> +	i_mmap_unlock_read(mapping);
>  	return pte;
>  }
>
>

--
Mike Kravetz