Subject: Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing
From: Waiman Long
Organization: Red Hat
To: Matthew Wilcox
Cc: Mike Kravetz, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar,
    Will Deacon
Date: Thu, 7 Nov 2019 16:27:18 -0500
In-Reply-To: <20191107195441.GF11823@bombadil.infradead.org>
References: <20191107190628.22667-1-longman@redhat.com>
 <20191107195441.GF11823@bombadil.infradead.org>

On 11/7/19 2:54 PM, Matthew Wilcox wrote:
> On Thu, Nov 07, 2019 at 02:06:28PM -0500, Waiman Long wrote:
>> A customer with large SMP systems (up to 16 sockets) running an
>> application that uses a large amount of static hugepages (~500-1500GB)
>> is experiencing random multisecond delays. These delays were caused by
>> the long time it took to scan the VMA interval tree with mmap_sem held.
>>
>> Sharing a huge PMD does not require any changes to the i_mmap tree at
>> all. As a result, we can just take the read lock and let other threads
>> search for the right VMA to share in parallel. Once the right VMA is
>> found, either the PMD lock (2M huge pages on x86-64) or the
>> mm->page_table_lock will be acquired to perform the actual PMD sharing.
>>
>> Lock contention, if present, will happen on the spinlock. That is much
>> better than contention on the rwsem, where the time needed to scan the
>> interval tree is indeterminate.
> I don't think this description really explains the contention argument
> well.  There are _more_ PMD locks than there are i_mmap_sem locks, so
> processes accessing different parts of the same file can work in
> parallel.

I am sorry for not being clear enough. PMD lock contention here means
two or more tasks happening to touch the same PMD. Because of the PMD
lock, modifications of the same PMD cannot happen in parallel, while
tasks touching different PMDs can proceed in parallel. Previously, they
all contended on the same rwsem write lock and hence had to run
serially.
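
To make the ordering concrete, here is roughly what the sharing path
looks like with this patch applied. This is a hand-simplified sketch of
huge_pmd_share() in mm/hugetlb.c, from memory, not the verbatim code:
the vma_shareable() check is elided and helper signatures are
approximate.

pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
{
        struct vm_area_struct *vma = find_vma(mm, addr);
        struct address_space *mapping = vma->vm_file->f_mapping;
        pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
        struct vm_area_struct *svma;
        unsigned long saddr;
        pte_t *spte = NULL, *pte;
        spinlock_t *ptl;

        /* Read lock is enough: the scan does not modify the i_mmap tree. */
        i_mmap_lock_read(mapping);
        vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
                if (svma == vma)
                        continue;
                saddr = page_table_shareable(svma, vma, addr, idx);
                if (saddr) {
                        spte = huge_pte_offset(svma->vm_mm, saddr,
                                               vma_mmu_pagesize(svma));
                        if (spte) {
                                get_page(virt_to_page(spte));
                                break;
                        }
                }
        }
        if (!spte)
                goto out;

        /*
         * Only the actual installation is serialized: huge_pte_lock()
         * resolves to the split PMD lock where available, otherwise to
         * mm->page_table_lock.
         */
        ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
        if (pud_none(*pud)) {
                pud_populate(mm, pud,
                             (pmd_t *)((unsigned long)spte & PAGE_MASK));
                mm_inc_nr_pmds(mm);
        } else {
                put_page(virt_to_page(spte));   /* lost the race */
        }
        spin_unlock(ptl);
out:
        pte = (pte_t *)pmd_alloc(mm, pud, addr);
        i_mmap_unlock_read(mapping);
        return pte;
}

Two faulting tasks that end up with different source PMDs take
different ptl's, so the only cross-task serialization left on this
path is the short critical section around pud_populate().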

> Are there other current users of the write lock that could use a read
> lock?  At first blush, it would seem that unmap_ref_private() also only
> needs a read lock on the i_mmap tree.  I don't think
> hugetlb_change_protection() needs the write lock either.  Nor
> retract_page_tables().

It is possible that other locking sites can be converted to use the
read lock as well, but that is outside the scope of this patch.

Cheers,
Longman
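
P.S. For anyone following along, the helpers involved are the
i_mmap_rwsem wrappers in include/linux/fs.h, which are (roughly) the
following, so converting one of those sites really is just a
write-to-read switch:

static inline void i_mmap_lock_write(struct address_space *mapping)
{
        down_write(&mapping->i_mmap_rwsem);
}

static inline void i_mmap_unlock_write(struct address_space *mapping)
{
        up_write(&mapping->i_mmap_rwsem);
}

static inline void i_mmap_lock_read(struct address_space *mapping)
{
        down_read(&mapping->i_mmap_rwsem);
}

static inline void i_mmap_unlock_read(struct address_space *mapping)
{
        up_read(&mapping->i_mmap_rwsem);
}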