From: Laurent Dufour
Subject: Re: [PATCH v9 17/24] mm: Protect mm_rb tree with a rwlock
To: Peter Zijlstra
Cc: paulmck@linux.vnet.ibm.com, akpm@linux-foundation.org,
 kirill@shutemov.name, ak@linux.intel.com, mhocko@kernel.org,
 dave@stgolabs.net, jack@suse.cz, Matthew Wilcox, benh@kernel.crashing.org,
 mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner, Ingo Molnar,
 hpa@zytor.com, Will Deacon, Sergey Senozhatsky, Andrea Arcangeli,
 Alexei Starovoitov, kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com,
 Daniel Jordan, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com,
 bsingharora@gmail.com, Tim Chen, linuxppc-dev@lists.ozlabs.org,
 x86@kernel.org
References: <1520963994-28477-1-git-send-email-ldufour@linux.vnet.ibm.com>
 <1520963994-28477-18-git-send-email-ldufour@linux.vnet.ibm.com>
 <20180314084844.GP4043@hirez.programming.kicks-ass.net>
Date: Wed, 14 Mar 2018 17:25:30 +0100
In-Reply-To: <20180314084844.GP4043@hirez.programming.kicks-ass.net>
Message-Id: <399d758c-c329-fe49-d501-065067eb3b29@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 14/03/2018 09:48, Peter Zijlstra wrote:
> On Tue, Mar 13, 2018 at 06:59:47PM +0100, Laurent Dufour wrote:
>> This change is inspired by the Peter's proposal patch [1] which was
>> protecting the VMA using SRCU. Unfortunately, SRCU is not scaling well in
>> that particular case, and it is introducing major performance degradation
>> due to excessive scheduling operations.
>
> Do you happen to have a little more detail on that?

This was reported by Kemi, who found poor performance when running some
benchmarks on top of the v5 series:
https://patchwork.kernel.org/patch/9999687/

It appears that SRCU generates a lot of additional scheduling to manage the
freeing of the VMA structure. SRCU works through per-CPU resources, but the
SRCU callback is ...
And since we are handling a per-process resource (VMA) through a global
resource (SRCU) this way, scheduling the SRCU callback incurs a lot of
overhead.

>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 34fde7111e88..28c763ea1036 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -335,6 +335,7 @@ struct vm_area_struct {
>>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>>  #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>>  	seqcount_t vm_sequence;
>> +	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
>>  #endif
>>  } __randomize_layout;
>>
>> @@ -353,6 +354,9 @@ struct kioctx_table;
>>  struct mm_struct {
>>  	struct vm_area_struct *mmap;		/* list of VMAs */
>>  	struct rb_root mm_rb;
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +	rwlock_t mm_rb_lock;
>> +#endif
>>  	u32 vmacache_seqnum;			/* per-thread vmacache */
>>  #ifdef CONFIG_MMU
>>  	unsigned long (*get_unmapped_area) (struct file *filp,
>
> When I tried this, it simply traded contention on mmap_sem for
> contention on these two cachelines.
>
> This was for the concurrent fault benchmark, where mmap_sem is only ever
> acquired for reading (so no blocking ever happens) and the bottle-neck
> was really pure cacheline access.

I'd say this is expected if multiple threads are working on the same VMA,
but if the VMAs differ, this contention disappears, whereas it remains when
using the mmap_sem.

That said, the tests I ran on PowerPC using will-it-scale/page_fault1_threads
showed that the number of cache misses generated in get_vma() is very low
(less than 5%). Am I missing something?

> Only by using RCU can you avoid that thrashing.

I agree, but this kind of test is the best use case for SRCU because there
are few updates, and therefore few calls to the SRCU asynchronous callback.

Honestly, I can't see an ideal solution here: RCU is not optimal when there
is a high number of updates, and using a rwlock may introduce a bottleneck
there. I get better results with the rwlock than with SRCU in that case, but
if you have another proposal, please advise and I'll give it a try.

> Also note that if your database allocates the one giant mapping, it'll
> be _one_ VMA and that vm_ref_count gets _very_ hot indeed.

For the database product I mentioned in the series header, it's the
opposite: the number of VMAs is very high, so this doesn't happen.

With a single VMA, there will clearly be contention on vm_ref_count, but
this would still be better than blocking on the mmap_sem.

Laurent.