Subject: Re: [RFC PATCH 00/64] mm: towards parallel address space operations
From: Laurent Dufour
To: Davidlohr Bueso, akpm@linux-foundation.org, mingo@kernel.org
Cc: peterz@infradead.org, jack@suse.cz, mhocko@kernel.org,
    kirill.shutemov@linux.intel.com, mawilcox@microsoft.com,
    mgorman@techsingularity.net, dave@stgolabs.net, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Date: Mon, 5 Feb 2018 17:53:31 +0100
In-Reply-To: <20180205012754.23615-1-dbueso@wotan.suse.de>
References: <20180205012754.23615-1-dbueso@wotan.suse.de>

On 05/02/2018 02:26, Davidlohr Bueso wrote:
> From: Davidlohr Bueso
> 
> Hi,
> 
> This patchset is a new version of both the range locking machinery and a
> full mmap_sem conversion that makes use of it -- the worst-case scenario,
> as all mmap_sem calls are converted to a full-range mmap_lock equivalent.
> As such, while there is no improvement in concurrency per se, these
> changes aim at adding the machinery to permit this in the future.

Despite the massive rebase, what are the changes in this series compared to
the one I sent last May, which you silently based this on, by the way:
https://lkml.org/lkml/2017/5/24/409

> 
> Direct users of the mm->mmap_sem can be classified as those that (1) acquire
> and release the lock within the same context, and (2) those who directly
> manipulate the mmap_sem down the callchain. For example:
> 
> (1) down_read(&mm->mmap_sem);
>     /* do something */
>     /* nobody down the chain uses mmap_sem directly */
>     up_read(&mm->mmap_sem);
> 
> (2a) down_read(&mm->mmap_sem);
>      /* do something that returns mmap_sem unlocked */
>      fn(mm, &locked);
>      if (locked)
>              up_read(&mm->mmap_sem);
> 
> (2b) down_read(&mm->mmap_sem);
>      /* do something that, in between, releases and reacquires mmap_sem */
>      fn(mm);
>      up_read(&mm->mmap_sem);

Unfortunately, there are also indirect users which rely on the mmap_sem
locking to protect their data. For the first step, using a full range, this
doesn't matter, but when refining the range, these would be the most
critical ones, as they would have to be reworked to take the range into
account.
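To make the quoted patterns concrete, here is a rough sketch of what a
pattern (1) call site could look like once converted to range locking. The
names used here (DEFINE_RANGE_LOCK_FULL, mm_read_lock/mm_read_unlock) are
only illustrative of the approach described in the cover letter, not
necessarily the exact API introduced by the series:

	/*
	 * Illustrative sketch only -- identifier names are assumptions.
	 * For now the whole address space is locked, so behaviour is
	 * equivalent to the old rwsem; the gain comes later, when a call
	 * site can shrink mmrange to just the region it operates on.
	 */
	DEFINE_RANGE_LOCK_FULL(mmrange);	/* covers the full address space */

	mm_read_lock(mm, &mmrange);		/* was: down_read(&mm->mmap_sem) */
	/* do something */
	/* nobody down the chain uses mmap_sem directly */
	mm_read_unlock(mm, &mmrange);		/* was: up_read(&mm->mmap_sem) */

Once such wrappers are in place, a caller that only touches
[start, start + len) could narrow the range it locks, letting operations on
non-overlapping ranges proceed in parallel.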
> 
> Patches 1-2: add the range locking machinery. This is rebased on the rbtree
> optimizations for interval trees such that we can quickly detect overlapping
> ranges. More documentation has also been added, with an ordering example in
> the source code.
> 
> Patch 3: adds new mm locking wrappers around mmap_sem.
> 
> Patches 4-15: teach page fault paths about mmrange (specifically adding the
> range in question to the struct vm_fault). In addition, most of these patches
> update mmap_sem callers that call into the 2a and 2b examples above.
> 
> Patches 15-63: add most of the trivial conversions -- the (1) example above.
> (Patches 21, 22, 23 are hacks that avoid rwsem_is_locked(mmap_sem) such that
> we don't have to teach file_operations about mmrange.)
> 
> Patch 64: finally does the actual conversion and replaces mmap_sem with the
> range mmap_lock.
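As an aside for readers following along: the key property of the range lock
introduced in patches 1-2 is that a new lock request only has to wait for
holders whose ranges intersect its own, and an interval tree answers that
overlap query in O(log n). A minimal sketch of the idea -- purely
illustrative, not the code from the series -- could look like this:

	#include <linux/interval_tree.h>

	/* a held or waiting range, indexed by an interval tree */
	struct range_lock {
		struct interval_tree_node node;	/* node.start/node.last: inclusive bounds */
		/* waiter bookkeeping would go here */
	};

	/* true if [start, last] overlaps any range currently in the tree */
	static bool range_is_contended(struct rb_root_cached *tree,
				       unsigned long start, unsigned long last)
	{
		return interval_tree_iter_first(tree, start, last) != NULL;
	}

With a full-range lock every request overlaps every other, which is why the
series is expected to behave like the existing mmap_sem until callers start
passing narrower ranges.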
> 
> I've run the series on a 40-core (HT) 2-socket IvyBridge with 16 GB of memory
> on various benchmarks that stress address space concurrency.
> 
> ** pft is a microbenchmark for page fault rates.
> 
> When running with increasing thread counts, range locking takes a rather
> small (yet constant) hit of ~2% for the pft timings, with a max of 5%. This
> translates similarly to faults/cpu.
> 
>                               pft timings
>                          v4.15-rc8             v4.15-rc8
>                                        range-mmap_lock-v1
> Amean     system-1       1.11 (   0.00%)       1.17 (  -5.86%)
> Amean     system-4       1.14 (   0.00%)       1.18 (  -3.07%)
> Amean     system-7       1.38 (   0.00%)       1.36 (   0.94%)
> Amean     system-12      2.28 (   0.00%)       2.31 (  -1.18%)
> Amean     system-21      4.11 (   0.00%)       4.13 (  -0.44%)
> Amean     system-30      5.94 (   0.00%)       6.01 (  -1.11%)
> Amean     system-40      8.24 (   0.00%)       8.33 (  -1.04%)
> Amean     elapsed-1      1.28 (   0.00%)       1.33 (  -4.50%)
> Amean     elapsed-4      0.32 (   0.00%)       0.34 (  -5.27%)
> Amean     elapsed-7      0.24 (   0.00%)       0.24 (  -0.43%)
> Amean     elapsed-12     0.23 (   0.00%)       0.23 (  -0.22%)
> Amean     elapsed-21     0.26 (   0.00%)       0.25 (   0.39%)
> Amean     elapsed-30     0.24 (   0.00%)       0.24 (  -0.21%)
> Amean     elapsed-40     0.24 (   0.00%)       0.24 (   0.84%)
> Stddev    system-1       0.04 (   0.00%)       0.05 ( -16.29%)
> Stddev    system-4       0.03 (   0.00%)       0.03 (  17.70%)
> Stddev    system-7       0.08 (   0.00%)       0.02 (  68.56%)
> Stddev    system-12      0.05 (   0.00%)       0.06 ( -31.22%)
> Stddev    system-21      0.06 (   0.00%)       0.06 (   8.07%)
> Stddev    system-30      0.05 (   0.00%)       0.09 ( -70.15%)
> Stddev    system-40      0.11 (   0.00%)       0.07 (  41.53%)
> Stddev    elapsed-1      0.03 (   0.00%)       0.05 ( -72.14%)
> Stddev    elapsed-4      0.01 (   0.00%)       0.01 (  -4.98%)
> Stddev    elapsed-7      0.01 (   0.00%)       0.01 (  60.65%)
> Stddev    elapsed-12     0.01 (   0.00%)       0.01 (   6.24%)
> Stddev    elapsed-21     0.01 (   0.00%)       0.01 (  -1.13%)
> Stddev    elapsed-30     0.00 (   0.00%)       0.00 ( -45.10%)
> Stddev    elapsed-40     0.01 (   0.00%)       0.01 (  25.97%)
> 
>                               pft faults
>                                v4.15-rc8             v4.15-rc8
>                                              range-mmap_lock-v1
> Hmean     faults/cpu-1    629011.4218 (   0.00%)   601523.2875 (  -4.37%)
> Hmean     faults/cpu-4    630952.1771 (   0.00%)   602105.6527 (  -4.57%)
> Hmean     faults/cpu-7    518412.2806 (   0.00%)   518082.2585 (  -0.06%)
> Hmean     faults/cpu-12   324957.1130 (   0.00%)   321678.8932 (  -1.01%)
> Hmean     faults/cpu-21   182712.2633 (   0.00%)   182643.5347 (  -0.04%)
> Hmean     faults/cpu-30   126618.2558 (   0.00%)   125698.1965 (  -0.73%)
> Hmean     faults/cpu-40    91266.3914 (   0.00%)    90614.9956 (  -0.71%)
> Hmean     faults/sec-1    628010.9821 (   0.00%)   600700.3641 (  -4.35%)
> Hmean     faults/sec-4   2475859.3012 (   0.00%)  2351373.1960 (  -5.03%)
> Hmean     faults/sec-7   3372026.7978 (   0.00%)  3408924.8028 (   1.09%)
> Hmean     faults/sec-12  3517750.6290 (   0.00%)  3488785.0815 (  -0.82%)
> Hmean     faults/sec-21  3151328.9188 (   0.00%)  3156983.9401 (   0.18%)
> Hmean     faults/sec-30  3324673.3141 (   0.00%)  3318585.9949 (  -0.18%)
> Hmean     faults/sec-40  3362503.8992 (   0.00%)  3410086.6644 (   1.42%)
> Stddev    faults/cpu-1     14795.1817 (   0.00%)    22870.4755 ( -54.58%)
> Stddev    faults/cpu-4      8759.4355 (   0.00%)     8117.4629 (   7.33%)
> Stddev    faults/cpu-7     20638.6659 (   0.00%)     2290.0083 (  88.90%)
> Stddev    faults/cpu-12     4003.9838 (   0.00%)     5297.7747 ( -32.31%)
> Stddev    faults/cpu-21     2127.4059 (   0.00%)     1186.5330 (  44.23%)
> Stddev    faults/cpu-30      558.8082 (   0.00%)     1366.5374 (-144.54%)
> Stddev    faults/cpu-40     1234.8354 (   0.00%)      768.8031 (  37.74%)
> Stddev    faults/sec-1     14757.0434 (   0.00%)    22740.7172 ( -54.10%)
> Stddev    faults/sec-4     49934.6675 (   0.00%)    54133.9449 (  -8.41%)
> Stddev    faults/sec-7    152781.8690 (   0.00%)    16415.0736 (  89.26%)
> Stddev    faults/sec-12   228697.8709 (   0.00%)   239575.3690 (  -4.76%)
> Stddev    faults/sec-21    70244.4600 (   0.00%)    75031.5776 (  -6.81%)
> Stddev    faults/sec-30    52147.1842 (   0.00%)    58651.5496 ( -12.47%)
> Stddev    faults/sec-40   149846.3761 (   0.00%)   113646.0640 (  24.16%)
> 
>               v4.15-rc8     v4.15-rc8
>                      range-mmap_lock-v1
> User           47.46         48.21
> System        540.43        546.03
> Elapsed        61.85         64.33
> 
> ** gitcheckout is probably the workload that takes the biggest hit (-35%).
> Sys time, as expected, increases quite a bit, coming from the overhead of
> blocking.
> 
>                               gitcheckout
>                         v4.15-rc8             v4.15-rc8
>                                       range-mmap_lock-v1
> System  mean        9.49 (   0.00%)       9.82 (  -3.49%)
> System  stddev      0.20 (   0.00%)       0.39 ( -95.73%)
> Elapsed mean       22.87 (   0.00%)      30.90 ( -35.12%)
> Elapsed stddev      0.39 (   0.00%)       6.32 (-1526.48%)
> CPU     mean       98.07 (   0.00%)      76.27 (  22.23%)
> CPU     stddev      0.70 (   0.00%)      14.63 (-1978.37%)
> 
>               v4.15-rc8     v4.15-rc8
>                      range-mmap_lock-v1
> User          224.06        224.80
> System        176.05        181.01
> Elapsed       619.51        801.78
> 
> ** freqmine is an implementation of Frequent Itemsets Mining (FIM) that
> analyses a set of transactions looking to extract association rules with
> threads. This is a common workload in retail. This configuration uses
> between 2 and 4*NUMCPUs threads. The performance differences with this
> patchset are marginal.
> 
>                               freqmine-large
>                      v4.15-rc8             v4.15-rc8
>                                    range-mmap_lock-v1
> Amean     2      216.89 (   0.00%)     216.59 (   0.14%)
> Amean     5       91.56 (   0.00%)      91.58 (  -0.02%)
> Amean     8       59.41 (   0.00%)      59.54 (  -0.22%)
> Amean     12      44.19 (   0.00%)      44.24 (  -0.12%)
> Amean     21      33.97 (   0.00%)      33.55 (   1.25%)
> Amean     30      33.28 (   0.00%)      33.15 (   0.40%)
> Amean     48      34.38 (   0.00%)      34.21 (   0.48%)
> Amean     79      33.22 (   0.00%)      32.83 (   1.19%)
> Amean     110     36.15 (   0.00%)      35.29 (   2.40%)
> Amean     141     35.63 (   0.00%)      36.38 (  -2.12%)
> Amean     160     36.31 (   0.00%)      36.05 (   0.73%)
> Stddev    2        1.10 (   0.00%)       0.19 (  82.79%)
> Stddev    5        0.23 (   0.00%)       0.10 (  54.31%)
> Stddev    8        0.17 (   0.00%)       0.43 (-146.19%)
> Stddev    12       0.12 (   0.00%)       0.12 (  -0.05%)
> Stddev    21       0.49 (   0.00%)       0.39 (  21.88%)
> Stddev    30       1.07 (   0.00%)       0.93 (  12.61%)
> Stddev    48       0.76 (   0.00%)       0.66 (  12.07%)
> Stddev    79       0.29 (   0.00%)       0.58 ( -98.77%)
> Stddev    110      1.10 (   0.00%)       0.53 (  51.93%)
> Stddev    141      0.66 (   0.00%)       0.79 ( -18.83%)
> Stddev    160      0.27 (   0.00%)       0.15 (  42.71%)
> 
>               v4.15-rc8     v4.15-rc8
>                      range-mmap_lock-v1
> User         29346.21      28818.39
> System         292.18        676.92
> Elapsed       2622.81       2615.77
> 
> ** kernbench (build kernels). With increasing thread counts, the amount of
> overhead from range locking is no more than ~5%.
> 
>                               kernbench
>                         v4.15-rc8             v4.15-rc8
>                                       range-mmap_lock-v1
> Amean     user-2      554.53 (   0.00%)     555.74 (  -0.22%)
> Amean     user-4      566.23 (   0.00%)     567.15 (  -0.16%)
> Amean     user-8      588.66 (   0.00%)     589.68 (  -0.17%)
> Amean     user-16     647.97 (   0.00%)     648.46 (  -0.08%)
> Amean     user-32     923.05 (   0.00%)     925.25 (  -0.24%)
> Amean     user-64    1066.74 (   0.00%)    1067.11 (  -0.03%)
> Amean     user-80    1082.50 (   0.00%)    1082.11 (   0.04%)
> Amean     syst-2       71.80 (   0.00%)      74.90 (  -4.31%)
> Amean     syst-4       76.77 (   0.00%)      79.91 (  -4.10%)
> Amean     syst-8       71.58 (   0.00%)      74.83 (  -4.54%)
> Amean     syst-16      79.21 (   0.00%)      82.95 (  -4.73%)
> Amean     syst-32     104.21 (   0.00%)     108.47 (  -4.09%)
> Amean     syst-64     113.69 (   0.00%)     119.39 (  -5.02%)
> Amean     syst-80     113.98 (   0.00%)     120.18 (  -5.44%)
> Amean     elsp-2      307.65 (   0.00%)     309.27 (  -0.53%)
> Amean     elsp-4      159.86 (   0.00%)     160.94 (  -0.67%)
> Amean     elsp-8       84.76 (   0.00%)      85.04 (  -0.33%)
> Amean     elsp-16      49.63 (   0.00%)      49.56 (   0.15%)
> Amean     elsp-32      37.52 (   0.00%)      38.16 (  -1.68%)
> Amean     elsp-64      36.76 (   0.00%)      37.03 (  -0.72%)
> Amean     elsp-80      37.09 (   0.00%)      37.49 (  -1.08%)
> Stddev    user-2        0.97 (   0.00%)       0.66 (  32.20%)
> Stddev    user-4        0.52 (   0.00%)       0.60 ( -17.34%)
> Stddev    user-8        0.64 (   0.00%)       0.23 (  63.28%)
> Stddev    user-16       1.40 (   0.00%)       0.64 (  54.46%)
> Stddev    user-32       1.32 (   0.00%)       0.95 (  28.47%)
> Stddev    user-64       0.77 (   0.00%)       1.47 ( -91.61%)
> Stddev    user-80       1.12 (   0.00%)       0.94 (  16.00%)
> Stddev    syst-2        0.45 (   0.00%)       0.45 (   0.22%)
> Stddev    syst-4        0.41 (   0.00%)       0.58 ( -41.24%)
> Stddev    syst-8        0.55 (   0.00%)       0.28 (  49.35%)
> Stddev    syst-16       0.22 (   0.00%)       0.29 ( -30.98%)
> Stddev    syst-32       0.44 (   0.00%)       0.56 ( -27.75%)
> Stddev    syst-64       0.47 (   0.00%)       0.48 (  -1.91%)
> Stddev    syst-80       0.24 (   0.00%)       0.60 (-144.20%)
> Stddev    elsp-2        0.46 (   0.00%)       0.31 (  32.97%)
> Stddev    elsp-4        0.14 (   0.00%)       0.25 ( -72.38%)
> Stddev    elsp-8        0.36 (   0.00%)       0.08 (  77.92%)
> Stddev    elsp-16       0.74 (   0.00%)       0.58 (  22.00%)
> Stddev    elsp-32       0.31 (   0.00%)       0.74 (-138.95%)
> Stddev    elsp-64       0.12 (   0.00%)       0.12 (   1.62%)
> Stddev    elsp-80       0.23 (   0.00%)       0.15 (  35.38%)
> 
>               v4.15-rc8     v4.15-rc8
>                      range-mmap_lock-v1
> User         28309.95      28341.20
> System        3320.18       3473.73
> Elapsed       3792.13       3850.21
> 
> ** reaim's compute, new_dbase and shared workloads were tested, with the
> new_dbase one taking up to a 20% hit, which is expected, as this
> microbenchmark context switches a lot and benefits from reducing them with
> the spin-on-owner feature that range locks lack. Compute, on the other hand,
> was boosted for higher thread counts.
> 
>                               reaim
>                            v4.15-rc8             v4.15-rc8
>                                          range-mmap_lock-v1
> Hmean     compute-1        5652.98 (   0.00%)     5738.64 (   1.52%)
> Hmean     compute-21      81997.42 (   0.00%)    81997.42 (  -0.00%)
> Hmean     compute-41     135622.27 (   0.00%)   138959.73 (   2.46%)
> Hmean     compute-61     179272.55 (   0.00%)   174367.92 (  -2.74%)
> Hmean     compute-81     200187.60 (   0.00%)   195250.60 (  -2.47%)
> Hmean     compute-101    207337.40 (   0.00%)   187633.35 (  -9.50%)
> Hmean     compute-121    179018.55 (   0.00%)   206087.69 (  15.12%)
> Hmean     compute-141    175887.20 (   0.00%)   195528.60 (  11.17%)
> Hmean     compute-161    198063.33 (   0.00%)   190335.54 (  -3.90%)
> Hmean     new_dbase-1        56.64 (   0.00%)       60.76 (   7.27%)
> Hmean     new_dbase-21    11149.48 (   0.00%)    10082.35 (  -9.57%)
> Hmean     new_dbase-41    25161.87 (   0.00%)    21626.83 ( -14.05%)
> Hmean     new_dbase-61    39858.32 (   0.00%)    33956.04 ( -14.81%)
> Hmean     new_dbase-81    55057.19 (   0.00%)    43879.73 ( -20.30%)
> Hmean     new_dbase-101   67566.57 (   0.00%)    56323.77 ( -16.64%)
> Hmean     new_dbase-121   79517.22 (   0.00%)    64877.67 ( -18.41%)
> Hmean     new_dbase-141   92365.91 (   0.00%)    76571.18 ( -17.10%)
> Hmean     new_dbase-161  101590.77 (   0.00%)    85332.76 ( -16.00%)
> Hmean     shared-1           71.26 (   0.00%)       76.43 (   7.26%)
> Hmean     shared-21       11546.39 (   0.00%)    10521.92 (  -8.87%)
> Hmean     shared-41       28302.97 (   0.00%)    22116.50 ( -21.86%)
> Hmean     shared-61       23814.56 (   0.00%)    21886.13 (  -8.10%)
> Hmean     shared-81       11578.89 (   0.00%)    16423.55 (  41.84%)
> Hmean     shared-101       9991.41 (   0.00%)    11378.95 (  13.89%)
> Hmean     shared-121       9884.83 (   0.00%)    10010.92 (   1.28%)
> Hmean     shared-141       9911.88 (   0.00%)     9637.14 (  -2.77%)
> Hmean     shared-161       8587.14 (   0.00%)     9613.53 (  11.95%)
> Stddev    compute-1          94.42 (   0.00%)      166.37 ( -76.20%)
> Stddev    compute-21       1915.36 (   0.00%)     2582.96 ( -34.85%)
> Stddev    compute-41       4822.88 (   0.00%)     6057.32 ( -25.60%)
> Stddev    compute-61       4425.14 (   0.00%)     3676.90 (  16.91%)
> Stddev    compute-81       5549.60 (   0.00%)    17213.90 (-210.18%)
> Stddev    compute-101     19395.33 (   0.00%)    28315.96 ( -45.99%)
> Stddev    compute-121     16140.56 (   0.00%)    27927.63 ( -73.03%)
> Stddev    compute-141      9616.27 (   0.00%)    31273.43 (-225.21%)
> Stddev    compute-161     34746.00 (   0.00%)    20706.81 (  40.41%)
> Stddev    new_dbase-1         1.08 (   0.00%)        0.80 (  25.62%)
> Stddev    new_dbase-21      356.67 (   0.00%)      297.23 (  16.66%)
> Stddev    new_dbase-41      739.68 (   0.00%)     1287.72 ( -74.09%)
> Stddev    new_dbase-61      896.06 (   0.00%)     1293.55 ( -44.36%)
> Stddev    new_dbase-81     2003.96 (   0.00%)     2018.08 (  -0.70%)
> Stddev    new_dbase-101    2101.25 (   0.00%)     3461.91 ( -64.75%)
> Stddev    new_dbase-121    3294.30 (   0.00%)     3917.20 ( -18.91%)
> Stddev    new_dbase-141    3488.81 (   0.00%)     5242.36 ( -50.26%)
> Stddev    new_dbase-161    2744.12 (   0.00%)     5262.36 ( -91.77%)
> Stddev    shared-1            1.38 (   0.00%)        1.24 (   9.84%)
> Stddev    shared-21        1930.40 (   0.00%)      232.81 (  87.94%)
> Stddev    shared-41        1939.93 (   0.00%)     2316.09 ( -19.39%)
> Stddev    shared-61       15001.13 (   0.00%)    12004.82 (  19.97%)
> Stddev    shared-81        1313.02 (   0.00%)    14583.51 (-1010.68%)
> Stddev    shared-101        355.44 (   0.00%)      393.79 ( -10.79%)
> Stddev    shared-121       1736.68 (   0.00%)      782.50 (  54.94%)
> Stddev    shared-141       1865.93 (   0.00%)     1140.24 (  38.89%)
> Stddev    shared-161       1155.19 (   0.00%)     2045.55 ( -77.07%)
> 
> Overall sys% always increases, which is expected, but with the exception
> of git-checkout, the worst case scenario is not that excruciating.
> 
> Full test and details (including sysbench oltp mysql and specjbb) can be
> found here:
> https://linux-scalability.org/range-mmap_lock/tweed-results/
> 
> Testing: I have set up an mmtests config file with all the workloads
> described:
> http://linux-scalability.org/mmtests-config

Is this link still valid? I can't reach it.

Thanks,
Laurent.

> 
> Applies on top of linux-next (20180202). At least compile tested on
> the following architectures:
> 
> x86_64, alpha, arm32, blackfin, cris, frv, ia64, m32r, m68k, mips,
> microblaze, ppc, s390, sparc, tile and xtensa.
> 
> 
> Thanks!
> 
> Davidlohr Bueso (64):
>   interval-tree: build unconditionally
>   Introduce range reader/writer lock
>   mm: introduce mm locking wrappers
>   mm: add a range parameter to the vm_fault structure
>   mm,khugepaged: prepare passing of rangelock field to vm_fault
>   mm: teach pagefault paths about range locking
>   mm/hugetlb: teach hugetlb_fault() about range locking
>   mm: teach lock_page_or_retry() about range locking
>   mm/mmu_notifier: teach oom reaper about range locking
>   kernel/exit: teach exit_mm() about range locking
>   prctl: teach about range locking
>   fs/userfaultfd: teach userfaultfd_must_wait() about range locking
>   fs/proc: teach about range locking
>   fs/coredump: teach about range locking
>   ipc: use mm locking wrappers
>   virt: use mm locking wrappers
>   kernel: use mm locking wrappers
>   mm/ksm: teach about range locking
>   mm/mlock: use mm locking wrappers
>   mm/madvise: use mm locking wrappers
>   mm: teach drop/take_all_locks() about range locking
>   mm: avoid mmap_sem trylock in vm_insert_page()
>   mm: huge pagecache: do not check mmap_sem state
>   mm/thp: disable mmap_sem is_locked checks
>   mm: use mm locking wrappers
>   fs: use mm locking wrappers
>   arch/{x86,sh,ppc}: teach bad_area() about range locking
>   arch/x86: use mm locking wrappers
>   arch/alpha: use mm locking wrappers
>   arch/tile: use mm locking wrappers
>   arch/sparc: use mm locking wrappers
>   arch/s390: use mm locking wrappers
>   arch/powerpc: use mm locking wrappers
>   arch/parisc: use mm locking wrappers
>   arch/ia64: use mm locking wrappers
>   arch/mips: use mm locking wrappers
>   arch/arc: use mm locking wrappers
>   arch/blackfin: use mm locking wrappers
>   arch/m68k: use mm locking wrappers
>   arch/sh: use mm locking wrappers
>   arch/cris: use mm locking wrappers
>   arch/frv: use mm locking wrappers
>   arch/hexagon: use mm locking wrappers
>   arch/score: use mm locking wrappers
>   arch/m32r: use mm locking wrappers
>   arch/metag: use mm locking wrappers
>   arch/microblaze: use mm locking wrappers
>   arch/tile: use mm locking wrappers
>   arch/xtensa: use mm locking wrappers
>   arch/unicore32: use mm locking wrappers
>   arch/mn10300: use mm locking wrappers
>   arch/openrisc: use mm locking wrappers
>   arch/nios2: use mm locking wrappers
>   arch/arm: use mm locking wrappers
>   arch/riscv: use mm locking wrappers
>   drivers/android: use mm locking wrappers
>   drivers/gpu: use mm locking wrappers
>   drivers/infiniband: use mm locking wrappers
>   drivers/iommu: use mm locking helpers
>   drivers/xen: use mm locking wrappers
>   staging/lustre: use generic range lock
>   drivers: use mm locking wrappers (the rest)
>   mm/mmap: hack drop down_write_nest_lock()
>   mm: convert mmap_sem to range mmap_lock
> 
>  arch/alpha/kernel/traps.c | 6 +-
>  arch/alpha/mm/fault.c | 13 +-
>  arch/arc/kernel/troubleshoot.c | 5 +-
>  arch/arc/mm/fault.c | 15 +-
>  arch/arm/kernel/process.c | 5 +-
>  arch/arm/kernel/swp_emulate.c | 5 +-
>  arch/arm/lib/uaccess_with_memcpy.c | 18 +-
>  arch/arm/mm/fault.c | 14 +-
>  arch/arm/probes/uprobes/core.c | 5 +-
>  arch/arm64/kernel/traps.c | 5 +-
>  arch/arm64/kernel/vdso.c | 12 +-
>  arch/arm64/mm/fault.c | 13 +-
>  arch/blackfin/kernel/ptrace.c | 5 +-
>  arch/blackfin/kernel/trace.c | 7 +-
>  arch/cris/mm/fault.c | 13 +-
>  arch/frv/mm/fault.c | 13 +-
>  arch/hexagon/kernel/vdso.c | 5 +-
>  arch/hexagon/mm/vm_fault.c | 11 +-
>  arch/ia64/kernel/perfmon.c | 10 +-
>  arch/ia64/mm/fault.c | 13 +-
>  arch/ia64/mm/init.c | 13 +-
>  arch/m32r/mm/fault.c | 15 +-
>  arch/m68k/kernel/sys_m68k.c | 18 +-
>  arch/m68k/mm/fault.c | 11 +-
>  arch/metag/mm/fault.c | 13 +-
>  arch/microblaze/mm/fault.c | 15 +-
>  arch/mips/kernel/traps.c | 5 +-
>  arch/mips/kernel/vdso.c | 7 +-
>  arch/mips/mm/c-octeon.c | 5 +-
>  arch/mips/mm/c-r4k.c | 5 +-
>  arch/mips/mm/fault.c | 13 +-
>  arch/mn10300/mm/fault.c | 13 +-
>  arch/nios2/mm/fault.c | 15 +-
>  arch/nios2/mm/init.c | 5 +-
>  arch/openrisc/kernel/dma.c | 6 +-
>  arch/openrisc/mm/fault.c | 13 +-
>  arch/parisc/kernel/traps.c | 7 +-
>  arch/parisc/mm/fault.c | 11 +-
>  arch/powerpc/include/asm/mmu_context.h | 3 +-
>  arch/powerpc/include/asm/powernv.h | 5 +-
>  arch/powerpc/kernel/vdso.c | 7 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +-
>  arch/powerpc/kvm/book3s_64_vio.c | 5 +-
>  arch/powerpc/kvm/book3s_hv.c | 7 +-
>  arch/powerpc/kvm/e500_mmu_host.c | 5 +-
>  arch/powerpc/mm/copro_fault.c | 8 +-
>  arch/powerpc/mm/fault.c | 35 +-
>  arch/powerpc/mm/mmu_context_iommu.c | 5 +-
>  arch/powerpc/mm/subpage-prot.c | 13 +-
>  arch/powerpc/oprofile/cell/spu_task_sync.c | 7 +-
>  arch/powerpc/platforms/cell/spufs/file.c | 6 +-
>  arch/powerpc/platforms/powernv/npu-dma.c | 7 +-
>  arch/riscv/kernel/vdso.c | 5 +-
>  arch/riscv/mm/fault.c | 13 +-
>  arch/s390/include/asm/gmap.h | 14 +-
>  arch/s390/kernel/vdso.c | 5 +-
>  arch/s390/kvm/gaccess.c | 35 +-
>  arch/s390/kvm/kvm-s390.c | 24 +-
>  arch/s390/kvm/priv.c | 29 +-
>  arch/s390/mm/fault.c | 9 +-
>  arch/s390/mm/gmap.c | 125 ++--
>  arch/s390/pci/pci_mmio.c | 5 +-
>  arch/score/mm/fault.c | 13 +-
>  arch/sh/kernel/sys_sh.c | 7 +-
>  arch/sh/kernel/vsyscall/vsyscall.c | 5 +-
>  arch/sh/mm/fault.c | 50 +-
>  arch/sparc/mm/fault_32.c | 24 +-
>  arch/sparc/mm/fault_64.c | 15 +-
>  arch/sparc/vdso/vma.c | 5 +-
>  arch/tile/kernel/stack.c | 5 +-
>  arch/tile/mm/elf.c | 12 +-
>  arch/tile/mm/fault.c | 15 +-
>  arch/tile/mm/pgtable.c | 6 +-
>  arch/um/include/asm/mmu_context.h | 8 +-
>  arch/um/kernel/tlb.c | 12 +-
>  arch/um/kernel/trap.c | 9 +-
>  arch/unicore32/mm/fault.c | 14 +-
>  arch/x86/entry/vdso/vma.c | 14 +-
>  arch/x86/events/core.c | 2 +-
>  arch/x86/include/asm/mmu_context.h | 5 +-
>  arch/x86/include/asm/mpx.h | 6 +-
>  arch/x86/kernel/tboot.c | 2 +-
>  arch/x86/kernel/vm86_32.c | 5 +-
>  arch/x86/mm/debug_pagetables.c | 13 +-
>  arch/x86/mm/fault.c | 40 +-
>  arch/x86/mm/mpx.c | 55 +-
>  arch/x86/um/vdso/vma.c | 5 +-
>  arch/xtensa/mm/fault.c | 13 +-
>  drivers/android/binder_alloc.c | 12 +-
>  drivers/gpu/drm/Kconfig | 2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +-
>  drivers/gpu/drm/i915/Kconfig | 1 -
>  drivers/gpu/drm/i915/i915_gem.c | 5 +-
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 13 +-
>  drivers/gpu/drm/radeon/radeon_cs.c | 5 +-
>  drivers/gpu/drm/radeon/radeon_gem.c | 7 +-
>  drivers/gpu/drm/radeon/radeon_mn.c | 7 +-
>  drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
>  drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +-
>  drivers/infiniband/core/umem.c | 19 +-
>  drivers/infiniband/core/umem_odp.c | 14 +-
>  drivers/infiniband/hw/hfi1/user_pages.c | 15 +-
>  drivers/infiniband/hw/mlx4/main.c | 5 +-
>  drivers/infiniband/hw/mlx5/main.c | 5 +-
>  drivers/infiniband/hw/qib/qib_user_pages.c | 17 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c | 19 +-
>  drivers/iommu/amd_iommu_v2.c | 9 +-
>  drivers/iommu/intel-svm.c | 9 +-
>  drivers/media/v4l2-core/videobuf-core.c | 5 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 5 +-
>  drivers/media/v4l2-core/videobuf-dma-sg.c | 22 +-
>  drivers/misc/cxl/cxllib.c | 5 +-
>  drivers/misc/cxl/fault.c | 5 +-
>  drivers/misc/mic/scif/scif_rma.c | 17 +-
>  drivers/misc/sgi-gru/grufault.c | 91 +--
>  drivers/misc/sgi-gru/grufile.c | 5 +-
>  drivers/oprofile/buffer_sync.c | 12 +-
>  drivers/staging/lustre/lustre/llite/Makefile | 2 +-
>  drivers/staging/lustre/lustre/llite/file.c | 16 +-
>  .../staging/lustre/lustre/llite/llite_internal.h | 4 +-
>  drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 +-
>  drivers/staging/lustre/lustre/llite/range_lock.c | 240 --------
>  drivers/staging/lustre/lustre/llite/range_lock.h | 83 ---
>  drivers/staging/lustre/lustre/llite/vvp_io.c | 7 +-
>  .../media/atomisp/pci/atomisp2/hmm/hmm_bo.c | 5 +-
>  drivers/tee/optee/call.c | 5 +-
>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 +-
>  drivers/vfio/vfio_iommu_type1.c | 16 +-
>  drivers/xen/gntdev.c | 5 +-
>  drivers/xen/privcmd.c | 12 +-
>  fs/aio.c | 7 +-
>  fs/binfmt_elf.c | 3 +-
>  fs/coredump.c | 5 +-
>  fs/exec.c | 38 +-
>  fs/proc/base.c | 33 +-
>  fs/proc/internal.h | 3 +
>  fs/proc/task_mmu.c | 51 +-
>  fs/proc/task_nommu.c | 22 +-
>  fs/proc/vmcore.c | 14 +-
>  fs/userfaultfd.c | 64 +-
>  include/asm-generic/mm_hooks.h | 3 +-
>  include/linux/hmm.h | 4 +-
>  include/linux/huge_mm.h | 2 -
>  include/linux/hugetlb.h | 9 +-
>  include/linux/ksm.h | 6 +-
>  include/linux/lockdep.h | 33 +
>  include/linux/migrate.h | 4 +-
>  include/linux/mm.h | 159 ++++-
>  include/linux/mm_types.h | 4 +-
>  include/linux/mmu_notifier.h | 6 +-
>  include/linux/pagemap.h | 7 +-
>  include/linux/range_lock.h | 189 ++++++
>  include/linux/uprobes.h | 15 +-
>  include/linux/userfaultfd_k.h | 5 +-
>  ipc/shm.c | 22 +-
>  kernel/acct.c | 5 +-
>  kernel/events/core.c | 5 +-
>  kernel/events/uprobes.c | 66 +-
>  kernel/exit.c | 9 +-
>  kernel/fork.c | 18 +-
>  kernel/futex.c | 7 +-
>  kernel/locking/Makefile | 2 +-
>  kernel/locking/range_lock.c | 667 +++++++++++++++
>  kernel/sched/fair.c | 5 +-
>  kernel/sys.c | 22 +-
>  kernel/trace/trace_output.c | 5 +-
>  lib/Kconfig | 14 -
>  lib/Kconfig.debug | 1 -
>  lib/Makefile | 3 +-
>  mm/filemap.c | 9 +-
>  mm/frame_vector.c | 8 +-
>  mm/gup.c | 79 ++-
>  mm/hmm.c | 37 +-
>  mm/hugetlb.c | 16 +-
>  mm/init-mm.c | 2 +-
>  mm/internal.h | 3 +-
>  mm/khugepaged.c | 57 +-
>  mm/ksm.c | 64 +-
>  mm/madvise.c | 80 ++-
>  mm/memcontrol.c | 21 +-
>  mm/memory.c | 30 +-
>  mm/mempolicy.c | 56 +-
>  mm/migrate.c | 30 +-
>  mm/mincore.c | 28 +-
>  mm/mlock.c | 49 +-
>  mm/mmap.c | 145 +++--
>  mm/mmu_notifier.c | 14 +-
>  mm/mprotect.c | 28 +-
>  mm/mremap.c | 34 +-
>  mm/msync.c | 9 +-
>  mm/nommu.c | 55 +-
>  mm/oom_kill.c | 11 +-
>  mm/pagewalk.c | 60 +-
>  mm/process_vm_access.c | 8 +-
>  mm/shmem.c | 2 +-
>  mm/swapfile.c | 7 +-
>  mm/userfaultfd.c | 24 +-
>  mm/util.c | 12 +-
>  security/tomoyo/domain.c | 3 +-
>  virt/kvm/arm/mmu.c | 17 +-
>  virt/kvm/async_pf.c | 7 +-
>  virt/kvm/kvm_main.c | 25 +-
>  205 files changed, 2817 insertions(+), 1651 deletions(-)
>  delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.c
>  delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.h
>  create mode 100644 include/linux/range_lock.h
>  create mode 100644 kernel/locking/range_lock.c