Subject: Re: [PATCH v5 07/22] mm: Protect VMA modifications using VMA sequence count
From: Laurent Dufour
To: Andrea Arcangeli
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
    akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com,
    mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox,
    benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
    Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
    Sergey Senozhatsky, Alexei Starovoitov, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, haren@linux.vnet.ibm.com,
    khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com,
    Tim Chen, linuxppc-dev@lists.ozlabs.org, x86@kernel.org
References: <1507729966-10660-1-git-send-email-ldufour@linux.vnet.ibm.com>
    <1507729966-10660-8-git-send-email-ldufour@linux.vnet.ibm.com>
    <20171026101833.GF563@redhat.com>
    <2cbea37c-c2a7-bfd4-4528-fd273b210e29@linux.vnet.ibm.com>
In-Reply-To: <2cbea37c-c2a7-bfd4-4528-fd273b210e29@linux.vnet.ibm.com>
Date: Thu, 2 Nov 2017 18:25:11 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/11/2017 16:16, Laurent Dufour wrote:
> Hi Andrea,
>
> Thanks for reviewing this series, and sorry for the late answer, I took a few
> days off...
>
> On 26/10/2017 12:18, Andrea Arcangeli wrote:
>> Hello Laurent,
>>
>> Message-ID: <7ca80231-fe02-a3a7-84bc-ce81690ea051@intel.com> shows
>> significant slowdown even for brk/malloc ops both single and
>> multi threaded.
>>
>> The single threaded case I think is the most important because it has
>> zero chance of getting back any benefit later during page faults.
>>
>> Could you check if:
>>
>> 1. it's possible to change vm_write_begin to be a noop if mm->mm_count is
>> <= 1? Hint: clone() will run single threaded so there's no way it can run
>> in the middle of a begin/end critical section (clone could set an
>> MMF flag to possibly keep the sequence counter activated if a child
>> thread exits and mm_count drops to 1 while the other cpu is in the
>> middle of a critical section in the other thread).
>
> This sounds like a good idea, I'll dig into that.
> The major risk here is to have a thread calling vm_*_begin() with
> mm->mm_count > 1 and later calling vm_*_end() with mm->mm_count <= 1, but
> as you mentioned we should find a way to work around this.
>
>>
>> 2. Same thing with RCU freeing of vmas.
>> Wouldn't it be nicer if RCU
>> freeing happened only once a MMF flag is set? That will at least
>> reduce the risk of temporary memory waste until the next RCU grace
>> period. The read of the MMF will scale fine. Of course to allow
>> point 1 and 2 then the page fault should also take the mmap_sem
>> until the MMF flag is set.
>>
>
> I think we could also deal with the mm->mm_count value here, if there is
> only one thread, no need to postpone the VMA's free operation. Isn't it?
> Also, if mm->mm_count <= 1, there is no need to try the speculative path.
>
>> Could you also investigate a much bigger change: I wonder if it's
>> possible to drop the sequence number entirely from the vma and stop
>> using sequence numbers entirely (which is likely the source of the
>> single threaded regression in point 1 that may explain the report in
>> the above message-id), and just call the vma rbtree lookup once again
>> and check that everything is still the same in the vma and the PT lock
>> obtained is still a match to finish the anon page fault and fill the
>> pte?
>
> That's an interesting idea. The big deal here would be to detect that the
> VMA has been touched behind our back, but there are not so many VMA fields
> involved in the speculative path so that sounds reasonable. The other point
> is to identify the impact of the vma rbtree lookup, it's also a known
> order, but there is the vma_srcu's lock involved.

I think a memory barrier is missing when the VMA is modified, so currently
the modifications made to the VMA structure may not yet be visible at the
time the pte is locked. So making that change will also require calling
smp_wmb() before locking the page tables. In the current patch this is
ensured by the call to write_seqcount_end().

Doing so will still require a memory barrier when touching the VMA.
I'm not sure we would get far better performance compared to the sequence
count change.
But I'll give it a try anyway ;)

>>
>> Then of course we also need to add a method to the read-write
>> semaphore so it tells us if there's already one user holding the read
>> mmap_sem and we're the second one. If we're the second one (or more
>> than second) only then we should skip taking the down_read mmap_sem.
>> Even a multithreaded app won't ever skip taking the mmap_sem until
>> there's a sign of runtime contention, and it won't have to run the way
>> more expensive sequence number-less revalidation during page faults,
>> unless we get an immediate scalability payoff because we already know
>> the mmap_sem is already contended and there are multiple nested
>> threads in the page fault handler of the same mm.
>
> The problem is that we may have a thread entering the page fault path,
> seeing that the mmap_sem is free, grabbing it and continuing to process the
> page fault. Then another thread enters mprotect or any other mm service
> which grabs the mmap_sem, and it will be blocked until the page fault is
> done. The idea with the speculative page fault is also to not block the
> other thread which may need to grab the mmap_sem.
>
>>
>> Perhaps we'd need something more advanced than a
>> down_read_trylock_if_not_hold() (which has to be guaranteed not to write
>> to any cacheline) and we'll have to count the per-thread exponential
>> backoff of mmap_sem frequency, but starting with
>> down_read_trylock_if_not_hold() would be good I think.
>>
>> This is not how the current patch works, the current patch uses a
>> sequence number because it pretends to go lockless always and in turn
>> has to slow down all vma updates fast paths or the revalidation
>> slows down performance for page fault too much (as it always
>> revalidates).
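For readers following the thread, the sequence-number scheme discussed above
can be sketched in userspace C11 roughly as below. This is only a toy under
stated assumptions: struct toy_seq_vma and the toy_*() helpers are
hypothetical names, not the patch's vm_write_begin()/vm_write_end(), and a
single writer at a time is assumed. It shows why every VMA update pays for
two counter writes plus ordering, whether or not a reader is present.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few VMA fields a speculative reader uses. */
struct toy_seq_vma {
	atomic_uint  seq;      /* even: stable, odd: update in progress */
	atomic_ulong vm_start;
	atomic_ulong vm_end;
};

/* Writer side; a single writer at a time is assumed. */
static void toy_write_begin(struct toy_seq_vma *v)
{
	unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

	assert(!(s & 1));
	atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
	/* Order the odd count before the field updates that follow. */
	atomic_thread_fence(memory_order_release);
}

static void toy_write_end(struct toy_seq_vma *v)
{
	unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

	assert(s & 1);
	/* Release store: field updates are visible before the even count. */
	atomic_store_explicit(&v->seq, s + 1, memory_order_release);
}

/* Reader side: returns the count seen; unlike a kernel seqcount reader,
 * this toy does not spin on an odd value, toy_read_retry() rejects it. */
static unsigned int toy_read_begin(struct toy_seq_vma *v)
{
	return atomic_load_explicit(&v->seq, memory_order_acquire);
}

static bool toy_read_retry(struct toy_seq_vma *v, unsigned int start)
{
	/* Order the caller's field reads before re-reading the count. */
	atomic_thread_fence(memory_order_acquire);
	return (start & 1) ||
	       atomic_load_explicit(&v->seq, memory_order_relaxed) != start;
}

/* Example writer: changes both bounds inside one begin/end section. */
static void toy_set_range(struct toy_seq_vma *v,
			  unsigned long start, unsigned long end)
{
	toy_write_begin(v);
	atomic_store_explicit(&v->vm_start, start, memory_order_relaxed);
	atomic_store_explicit(&v->vm_end, end, memory_order_relaxed);
	toy_write_end(v);
}
```

A reader would call toy_read_begin(), load the fields it needs, and fall
back to the locked path whenever toy_read_retry() returns true.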
>>
>> I think it would be much better to go speculative only when there's
>> "detected" runtime contention on the mmap_sem with
>> down_read_trylock_if_not_hold() and that will make the revalidation
>> cost not an issue to worry about because normally we won't have to
>> revalidate the vma at all during page fault. In turn by making the
>> revalidation more expensive by starting a vma rbtree lookup from
>> scratch, we can drop the sequence number entirely and that should
>> simplify the patch tremendously because all vm_write_begin/end would
>> disappear from the patch and in turn the mmap/brk slowdown measured by
>> the message-id above, should disappear as well.
>
> As I mentioned above, I'm not sure about checking the lock contention when
> entering the page fault path, checking for the mm->mm_count or a dedicated
> mm flag should be enough, but removing the sequence lock would be a very
> good simplification. I'll dig further here, and come back soon.
>
> Thanks a lot,
> Laurent.
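The sequence-number-less alternative quoted above (look the VMA up again and
compare) might be sketched as follows. All names here are hypothetical, the
linear scan merely stands in for the vma rbtree lookup, and the sketch
deliberately ignores memory ordering, so it says nothing about where a
barrier such as smp_wmb() would have to go.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the VMA fields the speculative path depends on. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/* Stand-in for the rbtree lookup: returns the VMA containing addr, if any. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/* Start of the speculative fault: copy the fields we will rely on. */
static bool toy_snapshot(const struct toy_vma *vmas, size_t n,
			 unsigned long addr, struct toy_vma *snap)
{
	const struct toy_vma *v = toy_find_vma(vmas, n, addr);

	if (!v)
		return false;
	*snap = *v;
	return true;
}

/* After taking the page table lock: look up again and compare fields. */
static bool toy_revalidate(const struct toy_vma *vmas, size_t n,
			   unsigned long addr, const struct toy_vma *snap)
{
	const struct toy_vma *v = toy_find_vma(vmas, n, addr);

	return v &&
	       v->vm_start == snap->vm_start &&
	       v->vm_end == snap->vm_end &&
	       v->vm_flags == snap->vm_flags;
}
```

If toy_revalidate() fails, the fault would be retried on the regular path
with the mmap_sem held. The cost moves from every VMA update to the fault
path, which is the trade-off being debated in this thread.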