Subject: Re: [PATCH v5 07/22] mm: Protect VMA modifications using VMA sequence count
To: Andrea Arcangeli
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
    akpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com,
    mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox,
    benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
    Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
    Sergey Senozhatsky, Alexei Starovoitov, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, haren@linux.vnet.ibm.com,
    khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com,
    Tim Chen, linuxppc-dev@lists.ozlabs.org, x86@kernel.org
References: <1507729966-10660-1-git-send-email-ldufour@linux.vnet.ibm.com>
    <1507729966-10660-8-git-send-email-ldufour@linux.vnet.ibm.com>
    <20171026101833.GF563@redhat.com>
    <2cbea37c-c2a7-bfd4-4528-fd273b210e29@linux.vnet.ibm.com>
    <20171102200840.GC22686@redhat.com>
From: Laurent Dufour
Date: Mon, 6 Nov 2017 10:47:00 +0100
In-Reply-To: <20171102200840.GC22686@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andrea,

On 02/11/2017 21:08, Andrea Arcangeli wrote:
> On Thu, Nov 02, 2017 at 06:25:11PM +0100, Laurent Dufour wrote:
>> I think there is some memory barrier missing when the VMA is modified so
>> currently the modifications done in the VMA structure may not be written
>> down at the time the pte is locked. So doing that change will also require
>> calling smp_wmb() before locking the page tables. In the current patch this
>> is ensured by the call to write_seqcount_end().
>> Doing so will still require having a memory barrier when touching the VMA.
>> Not sure we get far better performance compared to the sequence count
>> change. But I'll give it a try anyway ;)
>
> Luckily smp_wmb is a noop on x86. I would suggest to ignore the above
> issue completely if you give it a try, and then if this performs, we
> can just embed a smp_wmb() before spin_lock() somewhere in
> pte_offset_map_lock/pte_lockptr/spin_lock_nested for those archs whose
> spin_lock isn't a smp_wmb() equivalent. I would focus on flushing
> writes before every pagetable spin_lock for non-x86 archs, rather than
> after all vma modifications. That should be easier to keep under
> control and it's going to be more efficient too as if anything there
> are fewer spin locks than vma modifications.
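
For readers less familiar with the sequence-count protection referred to in
the quoted text above (the write_seqcount_end() call), here is a minimal
userspace sketch in C11 atomics. It is not the kernel code and not the code
from the patch: struct toy_vma, struct toy_snapshot and every toy_* function
are hypothetical names, and the fences only approximate what smp_wmb() and
smp_rmb() do in the kernel. Writers are assumed to be serialized by some
outer lock. It shows a writer bracketing its updates with a counter and a
reader rejecting any snapshot taken while the counter moved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two VMA fields discussed in the thread. */
struct toy_vma {
    atomic_uint  seq;           /* even: stable, odd: update in progress */
    atomic_ulong vm_flags;
    atomic_ulong vm_page_prot;
};

struct toy_snapshot {
    unsigned long vm_flags;
    unsigned long vm_page_prot;
};

static void toy_vma_init(struct toy_vma *v, unsigned long flags,
                         unsigned long prot)
{
    atomic_init(&v->seq, 0);
    atomic_init(&v->vm_flags, flags);
    atomic_init(&v->vm_page_prot, prot);
}

/* Writer side. Assumption: callers are serialized by an outer lock. */
static void toy_write_begin(struct toy_vma *v)
{
    unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

    assert((s & 1u) == 0);      /* no update may already be in flight */
    atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
    /* Roughly the role smp_wmb() plays after the counter increment. */
    atomic_thread_fence(memory_order_release);
}

static void toy_write_end(struct toy_vma *v)
{
    unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

    assert((s & 1u) == 1);      /* must pair with toy_write_begin() */
    /* Release store: the field updates are visible before seq is even. */
    atomic_store_explicit(&v->seq, s + 1, memory_order_release);
}

static void toy_vma_update(struct toy_vma *v, unsigned long flags,
                           unsigned long prot)
{
    toy_write_begin(v);
    atomic_store_explicit(&v->vm_flags, flags, memory_order_relaxed);
    atomic_store_explicit(&v->vm_page_prot, prot, memory_order_relaxed);
    toy_write_end(v);
}

/* Reader side: returns true only if both fields come from one version. */
static bool toy_vma_read(struct toy_vma *v, struct toy_snapshot *out)
{
    unsigned int s0 = atomic_load_explicit(&v->seq, memory_order_acquire);

    if (s0 & 1u)
        return false;           /* writer in progress */
    out->vm_flags = atomic_load_explicit(&v->vm_flags,
                                         memory_order_relaxed);
    out->vm_page_prot = atomic_load_explicit(&v->vm_page_prot,
                                             memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&v->seq, memory_order_relaxed) == s0;
}
```

A reader that gets false back would retry or fall back to the locked path.
In this sketch vm_flags and vm_page_prot can never be accepted as a
mismatched pair, which is the property a plain barrier before the pte lock
does not give by itself.
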

I do agree that would simplify the patch series a lot.
I'll double check that the pte lock is not taken in a loop, otherwise
having smp_wmb() there will be bad.

Another point I'm trying to double check is that we may see inconsistent
values when reading the vma's flags in the page fault path until the memory
barrier is reached in the path changing the VMA. In particular, we may have
vm_flags and vm_page_prot not matching at all, which couldn't happen when
checking the vm_sequence count.

>
> For non-x86 archs we may then need a smp_wmb__before_spin_lock. That
> looks more self contained than surrounding all vma modifications and
> it's a noop on x86 anyway.
>
> I thought about the contention detection logic too yesterday: to
> detect contention we could have a mm->mmap_sem_contention_jiffies and
> if down_read_trylock_exclusive() [same as down_read_if_not_hold in
> prev mail] fails (and it'll fail if either read or write mmap_sem is
> held, so also covering mremap/mprotect etc..) we set
> mm->mmap_sem_contention_jiffies = jiffies and then to know if you must
> not touch the mmap_sem at all, you compare jiffies against
> mmap_sem_contention_jiffies, if it's equal we go speculative. If
> that's not enough we can just keep going speculative for a few more
> jiffies with time_before(). The srcu lock is not a concern because the
> inc/dec of the fast path is in per-cpu cacheline of course, no false
> sharing possible there or it wouldn't be any better than a normal lock.

I'm sorry, I must have missed something here. I can't see how this would
help fix the case where a thread enters the page fault handler, sees that
no one else holds the mmap_sem, and then grabs it. While it is processing
the page fault, another thread enters mprotect, for instance, and thus has
to wait for the mmap_sem to be released by the thread processing the page
fault.

Cheers,
Laurent.

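
To make the jiffies-based contention check quoted above concrete, here is a
small single-threaded userspace sketch in C. Everything in it is an
assumption made for illustration: struct toy_mm, the toy_* functions and the
explicit `now` and `window` parameters are hypothetical, a plain bool stands
in for the mmap_sem, and the contention_seen flag is added only so that a
zeroed structure does not look contended at tick 0. toy_time_before() uses
the same wraparound-safe signed-difference idea as the kernel's
time_before().

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is before b" on a free-running tick counter. */
static bool toy_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the relevant part of the mm. */
struct toy_mm {
    bool sem_held;                    /* "mmap_sem held by anyone" */
    bool contention_seen;             /* illustration-only guard */
    unsigned long contention_jiffies; /* tick of the last failed trylock */
};

/*
 * Stand-in for the proposed down_read_trylock_exclusive(): it fails if
 * the lock is held at all and records the tick at which that happened.
 */
static bool toy_trylock_exclusive(struct toy_mm *mm, unsigned long now)
{
    if (mm->sem_held) {
        mm->contention_jiffies = now;
        mm->contention_seen = true;
        return false;
    }
    mm->sem_held = true;
    return true;
}

static void toy_unlock(struct toy_mm *mm)
{
    assert(mm->sem_held);
    mm->sem_held = false;
}

/*
 * Go speculative while `now` lies within `window` ticks after the last
 * recorded contention; window == 0 is the "if it's equal" case.
 */
static bool toy_should_speculate(const struct toy_mm *mm,
                                 unsigned long now, unsigned long window)
{
    return mm->contention_seen &&
           !toy_time_before(now, mm->contention_jiffies) &&
           toy_time_before(now, mm->contention_jiffies + window + 1);
}

enum toy_path { TOY_LOCKED, TOY_SPECULATIVE };

/* Fault-path decision; on TOY_LOCKED the caller must call toy_unlock(). */
static enum toy_path toy_fault(struct toy_mm *mm, unsigned long now,
                               unsigned long window)
{
    if (toy_should_speculate(mm, now, window))
        return TOY_SPECULATIVE;       /* do not touch the lock at all */
    if (toy_trylock_exclusive(mm, now))
        return TOY_LOCKED;
    return TOY_SPECULATIVE;           /* contention was just recorded */
}
```

In this sketch the very first fault on an uncontended mm takes the locked
path, so an mprotect arriving right after it still has to wait for the
unlock. That is the case raised in the reply above, and the heuristic as
sketched here does nothing to avoid it.
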
> The vma revalidation is already done by khugepaged and mm/userfaultfd,
> both need to drop the mmap_sem and continue working on the pagetables,
> so we already know it's workable and not too slow.
>
> Summarizing.. by using a runtime contention triggered speculative
> design that goes speculative only when contention is runtime-detected
> using the above logic (or equivalent), and by having to revalidate the
> vma by hand with find_vma without knowing instantly if the vma became
> stale, we will run with a substantially slower speculative page fault
> than with your current speculative always-on design, but the slower
> speculative page fault runtime will still scale 100% in SMP so it
> should still be faster on large SMP systems. The upside is that it won't
> regress the mmap/brk vma modifications. The whole complexity of
> tracking the vma modifications should also go away and the resulting
> code should be more maintainable and less risky to break in subtle
> ways impossible to reproduce.
>
> Thanks!
> Andrea
>

From 1582986302507205657@xxx Thu Nov 02 20:09:39 +0000 2017
X-GM-THRID: 1580970013752612139
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread