Received: by 10.192.165.148 with SMTP id m20csp1935628imm; Thu, 3 May 2018 07:46:18 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqTfi7JMVdYYigHJ65KWYsYqJe8BjMrDcXx4XGYTJvXzHNVPHVp16RUrtI0o7Ypm1SZOQ6i X-Received: by 2002:a63:79ce:: with SMTP id u197-v6mr19870812pgc.242.1525358778247; Thu, 03 May 2018 07:46:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525358778; cv=none; d=google.com; s=arc-20160816; b=pKRVNjcqJI4PRD5063K2i1meuRqDTwLB4Vk1XTfqh0Iihfm6Gb1tGBVDVRhUHeBZUX Wfto7wlwWfWU8Nk3PUxuBt/8VbQigvo2XOshvja9CpUR1dRg2Ht9hq3rI/ZD/xxM9D8C H0ylObEc6lAOlhjRZJDsM9yVP1si7YHtSdzs3X6iuJIJYEOTSkBmUVUhsIhPi/tLvons yVPLCSJwc3z5tZCr3HAdXtBMo31/T00KiYnN0mVffIfmbVzY7NsRfMx82NDKb6jQ8xnY aR8HaumWzlbckQlKLAhNamjCjr3v9x7SgQjm/6VCxS4zFkfqTdOqx21uVBcKv/mIVMXT o9dA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date:from :references:cc:to:subject:arc-authentication-results; bh=VBYa+K+Xvbj4RvktBU+P0st4VhJFJPBJ9PaHNSd4NiM=; b=ydIPV9FQaVZrstTLS1XtOg4UV5gvFKyv/vU9KPBMGdEJ7UZNX1y5fn6n445oUpMSku VkIox8aLfHM0t3PYauQuoYvGOnOUOcxHMOR02U4kvJRdClL6Im1sbGdkI5PKvoG094Tj 7ZH5+Xj2HIVYq0Id2mbHfB57WIPb/Y3Ycgu4EtAltqG7d0sxdUSOkw86dT9ZtO3Za1gq s7WCAwwc9ZafNAzVkOBNSu2H6rlH4c6AKt+mnrWK8Uj1l9zy+wGNdWgykOiy6u9SM+Ep lJs57RRAsEmqhF6HpPlx2aUSQyu+uPOM96PLxY5Q/4PZwxPgc9snKl1aJ3syHmOo8nB7 DAzg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o14-v6si11813112pgf.174.2018.05.03.07.46.04; Thu, 03 May 2018 07:46:18 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751350AbeECOpp (ORCPT + 99 others); Thu, 3 May 2018 10:45:45 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:51250 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751050AbeECOpm (ORCPT ); Thu, 3 May 2018 10:45:42 -0400 Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w43EjFIk078657 for ; Thu, 3 May 2018 10:45:41 -0400 Received: from e06smtp10.uk.ibm.com (e06smtp10.uk.ibm.com [195.75.94.106]) by mx0a-001b2d01.pphosted.com with ESMTP id 2hr47h0ugu-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 03 May 2018 10:45:37 -0400 Received: from localhost by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 3 May 2018 15:45:17 +0100 Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196) by e06smtp10.uk.ibm.com (192.168.101.140) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 3 May 2018 15:45:10 +0100 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w43Ej9Qh56950904; Thu, 3 May 2018 14:45:09 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 39D4DAE045; Thu, 3 May 2018 15:34:43 +0100 (BST) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E6B90AE051; Thu, 3 May 2018 15:34:41 +0100 (BST) Received: from [9.101.4.33] (unknown [9.101.4.33]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 3 May 2018 15:34:41 +0100 (BST) Subject: Re: [PATCH v10 08/25] mm: VMA sequence count To: Minchan Kim Cc: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org, kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox , benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner , Ingo Molnar , hpa@zytor.com, Will Deacon , Sergey Senozhatsky , Andrea Arcangeli , Alexei Starovoitov , kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com, Daniel Jordan , David Rientjes , Jerome Glisse , Ganesh Mahendran , linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com, paulmck@linux.vnet.ibm.com, Tim Chen , linuxppc-dev@lists.ozlabs.org, x86@kernel.org References: <1523975611-15978-1-git-send-email-ldufour@linux.vnet.ibm.com> <1523975611-15978-9-git-send-email-ldufour@linux.vnet.ibm.com> <20180423064259.GC114098@rodete-desktop-imager.corp.google.com> <06b996b0-b831-3d39-8a99-792abfb6a4d1@linux.vnet.ibm.com> <20180501131646.GB118722@rodete-laptop-imager.corp.google.com> From: Laurent Dufour Date: Thu, 3 May 2018 16:45:08 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20180501131646.GB118722@rodete-laptop-imager.corp.google.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 18050314-0040-0000-0000-000004359B32 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18050314-0041-0000-0000-00002639BF11 Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-05-03_07:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1805030130 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/05/2018 15:16, Minchan Kim wrote: > On Mon, Apr 30, 2018 at 05:14:27PM +0200, Laurent Dufour wrote: >> >> >> On 23/04/2018 08:42, Minchan Kim wrote: >>> On Tue, Apr 17, 2018 at 04:33:14PM +0200, Laurent Dufour wrote: >>>> From: Peter Zijlstra >>>> >>>> Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence >>>> counts such that we can easily test if a VMA is changed. >>> >>> So, seqcount is to protect modifying all attributes of vma? >> >> The seqcount is used to protect fields that will be used during the speculative >> page fault like boundaries, protections. > > a VMA is changed, it was rather vague to me at this point. > If you could specify detail fields or some example what seqcount aim for, > it would help to review. Got it, I'll try to make it more explicit in the commit message, mentioning which fields are used in the speculative path and to be protected using the VMA sequence counter. > >> >>>> >>>> The unmap_page_range() one allows us to make assumptions about >>>> page-tables; when we find the seqcount hasn't changed we can assume >>>> page-tables are still valid. >>> >>> Hmm, seqcount covers page-table, too. >>> Please describe what the seqcount want to protect. >> >> The calls to vm_write_begin/end() in unmap_page_range() are used to detect when >> a VMA is being unmap and thus that new page fault should not be satisfied for >> this VMA. This is protecting the VMA unmapping operation, not the page tables >> themselves. > > Thanks for the detail. yes, please include this phrase instead of "page-table > are still valid". It makes me confused. Sure, will do. > >> >>>> >>>> The flip side is that we cannot distinguish between a vma_adjust() and >>>> the unmap_page_range() -- where with the former we could have >>>> re-checked the vma bounds against the address. >>>> >>>> Signed-off-by: Peter Zijlstra (Intel) >>>> >>>> [Port to 4.12 kernel] >>>> [Build depends on CONFIG_SPECULATIVE_PAGE_FAULT] >>>> [Introduce vm_write_* inline function depending on >>>> CONFIG_SPECULATIVE_PAGE_FAULT] >>>> [Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by >>>> using vm_raw_write* functions] >>>> [Fix a lock dependency warning in mmap_region() when entering the error >>>> path] >>>> [move sequence initialisation INIT_VMA()] >>>> Signed-off-by: Laurent Dufour >>>> --- >>>> include/linux/mm.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ >>>> include/linux/mm_types.h | 3 +++ >>>> mm/memory.c | 2 ++ >>>> mm/mmap.c | 31 +++++++++++++++++++++++++++++++ >>>> 4 files changed, 80 insertions(+) >>>> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>>> index efc1248b82bd..988daf7030c9 100644 >>>> --- a/include/linux/mm.h >>>> +++ b/include/linux/mm.h >>>> @@ -1264,6 +1264,9 @@ struct zap_details { >>>> static inline void INIT_VMA(struct vm_area_struct *vma) >>>> { >>>> INIT_LIST_HEAD(&vma->anon_vma_chain); >>>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT >>>> + seqcount_init(&vma->vm_sequence); >>>> +#endif >>>> } >>>> >>>> struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, >>>> @@ -1386,6 +1389,47 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping, >>>> unmap_mapping_range(mapping, holebegin, holelen, 0); >>>> } >>>> >>>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT >>>> +static inline void vm_write_begin(struct vm_area_struct *vma) >>>> +{ >>>> + write_seqcount_begin(&vma->vm_sequence); >>>> +} >>>> +static inline void vm_write_begin_nested(struct vm_area_struct *vma, >>>> + int subclass) >>>> +{ >>>> + write_seqcount_begin_nested(&vma->vm_sequence, subclass); >>>> +} >>>> +static inline void vm_write_end(struct vm_area_struct *vma) >>>> +{ >>>> + write_seqcount_end(&vma->vm_sequence); >>>> +} >>>> +static inline void vm_raw_write_begin(struct vm_area_struct *vma) >>>> +{ >>>> + raw_write_seqcount_begin(&vma->vm_sequence); >>>> +} >>>> +static inline void vm_raw_write_end(struct vm_area_struct *vma) >>>> +{ >>>> + raw_write_seqcount_end(&vma->vm_sequence); >>>> +} >>>> +#else >>>> +static inline void vm_write_begin(struct vm_area_struct *vma) >>>> +{ >>>> +} >>>> +static inline void vm_write_begin_nested(struct vm_area_struct *vma, >>>> + int subclass) >>>> +{ >>>> +} >>>> +static inline void vm_write_end(struct vm_area_struct *vma) >>>> +{ >>>> +} >>>> +static inline void vm_raw_write_begin(struct vm_area_struct *vma) >>>> +{ >>>> +} >>>> +static inline void vm_raw_write_end(struct vm_area_struct *vma) >>>> +{ >>>> +} >>>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */ >>>> + >>>> extern int access_process_vm(struct task_struct *tsk, unsigned long addr, >>>> void *buf, int len, unsigned int gup_flags); >>>> extern int access_remote_vm(struct mm_struct *mm, unsigned long addr, >>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >>>> index 21612347d311..db5e9d630e7a 100644 >>>> --- a/include/linux/mm_types.h >>>> +++ b/include/linux/mm_types.h >>>> @@ -335,6 +335,9 @@ struct vm_area_struct { >>>> struct mempolicy *vm_policy; /* NUMA policy for the VMA */ >>>> #endif >>>> struct vm_userfaultfd_ctx vm_userfaultfd_ctx; >>>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT >>>> + seqcount_t vm_sequence; >>>> +#endif >>>> } __randomize_layout; >>>> >>>> struct core_thread { >>>> diff --git a/mm/memory.c b/mm/memory.c >>>> index f86efcb8e268..f7fed053df80 100644 >>>> --- a/mm/memory.c >>>> +++ b/mm/memory.c >>>> @@ -1503,6 +1503,7 @@ void unmap_page_range(struct mmu_gather *tlb, >>>> unsigned long next; >>>> >>>> BUG_ON(addr >= end); >>> >>> The comment about saying it aims for page-table stability will help. >> >> A comment may be added mentioning that we use the seqcount to indicate that the >> VMA is modified, being unmapped. But there is not a real page table protection, >> and I think this may be confusing to talk about page table stability here. > > Okay, so here you mean seqcount is not protecting VMA's fields but vma unmap > operation like you mentioned above. I was confused like below description. > > "The unmap_page_range() one allows us to make assumptions about > page-tables; when we find the seqcount hasn't changed we can assume > page-tables are still valid" > > Instead of using page-tables's validness in descriptoin, it would be better > to use scenario you mentioned about VMA unmap operation and page fault race. Ok will do that. Thanks.