Date: Tue, 1 May 2018 22:16:46 +0900
From: Minchan Kim
To: Laurent Dufour
Cc: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org,
    kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz,
    Matthew Wilcox, benh@kernel.crashing.org, mpe@ellerman.id.au,
    paulus@samba.org, Thomas Gleixner, Ingo Molnar, hpa@zytor.com,
    Will Deacon, Sergey Senozhatsky, Andrea Arcangeli, Alexei Starovoitov,
    kemi.wang@intel.com, sergey.senozhatsky.work@gmail.com, Daniel Jordan,
    David Rientjes, Jerome Glisse, Ganesh Mahendran,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com,
    bsingharora@gmail.com, paulmck@linux.vnet.ibm.com, Tim Chen,
    linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v10 08/25] mm: VMA sequence count
Message-ID: <20180501131646.GB118722@rodete-laptop-imager.corp.google.com>
References: <1523975611-15978-1-git-send-email-ldufour@linux.vnet.ibm.com>
    <1523975611-15978-9-git-send-email-ldufour@linux.vnet.ibm.com>
    <20180423064259.GC114098@rodete-desktop-imager.corp.google.com>
    <06b996b0-b831-3d39-8a99-792abfb6a4d1@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <06b996b0-b831-3d39-8a99-792abfb6a4d1@linux.vnet.ibm.com>
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 30, 2018 at 05:14:27PM +0200, Laurent Dufour wrote:
>
>
> On 23/04/2018 08:42, Minchan Kim wrote:
> > On Tue, Apr 17, 2018 at 04:33:14PM +0200, Laurent Dufour wrote:
> >> From: Peter Zijlstra
> >>
> >> Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> >> counts such that we can easily test if a VMA is changed.
> >
> > So, seqcount is to protect modifying all attributes of vma?
>
> The seqcount is used to protect fields that will be used during the speculative
> page fault like boundaries, protections.

"a VMA is changed" was rather vague to me at this point. If you could specify
the fields in detail, or give an example of what the seqcount aims for, it
would help the review.

>
> >>
> >> The unmap_page_range() one allows us to make assumptions about
> >> page-tables; when we find the seqcount hasn't changed we can assume
> >> page-tables are still valid.
> >
> > Hmm, seqcount covers page-table, too.
> > Please describe what the seqcount want to protect.
>
> The calls to vm_write_begin/end() in unmap_page_range() are used to detect when
> a VMA is being unmap and thus that new page fault should not be satisfied for
> this VMA. This is protecting the VMA unmapping operation, not the page tables
> themselves.

Thanks for the detail. Yes, please include this phrase instead of "page-tables
are still valid". That wording confused me.

>
> >>
> >> The flip side is that we cannot distinguish between a vma_adjust() and
> >> the unmap_page_range() -- where with the former we could have
> >> re-checked the vma bounds against the address.
> >>
> >> Signed-off-by: Peter Zijlstra (Intel)
> >>
> >> [Port to 4.12 kernel]
> >> [Build depends on CONFIG_SPECULATIVE_PAGE_FAULT]
> >> [Introduce vm_write_* inline function depending on
> >>  CONFIG_SPECULATIVE_PAGE_FAULT]
> >> [Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by
> >>  using vm_raw_write* functions]
> >> [Fix a lock dependency warning in mmap_region() when entering the error
> >>  path]
> >> [move sequence initialisation INIT_VMA()]
> >> Signed-off-by: Laurent Dufour
> >> ---
> >>  include/linux/mm.h       | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/mm_types.h |  3 +++
> >>  mm/memory.c              |  2 ++
> >>  mm/mmap.c                | 31 +++++++++++++++++++++++++++++++
> >>  4 files changed, 80 insertions(+)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index efc1248b82bd..988daf7030c9 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -1264,6 +1264,9 @@ struct zap_details {
> >>  static inline void INIT_VMA(struct vm_area_struct *vma)
> >>  {
> >>  	INIT_LIST_HEAD(&vma->anon_vma_chain);
> >> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> >> +	seqcount_init(&vma->vm_sequence);
> >> +#endif
> >>  }
> >>
> >>  struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >> @@ -1386,6 +1389,47 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
> >>  	unmap_mapping_range(mapping, holebegin, holelen, 0);
> >>  }
> >>
> >> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> >> +static inline void vm_write_begin(struct vm_area_struct *vma)
> >> +{
> >> +	write_seqcount_begin(&vma->vm_sequence);
> >> +}
> >> +static inline void vm_write_begin_nested(struct vm_area_struct *vma,
> >> +					 int subclass)
> >> +{
> >> +	write_seqcount_begin_nested(&vma->vm_sequence, subclass);
> >> +}
> >> +static inline void vm_write_end(struct vm_area_struct *vma)
> >> +{
> >> +	write_seqcount_end(&vma->vm_sequence);
> >> +}
> >> +static inline void vm_raw_write_begin(struct vm_area_struct *vma)
> >> +{
> >> +	raw_write_seqcount_begin(&vma->vm_sequence);
> >> +}
> >> +static inline void vm_raw_write_end(struct vm_area_struct *vma)
> >> +{
> >> +	raw_write_seqcount_end(&vma->vm_sequence);
> >> +}
> >> +#else
> >> +static inline void vm_write_begin(struct vm_area_struct *vma)
> >> +{
> >> +}
> >> +static inline void vm_write_begin_nested(struct vm_area_struct *vma,
> >> +					 int subclass)
> >> +{
> >> +}
> >> +static inline void vm_write_end(struct vm_area_struct *vma)
> >> +{
> >> +}
> >> +static inline void vm_raw_write_begin(struct vm_area_struct *vma)
> >> +{
> >> +}
> >> +static inline void vm_raw_write_end(struct vm_area_struct *vma)
> >> +{
> >> +}
> >> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> >> +
> >>  extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
> >>  		void *buf, int len, unsigned int gup_flags);
> >>  extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> >> index 21612347d311..db5e9d630e7a 100644
> >> --- a/include/linux/mm_types.h
> >> +++ b/include/linux/mm_types.h
> >> @@ -335,6 +335,9 @@ struct vm_area_struct {
> >>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
> >>  #endif
> >>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
> >> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> >> +	seqcount_t vm_sequence;
> >> +#endif
> >>  } __randomize_layout;
> >>
> >>  struct core_thread {
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index f86efcb8e268..f7fed053df80 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -1503,6 +1503,7 @@ void unmap_page_range(struct mmu_gather *tlb,
> >>  	unsigned long next;
> >>
> >>  	BUG_ON(addr >= end);
> >
> > The comment about saying it aims for page-table stability will help.
>
> A comment may be added mentioning that we use the seqcount to indicate that the
> VMA is modified, being unmapped.
> But there is not a real page table protection,
> and I think this may be confusing to talk about page table stability here.

Okay, so here you mean the seqcount is not protecting the VMA's fields but the
VMA unmap operation, as you mentioned above. I was confused by the description
below:

"The unmap_page_range() one allows us to make assumptions about
page-tables; when we find the seqcount hasn't changed we can assume
page-tables are still valid"

Instead of talking about the validity of the page tables in the description,
it would be better to use the scenario you mentioned: the race between the VMA
unmap operation and the page fault.

Thanks.
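
For illustration only, the race between a VMA modification and a speculative
fault can be sketched in userspace C. This is not the kernel code: struct
toy_vma, the toy_* helpers and the racing_writer callback are hypothetical
names invented here, a C11 atomic counter stands in for seqcount_t, and the
"concurrent" writer is simulated by a callback so the example stays
single-threaded and deterministic. A real concurrent version would also need
the memory barriers that the kernel seqcount API provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct vm_area_struct. */
struct toy_vma {
	atomic_uint seq;		/* plays the role of vma->vm_sequence */
	unsigned long vm_start;
	unsigned long vm_end;
};

static void toy_vma_init(struct toy_vma *vma, unsigned long start,
			 unsigned long end)
{
	atomic_init(&vma->seq, 0u);
	vma->vm_start = start;
	vma->vm_end = end;
}

/* Writer side: the count is odd while a modification is in progress. */
static void toy_write_begin(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->seq, 1u);
}

static void toy_write_end(struct toy_vma *vma)
{
	unsigned int prev = atomic_fetch_add(&vma->seq, 1u);

	assert(prev & 1u);	/* must pair with toy_write_begin() */
	(void)prev;
}

/* Models an unmap: no field changes, but the count still moves. */
static void toy_unmap(struct toy_vma *vma)
{
	toy_write_begin(vma);
	/* page-table teardown would happen here */
	toy_write_end(vma);
}

/* Models a boundary adjustment: shrink the VMA to its lower half. */
static void toy_shrink(struct toy_vma *vma)
{
	toy_write_begin(vma);
	vma->vm_end = vma->vm_start + (vma->vm_end - vma->vm_start) / 2;
	toy_write_end(vma);
}

/*
 * Reader side. Returns true if the fault at addr could be handled
 * speculatively, false if the caller should fall back to the slow path.
 * racing_writer, if not NULL, runs between the snapshot and the re-check
 * to simulate a modification racing with the fault.
 */
static bool toy_speculative_fault(struct toy_vma *vma, unsigned long addr,
				  void (*racing_writer)(struct toy_vma *))
{
	unsigned int seq = atomic_load(&vma->seq);
	unsigned long start, end;

	if (seq & 1u)
		return false;	/* a writer is active right now */

	start = vma->vm_start;	/* snapshot the fields the fault relies on */
	end = vma->vm_end;

	if (racing_writer != NULL)
		racing_writer(vma);

	if (atomic_load(&vma->seq) != seq)
		return false;	/* VMA was adjusted or unmapped meanwhile */

	return addr >= start && addr < end;
}
```

Passing toy_unmap or toy_shrink as racing_writer makes the fault bail out in
both cases. The reader only sees that the count moved, which is the "flip
side" from the changelog: it cannot tell an adjustment from an unmap, so it
gives up rather than re-checking the bounds.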