Date: Mon, 22 Apr 2019 11:51:42 -0400
From: Jerome Glisse
To: Laurent Dufour
Cc: akpm@linux-foundation.org, mhocko@kernel.org, peterz@infradead.org,
 kirill@shutemov.name, ak@linux.intel.com, dave@stgolabs.net, jack@suse.cz,
 Matthew Wilcox, aneesh.kumar@linux.ibm.com, benh@kernel.crashing.org,
 mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner, Ingo Molnar,
 hpa@zytor.com, Will Deacon, Sergey Senozhatsky,
 sergey.senozhatsky.work@gmail.com, Andrea Arcangeli, Alexei Starovoitov,
 kemi.wang@intel.com, Daniel Jordan, David Rientjes, Ganesh Mahendran,
 Minchan Kim, Punit Agrawal, vinayak menon, Yang Shi, zhong jiang, Haiyan
 Song, Balbir Singh, sj38.park@gmail.com, Michel Lespinasse, Mike Rapoport,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com,
 npiggin@gmail.com, paulmck@linux.vnet.ibm.com, Tim Chen,
 linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v12 09/31] mm: VMA sequence count
Message-ID: <20190422155142.GD3450@redhat.com>
References: <20190416134522.17540-1-ldufour@linux.ibm.com>
 <20190416134522.17540-10-ldufour@linux.ibm.com>
 <20190418224857.GI11645@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 19, 2019 at 05:45:57PM +0200, Laurent Dufour wrote:
> Hi Jerome,
>
> Thanks a lot for reviewing this series.
>
> On 19/04/2019 at 00:48, Jerome Glisse wrote:
> > On Tue, Apr 16, 2019 at 03:45:00PM +0200, Laurent Dufour wrote:
> > > From: Peter Zijlstra
> > >
> > > Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> > > counts such that we can easily test if a VMA is changed.
> > >
> > > The calls to vm_write_begin/end() in unmap_page_range() are
> > > used to detect when a VMA is being unmapped and thus that new page faults
> > > should not be satisfied for this VMA. If the seqcount hasn't changed when
> > > the page tables are locked, this means we are safe to satisfy the page
> > > fault.
> > >
> > > The flip side is that we cannot distinguish between a vma_adjust() and
> > > the unmap_page_range() -- where with the former we could have
> > > re-checked the vma bounds against the address.
> > >
> > > The VMA's sequence counter is also used to detect changes to various VMA
> > > fields used during the page fault handling, such as:
> > >  - vm_start, vm_end
> > >  - vm_pgoff
> > >  - vm_flags, vm_page_prot
> > >  - vm_policy
> >
> > ^ All above are under mmap write lock ?
>
> Yes, changes are still made under the protection of the mmap_sem.
>
> >
> > >  - anon_vma
> >
> > ^ This is either under mmap write lock or under page table lock
> >
> > So my question is do we need the complexity of seqcount_t for this ?
>
> The sequence counter is used to detect write operations done while a reader
> (the SPF handler) is running.
>
> The implementation is quite simple (here without the lockdep checks):
>
> static inline void raw_write_seqcount_begin(seqcount_t *s)
> {
> 	s->sequence++;
> 	smp_wmb();
> }
>
> I can't see why this is too complex here, would you elaborate on this ?
>
> >
> > It seems that using a regular int as counter and also relying on vm_flags
> > when the vma is unmapped should do the trick.
>
> vm_flags is not enough I guess, as some operations do not impact the
> vm_flags at all (resizing for instance).
> Am I missing something ?
>
> >
> > vma_delete(struct vm_area_struct *vma)
> > {
> > 	...
> > 	/*
> > 	 * Make sure the vma is marked as invalid, ie neither read nor write,
> > 	 * so that speculative faults back off. A racing speculative fault
> > 	 * will either see the flags as 0 or the new seqcount.
> > 	 */
> > 	vma->vm_flags = 0;
> > 	smp_wmb();
> > 	vma->seqcount++;
> > 	...
> > }
>
> Well I don't think we can safely clear the vm_flags this way when the VMA is
> unmapped, I think it is used later when cleaning is done.
>
> Later in this series, the VMA deletion is managed when the VMA is unlinked
> from the RB Tree. That is checked using the vm_rb field's value, and managed
> using RCU.
>
> > Then:
> > speculative_fault_begin(struct vm_area_struct *vma,
> > 			struct spec_vmf *spvmf)
> > {
> > 	...
> > 	spvmf->seqcount = vma->seqcount;
> > 	smp_rmb();
> > 	spvmf->vm_flags = vma->vm_flags;
> > 	if (!spvmf->vm_flags) {
> > 		// Back off the vma is dying ...
> > 		...
> > 	}
> > }
> >
> > bool speculative_fault_commit(struct vm_area_struct *vma,
> > 			      struct spec_vmf *spvmf)
> > {
> > 	...
> > 	seqcount = vma->seqcount;
> > 	smp_rmb();
> > 	vm_flags = vma->vm_flags;
> >
> > 	if (spvmf->vm_flags != vm_flags || seqcount != spvmf->seqcount) {
> > 		// Something did change for the vma
> > 		return false;
> > 	}
> > 	return true;
> > }
> >
> > This would also avoid the lockdep issue described below. But maybe what
> > i propose is stupid and i will see it after further reviewing things.
>
> That's true that the lockdep is quite annoying here. But it is still
> interesting to keep in the loop to avoid 2 subsequent write_seqcount_begin()
> calls being made in the same context (which would lead to an even sequence
> counter value while a write operation is in progress). So I think this is
> still a good thing to have lockdep available here.

Ok so i had to read everything and i should have read everything before
asking all of the above. It does look good in fact, what worried me in
this patch is all the lockdep avoidance as it is usually a red flag.

But after thinking long and hard i do not see how to easily solve that
one as unmap_page_range() is in so many different paths... So what is done
in this patch is the most sane thing. Sorry for the noise.

So for this patch:

Reviewed-by: Jérôme Glisse
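
For anyone following the seqcount discussion above, here is a rough
userspace sketch of the even/odd counter protocol, not the kernel code.
Assumptions: struct vma is a hypothetical, reduced stand-in for
struct vm_area_struct; spec_fault_begin() and spec_fault_commit() are
made-up helper names; vm_write_begin()/vm_write_end() borrow the names used
in the patch but their bodies here are only illustrative. C11 seq_cst
atomics stand in for the kernel's smp_wmb()/smp_rmb() pairing, and there is
no lockdep equivalent. It shows the counter logic only; a real concurrent
reader would also need race-free accesses to the VMA fields themselves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily reduced stand-in for struct vm_area_struct.
 * Only the fields needed for the illustration are present.
 */
struct vma {
	atomic_uint vm_sequence;	/* even: stable, odd: write in progress */
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Writer side: make the count odd before touching the VMA fields. */
static void vm_write_begin(struct vma *vma)
{
	/* seq_cst increment, stronger than s->sequence++ plus smp_wmb() */
	atomic_fetch_add(&vma->vm_sequence, 1);
}

/* Writer side: make the count even again once the update is complete. */
static void vm_write_end(struct vma *vma)
{
	atomic_fetch_add(&vma->vm_sequence, 1);
}

/*
 * Reader side (hypothetical helper): snapshot the count before using the
 * VMA fields. Returns false if a write is in progress, in which case the
 * speculative path should back off.
 */
static bool spec_fault_begin(struct vma *vma, unsigned int *seq)
{
	*seq = atomic_load(&vma->vm_sequence);
	return (*seq & 1) == 0;
}

/*
 * Reader side (hypothetical helper): values read since spec_fault_begin()
 * may only be trusted if the count has not moved in the meantime.
 */
static bool spec_fault_commit(struct vma *vma, unsigned int seq)
{
	return atomic_load(&vma->vm_sequence) == seq;
}
```

A reader would call spec_fault_begin(), read vm_start/vm_end, then call
spec_fault_commit() with the saved count and fall back to the regular
mmap_sem path if either call returns false. Two vm_write_begin() calls
without a vm_write_end() in between would leave the count even during a
write, which is the case lockdep is kept around to catch in the real code.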