Date: Thu, 31 Aug 2017 14:25:55 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Andrea Arcangeli
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-mm, Dan Williams,
    Ross Zwisler, Bernhard Held, Adam Borowski, Radim Krčmář,
    Wanpeng Li, Paolo Bonzini, Takashi Iwai, Nadav Amit, Mike Galbraith,
    Kirill A. Shutemov, axie, Andrew Morton
Subject: Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
Message-ID: <20170831182555.GF9227@redhat.com>
In-Reply-To: <20170830230125.GL13559@redhat.com>
References: <20170829235447.10050-1-jglisse@redhat.com>
 <20170829235447.10050-3-jglisse@redhat.com>
 <20170830165250.GD13559@redhat.com>
 <20170830230125.GL13559@redhat.com>

On Thu, Aug 31, 2017 at 01:01:25AM +0200, Andrea Arcangeli wrote:
> On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote:
> > On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli wrote:
> > >
> > > I pointed out in an earlier email that ->invalidate_range can only be
> > > implemented (as a mutually exclusive alternative to
> > > ->invalidate_range_start/end) by secondary MMUs that share the very
> > > same pagetables with the core Linux VM of the primary MMU, and those
> > > ->invalidate_range callbacks are already called by
> > > __mmu_notifier_invalidate_range_end.
> >
> > I have to admit that I didn't notice that fact - that we are already
> > in the situation that invalidate_range is called by the range_end()
> > notifier.
> >
> > I agree that that should simplify all the code, and means that we
> > don't have to worry about the few cases that already implemented only
> > the "invalidate_page()" and "invalidate_range()" cases.
> >
> > So I think that simplifies Jérôme's patch further - once you have put
> > the range_start/end() cases around the inner loop, you can just drop
> > the invalidate_page() things entirely.
> >
> > > So this conversion from invalidate_page to invalidate_range looks
> > > superfluous and the final mmu_notifier_invalidate_range_end should be
> > > enough.
> >
> > Yes. I missed the fact that we already called range() from range_end().
> >
> > That said, the double call shouldn't hurt correctness, and it's
> > "closer" to the old behavior for those people who only did the
> > range/page ones, so I wonder if we can keep Jérôme's patch in its
> > current state for 4.13.
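
(For reference, the loop below is roughly what
__mmu_notifier_invalidate_range_end() in mm/mmu_notifier.c does today;
this is a paraphrase from memory, not a verbatim copy, but it shows why
range_end() already implies an invalidate_range() call for any
subsystem that registered one.)

	void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
						 unsigned long start,
						 unsigned long end)
	{
		struct mmu_notifier *mn;
		int id;

		id = srcu_read_lock(&srcu);
		hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
			/*
			 * This call is what makes an extra, explicit
			 * invalidate_range() after range_end() redundant:
			 * subsystems that share the CPU page tables get
			 * their shootdown from this loop already.
			 */
			if (mn->ops->invalidate_range)
				mn->ops->invalidate_range(mn, mm, start, end);
			if (mn->ops->invalidate_range_end)
				mn->ops->invalidate_range_end(mn, mm, start, end);
		}
		srcu_read_unlock(&srcu, id);
	}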
> Yes, the double call doesn't hurt correctness. Keeping it in its
> current state is safer if anything, so I've no objection to it other
> than I'd like to optimize it further if possible, but that can be done
> later.
>
> We're already running the double call in various fast paths too in
> fact, and the rmap walk isn't the fastest path that would be doing such
> a double call, so it's not a major concern.
>
> Also not a bug, but one further (and more obviously safe) enhancement
> I would like is to restrict those rmap invalidation ranges to
> PAGE_SIZE << compound_order(page) instead of PMD_SIZE/PMD_MASK.
>
> +	/*
> +	 * We have to assume the worst case, i.e. a PMD, for invalidation.
> +	 * Note that the page cannot be freed in this function, as the
> +	 * caller of try_to_unmap() must hold a reference on the page.
> +	 */
> +	end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE);
> +	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
>
> We don't need to invalidate 2MB of secondary MMU mappings surrounding
> a 4KB page just to swap out a 4KB page. split_huge_page() can't run
> while holding the rmap locks, so compound_order(page) is safe to use
> there.
>
> It can also be optimized incrementally later.

This optimization is safe, I believe. Linus, I can respin with that and
with further kvm dead code removal.

Jérôme
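
P.S. To make the range restriction concrete, here is a minimal sketch of
what the tightened invalidation could look like in try_to_unmap_one()
(my reading of Andrea's suggestion, untested; start is assumed to come
from vma_address(page, vma) as in the current patch, so it is already
aligned to the page being unmapped):

	unsigned long start = address, end;

	/*
	 * Invalidate only the span actually covered by this page,
	 * PAGE_SIZE << compound_order(page), instead of a full PMD.
	 * split_huge_page() cannot run under the rmap locks, so
	 * compound_order(page) is stable here.
	 */
	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);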