Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751323AbdH3XB3 (ORCPT ); Wed, 30 Aug 2017 19:01:29 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60394 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750919AbdH3XB2 (ORCPT ); Wed, 30 Aug 2017 19:01:28 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 26398356F5 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=aarcange@redhat.com Date: Thu, 31 Aug 2017 01:01:25 +0200 From: Andrea Arcangeli To: Linus Torvalds Cc: =?iso-8859-1?B?Suly9G1l?= Glisse , Linux Kernel Mailing List , linux-mm , Dan Williams , Ross Zwisler , Bernhard Held , Adam Borowski , Radim =?utf-8?B?S3LEjW3DocWZ?= , Wanpeng Li , Paolo Bonzini , Takashi Iwai , Nadav Amit , Mike Galbraith , "Kirill A . Shutemov" , axie , Andrew Morton Subject: Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic Message-ID: <20170830230125.GL13559@redhat.com> References: <20170829235447.10050-1-jglisse@redhat.com> <20170829235447.10050-3-jglisse@redhat.com> <20170830165250.GD13559@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 30 Aug 2017 23:01:28 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3250 Lines: 75 On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote: > On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli wrote: > > > > I pointed out in earlier email ->invalidate_range can only be > > implemented (as mutually exclusive alternative to > > ->invalidate_range_start/end) by secondary MMUs that shares the very > > same pagetables with the core linux VM of the primary MMU, and those > > invalidate_range are already called by > > __mmu_notifier_invalidate_range_end. > > I have to admit that I didn't notice that fact - that we are already > in the situation that > invalidate_range is called by by the rand_end() nofifier. > > I agree that that should simplify all the code, and means that we > don't have to worry about the few cases that already implemented only > the "invalidate_page()" and "invalidate_range()" cases. > > So I think that simplifies J?r?me's patch further - once you have put > the range_start/end() cases around the inner loop, you can just drop > the invalidate_page() things entirely. > > > So this conversion from invalidate_page to invalidate_range looks > > superflous and the final mmu_notifier_invalidate_range_end should be > > enough. > > Yes. I missed the fact that we already called range() from range_end(). > > That said, the double call shouldn't hurt correctness, and it's > "closer" to old behavior for those people who only did the range/page > ones, so I wonder if we can keep J?r?me's patch in its current state > for 4.13. Yes, the double call doesn't hurt correctness. Keeping it in current state is safer if something, so I've no objection to it other than I'd like to optimize it further if possible, but it can be done later. We're already running the double call in various fast paths too in fact, and rmap walk isn't the fastest path that would be doing such double call, so it's not a major concern. Also not a bug, but one further (but more obviously safe) enhancement I would like is to restrict those rmap invalidation ranges to PAGE_SIZE << compound_order(page) instead of PMD_SIZE/PMD_MASK. + /* + * We have to assume the worse case ie pmd for invalidation. Note that + * the page can not be free in this function as call of try_to_unmap() + * must hold a reference on the page. + */ + end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE); + mmu_notifier_invalidate_range_start(vma->vm_mm, start, end); We don't need to invalidate 2MB of secondary MMU mappings surrounding a 4KB page, just to swapout a 4k page. split_huge_page can't run while holding the rmap locks, so compound_order(page) is safe to use there. It can also be optimized incrementally later. > Because I still want to release 4.13 this weekend, despite this > upheaval. Otherwise I'll have timing problems during the next merge > window. > > Andrea, do you otherwise agree with the whole series as is? I only wish we had more time to test Jerome's patchset, but I sure agree in principle and I don't see regressions in it. The callouts to ->invalidate_page seems to have diminished over time (for the various reasons we know) so if we don't use it for the fast paths, using it only in rmap walk slow paths probably wasn't providing much performance benefit. Thanks, Andrea