Date: Thu, 23 Jul 2015 11:49:38 +0100
From: Catalin Marinas
To: Dave Hansen
Cc: David Rientjes, linux-mm, Linux Kernel Mailing List, Andrew Morton, Andrea Arcangeli
Subject: Re: [PATCH] mm: Flush the TLB for a single address in a huge page
Message-ID: <20150723104938.GA27052@e104818-lin.cambridge.arm.com>
In-Reply-To: <55B021B1.5020409@intel.com>

On Thu, Jul 23, 2015 at 12:05:21AM +0100, Dave Hansen wrote:
> On 07/22/2015 03:48 PM, Catalin Marinas wrote:
> > You are right, on x86 the tlb_single_page_flush_ceiling seems to be
> > 33, so for an HPAGE_SIZE range the code does a local_flush_tlb()
> > always. I would say a single page TLB flush is more efficient than a
> > whole TLB flush, but I'm not familiar enough with x86.
>
> The last time I looked, the instruction to invalidate a single page is
> more expensive than the instruction to flush the entire TLB.

I was thinking of the overall cost of re-populating the TLB after it
has been nuked, rather than the cost of the instruction itself.
> We also don't bother doing ranged flushes _ever_ for hugetlbfs TLB
> invalidations, but that was just because the work done around commit
> e7b52ffd4 didn't see any benefit.

For huge pages, there are indeed fewer page table levels to fetch, so
I guess the impact is not significant. With virtualisation/nested
pages, at least on ARM, refilling the TLB for a guest would take
longer (though it's highly dependent on the microarchitecture
implementation, e.g. whether it caches the guest-PA-to-host-PA
translations separately).

> That said, I can't imagine this will hurt anything. We also have TLBs
> that can mix 2M and 4k pages and I don't think we did back when we put
> that code in originally.

Another question is whether flushing a single address is enough for a
huge page. I assumed it is, since tlb_remove_pmd_tlb_entry() only
adjusts the mmu_gather range by PAGE_SIZE (rather than HPAGE_SIZE) and
no one has complained so far.

AFAICT, there are only 3 architectures that don't use
asm-generic/tlb.h, but they all seem to handle this case:

  arch/arm:  implements tlb_remove_pmd_tlb_entry() in a similar way to
             the generic one
  arch/s390: tlb_remove_pmd_tlb_entry() is a no-op
  arch/ia64: does not support THP

-- 
Catalin