From: Linus Torvalds
Date: Fri, 26 Oct 2012 10:01:14 -0700
Subject: Re: [PATCH 05/31] x86/mm: Reduce tlb flushes from ptep_set_access_flags()
To: Michel Lespinasse
Cc: Rik van Riel, Peter Zijlstra, Andrea Arcangeli, Mel Gorman,
    Johannes Weiner, Thomas Gleixner, Andrew Morton,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ingo Molnar

On Fri, Oct 26, 2012 at 5:34 AM, Michel Lespinasse wrote:
> On Thu, Oct 25, 2012 at 9:23 PM, Linus Torvalds wrote:
>>
>> Yes. It's not architected as far as I know, though. But I agree, it's
>> possible - even likely - we could avoid TLB flushing entirely on x86.
>
> Actually, it is architected on x86. This was first described in the
> intel appnote 317080 "TLBs, Paging-Structure Caches, and Their
> Invalidation", last paragraph of section 5.1. Nowadays, the same
> contents are buried somewhere in Volume 3 of the architecture manual
> (in my copy: 4.10.4.1 Operations that Invalidate TLBs and
> Paging-Structure Caches)

Good. I should have known it must be architected, because we've gone
back-and-forth on this in the kernel historically. We used to have some
TLB invalidates in the faulting path because I wasn't sure whether they
were needed or not, but we clearly don't have them any more (and I
suspect coverage was always spotty).

And Intel (and AMD) have been very good at documenting as architected
these kinds of details that people end up relying on even if they
weren't necessarily originally explicitly documented.

>> I *suspect* that whole TLB flush just magically became an SMP one
>> without anybody ever really thinking about it.
>
> I would be very worried about assuming every non-x86 arch has similar
> TLB semantics. However, if their fault handlers always invalidate TLB
> for pages that get spurious faults, then skipping the remote
> invalidation would be fine. (I believe this is what
> tlb_fix_spurious_fault() is for ?)

Yes. Of course, there may be some case where we unintentionally don't
necessarily flush a faulting address (on some architecture that needs
it), and then removing the cross-cpu invalidate could expose that
pre-existing bug-let, and cause an infinite loop of page faults due to
a TLB entry that never gets invalidated even if the page tables are
actually up-to-date.

So changing the mm/pgtable-generic.c function sounds like the right
thing to do, but would be a bit more scary. Changing the x86 version
sounds safe, *especially* since you point out that the
"fault-causes-tlb-invalidate" is architected behavior.

So I'd almost be willing to drop the invalidate in just one single
commit, because it really should be safe. The only thing it does is
guarantee that the accessed bit gets updated, and the accessed bit
just isn't that important.
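To make that concrete - and I'm writing this code from memory, so
treat the details as approximate rather than as the actual source: the
generic fallback in include/asm-generic/pgtable.h is, if I remember
right, just

  #ifndef flush_tlb_fix_spurious_fault
  #define flush_tlb_fix_spurious_fault(vma, address) \
          flush_tlb_page(vma, address)
  #endif

and x86 already #defines it to a no-op precisely because the hardware
invalidates the stale entry when it takes the spurious fault. Dropping
the invalidate on the x86 side would then look roughly like this
(paravirt bookkeeping omitted, so this is a sketch of the idea, not
the patch itself):

  int ptep_set_access_flags(struct vm_area_struct *vma,
                            unsigned long address, pte_t *ptep,
                            pte_t entry, int dirty)
  {
          int changed = !pte_same(*ptep, entry);

          if (changed && dirty) {
                  *ptep = entry;
                  /*
                   * No flush_tlb_page() here: the CPU invalidates
                   * the stale TLB entry itself when it takes the
                   * spurious fault (SDM vol 3, 4.10.4.1), and a
                   * lost A-bit update is harmless anyway.
                   */
          }

          return changed;
  }

The generic version in mm/pgtable-generic.c would keep its flush,
which is exactly why changing that one is the scarier option.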
If we never flush the TLB on another CPU that continues to use a TLB
entry where the accessed bit is set (even if it's cleared in the
in-memory page tables), the worst that can happen is that the accessed
bit doesn't ever get set even if that CPU constantly uses the page.
And nobody will *ever* care. The A bit is purely a heuristic for the
page LRU thing; we don't care about irrelevant special cases that
won't even affect correctness (much less performance - if that thing
is really hot and stays in the TLB, then even if we evict the page it
will just get reloaded immediately anyway).

And doing a TLB invalidate even locally is worthless: sure, setting
the dirty bit in the page tables without invalidating the TLB can
cause a local micro-TLB fault (not a software-visible one, just a
microarchitectural pipeline restart with a TLB reload) on the next
write access, because the TLB would still contain D=0. But the
hardware resolves that entirely by itself, so *even* if the CPU didn't
invalidate-on-fault, there's no reason we should invalidate in
software on x86.

Again, this can be different on non-x86 architectures with software
dirty bits, where a stale TLB entry that never gets flushed could
cause infinite TLB faults that never make progress, but that's really
a TLB _walker_ issue, not a generic VM issue.

            Linus