From: Linus Torvalds
To: Dave Hansen
Cc: Ingo Molnar, Mel Gorman, Andrew Morton, Rik van Riel, Hugh Dickins,
    Minchan Kim, Andi Kleen, H Peter Anvin, Linux-MM, LKML, Peter Zijlstra,
    Thomas Gleixner
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5
Date: Tue, 9 Jun 2015 14:54:01 -0700

On Tue, Jun 9, 2015 at 2:14 PM, Dave Hansen wrote:
>
> The 0 cycle TLB miss was also interesting. It goes back up to something
> reasonable if I put the mb()/mfence's back.

So I've said it before, and I'll say it again: Intel does really well
on TLB fills.

The reason is partly historical, with Win95 doing a ton of TLB
invalidation (I think every single GDI call ended up invalidating the
TLB, so under some important Windows benchmarks of the time, you
literally had a TLB flush every 10k instructions!).

But partly it is because people are wrong in thinking that TLB fills
have to be slow.
There's a lot of complete garbage RISC machines where the TLB fill took
forever, because in the name of simplicity it would stop the pipeline
and often be done in SW.

The zero-cycle TLB fill is obviously a bit optimistic, but at the same
time it's not completely insane. TLB fills can be prefetched, and the
table walker can hit the cache, if you do them right. And Intel mostly
does.

So the normal full (non-global) TLB fill really is fairly cheap. It's
been optimized for, and the TLB gets re-filled fairly efficiently. I
suspect that it's really the case that doing more than just a couple
of single-tlb flushes is a complete waste of time: the flushing takes
longer than re-filling the TLB well.

                         Linus
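To make the trade-off in the last paragraph concrete, here is a rough sketch of the comparison being described: the cost of invalidating pages one at a time versus the cost of one full (non-global) flush plus the re-fills that follow it. Everything in it is an assumption for illustration only. The struct `tlb_cost_model`, its fields, and the helper names are hypothetical, and no cycle counts here come from the thread or from any measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost model. All values are placeholders supplied by the
 * caller; none of them are measured numbers. */
struct tlb_cost_model {
    unsigned long single_flush_cycles; /* invalidate one page's entry */
    unsigned long full_flush_cycles;   /* drop all non-global entries */
    unsigned long refill_cycles;       /* average cost to re-fill one entry */
    unsigned long live_entries;        /* entries re-filled after a full flush */
};

/* Estimated cost of flushing nr_pages entries one at a time. */
static unsigned long per_page_cost(const struct tlb_cost_model *m,
                                   unsigned long nr_pages)
{
    assert(m != NULL);
    return nr_pages * m->single_flush_cycles;
}

/* Estimated cost of one full flush plus re-filling the working set. */
static unsigned long full_flush_cost(const struct tlb_cost_model *m)
{
    assert(m != NULL);
    return m->full_flush_cycles + m->live_entries * m->refill_cycles;
}

/* True when the full flush is estimated to be cheaper than flushing
 * nr_pages entries individually. */
static bool prefer_full_flush(const struct tlb_cost_model *m,
                              unsigned long nr_pages)
{
    return per_page_cost(m, nr_pages) > full_flush_cost(m);
}
```

With made-up inputs of 100 cycles per single-page flush, 150 cycles for a full flush, 2 cycles per re-fill and 64 live entries, the full flush is estimated at 278 cycles, so the model prefers it as soon as three or more pages need flushing. The cheaper the re-fill is assumed to be, the lower that crossover point gets, which is the shape of the argument above.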