Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753031AbbDCSJD (ORCPT ); Fri, 3 Apr 2015 14:09:03 -0400
Received: from mail-ig0-f182.google.com ([209.85.213.182]:33525 "EHLO mail-ig0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752982AbbDCSI7 (ORCPT ); Fri, 3 Apr 2015 14:08:59 -0400
MIME-Version: 1.0
In-Reply-To: <551EC5CD.3050901@redhat.com>
References: <1428059634-11782-1-git-send-email-dvlasenk@redhat.com> <20150403140336.GA16422@gmail.com> <551EC5CD.3050901@redhat.com>
Date: Fri, 3 Apr 2015 11:08:58 -0700
X-Google-Sender-Auth: jBeGFsU1JN1-AlTDNn4pwjvh70s
Message-ID:
Subject: Re: [PATCH] x86/asm/entry/64: pack interrupt dispatch table tighter
From: Linus Torvalds
To: Denys Vlasenko
Cc: Ingo Molnar , Steven Rostedt , Borislav Petkov , "H. Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Frederic Weisbecker , Alexei Starovoitov , Will Drewry , Kees Cook , "the arch/x86 maintainers" , Linux Kernel Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3036
Lines: 84

On Fri, Apr 3, 2015 at 9:54 AM, Denys Vlasenko wrote:
>
> How about this version?
> It's still isn't a star of readability,
> but the structure of the 32-byte code block is more visible now...

Do we really even want to be this clever in the first place?

The thing is, when we take an interrupt:

 (a) the L1 I$ is always cold

 (b) the instruction decoder has never had time to run ahead

 (c) there are usually not that many different interrupts anyway, even
     under load (ie you'd have maybe disk and networking)

 (d) we intentionally spread out the different interrupt vector numbers

 (e) the 32-byte block thing is questionable, most older
     micro-architectures fetch in 16-byte blocks iirc.

So what this tells me is that:

 - (a+b) the jump-to-jump is likely fairly expensive, because even
   though they are in the same cacheline, the front end hasn't gotten
   ahead of anything, so there's no hiding any front end pipeline
   hiccups.

 - (c+d) there is likely very little advantage to trying to "pack"
   things in cachelines

 - (d+e) the 7-instructions-in-one-32-byte-block doesn't really sound
   like all that big of a win, and it does cause a 16-byte split for
   some interrupts.

In other words, I'd suggest that we just use a simple unconditional
5-byte branch instead. Add the two-byte "push" instruction, and you
have 7 bytes per interrupt. Align those 7 bytes up to 8, and none of
them ever cross a 16-byte boundary.

Simple, clean, and slightly bigger in memory footprint, but probably
not noticeably more so in cache footprint, simply because there
usually aren't that many active interrupts anyway.

The people who do millions of networking interrupts per second and
have network cards that steer things to many different interrupts
already try to make sure that the steering goes to different CPUs -
otherwise there wouldn't be any *point* to steering things. So that
particular case of "lots of active interrupts" doesn't have a bigger
cache footprint *either*, since any particular CPU L1 I$ will still
only handle a few interrupts.

So you get "only" 4 interrupt cases per 32 bytes rather than 7. But is
that odd double jump and all this complexity really worth it?

So I really suggest just doing something stupid and straightforward
(and completely untested) like this:

    .macro push_vector
        pushq_cfi $(~vector+0x80)
        jmp common_interrupt
        .align 8
        vector=vector+1 /* advance to the next vector for the next stub */
    .endm

        vector=FIRST_EXTERNAL_VECTOR
        .align 64
    ENTRY(irq_entries_start)
        .rept 256 /* this number does not need to be exact, just big enough */
            push_vector
        .endr

and just be done with it.

(Of course, you have to change the code that knows about the "7
entries in 32 bytes" patterns too, but that's just going to be much
simpler now).

                    Linus