Date: Tue, 28 Apr 2015 09:28:52 -0700
From: Linus Torvalds
To: Borislav Petkov
Cc: "H. Peter Anvin", Andy Lutomirski, X86 ML, Denys Vlasenko,
    Brian Gerst, Ingo Molnar, Steven Rostedt, Oleg Nesterov,
    Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
    Linux Kernel Mailing List, Mel Gorman
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
In-Reply-To: <20150428155511.GF19025@pd.tnic>

On Tue, Apr 28, 2015 at 8:55 AM, Borislav Petkov wrote:
>
> Provided it is correct, it shows that the 0x66-prefixed 3-byte NOPs are
> better than the 0F 1F 00 suggested by the manual (Haha!):

That's which AMD CPU?

On my intel i7-4770S, they are the same cost (I cut down your loop
numbers by an order of magnitude each because I couldn't be arsed to
wait for it, so it might be off by a cycle or two):

    Running 60 times, 1000000 loops per run.
    nop_0x90 average: 81.065681
    nop_3_byte average: 80.230101

That said, I think your benchmark tests the speed of "rdtsc" rather
than the no-ops. Putting the read_tsc inside the inner loop basically
makes it swamp everything else.

> $ taskset -c 3 ./nops
> Running 600 times, 10000000 loops per run.
> nop_0x90 average: 439.805220
> nop_3_byte average: 442.412915

I think that's in the noise, and could be explained by random
alignment of the loop too, or even random factors like "the CPU heated
up, so the later run was slightly slower". The difference between 439
and 442 doesn't strike me as all that significant.

It might be better to *not* inline, and instead make a real function
call to something that has a lot of no-ops (do some preprocessor magic
to make more no-ops in one go). At least that way the alignment is
likely the same for the two cases.

Or if not that, then I think you're better off with something like

    p1 = read_tsc();
    for (i = 0; i < LOOPS; i++) {
        nop_0x90();
    }
    p2 = read_tsc();
    r = (p2 - p1);

because while you're now measuring the loop overhead too, that's
*much* smaller than the rdtsc overhead. So I get something like

    Running 600 times, 1000000 loops per run.
    nop_0x90 average: 3.786935
    nop_3_byte average: 3.677228

and notice the difference between "~80 cycles" and "~3.7 cycles".
Yeah, that's rdtsc. I bet your 440 is about the same thing too.

Btw, the whole thing about "averaging cycles" is not the right thing
to do either. You should probably take the *minimum* cycles count, not
the average, because anything non-minimal means "some perturbation"
(ie interrupt etc).

So I think something like the attached would be better. It gives an
approximate "cycles per one four-byte nop", and I get

    [torvalds@i7 ~]$ taskset -c 3 ./a.out
    Running 60 times, 1000000 loops per run.
    nop_0x90 average: 0.200479
    nop_3_byte average: 0.199694

which sounds suspiciously good to me (5 nops per cycle? uop cache and
nop compression, I guess).
                 Linus

[Attachment: t.c (text/x-csrc)]

/*
 * $ taskset -c 3 ./nops
 * Running 600 times, 10000000 loops per run.
 * nop_0x90 average: 439.805220
 * nop_3_byte average: 442.412915
 *
 * How to run:
 *
 * taskset -c <cpunum> argv0
 */
#include <stdio.h>
#include <sys/syscall.h>
#include <stdlib.h>
#include <unistd.h>

typedef unsigned long long u64;

#define TWO(a) a; a;
#define FOUR(a) TWO(TWO(a))
#define SIXTEEN(a) FOUR(FOUR(a))
#define TWOFIVESIX(a) SIXTEEN(SIXTEEN(a))

#define DECLARE_ARGS(val, low, high)    unsigned low, high
#define EAX_EDX_VAL(val, low, high)     ((low) | ((u64)(high) << 32))
#define EAX_EDX_ARGS(val, low, high)    "a" (low), "d" (high)
#define EAX_EDX_RET(val, low, high)     "=a" (low), "=d" (high)

static __always_inline unsigned long long rdtsc(void)
{
        DECLARE_ARGS(val, low, high);

        asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));

        return EAX_EDX_VAL(val, low, high);
}

static inline u64 read_tsc(void)
{
	u64 ret;

	asm volatile("mfence");
	ret = rdtsc();
	asm volatile("mfence");

	return ret;
}

static void nop_0x90(void)
{
	TWOFIVESIX(asm volatile(".byte 0x66, 0x66, 0x90"))
}

static void nop_3_byte(void)
{
	TWOFIVESIX(asm volatile(".byte 0x0f, 0x1f, 0x00"))
}

int main()
{
	int i, j;
	u64 p1, p2;
	u64 r, min;

#define TIMES 60
#define LOOPS 1000000ULL

	printf("Running %d times, %lld loops per run.\n", TIMES, LOOPS);

	min = 100000000;

	for (r = 0, j = 0; j < TIMES; j++) {
		p1 = read_tsc();
		for (i = 0; i < LOOPS; i++) {
			nop_0x90();

		}
		p2 = read_tsc();
		r = (p2 - p1);

		if (r < min)
			min = r;
	}

	printf("nop_0x90 average: %f\n", min / (double) LOOPS / 256);

	min = 100000000;

	for (r = 0, j = 0; j < TIMES; j++) {
		p1 = read_tsc();
		for (i = 0; i < LOOPS; i++) {
			nop_3_byte();
		}
		p2 = read_tsc();

		r = (p2 - p1);
		if (r < min)
			min = r;
	}

	printf("nop_3_byte average: %f\n", min / (double) LOOPS / 256);

	return 0;
}