From: Nadav Amit
To: Ingo Molnar
CC: Andy Lutomirski, Peter Zijlstra, "H. Peter Anvin", Thomas Gleixner,
    Nadav Amit, Borislav Petkov, David Woodhouse
Subject: [RFC PATCH 0/5] x86: dynamic indirect call promotion
Date: Wed, 17 Oct 2018 17:54:15 -0700
Message-ID: <20181018005420.82993-1-namit@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This RFC introduces indirect call promotion at runtime, which for
simplicity (and branding) is called "relpolines" here (relative call +
trampoline). Relpolines are mainly intended as a way of reducing
retpoline overheads due to Spectre v2.

Unlike indirect call promotion through profile-guided optimization, the
proposed approach does not require a profiling stage, works well with
modules whose address is unknown, and can adapt to changing workloads.

The main idea is simple: for every indirect call, we inject a piece of
code with fast-path and slow-path calls. The fast path is used if the
target matches the expected (hot) target. The slow path uses a
retpoline. During training, the slow path is set to call a function that
saves the call source and target in a hash table and keeps a count of
call frequency. The most common target is then patched into the hot
path.

The patching is done on the fly by patching the conditional branch
(opcode and offset) that is used to compare the target to the hot
target. This allows directing all cores to the fast path while the slow
path is being patched, and vice versa. Patching follows two more rules:

(1) Only patch a single byte when the code might be executed by any
    core.
(2) When patching more than one byte, ensure that no core is running the
    to-be-patched code, by preventing this code from being preempted and
    by calling synchronize_sched() after patching the branch that jumps
    over this code.

Changing all the indirect calls to use relpolines is done using assembly
macro magic. There are alternative solutions, but this one is relatively
simple and transparent. There is also logic to retrain the software
predictor, but the policy it uses may need to be refined.

Overall, the results are not bad (2 VCPU VM, throughput reported):

                base            relpoline
                ----            ---------
nginx           22898           25178 (+10%)
redis-ycsb      24523           25486 (+4%)
dbench          2144            2103 (+2%)

When retpolines are disabled and retraining is off, the performance
benefits are much less impressive: up to 2% (nginx).

There are several open issues: retraining should be done when modules
are removed; CPU hotplug is not supported; x86-32 is probably broken;
and the Makefile does not rebuild when the relpoline code is changed.

Having said that, I am worried that some of the approaches I took would
challenge the new code-of-conduct, so I thought of getting some feedback
before putting more effort into it.

Nadav Amit (5):
  x86: introduce preemption disable prefix
  x86: patch indirect branch promotion
  x86: interface for accessing indirect branch locations
  x86: learning and patching indirect branch targets
  x86: relpoline: disabling interface

 arch/x86/entry/entry_64.S            |  10 +
 arch/x86/include/asm/nospec-branch.h | 158 +++++
 arch/x86/include/asm/sections.h      |   2 +
 arch/x86/kernel/Makefile             |   1 +
 arch/x86/kernel/asm-offsets.c        |   6 +
 arch/x86/kernel/macros.S             |   1 +
 arch/x86/kernel/nospec-branch.c      | 899 +++++++++++++++++++++++++++
 arch/x86/kernel/vmlinux.lds.S        |   7 +
 arch/x86/lib/retpoline.S             |  75 +++
 include/linux/module.h               |   5 +
 kernel/module.c                      |   8 +
 kernel/seccomp.c                     |   2 +
 12 files changed, 1174 insertions(+)
 create mode 100644 arch/x86/kernel/nospec-branch.c

--
2.17.1
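
For readers who want a concrete picture of the training idea in the
cover letter (the slow path records call source and target in a hash
table, counts frequency, and the most common target is promoted to the
fast path), here is a minimal user-space sketch in C. It is not taken
from the patches: every name (record_call, hottest_target, take_call,
promote, TABLE_SIZE) is hypothetical, addresses are modeled as plain
integers, and the real series patches kernel text rather than updating a
struct field. In this sketch every slow-path call is sampled, whereas
the cover letter describes sampling only during training.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64 /* hypothetical size; must be a power of two */

/* One (call site, target) pair and how often it was observed. */
struct sample {
    uintptr_t src;      /* address of the indirect call site */
    uintptr_t dst;      /* observed call target */
    unsigned int count; /* 0 marks an empty slot */
};

static struct sample table[TABLE_SIZE];

static size_t hash_pair(uintptr_t src, uintptr_t dst)
{
    uintptr_t h = src * 31u + dst;

    h ^= h >> 7;
    return (size_t)(h & (TABLE_SIZE - 1));
}

/* Count one slow-path call. Returns 0 on success, -1 if the table is full. */
static int record_call(uintptr_t src, uintptr_t dst)
{
    size_t start = hash_pair(src, dst);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct sample *s = &table[(start + i) & (TABLE_SIZE - 1)];

        if (s->count == 0) {            /* empty slot: insert new pair */
            s->src = src;
            s->dst = dst;
            s->count = 1;
            return 0;
        }
        if (s->src == src && s->dst == dst) {
            s->count++;                 /* known pair: bump frequency */
            return 0;
        }
    }
    return -1;
}

/* Most frequently seen target for a call site, or 0 if none was recorded. */
static uintptr_t hottest_target(uintptr_t src)
{
    uintptr_t best = 0;
    unsigned int best_count = 0;

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].src == src && table[i].count > best_count) {
            best = table[i].dst;
            best_count = table[i].count;
        }
    }
    return best;
}

/* Model of one promoted call site. */
struct call_site {
    uintptr_t addr; /* call-site address, used as the hash-table source */
    uintptr_t hot;  /* promoted target; 0 while nothing is promoted */
};

/* Returns 1 if the fast path would be taken, 0 for the slow path. */
static int take_call(struct call_site *site, uintptr_t target)
{
    if (site->hot != 0 && target == site->hot)
        return 1;                       /* would be a direct call */
    record_call(site->addr, target);    /* would go through a retpoline */
    return 0;
}

/* Stand-in for patching the hot target into the call site. */
static void promote(struct call_site *site)
{
    site->hot = hottest_target(site->addr);
}
```

A call site would start with hot == 0, go through take_call() a number
of times, and then be handed to promote(); after that, calls to the
promoted target return 1 (fast path) and all other targets keep going
through the sampled slow path.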