References: <881939122b88f32be4c374d248c09d7527a87e35.1561685471.git.jpoimboe@redhat.com> <20190706202942.GA123403@gmail.com> <20190707013206.don22x3tfldec4zm@treble> <20190707055209.xqyopsnxfurhrkxw@treble> <20190708223834.zx7u45a4uuu2yyol@treble>
In-Reply-To: <20190708223834.zx7u45a4uuu2yyol@treble>
From: Alexei Starovoitov
Date: Mon, 8 Jul 2019 15:49:33 -0700
Subject: Re: [tip:x86/urgent] bpf: Fix ORC unwinding in non-JIT BPF code
To: Josh Poimboeuf
Cc: Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Song Liu, Thomas Gleixner, Steven Rostedt, Kairui Song, Daniel Borkmann, Alexei Starovoitov, Peter Zijlstra, LKML, linux-tip-commits@vger.kernel.org

On Mon, Jul 8, 2019 at 3:38 PM Josh Poimboeuf wrote:
>
> On Mon, Jul 08, 2019 at 03:15:37PM -0700, Alexei Starovoitov wrote:
> > > 2)
> > >
> > >   After doing the first optimization, GCC then does another one which is
> > >   a little trickier. It replaces:
> > >
> > >         select_insn:
> > >                 jmp *jumptable(, %rax, 8)
> > >                 ...
> > >         ALU64_ADD_X:
> > >                 ...
> > >                 jmp *jumptable(, %rax, 8)
> > >         ALU_ADD_X:
> > >                 ...
> > >                 jmp *jumptable(, %rax, 8)
> > >
> > >   with
> > >
> > >         select_insn:
> > >                 mov jumptable, %r12
> > >                 jmp *(%r12, %rax, 8)
> > >                 ...
> > >         ALU64_ADD_X:
> > >                 ...
> > >                 jmp *(%r12, %rax, 8)
> > >         ALU_ADD_X:
> > >                 ...
> > >                 jmp *(%r12, %rax, 8)
> > >
> > >   The problem is that it only moves the jumptable address into %r12
> > >   once, for the entire function, then it goes through multiple recursive
> > >   indirect jumps which rely on that %r12 value. But objtool isn't yet
> > >   smart enough to be able to track the value across multiple recursive
> > >   indirect jumps through the jump table.
> > >
> > >   After some digging I found that the quick and easy fix is to disable
> > >   -fgcse. In fact, this seems to be recommended by the GCC manual, for
> > >   code like this:
> > >
> > >     -fgcse
> > >         Perform a global common subexpression elimination pass. This
> > >         pass also performs global constant and copy propagation.
> > >
> > >         Note: When compiling a program using computed gotos, a GCC
> > >         extension, you may get better run-time performance if you
> > >         disable the global common subexpression elimination pass by
> > >         adding -fno-gcse to the command line.
> > >
> > >         Enabled at levels -O2, -O3, -Os.
> > >
> > >   This code indeed relies extensively on computed gotos. I don't know
> > >   *why* disabling this optimization would improve performance. In fact
> > >   I really don't see how it could make much of a difference either way.
> > >
> > >   Anyway, using -fno-gcse makes optimization #2 go away and makes
> > >   objtool happy, with only a fix for #1 needed.
> > >
> > >   If -fno-gcse isn't an option, we might be able to fix objtool by using
> > >   the "first_jump_src" thing which Peter added, improving it such that
> > >   it also takes table jumps into account.
> >
> > Sorry for delay. I'm mostly offgrid until next week.
> > As far as -fno-gcse.. I don't mind as long as it doesn't hurt performance.
> > Which I suspect it will :(
> > All these indirect gotos are there for performance.
> > Single indirect goto and a bunch of jmp select_insn
> > are way slower, since there is only one instruction
> > for cpu branch predictor to work with.
> > When every insn is followed by "jmp *jumptable"
> > there is more room for cpu to speculate.
> > It's been long time, but when I wrote it the difference
> > between all indirect goto vs single indirect goto was almost 2x.
>
> Just to clarify, -fno-gcse doesn't get rid of any of the indirect jumps.
> It still has 166 indirect jumps. It just gets rid of the second
> optimization, where the jumptable address is placed in a register.

What about other functions in core.c?
Maybe it's easier to teach objtool to recognize that pattern?

> If you have a benchmark which is relatively easy to use, I could try to
> run some tests.

modprobe test_bpf
selftests/bpf/test_progs
both print runtime.
Some of test_progs have high run-to-run variations though.
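
For readers following the thread, the two dispatch shapes being compared ("every insn followed by jmp *jumptable" versus "single indirect goto and a bunch of jmp select_insn") can be sketched roughly as below. This is only an illustration using the GCC computed-goto extension. The toy opcodes, `struct insn`, `run_threaded()` and `run_central()` are hypothetical names made up for this sketch and are not the code in core.c. The compiler is also free to merge or duplicate the indirect jumps, so the generated assembly may not match the source shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, not the real BPF opcodes. */
enum { OP_ADD, OP_SUB, OP_EXIT };

struct insn {
	int op;
	long imm;
};

/*
 * Threaded dispatch: every handler ends with its own indirect jump,
 * so the source contains one indirect branch site per opcode handler.
 */
long run_threaded(const struct insn *insn)
{
	static const void *const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_EXIT] = &&do_exit,
	};
	long acc = 0;

	assert(insn != NULL);
	goto *jumptable[insn->op];

do_add:
	acc += insn->imm;
	insn++;
	goto *jumptable[insn->op];
do_sub:
	acc -= insn->imm;
	insn++;
	goto *jumptable[insn->op];
do_exit:
	return acc;
}

/*
 * Central dispatch: handlers jump back to select_insn, which holds
 * the only indirect jump in the function.
 */
long run_central(const struct insn *insn)
{
	static const void *const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_EXIT] = &&do_exit,
	};
	long acc = 0;

	assert(insn != NULL);
select_insn:
	goto *jumptable[insn->op];

do_add:
	acc += insn->imm;
	insn++;
	goto select_insn;
do_sub:
	acc -= insn->imm;
	insn++;
	goto select_insn;
do_exit:
	return acc;
}
```

Both functions compute the same result for the same program. They differ only in how many indirect branch sites appear in the source, which is the property the performance argument above is about.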