Date: Mon, 20 Apr 2020 09:48:45 +0200
From: Ingo Molnar
To: Josh Poimboeuf
Cc: Linus Torvalds, Thomas Gleixner, Masahiro Yamada, Linux Kernel Mailing List, the arch/x86 maintainers, Peter Zijlstra
Subject: Re: [GIT pull] perf/urgent for 5.7-rc2
Message-ID: <20200420074845.GA72554@gmail.com>
In-Reply-To: <20200419200758.3xry3vn2a5caxapx@treble>

* Josh Poimboeuf wrote:

> On Sun, Apr 19, 2020 at 11:56:51AM -0700, Linus Torvalds wrote:
> >
> > So I'm wondering if there any way that objtool could be run at
> > link-time (and archive time) rather than force a re-build of all the
> > object files from source?
>
> We've actually been making progress in that direction. Peter added
> partial vmlinux.o support, for Thomas' noinstr validation. The problem
> is, linking is single-threaded so it ends up making the kernel build
> slower overall.
>
> So right now, we still do most things per compilation unit, and only do
> the noinstr validation at vmlinux.o link time. Eventually, especially
> with LTO, we'll probably end up moving everything over to link time.

Fortunately, I believe much of what objtool does against vmlinux.o can
be parallelized in a rather straightforward fashion, if we build with
-ffunction-sections.

Here are the main "objtool check" processing steps:

int check(const char *_objname, bool orc)
{
...
	ret = decode_sections(&file);
...
	ret = validate_functions(&file);
...
	ret = validate_unwind_hints(&file);
...
	ret = validate_reachable_instructions(&file);
...
	ret = create_orc(&file);
...
	ret = create_orc_sections(&file);
}

The 'decode_sections()' step takes about 92% of the runtime against
vmlinux.o:

 $ taskset 1 perf stat --repeat 3 --sync --null tools/objtool/objtool check vmlinux.o

 Performance counter stats for 'tools/objtool/objtool check vmlinux.o' (3 runs):

           3.05757 +- 0.00247 seconds time elapsed  ( +-  0.08% )

 $ taskset 1 perf stat --repeat 3 --exit-after-decode --null tools/objtool/objtool check vmlinux.o

 Performance counter stats for 'tools/objtool/objtool check vmlinux.o' (3 runs):

           2.83132 +- 0.00272 seconds time elapsed  ( +-  0.10% )

(The --exit-after-decode hack makes it exit right after decode_sections().)

Within decode_sections(), the main overhead is in decode_instructions()
(~75% of the total objtool overhead):

           2.31325 +- 0.00609 seconds time elapsed  ( +-  0.26% )

This goes through every executable section, to decode the instructions:

static int decode_instructions(struct objtool_file *file)
{
...
	for_each_sec(file, sec) {

		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
			continue;

The distribution of function section sizes is strongly biased towards
sections of 100 bytes or less: over 95% of all instructions in
vmlinux.o are in such a section.

In fact over 99% of all decoded instructions are in a section of 500
bytes or smaller, so a threaded decoder where each thread batch-decodes
a handful of sections in a single processing step and then batch-inserts
the results into the (global) instructions hash should do the trick.

The batching size could be driven by section byte size, i.e. the unit of
batching would be for a decoding thread to grab ~10k bytes worth of
sections from the list, build a local list of decoded instructions, and
then insert them into the global hash in a single go.

This would scale very well IMO, with the defconfig already having almost
3 million instructions, and a distro build or allmodconfig build a lot
more.

I believe the 3.0 seconds total objtool runtime above could be reduced to
below 1.0 second on typical contemporary development systems - which
would IMHO make it a feasible model to run objtool only against the
whole kernel binary.

Is there any code generation disadvantage or other quirk to
-ffunction-sections, or other complications that I missed, that would
make this difficult?

Thanks,

	Ingo
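
The batching scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not objtool code: it assumes pthreads, and every name in it (sketch_section, sketch_work, claim_batch(), fake_decode(), decode_parallel()) is hypothetical. A plain counter stands in for the global instructions hash, and fake_decode() simply pretends that every instruction is 4 bytes long.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BATCH_BYTES	10000	/* assumed batching unit: ~10k bytes of sections */
#define MAX_THREADS	16	/* arbitrary cap for this sketch */

/* Hypothetical stand-in for an executable section (not objtool's struct section). */
struct sketch_section {
	size_t size;
};

struct sketch_work {
	const struct sketch_section *secs;
	size_t nr_secs;
	size_t next;			/* index of the next unclaimed section */
	pthread_mutex_t claim_lock;	/* protects 'next' */
	size_t total_insns;		/* stands in for the global instructions hash */
	pthread_mutex_t hash_lock;	/* protects 'total_insns' */
};

/* Placeholder decoder: pretends that every instruction is 4 bytes long. */
static size_t fake_decode(const struct sketch_section *sec)
{
	return sec->size / 4;
}

/*
 * Claim consecutive sections worth up to ~BATCH_BYTES. At least one section
 * is always claimed, so an oversized section cannot stall the queue.
 * Returns the number of sections claimed, starting at index *first.
 */
static size_t claim_batch(struct sketch_work *w, size_t *first)
{
	size_t bytes = 0, n = 0;

	pthread_mutex_lock(&w->claim_lock);
	*first = w->next;
	while (w->next < w->nr_secs &&
	       (n == 0 || bytes + w->secs[w->next].size <= BATCH_BYTES)) {
		bytes += w->secs[w->next].size;
		w->next++;
		n++;
	}
	pthread_mutex_unlock(&w->claim_lock);

	assert(*first + n <= w->nr_secs);
	return n;
}

static void *decode_worker(void *arg)
{
	struct sketch_work *w = arg;
	size_t first, n, i;

	while ((n = claim_batch(w, &first)) > 0) {
		size_t local_insns = 0;	/* a real decoder would build a local list */

		for (i = 0; i < n; i++)
			local_insns += fake_decode(&w->secs[first + i]);

		/* One batched insert into the shared structure per batch. */
		pthread_mutex_lock(&w->hash_lock);
		w->total_insns += local_insns;
		pthread_mutex_unlock(&w->hash_lock);
	}
	return NULL;
}

/* Decode all sections with up to nr_threads workers; returns the instruction count. */
static size_t decode_parallel(const struct sketch_section *secs, size_t nr_secs,
			      int nr_threads)
{
	struct sketch_work w = { .secs = secs, .nr_secs = nr_secs };
	pthread_t threads[MAX_THREADS];
	int i, created = 0;

	if (nr_threads < 1)
		nr_threads = 1;
	if (nr_threads > MAX_THREADS)
		nr_threads = MAX_THREADS;

	pthread_mutex_init(&w.claim_lock, NULL);
	pthread_mutex_init(&w.hash_lock, NULL);

	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&threads[created], NULL, decode_worker, &w) == 0)
			created++;
	}
	if (created == 0)
		decode_worker(&w);	/* fall back to the calling thread */
	for (i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	pthread_mutex_destroy(&w.claim_lock);
	pthread_mutex_destroy(&w.hash_lock);
	return w.total_insns;
}
```

With mostly tiny sections, each batch covers many sections, so both locks are taken once per ~10k bytes of text rather than once per section or per instruction. Whether that is enough to keep contention on the real hash negligible would have to be measured.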