Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp4312081ybb; Mon, 23 Mar 2020 18:23:20 -0700 (PDT) X-Google-Smtp-Source: ADFU+vsDH7Y/GGT3W0+TdNd+AeUkaTc7+2JmaM5AbbvWKgDEZIv5U6pkgxE4KsFQeY4EEltP6m4X X-Received: by 2002:aca:7285:: with SMTP id p127mr1666649oic.32.1585013000387; Mon, 23 Mar 2020 18:23:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585013000; cv=none; d=google.com; s=arc-20160816; b=fK/lKpA6KhYvQ9DmW3UXcIMar7ghbeT5QDPXIU1OP6ThST39vZ4vk4y6Lk9gK4xHs4 uswm110/gz/5owlVNi9kCjSY0mFG5LHVgvAOxrmeTvbGbxjjSZb2HFcHBpQjYZvid5Pq NzRtNHWZ9bo7czRcYLwK+x72GAxPcVKvE5SHkQbf8HsI/ouyz2bFgYbYYp8vfc/E/VS3 mDQEgO/2TA5tjznys1nn281ZGJ3zciL3Bl6KFYXEWy2LSlL/wOmJ52sVAiRvGxDgM6Wi I8yWuxcmvaj0A5ybZeCrZo1p4FY/dvj2+Vo7CfjFOd3k+w2opaVGB2tTfvDrehEBNxNX jsxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=2bF4kWzVYozUGgvzITVBuWBVodBeRJJH7jEiGrv7K80=; b=t0xcwdEHQ/vpTFD0zxCQp/Up78DAZME4WbNUoaEevUhTitORmupCcrq254UMRybRru yfE0kbEXg60hwvDfrPRW+1iyato0TaNIrp5HjPJ7FOhgKp3sOv/zB5iUY62I8DoYaoVw bVsEY5lw1SWVsD/FRFyuwyQb3qHNtx1+pwBdcHsBRFHZaGBnaDnRiSryOLYPL/z0qHyZ L3WaG9jlqoVNMOQkyN3IQorHDTm68Q2N1O+yMlfx6D94fbh0Pput1TUP0TtGY+KBe3hJ XKw5bSYB2AL4iD3MITvqyeh1eyTGSGs6gkQquQr5w7R+hPXb8ZmTgBECe9eUgfc0wL7X Lv+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=C3gnOA1K; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a66si7918454otc.172.2020.03.23.18.23.07; Mon, 23 Mar 2020 18:23:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=C3gnOA1K; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727102AbgCXBWU (ORCPT + 99 others); Mon, 23 Mar 2020 21:22:20 -0400 Received: from mail-pj1-f65.google.com ([209.85.216.65]:34450 "EHLO mail-pj1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727031AbgCXBWU (ORCPT ); Mon, 23 Mar 2020 21:22:20 -0400 Received: by mail-pj1-f65.google.com with SMTP id q16so589646pje.1 for ; Mon, 23 Mar 2020 18:22:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2bF4kWzVYozUGgvzITVBuWBVodBeRJJH7jEiGrv7K80=; b=C3gnOA1KGbgctO0Lp0HFjTkts2ZuawK2Gtonne3GnlpAlxES6N2cVc48Eh2mlbPqKw 0fRsVvuCnkk2nGRDF14yXkHSww9uBJk9UGxzj+6osUWddCnQGDa7o0CSgp+M9q6tyLrg GY0dKiSHTnANrrd866y8msYp0HRWQLpRLYGbhWzqg4F6YPRehHYek6HW8r+Lx1lhK7qQ Zf7rsyuWoOA+ky5bDrlZyWZFdqynEqI6DSWnB53jSwtpXHjyqBQVrMdldKnksLLYGawh 1fRKA5BsGVSWGghzY8UFaJhFTbm1XSeiToSCH3br8IqU2lr13mDlptEBwa0EJZkLYWA+ WZ7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2bF4kWzVYozUGgvzITVBuWBVodBeRJJH7jEiGrv7K80=; b=dG/DUQREG8VPTv5WvTHKuavxOiBknF9NMfEA+hD0M4AdF7cd4hunRalrpNLP/X8nLR E0s6nZ2ltww+xw2znlyqB05sd5VEX076IZtHofk/iIXEEm7aMC4w/ccf0jDMYoPgvvL8 FdvOS0BNt6X0Fw92Wjer6G0EhrrzWx03GeeJHyQMadewLF5W4f0gec7s+z+AvPRk4Jfk m2gHsN88hUtZ21ty8dxz5qvfYocaIfsJ1bHWBw5ErPhx4XqEePaASxqGEDiXwDqln1Vb ba3uhjiP5LA4WNU6iBpd/XJFoC3u4Q2pmRBW/L1KGdiGV9Uw/XW7XZ65jD1jemPqQ+qp lIcQ== X-Gm-Message-State: ANhLgQ0aNsFGQyKae2vCvD+3Mxa8on9F7LJresFMTdccjaOQYeg48AaJ IRJWQyON6ejJ246+rWDHtwR2guY5HosRbj/2u0L2yg== X-Received: by 2002:a17:902:6bc8:: with SMTP id m8mr17986977plt.223.1585012938823; Mon, 23 Mar 2020 18:22:18 -0700 (PDT) MIME-Version: 1.0 References: <20200323114207.222412-1-courbet@google.com> In-Reply-To: <20200323114207.222412-1-courbet@google.com> From: Nick Desaulniers Date: Mon, 23 Mar 2020 18:22:05 -0700 Message-ID: Subject: Re: [PATCH] x86: Alias memset to __builtin_memset. To: Clement Courbet Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" , LKML , clang-built-linux Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 23, 2020 at 4:43 AM 'Clement Courbet' via Clang Built Linux wrote: > > Recent compilers know the meaning of builtins (`memset`, > `memcpy`, ...) and can replace calls by inline code when > deemed better. For example, `memset(p, 0, 4)` will be lowered > to a four-byte zero store. > > When using -ffreestanding (this is the case e.g. building on > clang), these optimizations are disabled. This means that **all** > memsets, including those with small, constant sizes, will result > in an actual call to memset. Isn't this only added for 32b x86 (if I'm reading arch/x86/Makefile right)? Who's adding it for 64b? arch/x86/Makefile has a comment: 88 # temporary until string.h is fixed 89 KBUILD_CFLAGS += -ffreestanding Did you look into fixing that? > > We have identified several spots where we have high CPU usage > because of this. For example, a single one of these memsets is > responsible for about 0.3% of our total CPU usage in the kernel. > > Aliasing `memset` to `__builtin_memset` allows the compiler to > perform this optimization even when -ffreestanding is used. > There is no change when -ffreestanding is not used. > > Below is a diff (clang) for `update_sg_lb_stats()`, which > includes the aforementionned hot memset: > memset(sgs, 0, sizeof(*sgs)); > > Diff: > movq %rsi, %rbx ~~~> movq $0x0, 0x40(%r8) > movq %rdi, %r15 ~~~> movq $0x0, 0x38(%r8) > movl $0x48, %edx ~~~> movq $0x0, 0x30(%r8) > movq %r8, %rdi ~~~> movq $0x0, 0x28(%r8) > xorl %esi, %esi ~~~> movq $0x0, 0x20(%r8) > callq ~~~> movq $0x0, 0x18(%r8) > ~~~> movq $0x0, 0x10(%r8) > ~~~> movq $0x0, 0x8(%r8) > ~~~> movq $0x0, (%r8) > > In terms of code size, this grows the clang-built kernel a > bit (+0.022%): > 440285512 vmlinux.clang.after > 440383608 vmlinux.clang.before The before number looks bigger? Did it shrink in size, or was the above mislabeled? > > Signed-off-by: Clement Courbet > --- > arch/x86/include/asm/string_64.h | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h > index 75314c3dbe47..7073c25aa4a3 100644 > --- a/arch/x86/include/asm/string_64.h > +++ b/arch/x86/include/asm/string_64.h > @@ -18,6 +18,15 @@ extern void *__memcpy(void *to, const void *from, size_t len); > void *memset(void *s, int c, size_t n); > void *__memset(void *s, int c, size_t n); > > +/* Recent compilers can generate much better code for known size and/or > + * fill values, and will fallback on `memset` if they fail. > + * We alias `memset` to `__builtin_memset` explicitly to inform the compiler to > + * perform this optimization even when -ffreestanding is used. > + */ > +#if (__GNUC__ >= 4) > +#define memset(s, c, count) __builtin_memset(s, c, count) > +#endif > + > #define __HAVE_ARCH_MEMSET16 > static inline void *memset16(uint16_t *s, uint16_t v, size_t n) > { > @@ -74,6 +83,7 @@ int strcmp(const char *cs, const char *ct); > #undef memcpy > #define memcpy(dst, src, len) __memcpy(dst, src, len) > #define memmove(dst, src, len) __memmove(dst, src, len) > +#undef memset > #define memset(s, c, n) __memset(s, c, n) > > #ifndef __NO_FORTIFY > -- -- Thanks, ~Nick Desaulniers