From: Andy Lutomirski
Date: Mon, 16 Sep 2019 10:40:53 -0700
Subject: Re: [RFC] Improve memset
To: Linus Torvalds
Cc: Rasmus Villemoes, Borislav Petkov, Rasmus Villemoes, x86-ml, Andy Lutomirski, Josh Poimboeuf, lkml

On Mon, Sep 16, 2019 at 10:25 AM Linus Torvalds wrote:
>
> On Mon, Sep 16, 2019 at 2:18 AM Rasmus Villemoes wrote:
> >
> > Eh, this benchmark doesn't seem to provide any hints on where to set the
> > cut-off for a compile-time constant n, i.e. the 32 in
>
> Yes, you'd need to use proper fixed-size memset's with
> __builtin_memset() to test that case. Probably easy enough with some
> preprocessor macros to expand to a lot of cases.
>
> But even then it will not show some of the advantages of inlining the
> memset (quite often you have a "memset structure to zero, then
> initialize a couple of fields" pattern, and gcc does much better for
> that when it just inlines the memset to stores - to the point of just
> removing all the memset entirely and just storing a couple of zeroes
> between the fields you initialized).

After some experimentation, I think y'all are just doing it wrong.
GCC is very clever about this as long as it's given the chance.
This test, for example, generates excellent code:

#include <string.h>	/* for size_t; glibc also pulls in __THROW and __nonnull */

__THROW __nonnull ((1)) __attribute__((always_inline)) void
*memset(void *s, int c, size_t n)
{
	asm volatile ("nop");
	return s;
}

/* generates 'nop' */
void zero(void *dest, size_t size)
{
	__builtin_memset(dest, 0, size);
}

/* xorl %eax, %eax */
int test(void)
{
	int x;
	__builtin_memset(&x, 0, sizeof(x));
	return x;
}

/* movl $0, (%rdi) */
void memset_a_bit(int *ptr)
{
	__builtin_memset(ptr, 0, sizeof(*ptr));
}

So I'm thinking maybe compiler.h should actually do something like:

#define memset __builtin_memset

and we should have some appropriate magic so that the memset inline is
exempt from the macro.  Or maybe there's some very clever way to put
all of this into the memset inline function.

FWIW, this obviously wrong code:

__THROW __nonnull ((1)) __attribute__((always_inline)) void
*memset(void *s, int c, size_t n)
{
	__builtin_memset(s, c, n);
	return s;
}

generates 'jmp memset'.  It's not entirely clear to me exactly what's
happening here.

--Andy
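
One possible form of the "magic" that exempts the definition from the macro
is sketched below.  It is only an illustration, not what the thread or the
kernel settled on.  It assumes a function-like macro rather than the
object-like "#define memset __builtin_memset" quoted above, because a
function-like macro is not expanded when its name is wrapped in
parentheses.  The names my_memset, struct point and init_point are
hypothetical, chosen to avoid clashing with libc's memset in a userspace
build; __builtin_memset is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical stand-in for the out-of-line memset. */
void *my_memset(void *s, int c, size_t n);

/*
 * Function-like macro: it expands only where the name is immediately
 * followed by '(', so ordinary calls go to the compiler builtin.
 */
#define my_memset(s, c, n) __builtin_memset((s), (c), (n))

/*
 * The parentheses around the name keep the macro from expanding here,
 * so this defines the real function despite the macro above.
 */
void *(my_memset)(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

struct point {
	int x, y, z;
};

/*
 * The "zero the structure, then set a couple of fields" pattern from
 * the quoted text.  The call expands to __builtin_memset, which leaves
 * the compiler free to lower it to plain stores.
 */
int init_point(struct point *p)
{
	my_memset(p, 0, sizeof(*p));
	p->y = 7;
	return p->x + p->y + p->z;
}
```

A caller that needs the out-of-line function can spell the call as
(my_memset)(buf, 0, len), which bypasses the macro in the same way the
definition does.  Whether the builtin actually gets lowered to stores
depends on the compiler and optimization level.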