Subject: Re: [RFC] Improve memset
To: Borislav Petkov, Rasmus Villemoes
Cc: Linus Torvalds, x86-ml, Andy Lutomirski, Josh Poimboeuf, lkml
From: Rasmus Villemoes
Date: Mon, 16 Sep 2019 11:18:33 +0200

On 13/09/2019 18.36, Borislav Petkov wrote:
> On Fri, Sep 13, 2019 at 12:42:32PM +0200, Borislav Petkov wrote:
>> Or should we talk to Intel hw folks about it...
>
> Or, I can do something like this, while waiting. Benchmark at the end.
>
> The numbers are from a KBL box:
>
> model : 158
> model name : Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
> stepping : 12
>
> and if I'm not doing anything wrong with the benchmark

Eh, this benchmark doesn't seem to provide any hints on where to set the
cut-off for a compile-time constant n, i.e. the 32 in

  __b_c_p(n) && n <= 32

- unless gcc has unrolled your loop completely, which I find highly
unlikely.

> (the asm looks ok

By "looks ok", do you mean that the builtin_memset() calls have been made
into calls to libc memset(), or how has gcc expanded that? And if so,
what's the disassembly of your libc's memset()? The thing is, what needs
to be compared is how a rep;stosb of 32 bytes compares to 4 immediate
stores.

In fact, perhaps we shouldn't even try to find a cutoff. If __b_c_p(n),
just use __builtin_memset unconditionally. If n is smallish, gcc will do
a few stores, and if n is largish and gcc ends up emitting a call to
memset(), well, we can optimize memset() itself based on cpu
capabilities _and_ it's not the call/ret that will dominate. There are
also optimization and diagnostic advantages of having gcc know the
semantics of the memset() call (e.g. the tr.b DSE you showed).

> but I could very well be missing something), the numbers say that
> the REP; STOSB is better from sizes of 8 and upwards and up to two
> cachelines we're pretty much on-par with the builtin variant.

I don't think that's what the numbers say.

Rasmus
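
For illustration, the "no cutoff" idea above could be sketched roughly as
follows. This is only a sketch under assumptions, not the patch under
discussion: memset_dispatch() and memset_out_of_line() are hypothetical
names, and the code relies on the GCC/Clang builtins
__builtin_constant_p() and __builtin_memset(). The out-of-line function
merely stands in for a memset() that could be tuned per CPU.

```c
#include <stddef.h>

/*
 * Hypothetical out-of-line fallback. In a real setup this is where a
 * CPU-specific implementation (e.g. one based on rep;stosb) would live;
 * here it is a plain byte loop so the sketch stays self-contained.
 */
static void *memset_out_of_line(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

/*
 * Hypothetical dispatch macro: when n is a compile-time constant, hand
 * the call to the compiler builtin with no size cutoff, so the compiler
 * decides between a few immediate stores and a call. Otherwise use the
 * out-of-line function.
 */
#define memset_dispatch(s, c, n)				\
	(__builtin_constant_p(n)				\
		? __builtin_memset((s), (c), (n))		\
		: memset_out_of_line((s), (c), (n)))
```

With this, memset_dispatch(buf, 0, 32) goes through the builtin, while a
length only known at run time takes the out-of-line path. How the builtin
is actually expanded depends on the compiler version and flags, so the
generated code would need to be checked by disassembly.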