Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp5546957ybe; Tue, 17 Sep 2019 09:35:31 -0700 (PDT) X-Google-Smtp-Source: APXvYqxjlfPmlk81WNNH8kC+as5XC8mOMB9LftHvpBD1hvG3yM+iCGoAQQQOBtvdVL/3hy+R7lzM X-Received: by 2002:a50:e04b:: with SMTP id g11mr5727621edl.302.1568738131191; Tue, 17 Sep 2019 09:35:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568738131; cv=none; d=google.com; s=arc-20160816; b=qNAHAgv/worLzgZo4yS9nQ8zZKaXAexd/JQmhCvBi1Q1726M8fj+UjYIwVh+n/dG+4 dWvR4d9rqcvmnkm+gHeHBy5ZmO8BVgTY/KnKYSsF7aTcTGI53kUqvH7l0MQ0Bn8972ZJ 2vcQTXVnBjtMN2q2zrrQgyVXhXTEWWQGtKYsHGeoCiqBx4wW1d9ynv7h8ul3JdQCz9VT 45ykFmNbnqRZtzd+QJl8Ky2V9HYBCvna6AwkSdXnQfGJy5qI4KFPGqpRaJ5NMoaWSjPm c/F0q/03paKWoKgad2IICiRF7eIu/GP6TEgDhEhWofmSQcpFVcWy34rb1VrRlK+E4PHu P+Nw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=RrtNIUMohCox+BYKXI+e26vky+21ewyw44X+ixcBCsY=; b=J9gx5nOmsJL0INjDi1Xc0o2hKodLUNW6o0rgc9YT2GNNsoaLDXA1x5WYmQQ59HqF1t 4HFAEW+K0CTH3xOkbXLYdb5NuZWSat+IkryKNLJW4Xiev5rRf34RYAWdHXQNZoDPJWMm SGNCWXY6h2n7Uwg36w4kBeQcs+ZIb1iyhbc4M5b8C/bzoKEErSwxoDROmCIK4EiUQmfI 51QuBgPmZSsxn+uRMUlZTvHzkAhJn+5zIavpVg6pGTxCGTYkfnKEEZpfPkOzcSFl1jBW SPBWdrcz92mPG/joy69gfrI8dVpYszUMwdYseroofPqfdiusEzGc6uF/EyqtfOX3woyn eY0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@alien8.de header.s=dkim header.b=ObGxVqBT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alien8.de Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p28si1619395ejn.88.2019.09.17.09.35.06; Tue, 17 Sep 2019 09:35:31 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@alien8.de header.s=dkim header.b=ObGxVqBT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alien8.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728935AbfIQIQA (ORCPT + 99 others); Tue, 17 Sep 2019 04:16:00 -0400 Received: from mail.skyhub.de ([5.9.137.197]:45098 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725798AbfIQIQA (ORCPT ); Tue, 17 Sep 2019 04:16:00 -0400 Received: from nazgul.tnic (unknown [193.86.95.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.skyhub.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id E1BCF1EC0C1A; Tue, 17 Sep 2019 10:15:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alien8.de; s=dkim; t=1568708155; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=RrtNIUMohCox+BYKXI+e26vky+21ewyw44X+ixcBCsY=; b=ObGxVqBTR4bHZeDtH2hxlu4CfRnKND+hG1Mv8X0k1YnQ/fIFsolnTuVDnmwWxcv+SPIswz F+VnE+hAa6lHi8caBRAzzdletHRDF93eCmjoVxW5oR0GT77v6qOHUxLqiN6XkJVM4zFeSX csvrxCk2vZ6/nsbDFHhaRvuWBUvAq0g= Date: Tue, 17 Sep 2019 10:15:49 +0200 From: Borislav Petkov To: Linus Torvalds Cc: Rasmus Villemoes , Rasmus Villemoes , x86-ml , Andy Lutomirski , Josh Poimboeuf , lkml Subject: Re: [RFC] Improve memset Message-ID: <20190917081549.GB12174@nazgul.tnic> References: <20190913072237.GA12381@zn.tnic> <9dc9f1e6-5d19-167c-793d-2f4a5ebee097@rasmusvillemoes.dk> <20190913104232.GA4190@zn.tnic> <20190913163645.GC4190@zn.tnic> <3fc31917-9452-3a10-d11d-056bf2d8b97d@rasmusvillemoes.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Sep 16, 2019 at 10:25:25AM -0700, Linus Torvalds wrote: > So the "inline constant sizes" case has advantages over and beyond the > obvious ones. I suspect that a reasonable cut-off point is somethinig > like "8*sizeof(long)". But look at things like "struct kstat" uses > etc, the limit might actually be even higher than that. Ok, sounds to me like we want to leave the small and build-time known sizes to gcc to optimize. Especially if you want to clear only a subset of the struct members and set the rest. > Also note that while "rep stosb" is _reasonably_ good with current > CPU's (ie roughly gen 8+), it's not so great a few generations ago > (gen 6ish), and it can be absolutely horrid on older cores and/or > atom. The limit for when it is a win ends up depending on whether I$ > footprint is an issue too, of course, but some of the bigger wins tend > to happen when you have sizes >= 128. Ok, so I have X86_FEATURE_ERMS set on this old IVB laptop so if gen8 and newer do perform better, then we need to change our feature detection to clear ERMS on those older generations and really leave it set only on gen8+. I'll change my benchmark to do explicit small-sized moves (instead of using __builtin_memset) to see whether I can compare performance better. Which also begs the question do we do __builtin_memset() at all or we code those small-sized moves ourselves? We'll lose the advantage of the partial struct members clearing but then we avoid the differences the different compiler versions would introduce when the builtin is used. And last but not least we should not leave readability somewhere along the way: I'd prefer good performance and maintainable code than some hypothetical max perf on some uarch but ugly and unreadable macro mess... > You can basically always beat "rep movs/stos" with hand-tuned AVX2/512 > code for specific cases if you don't look at I$ footprint and the cost > of the AVX setup (and the cost of frequency changes, which often go > hand-in-hand with the AVX use). So "rep movs/stos" is seldom > _optimal_, but it tends to be "quite good" for modern CPU's with > variable sizes that are in the 100+ byte range. Right. If we did this, it would be a conditional in front of the REP; STOS but that ain't worse than our current CALL+JMP. Thx. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. --