Date: Fri, 13 Sep 2019 09:35:30 +0200
From: Ingo Molnar
To: Borislav Petkov
Cc: x86-ml, Andy Lutomirski, Josh Poimboeuf, Linus Torvalds, lkml
Subject: Re: [RFC] Improve memset
Message-ID: <20190913073530.GA125477@gmail.com>
References: <20190913072237.GA12381@zn.tnic>
In-Reply-To: <20190913072237.GA12381@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

* Borislav Petkov wrote:

> Hi,
>
> since the merge window is closing in and y'all are on a conference, I
> thought I should take another stab at it. It being something which Ingo,
> Linus and Peter have suggested in the past at least once.
>
> Instead of calling memset:
>
> ffffffff8100cd8d:       e8 0e 15 7a 00          callq  ffffffff817ae2a0 <__memset>
>
> and having a JMP inside it depending on the feature supported, let's simply
> have the REP; STOSB directly in the code:
>
> ...
> ffffffff81000442:       4c 89 d7                mov    %r10,%rdi
> ffffffff81000445:       b9 00 10 00 00          mov    $0x1000,%ecx
>
> <---- new memset
> ffffffff8100044a:       f3 aa                   rep stos %al,%es:(%rdi)
> ffffffff8100044c:       90                      nop
> ffffffff8100044d:       90                      nop
> ffffffff8100044e:       90                      nop
> <----
>
> ffffffff8100044f:       4c 8d 84 24 98 00 00    lea    0x98(%rsp),%r8
> ffffffff81000456:       00
> ...
>
> And since the majority of x86 boxes out there is Intel, they haz
> X86_FEATURE_ERMS so they won't even need to alternative-patch those call
> sites when booting.
>
> In order to patch on machines which don't set X86_FEATURE_ERMS, I need
> to do a "reversed" patching of sorts, i.e., patch when the x86 feature
> flag is NOT set. See the below changes in alternative.c which basically
> add a flags field to struct alt_instr and thus control the patching
> behavior in apply_alternatives().
>
> The result is this:
>
> static __always_inline void *memset(void *dest, int c, size_t n)
> {
>         void *ret, *dummy;
>
>         asm volatile(ALTERNATIVE_2_REVERSE("rep; stosb",
>                                            "call memset_rep",  X86_FEATURE_ERMS,
>                                            "call memset_orig", X86_FEATURE_REP_GOOD)
>                 : "=&D" (ret), "=a" (dummy)
>                 : "0" (dest), "a" (c), "c" (n)
>                 /* clobbers used by memset_orig() and memset_rep_good() */
>                 : "rsi", "rdx", "r8", "r9", "memory");
>
>         return dest;
> }
>
> and so in the !ERMS case, we patch in a call to the memset_rep() version
> which is the old variant in memset_64.S. There we need to do some reg
> shuffling because I need to map the registers from where REP; STOSB
> expects them to where the x86_64 ABI wants them. Not a big deal - a push
> and two moves and a pop at the end.
>
> If X86_FEATURE_REP_GOOD is not set either, we fallback to another call
> to the original unrolled memset.
>
> The rest of the diff is me trying to untangle memset()'s definitions
> from the early code too because we include kernel proper headers there
> and all kinds of crazy include hell ensues but that later.
>
> Anyway, this is just a pre-alpha version to get people's thoughts and
> see whether I'm in the right direction or you guys might have better
> ideas.

That looks exciting - I'm wondering what effects this has on code
footprint - for example defconfig vmlinux code size, and what the
average per call site footprint impact is?

If the footprint effect is acceptable, then I'd expect this to improve
performance, especially in hot loops.

Thanks,

	Ingo