Date: Thu, 10 Jan 2019 10:44:01 -0600
From: Josh Poimboeuf
To: Nadav Amit
Cc: X86 ML, LKML, Ard Biesheuvel, Andy Lutomirski, Steven Rostedt,
    Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Linus Torvalds,
    Masami Hiramatsu, Jason Baron, Jiri Kosina, David Laight,
    Borislav Petkov, Julia Cartwright, Jessica Yu, "H. Peter Anvin",
    Rasmus Villemoes, Edward Cree, Daniel Bristot de Oliveira
Subject: Re: [PATCH v3 0/6] Static calls
Message-ID: <20190110164401.g747vifrppbhbo3o@treble>

On Thu, Jan 10, 2019 at 01:21:00AM +0000, Nadav Amit wrote:
> > On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf wrote:
> >
> > With this version, I stopped trying to use text_poke_bp(), and instead
> > went with a different approach: if the call site destination doesn't
> > cross a cacheline boundary, just do an atomic write. Otherwise, keep
> > using the trampoline indefinitely.
> >
> > NOTE: At least experimentally, the call destination writes seem to be
> > atomic with respect to instruction fetching. On Nehalem I can easily
> > trigger crashes when writing a call destination across cachelines while
> > reading the instruction on other CPU; but I get no such crashes when
> > respecting cacheline boundaries.
> >
> > BUT, the SDM doesn't document this approach, so it would be great if any
> > CPU people can confirm that it's safe!
> >
>
> I (still) think that having a compiler plugin can make things much cleaner
> (as done in [1]). The callers would not need to be changed, and the key can
> be provided through an attribute.
>
> Using a plugin should also allow to use Steven's proposal for doing
> text_poke() safely: by changing 'func()' into 'asm ("call func")', as done
> by the plugin, you can be guaranteed that registers are clobbered. Then, you
> can store in the assembly block the return address in one of these
> registers.

I'm no GCC expert (why do I find myself saying this a lot lately?), but
this sounds to me like it could be tricky to get right.

I suppose you'd have to do it in an early pass, to allow GCC to clobber
the registers in a later pass. So it would necessarily have side
effects, but I don't know what the risks are.

Would it work with more than 5 arguments, where args get passed on the
stack?

At the very least, it would (at least partially) defeat the point of the
callee-saved paravirt ops.

What if we just used a plugin in a simpler fashion -- to do call site
alignment, if necessary, to ensure the instruction doesn't cross
cacheline boundaries. This could be done in a later pass, with no side
effects other than code layout. And it would allow us to avoid
breakpoints altogether -- again, assuming somebody can verify that
intra-cacheline call destination writes are atomic with respect to
instruction decoder reads.

-- 
Josh
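
P.S. To make the cacheline condition concrete, here is a rough sketch of the check a patching routine or an alignment pass would need. It is only an illustration, not code from the patch set: it assumes 64-byte cachelines and a 5-byte direct call (opcode 0xE8 followed by a 4-byte rel32 destination), and the names CACHELINE_SIZE, call_dest_crosses_cacheline() and call_rel32() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE      64u

/* Assumption: 5-byte direct call, opcode 0xE8 followed by a rel32. */
#define CALL_INSN_SIZE      5u
#define CALL_OPERAND_OFFSET 1u
#define CALL_OPERAND_SIZE   4u

/*
 * Hypothetical helper: true if the 4-byte destination operand of the
 * call instruction at 'insn' spans two cachelines.  Only the operand
 * matters here; the opcode byte may sit in the previous cacheline.
 */
static bool call_dest_crosses_cacheline(uintptr_t insn)
{
	uintptr_t first = insn + CALL_OPERAND_OFFSET;
	uintptr_t last = first + CALL_OPERAND_SIZE - 1;

	return (first / CACHELINE_SIZE) != (last / CACHELINE_SIZE);
}

/*
 * Hypothetical helper: the rel32 value to write, which is relative to
 * the address of the instruction following the call.
 */
static int32_t call_rel32(uintptr_t insn, uintptr_t dest)
{
	return (int32_t)(dest - (insn + CALL_INSN_SIZE));
}
```

With these assumptions, a call at 0x103c has its operand at 0x103d..0x1040 and would have to stay on the trampoline (or be realigned), while a call at 0x103b or 0x103f keeps the operand within one cacheline and could take the single 4-byte write, provided that write really is atomic with respect to instruction fetch.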