References: <20190427100639.15074-1-nstange@suse.de> <20190427100639.15074-4-nstange@suse.de> <20190427102657.GF2623@hirez.programming.kicks-ass.net> <20190428133826.3e142cfd@oasis.local.home>
From: Andy Lutomirski
Date: Mon, 29 Apr 2019 11:42:38 -0700
Subject: Re: [PATCH 3/4] x86/ftrace: make ftrace_int3_handler() not to skip fops invocation
To: Linus Torvalds
Cc: Steven Rostedt, Peter Zijlstra, Nicolai Stange, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", "the arch/x86 maintainers", Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan, Konrad Rzeszutek Wilk, Tim Chen, Sebastian Andrzej Siewior, Mimi Zohar, Juergen Gross, Nick Desaulniers, Nayna Jain, Masahiro Yamada, Andy Lutomirski, Joerg Roedel, Linux List Kernel Mailing, live-patching@vger.kernel.org, "open list:KERNEL SELFTEST FRAMEWORK"
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 29, 2019 at 11:29 AM Linus Torvalds wrote:
>
> On Mon, Apr 29, 2019 at 11:06 AM Linus Torvalds wrote:
> >
> > It does *not* emulate the "call" in the BP handler itself, instead it
> > replaces the %ip (the same way all the other BP handlers replace the
> > %ip) with a code sequence that just does
> >
> >         push %gs:bp_call_return
> >         jmp *%gs:bp_call_target
> >
> > after having filled in those per-cpu things.
>
> Note that if you read the patch, you'll see that my explanation
> glossed over the "what if an interrupt happens" part. Which is handled
> by having two handlers, one for "interrupts were already disabled" and
> one for "interrupts were enabled, so I disabled them before entering
> the handler".

This is quite a cute solution.

>
> The second handler does the same push/jmp sequence, but has a "sti"
> before the jmp. Because of the one-instruction sti shadow, interrupts
> won't actually be enabled until after the jmp instruction has
> completed, and thus the "push/jmp" is atomic wrt regular interrupts.
>
> It's not safe wrt NMI, of course, but since NMI won't be rescheduling,
> and since any SMP IPI won't be punching through that sequence anyway,
> it's still atomic wrt _another_ text_poke() attempt coming in and
> re-using the bp_call_return/target slots.

I'm less than 100% convinced about this argument. Sure, an NMI right
there won't cause a problem. But an NMI followed by an interrupt will
kill us if preemption is on. I can think of three solutions:

1. Assume that all CPUs (and all relevant hypervisors!) either mask
   NMIs in the STI shadow or return from NMIs with interrupts masked
   for one instruction. Ditto for MCE. This seems too optimistic.

2. Put a fixup in the NMI handler and MCE handler: if the return
   address is one of these magic jumps, clear IF and back up IP by one
   byte. This should *work*, but it's a bit ugly.

3. Use a different magic sequence:

        push %gs:bp_call_return
        int3

   and have the int3 handler adjust regs->ip and regs->flags as
   appropriate.

I think I like #3 the best, even though it's twice as slow.

FWIW, kernel shadow stack patches will show up eventually, and none of
these approaches are compatible as is. Perhaps the actual sequence
should be this, instead:

bp_int3_fixup_irqsoff:
        call 1f
1:
        int3

bp_int3_fixup_irqson:
        call 1f
1:
        int3

and the int3 handler will update the conventional return address *and*
the shadow return address. Linus, what do you think about this
variant?

Finally, with my maintainer hat on: if anyone actually wants to merge
this thing, I want to see a test case, exercised regularly (every boot
if configured, perhaps) that actually *runs* this code. Hitting it in
practice will be rare, and I want bugs to be caught.
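
For readers following along, solutions 2 and 3 above can be modeled in user space roughly as follows. This is only a sketch, not code from the patch: struct fake_regs, struct bp_slot, the STUB_* addresses and both model_* functions are hypothetical names invented for illustration. The only facts it relies on are that IF is bit 9 of EFLAGS, that sti and int3 are single-byte instructions, and that the ip saved by an int3 trap points just past the int3.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF (1UL << 9)        /* interrupt-enable flag, bit 9 of EFLAGS */

/* Hypothetical stub addresses, made up for this model. */
enum {
	STUB_IRQSOFF_INT3 = 0x1000,     /* int3 byte of the irqs-off stub (solution 3) */
	STUB_IRQSON_INT3  = 0x2000,     /* int3 byte of the irqs-on stub (solution 3)  */
	STUB_IRQSON_JMP   = 0x3001,     /* jmp right after a 1-byte sti (solution 2)   */
};

/* Stand-in for the saved register frame; not the kernel's struct pt_regs. */
struct fake_regs {
	uintptr_t ip;
	unsigned long flags;
};

/* Stand-in for the per-cpu slot filled in before the stub runs. */
struct bp_slot {
	uintptr_t call_target;
};

/*
 * Solution 3: the stub ends in int3 instead of sti+jmp. The handler
 * redirects ip to the call target and, for the irqs-on stub, sets IF in
 * the saved flags, so the return from the trap restores both together.
 */
static bool model_int3_fixup(struct fake_regs *regs, const struct bp_slot *slot)
{
	uintptr_t trap_addr = regs->ip - 1;     /* saved ip is one past the int3 */

	if (trap_addr == STUB_IRQSON_INT3)
		regs->flags |= X86_EFLAGS_IF;
	else if (trap_addr != STUB_IRQSOFF_INT3)
		return false;                   /* not one of our stubs */

	regs->ip = slot->call_target;
	return true;
}

/*
 * Solution 2: an NMI (or MCE) that interrupted the sti shadow returns to
 * the jmp with IF set. Clear IF and back up one byte so that sti runs
 * again and the shadow is re-established.
 */
static bool model_nmi_fixup(struct fake_regs *regs)
{
	if (regs->ip != STUB_IRQSON_JMP)
		return false;

	regs->flags &= ~X86_EFLAGS_IF;
	regs->ip -= 1;                          /* sti is a single byte */
	return true;
}
```

The model deliberately leaves out the stack: in both variants the return address has already been pushed by the stub before the trap or NMI arrives, so only ip and flags need adjusting.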