Date:   Wed, 15 Feb 2023 15:16:37 -0800
From:   Josh Poimboeuf <jpoimboe@kernel.org>
To:     Peter Zijlstra <peterz@infradead.org>
Cc:     Masami Hiramatsu <mhiramat@kernel.org>, x86@kernel.org,
        linux-kernel@vger.kernel.org,
        Chen Zhongjin <chenzhongjin@huawei.com>,
        "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
        Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
        "David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH 2/2] x86/entry: Fix unwinding from kprobe on PUSH/POP
 instruction
Message-ID: <20230215231637.laryjsua5p4wcd57@treble>
References: <cover.1676068346.git.jpoimboe@kernel.org>
 <baafcd3cc1abb14cb757fe081fa696012a5265ee.1676068346.git.jpoimboe@kernel.org>
 <20230213234357.1fe194b2767d9bc431202d4c@kernel.org>
 <Y+tx6DZyoQ362lUM@hirez.programming.kicks-ass.net>
 <20230214170552.glhdytvunczyxxao@treble>
 <Y+yzMmL7gUprDru3@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <Y+yzMmL7gUprDru3@hirez.programming.kicks-ass.net>
Precedence: bulk

On Wed, Feb 15, 2023 at 11:25:54AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 14, 2023 at 09:05:52AM -0800, Josh Poimboeuf wrote:
> > On Tue, Feb 14, 2023 at 12:35:04PM +0100, Peter Zijlstra wrote:
> > > On Mon, Feb 13, 2023 at 11:43:57PM +0900, Masami Hiramatsu wrote:
> > > 
> > > > > Fix it by annotating the #BP exception as a non-signal stack frame,
> > > > > which tells the ORC unwinder to decrement the instruction pointer before
> > > > > looking up the corresponding ORC entry.
> > > > 
> > > > > Just to make it clear, this sounds like a 'hack' use of the non-signal
> > > > > stack frame. If so, can we change the flag name to 'literal' or
> > > > > 'non-literal' etc? I'm concerned that the 'signal' flag will be used
> > > > > differently in the future.
> > 
> > Agreed, though I'm having trouble coming up with a succinct yet
> > scrutable name.  If length wasn't an issue it would be something like
> > 
> >   "decrement_return_address_when_looking_up_the_next_orc_entry"
> > 
> > > Oooh, bike-shed :-) Let me suggest trap=1, where a trap is a fault with
> > > a different return address, specifically the instruction after the
> > > faulting instruction.
> > 
> > I think "trap" doesn't work because
> > 
> >  1) It's more than just traps, it's also function calls.  We have
> >     traps/calls in one bucket (decrement IP); and everything else
> >     (faults, aborts, irqs) in the other (don't decrement IP).
> > 
> >  2) It's not necessarily all traps which need the flag, just those that
> >     affect a previously-but-now-overwritten stack-modifying instruction.
> >     So #OF (which we don't use?) and trap-class #DB don't seem to be
> >     affected.  In practice maybe this distinction doesn't matter, but
> >     for example there's no reason for ORC to try to distinguish trap
> >     #DB from non-trap #DB at runtime.
> 
> Well, I was specifically thinking about #DB, why don't we need to
> decrement when we put a hardware breakpoint on a stack modifying op?

I assume you mean the INT1 instruction.  Yeah, maybe we should care
about that.

I'm struggling to come up with any decent ideas about how to implement
that.  Presumably the #DB handler would have to communicate to the
unwinder somehow whether the given frame is a trap.

Alternatively I was thinking the unwinder could read the instruction,
but then it doesn't know whether to read the one at regs->ip or the one
before it.
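
FWIW, here's a rough userspace sketch of the two buckets from point 1)
above, just to make the rule concrete.  The enum and the function names
are made up for illustration; this is not actual unwinder code, and it
assumes the frame kind is already known, which is exactly the part
that's unsolved for #DB:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame kinds, for illustration only. */
enum frame_kind {
	FRAME_CALL,	/* saved IP is the return address after a CALL */
	FRAME_TRAP,	/* e.g. #BP: saved IP is after the trapping insn */
	FRAME_FAULT,	/* saved IP is the faulting insn itself */
	FRAME_IRQ,	/* saved IP is the next insn to be executed */
};

/* Calls and traps go in one bucket; faults, aborts and irqs in the other. */
static bool frame_needs_ip_decrement(enum frame_kind kind)
{
	return kind == FRAME_CALL || kind == FRAME_TRAP;
}

/*
 * Address to use when looking up the ORC entry for a frame.  For calls
 * and traps the saved IP may already belong to the next instruction
 * (with different stack state), so back up one byte to land inside the
 * instruction that actually created the frame.
 */
static uint64_t orc_lookup_ip(uint64_t saved_ip, enum frame_kind kind)
{
	return frame_needs_ip_decrement(kind) ? saved_ip - 1 : saved_ip;
}
```

With that, a #BP frame with a saved IP of 0x1001 would be looked up at
0x1000, while a fault or irq frame at the same IP would be looked up at
0x1001 unchanged.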

-- 
Josh