2014-04-01 15:34:47

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()

On 04/01, Masami Hiramatsu wrote:
>
> (2014/04/01 4:44), Oleg Nesterov wrote:
> > arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
> > but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
> > Move the callsite into "default" case and change the "0xff" case to
> > fall-through.
> >
> > We are going to add the various hooks to handle the rip-relative
> > jmp/call instructions (and more), we need this change to enforce the
> > fact that the new code can't conflict with is_riprel_insn() code.
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
>
> Hmm, this seems not obviously reasonable at this point.
> However, the code itself is not wrong. Could you merge
> this change to that new hooks?

Good point, I'll send v2.

I'd still prefer to do this in a separate patch if you do not object,
and this patch should come after "add uprobe_xol_ops". In this case
it should be more clear why do we need this change, and the changelog
can tell more. Say, it can mention that otherwise uprobe_abort_xol()
can be confused until at least we add uprobe_xol_ops->abort().

And probably I'll add another patch into this series, "restart if
->post_xol() fails" (see "TODO:" in 7/7). We need this change anyway
to emulate the "call" insn. We could fix this before we add the hooks,
but after 7/7 the change will be more simple/clear.

Thanks!

Oleg.


2014-04-01 17:40:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()

On 04/01, Oleg Nesterov wrote:
>
> Good point, I'll send v2.

Masami, et al, I was going to send v2 plus the next short RFC series
which fixes the problem today, but it turns out that I have no time.
Will try to do this tomorrow.

So let me explain the problem, and how (I think) it should be solved.
Unfortunately, I do not even know the terminology, so firstly I have
to explain you the things I recently learned when I investigated the
bug report ;)

Lets only discuss the "call" instruction (the one which starts with
"e8" byte), because (compared to "jmp") the fix is more complicated.
This instruction simply does

push rip
rip += offset_encoded_in_insn // ignoring the length of insn

To simplify, suppose that we probe such an insn at virtual address
0x1000 and offset_encoded_in_insn == 0x10.

When this task hits the probe, the kernel simply copies this (unmodified)
insn into xol area, and the task does single step. Lets suppose that
the virtual address of xol slot == 0x2000. This means that regs->ip
becomes 0x2000 + 0x10 = 0x2010, and we only need to adjust ->ip and
return adrress.

Note that the new ->ip == 0x2010 can point to nowhere, to the unmapped
area or it can point to the kernel memory. This still works because
the task doesn't even try to execute the next insn at this address.

But! this does NOT works if the new ->ip is non-canonical (not sure this
is the common term, I mean the hole starting from 0xffff800000000000,
which I didn't know about until recently ;). In this case CPU generates
#GP and do_general_protection() kills the task which tries to execute
this insn out-of-line.

The test-case I wrote to ensure:

void probe_func(void), callee(void);

int failed = 1;

asm (
".text\n"
".align 4096\n"
".globl probe_func\n"
"probe_func:\n"
"call callee\n"
"ret"
);

/*
* This assumes that:
*
* - &probe_func = 0x401000 + a_bit, aligned = 0x402000
*
* - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
* as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
*
* so we can target the non-canonical address from xol_vma using
* the simple math below, 100 * 4096 is just the random offset
*/
asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");

void callee(void)
{
failed = 0;
}

int main(void)
{
probe_func();
return failed;
}

It dies if you probe "probe_func" (although this test-case is not very
reliable, randomize_va_space/etc can change the placement of xol area).

Now let me describe how I am going to fix this. Please let me know if
you think this can't work or you see the better solution.

To simplify, lets ignore the fact we have lib/insn.c. The first change
will add

bool call_emulate_op(...)
{
// push return adress
if (put_user(regs->ip + 5, (long __user *)(regs->sp - 8)))
return false;

regs->sp -= 8;
regs->ip += 5 + *(s32*)(auprobe->insn + 1);
return true;
}

and change "case 0xe8" in arch_uprobe_analyze_insn() to setup
->ops = call_xol_ops.

But this is obviously incomplete because put_user() can fail. In this case
we could send a signal an restart, but this is not simple. We do not know
which signal (say, SIGSEGV or SIGBUS ?), we do not know what should we put
into siginfo, etc. And we do not want to emulate __do_page_fault ;)

So the 2nd patch will do the following:

1. Add "long call_offset" into "struct arch_uprobe". (yes, we
should also add a "union" into arch_uprobe, but lets ignore
the details).

2. change "case 0xe8" in arch_uprobe_analyze_insn() to also do

s32 *offset = (void*)(auprobe->insn + 1);

arch_uprobe->call_offset = *offset;
/*
* Turn this insn into
*
* 1: call 1b;
*
* This is only needed to expand the stack if emulate
* fails. We do not turn it into, say, "pushf" because
* we do not want to change the 1st byte used by
* set_orig_insn().
*
*offset = -5;

3. Change call_emulate_op() to use arch_uprobe->call_offset

4. Add

//
// In practice, this will be almost never called. put_user()
// in call_emulate_op() failed, single-step can only succeed
// if another thread expanded our stack.
//
int call_post_xol_op(...)
{
regs->sp += 8;
// tell the caller to restart
return -EAGAIN;
}

Once again, if this can work we need more changes to handle jmp's/etc. But
lets discuss this later. I am thinking in horror about conditional jmp ;)
In fact this should be simple, just I do not know (yet) to parse such an
insn, and I simply do not know if lib/insn.c can help me to figure out which
flag in regs->flags ->emulate() should check.

So this all looks a bit complicated, but I do not see a simpler solution.
Well, we could avoid ->emulate() altogether, we could just mangle the
problematic instructions and complicate arch_uprobe_post_xol(), but I do
not think this would be better. And if nothing else, ->emulate is a) simple
and b) should likely succeed and make the probing faster.

But perhaps I missed something?

Oleg.

2014-04-02 17:57:51

by Jim Keniston

[permalink] [raw]
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()

On Tue, 2014-04-01 at 18:39 +0200, Oleg Nesterov wrote:
> On 04/01, Oleg Nesterov wrote:
> >
> > Good point, I'll send v2.
>
> Masami, et al, I was going to send v2 plus the next short RFC series
> which fixes the problem today, but it turns out that I have no time.
> Will try to do this tomorrow.
>
> So let me explain the problem, and how (I think) it should be solved.
> Unfortunately, I do not even know the terminology, so firstly I have
> to explain you the things I recently learned when I investigated the
> bug report ;)
>
[problem description and proposed solution snipped]

Thanks for your work on this. I think your analysis is correct. As you
say, emulating calls is tricky because of the possibility that the call
will incur a page fault when it grows the stack. Your best solution
might be to emulate jumps, but rewrite call instructions using a scratch
register, similar to how we handle rip-relative instructions.

>
> Once again, if this can work we need more changes to handle jmp's/etc. But
> lets discuss this later. I am thinking in horror about conditional jmp ;)
> In fact this should be simple, just I do not know (yet) to parse such an
> insn, and I simply do not know if lib/insn.c can help me to figure out which
> flag in regs->flags ->emulate() should check.

Emulating jumps (including conditional jumps) shouldn't be all that much
code. In case you haven't already found it, the "AMD64 Architecture
Programmer's Manual, Volume 3" provides the sort of info you need.
Looking at the "Jcc" section, I see dozens of names of conditional-jump
instructions; but they're all aliases for the same 16 variations, which
branch based on various combinations of 5 flags in regs->eflags (OF, CF,
ZF, SF, PF).

One thing about emulating jumps is that if the task has block stepping
enabled, then a trap is expected on every successful branch. I'd have
to poke around a while to figure out how to know whether block stepping
is enabled. set_task_blockstep() should point the way.

>
> So this all looks a bit complicated, but I do not see a simpler solution.
> Well, we could avoid ->emulate() altogether, we could just mangle the
> problematic instructions and complicate arch_uprobe_post_xol(), but I do
> not think this would be better. And if nothing else, ->emulate is a) simple
> and b) should likely succeed and make the probing faster.
>
> But perhaps I missed something?
>
> Oleg.
>

Jim Keniston
IBM Linux Technology Center

2014-04-02 19:14:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel_insn()

On 04/02, Jim Keniston wrote:
>
> On Tue, 2014-04-01 at 18:39 +0200, Oleg Nesterov wrote:
>
> > So let me explain the problem, and how (I think) it should be solved.
> > Unfortunately, I do not even know the terminology, so firstly I have
> > to explain you the things I recently learned when I investigated the
> > bug report ;)
> >
> [problem description and proposed solution snipped]
>
> Thanks for your work on this. I think your analysis is correct.

Great, thanks!

> As you
> say, emulating calls is tricky because of the possibility that the call
> will incur a page fault when it grows the stack. Your best solution
> might be to emulate jumps,

Yes,

> but rewrite call instructions using a scratch
> register, similar to how we handle rip-relative instructions.

Yes, this is what I meant when I said that we can avoid ->emulate in
this case, mangle insn, and complicate post_xol(). But so far I do not
think this would be better.

OK. Let me actually finish amd send the fixes, then we can discuss this
again and see if another approach makes more sense.

Sorry, I was distracted again, so I need more time. Will try to send tomorrow.

> > Once again, if this can work we need more changes to handle jmp's/etc. But
> > lets discuss this later. I am thinking in horror about conditional jmp ;)
> > In fact this should be simple, just I do not know (yet) to parse such an
> > insn, and I simply do not know if lib/insn.c can help me to figure out which
> > flag in regs->flags ->emulate() should check.
>
> Emulating jumps (including conditional jumps) shouldn't be all that much
> code. In case you haven't already found it, the "AMD64 Architecture
> Programmer's Manual, Volume 3" provides the sort of info you need.

Thanks. I'll try to read it, but most probably I'll come here with the
stupid questions anyway.

> One thing about emulating jumps is that if the task has block stepping
> enabled, then a trap is expected on every successful branch.

Yes, but probably we can do this later. Note that uprobes doesn't play
nice with TIF_BLOCKSTEP anyway, see the comment in arch_uprobe_post_xol:

/*
* arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
* so we can get an extra SIGTRAP if we do not clear TF. We need
* to examine the opcode to make it right.
*/

So I think that at least the initial version can safely ignore this problem.

Oleg.