MIME-Version: 1.0
In-Reply-To: <20150817043603.GB9387@nazgul.tnic>
References: <20150816222956.GA14290@krava.brq.redhat.com> <20150817043603.GB9387@nazgul.tnic>
From: Andy Lutomirski <luto@amacapital.net>
Date: Mon, 17 Aug 2015 09:06:59 -0700
Message-ID: <CALCETrXADuwEi-=kX0GyNt=h2RPCcVRtOtkOao-KBSdy11xr8g@mail.gmail.com>
Subject: Re: [BUG/RFC] perf test fails on AMD CPUs
To: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>, Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Ingo Molnar <mingo@redhat.com>, Robert Richter <rric@kernel.org>,
        "H. Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Namhyung Kim <namhyung@kernel.org>, Jan Stancek <jstancek@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4267
Lines: 105

On Sun, Aug 16, 2015 at 9:36 PM, Borislav Petkov <bp@suse.de> wrote:
> On Mon, Aug 17, 2015 at 12:29:56AM +0200, Jiri Olsa wrote:
>> hi,
>> 'perf test 18' is failing on systems with AMD processor.
>
> Hmm, still using that b0rked test box? :-)
>
> Also, which kernel?
>
> There have been substantial changes to the entry code recently. Although
> I don't see anything being done differently on AMD there except
> X86_BUG_SYSRET_SS_ATTRS but that should be unrelated.
>
>> The only reason I could find is that AMD does not set 'resume flag'
>> in RFLAGS register the way the Intel CPU does.
>>
>> (simplified) test scenario:
>>
>>   - create breakpoint (on test_function) perf event with SIGIO signal
>>     to be delivered any time the breakpoint is hit
>>   - run test_function
>>
>>
>> expected course of actions is:
>>   1) CPU hits 'test_function'
>>   2) DB exception is triggered, with RFLAGS.RF=0
>>   3) DB exception handler sets regs->RFLAGS.RF=1 and perf handler
>>      triggers irq_work pending work
>>   4) DB exception executes iretd
>>   5) irq_work interrupt is triggered, with RFLAGS.RF=1
>>   6) irq_work interrupt calls kill_fasync with SIGIO signal
>>   7) irq_work interrupt on return to userspace calls prepare_exit_to_usermode
>>      which actually delivers the SIGIO signal
>>   8) sigreturn syscall prepares registers to return to the
>>      instruction from step 1) and sets RFLAGS.RF to its original
>>      value from step 5) (RFLAGS.RF=1)
>>   9) CPU hits 'test_function' and DB exception is NOT triggered
>>      due to RFLAGS.RF=1
>>
>> this is how I see it works on Intel
>>
>> But AMD gives me RFLAGS.RF=0 on step 5, which makes step 9
>> trigger the DB exception once again and makes the test fail.
>
> Adding Andy, he might have an idea. Leaving in the rest for reference.

Gee thanks :-p

Jiri, did you instrument the code and observe that do_IRQ sees RF clear
in its pt_regs?  Also, it might be worth checking that regs->ip in the
irq_work matches regs->ip from the #DB handler.

It's *possible* that I messed up and broke RF restore with
opportunistic sysret, but the code looks correct:

        testq   $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
        jnz     opportunistic_sysret_failed
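
For anyone following along, here is roughly what that pair of
instructions decides, as a minimal user-space sketch in C. This is not
the kernel code: the EX_* macro names and both function names are made
up for illustration, and only the RF/TF part of the decision is
modeled. The bit positions (TF is bit 8, RF is bit 16) are the
architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits; the EX_ names are illustrative only. */
#define EX_EFLAGS_TF (UINT64_C(1) << 8)   /* trap flag */
#define EX_EFLAGS_RF (UINT64_C(1) << 16)  /* resume flag */

/*
 * Hypothetical helper mirroring the testq/jnz pair: %r11 holds the
 * saved user RFLAGS, and the fast SYSRET path is skipped whenever RF
 * or TF is set in it.
 */
bool sysret_allowed_by_flags(uint64_t saved_rflags)
{
    return (saved_rflags & (EX_EFLAGS_RF | EX_EFLAGS_TF)) == 0;
}

/* Usage: 0x202 is a typical user RFLAGS value (IF plus the fixed bit 1). */
void example_sysret_checks(void)
{
    assert(sysret_allowed_by_flags(0x202));
    assert(!sysret_allowed_by_flags(0x202 | EX_EFLAGS_RF));
    assert(!sysret_allowed_by_flags(0x202 | EX_EFLAGS_TF));
}
```

So if regs->flags has RF set on the way out, this check should send us
down the opportunistic_sysret_failed path rather than through SYSRET.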


>
>> I'm not sure this test ever worked on AMD CPUs, anyway is there
>> anything I'm missing or is this some AMD/Intel quirk?
>>
>> thanks,
>> jirka
>>
>>
>>
>> AMD description of RF flag (SDM 3.1.6):
>> =======================================
>> Resume Flag (RF) Bit. Bit 16. The RF bit allows an instruction to be restarted following an
>> instruction breakpoint resulting in a debug exception (#DB). This bit prevents multiple debug
>> exceptions from occurring on the same instruction.
>> The processor clears the RF bit after every instruction is successfully executed, except when the
>> instruction is:
>> • An IRET that sets the RF bit.
>> • JMP, CALL, or INTn through a task gate.
>> In both of the above cases, RF is not cleared to 0 until the next instruction successfully executes.
>> When an exception occurs (or when a string instruction is interrupted), the processor normally sets
>> RF=1 in the RFLAGS image saved on the interrupt stack. However, when a #DB exception occurs as a
>> result of an instruction breakpoint, the processor clears the RF bit to 0 in the interrupt-stack RFLAGS
>> image.

That's a little weird, I think.  Shouldn't RF be zero on #DB due to a
*watchpoint* so that a watchpoint followed immediately by a breakpoint
works?

>> • For other cases, the value pushed for RF is the value that was in EFLAG.RF at the time the event handler was
>> called. This includes:
>> — Debug exceptions generated in response to instruction breakpoints
>> — Hardware-generated interrupts arriving between instructions (including those arriving after the last
>> iteration of a repeated string instruction)

This appears to be why it works on Intel.  Does AMD not do that?  We
could probably work around this in software (by not using irq work for
this), but yuck.
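
To spell out why the RF value saved at step 5 matters, here is a toy
model of Jiri's nine steps in C. It is only a sketch under stated
assumptions: the enum and function names are hypothetical, nothing here
touches real hardware, and the "clears RF" case encodes what Jiri
reports seeing on AMD, not anything I have confirmed from the manual.

```c
#include <assert.h>
#include <stdint.h>

#define EX_EFLAGS_RF (UINT64_C(1) << 16)  /* resume flag, RFLAGS bit 16 */

/*
 * Hypothetical knob: when an interrupt arrives right after an IRET that
 * set RF, does its stack frame keep RF (the quoted Intel text) or hold
 * RF=0 (Jiri's observation on AMD)?
 */
enum irq_frame_rf { IRQ_FRAME_KEEPS_RF, IRQ_FRAME_CLEARS_RF };

/* Counts #DB hits on the breakpointed instruction, at most one retrigger. */
int count_db_hits(enum irq_frame_rf behavior)
{
    int hits = 0;
    uint64_t rflags = 0x202;

    /* Steps 1-2: the instruction breakpoint fires with RF=0. */
    hits++;
    /* Steps 3-4: the #DB handler sets RF in regs and returns via IRET. */
    rflags |= EX_EFLAGS_RF;
    /* Step 5: the irq_work interrupt arrives before the instruction runs. */
    uint64_t irq_frame = rflags;
    if (behavior == IRQ_FRAME_CLEARS_RF)
        irq_frame &= ~EX_EFLAGS_RF;
    /* Steps 6-8: signal delivery, then sigreturn restores the frame's flags. */
    rflags = irq_frame;
    /* Step 9: the breakpoint fires again unless RF survived. */
    if (!(rflags & EX_EFLAGS_RF))
        hits++;
    return hits;
}

/* Usage: one hit if the frame keeps RF, two if it does not. */
void example_model_checks(void)
{
    assert(count_db_hits(IRQ_FRAME_KEEPS_RF) == 1);
    assert(count_db_hits(IRQ_FRAME_CLEARS_RF) == 2);
}
```

The second case is the double #DB that makes the test fail.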

--Andy