Setting a hardware breakpoint at the

rex64 sysret

instruction at the end of int_ret_from_sys_call causes the system to
triple fault and reboot when the breakpoint is triggered. This appears
to be related to the same problem as the lockup.

This function can be stepped over and traced through with the TRAP
FLAG set so long as a hardware breakpoint is set somewhere in the
function. Otherwise, upon exit, the system hard hangs. If you break
exactly on that instruction, the system reboots. If you break a few
instructions before it and single-step through the call, it works. If
you step through the call with no breakpoint, the system hard hangs.
This is the same behavior as when you try to step from inside an NMI
handler, so the two look related.

It is caused somewhere in the way the exception handlers are coded,
for sure.
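
For readers less familiar with the two mechanisms mentioned above, here
is a minimal sketch of the bits involved: an execution breakpoint is
armed through DR7 (with the address in one of DR0-DR3), and single
stepping uses the trap flag in RFLAGS. The bit positions are the
architectural ones; the helper names are made up for illustration and
are not taken from mdb or the kernel.

```python
# Toy encoding of x86 debug state. Helper names are hypothetical.

RFLAGS_TF = 1 << 8  # trap flag: the CPU raises #DB after each instruction


def dr7_enable_exec_breakpoint(dr7: int, slot: int) -> int:
    """Return dr7 with an execution breakpoint enabled for DR<slot>."""
    if slot not in range(4):
        raise ValueError("x86 has four breakpoint address registers, DR0-DR3")
    shift = 16 + 4 * slot
    # R/W = 00 selects "break on instruction execution"; LEN must then be 00.
    dr7 &= ~(0xF << shift)
    # G<slot> (global enable) sits at bit 2*slot + 1; L<slot> is one bit lower.
    dr7 |= 1 << (2 * slot + 1)
    return dr7


def set_trap_flag(rflags: int) -> int:
    """Return rflags with TF set, i.e. single-step mode requested."""
    return rflags | RFLAGS_TF


if __name__ == "__main__":
    # Address from this thread; it differs from build to build.
    dr0 = 0xFFFFFFFF81673AE1
    print(hex(dr0), hex(dr7_enable_exec_breakpoint(0, 0)))
```

The point of showing both is that they are independent triggers for the
same #DB exception, which is why "breakpoint set" and "TRAP FLAG set"
can be combined in the ways described above.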
On Dec 16, 2015 3:12 PM, "Jeff Merkey" <[email protected]> wrote:
>
> Setting a hardware breakpoint at the
>
> rex64 sysret
>
> instruction at the end of int_ret_from_sys_call causes the system to
> triple fault and reboot when the breakpoint is triggered. This appears
> to be related to the same problem as the lockup.
>
> This function can be stepped over and traced through with the TRAP
> FLAG set so long as a hardware breakpoint is set somewhere in the
> function. Otherwise, upon exit, the system hard hangs. If you break
> exactly on that instruction, the system reboots. If you break a few
> instructions before it and single-step through the call, it works. If
> you step through the call with no breakpoint, the system hard hangs.
> This is the same behavior as when you try to step from inside an NMI
> handler, so the two look related.
>
You're probably encountering the user mode RSP when SYSRET happens.
--Andy
On 12/16/15, Andy Lutomirski <[email protected]> wrote:
>
> You're probably encountering the user mode RSP when SYSRET happens.
>
> --Andy
>
Hi Andy,

Could be, but I am getting a double fault message with an error code
of 0 that then scrolls off the screen when the triple fault hits. It
flashes too quickly to get the function address. I wish I had a logic
analyzer with an inverse assembler; I would already be there. A
user-mode RSP would, I assume, clear the TRAP flag, and that does not
explain why it works if I set a breakpoint right above the instruction
and then step over it, which I can do without the triple fault.

It is easy to reproduce:

1. Download the mdb debugger for 4.3.3 and apply it to 4.4-rc5.
2. modprobe mdb
3. echo a > /proc/sysrq-trigger
4. u int_ret_from_syscall, then scroll until you get to the swapgs
   followed by rex64 sysret.
5. Set a hardware breakpoint at the address of the swapgs instruction,
   e.g. b ffffffff81673ae1 (or whatever address it is at), then step
   through with t a few times. It should just return after rex64
   sysret, since that returns to user space.
6. Then set a breakpoint at the rex64 sysret instruction with
   b <address>, let it break at the instruction, then hit g for go
   and watch the fireworks: it will try to print a double fault
   message, then reboot.

I handle the whole user RSP thing: I just return if I see regs set to
user space. This looks like some sort of problem in the exception
handlers.

Jeff
On Wed, Dec 16, 2015 at 4:31 PM, Jeff Merkey <[email protected]> wrote:
> I handle the whole user RSP thing, I just return if I see regs set to
> user space. This looks like some sort of problem in the exception
> handlers.
It's kernel regs but user RSP.
--Andy
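
To make the distinction concrete, here is a small sketch of the state
Andy is describing: the saved CS says "kernel", but the saved RSP
already holds a user-space address. The selector values and the stack
pointer below are illustrative assumptions (typical x86_64 Linux
values, not read from this system), and the function names are
hypothetical.

```python
# Toy classification of a saved register frame. All names and sample
# values here are assumptions for illustration only.

USER_ADDR_END = 1 << 47  # end of the lower canonical half with 4-level paging


def cs_is_user(cs: int) -> bool:
    """A nonzero privilege level in the low two CS bits means user mode."""
    return (cs & 3) != 0


def addr_is_user(addr: int) -> bool:
    """True if the address lies in the user half of the address space."""
    return addr < USER_ADDR_END


def classify(cs: int, rsp: int) -> str:
    if cs_is_user(cs):
        return "user"
    if addr_is_user(rsp):
        return "kernel code, user stack pointer"
    return "kernel"


if __name__ == "__main__":
    kernel_cs, user_cs = 0x10, 0x33      # assumed selector values
    user_rsp = 0x7FFD12345678            # made-up user stack address
    print(classify(user_cs, user_rsp))
    print(classify(kernel_cs, user_rsp))
```

A check that looks only at CS (or only at "are the regs user space")
puts the second case in the kernel bucket, which may be why the two
sides of this exchange talk past each other.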
On 12/16/15, Jeff Merkey <[email protected]> wrote:
> Easy to reproduce, download the mdb debugger for 4.3.3 and apply it to
> 4.4-rc5, modprobe mdb, echo a > /proc/sysrq-trigger,

u int_ret_from_sys_call

Correction, sorry: the symbol is int_ret_from_sys_call, not
int_ret_from_syscall.
On 12/16/15, Andy Lutomirski <[email protected]> wrote:
> It's kernel regs but user RSP.
>
> --Andy
>
Right, I handle that case, and I have handled it since about 2001.
Before all the changes, I used to be able to just step from user space
into kernel space with mdb. I have not been able to do that for a
while, since Linus fixed the VM in about 2002.

So I handle that case.

Jeff
On 12/16/15, Jeff Merkey <[email protected]> wrote:
> So I handle that case.
>
> Jeff
>
It looks like this bug is the result of an architectural decision, and
I don't think there is anything I can do about it without a very
large, very ugly patch that alters the architecture of Linux. Linux
has loaded an MSR value into the processor and called swapgs; it then
gets a breakpoint exception, the MSR gets changed again and swapped
somewhere else, and then it hits the next instruction. The triple
fault is a GP, SS, and UD.

This is a case where Linux was not designed for a debugger, and fixing
it is a BIG job. It will require lots of changes in places we probably
shouldn't be changing, including all exception handlers, and possibly
removal of the swapgs instruction. This one I will document as a
known limitation of Linux and move on.

There will be no patch unless someone asks me to try to fix this.
Bottom line: Linux is debugger-hostile and was not designed for one.
The tools that exist will have problems debugging on Linux until Linus
decides Linux will become a more debugger-friendly place. I've
written several commercial operating systems in my 35 years of
programming, and the first item I always write, before a kernel,
drivers, or anything else, is a debugger. The OS is then built on top
of it.

Linus read a book and decided to write an OS, and his system reflects
that: no thought of debuggers, and his development process operates a
lot like a public library. It's not all bad; look how far he got.

This bug is closed since I know what it is. The probability of this
occurring during normal operations is very low unless you debug and
break between a swapgs instruction and a rex64 sysret, or set a
breakpoint anywhere near this instruction.

Linux Documentation
https://www.kernel.org/doc/Documentation/x86/entry_64.txt
"... Dealing with the swapgs instruction is especially tricky. Swapgs
toggles whether gs is the kernel gs or the user gs. The swapgs
instruction is rather fragile: it must nest perfectly and only in
single depth, it should only be used if entering from user mode to
kernel mode and then when returning to user-space, and precisely
so. If we mess that up even slightly, we crash.
So when we have a secondary entry, already in kernel mode, we *must
not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
not switched/swapped yet. ..."
:-)
Jeff
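
The nesting rule quoted above can be illustrated with a toy model. This
is only a sketch of the idea, not the kernel's entry code: swapgs is
modeled as exchanging the active GS base with a parked one, and two
possible entry-time checks are compared in the window between the
exit-path swapgs and the sysret. The base addresses, the CS value, and
all names are assumptions made up for the example; the "is the GS base
a kernel address" test follows my reading of entry_64.txt.

```python
# Toy model of the swapgs toggle. Everything here is hypothetical.

USER_GS = 0x00007F0000000000      # made-up user GS base
KERNEL_GS = 0xFFFF880000000000    # made-up kernel GS base
KERNEL_CS = 0x10                  # assumed kernel code selector


class ToyCpu:
    def __init__(self, active_gs: int, parked_gs: int):
        self.gs_base = active_gs
        self.parked_gs_base = parked_gs

    def swapgs(self) -> None:
        """Exchange the active GS base with the parked one."""
        self.gs_base, self.parked_gs_base = self.parked_gs_base, self.gs_base


def cs_check_wants_swapgs(saved_cs: int) -> bool:
    """Cheap check: only swap if the interrupted code was in user mode."""
    return (saved_cs & 3) != 0


def gs_base_check_wants_swapgs(cpu: ToyCpu) -> bool:
    """Slower check: swap if the active base is not a kernel address."""
    return cpu.gs_base < (1 << 63)


def state_just_before_sysret() -> ToyCpu:
    cpu = ToyCpu(active_gs=USER_GS, parked_gs=KERNEL_GS)
    cpu.swapgs()   # syscall entry: kernel GS becomes active
    cpu.swapgs()   # exit path: user GS active again, CS still kernel
    return cpu


if __name__ == "__main__":
    cpu = state_just_before_sysret()
    print("CS check says swap:", cs_check_wants_swapgs(KERNEL_CS))
    print("GS base check says swap:", gs_base_check_wants_swapgs(cpu))
```

In this model the two checks disagree exactly in the window between
swapgs and sysret, which is the window this thread is about. Whether
that disagreement is the actual cause of the triple fault is not
established here.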
On 12/17/15, Jeff Merkey <[email protected]> wrote:
> This is a case where Linux was not designed for a debugger, and fixing
> it is a BIG job. It will require lots of changes in places we probably
> shouldn't be changing, including all exception handlers, and possibly
> removal of the swapgs instruction. This one I will document as a
> known limitation of Linux and move on.
Added to the MDB website and project pages to explain this problem.

Limitations of Linux with Kernel Debuggers

Linux was not architected to support kernel debuggers, and several
areas of Linux are blacked out to kernel debuggers because of how
Linux is designed. Linux uses the swapgs instruction in x86_64 mode to
swap gs frames on transitions between user space and kernel space.
You can set breakpoints on and around a swapgs instruction; however,
because of how the instruction works, the system may crash if you
attempt to step between user space and kernel space after it has
executed, up to the instruction that performs a sysret. This is a very
rare case that typically will not be encountered, but don't try to
step over a section of code with a swapgs instruction that
subsequently performs some sort of system return. On Linux you will
see something like this in the disassembly:

swapgs
rex64 sysret

Don't try to step in between these two instructions. It's safe to do
so after the sysret executes, but not between them.

Debugging NMI handlers in Linux can be done, but the system may not be
recoverable after you have debugged these sections of code, due to how
Linux designed its NMI callbacks.

If you want to debug Linux without many of these limitations, compile
MDB in Direct Mode. Direct Mode allows MDB to take control of the
debugger hardware from the operating system; this removes many of the
blacked-out areas of the operating system and allows you to debug
them. Direct Mode will not help you with the swapgs instruction
problem, however.
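
As a rough aid for following the advice above, the sketch below scans a
straight-line disassembly listing and reports the addresses that fall
in the window after a swapgs up to and including the following sysret.
It is a hypothetical helper, not part of MDB; the listing format
(address, text) is an assumption, and it does not follow branches.

```python
# Hypothetical helper: flag addresses between swapgs and sysret.

def unsafe_breakpoint_addresses(listing):
    """Return addresses after a swapgs, up to and including the sysret.

    listing is a sequence of (address, instruction_text) pairs in
    execution order. A swapgs that is never followed by a sysret in the
    listing contributes nothing.
    """
    unsafe = []
    pending = None  # addresses seen since the last swapgs, if any
    for addr, text in listing:
        words = text.lower().split()
        if pending is not None:
            pending.append(addr)
            # Matches "sysret", "sysretq", and "rex64 sysret".
            if any(word.startswith("sysret") for word in words):
                unsafe.extend(pending)
                pending = None
        elif "swapgs" in words:
            pending = []
    return unsafe


if __name__ == "__main__":
    # Addresses are illustrative; only the first comes from this thread.
    listing = [
        (0xFFFFFFFF81673AE1, "swapgs"),
        (0xFFFFFFFF81673AE4, "rex64 sysret"),
    ]
    for addr in unsafe_breakpoint_addresses(listing):
        print(hex(addr))
```

The swapgs address itself is not reported, matching the statement that
breakpoints on the swapgs instruction are allowed; stepping onward from
it is what lands in the flagged window.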