Hi Peter,
I have emails from you dating from a few years back unofficially stating
that it's OK to update the first byte of an instruction with a single-byte
int3 concurrently:
https://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
It is referred to in the original implementation of text_poke_bp():
commit fd4363fff3d9 ("x86: Introduce int3 (breakpoint)-based instruction patching")
Olivier Dion is working on the libpatch [1,2] project aiming to use this
property for low-latency/low-overhead live code patching in user-space as
well, but we cannot find an official statement from Intel that guarantees
this breakpoint-bypass technique is indeed OK without stopping the world
while patching.
Do you know where I could find an official statement of this guarantee?
Thanks!
Mathieu
[1] https://www.dorsal.polymtl.ca/files/may2022/odion_may2022_libpatch_binary_patcher.pdf
[2] https://git.sr.ht/~old/libpatch
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
On Tue, 21 Feb 2023 11:44:42 -0500
Mathieu Desnoyers <[email protected]> wrote:
> [...]
> Do you know where I could find an official statement of this guarantee?
>
The fact that we have been using it for over 10 years without issue should
be a good guarantee ;-)
I know you probably prefer an official statement, and I thought they
actually gave one, but I can't seem to find it. Anyway, how does the dynamic
linker do this? Doesn't it update code on the fly as well?
-- Steve
On 2023-02-21 12:50, Steven Rostedt wrote:
> [...]
> The fact that we have been using it for over 10 years without issue should
> be a good guarantee ;-)
>
> I know you probably prefer an official statement, and I thought they
> actually gave one, but can't seem to find it.
I recall an in-person discussion with Peter Anvin shortly after he got
the official confirmation, but I cannot find any public trace of it. I
suspect Intel may have documented this internally only.
> Anyway, how does the dynamic linker do this? Doesn't it update code on
> the fly as well?
The dynamic linker is similar to the module loader in the kernel: the
code modification is done before the loaded code is ever executed, and
is therefore inherently safe with respect to cross-modification of
concurrently executing code.
Thanks,
Mathieu
On Tue, Feb 21, 2023 at 01:42:58PM -0500, Mathieu Desnoyers wrote:
> [...]
> I recall an in-person discussion with Peter Anvin shortly after he got the
> official confirmation, but I cannot find any public trace of it. I suspect
> Intel may have documented this internally only.
My 2ct, ISTR this also having been vetted by AMD, perhaps they did
manage to write it down somewhere.
On 2023-02-22 04:20, Peter Zijlstra wrote:
> [...]
> My 2ct, ISTR this also having been vetted by AMD, perhaps they did
> manage to write it down somewhere.
Good point! I did not find a statement specifically about the breakpoint
bypass, but by piecing together the explanations in their manual, I
think we can conclude that it is safe:
Based on AMD64 Architecture Programmer’s Manual Volume 2
7.6.1 Cache Organization and Operation
Cross-Modifying Code
The subsection "Asynchronous modification" describes in detail what
happens when an instruction is modified while another thread is
executing it. The good news is that there is no mention of an evil
bogeyman triggering any kind of general protection fault when updating
instructions concurrently with their execution. So inserting a
single-byte breakpoint as the first byte of an instruction is just the
simplest scenario covered by that section:
"Such modifications must be done via a single store to the target
thread's instruction stream that is contained entirely within a
naturally-aligned quadword, and is subject to the constraints given
here. A key aspect is that, although the store is performed atomically,
the affected quadword may be read more than once in the process of
extracting instruction bytes from it. This can result in the following
scenarios resulting from a single store:
[...]
2. A modification to one instruction A that changes it to two
instructions A'-B will only result in execution of A'-B.
[...]"
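To make the quoted constraint concrete, here is a minimal C sketch of the
"single store contained entirely within a naturally-aligned quadword"
check. The helper names fits_in_aligned_quadword() and
quadword_selfcheck() are hypothetical: they come neither from the AMD
manual nor from libpatch, and only illustrate the address arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUADWORD_SIZE 8u /* bytes in a naturally-aligned quadword */

/*
 * Hypothetical helper: return true if the byte range [addr, addr + len)
 * lies entirely inside one naturally-aligned quadword, i.e. if it could
 * be written with a single store as the quoted paragraph requires.
 * Address wrap-around at the top of the address space is not handled.
 */
static bool fits_in_aligned_quadword(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(QUADWORD_SIZE - 1);

    if (len == 0 || len > QUADWORD_SIZE)
        return false;
    /* First and last byte must share the same aligned 8-byte block. */
    return (addr & mask) == ((addr + len - 1) & mask);
}

/* Sanity checks that double as usage examples. */
static void quadword_selfcheck(void)
{
    /* A one-byte store (e.g. int3) fits wherever it is placed. */
    assert(fits_in_aligned_quadword(0x1007, 1));
    /* A two-byte store starting on the last byte of a quadword does not. */
    assert(!fits_in_aligned_quadword(0x1007, 2));
}
```

A one-byte store always passes this check regardless of its address,
which is why the single-byte breakpoint insertion is the simplest case.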
Then there is the "Synchronous modification" section which basically
describes how serializing instructions can be issued before proceeding
to execute the modified instructions.
So AFAIU the XMC breakpoint insertion without stopping the world is
covered by AMD's "Asynchronous modification" section. The rest of the
breakpoint-bypass technique, which relies on serializing instructions
(issued through IPIs in the kernel, and through membarrier sync-core in
userspace), is guaranteed by the "Synchronous modification" section.
Unfortunately, I cannot find anything in Intel's documentation that
addresses asynchronous cross-modification of code as clearly.
Thanks,
Mathieu