2021-08-23 08:07:04

by Huang, Ying

[permalink] [raw]
Subject: Re: [RFC 20/20] mm/rmap: avoid potential races

Hi, Nadav,

Nadav Amit <[email protected]> writes:

> From: Nadav Amit <[email protected]>
>
> flush_tlb_batched_pending() appears to have a theoretical race:
> tlb_flush_batched is being cleared after the TLB flush, and if in
> between another core calls set_tlb_ubc_flush_pending() and sets the
> pending TLB flush indication, this indication might be lost. Holding the
> page-table lock when SPLIT_LOCK is set cannot eliminate this race.

Recently, when I read the corresponding code, I find the exact same race
too. Do you still think the race is possible at least in theory? If
so, why hasn't your fix been merged?

> The current batched TLB invalidation scheme therefore does not seem
> viable or easily repairable.

I have some idea to fix this without too much code. If necessary, I
will send it out.

Best Regards,
Huang, Ying


2021-08-23 15:51:30

by Nadav Amit

[permalink] [raw]
Subject: Re: [RFC 20/20] mm/rmap: avoid potential races



> On Aug 23, 2021, at 1:05 AM, Huang, Ying <[email protected]> wrote:
>
> Hi, Nadav,
>
> Nadav Amit <[email protected]> writes:
>
>> From: Nadav Amit <[email protected]>
>>
>> flush_tlb_batched_pending() appears to have a theoretical race:
>> tlb_flush_batched is being cleared after the TLB flush, and if in
>> between another core calls set_tlb_ubc_flush_pending() and sets the
>> pending TLB flush indication, this indication might be lost. Holding the
>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
>
> Recently, when I read the corresponding code, I find the exact same race
> too. Do you still think the race is possible at least in theory? If
> so, why hasn't your fix been merged?

I think the race is possible. It didn’t get merged, IIRC, due to some
addressable criticism and lack of enthusiasm from other people, and
my laziness/busy-ness.

>
>> The current batched TLB invalidation scheme therefore does not seem
>> viable or easily repairable.
>
> I have some idea to fix this without too much code. If necessary, I
> will send it out.

Arguably, it would be preferable to have a small back-portable fix for
this issue specifically. Just try to ensure that you do not introduce
performance overheads. Any solution should be clear about its impact
on additional TLB flushes on the worst-case scenario and the number
of additional atomic operations that would be required.

2021-08-24 00:37:17

by Huang, Ying

[permalink] [raw]
Subject: Re: [RFC 20/20] mm/rmap: avoid potential races

Nadav Amit <[email protected]> writes:

>> On Aug 23, 2021, at 1:05 AM, Huang, Ying <[email protected]> wrote:
>>
>> Hi, Nadav,
>>
>> Nadav Amit <[email protected]> writes:
>>
>>> From: Nadav Amit <[email protected]>
>>>
>>> flush_tlb_batched_pending() appears to have a theoretical race:
>>> tlb_flush_batched is being cleared after the TLB flush, and if in
>>> between another core calls set_tlb_ubc_flush_pending() and sets the
>>> pending TLB flush indication, this indication might be lost. Holding the
>>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
>>
>> Recently, when I read the corresponding code, I find the exact same race
>> too. Do you still think the race is possible at least in theory? If
>> so, why hasn't your fix been merged?
>
> I think the race is possible. It didn’t get merged, IIRC, due to some
> addressable criticism and lack of enthusiasm from other people, and
> my laziness/busy-ness.

Got it! Thanks your information!

>>> The current batched TLB invalidation scheme therefore does not seem
>>> viable or easily repairable.
>>
>> I have some idea to fix this without too much code. If necessary, I
>> will send it out.
>
> Arguably, it would be preferable to have a small back-portable fix for
> this issue specifically. Just try to ensure that you do not introduce
> performance overheads. Any solution should be clear about its impact
> on additional TLB flushes on the worst-case scenario and the number
> of additional atomic operations that would be required.

Sure. Will do that.

Best Regards,
Huang, Ying