2014-04-01 15:13:34

by Linus Torvalds

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit

On Mon, Mar 31, 2014 at 8:34 AM, Rik van Riel <[email protected]> wrote:
>
> However, clearing the accessed bit does not lead to any
> consistency issues, there is no reason to flush the TLB
> immediately. The TLB flush can be deferred until some
> later point in time.

Ugh. I absolutely detest this patch.

If we're going to leave the TLB dirty, then dammit, leave it dirty.
Don't play some half-way games.

Here's the patch you should just try:

int ptep_clear_flush_young(struct vm_area_struct *vma,
                           unsigned long address, pte_t *ptep)
{
	return ptep_test_and_clear_young(vma, address, ptep);
}

instead of complicating things.

Rationale: if the working set is so big that we start paging things
out, we sure as hell don't need to worry about TLB flushing. It will
flush itself.

And conversely - if it doesn't flush itself, and something stays
marked as "accessed" in the TLB for a long time even though we've
cleared it in the page tables, we don't care, because clearly there
isn't enough memory pressure for the accessed bit to matter.

Linus
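
[Editor's sketch] The trade-off argued above can be illustrated with a
small user-space toy model. Nothing here is kernel code: the toy_* types
and helpers are hypothetical, and the model makes the simplifying
assumption that the accessed ("young") bit is set only when the page
tables are walked on a TLB miss, so a still-cached translation never sets
it again.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types, not kernel data structures. */
struct toy_pte { bool young; };   /* accessed bit in the page table entry */
struct toy_tlb { bool cached; };  /* is the translation cached in the TLB? */

/* A memory access. Assumption: the accessed bit is written only on a
 * TLB miss, when the page tables are walked. */
static void toy_touch(struct toy_pte *pte, struct toy_tlb *tlb)
{
	if (!tlb->cached) {
		pte->young = true;
		tlb->cached = true;
	}
}

/* Report the accessed bit and clear it; the TLB is left alone. */
static bool toy_test_and_clear_young(struct toy_pte *pte)
{
	bool was_young = pte->young;

	pte->young = false;
	return was_young;
}

/* With flush == true, also drop the cached translation when the bit was
 * set. With flush == false this is just the test-and-clear. */
static bool toy_clear_flush_young(struct toy_pte *pte, struct toy_tlb *tlb,
				  bool flush)
{
	bool was_young = toy_test_and_clear_young(pte);

	if (flush && was_young)
		tlb->cached = false;
	return was_young;
}

/* Touch, scan, touch again, scan again; return what the second scan saw. */
static bool toy_second_scan_sees_young(bool flush)
{
	struct toy_pte pte = { .young = false };
	struct toy_tlb tlb = { .cached = false };

	toy_touch(&pte, &tlb);
	toy_clear_flush_young(&pte, &tlb, flush);
	toy_touch(&pte, &tlb);
	return toy_clear_flush_young(&pte, &tlb, flush);
}
```

In this model the second scan sees the page as young only when the first
scan flushed. Without the flush, the bit stays clear until something else
evicts the cached translation, which is exactly the case the rationale
above argues does not matter in practice.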


2014-04-01 16:11:46

by Rik van Riel

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit

On 04/01/2014 11:13 AM, Linus Torvalds wrote:
> On Mon, Mar 31, 2014 at 8:34 AM, Rik van Riel <[email protected]> wrote:
>>
>> However, clearing the accessed bit does not lead to any
>> consistency issues, there is no reason to flush the TLB
>> immediately. The TLB flush can be deferred until some
>> later point in time.
>
> Ugh. I absolutely detest this patch.
>
> If we're going to leave the TLB dirty, then dammit, leave it dirty.
> Don't play some half-way games.
>
> Here's the patch you should just try:
>
> int ptep_clear_flush_young(struct vm_area_struct *vma,
>                            unsigned long address, pte_t *ptep)
> {
> 	return ptep_test_and_clear_young(vma, address, ptep);
> }
>
> instead of complicating things.
>
> Rationale: if the working set is so big that we start paging things
> out, we sure as hell don't need to worry about TLB flushing. It will
> flush itself.
>
> And conversely - if it doesn't flush itself, and something stays
> marked as "accessed" in the TLB for a long time even though we've
> cleared it in the page tables, we don't care, because clearly there
> isn't enough memory pressure for the accessed bit to matter.

That was my initial feeling too, when this kind of patch first
came up, a few years ago.

However, the more I think about it, the less I am convinced it
is actually true.

Memory pressure is not necessarily caused by the same process
whose accessed bit we just cleared. Memory pressure may not
even be caused by any process's virtual memory at all, but it
could be caused by the page cache.

With 2MB pages, a reasonably sized process could fit in the
TLB quite easily. Having its accessed bits not make it to the
page table while its pages are on the inactive list could
cause it to get paged out, due to memory pressure from another,
larger process.

I have no particular preference for this implementation, and am
willing to implement any other idea for batching the TLB shootdowns
that are due to pageout scanning.

--
All rights reversed
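
[Editor's sketch] The point about 2MB pages is an argument about TLB
reach: the amount of memory the TLB can map at once is the number of
entries times the page size. A minimal worked example follows. The helper
and macro names are hypothetical, and the entry counts used in the note
below are illustrative assumptions, not figures for any particular CPU.

```c
#include <stdint.h>

#define TOY_PAGE_4K (UINT64_C(4) << 10)  /* 4 KiB base page */
#define TOY_PAGE_2M (UINT64_C(2) << 20)  /* 2 MiB huge page */

/* Bytes of address space covered by a fully populated TLB. */
static uint64_t toy_tlb_reach(uint64_t entries, uint64_t page_size)
{
	return entries * page_size;
}
```

With an assumed 64 entries of 4 KiB pages the reach is 256 KiB, while an
assumed 32 entries of 2 MiB pages reach 64 MiB. A process whose working
set fits inside that reach may take no TLB misses at all, so under the
concern raised above its accessed bits would never be set again after
being cleared.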

2014-04-01 16:21:44

by Linus Torvalds

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit

On Tue, Apr 1, 2014 at 9:11 AM, Rik van Riel <[email protected]> wrote:
>
> Memory pressure is not necessarily caused by the same process
> whose accessed bit we just cleared. Memory pressure may not
> even be caused by any process's virtual memory at all, but it
> could be caused by the page cache.

If we have that much memory pressure on the page cache without having
any memory pressure on the actual VM space, then the swap-out activity
will never be an issue anyway.

IOW, I think all these scenarios are made-up. I'd much rather go for
simpler implementation, and make things more complex only in the
presence of numbers. Of which we have none.

Linus

2014-04-01 18:31:45

by Rik van Riel

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit

On 04/01/2014 12:21 PM, Linus Torvalds wrote:
> On Tue, Apr 1, 2014 at 9:11 AM, Rik van Riel <[email protected]> wrote:
>>
>> Memory pressure is not necessarily caused by the same process
>> whose accessed bit we just cleared. Memory pressure may not
>> even be caused by any process's virtual memory at all, but it
>> could be caused by the page cache.
>
> If we have that much memory pressure on the page cache without having
> any memory pressure on the actual VM space, then the swap-out activity
> will never be an issue anyway.
>
> IOW, I think all these scenarios are made-up. I'd much rather go for
> simpler implementation, and make things more complex only in the
> presence of numbers. Of which we have none.

We've been bitten by the lack of a properly tracked accessed
bit before, but admittedly that was with the KVM code and EPT.

I'll add my Acked-by: to Shaohua's original patch then, and
will keep my eyes open for any problems that may or may not
materialize...

Shaohua?

--
All rights reversed

2014-04-02 07:07:56

by Shaohua Li

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit

On Tue, Apr 01, 2014 at 02:31:31PM -0400, Rik van Riel wrote:
> On 04/01/2014 12:21 PM, Linus Torvalds wrote:
> > On Tue, Apr 1, 2014 at 9:11 AM, Rik van Riel <[email protected]> wrote:
> >>
> >> Memory pressure is not necessarily caused by the same process
> >> whose accessed bit we just cleared. Memory pressure may not
> >> even be caused by any process's virtual memory at all, but it
> >> could be caused by the page cache.
> >
> > If we have that much memory pressure on the page cache without having
> > any memory pressure on the actual VM space, then the swap-out activity
> > will never be an issue anyway.
> >
> > IOW, I think all these scenarios are made-up. I'd much rather go for
> > simpler implementation, and make things more complex only in the
> > presence of numbers. Of which we have none.
>
> We've been bitten by the lack of a properly tracked accessed
> bit before, but admittedly that was with the KVM code and EPT.
>
> I'll add my Acked-by: to Shaohua's original patch then, and
> will keep my eyes open for any problems that may or may not
> materialize...
>
> Shaohua?

I agree we should go with the simple implementation at this stage and
see whether any problems actually show up.

Andrew,
can you please pick up my original patch "x86: clearing access bit don't
flush tlb" (with Rik's Ack)? Or I can resend it if you prefer.

Thanks,
Shaohua

2014-04-02 07:46:56

by Ingo Molnar

Subject: Re: [PATCH] x86,mm: delay TLB flush after clearing accessed bit


* Shaohua Li <[email protected]> wrote:

> On Tue, Apr 01, 2014 at 02:31:31PM -0400, Rik van Riel wrote:
> > On 04/01/2014 12:21 PM, Linus Torvalds wrote:
> > > On Tue, Apr 1, 2014 at 9:11 AM, Rik van Riel <[email protected]> wrote:
> > >>
> > >> Memory pressure is not necessarily caused by the same process
> > >> whose accessed bit we just cleared. Memory pressure may not
> > >> even be caused by any process's virtual memory at all, but it
> > >> could be caused by the page cache.
> > >
> > > If we have that much memory pressure on the page cache without having
> > > any memory pressure on the actual VM space, then the swap-out activity
> > > will never be an issue anyway.
> > >
> > > IOW, I think all these scenarios are made-up. I'd much rather go for
> > > simpler implementation, and make things more complex only in the
> > > presence of numbers. Of which we have none.
> >
> > We've been bitten by the lack of a properly tracked accessed
> > bit before, but admittedly that was with the KVM code and EPT.
> >
> > I'll add my Acked-by: to Shaohua's original patch then, and
> > will keep my eyes open for any problems that may or may not
> > materialize...
> >
> > Shaohua?
>
> I agree we should go with the simple implementation at this stage and
> see whether any problems actually show up.
>
> Andrew,
> can you please pick up my original patch "x86: clearing access bit don't
> flush tlb" (with Rik's Ack)? Or I can resend it if you prefer.

Please resend it so I can pick it up for this cycle, that approach
obviously looks good.

Thanks,

Ingo