2010-11-03 06:44:20

by Shaohua Li

Subject: [RFC 0/4] x86: allocate up to 32 tlb invalidate vectors

Hi,
In workloads with heavy page reclaim, flush_tlb_page() is called
frequently. We currently have 8 vectors for TLB flush, which is fine for
small machines. But on big machines with many CPUs, the 8 vectors are
shared by all CPUs and we need a lock to protect them, which causes a
lot of lock contention; please see patch 3 for detailed contention
numbers.
Andi Kleen suggested using 32 vectors for TLB flush, which should be
fine even for 8-socket machines. Tests show this reduces lock contention
dramatically (again, see patch 3 for numbers).
One might argue that this wastes too many vectors and leaves fewer for
devices. That could be a problem, but even with 32 vectors we still
leave 78 vectors for devices. And now that we have per-CPU vectors,
vectors aren't scarce any more. I'm open to objections, though, if
anybody has any.

Thanks,
Shaohua


2010-11-15 14:02:50

by Shaohua Li

Subject: Re: [RFC 0/4] x86: allocate up to 32 tlb invalidate vectors

On Wed, 2010-11-03 at 14:44 +0800, Shaohua Li wrote:
> Hi,
> In workloads with heavy page reclaim, flush_tlb_page() is called
> frequently. We currently have 8 vectors for TLB flush, which is fine for
> small machines. But on big machines with many CPUs, the 8 vectors are
> shared by all CPUs and we need a lock to protect them, which causes a
> lot of lock contention; please see patch 3 for detailed contention
> numbers.
> Andi Kleen suggested using 32 vectors for TLB flush, which should be
> fine even for 8-socket machines. Tests show this reduces lock contention
> dramatically (again, see patch 3 for numbers).
> One might argue that this wastes too many vectors and leaves fewer for
> devices. That could be a problem, but even with 32 vectors we still
> leave 78 vectors for devices. And now that we have per-CPU vectors,
> vectors aren't scarce any more. I'm open to objections, though, if
> anybody has any.
>
Hi Ingo & hpa, any comments about this series?

Thanks,
Shaohua

2010-11-15 17:54:29

by H. Peter Anvin

Subject: Re: [RFC 0/4] x86: allocate up to 32 tlb invalidate vectors

On 11/15/2010 06:02 AM, Shaohua Li wrote:
> On Wed, 2010-11-03 at 14:44 +0800, Shaohua Li wrote:
>> Hi,
>> In workloads with heavy page reclaim, flush_tlb_page() is called
>> frequently. We currently have 8 vectors for TLB flush, which is fine for
>> small machines. But on big machines with many CPUs, the 8 vectors are
>> shared by all CPUs and we need a lock to protect them, which causes a
>> lot of lock contention; please see patch 3 for detailed contention
>> numbers.
>> Andi Kleen suggested using 32 vectors for TLB flush, which should be
>> fine even for 8-socket machines. Tests show this reduces lock contention
>> dramatically (again, see patch 3 for numbers).
>> One might argue that this wastes too many vectors and leaves fewer for
>> devices. That could be a problem, but even with 32 vectors we still
>> leave 78 vectors for devices. And now that we have per-CPU vectors,
>> vectors aren't scarce any more. I'm open to objections, though, if
>> anybody has any.
>>
> Hi Ingo & hpa, any comments about this series?
>

Hi Shaohua,

It looks good... I need to do a more thorough review and put it in; I've
just been consumed a bit too much by a certain internal project.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.