Clean up percpu_counter code and fix some bugs. The main purpose is to convert
percpu_counter to use atomic64, which is useful for workloads that cause
percpu_counter->lock to become contended. In a workload I tested, the atomic
method is 50x faster (please see patch 4 for details).
patch 1&2: clean up
patch 3: fix a percpu_counter bug on 32-bit systems.
patch 4: convert percpu_counter to use atomic64
On Wed, 13 Apr 2011, [email protected] wrote:
> Clean up percpu_counter code and fix some bugs. The main purpose is to convert
> percpu_counter to use atomic64, which is useful for workloads that cause
> percpu_counter->lock to become contended. In a workload I tested, the atomic
> method is 50x faster (please see patch 4 for details).
Could you post your test and the results please?
On Wed, 2011-04-13 at 22:08 +0800, Christoph Lameter wrote:
> On Wed, 13 Apr 2011, [email protected] wrote:
>
> > Clean up percpu_counter code and fix some bugs. The main purpose is to convert
> > percpu_counter to use atomic64, which is useful for workloads that cause
> > percpu_counter->lock to become contended. In a workload I tested, the atomic
> > method is 50x faster (please see patch 4 for details).
>
> Could you post your test and the results please?
The test is very simple: 24 processes on 24 CPUs, each doing:
while (1) {
        mmap(128M);
        munmap(128M);
}
We then measure how many loops each process can do.
I'll attach the test in next post.
Just found that when I said 50x faster, I actually forgot the effect of
another patch, http://marc.info/?l=linux-kernel&m=130127782901127&w=2.
With only the atomic change, it's about 7x faster. Sorry about
that. I'll add detailed data in the next post.