Date: Tue, 21 Aug 2007 21:12:25 -0400
From: Mathieu Desnoyers
To: Christoph Lameter
Cc: Andi Kleen, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, mingo@redhat.com
Subject: Re: [PATCH] SLUB use cmpxchg_local
Message-ID: <20070822011225.GA4124@Krystal>
References: <20070821231216.GA29691@Krystal> <20070821233938.GD29691@Krystal> <20070821234702.GE29691@Krystal> <20070822000323.GG29691@Krystal> <20070822003834.GB1400@Krystal>

* Christoph Lameter (clameter@sgi.com) wrote:
> Ok. Measurements vs. simple cmpxchg on a Intel(R) Pentium(R) 4 CPU 3.20GHz
> (hyperthreading enabled). Test run with your module show only minor
> performance improvements and lots of regressions. So we must have
> cmpxchg_local to see any improvements? Some kind of a recent optimization
> of cmpxchg performance that we do not see on older cpus?
>

I did not expect the cmpxchg with LOCK prefix to be faster than irq
save/restore. You will need to run these tests using cmpxchg_local to see an
improvement.

Mathieu

>
> Code of kmem_cache_alloc (to show you that there are no debug options on):
>
> Dump of assembler code for function kmem_cache_alloc:
> 0x4015cfa9:  push   %ebp
> 0x4015cfaa:  mov    %esp,%ebp
> 0x4015cfac:  push   %edi
> 0x4015cfad:  push   %esi
> 0x4015cfae:  push   %ebx
> 0x4015cfaf:  sub    $0x10,%esp
> 0x4015cfb2:  mov    %eax,%esi
> 0x4015cfb4:  mov    %edx,0xffffffe8(%ebp)
> 0x4015cfb7:  mov    0x4(%ebp),%eax
> 0x4015cfba:  mov    %eax,0xfffffff0(%ebp)
> 0x4015cfbd:  mov    %fs:0x404af008,%eax
> 0x4015cfc3:  mov    0x90(%esi,%eax,4),%edi
> 0x4015cfca:  mov    (%edi),%ecx
> 0x4015cfcc:  test   %ecx,%ecx
> 0x4015cfce:  je     0x4015d00a
> 0x4015cfd0:  mov    0xc(%edi),%eax
> 0x4015cfd3:  mov    (%ecx,%eax,4),%eax
> 0x4015cfd6:  mov    %eax,%edx
> 0x4015cfd8:  mov    %ecx,%eax
> 0x4015cfda:  lock cmpxchg %edx,(%edi)
> 0x4015cfde:  mov    %eax,%ebx
> 0x4015cfe0:  cmp    %ecx,%eax
> 0x4015cfe2:  jne    0x4015cfbd
> 0x4015cfe4:  cmpw   $0x0,0xffffffe8(%ebp)
> 0x4015cfe9:  jns    0x4015d006
> 0x4015cfeb:  mov    0x10(%edi),%edx
> 0x4015cfee:  xor    %eax,%eax
> 0x4015cff0:  mov    %edx,%ecx
> 0x4015cff2:  shr    $0x2,%ecx
> 0x4015cff5:  mov    %ebx,%edi
>
> Base
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 332 cycles kfree -> 422 cycles
> 10000 times kmalloc(16) -> 218 cycles kfree -> 360 cycles
> 10000 times kmalloc(32) -> 214 cycles kfree -> 368 cycles
> 10000 times kmalloc(64) -> 244 cycles kfree -> 390 cycles
> 10000 times kmalloc(128) -> 320 cycles kfree -> 417 cycles
> 10000 times kmalloc(256) -> 438 cycles kfree -> 550 cycles
> 10000 times kmalloc(512) -> 527 cycles kfree -> 626 cycles
> 10000 times kmalloc(1024) -> 678 cycles kfree -> 775 cycles
> 10000 times kmalloc(2048) -> 748 cycles kfree -> 822 cycles
> 10000 times kmalloc(4096) -> 641 cycles kfree -> 650 cycles
> 10000 times kmalloc(8192) -> 741 cycles kfree -> 817 cycles
> 10000 times kmalloc(16384) -> 872 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 332 cycles
> 10000 times kmalloc(16)/kfree -> 327 cycles
> 10000 times kmalloc(32)/kfree -> 323 cycles
> 10000 times kmalloc(64)/kfree -> 320 cycles
> 10000 times kmalloc(128)/kfree -> 320 cycles
> 10000 times kmalloc(256)/kfree -> 333 cycles
> 10000 times kmalloc(512)/kfree -> 332 cycles
> 10000 times kmalloc(1024)/kfree -> 330 cycles
> 10000 times kmalloc(2048)/kfree -> 334 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1155 cycles
> 10000 times kmalloc(16384)/kfree -> 1226 cycles
>
> Slub cmpxchg.
>
> 1. Kmalloc: Repeatedly allocate then free test
> 10000 times kmalloc(8) -> 296 cycles kfree -> 515 cycles
> 10000 times kmalloc(16) -> 193 cycles kfree -> 412 cycles
> 10000 times kmalloc(32) -> 188 cycles kfree -> 422 cycles
> 10000 times kmalloc(64) -> 222 cycles kfree -> 441 cycles
> 10000 times kmalloc(128) -> 292 cycles kfree -> 476 cycles
> 10000 times kmalloc(256) -> 414 cycles kfree -> 589 cycles
> 10000 times kmalloc(512) -> 513 cycles kfree -> 673 cycles
> 10000 times kmalloc(1024) -> 694 cycles kfree -> 825 cycles
> 10000 times kmalloc(2048) -> 739 cycles kfree -> 878 cycles
> 10000 times kmalloc(4096) -> 636 cycles kfree -> 653 cycles
> 10000 times kmalloc(8192) -> 715 cycles kfree -> 799 cycles
> 10000 times kmalloc(16384) -> 855 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 10000 times kmalloc(8)/kfree -> 354 cycles
> 10000 times kmalloc(16)/kfree -> 336 cycles
> 10000 times kmalloc(32)/kfree -> 335 cycles
> 10000 times kmalloc(64)/kfree -> 337 cycles
> 10000 times kmalloc(128)/kfree -> 337 cycles
> 10000 times kmalloc(256)/kfree -> 355 cycles
> 10000 times kmalloc(512)/kfree -> 354 cycles
> 10000 times kmalloc(1024)/kfree -> 337 cycles
> 10000 times kmalloc(2048)/kfree -> 339 cycles
> 10000 times kmalloc(4096)/kfree -> 674 cycles
> 10000 times kmalloc(8192)/kfree -> 1128 cycles
> 10000 times kmalloc(16384)/kfree -> 1240 cycles
>
>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
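
For reference, the fast path visible in the quoted disassembly (load the
freelist head, read the next-free pointer at a word offset inside the object,
cmpxchg the head, retry on mismatch) can be sketched in user-space C11 roughly
as below. This is only an illustration under stated assumptions, not the SLUB
code: struct cpu_slab, fast_alloc() and slab_init() are hypothetical names made
up for the sketch. C11 atomics only offer the fully atomic compare-and-exchange,
which x86 compilers typically emit as "lock cmpxchg", so the sketch corresponds
to the LOCK-prefixed variant that was measured above. The kernel's
cmpxchg_local, which on x86 drops the LOCK prefix and is atomic only with
respect to the local CPU, has no portable user-space equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU slab state; not the kernel's struct. */
struct cpu_slab {
    _Atomic(void **) freelist; /* first free object, or NULL when empty */
    size_t offset;             /* word index of the next-free pointer in an object */
};

/* Fast path: pop the first free object, retrying if the head changed. */
static void *fast_alloc(struct cpu_slab *c)
{
    void **object = atomic_load(&c->freelist);

    while (object != NULL) {
        void **next = object[c->offset];

        /* On failure, object is refreshed with the current head and we retry. */
        if (atomic_compare_exchange_weak(&c->freelist, &object, next))
            return object;
    }
    return NULL; /* empty freelist: a real allocator would take its slow path */
}

/* Chain nobj objects of `words` pointers each, laid out back to back in mem. */
static void slab_init(struct cpu_slab *c, void **mem, size_t nobj,
                      size_t words, size_t offset)
{
    assert(nobj > 0 && offset < words);
    for (size_t i = 0; i < nobj; i++) {
        void **obj = mem + i * words;

        obj[offset] = (i + 1 < nobj) ? (void *)(obj + words) : NULL;
    }
    c->offset = offset;
    atomic_init(&c->freelist, mem);
}
```

With three two-word objects in one array, successive fast_alloc() calls hand
out the objects in address order and then return NULL. The single-threaded
sketch does not address object reuse between the load and the cmpxchg, which
the real allocator has to handle by other means.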