Subject: Re: hackbench regression due to commit 9dfc6e68bfe6e
From: Eric Dumazet
To: "Zhang, Yanmin"
Cc: Christoph Lameter, netdev, Tejun Heo, Pekka Enberg, alex.shi@intel.com,
    linux-kernel@vger.kernel.org, "Ma, Ling", "Chen, Tim C", Andrew Morton
In-Reply-To: <1270607668.2078.259.camel@ymzhang.sh.intel.com>
Date: Wed, 07 Apr 2010 08:39:12 +0200
Message-ID: <1270622352.2091.702.camel@edumazet-laptop>

On Wednesday 07 April 2010 at 10:34 +0800, Zhang, Yanmin wrote:
> I collected retired instructions, dTLB misses and LLC misses.
> Below is the LLC miss data.
>
> Kernel 2.6.33:
> # Samples: 11639436896 LLC-load-misses
> #
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ......
> #
>     20.94%  hackbench  [kernel.kallsyms]  [k] copy_user_generic_string
>     14.56%  hackbench  [kernel.kallsyms]  [k] unix_stream_recvmsg
>     12.88%  hackbench  [kernel.kallsyms]  [k] kfree
>      7.37%  hackbench  [kernel.kallsyms]  [k] kmem_cache_free
>      7.18%  hackbench  [kernel.kallsyms]  [k] kmem_cache_alloc_node
>      6.78%  hackbench  [kernel.kallsyms]  [k] kfree_skb
>      6.27%  hackbench  [kernel.kallsyms]  [k] __kmalloc_node_track_caller
>      2.73%  hackbench  [kernel.kallsyms]  [k] __slab_free
>      2.21%  hackbench  [kernel.kallsyms]  [k] get_partial_node
>      2.01%  hackbench  [kernel.kallsyms]  [k] _raw_spin_lock
>      1.59%  hackbench  [kernel.kallsyms]  [k] schedule
>      1.27%  hackbench  hackbench          [.] receiver
>      0.99%  hackbench  libpthread-2.9.so  [.] __read
>      0.87%  hackbench  [kernel.kallsyms]  [k] unix_stream_sendmsg
>
> Kernel 2.6.34-rc3:
> # Samples: 13079611308 LLC-load-misses
> #
> # Overhead  Command    Shared Object      Symbol
> # ........  .........  .................  ......
> #
>     18.55%  hackbench  [kernel.kallsyms]  [k] copy_user_generic_string
>     13.19%  hackbench  [kernel.kallsyms]  [k] unix_stream_recvmsg
>     11.62%  hackbench  [kernel.kallsyms]  [k] kfree
>      8.54%  hackbench  [kernel.kallsyms]  [k] kmem_cache_free
>      7.88%  hackbench  [kernel.kallsyms]  [k] __kmalloc_node_track_caller
>      6.54%  hackbench  [kernel.kallsyms]  [k] kmem_cache_alloc_node
>      5.94%  hackbench  [kernel.kallsyms]  [k] kfree_skb
>      3.48%  hackbench  [kernel.kallsyms]  [k] __slab_free
>      2.15%  hackbench  [kernel.kallsyms]  [k] _raw_spin_lock
>      1.83%  hackbench  [kernel.kallsyms]  [k] schedule
>      1.82%  hackbench  [kernel.kallsyms]  [k] get_partial_node
>      1.59%  hackbench  hackbench          [.] receiver
>      1.37%  hackbench  libpthread-2.9.so  [.] __read

Please check the values of /proc/sys/net/core/rmem_default and
/proc/sys/net/core/wmem_default on your machines. They can also change
hackbench results, because raising wmem_default lets the af_unix senders
keep many more skbs in flight and stress the slab allocators
(__slab_free), far beyond anything slub_min_order can tune. A small
program to read the effective send buffer back from a socket is sketched
at the end of this mail.

When 2000 senders (and 2000 receivers) are running, we might consume
something like 2000 * 100,000 bytes, i.e. roughly 200 MB of kernel
memory, just for skbs. TLB thrashing is expected, because all these skbs
can span many 2MB pages. Maybe some node imbalance happens too.

You could try to boot your machine with less RAM per node and check
/proc/buddyinfo (each column is the count of free blocks of a given
order, from order 0 on the left up to order 10 on the right):

# cat /proc/buddyinfo
Node 0, zone    DMA       2      1      2      2      1      1      1      0      1      1      3
Node 0, zone  DMA32     219    298    143    584    145     57     44     41     31     26    517
Node 1, zone  DMA32       4      1     17      1      0      3      2      2      2      2    123
Node 1, zone Normal     126    169     83      8      7      5     59     59     49     28    459

One experiment on your Nehalem machine would be to change hackbench so
that each group (20 senders / 20 receivers) runs on a particular NUMA
node; a sketch of one way to do that follows below as well.

x86info -c ->

CPU #1
EFamily: 0 EModel: 1 Family: 6 Model: 26 Stepping: 5
CPU Model: Core i7 (Nehalem)
Processor name string: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
Type: 0 (Original OEM) Brand: 0 (Unsupported)
Number of cores per physical package=8
Number of logical processors per socket=16
Number of logical processors per core=2
APIC ID: 0x10 Package: 0 Core: 1 SMT ID 0
Cache info
 L1 Instruction cache: 32KB, 4-way associative. 64 byte line size.
 L1 Data cache: 32KB, 8-way associative. 64 byte line size.
 L2 (MLC): 256KB, 8-way associative. 64 byte line size.
TLB info
 Data TLB: 4KB pages, 4-way associative, 64 entries
 64 byte prefetching.
Found unknown cache descriptors: 55 5a b2 ca e4
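To see the wmem_default effect directly, here is a minimal sketch (my
illustration, not part of hackbench): a freshly created AF_UNIX socket
gets its send buffer from /proc/sys/net/core/wmem_default, and
getsockopt(SO_SNDBUF) reads that value back, so you can compute how much
skb memory the senders are allowed to pin.

	#include <stdio.h>
	#include <sys/socket.h>

	int main(void)
	{
		int sv[2], sndbuf;
		socklen_t len = sizeof(sndbuf);

		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
			perror("socketpair");
			return 1;
		}
		/* A fresh AF_UNIX socket inherits wmem_default as its sk_sndbuf. */
		if (getsockopt(sv[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) < 0) {
			perror("getsockopt");
			return 1;
		}
		printf("default SO_SNDBUF: %d bytes\n", sndbuf);
		/* The 2000-sender figure matches the scenario discussed above. */
		printf("2000 senders could pin about %ld MB of skb memory\n",
		       2000L * sndbuf / (1024 * 1024));
		return 0;
	}

Compare the printed SO_SNDBUF value with a plain
cat /proc/sys/net/core/wmem_default; they should match for an untuned
socket.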
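And a minimal sketch of the per-node experiment, assuming libnuma is
available (build with -lnuma). numa_run_on_node() binds the calling task
and the children it forks afterwards to one node; the round-robin
group-to-node mapping and the group count here are hypothetical, since
hackbench itself has no such option today.

	#include <numa.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/wait.h>
	#include <unistd.h>

	static void run_group_on_node(int group, int nr_nodes)
	{
		int node = group % nr_nodes;	/* hypothetical round-robin mapping */

		/* Bind this task, and the senders/receivers it will fork, to one node. */
		if (numa_run_on_node(node) < 0) {
			perror("numa_run_on_node");
			exit(1);
		}
		/* ... fork the 20 senders and 20 receivers of this group here ... */
	}

	int main(void)
	{
		int nr_nodes, group;

		if (numa_available() < 0) {
			fprintf(stderr, "no NUMA support\n");
			return 1;
		}
		nr_nodes = numa_max_node() + 1;

		for (group = 0; group < 10; group++) {	/* 10 groups, as an example */
			if (fork() == 0) {
				run_group_on_node(group, nr_nodes);
				_exit(0);
			}
		}
		while (wait(NULL) > 0)
			;
		return 0;
	}

If the regression is partly node imbalance, running each group entirely
on one node should make the LLC-miss profiles of the two kernels much
closer.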