Message-ID: <49C12E64.1000301@us.ibm.com>
Date: Wed, 18 Mar 2009 10:24:52 -0700
From: Vernon Mauery
To: netdev
CC: LKML, rt-users
Subject: High contention on the sk_buff_head.lock

I have been beating on network throughput in the -rt kernel for some time
now.  After digging down through the send path of UDP packets, I found
that sk_buff_head.lock is under very high contention.  This lock is
acquired each time a packet is enqueued on a qdisc, and acquired again to
dequeue the packet.  Under high networking loads, the enqueueing processes
contend not only with each other for the lock, but also with the net-tx
softirq.  This makes for very high contention on this one lock.

My testcase runs varying numbers of concurrent netperf instances pushing
UDP traffic to another machine.  Going from 1 to 2 instances, network
performance increases.  But from 2 to 4, and again from 4 to 8, we see a
big decline, with 8 instances pushing about half of what a single instance
can do.

Running 2.6.29-rc6-rt3 on an 8-way machine with a 10GbE card (I have tried
both NetXen and Broadcom, with very similar results), I can push only
about 1200 Mb/s.  With the mainline 2.6.29-rc8 kernel, I can push nearly
6000 Mb/s -- still not as much as I think should be possible.  I was
curious whether the mainline kernel was hitting the same lock, and
according to /proc/lock_stat it is: sk_buff_head.lock was the number one
contended lock there as well.  So while this issue hits -rt kernels
especially hard, it has a real effect on mainline kernels too.  The
contention on these spinlocks is amplified when they are turned into
rt-mutexes, since a contended acquisition then costs a double context
switch.

Below is the top of the lock_stat output for 2.6.29-rc8, captured from a
1-minute network stress test.  The next highest contender had two orders
of magnitude fewer contentions.  Think of the throughput increase if we
could ease this contention a bit.  We might even be able to saturate a
10GbE link.
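To make the pattern concrete, here is a minimal user-space analogue (plain
C with pthreads; this is not the kernel code, and every name in it is
invented for illustration).  Several producer threads and one consumer all
serialize on a single spinlock, just as the dev_queue_xmit() callers and
__qdisc_run() both serialize on sk_buff_head.lock:

/*
 * Toy model of the qdisc queue locking pattern (NOT kernel code):
 * NPROD producers and one consumer take the same spinlock for every
 * single enqueue and dequeue, like sk_buff_head.lock.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NPROD  8
#define NITEMS 1000000

struct node { struct node *next; };

static struct node *head;             /* shared queue (LIFO for brevity) */
static pthread_spinlock_t qlock;      /* stands in for sk_buff_head.lock */

static void *producer(void *arg)
{
	(void)arg;
	for (long i = 0; i < NITEMS; i++) {
		struct node *n = malloc(sizeof(*n));
		pthread_spin_lock(&qlock);    /* "enqueue": dev_queue_xmit() side */
		n->next = head;
		head = n;
		pthread_spin_unlock(&qlock);
	}
	return NULL;
}

static void *consumer(void *arg)
{
	long drained = 0;
	(void)arg;
	while (drained < (long)NPROD * NITEMS) {
		pthread_spin_lock(&qlock);    /* "dequeue": __qdisc_run() side */
		struct node *n = head;
		if (n)
			head = n->next;
		pthread_spin_unlock(&qlock);
		if (n) {
			free(n);
			drained++;
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t prod[NPROD], cons;

	pthread_spin_init(&qlock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NPROD; i++)
		pthread_create(&prod[i], NULL, producer, NULL);
	pthread_create(&cons, NULL, consumer, NULL);

	for (int i = 0; i < NPROD; i++)
		pthread_join(prod[i], NULL);
	pthread_join(cons, NULL);
	printf("done; every operation took the one lock\n");
	return 0;
}

Run it under perf or lock_stat's user-space cousins and the single qlock
shows the same producer-vs-consumer bouncing described above.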
lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------
                class name   con-bounces   contentions  waittime-min  waittime-max  waittime-total   acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------
             &list->lock#3:     24517307      24643791          0.71       1286.62     56516392.42      34834296      44904018          0.60        164.79     31314786.02
             -------------
             &list->lock#3       15596927  []  dev_queue_xmit+0x2ea/0x468
             &list->lock#3        9046864  []  __qdisc_run+0x11b/0x1ef
             -------------
             &list->lock#3        6525300  []  __qdisc_run+0x11b/0x1ef
             &list->lock#3       18118491  []  dev_queue_xmit+0x2ea/0x468

The story is the same for -rt kernels, except that the waittimes and
holdtimes are both orders of magnitude greater.

I am not exactly clear on the solution, but if I understand correctly,
there has been some discussion in the past of batched enqueueing and
dequeueing (a rough sketch of the idea is in the P.S. below).  Is anyone
else working on this problem right now who has just not yet posted
anything for review?

Questions, comments, flames?

--Vernon
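P.S. For anyone wondering what batched dequeueing might look like, here is
a rough sketch in the same toy model as the sketch above (it reuses that
program's queue, lock, and constants, and can be dropped in in place of
consumer()).  The consumer takes the lock once, splices out everything
queued, and processes the batch without holding the lock.  Purely
illustrative under those assumptions, not a proposed patch:

/*
 * Batched-dequeue variant of consumer() from the toy model above:
 * one lock acquisition drains the whole queue, so the per-item
 * lock/unlock on the dequeue side disappears entirely.
 */
static void *batched_consumer(void *arg)
{
	long drained = 0;
	(void)arg;
	while (drained < (long)NPROD * NITEMS) {
		pthread_spin_lock(&qlock);
		struct node *batch = head;   /* splice the whole queue out */
		head = NULL;
		pthread_spin_unlock(&qlock);

		while (batch) {              /* process the batch off-lock */
			struct node *n = batch;
			batch = batch->next;
			free(n);
			drained++;
		}
	}
	return NULL;
}

The enqueue side could presumably batch the same way per-sender; the win
is that the number of lock acquisitions scales with bursts rather than
with packets.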