Date: Fri, 13 Mar 2009 15:15:16 -0700
From: Stephen Hemminger
To: Ben Hutchings
Cc: Tom Herbert, David Miller, yanmin_zhang@linux.intel.com, andi@firstfloor.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, herbert@gondor.apana.org.au, jesse.brandeburg@intel.com
Subject: Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer
Message-ID: <20090313151516.42c9cc10@nehalam>
In-Reply-To: <1236982259.3300.13.camel@achroite>
Organization: Vyatta

On Fri, 13 Mar 2009 22:10:59 +0000 Ben Hutchings wrote:

> On Fri, 2009-03-13 at 14:01 -0700, Tom Herbert wrote:
> > On Fri, Mar 13, 2009 at 11:51 AM, David Miller wrote:
> > >
> > > From: Tom Herbert
> > > Date: Fri, 13 Mar 2009 10:06:56 -0700
> > >
> > > > You'll definitely want to look at the hardware provided hash.
> > > > We've been using a 10G NIC which provides a Toeplitz hash (the one
> > > > defined by Microsoft) and a software RSS-like capability to move
> > > > packets from an interrupting CPU to another for processing. The hash
> > > > could be used to index to a set of CPUs, but we also use the hash as
> > > > a connection identifier to key into a lookup table to steer packets
> > > > to the CPU where the application is running, based on the running
> > > > CPU of the last recvmsg. Using the device-provided hash in this
> > > > manner is a HUGE win, as opposed to taking cache misses to get the
> > > > 4-tuple from the packet itself to compute a hash. I posted some
> > > > patches a while back on our work if you're interested.
> > >
> > > I never understood this.
> > >
> > > If you don't let the APIC move the interrupt around, the individual
> > > MSI-X interrupts will steer packets to individual specific CPUs and as
> > > a result the scheduler will migrate tasks over to those CPUs since the
> > > wakeup events keep occurring there.
> >
> > We are trying to follow the scheduler's decisions as opposed to leading
> > it. This works on very loaded systems, with applications binding to
> > cpusets, with threads that are receiving on multiple sockets. I
> > suppose it might be compelling if a NIC could steer packets per flow,
> > instead of by a hash...
>
> Depending on the NIC, RX queue selection may be done using a large
> number of bits of the hash value and an indirection table or by matching
> against specific values in the headers. The SFC4000 supports both of
> these, though limited to TCP/IPv4 and UDP/IPv4. I think Neptune may be
> more flexible. Of course, both indirection table entries and filter
> table entries will be limited resources in any NIC, so allocating these
> wholly automatically is an interesting challenge.
>
> Ben.
>

The problem is that without hardware support, handing off the packet may
take more effort than processing it.
Especially when a cache line has to bounce to another CPU and the system is
trying to keep up with DoS attacks. It all depends on how much processing is
required and on the architecture of the system. The tradeoff would change
over time, depending on processing speed and on how well the
receive/firewall code is optimized.
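
As a rough illustration of the hash-plus-indirection-table selection Ben
mentions above, here is a minimal sketch in C. The table size, the names
(rx_indir_table, rx_indir_init, rx_indir_lookup, rx_indir_steer) and the
round-robin fill are all assumptions made for the example; none of them come
from a real driver or NIC. The hash is taken as given, e.g. a Toeplitz hash
supplied by the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size, chosen for illustration; must be a power of two. */
#define INDIR_TABLE_SIZE 128u

/* One entry per hash bucket; each entry names an RX queue (or CPU). */
struct rx_indir_table {
	uint8_t queue[INDIR_TABLE_SIZE];
};

/* Fill the table so that buckets are spread round-robin over nqueues. */
static void rx_indir_init(struct rx_indir_table *t, unsigned int nqueues)
{
	assert(nqueues > 0 && nqueues <= 256);
	for (size_t i = 0; i < INDIR_TABLE_SIZE; i++)
		t->queue[i] = (uint8_t)(i % nqueues);
}

/* Select the RX queue for a packet from the low bits of its flow hash. */
static unsigned int rx_indir_lookup(const struct rx_indir_table *t,
				    uint32_t hash)
{
	return t->queue[hash & (INDIR_TABLE_SIZE - 1)];
}

/*
 * Re-point the bucket that a given hash falls into, for example toward
 * the CPU where the consumer of that flow last ran. Every flow sharing
 * the bucket moves with it, which is why table entries are a limited
 * resource.
 */
static void rx_indir_steer(struct rx_indir_table *t, uint32_t hash,
			   unsigned int queue)
{
	t->queue[hash & (INDIR_TABLE_SIZE - 1)] = (uint8_t)queue;
}
```

The lookup itself is one mask and one load. The cost discussed in this
thread is elsewhere: getting the hash without touching the packet, and
moving the packet's cache lines to the chosen CPU.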