From: Alexander Duyck
Date: Thu, 23 Mar 2017 17:58:47 -0700
Subject: Re: [net-next PATCH v2 8/8] net: Introduce SO_INCOMING_NAPI_ID
To: Andy Lutomirski
Cc: Network Development, linux-kernel@vger.kernel.org, "Samudrala, Sridhar", Eric Dumazet, "David S. Miller", Linux API

On Thu, Mar 23, 2017 at 3:43 PM, Andy Lutomirski wrote:
> On Thu, Mar 23, 2017 at 2:38 PM, Alexander Duyck wrote:
>> From: Sridhar Samudrala
>>
>> This socket option returns the NAPI ID associated with the queue on which
>> the last frame is received. This information can be used by the apps to
>> split the incoming flows among the threads based on the Rx queue on which
>> they are received.
>>
>> If the NAPI ID actually represents a sender_cpu then the value is ignored
>> and 0 is returned.
>
> This may be more of a naming / documentation issue than a
> functionality issue, but to me this reads as:
>
> "This socket option returns an internal implementation detail that, if
> you are sufficiently clueful about the current performance heuristics
> used by the Linux networking stack, just might give you a hint as to
> which epoll set to put the socket in." I've done some digging into
> Linux networking stuff, but not nearly enough to have the slightest
> clue what you're supposed to do with the NAPI ID.
Really the NAPI ID is an arbitrary number that is unique per device queue, though multiple Rx queues can share a NAPI ID if they are meant to be processed in the same call to poll. If we wanted we could probably rename it to something like Device Poll Identifier or Device Queue Identifier, DPID or DQID, if that would work for you. Essentially it is just a unique u32 value that identifies no other queue in the system while this device queue is active. The number itself is mostly arbitrary; the main thing is that it doesn't change and uniquely identifies the queue in the system.

> It would be nice to make this a bit more concrete and a bit less tied
> in Linux innards. Perhaps a socket option could instead return a hint
> saying "for best results, put this socket in an epoll set that's on
> cpu N"? After all, you're unlikely to do anything productive with
> busy polling at all, even on a totally different kernel
> implementation, if you have more than one epoll set per CPU. I can
> see cases where you could plausibly poll with fewer than one set per
> CPU, I suppose.

We kind of already have an option that does what you are suggesting: SO_INCOMING_CPU. The problem is that it requires pinning the interrupts to CPUs in order to keep the values consistent, and even then busy polling can mess that up if the busy-poll thread is running on a different CPU. With the NAPI ID we have to do a bit of work on the application end, but we can uniquely identify each incoming queue, and neither interrupt migration nor busy polling has any effect on it.

So for example we could stack all the interrupts on CPU 0 and have our main thread located there, sorting incoming requests and handing them out to epoll listener threads on other CPUs. When those epoll listener threads start busy polling, the NAPI ID won't change even though the packet is being processed on a different CPU.
> Again, though, from the description, it's totally unclear what a user
> is supposed to do.

What you end up having to do is essentially build a hash of sorts so that you can map NAPI IDs to threads. In an ideal setup you end up with multiple threads, each running one epoll, and each epoll polling on one specific queue.

Hope that helps to clarify it.

- Alex