Date: Wed, 19 Jun 2013 12:11:32 +0300
From: "Michael S. Tsirkin"
To: Eric Dumazet
Cc: Jason Wang, davem@davemloft.net, edumazet@google.com, hkchu@google.com,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [net-next rfc 1/3] net: avoid high order memory allocation for queues by using flex array
Message-ID: <20130619091132.GA2816@redhat.com>
References: <1371620452-49349-1-git-send-email-jasowang@redhat.com>
    <1371620452-49349-2-git-send-email-jasowang@redhat.com>
    <1371623518.3252.267.camel@edumazet-glaptop>
In-Reply-To: <1371623518.3252.267.camel@edumazet-glaptop>

On Tue, Jun 18, 2013 at 11:31:58PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-19 at 13:40 +0800, Jason Wang wrote:
> > Currently, we use kcalloc to allocate rx/tx queues for a net device which could
> > be easily lead to a high order memory allocation request when initializing a
> > multiqueue net device. We can simply avoid this by switching to use flex array
> > which always allocate at order zero.
> >
> > Signed-off-by: Jason Wang
> > ---
> >  include/linux/netdevice.h |   13 ++++++----
> >  net/core/dev.c            |   57 ++++++++++++++++++++++++++++++++------------
> >  net/core/net-sysfs.c      |   15 +++++++----
> >  3 files changed, 58 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 09b4188..c0b5d04 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -32,6 +32,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -1230,7 +1231,7 @@ struct net_device {
> >
> >
> >  #ifdef CONFIG_RPS
> > -	struct netdev_rx_queue	*_rx;
> > +	struct flex_array	*_rx;
> >
> >  	/* Number of RX queues allocated at register_netdev() time */
> >  	unsigned int		num_rx_queues;
> > @@ -1250,7 +1251,7 @@ struct net_device {
> >  /*
> >   * Cache lines mostly used on transmit path
> >   */
> > -	struct netdev_queue	*_tx ____cacheline_aligned_in_smp;
> > +	struct flex_array	*_tx ____cacheline_aligned_in_smp;
> >
>
> Using flex_array and adding overhead in this super critical part of
> network stack, only to avoid order-1 allocations done in GFP_KERNEL
> context is simply insane.
>
> We can revisit this in 2050 if we ever need order-4 allocations or so,
> and still use 4K pages.
>
>

Well, KVM supports up to 160 VCPUs on x86.

Creating a queue per CPU is very reasonable, and assuming a cache line
size of 64 bytes, netdev_queue seems to be 320 bytes, so that's
320*160 = 51200 bytes. That is 12.5 pages, i.e. an order-4 allocation.
I agree most people don't have such systems yet, but they do exist.

We can cut the size of netdev_queue by moving kobj - which does not
seem to be used on the data path - out to a separate structure. It's
64 bytes in size, so that leaves exactly 256 bytes. That gets up to
128 queues into an order-3 allocation (160 queues is 256*160 = 40960
bytes, 10 pages, so still order-4), and there's some padding there so
we won't immediately increase it the moment we add some fields.

Comments on this idea?

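To make the size arithmetic above easy to re-check, here is a minimal user-space sketch (not kernel code). It assumes 4K pages, as in the thread; queue_array_order() and SKETCH_PAGE_SIZE are names made up for this illustration, and the helper only mimics what the kernel's get_order() computes for a contiguous array.

```c
#include <stddef.h>

/* Assumption: 4K pages, as discussed in the thread. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) bytes can hold nqueues entries of
 * entry_size bytes each in one contiguous allocation.
 */
static unsigned int queue_array_order(size_t entry_size, size_t nqueues)
{
	size_t total = entry_size * nqueues;
	size_t span = SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	/* Double the span until the whole array fits. */
	while (span < total) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Calling queue_array_order(320, 160) covers the 51200-byte case (12.5 pages) and comes out at order 4. With 256-byte entries, 160 queues need 40960 bytes (10 pages), which this calculation also rounds up to order 4; order 3 (32768 bytes) holds up to 128 such entries.
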
Instead of always using a flex array, we could have

+	struct netdev_queue	*_tx; /* Used with small # of queues */
+#ifdef CONFIG_NETDEV_HUGE_NUMBER_OR_QUEUES
+	struct flex_array	*_tx_large; /* Used with large # of queues */
+#endif

And fix wrappers to use _tx if not NULL, otherwise _tx_large.

If configured in, it's an extra branch on the data path but probably
less costly than the extra indirection.

-- 
MST
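A rough user-space sketch of the "use _tx if not NULL, otherwise _tx_large" wrapper idea follows. Everything here is hypothetical and only illustrates the shape of the branch: struct txq stands in for netdev_queue, struct txq_chunks is a simple table of page-sized chunks standing in for a flex array (it is not the kernel's flex_array API), and dev_sketch, dev_sketch_init(), dev_sketch_free() and sketch_get_tx_queue() are names invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct netdev_queue (real one is much larger). */
struct txq {
	unsigned long state;
};

#define CHUNK_BYTES	4096u	/* assumption: one 4K page per chunk */
#define TXQ_PER_CHUNK	(CHUNK_BYTES / sizeof(struct txq))

/* Hypothetical stand-in for a flex array: table of page-sized chunks. */
struct txq_chunks {
	size_t nchunks;
	struct txq **chunk;
};

struct dev_sketch {
	unsigned int num_tx_queues;
	struct txq *_tx;		/* used with small # of queues */
	struct txq_chunks *_tx_large;	/* used with large # of queues */
};

static void dev_sketch_free(struct dev_sketch *dev)
{
	free(dev->_tx);
	if (dev->_tx_large) {
		for (size_t i = 0; i < dev->_tx_large->nchunks; i++)
			free(dev->_tx_large->chunk[i]);
		free(dev->_tx_large->chunk);
		free(dev->_tx_large);
	}
	dev->_tx = NULL;
	dev->_tx_large = NULL;
}

/* Returns 0 on success, -1 on allocation failure. */
static int dev_sketch_init(struct dev_sketch *dev, unsigned int n, int large)
{
	dev->num_tx_queues = n;
	dev->_tx = NULL;
	dev->_tx_large = NULL;

	if (!large) {
		/* Small case: one contiguous array. */
		dev->_tx = calloc(n, sizeof(*dev->_tx));
		return dev->_tx ? 0 : -1;
	}

	/* Large case: many page-sized chunks, no big contiguous block. */
	size_t nchunks = (n + TXQ_PER_CHUNK - 1) / TXQ_PER_CHUNK;
	struct txq_chunks *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->nchunks = nchunks;
	c->chunk = calloc(nchunks, sizeof(*c->chunk));
	if (!c->chunk) {
		free(c);
		return -1;
	}
	dev->_tx_large = c;
	for (size_t i = 0; i < nchunks; i++) {
		c->chunk[i] = calloc(TXQ_PER_CHUNK, sizeof(struct txq));
		if (!c->chunk[i]) {
			dev_sketch_free(dev);
			return -1;
		}
	}
	return 0;
}

/* The wrapper: one extra branch, indirection only in the large case. */
static struct txq *sketch_get_tx_queue(const struct dev_sketch *dev,
				       unsigned int index)
{
	if (dev->_tx)
		return &dev->_tx[index];
	return &dev->_tx_large->chunk[index / TXQ_PER_CHUNK]
				     [index % TXQ_PER_CHUNK];
}
```

In the small configuration the lookup stays a plain array index behind one predictable branch; only devices set up with the chunked table pay for the second pointer dereference.
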