Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751346AbdLZTFk (ORCPT ); Tue, 26 Dec 2017 14:05:40 -0500
Received: from resqmta-ch2-04v.sys.comcast.net ([69.252.207.36]:35382 "EHLO resqmta-ch2-04v.sys.comcast.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750854AbdLZTFi (ORCPT ); Tue, 26 Dec 2017 14:05:38 -0500
Date: Tue, 26 Dec 2017 13:05:34 -0600 (CST)
From: Christopher Lameter
X-X-Sender: cl@nuc-kabylake
To: kemi
cc: Michal Hocko, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka, Mel Gorman, Johannes Weiner, YASUAKI ISHIMATSU, Andrey Ryabinin, Nikolay Borisov, Pavel Tatashin, David Rientjes, Sebastian Andrzej Siewior, Dave, Andi Kleen, Tim Chen, Jesper Dangaard Brouer, Ying Huang, Aaron Lu, Aubrey Li, Linux MM, Linux Kernel
Subject: Re: [PATCH v2 3/5] mm: enlarge NUMA counters threshold size
In-Reply-To: <9fb9af97-167c-6a0b-ded1-2790113ece9a@intel.com>
References: <1513665566-4465-1-git-send-email-kemi.wang@intel.com> <1513665566-4465-4-git-send-email-kemi.wang@intel.com> <20171219124045.GO2787@dhcp22.suse.cz> <439918f7-e8a3-c007-496c-99535cbc4582@intel.com> <20171220101229.GJ4831@dhcp22.suse.cz> <268b1b6e-ff7a-8f1a-f97c-f94e14591975@intel.com> <9fb9af97-167c-6a0b-ded1-2790113ece9a@intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1924
Lines: 46

On Fri, 22 Dec 2017, kemi wrote:

> > I think you are fighting a lost battle there. As evident from the timing
> > constraints on packet processing in a 10/40G you will have a hard time to
> > process data if the packets are of regular ethernet size. And we already
> > have 100G NICs in operation here.
> >
>
> Not really.
> For 10/40G NIC or even 100G, I admit DPDK is widely used in data center network
> rather than kernel driver in production environment.

Shudder. I would rather have a user space API that is vendor neutral and
that allows the use of multiple NICs. The Linux kernel has an RDMA
subsystem that does just that.

But the time budget is difficult to deal with even using RDMA or DPDK, where
we can avoid the OS overhead.

> That's due to the slow page allocator and long pipeline processing in network
> protocol stack.

Right, the timing budget there for processing a single packet gets below a
microsecond at some point, and there it's going to be difficult to do much.
Some aggregation / offloading is required, and that increases as speeds
become higher.

> That's not easy to change this state in short time, but if we can do something
> here to change it a little, why not.

How much of an improvement is this going to be? If it is significant then
by all means let's do it.

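For a rough sense of the per-packet timing budget discussed above, here is a minimal sketch of the arithmetic. It is not taken from the patch or from any kernel code. It assumes standard Ethernet framing (20 extra bytes on the wire per frame for preamble, start-of-frame delimiter and inter-frame gap), and the helper name packet_budget_ns is made up for illustration.

```python
# Per-packet time budget at wire speed (illustrative arithmetic only).
# Assumption: standard Ethernet framing, where every frame costs an extra
# 20 bytes on the wire: 7 preamble + 1 start-of-frame delimiter
# + 12 inter-frame gap.

WIRE_OVERHEAD_BYTES = 20


def packet_budget_ns(frame_bytes: int, link_gbps: float) -> float:
    """Return the time in nanoseconds that one frame occupies on the wire."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    # 1 Gbit/s is one bit per nanosecond, so bits / (Gbit/s) gives nanoseconds.
    return bits_on_wire / link_gbps


if __name__ == "__main__":
    for gbps in (10, 40, 100):
        full = packet_budget_ns(1518, gbps)   # frame carrying a 1500-byte MTU
        small = packet_budget_ns(64, gbps)    # minimum-size frame
        print(f"{gbps:>3}G: {full:7.1f} ns per 1518-byte frame, "
              f"{small:5.2f} ns per 64-byte frame")
```

Under these assumptions a full-size frame leaves roughly 1230 ns at 10G, 308 ns at 40G and 123 ns at 100G, so the budget drops below a microsecond somewhere between 10G and 40G even for regular-size packets. Minimum-size frames leave far less (about 67 ns at 10G).
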
> > We can try to get the performance as high as possible but full rate high
> > speed networking invariably must use offload mechanisms and thus the
> > statistics would only be available from the hardware devices that can do
> > wire speed processing.
> >
>
> I think you may be talking something about SmartNIC (e.g. OpenVswitch offload +
> VF pass through). That's usually used in virtualization environment to eliminate
> the overhead from device emulation and packet processing in software virtual
> switch(OVS or linux bridge).

The switch offloads can also be used elsewhere. Also, the RDMA subsystem has
counters like that.