Date: Fri, 9 Jan 2009 08:42:17 +0100
From: Willy Tarreau
To: Eric Dumazet
Cc: David Miller, ben@zeus.com, jarkao2@gmail.com, mingo@elte.hu,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org, jens.axboe@oracle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once
Message-ID: <20090109074217.GB27758@1wt.eu>
In-Reply-To: <4966FC89.8040006@cosmosbay.com>

On Fri, Jan 09, 2009 at 08:28:09AM +0100, Eric Dumazet wrote:
> My point is to use Gigabit links or 10Gb links and hundreds or thousands
> of flows :)
>
> But if it doesn't work on a single flow, it won't work on many :)

Yes it will, precisely because during the time you spend processing flow #1,
you're still receiving data for flow #2. I really invite you to try: that's
what I have been observing over years of writing userland proxies.

> I tried my test program with a Gb link, one flow, and got splice() calls
> returning 23000 bytes on average, using a little too much CPU. If poll()
> could wait a little bit more, CPU could be available for other tasks.

I also observe 23000 bytes on average on gigabit, which is very good (only
about 5000 calls per second). The CPU usage is lower than with recv/send,
and I'd like to be able to run some profiling, because I have observed very
different performance patterns depending on the network cards used.
Generally, almost all the time is spent in softirqs.

It's easy to make poll() wait a little bit longer: call it later and do your
work before calling it. Also, epoll_wait() lets you ask it to return only a
limited number of FDs, which really improves data gathering. I generally
observe the best performance with 30-200 FDs per call, even with 10000
concurrent connections. During the time I process the first 200 FDs, data
keeps accumulating in the other sockets' buffers.

> If the application uses setsockopt(sock, SOL_SOCKET, SO_RCVLOWAT, [32768], 4),
> it would be good if the kernel was smart enough and could reduce the number
> of wakeups.

Yes, I agree with that. But my comment was about not making this behaviour
mandatory for splice(). Letting the application choose is the way to go, of
course.

> (Next blocking point is the fixed limit of 16 pages per pipe, but
> that's another story)

Yes, but it's not always easy to guess how much data you can feed into the
pipe. It seems that, depending on how the segments are gathered, you can
store anywhere between 16 segments and 64 kB. I have observed cases in
blocking mode where I could not push more than a few kilobytes with a small
MSS, suggesting that each of those segments occupied a distinct page. I
don't know precisely how that is handled internally.
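For what it's worth, here is a minimal sketch of the splice() forwarding step
I am describing (socket -> pipe -> socket). It is only an illustration, not
code from the patch; the function name and the 16-page/64 kB assumption are
mine:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define PIPE_CAP (16 * 4096)	/* default pipe capacity: 16 pages of 4 kB */

/* Move up to PIPE_CAP bytes from one socket to another through a pipe.
 * Returns the number of bytes forwarded, 0 on EOF, -1 on error.
 */
static ssize_t splice_once(int from_fd, int to_fd, int pipefd[2])
{
	ssize_t in, out, n;

	/* socket -> pipe: grab whatever is available, up to the pipe size */
	in = splice(from_fd, NULL, pipefd[1], NULL, PIPE_CAP,
		    SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
	if (in <= 0)
		return in;

	/* pipe -> socket: push everything we just queued in the pipe */
	for (out = 0; out < in; out += n) {
		n = splice(pipefd[0], NULL, to_fd, NULL, in - out,
			   SPLICE_F_MOVE | SPLICE_F_MORE);
		if (n <= 0)
			return -1;
	}
	return out;
}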
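And a minimal sketch of the event loop around it, capping the number of FDs
returned per epoll_wait() call as described above. proxy_one_fd() stands for
whatever per-connection handler you use (it could call splice_once() from the
previous sketch); the 200-event batch is just the upper end of the range I
mentioned:

#include <sys/epoll.h>

#define BATCH 200		/* 30-200 events per call usually behaves best */

void proxy_one_fd(int fd);	/* hypothetical per-connection handler */

void event_loop(int epfd)
{
	struct epoll_event ev[BATCH];
	int i, n;

	for (;;) {
		/* block until some FDs are ready, but take at most BATCH of
		 * them; the others keep accumulating data in the meantime
		 */
		n = epoll_wait(epfd, ev, BATCH, -1);
		for (i = 0; i < n; i++)
			proxy_one_fd(ev[i].data.fd);
	}
}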
> >> About tcp_recvmsg(), we might also remove the "!timeo" test as well;
> >> more testing is needed.
> >
> > No, right now we can't (we would at least have to move it somewhere
> > else), because once at least one byte has been received (copied != 0),
> > no other check will break out of the loop (or at least I have not found
> > one).
>
> Of course we can't remove the test entirely, but we can change the logic
> so that several skbs might be used/consumed per tcp_recvmsg() call, like
> your patch did for splice().

OK, I initially understood that you were suggesting we could simply remove
it, as I did for splice.

> Let's focus on functional changes, not on implementation details :)

Agreed :-)

Willy