Received: by 2002:ac0:a679:0:0:0:0:0 with SMTP id p54csp139584imp; Tue, 19 Feb 2019 20:13:02 -0800 (PST) X-Google-Smtp-Source: AHgI3IakvsNHrM6K8SRqr096Ek58YFBrqDk40DNYklCbcYo4y6uMKbbWRIYeiKPhz1k9oHecivfz X-Received: by 2002:a62:6dc7:: with SMTP id i190mr32771708pfc.166.1550635982642; Tue, 19 Feb 2019 20:13:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1550635982; cv=none; d=google.com; s=arc-20160816; b=xaeubAWRmNlY5iK7Sulq/q76/dDcwhtaekLTgjys9dyQizvP47lVBcbPjkMS0Cu/iU XS4XX+wczMea8bUqGAY/6n3yGegQ5PUglZ7wNz+CnX67ojwndbn+/bfqmjXl26uIl6Fm 7RueFu4n/w1nXRoY9PHMSffsbD0trmxBtOKoKC4qIOSM3vtPbDOKOjVMCxVEOD1e0ZId 1gIJMjanvvOkSkc+TAFABryUaNJr/tdIcNeFWoIQskG0vkLrC3C8T0MIwFj+qjfF00YY mxntC/2qaRJC4f+mcZX4G4kiR4iT+P8ZVnnLuFMo82BAqB3e02cgtJ1mupRooHd+0eQq rzQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=3VsiShhdjz2x9LyiqaaO6C1VSvDuFgRlGDMgvo0bBb8=; b=gynIFE+0y0Gf8ifT5xr99DW/7eSwUVQvTouvdF2e+Ds7n3CIKGjWbnDeYgOQvJ+dJQ fc2RSxxuKS4iK+hYwHsq1Oit2cdkYqAXOKQaU6Gc42H8XFqJhyKM5xR4MtLD2mTfSC0v fleMJIMvxTD2aFFo5bec+WNKCReQf8VfEQVNwVrYMv154WpOla2fzys5NXS5W2cVGBhq vEj7o6uxKuYMvtl2znO4C5U0tJPSX4naiw9rNpBAHyvzGAN7XkokP4COvf8x4XTlcnfL ZGFnfcC0JnFCbopdT64yqoWsscRU5ohzI1cC8/8ou+Vb2+fsZu8lYygc42a4+Ja1I/XP qVgg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b30si19655500pla.285.2019.02.19.20.12.47; Tue, 19 Feb 2019 20:13:02 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730376AbfBTEMN (ORCPT + 99 others); Tue, 19 Feb 2019 23:12:13 -0500 Received: from nautica.notk.org ([91.121.71.147]:39353 "EHLO nautica.notk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727085AbfBTEMN (ORCPT ); Tue, 19 Feb 2019 23:12:13 -0500 Received: by nautica.notk.org (Postfix, from userid 1001) id 69EECC009; Wed, 20 Feb 2019 05:12:07 +0100 (CET) Date: Wed, 20 Feb 2019 05:11:52 +0100 From: Dominique Martinet To: Tom Herbert Cc: Tom Herbert , David Miller , Doron Roberts-Kedes , Dave Watson , Linux Kernel Network Developers , LKML Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages Message-ID: <20190220041151.GA13520@nautica> References: <20180917.184502.447385458615284933.davem@davemloft.net> <20180918015723.GA26300@nautica> <20181031025657.GA17861@nautica> <20190215010029.GA10899@nautica> <20190215015705.GA17974@nautica> <20190215033102.GA3099@nautica> <20190215045214.GA13123@nautica> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20190215045214.GA13123@nautica> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dominique Martinet wrote on Fri, Feb 15, 2019: > With all that said I guess my patch should work correctly then, I'll try > to find some time to check the error does come back up the tcp socket in > my reproducer but I have no reason to believe it doesn't. Ok, so I can confirm this part - the 'csock' does come back up with POLLERR if the parse function returns ENOMEM in the current code. It also comes back up with POLLERR when the remote side closes the connection, which is expected, but I'm having a very hard time understanding how an application is supposed to deal with these POLLERR after having read the documentation and a bit of experimentation. I'm not sure how much it would matter for real life (if the other end closes the connection most servers would not care about what they said just before closing, but I can imagine some clients doing that in real life e.g. a POST request they don't care if it succeeds or not)... My test program works like this: - client connects, sends 10k messages and close()s the socket - server loops recving and close()s after 10k messages; it used to be recvmsg() directly but it's now using poll then recvmsg. When the client closes the socket, some messages are obviously still "in flight", and the server will recv a POLLERR notification on the csock at some point with many messages left. The documentation says to unattach the csock when you get POLLER. If I do that, the kcm socket will no longer give me any message, so all the messages still in flight at the time are lost. If I just ignore the csock like I used to, all the messages do come just fine, but as said previously on a real error this will just make recvmsg or the polling hang forever and I see no way to distinguish a "real" error vs. a connection shut down from the remote side with data left in the pipe. I thought of checking POLLERR on csock and POLLIN not set on kcmsock, but even that seems to happen fairly regularily - the kcm sock hasn't been filled up, it's still reading from the csock. On the other hand, checking POLLIN on the csock does say there is still data left, so I know there is data left on the csock, but this is also the case on a real error (e.g. if parser returns -ENOMEM) ... And this made me try to read from the csock after detaching it and I can resume manual tcp parsing for a few messages until read() fails with EPROTO ?! and I cannot seem to be able to get anything out of attaching it back to kcm (for e.g. an ENOMEM error that was transient)... I'm honestly not sure how the POLLERR notification mechanism works but I think it would be much easier to use KCM if we could somehow delay that error until KCM is done feeding from the csock (when netparser really stops reading from it like on real error, e.g. abort callback maybe?) I think it's fine if the csock is closed before the kcm sock message is read, but we should not lose messages like this. > I'd like to see some retry on ENOMEM before this is merged though, so > while I'm there I'll resend this with a second patch doing that > retry,.. I think just not setting strp->interrupted and not reporting > the error up might be enough? Will have to try either way. I also tried playing with that without much success. I had assumed just not calling strp_parser_err() (which calls the abort_parser cb) would be enough, eventually calling strp_start_timer() like the !len case, but no can do. With that said, returning 0 from the parse function also raises POLLERR on the csock and hangs netparser, so things aren't that simple... I could use a bit of help again. Thanks, -- Dominique