Date: Wed, 27 Jul 2022 14:37:10 +0200
From: Stefano Garzarella
To: Arseniy Krasnov, Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
Cc: "David S. Miller", "edumazet@google.com", Jakub Kicinski, Paolo Abeni, Stefan Hajnoczi, "Michael S. Tsirkin", "kys@microsoft.com", "haiyangz@microsoft.com", "sthemmin@microsoft.com", "wei.liu@kernel.org", Dexuan Cui, Krasnov Arseniy, "virtualization@lists.linux-foundation.org", "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org", "linux-hyperv@vger.kernel.org", "kvm@vger.kernel.org", kernel
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling
Message-ID: <20220727123710.pwzy6ag3gavotxda@sgarzare-redhat>
In-Reply-To: <19e25833-5f5c-f9b9-ac0f-1945ea17638d@sberdevices.ru>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Arseniy,

On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>Hello,
>
>This patchset includes some updates for SO_RCVLOWAT:
>
>1) af_vsock:
>   During my experiments with zerocopy receive, i found, that in some
>   cases, poll() implementation violates POSIX: when socket has non-
>   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>   POLLRDNORM bits in 'revents' even number of bytes available to read
>   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>   POLLIN flag and then tries to read data(for example using 'read()'
>   call), but read call will be blocked, because SO_RCVLOWAT logic is
>   supported in dequeue loop in af_vsock.c. But the same time, POSIX
>   requires that:
>
>   "POLLIN     Data other than high-priority data may be read without
>               blocking.
>    POLLRDNORM Normal data may be read without blocking."
>
>   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
>   So, we have, that poll() syscall returns POLLIN, but read call will
>   be blocked.
>
>   Also in man page socket(7) i found that:
>
>   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>   socket as readable only if at least SO_RCVLOWAT bytes are available."
>
>   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>   this case for TCP socket, it works as POSIX required.
>
>   I've added some fixes to af_vsock.c and virtio_transport_common.c,
>   test is also implemented.
>
>2) virtio/vsock:
>   It adds some optimization to wake ups, when new data arrived. Now,
>   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>   There is no sense, to kick waiter, when number of available bytes
>   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>   exit from poll() will be "spurious". This logic is also used in TCP
>   sockets.

Nice, it looks good!

>
>3) vmci/vsock:
>   Same as 2), but i'm not sure about this changes. Will be very good,
>   to get comments from someone who knows this code.

I CCed the VMCI maintainers on the patch and also on this cover letter;
it is probably better to keep them in the loop for the next versions.

(Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
Bryan, Vishnu, and pv-drivers@vmware.com)

>
>4) Hyper-V:
>   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>   support SO_RCVLOWAT, so he suggested to disable this feature for
>   Hyper-V.
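
For readers of the archive who want to check the TCP reference behaviour
cited in item 1, a rough user-space sketch follows. It is not part of
the patch series: the helper names (poll_readable, rcvlowat_demo) are
made up for illustration, it uses a TCP loopback connection rather than
AF_VSOCK, and it assumes a platform where Python exposes select.poll()
and SO_RCVLOWAT (Linux, for example).

```python
import select
import socket


def poll_readable(sock, timeout_ms):
    """Return True if poll() reports POLLIN for sock within timeout_ms."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(ev & select.POLLIN for _fd, ev in events)


def rcvlowat_demo(lowat=4):
    """Hypothetical helper: poll a TCP socket below and at SO_RCVLOWAT.

    Returns (readable_below_mark, readable_at_mark).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    receiver = None
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        sender.connect(listener.getsockname())
        # Send small writes immediately instead of letting Nagle batch them.
        sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        receiver, _addr = listener.accept()

        # Ask the kernel to treat the socket as readable only once
        # at least `lowat` bytes are queued.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, lowat)

        # One byte short of the low-water mark.
        sender.sendall(b"x" * (lowat - 1))
        below = poll_readable(receiver, 200)

        # The missing byte brings the queue up to the mark.
        sender.sendall(b"x")
        at_mark = poll_readable(receiver, 1000)
        return below, at_mark
    finally:
        for s in (receiver, sender, listener):
            if s is not None:
                s.close()


if __name__ == "__main__":
    print(rcvlowat_demo())
```

Going by the socket(7) text quoted above, a Linux TCP socket should
report "not readable" for the first poll and "readable" for the second.
The same experiment on an AF_VSOCK socket is what item 1 describes as
misbehaving before the fix.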
I left a couple of comments on some patches, but the series seems to me
to be in a good state :-)

I would just suggest a bit of a re-organization of the series (the
patches are fine, just the order):
   - introduce vsock_set_rcvlowat()
   - disable it for hv_sock
   - use 'target' in virtio transports
   - use 'target' in vmci transports
   - use sock_rcvlowat in vsock_poll()
     I think it is better to pass sock_rcvlowat() as 'target' once the
     transports are already able to use it
   - add vsock_data_ready()
   - use vsock_data_ready() in virtio transports
   - use vsock_data_ready() in vmci transports
   - tests

What do you think?

Thanks,
Stefano
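
The wake-up rule discussed in this thread (do not kick a waiter while
fewer than SO_RCVLOWAT bytes are queued) can be modelled with a toy
simulation. This is only a sketch under stated assumptions: it is plain
Python, not kernel code, should_wake_reader() and simulate() are
hypothetical names, the reader is assumed to drain the whole queue once
the mark is reached, and a low-water mark of 0 is treated as 1.

```python
def should_wake_reader(bytes_available, rcvlowat):
    """Wake the reader only when the low-water mark is reached."""
    # Assumption: a mark of 0 behaves like 1 (at least one byte needed).
    return bytes_available >= max(rcvlowat, 1)


def simulate(packet_sizes, rcvlowat, check_lowat):
    """Count wake-ups and spurious wake-ups for a stream of packets.

    check_lowat=False models "wake on every packet";
    check_lowat=True models "consider SO_RCVLOWAT before waking".
    Returns (wakeups, spurious).
    """
    target = max(rcvlowat, 1)
    available = 0
    wakeups = 0
    spurious = 0
    for size in packet_sizes:
        available += size
        if check_lowat and not should_wake_reader(available, rcvlowat):
            continue  # not enough data yet, let the reader sleep
        wakeups += 1
        if available >= target:
            available = 0  # reader drains the queue
        else:
            spurious += 1  # reader woke up but has to go back to sleep
    return wakeups, spurious


if __name__ == "__main__":
    packets = [2, 2, 2, 2, 2, 2]
    print(simulate(packets, rcvlowat=6, check_lowat=False))
    print(simulate(packets, rcvlowat=6, check_lowat=True))
```

With six 2-byte packets and a mark of 6, the unconditional variant wakes
the reader six times, four of them for nothing, while the variant that
checks the mark wakes it twice.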