From: Stefano Garzarella
Date: Thu, 6 Jul 2023 12:01:38 +0200
Subject: Re: Hyper-V vsock streams do not fill the supplied buffer in full
To: Gary Guo, Dexuan Cui
Cc: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, linux-hyperv@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230704234532.532c8ee7.gary@garyguo.net>

Hi Gary,

On Wed, Jul 5, 2023 at 12:45 AM Gary Guo wrote:
>
> When a vsock stream is called with recvmsg with a buffer, it only fills
> the buffer with data from the first single VM packet. Even if there are
> more VM packets at the time and the buffer is still not completely
> filled, it will just leave the buffer partially filled.
>
> This causes some issues in WSLD, which uses the vsock in
> non-blocking mode and uses epoll.
>
> For stream-oriented sockets, the epoll man page [1] says that
>
> > For stream-oriented files (e.g., pipe, FIFO, stream socket),
> > the condition that the read/write I/O space is exhausted can
> > also be detected by checking the amount of data read from /
> > written to the target file descriptor. For example, if you
> > call read(2) by asking to read a certain amount of data and
> > read(2) returns a lower number of bytes, you can be sure of
> > having exhausted the read I/O space for the file descriptor.
>
> This has been used as an optimisation in the wild for reducing the number
> of syscalls required for stream sockets (by asserting that the socket
> will not have to be polled to EAGAIN in edge-trigger mode, if the buffer
> given to recvmsg is not filled completely). An example is Tokio,
> starting in v1.21.0 [2].
>
> When this optimisation combines with the behaviour of Hyper-V vsock, it
> causes an issue in this scenario:
> * the VM host sends data to the guest, and it's split into multiple
>   VM packets
> * sk_data_ready is called and epoll returns, notifying the userspace
>   that the socket is ready
> * userspace calls recvmsg with a buffer, and it's partially filled
> * userspace assumes that the stream socket is depleted, and if new data
>   arrives epoll will notify it again.
> * kernel always considers the socket to be ready, and since it's in
>   edge-trigger mode, the epoll instance will never be notified again.
>
> This different understanding of readiness causes the userspace to
> block forever.

Thanks for the detailed description of the problem.

I think we should fix hvs_stream_dequeue() in
net/vmw_vsock/hyperv_transport.c.
We can do something similar to what we do in
virtio_transport_stream_do_dequeue() in
net/vmw_vsock/virtio_transport_common.c.

@Dexuan WDYT?

Thanks,
Stefano
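
To make the idea concrete, here is a toy userspace model of the two
behaviours. It is only a sketch and not kernel code: struct model_pkt,
struct model_queue, dequeue_single() and dequeue_fill() are names made up
for this example and do not exist in the vsock transports. dequeue_single()
mimics the reported behaviour (copy from the first pending packet only),
while dequeue_fill() keeps copying from further packets until the buffer
is full or nothing is pending.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one VM packet queued on the socket. */
struct model_pkt {
	const char *data;	/* payload */
	size_t len;		/* payload length */
	size_t off;		/* bytes already consumed by earlier reads */
};

/* Hypothetical receive queue: packets [head, count) are still pending. */
struct model_queue {
	struct model_pkt *pkts;
	size_t count;
	size_t head;
};

/* Reported behaviour: copy from the first pending packet only. */
static size_t dequeue_single(struct model_queue *q, char *buf, size_t len)
{
	struct model_pkt *p;
	size_t n;

	assert(q != NULL && buf != NULL);
	if (q->head == q->count)
		return 0;		/* nothing pending */

	p = &q->pkts[q->head];
	n = p->len - p->off;
	if (n > len)
		n = len;		/* caller's buffer is the limit */
	memcpy(buf, p->data + p->off, n);
	p->off += n;
	if (p->off == p->len)
		q->head++;		/* packet fully consumed */
	return n;
}

/* Sketched fix: keep going until buf is full or the queue is empty. */
static size_t dequeue_fill(struct model_queue *q, char *buf, size_t len)
{
	size_t total = 0;

	while (total < len && q->head < q->count)
		total += dequeue_single(q, buf + total, len - total);
	return total;
}
```

With two pending packets of 3 and 4 bytes and a 16-byte buffer,
dequeue_single() returns 3 even though 4 more bytes are queued, whereas
dequeue_fill() returns 7. In this model a short return from dequeue_fill()
always means the queue is empty, which is the property the edge-triggered
epoll users described above rely on.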