Date: Thu, 29 Sep 2022 03:19:14 -0400
From: "Michael S. Tsirkin"
To: Junichi Uekawa (上川純一)
Cc: Stefano Garzarella, Stefan Hajnoczi, Jason Wang, Eric Dumazet,
	davem@davemloft.net, netdev@vger.kernel.org, Jakub Kicinski,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	Paolo Abeni, linux-kernel@vger.kernel.org, Bobby Eshleman
Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.
Message-ID: <20220929031419-mutt-send-email-mst@kernel.org>
References: <20220928064538.667678-1-uekawa@chromium.org>
	<20220928082823.wyxplop5wtpuurwo@sgarzare-redhat>
	<20220928052738-mutt-send-email-mst@kernel.org>
	<20220928151135.pvrlsylg6j3hzh74@sgarzare-redhat>

On Thu, Sep 29, 2022 at 08:14:24AM +0900, Junichi Uekawa (上川純一) wrote:
> On Thu, Sep 29, 2022 at 0:11, Stefano Garzarella wrote:
> >
> > On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
> > >On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
> > >> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
> > >> > When copying a large file over sftp over vsock, the data size is
> > >> > usually 32kB, and kmalloc seems to fail when trying to allocate
> > >> > 32 32kB regions.
> > >> >
> > >> > Call Trace:
> > >> >  [] dump_stack+0x97/0xdb
> > >> >  [] warn_alloc_failed+0x10f/0x138
> > >> >  [] ? __alloc_pages_direct_compact+0x38/0xc8
> > >> >  [] __alloc_pages_nodemask+0x84c/0x90d
> > >> >  [] alloc_kmem_pages+0x17/0x19
> > >> >  [] kmalloc_order_trace+0x2b/0xdb
> > >> >  [] __kmalloc+0x177/0x1f7
> > >> >  [] ? copy_from_iter+0x8d/0x31d
> > >> >  [] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
> > >> >  [] vhost_worker+0xf7/0x157 [vhost]
> > >> >  [] kthread+0xfd/0x105
> > >> >  [] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
> > >> >  [] ? flush_kthread_worker+0xf3/0xf3
> > >> >  [] ret_from_fork+0x4e/0x80
> > >> >  [] ? flush_kthread_worker+0xf3/0xf3
> > >> >
> > >> > Work around by doing kvmalloc instead.
> > >> >
> > >> > Signed-off-by: Junichi Uekawa
> > >
> > >My worry here is that this is more of a workaround.
> > >It would be better to not allocate memory so aggressively:
> > >if we are so short on memory we should probably process
> > >packets one at a time. Is that very hard to implement?
> >
> > Currently the "virtio_vsock_pkt" is allocated in the "handle_kick"
> > callback of the TX virtqueue. The packet is then multiplexed onto the
> > right socket queue, and user space can de-queue it whenever it wants.
> >
> > So maybe we can stop processing the virtqueue if we are short on memory,
> > but when can we restart the TX virtqueue processing?
> >
> > I think as long as the guest used only 4K buffers we had no problem, but
> > now that it can create larger buffers the host may not be able to
> > allocate them contiguously. Since there is no need to have them
> > contiguous here, I think this patch is okay.
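(As an aside, on why kvmalloc() survives where kmalloc() fails here: it
tries a physically contiguous allocation first and falls back to
vmalloc(). A simplified sketch of the logic, not the actual mm/util.c
implementation; kvmalloc_sketch() is a made-up name:

	void *kvmalloc_sketch(size_t size, gfp_t flags)
	{
		void *p;

		/* Try physically contiguous memory first, but don't
		 * retry hard or warn: failure is expected under
		 * fragmentation for order-3 (32kB) requests.
		 */
		p = kmalloc(size, flags | __GFP_NOWARN | __GFP_NORETRY);
		if (p)
			return p;

		/* Fall back to virtually contiguous memory, which only
		 * needs single free pages.
		 */
		return __vmalloc(size, flags);
	}

Since vhost only touches pkt->buf through the CPU, e.g. via
copy_from_iter(), the vmalloc fallback is safe on this side.)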
> >
> > However, if we switch to sk_buff (as Bobby is already doing), maybe we
> > don't have this problem, because I think there is some kind of
> > pre-allocated pool.
> >
>
> Thank you for the review! I was wondering if this is a reasonable
> workaround (we found that this patch turns a reliably crashing system
> into a reliably surviving system).
>
> ... Sounds like it is a reasonable patch to backport to older kernels?

Hmm. Good point about stable. OK.

Acked-by: Michael S. Tsirkin

> > >> > ---
> > >> >
> > >> >  drivers/vhost/vsock.c                   | 2 +-
> > >> >  net/vmw_vsock/virtio_transport_common.c | 2 +-
> > >> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > >> > index 368330417bde..5703775af129 100644
> > >> > --- a/drivers/vhost/vsock.c
> > >> > +++ b/drivers/vhost/vsock.c
> > >> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> > >> >  		return NULL;
> > >> >  	}
> > >> >
> > >> > -	pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> > >> > +	pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
> > >> >  	if (!pkt->buf) {
> > >> >  		kfree(pkt);
> > >> >  		return NULL;
> > >> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > >> > index ec2c2afbf0d0..3a12aee33e92 100644
> > >> > --- a/net/vmw_vsock/virtio_transport_common.c
> > >> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > >> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
> > >> >
> > >> >  void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> > >> >  {
> > >> > -	kfree(pkt->buf);
> > >> > +	kvfree(pkt->buf);
> > >>
> > >> virtio_transport_free_pkt() is also used in virtio_transport.c and
> > >> vsock_loopback.c, where pkt->buf is allocated with kmalloc(), but IIUC
> > >> kvfree() can be used with that memory, so this should be fine.
> > >>
> > >> >  	kfree(pkt);
> > >> > }
> > >> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> > >> > --
> > >> > 2.37.3.998.g577e59143f-goog
> > >> >
> > >>
> > >> This issue should go away with Bobby's work on introducing sk_buff
> > >> [1], but we can queue this for now.
> > >>
> > >> I'm not sure if we should do the same also in the virtio-vsock driver
> > >> (virtio_transport.c). Here in vhost-vsock the allocated buf is only
> > >> used in the host, while in the virtio-vsock driver the buffer is
> > >> exposed to the device emulated in the host, so it should be physically
> > >> contiguous (if not, maybe we need to adjust virtio_vsock_rx_fill()).
> > >
> > >More importantly, it needs to support the DMA API, which IIUC kvmalloc
> > >memory does not.
> > >
> >
> > Right, good point!
> >
> > Thanks,
> > Stefano
> >
>
>
> --
> Junichi Uekawa
> Google
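(To make the guest-side concern concrete: in virtio_transport.c the rx
buffer is handed to the device through a scatterlist, which requires
linearly mapped, physically contiguous memory. Roughly what
virtio_vsock_rx_fill() does, heavily abridged, so treat it as a sketch
rather than the complete function:

	struct scatterlist hdr, buf, *sgs[2];

	/* Must stay kmalloc(): sg_init_one() uses virt_to_page(), which
	 * is invalid for vmalloc addresses, and the device expects each
	 * sg entry to be one physically contiguous region.
	 */
	pkt->buf = kmalloc(buf_len, GFP_KERNEL);
	if (!pkt->buf)
		goto out;

	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
	sgs[0] = &hdr;

	sg_init_one(&buf, pkt->buf, buf_len);
	sgs[1] = &buf;

	/* 0 out-sgs, 2 in-sgs: the device writes into hdr and buf. */
	ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);

That, plus the DMA API point, is why the kvmalloc() change is limited
to the vhost side.)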