Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp5294484rwr; Mon, 1 May 2023 03:34:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7r1qKC3xpULd9YoG5snNGX4J9CIPnRJGrRLpBjxyqcWheT+FEDh3Z2lCt0G74E5/WBkZEM X-Received: by 2002:a05:6a21:1509:b0:ef:a696:9940 with SMTP id nq9-20020a056a21150900b000efa6969940mr13256432pzb.15.1682937291165; Mon, 01 May 2023 03:34:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682937291; cv=none; d=google.com; s=arc-20160816; b=HbLNo03+rMrAGO2a76mWSar0tWrtmI9PIfxFY/O4+31l5PcBDyUGqoLMeXRjI/cYkc H3wEuRgERjieSv/Rc2+qPr52btkoD6ciZlms34SGY4Rj52j0HIdbFTvj6aTbyPHdlxcE R69NaBSCs3zTkrAmNNqgOVYn0tJYGnWHgwp9C7PAexNGTQruLtjWKzQTw/N+juygIFLJ Sw1wGfa5lKKxy1pGkA3SAtI8U9cpQopY2Fnig3b0+c3puaWo/ubB11B9BtBr1ZpbiDek Dz25VPxQ+ljzNp0cTp4/DNBViL6VPbE39m20T/pkLUAYUGiyDSnASEZehfF1hsZMu8GD asRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=xqEYTLc79qp9oTPQxNKqZLuaG3/dvV631bV4KWP+w7E=; b=isUfiYf2mpdkdpMoF1jdfsaqNI9NpuC2C2qRY1uldRACc3X8sgWM9AzX8TTgOEkeMk T5N9WTRDMdlneg4fN8v5K0nVSLT9/C2Y/AJXdnGp8DJublX1vkk7023jOBkR6qbK4aVH tIWGgk8/l8cW0uppIuLVCrSPjYKIV0lFcArWBx0WnsFlcA64ykd7hPARwMIArafjSiTe sfoGCBstyeWqsz5oGI0hjOj569ltVtavKBIDeHchwMJQG9Du9aitz007OXzp/LgItLsd B7kU4zKcc65ggwpNoBTexdYP+IaUrC+Xz3UZnVhmMZ6kdPM55SYqrNt6ubAVim8lqdt2 KH1w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=D49GZzgd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 20-20020a630a14000000b00517f7c24664si28716006pgk.491.2023.05.01.03.34.40; Mon, 01 May 2023 03:34:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=D49GZzgd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232231AbjEAKWB (ORCPT + 99 others); Mon, 1 May 2023 06:22:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229679AbjEAKWA (ORCPT ); Mon, 1 May 2023 06:22:00 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CAB710E5 for ; Mon, 1 May 2023 03:21:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1682936459; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=xqEYTLc79qp9oTPQxNKqZLuaG3/dvV631bV4KWP+w7E=; b=D49GZzgdH3Z7a2EX/fXuL7zDJ0NgpMqd5qeUPRQv2WJVL6jNQMAc8OTo9Qs/bM9qcgZ2ki 7a3ZGQKvMiWqeS/ozIGmkLtB6qmMYuOHdnyXpE43HDmaJ857IleHLIB9i+EwINQgUrlaHf O13d90aqhAHoEWQF35kJh5ClG77UpuE= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-146-U2lkcp7qPlygcBF5lphqRw-1; Mon, 01 May 2023 06:20:58 -0400 X-MC-Unique: U2lkcp7qPlygcBF5lphqRw-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-3f173bd0d1bso14191225e9.3 for ; Mon, 01 May 2023 03:20:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682936456; x=1685528456; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=xqEYTLc79qp9oTPQxNKqZLuaG3/dvV631bV4KWP+w7E=; b=lOnbG//o8nsdHsxNJ7aAQV1/FLI2SBbnNi15OEBYrR32/upeSaXIiU4FTNfxroFy84 Ycp12Px5nKm7Ph5clpk04aUrixmSt7p8X0BTbpdXi1oQkeRB7EmWyH9eSwQZckDdk5ZJ LxFNsg5hqjvhlNSfzNVWhYuJlrqchfdwkHdQkWiMfEmDh6Kbtujzqr2lWncEiccc7fQ1 XCBU5i/6WQAhzVkoYcp8kqZcIKi0AXpxjpKgV+FM+Ti3sNoihEA0NaVFnbtT0bfvGpSE Znls0Dpt/ylnPEtTpSYb9RGSE/0QDeH2fJ1Z9s1rWDJ5p5lb/m8xjC4neigfXVhV5Kky +y8w== X-Gm-Message-State: AC+VfDyXYwMsjJfaWUms7qCxoouCcBpWpP6MpZdX3DwVueQgyf8wIDBj 5ec6eNYL3xTMpMUnSULtsumkDV2B7ZoSv5AFtTx5OZixUj/LbGSfkAKG1kWdIhAaitQWR7geCTZ GfarrbOZnER57bi8Zyw50W9P0lL2Lii5f X-Received: by 2002:a1c:ed0e:0:b0:3f2:5999:4f2b with SMTP id l14-20020a1ced0e000000b003f259994f2bmr9139286wmh.33.1682936456679; Mon, 01 May 2023 03:20:56 -0700 (PDT) X-Received: by 2002:a1c:ed0e:0:b0:3f2:5999:4f2b with SMTP id l14-20020a1ced0e000000b003f259994f2bmr9139277wmh.33.1682936456371; Mon, 01 May 2023 03:20:56 -0700 (PDT) Received: from redhat.com ([2a06:c701:742c:c300:3695:a81b:6f0b:8940]) by smtp.gmail.com with ESMTPSA id k6-20020a5d5186000000b003062b57ffd1sm3366770wrv.50.2023.05.01.03.20.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 May 2023 03:20:55 -0700 (PDT) Date: Mon, 1 May 2023 06:20:52 -0400 From: "Michael S. Tsirkin" To: Alvaro Karsz Cc: "jasowang@redhat.com" , "davem@davemloft.net" , "edumazet@google.com" , "kuba@kernel.org" , "pabeni@redhat.com" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "xuanzhuo@linux.alibaba.com" Subject: Re: [RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 Message-ID: <20230501061401-mutt-send-email-mst@kernel.org> References: <20230430131518.2708471-1-alvaro.karsz@solid-run.com> <20230430131518.2708471-3-alvaro.karsz@solid-run.com> <20230430093009-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Apr 30, 2023 at 06:54:08PM +0000, Alvaro Karsz wrote: > > > At the moment, if a network device uses vrings with less than > > > MAX_SKB_FRAGS + 2 entries, the device won't be functional. > > > > > > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always > > > evaluate to false, leading to TX timeouts. > > > > > > This patch introduces a new variable, single_pkt_max_descs, that holds > > > the max number of descriptors we may need to handle a single packet. > > > > > > This patch also detects the small vring during probe, blocks some > > > features that can't be used with small vrings, and fails probe, > > > leading to a reset and features re-negotiation. > > > > > > Features that can't be used with small vrings: > > > GRO features (VIRTIO_NET_F_GUEST_*): > > > When we use small vrings, we may not have enough entries in the ring to > > > chain page size buffers and form a 64K buffer. > > > So we may need to allocate 64k of continuous memory, which may be too > > > much when the system is stressed. > > > > > > This patch also fixes the MTU size in small vring cases to be up to the > > > default one, 1500B. > > > > and then it should clear VIRTIO_NET_F_MTU? > > > > Following [1], I was thinking to accept the feature and a let the device figure out that it can't transmit a big packet, since the RX buffers are not big enough (without VIRTIO_NET_F_MRG_RXBUF). > But, I think that we may need to block the MTU feature after all. > Quoting the spec: > > A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it. > If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive buffers to receive at least one receive packet of size mtu (plus low level ethernet header length) with gso_type NONE or ECN. > > So, if VIRTIO_NET_F_MTU is negotiated, we MUST supply enough receive buffers. > So I think that blocking VIRTIO_NET_F_MTU should be the way to go, If mtu > 1500. > > [1] https://lore.kernel.org/lkml/20230417031052-mutt-send-email-mst@kernel.org/ First up to 4k should not be a problem. Even jumbo frames e.g. 9k is highly likely to succeed. And a probe time which is often boot even 64k isn't a problem ... Hmm. We could allocate large buffers at probe time. Reuse them and copy data over. IOW reusing GOOD_COPY_LEN flow for this case. Not yet sure how I feel about this. OTOH it removes the need for the whole feature blocking approach, does it not? WDYT? > > > + /* How many ring descriptors we may need to transmit a single packet */ > > > + u16 single_pkt_max_descs; > > > + > > > + /* Do we have virtqueues with small vrings? */ > > > + bool svring; > > > + > > > /* CPU hotplug instances for online & dead */ > > > struct hlist_node node; > > > struct hlist_node node_dead; > > > > worth checking that all these layout changes don't push useful things to > > a different cache line. can you add that analysis? > > > > Good point. > I think that we can just move these to the bottom of the struct. > > > > > I see confusiong here wrt whether some rings are "small"? all of them? > > some rx rings? some tx rings? names should make it clear. > > The small vring is a device attribute, not a vq attribute. It blocks features, which affects the entire device. > Maybe we can call it "small vring mode". > > > also do we really need bool svring? can't we just check single_pkt_max_descs > > all the time? > > > > We can work without the bool, we could always check if single_pkt_max_descs != MAX_SKB_FRAGS + 2. > It doesn't really matter to me, I was thinking it may be more readable this way. > > > > +static bool virtnet_uses_svring(struct virtnet_info *vi) > > > +{ > > > + u32 i; > > > + > > > + /* If a transmit/receive virtqueue is small, > > > + * we cannot handle fragmented packets. > > > + */ > > > + for (i = 0; i < vi->max_queue_pairs; i++) { > > > + if (IS_SMALL_VRING(virtqueue_get_vring_size(vi->sq[i].vq)) || > > > + IS_SMALL_VRING(virtqueue_get_vring_size(vi->rq[i].vq))) > > > + return true; > > > + } > > > + > > > + return false; > > > +} > > > > I see even if only some rings are too small we force everything to use > > small ones. Wouldn't it be better to just disable small ones in this > > case? That would not need a reset. > > > > I'm not sure. It may complicate things. > > What if all TX vqs are small? > What if all RX vqs are small? > What if we end up with an unbalanced number of TX vqs and RX vqs? is this allowed by the spec? > What if we end up disabling the RX default vq (receiveq1)? > > I guess we could do it, after checking some conditions. > Maybe we can do it in a follow up patch? > Do you think it's important for it to be included since day 1? > > I think that the question is: what's more important, to use all the vqs while blocking some features, or to use part of the vqs without blocking features? > > > > + > > > +/* Function returns the number of features it blocked */ > > > > We don't need the # though. Make it bool? > > > > Sure. >