Date: Tue, 25 Apr 2023 09:08:50 -0400
From: "Michael S. Tsirkin"
To: Alvaro Karsz
Cc: Jason Wang, "davem@davemloft.net", "edumazet@google.com", "kuba@kernel.org", "pabeni@redhat.com", "virtualization@lists.linux-foundation.org", "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH net] virtio-net: reject small vring sizes
Message-ID: <20230425090723-mutt-send-email-mst@kernel.org>
References: <20230417075645-mutt-send-email-mst@kernel.org>
 <20230423031308-mutt-send-email-mst@kernel.org>
 <20230423065132-mutt-send-email-mst@kernel.org>
 <20230425041352-mutt-send-email-mst@kernel.org>
 <20230425082150-mutt-send-email-mst@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 25, 2023 at 01:02:38PM +0000, Alvaro Karsz wrote:
> > > In the virtnet case, we'll decide which features to block based on the ring size.
> > > 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> > > ring < 2                  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> >
> > why MRG_RXBUF? what does it matter?
> >
>
> You're right, it should be blocked only when ring < 2.
> Or we should let this pass, and let the device figure out that MRG_RXBUF is meaningless with 1 entry..

yep, latter I think.

> > > So we'll need a new virtio callback instead of flags.
> > > Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> > > So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
> > >
> > > In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> > > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
> > >
> > > So, the flow is something like:
> > >
> > > * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> > >   using small vrings, if so, we reset and renegotiate the features.
> >
> > Using which APIs? What does peek_vqs_len do and why does it matter that
> > it is super early?
> >
>
> We peek at the lengths using a new virtio_config.h function that calls a transport specific callback.
> We renegotiate calling the new, exported virtio core function.
>
> peek_vqs_len fills an array of u16 variables with the max length of every VQ.
>
> The idea here is not to fail probe.
> So we start probe, check if the ring is small, renegotiate the features and then continue with the new features.
> This needs to be super early because otherwise, some virtio_has_feature calls before re-negotiating may be invalid, meaning a lot of reconfigurations.
>
> > > * We continue normally and create the VQs.
> > > * We check if the created rings are small.
> > >   If they are and some blocked features were negotiated anyway (may occur if
> > >   the re-negotiation fails, or if the transport has no implementation for
> > >   peek_vqs_len), we fail probe.
> > >   If the ring is small and the features are ok, we mark the virtnet device as
> > >   vring_small and fixup some variables.
> > >
> > >
> > > peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
> > >
> > > During virtnet_find_vqs we check the following:
> > > vi->has_cvq
> > > vi->big_packets
> > > vi->mergeable_rx_bufs
> > >
> > > But these will change if the ring is small..
> > >
> > > (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
> > >
> > >
> > > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
> > >
> > > I'm considering splitting the effort into 2 series.
> > > A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
> > >
> > > I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
> > >
> > > What do you think?
> >
> > Lots of work spilling over to transports.
> >
> > And I especially don't like that it slows down boot on good path.
>
> Yes, but I don't think that this is really significant.
> It's just a call to the transport to get the length of the VQs.

With lots of VQs that is lots of exits.

> If ring is not small, we continue as normal.
> If ring is small, we renegotiate and continue, without failing probe.
>
> >
> > I have the following idea:
> > - add a blocked features value in virtio_device
> > - before calling probe, core saves blocked features
> > - if probe fails, checks blocked features.
> >   if any were added, reset, negotiate all features
> >   except blocked ones and do the validate/probe dance again
> >
> >
> > This will mean mostly no changes to drivers: just check condition,
> > block feature and fail probe.
> >
>
> I like the idea, will try to implement it.
>
> Thanks,
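
For illustration only, the "blocked features" idea quoted above could be modelled roughly as below. This is a standalone userspace sketch, not kernel code and not a patch: struct model_dev, the F_* bits, MODEL_MAX_FRAGS, model_probe() and model_core_probe() are all hypothetical names invented for this sketch, and the feature bit values are not the real VIRTIO_NET_F_* numbers. It assumes GRO is blocked for any ring smaller than MAX_FRAGS + 2 and CTRL_VQ only for ring < 2, with MRG_RXBUF left to the device as discussed earlier in the thread; the thread does not say what happens for ring == 2, so treating it like the small-ring case is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for this model only. */
#define F_GRO           (1ull << 0)
#define F_MRG_RXBUF     (1ull << 1)
#define F_CTRL_VQ       (1ull << 2)
#define MODEL_MAX_FRAGS 17      /* stand-in value for MAX_FRAGS */

struct model_dev {
	uint64_t offered;       /* features the device offers */
	uint64_t features;      /* features currently negotiated */
	uint64_t blocked;       /* features a driver asked the core to withhold */
	unsigned int ring_size; /* length of the (single) modelled ring */
	unsigned int resets;    /* how many reset + renegotiate rounds ran */
};

/* Core side: negotiate everything offered except the blocked bits. */
static void model_negotiate(struct model_dev *d)
{
	d->features = d->offered & ~d->blocked;
	assert((d->features & d->blocked) == 0);
}

/* Driver side: just check the condition, block the feature, fail probe. */
static int model_probe(struct model_dev *d)
{
	uint64_t unwanted = 0;

	if (d->ring_size < MODEL_MAX_FRAGS + 2)
		unwanted |= F_GRO;
	if (d->ring_size < 2)
		unwanted |= F_CTRL_VQ;

	if (d->features & unwanted) {
		d->blocked |= unwanted;
		return -1;
	}
	return 0;
}

/*
 * Core side: save the blocked set before probe; if probe fails and the set
 * grew, reset, renegotiate without the blocked bits, and probe again.
 * The blocked set only grows, so the loop terminates.
 */
static int model_core_probe(struct model_dev *d)
{
	for (;;) {
		uint64_t saved = d->blocked;

		model_negotiate(d);
		if (model_probe(d) == 0)
			return 0;
		if (d->blocked == saved)
			return -1;  /* failed for some other reason: give up */
		d->resets++;        /* models the device reset */
	}
}
```

In this model a device with a large ring goes through model_core_probe() with a single negotiation and no reset, so nothing extra happens on the good path; only a small ring pays for the extra reset and renegotiation round.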