Date: Sat,
 9 Dec 2023 05:42:25 -0500
From: "Michael S. Tsirkin"
To: Tobias Huschle
Cc: Abel Wu, Peter Zijlstra, Linux Kernel, kvm@vger.kernel.org,
 virtualization@lists.linux.dev, netdev@vger.kernel.org, jasowang@redhat.com
Subject: Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6
 sched/fair: Add lag based placement)
Message-ID: <20231209053443-mutt-send-email-mst@kernel.org>
In-Reply-To: <53044.123120806415900549@us-mta-342.us.mimecast.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 08, 2023 at 12:41:38PM +0100, Tobias Huschle wrote:
> On Fri, Dec 08, 2023 at 05:31:18AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Dec 08, 2023 at 10:24:16AM +0100, Tobias Huschle wrote:
> > > On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
> > > > > 3. vhost looping endlessly, waiting for kworker to be scheduled
> > > > >
> > > > > I dug a little deeper on what the vhost is doing. I'm not an expert on
> > > > > virtio whatsoever, so these are just educated guesses that maybe
> > > > > someone can verify/correct. Please bear with me probably messing up
> > > > > the terminology.
> > > > >
> > > > > - vhost is looping through available queues.
> > > > > - vhost wants to wake up a kworker to process a found queue.
> > > > > - kworker does something with that queue and terminates quickly.
> > > > >
> > > > > What I found by throwing in some very noisy trace statements was that,
> > > > > if the kworker is not woken up, the vhost just keeps looping across
> > > > > all available queues (and seems to repeat itself). So it essentially
> > > > > relies on the scheduler to schedule the kworker fast enough. Otherwise
> > > > > it will just keep on looping until it is migrated off the CPU.
> > > >
> > > >
> > > > Normally it takes the buffers off the queue and is done with it.
> > > > I am guessing that at the same time guest is running on some other
> > > > CPU and keeps adding available buffers?
> > > >
> > >
> > > It seems to do just that, there are multiple other vhost instances
> > > involved which might keep filling up those queues.
> > >
> >
> > No, vhost is only ever draining queues. The guest is filling them.
> >
> > > Unfortunately, this makes the problematic vhost instance to stay on
> > > the CPU and prevents said kworker to get scheduled. The kworker is
> > > explicitly woken up by vhost, so it wants it to do something.
> > >
> > > At this point it seems that there is an assumption about the scheduler
> > > in place which is no longer fulfilled by EEVDF. From the discussion so
> > > far, it seems like EEVDF does what it is intended to do.
> > >
> > > Shouldn't there be a more explicit mechanism in use that allows the
> > > kworker to be scheduled in favor of the vhost?
> > >
> > > It is also concerning that the vhost seemingly cannot be preempted by the
> > > scheduler while executing that loop.
> >
> >
> > Which loop is that, exactly?
>
> The loop continuously passes translate_desc in drivers/vhost/vhost.c
> That's where I put the trace statements.
>
> The overall sequence seems to be (top to bottom):
>
> handle_rx
> get_rx_bufs
> vhost_get_vq_desc
> vhost_get_avail_head
> vhost_get_avail
> __vhost_get_user_slow
> translate_desc << trace statement in here
> vhost_iotlb_itree_first

I wonder why you keep missing the cache and re-translating.
Is pr_debug enabled for you? If not, could you check whether it outputs
anything?
Or you can tweak:

#define vq_err(vq, fmt, ...) do {                       \
		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);   \
		if ((vq)->error_ctx)                    \
			eventfd_signal((vq)->error_ctx, 1); \
	} while (0)

to do pr_err if you prefer.

> These functions show up as having increased overhead in perf.
>
> There are multiple loops going on in there.
> Again the disclaimer though, I'm not familiar with that code at all.

So there's a limit there: vhost_exceeds_weight should requeue work:

	} while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len)));

then we invoke the scheduler each time before re-executing it:

{
	struct vhost_worker *worker = data;
	struct vhost_work *work, *work_next;
	struct llist_node *node;

	node = llist_del_all(&worker->work_list);
	if (node) {
		__set_current_state(TASK_RUNNING);

		node = llist_reverse_order(node);
		/* make sure flag is seen after deletion */
		smp_wmb();
		llist_for_each_entry_safe(work, work_next, node, node) {
			clear_bit(VHOST_WORK_QUEUED, &work->flags);
			kcov_remote_start_common(worker->kcov_handle);
			work->fn(work);
			kcov_remote_stop();
			cond_resched();
		}
	}

	return !!node;
}

These are the byte and packet limits:

/* Max number of bytes transferred before requeueing the job.
 * Using this limit prevents one virtqueue from starving others. */
#define VHOST_NET_WEIGHT 0x80000

/* Max number of packets transferred before requeueing the job.
 * Using this limit prevents one virtqueue from starving others with small
 * pkts. */
#define VHOST_NET_PKT_WEIGHT 256

Try reducing the VHOST_NET_WEIGHT limit and see if that improves things any?

-- 
MST
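
P.S. To make the effect of the two limits concrete, here is a minimal
userspace sketch of the weight check. It is only a model, not the kernel
code: struct weight_limits, exceeds_weight() and packets_per_pass() are
hypothetical names made up for illustration, and the queue is assumed to be
always full of fixed-size packets. Only the two #define values are taken
from the limits quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits as quoted above. */
#define VHOST_NET_WEIGHT     0x80000
#define VHOST_NET_PKT_WEIGHT 256

/* Hypothetical stand-in for the per-device budgets; not a kernel struct. */
struct weight_limits {
	size_t byte_weight;	/* 0 means "no byte limit" in this sketch */
	int pkt_weight;
};

/* Simplified model of the check: true once either budget is used up. */
static bool exceeds_weight(const struct weight_limits *lim, int pkts,
			   size_t total_len)
{
	return (lim->byte_weight && total_len >= lim->byte_weight) ||
	       pkts >= lim->pkt_weight;
}

/*
 * Model one handler pass over an always-full queue of fixed-size packets.
 * Returns how many packets are handled before the work would be requeued.
 */
static int packets_per_pass(const struct weight_limits *lim, size_t pkt_len)
{
	size_t total_len = 0;
	int pkts = 0;

	assert(pkt_len > 0 && lim->pkt_weight > 0);
	do {
		total_len += pkt_len;	/* "receive" one packet */
	} while (!exceeds_weight(lim, ++pkts, total_len));

	return pkts;
}
```

In this model, 1500-byte packets hit the packet limit first (256 packets
per pass), while 64 KiB packets hit the byte limit after 8. Lowering the
byte limit to 0x10000 cuts the 1500-byte case to 44 packets per pass, so
the handler would give up the CPU more often, which is the idea behind
trying a smaller VHOST_NET_WEIGHT.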