Date: Mon, 8 Jan 2024 14:13:25 +0100
From: Tobias Huschle
To: "Michael S. Tsirkin"
Cc: Jason Wang, Abel Wu, Peter Zijlstra, Linux Kernel, kvm@vger.kernel.org, virtualization@lists.linux.dev, netdev@vger.kernel.org
Subject: Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

On Thu, Dec 14, 2023 at 02:14:59AM -0500, Michael S. Tsirkin wrote:
>
> Peter, would appreciate feedback on this. When is cond_resched()
> insufficient to give up the CPU? Should Documentation/kernel-hacking/hacking.rst
> be updated to require schedule() instead?
>

Happy new year everybody!

I'd like to bring this thread back to life. To reiterate:

- The introduction of the EEVDF scheduler revealed a performance
  regression of ~50% in a uperf testcase.
- Tracing the scheduler showed that it makes decisions which are in
  line with its design.
- The traces also showed that a vhost instance might run excessively
  long on its CPU in some circumstances. These long runs cause the
  performance regression, as they lead to delays of 100+ms for a
  kworker which drives the actual network processing.
- Before EEVDF, the vhost would always be scheduled off its CPU in
  favor of the kworker, as the kworker was being woken up and the
  former scheduler was giving more priority to the woken up task.
  With EEVDF, the kworker, as a long running process, is able to
  accumulate negative lag, which causes EEVDF to not prefer it on its
  wake up, leaving the vhost running.
- If the kworker is not scheduled when being woken up, the vhost
  continues looping until it is migrated off the CPU.
- The vhost offers to be scheduled off the CPU by calling
  cond_resched(), but the need_resched flag is not set, therefore
  cond_resched() does nothing.

To solve this, I see the following options (this might be neither a
complete nor a correct list):

- Along with the wakeup of the kworker, need_resched needs to be set,
  such that cond_resched() triggers a reschedule.
- The vhost calls schedule() instead of cond_resched() to give up the
  CPU. This would of course be a significantly stricter approach and
  might limit the performance of vhost in other cases.
- Preventing the kworker from accumulating negative lag, as it is
  mostly not runnable and, if it runs, it only runs for a very short
  time frame. This might clash with the overall concept of EEVDF.
- On cond_resched(), verify whether the consumed runtime of the caller
  outweighs the negative lag of another process (e.g. the kworker)
  and, if so, schedule the other process. This introduces overhead to
  cond_resched().

I would be curious about feedback on those ideas and am interested in
alternative approaches.
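
To illustrate the cond_resched() point, here is a toy userspace model in
Python. It is not kernel code and only a sketch: Task, ToyCpu, the lag
units, the force_resched switch and the rule "a woken task sets
need_resched only if its lag is non-negative" are all simplifying
assumptions made for this example. It only shows why a loop that calls
cond_resched() keeps running when the wakeup did not set the flag, and
how the first option above (setting need_resched on wakeup) would change
that.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Toy units. Negative lag: the task already got more than its fair share.
    lag: float


@dataclass
class ToyCpu:
    current: Task
    need_resched: bool = False
    runnable: list = field(default_factory=list)

    def wake_up(self, task: Task, force_resched: bool = False) -> None:
        """Queue a woken task. Assumption of this sketch: the flag is set
        only if the woken task has non-negative lag, or if forced."""
        self.runnable.append(task)
        if task.lag >= 0 or force_resched:
            self.need_resched = True

    def cond_resched(self) -> bool:
        """Yield only if need_resched is set; otherwise do nothing."""
        if not self.need_resched:
            return False
        previous = self.current
        self.current = self.runnable.pop(0)
        self.runnable.append(previous)
        self.need_resched = False
        return True


def vhost_iterations_before_yield(kworker_lag: float,
                                  force_resched: bool = False,
                                  max_iterations: int = 1000) -> int:
    """Count how many loop iterations the toy vhost runs before it yields."""
    cpu = ToyCpu(current=Task("vhost", lag=0.0))
    cpu.wake_up(Task("kworker", lag=kworker_lag), force_resched=force_resched)
    for i in range(max_iterations):
        if cpu.cond_resched():
            return i
    return max_iterations  # never yielded within the budget
```

In this model, a kworker woken with non-negative lag takes over at the
first cond_resched() call, while one woken with negative lag never gets
the CPU unless the wakeup forces the flag.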