Date: Mon, 8 Nov 2021 10:27:33 +0100
From: Peter Zijlstra
To: Barry Song <21cnbao@gmail.com>
Cc: David Miller, kuba@kernel.org, Eric Dumazet, pabeni@redhat.com,
	fw@strlen.de, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Thomas Gleixner, netdev@vger.kernel.org,
	LKML, Linuxarm, Guodong Xu, yangyicong, shenyang39@huawei.com,
	tangchengchang@huawei.com, Barry Song, Libo Chen, Tim Chen
Subject: Re: [RFC PATCH] sched&net: avoid over-pulling tasks due to network interrupts
References: <20211105105136.12137-1-21cnbao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 08, 2021 at 07:08:09AM +1300, Barry Song wrote:
> On Sat, Nov 6, 2021 at 1:25 AM Peter Zijlstra wrote:
> >
> > On Fri, Nov 05, 2021 at 06:51:36PM +0800, Barry Song wrote:
> > > From: Barry Song
> > >
> > > At LPC 2021, both Libo Chen and Tim Chen reported the over-pulling
> > > of tasks by network interrupts[1]. For example, while running a
> > > database with the ethernet card on numa0, numa1 can end up almost
> > > idle because interrupts keep pulling tasks to numa0 via affine
> > > wakeups. I have seen the same problem. One way to solve it is to
> > > move the network stack to a normal wakeup rather than the sync
> > > wakeup it uses today, which pulls tasks more aggressively in the
> > > scheduler core (see the sketch below the quoted discussion).
> > >
> > > On a kunpeng920 with 4 NUMA nodes, the ethernet card is on numa0
> > > and the storage disk is on numa2. When driving this mysql machine
> > > with sysbench, I see numa1 sitting idle even though numa0, 2 and 3
> > > are all quite busy.
> > >
> > > I am not saying this patch is exactly the right approach, but I'd
> > > like to use this RFC to connect the net and scheduler people and
> > > start the discussion in this wider group.
> >
> > Well the normal way would be to use multi-queue crud and/or receive
> > packet steering to get the interrupt/wakeup back to the cpu that
> > data came from.
>
> The test case already uses a multi-queue ethernet card; its irqs are
> either balanced onto NUMA0 by irqbalance or pinned to NUMA0, where
> the card is located, by a script like:
>
> #!/bin/bash
> # Pin each of the NIC's irqs to one CPU, counting up from CPU 0
> # (all on NUMA0, where the card sits).
> irq_list=(`cat /proc/interrupts | grep network_name | awk -F: '{print $1}'`)
> cpunum=0
> for irq in ${irq_list[@]}
> do
>	echo $cpunum > /proc/irq/$irq/smp_affinity_list
>	cat /proc/irq/$irq/smp_affinity_list	# read back to verify
>	(( cpunum+=1 ))
> done
>
> I have heard some people work around this issue by pinning the
> multi-queue IRQs across multiple NUMA nodes, which spreads the
> interrupts and avoids over-pulling tasks onto a single node, but
> doesn't that lose the ethernet card's locality?
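For reference, the sync-to-normal wakeup change the RFC describes would
look roughly like the following in sock_def_readable() (net/core/sock.c),
assuming the default sk_data_ready callback is the wakeup at issue. This
is a sketch of the shape of the change, not the exact hunk from the
referenced posting:

 	if (skwq_has_sleeper(wq))
-		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
-						EPOLLRDNORM | EPOLLRDBAND);
+		wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+					   EPOLLRDNORM | EPOLLRDBAND);

The _sync variant passes the WF_SYNC hint, telling the scheduler the
waker is about to sleep, so wake-affine tends to place the wakee on the
interrupt-handling CPU; the plain variant drops that hint.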
So you're explicitly doing the wrong thing with that script above, and
then complaining that the scheduler follows it and destroys your data
locality?

The network folks made RPS/RFS specifically to spread the processing of
packets back to the CPUs/Nodes the TX happened on, to increase data
locality. Why not use that?
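For completeness, RFS is configured through a global flow-table sysctl
plus per-rx-queue knobs in sysfs (see Documentation/networking/scaling.rst).
A minimal sketch; "eth0" and the table sizes below are placeholders, not
tuned recommendations:

#!/bin/bash
# RFS: steer packet processing back to the CPU where the consuming
# application last ran, rather than wherever the irq happened to fire.
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
for q in /sys/class/net/eth0/queues/rx-*; do
	# Per-queue flow count; typically rps_sock_flow_entries / nr_queues.
	echo 2048 > "$q/rps_flow_cnt"
done

# Plain RPS instead spreads packets by flow hash over an explicit CPU
# mask, e.g.:
# echo ffff > /sys/class/net/eth0/queues/rx-0/rps_cpus

With RFS active, the wakeup is raised from (near) the CPU the consumer
last ran on, so wake-affine pulls towards the application instead of
towards the NIC's node.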