Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
To: John Garry, Ming Lei
Cc: tglx@linutronix.de, chenxiang66@hisilicon.com, bigeasy@linutronix.de,
 linux-kernel@vger.kernel.org, maz@kernel.org, hare@suse.com, hch@lst.de,
 axboe@kernel.dk, bvanassche@acm.org, peterz@infradead.org, mingo@redhat.com
References: <1575642904-58295-1-git-send-email-john.garry@huawei.com>
 <1575642904-58295-2-git-send-email-john.garry@huawei.com>
 <20191207080335.GA6077@ming.t460p>
 <78a10958-fdc9-0576-0c39-6079b9749d39@huawei.com>
From: Hannes Reinecke
Message-ID: <305198e5-f76f-ded4-946b-9cfade18f08c@suse.de>
Date: Mon, 9 Dec 2019 16:09:43 +0100
In-Reply-To: <78a10958-fdc9-0576-0c39-6079b9749d39@huawei.com>
On 12/9/19 3:30 PM, John Garry wrote:
> On 07/12/2019 08:03, Ming Lei wrote:
>> On Fri, Dec 06, 2019 at 10:35:04PM +0800, John Garry wrote:
>>> Currently the cpu allowed mask for the threaded part of a threaded irq
>>> handler will be set to the effective affinity of the hard irq.
>>>
>>> Typically the effective affinity of the hard irq will be for a single
>>> cpu. As such, the threaded handler would always run on the same cpu as
>>> the hard irq.
>>>
>>> We have seen scenarios in high data-rate throughput testing where the cpu
>>> handling the interrupt can be totally saturated handling both the hard
>>> interrupt and threaded handler parts, limiting throughput.
>>
>
> Hi Ming,
>
>> Frankly speaking, I have never observed a single CPU being saturated by one
>> storage completion queue's interrupt load, because the CPU is still much
>> quicker than current storage devices.
>>
>> If there are more drives, one CPU won't handle more than one
>> queue's (drive's) interrupts if (nr_drive * nr_hw_queues) < nr_cpu_cores.
>
> Are things this simple? I mean, can you guarantee that fio processes are
> evenly distributed as such?
>
I would assume that it does, seeing that this was the primary goal of fio ...

>>
>> So could you describe your case in a bit more detail? Then we can confirm
>> whether this change is really needed.
>
> The issue is that the CPU is saturated in servicing the hard and
> threaded parts of the interrupt together - here's the sort of thing which
> we saw previously:
>
> Before:
> CPU    %usr    %sys    %irq    %soft    %idle
> all     2.9    13.1     1.2     4.6     78.2
> 0       0.0    29.3    10.1    58.6      2.0
> 1      18.2    39.4     0.0     1.0     41.4
> 2       0.0     2.0     0.0     0.0     98.0
>
> CPU0 has effectively no idle time.
>
> Then, by allowing the threaded part to roam:
>
> After:
> CPU    %usr    %sys    %irq    %soft    %idle
> all     3.5    18.4     2.7     6.8     68.6
> 0       0.0    20.6    29.9    29.9     19.6
> 1       0.0    39.8     0.0    50.0     10.2
>
> Note: I think that I may be able to reduce the hard irq part of the load in
> the endpoint driver, but not by enough that we would no longer see this issue.
>
Well ... to get a _real_ comparison you would need to specify the number
of irqs handled (and the resulting IOPS) alongside the cpu load.
It might well be that by spreading out the interrupts to other CPUs we're
increasing the latency, thus trivially reducing the load ...

My idea here is slightly different: can't we leverage SMT?
Most modern CPUs do SMT (I guess even ARM does it nowadays).
(Yes, I know about spectre and things. We're talking performance here :-)

So for 2-way SMT one could move the submission queue to one side, and the
completion queue handling (ie the irq handling) to the other side.
Due to SMT we shouldn't suffer from cache misses (keep fingers crossed),
and might even get better performance.

John, would such a scenario work on your boxes? IE can we tweak the
interrupt and queue assignment?
Initially I would love to test things out, just to see what happens;
it might be that it doesn't bring any benefit at all, but it'd be
interesting to test anyway.

Cheers,

Hannes
--
Dr. Hannes Reinecke                    Teamlead Storage & Networking
hare@suse.de                                      +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
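
[For reference, a minimal sketch of the mechanism under discussion: how the
irq thread derives its allowed CPU mask from the hard irq's effective
affinity, and how a managed interrupt might instead use the full irq
affinity mask so the threaded handler can roam, which is what the RFC
proposes. This is an illustrative approximation modeled on
irq_thread_check_affinity() in kernel/irq/manage.c of mainline kernels of
that era, not the actual patch text; the managed-interrupt branch below is
an assumption added for illustration.]

/*
 * Sketch: choose the cpumask the irq thread is allowed to run on.
 * For a managed interrupt, use the full irq affinity mask so the
 * threaded handler may run on any CPU mapped to the queue; otherwise
 * fall back to the effective affinity of the hard irq, which is
 * typically a single CPU.
 */
static void irq_thread_check_affinity(struct irq_desc *desc,
				      struct irqaction *action)
{
	cpumask_var_t mask;
	bool valid = false;

	/* Only act when the affinity of the irq has changed. */
	if (!test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
		return;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
		/* Retry on the next wakeup of the irq thread. */
		set_bit(IRQTF_AFFINITY, &action->thread_flags);
		return;
	}

	raw_spin_lock_irq(&desc->lock);
	if (cpumask_available(desc->irq_common_data.affinity)) {
		const struct cpumask *m;

		if (irqd_affinity_is_managed(&desc->irq_data))
			m = desc->irq_common_data.affinity;
		else
			m = irq_data_get_effective_affinity_mask(&desc->irq_data);
		cpumask_copy(mask, m);
		valid = true;
	}
	raw_spin_unlock_irq(&desc->lock);

	/* Move the current (irq thread) task onto the chosen CPUs. */
	if (valid)
		set_cpus_allowed_ptr(current, mask);
	free_cpumask_var(mask);
}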