From: Kashyap Desai
Date: Fri, 31 Aug 2018 15:49:17 -0600
Subject: RE: Affinity managed interrupts vs non-managed interrupts
To: Thomas Gleixner
Cc: Ming Lei, Sumit Saxena, Ming Lei, Christoph Hellwig,
    Linux Kernel Mailing List, Shivasharan Srikanteshwara, linux-block
Message-ID: <486f94a563d63c4779498fe8829a546c@mail.gmail.com>
References: <20180829084618.GA24765@ming.t460p>
    <300d6fef733ca76ced581f8c6304bac6@mail.gmail.com>
    <615d78004495aebc53807156d04d988c@mail.gmail.com>

> > It is not yet finalized, but it can be based on per-sdev outstanding,
> > shost_busy etc.
> > We want to use 16 special reply queues for IO acceleration (these
> > queues work in interrupt coalescing mode; this is a h/w feature).
>
> TBH, this does not make any sense whatsoever.
> Why are you trying to have extra interrupts for coalescing instead of
> doing the following:

Thomas,

We are using this feature mainly for performance and not for CPU hotplug
issues. I read your points #1 to #4 below as mostly addressing CPU
hotplug. Right? We also want to make sure that if we convert the
megaraid_sas driver from managed to non-managed interrupts, we can still
meet the CPU hotplug requirement (a rough sketch of how your point 3 below
could be hooked into the CPU hotplug state machine is at the end of this
mail). If we use pci_enable_msix_range() and manually set affinity in the
driver using irq_set_affinity_hint(), the CPU hotplug feature works as
expected: irqbalance retains the old mapping, and whenever an offlined CPU
comes back, it restores the same mapping.

If we use all 72 reply queues (all in interrupt coalescing mode) without
any extra reply queues, we have no issue with the CPU-to-MSI-X mapping or
with CPU hotplug. Our major problem with that method is that latency is
very bad at lower queue depth and/or in the single-worker case. To solve
that problem, we added 16 extra reply queues (a special h/w feature for
performance only) which can work in interrupt coalescing mode, while the
existing 72 reply queues work without any interrupt coalescing. The best
way to map the additional 16 reply queues is to map them to the local
NUMA node.

I understand that this is a unique requirement, but at the same time we
may be able to do it gracefully (in the irq subsystem), since, as you
mentioned, irq_set_affinity_hint() should be avoided in low-level drivers.

> 1) Allocate 72 reply queues which get nicely spread out to every CPU on
>    the system with affinity spreading.
>
> 2) Have a configuration for your reply queues which allows them to be
>    grouped, e.g. by physical package.
>
> 3) Have a mechanism to mark a reply queue offline/online and handle that
>    on CPU hotplug. That means on unplug you have to wait for the reply
>    queue which is associated to the outgoing CPU to be empty and no new
>    requests to be queued, which has to be done for the regular per-CPU
>    reply queues anyway.
>
> 4) On queueing the request, flag it 'coalescing', which causes the
>    hardware/firmware to direct the reply to the first online reply queue
>    in the group.
>
> If the last CPU of a group goes offline, then the normal hotplug
> mechanism takes effect and the whole thing is put 'offline' as well.
> This works nicely for all kinds of scenarios even if you have more CPUs
> than queues. No extras, no magic affinity hints, it just works.
>
> Hmm?

> > Yes. We did not use pci_alloc_irq_vectors_affinity().
> > We used pci_enable_msix_range() and manually set affinity in the
> > driver using irq_set_affinity_hint().
>
> I still regret the day when I merged that abomination.

Is it possible to have a similar mapping in the managed-interrupt case,
as below?

	for (i = 0; i < 16; i++)
		irq_set_affinity_hint(pci_irq_vector(instance->pdev, i),
				      cpumask_of_node(local_numa_node));

Currently, with managed interrupts, the pre-vectors' affinity mask is
always CPUs 0-71 and the effective CPU is always 0. We would like a change
in the current API that allows us to pass a flag (something like *local
numa affinity*), so that the CPU-to-MSI-X mapping comes from the local
NUMA node and the effective CPUs are spread across the local NUMA node. (A
sketch of what such usage could look like follows below.)

> Thanks,
>
> 	tglx
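
On point 3: a rough, untested sketch of how a reply queue could be marked
offline/online from the CPU hotplug state machine. megasas_queue_drain()
and megasas_queue_enable() are made-up helper names for illustration, not
existing driver functions:

	#include <linux/cpuhotplug.h>

	static int megasas_cpu_offline(unsigned int cpu)
	{
		/* Wait for the reply queue bound to @cpu to drain and
		 * make sure no new requests are queued to it.
		 */
		megasas_queue_drain(cpu);
		return 0;
	}

	static int megasas_cpu_online(unsigned int cpu)
	{
		/* Re-enable the reply queue when its CPU comes back. */
		megasas_queue_enable(cpu);
		return 0;
	}

	/* in probe: */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "megaraid_sas:online",
				megasas_cpu_online, megasas_cpu_offline);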
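
For completeness, here is a minimal sketch (untested; the 72/16 split and
instance->pdev are placeholders from this thread) of how the allocation
could look with the existing pci_alloc_irq_vectors_affinity() API: the 16
coalescing queues are kept out of the managed spread as post-vectors, and
the driver then points their affinity hints at the local NUMA node. The
manual loop at the end is exactly the part we would like the irq core to
do for us via a flag such as the (hypothetical) *local numa affinity*:

	#include <linux/interrupt.h>
	#include <linux/pci.h>

	struct irq_affinity desc = {
		.pre_vectors  = 0,
		.post_vectors = 16,	/* coalescing queues, not spread */
	};
	const struct cpumask *mask;
	int i, nvec;

	/* 72 managed vectors + 16 driver-managed coalescing vectors */
	nvec = pci_alloc_irq_vectors_affinity(instance->pdev, 1, 72 + 16,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &desc);
	if (nvec < 0)
		return nvec;

	/* Manually place the 16 post-vectors (which are not managed)
	 * on the local NUMA node of the adapter.
	 */
	mask = cpumask_of_node(dev_to_node(&instance->pdev->dev));
	for (i = 72; i < nvec; i++)
		irq_set_affinity_hint(pci_irq_vector(instance->pdev, i), mask);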