Date: Wed, 18 Sep 2019 22:37:33 +0800
From: Ming Lei
To: Sagi Grimberg
Cc: Keith Busch, Hannes Reinecke, Daniel Lezcano, Bart Van Assche, linux-scsi@vger.kernel.org, Peter Zijlstra, Long Li, John Garry, LKML, linux-nvme@lists.infradead.org, Jens Axboe, Ingo Molnar, Thomas Gleixner, Christoph Hellwig
Subject: Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism
Message-ID: <20190918143732.GA19364@ming.t460p>
References: <6f3b6557-1767-8c80-f786-1ea667179b39@acm.org> <2a8bd278-5384-d82f-c09b-4fce236d2d95@linaro.org> <20190905090617.GB4432@ming.t460p> <6a36ccc7-24cd-1d92-fef1-2c5e0f798c36@linaro.org> <20190906014819.GB27116@ming.t460p> <6eb2a745-7b92-73ce-46f5-cc6a5ef08abc@grimberg.me> <20190907000100.GC12290@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 09, 2019 at 08:10:07PM -0700, Sagi Grimberg wrote:
> Hey Ming,
>
> > > > Ok, so the real problem is per-cpu bound tasks.
> > > >
> > > > I share Thomas' opinion about a NAPI-like approach.
> > >
> > > We already have that, it's irq_poll, but it seems that for this
> > > use-case, we get lower performance for some reason. I'm not
> > > entirely sure why that is; maybe it's because we need to mask interrupts
> > > because we don't have an "arm" register in nvme like network devices
> > > have?
> >
> > Long observed that IOPS also drops a lot when switching to threaded irq.
> > If ksoftirqd is woken up to handle the softirq, the performance shouldn't
> > be better than with threaded irq.
>
> It's true that it shouldn't be any faster, but what irqpoll already has,
> and what we don't need to reinvent, is a proper budgeting mechanism, which
> is needed when multiple devices map irq vectors to the same cpu core.
>
> irqpoll already maintains a percpu list and dispatches ->poll with
> a budget that the backend enforces, and irqpoll multiplexes between them.
> Having this mechanism in irq (hard or threaded) context sounds
> a bit unnecessary.
>
> It seems like we're attempting to stay in irq context for as long as we
> can instead of scheduling to softirq/thread context if we have more than
> a minimal amount of work to do.
> Without at least understanding why softirq/thread degrades us so much,
> this code seems like the wrong approach to me. Interrupt context will
> always be faster, but that is not a sufficient reason to spend as much
> time as possible there, is it?

If extra latency is added to the IO completion path, the same latency
shows up in the submission path: the hw queue depth is fixed, and often
small, and a request's tag is only freed once its completion has been
handled. Especially with multiple submitters vs. a single (shared)
completion path, all of the hw queue's tags can easily be exhausted.
I guess networking IO doesn't see such an effect.

>
> We should also keep in mind that the networking stack has been doing
> this for years. I would try to understand why this cannot work for nvme
> before dismissing it.

The above may be one reason.

Thanks,
Ming