Date: Fri, 3 Jan 2020 19:29:08 +0800
From: Ming Lei
To: John Garry
Cc: Marc Zyngier, tglx@linutronix.de, chenxiang (M), bigeasy@linutronix.de,
    linux-kernel@vger.kernel.org, hare@suse.com, hch@lst.de, axboe@kernel.dk,
    bvanassche@acm.org, peterz@infradead.org, mingo@redhat.com, Zhang Yi
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
Message-ID: <20200103112908.GA20353@ming.t460p>
In-Reply-To: <2b070d25-ee35-aa1f-3254-d086c6b872b1@huawei.com>
References: <20191220233138.GB12403@ming.t460p>
 <20191224015926.GC13083@ming.t460p>
 <7a961950624c414bb9d0c11c914d5c62@www.loen.fr>
 <20191225004822.GA12280@ming.t460p>
 <72a6a738-f04b-3792-627a-fbfcb7b297e1@huawei.com>
 <20200103004625.GA5219@ming.t460p>
 <2b070d25-ee35-aa1f-3254-d086c6b872b1@huawei.com>

On Fri, Jan 03, 2020 at 10:41:48AM +0000, John Garry wrote:
> On 03/01/2020 00:46, Ming Lei wrote:
> > > > > d the
> > > > > DMA API more than an architecture-specific problem.
> > > > >
> > > > > Given that we have so far very little data, I'd hold off any conclusion.
> > > >
> > > > We can start to collect latency data of DMA unmapping vs. nvme_irq()
> > > > on both x86 and arm64.
> > > >
> > > > I will see if I can get such a box for collecting the latency data.
> > >
> > > To reiterate what I mentioned before about IOMMU DMA unmap on x86, a key
> > > difference is that by default it uses non-strict (lazy) unmap mode, i.e.
> > > we unmap in batches. ARM64 uses the general default, which is strict mode,
> > > i.e. every unmap results in an IOTLB flush.
> > >
> > > In my setup, if I switch to lazy unmap (set iommu.strict=0 on the cmdline),
> > > there is no lockup.
> > >
> > > Are any special IOMMU setups being used for x86, like enabling strict mode?
> > > I don't know...
> >
> > BTW, I have run the test on a 224-core ARM64 machine with a 32-hw_queue NVMe
> > drive; the soft lockup issue can be triggered within one minute.
> >
> > nvme_irq() often takes ~5us to complete on this machine, so there is a real
> > risk of CPU lockup when IOPS is > 200K.
>
> Do you have a typical nvme_irq() completion time for a mid-range x86 server?

~1us. It was measured via a bcc script, and eBPF itself may introduce some
overhead.
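For reference, here is a minimal sketch of the kind of bcc funclatency-style
measurement described above. This is not the exact script used for the numbers
in this thread; the nvme_irq kprobe target, the per-CPU keying, and the
microsecond bucketing are assumptions.

#!/usr/bin/env python3
# Sketch: histogram of nvme_irq() completion time via kprobe/kretprobe.
# Requires the bcc toolkit and root privileges; nvme_irq must be a
# visible kprobe symbol on the running kernel (an assumption).
from bcc import BPF
import time

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);      /* entry timestamp, keyed by CPU      */
BPF_HISTOGRAM(dist);            /* log2 histogram of latency in usecs */

int trace_entry(struct pt_regs *ctx)
{
    u32 cpu = bpf_get_smp_processor_id();
    u64 ts = bpf_ktime_get_ns();
    /* a hardirq handler cannot migrate between entry and return,
     * so keying the timestamp by CPU is safe here */
    start.update(&cpu, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u32 cpu = bpf_get_smp_processor_id();
    u64 *tsp = start.lookup(&cpu);
    if (tsp == 0)
        return 0;               /* missed the entry probe */
    dist.increment(bpf_log2l((bpf_ktime_get_ns() - *tsp) / 1000));
    start.delete(&cpu);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="nvme_irq", fn_name="trace_entry")
b.attach_kretprobe(event="nvme_irq", fn_name="trace_return")

print("Tracing nvme_irq()... hit Ctrl-C to dump the histogram")
try:
    time.sleep(3600)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

As noted above, the kprobe/kretprobe pair has a cost of its own, so the
smallest buckets will read somewhat high.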
>
> > The soft lockup can be triggered too if 'iommu.strict=0' is passed in;
> > it just takes a bit longer, by starting more IO jobs.
> >
> > In the above test, I submit IO to a single NVMe drive from 4 CPU cores via
> > 8 or 12 jobs (iommu.strict=0), and meanwhile make the NVMe interrupt handled
> > on just one dedicated CPU core.
>
> Well, a problem with so many CPUs is that it does not scale (well) with MQ
> devices, like NVMe.
>
> As CPU count goes up, device queue count doesn't, and we get more contention.

The problem is worse on ARM64 systems, which have more CPU cores, and each
single CPU core is often slower than an x86 core. Meanwhile, each hardware
interrupt has to be handled on a single target CPU.

Also, the storage device (such as NVMe) itself should be the same for both
from a performance viewpoint.

> > Is there lock contention between the iommu dma map and unmap callbacks?
>
> There would be the IOVA management, but that would be common to x86. Each
> CPU keeps an IOVA cache, and there is a central pool of cached IOVAs, so
> that reduces any contention, unless the caches are exhausted.
>
> I think most contention/bottleneck is at the SMMU HW interface, which has a
> single queue interface.

Not sure it is related to the single queue interface, given that my test uses
just a single hw queue, pushing several CPU cores to submit IO and handling
the single queue's interrupt on one dedicated CPU core.

Thanks,
Ming
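For completeness, a minimal sketch of how the queue-interrupt-to-CPU placement
discussed in this thread can be read back from procfs. The "nvme" match string
and the effective_affinity_list file are assumptions about the system at hand;
since NVMe queue interrupts are managed, userspace can only inspect their
affinity, not change it.

#!/usr/bin/env python3
# Sketch: print the effective CPU affinity of each nvme queue interrupt.
with open("/proc/interrupts") as f:
    for line in f:
        irq = line.split(":", 1)[0].strip()
        if "nvme" not in line or not irq.isdigit():
            continue
        name = line.rsplit(None, 1)[-1]   # action name, e.g. nvme0q3
        try:
            # which CPU the kernel actually delivers this IRQ to
            with open(f"/proc/irq/{irq}/effective_affinity_list") as a:
                cpus = a.read().strip()
        except FileNotFoundError:
            cpus = "?"
        print(f"irq {irq} ({name}): effective CPU(s) {cpus}")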