Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
To: Ming Lei
CC: Marc Zyngier, "tglx@linutronix.de", "chenxiang (M)", "bigeasy@linutronix.de",
 "linux-kernel@vger.kernel.org", "hare@suse.com", "hch@lst.de", "axboe@kernel.dk",
 "bvanassche@acm.org", "peterz@infradead.org", "mingo@redhat.com", Zhang Yi
References: <0fd543f8ffd90f90deb691aea1c275b4@www.loen.fr>
 <20191220233138.GB12403@ming.t460p>
 <20191224015926.GC13083@ming.t460p>
 <7a961950624c414bb9d0c11c914d5c62@www.loen.fr>
 <20191225004822.GA12280@ming.t460p>
 <72a6a738-f04b-3792-627a-fbfcb7b297e1@huawei.com>
 <20200103004625.GA5219@ming.t460p>
From: John Garry
Message-ID: <2b070d25-ee35-aa1f-3254-d086c6b872b1@huawei.com>
Date: Fri, 3 Jan 2020 10:41:48 +0000
In-Reply-To: <20200103004625.GA5219@ming.t460p>
List-ID: linux-kernel@vger.kernel.org

On 03/01/2020 00:46, Ming Lei wrote:
>>>> d the
>>>> DMA API more than an architecture-specific problem.
>>>>
>>>> Given that we have so far very little data, I'd hold off any conclusion.
>>> We can start to collect latency data of dma unmapping vs nvme_irq()
>>> on both x86 and arm64.
>>>
>>> I will see if I can get such a box for collecting the latency data.
>> To reiterate what I mentioned before about IOMMU DMA unmap on x86, a key
>> difference is that by default it uses the non-strict (lazy) unmap mode,
>> i.e. we unmap in batches. arm64 uses the general default, which is strict
>> mode, i.e. every unmap results in an IOTLB flush.
>>
>> In my setup, if I switch to lazy unmap (set iommu.strict=0 on the
>> cmdline), then no lockup.
>>
>> Are any special IOMMU setups being used for x86, like enabling strict
>> mode? I don't know...
> BTW, I have run the test on one 224-core arm64 machine with one
> 32-hw_queue NVMe drive; the soft lockup issue can be triggered in one
> minute.
>
> nvme_irq() often takes ~5us to complete on this machine, so there is a
> real risk of CPU lockup when IOPS is > 200K.

Do you have a typical nvme_irq() completion time for a mid-range x86 
server?
>
> The soft lockup can be triggered too if 'iommu.strict=0' is passed in;
> it just takes a bit longer, by starting more IO jobs.
>
> In the above test, I submit IO to one single NVMe drive from 4 CPU cores
> via 8 or 12 jobs (iommu.strict=0), and meanwhile make the nvme interrupt
> handled on just one dedicated CPU core.

Well, a problem with so many CPUs is that it does not scale (well) with 
MQ devices, like NVMe. As the CPU count goes up, the device queue count 
doesn't, and we get more contention.

>
> Is there lock contention among the iommu dma map and unmap callbacks?

There would be the IOVA management, but that would be common to x86. 
Each CPU keeps an IOVA cache, and there is a central pool of cached 
IOVAs, so that reduces any contention, unless the caches are exhausted. 
I think most of the contention/bottleneck is at the SMMU HW interface, 
which has a single queue interface.

Thanks,
John
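P.S. The per-CPU cache plus central pool arrangement described above can be sketched loosely like this. This is an illustrative userspace model only: the real allocator lives in drivers/iommu/iova.c, and the class name, magazine size, and method names here are made up.

```python
from collections import deque

MAG_SIZE = 128  # illustrative per-CPU magazine capacity, not the kernel's value

class IovaCachePool:
    """Toy model: per-CPU caches backed by a shared central pool (depot)."""

    def __init__(self, nr_cpus: int):
        self.percpu = [deque() for _ in range(nr_cpus)]  # contention-free fast path
        self.depot = deque()  # central pool; in the kernel this is lock-protected

    def free(self, cpu: int, iova: int) -> None:
        mag = self.percpu[cpu]
        if len(mag) >= MAG_SIZE:
            # Local cache full: spill one entry to the central pool
            # (this is where cross-CPU contention would appear).
            self.depot.append(mag.popleft())
        mag.append(iova)  # common case: purely per-CPU, no contention

    def alloc(self, cpu: int):
        mag = self.percpu[cpu]
        if mag:
            return mag.pop()       # common case: per-CPU cache hit
        if self.depot:
            return self.depot.pop()  # miss: refill from the central pool
        return None  # caches exhausted: the kernel would fall back to the
                     # rbtree-based allocator, where real contention starts
```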