Subject: Re: "iommu/amd: Set exclusion range correctly" causes smartpqi offline
From: Qian Cai
To: jroedel@suse.de, hch@lst.de
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
    linux-scsi@vger.kernel.org, martin.petersen@oracle.com,
    jejb@linux.ibm.com, don.brace@microsemi.com,
    kevin.barnett@microsemi.com, scott.teel@microsemi.com,
    david.carroll@microsemi.com
References: <1556290348.6132.6.camel@lca.pw>
In-Reply-To: <1556290348.6132.6.camel@lca.pw>
Date: Sun, 5 May 2019 22:56:28 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/26/19 10:52 AM, Qian Cai wrote:
> Applying some memory pressure would causes smartpqi offline even in today's
> linux-next. This can always be reproduced by a LTP test cases [1] or sometimes
> just compiling kernels.
>
> Reverting the commit "iommu/amd: Set exclusion range correctly" fixed the issue.
>
> [  213.437112] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x0000 address=0x1000 flags=0x0000]
> [  213.447659] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> domain=0x0000 address=0x1800 flags=0x0000]
> [  233.362013] smartpqi 0000:23:00.0: controller is offline: status code 0x14803
> [  233.369359] smartpqi 0000:23:00.0: controller offline
> [  233.388915] print_req_error: I/O error, dev sdb, sector 3317352 flags 2000001
> [  233.388921] sd 0:0:0:0: [sdb] tag#95 UNKNOWN(0x2003) Result: hostbyte=0x01
> driverbyte=0x00
> [  233.388931] sd 0:0:0:0: [sdb] tag#95 CDB: opcode=0x2a 2a 00 00 55 89 00 00 01
> 08 00
> [  233.389003] Write-error on swap-device (254:1:4474640)
> [  233.389015] Write-error on swap-device (254:1:2190776)
> [  233.389023] Write-error on swap-device (254:1:8351936)
>
> [1] /opt/ltp/testcases/bin/mtest01 -p80 -w

It turns out that another linux-next commit is also needed to reproduce this:
7a5dbf3ab2f0 ("iommu/amd: Remove the leftover of bypass support"), specifically
its chunks for map_sg() and unmap_sg().

This has been reproduced on 3 different HPE ProLiant DL385 Gen10 systems so far.
Reverting those chunks (map_sg() and unmap_sg()) on top of the latest linux-next
fixes the issue, and applying them on top of mainline v5.1 reproduces it
immediately.

Often it triggers the BUG_ON(!iova) in iova_magazine_free_pfns() instead of
taking smartpqi offline:

kernel BUG at drivers/iommu/iova.c:813!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
 free_cpu_cached_iovas+0xbd/0x150
 alloc_iova_fast+0x8c/0xba
 dma_ops_alloc_iova.isra.6+0x65/0xa0
 map_sg+0x8c/0x2a0
 scsi_dma_map+0xc6/0x160
 pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
 pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
 scsi_queue_rq+0x79c/0x1200
 blk_mq_dispatch_rq_list+0x4dc/0xb70
 blk_mq_sched_dispatch_requests+0x249/0x310
 __blk_mq_run_hw_queue+0x128/0x200
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x522/0xa10
 worker_thread+0x63/0x5b0
 kthread+0x1d2/0x1f0
 ret_from_fork+0x22/0x40
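
For anyone reading the trace without drivers/iommu/iova.c at hand, here is a
rough userspace sketch of the kind of condition BUG_ON(!iova) guards against
when cached IOVAs are flushed. It is not the kernel code: every toy_* name, the
flat lookup table, and the sizes are made up for illustration, and it assumes
only that the flush path looks up each cached PFN among the live allocations
and treats a miss as fatal. It says nothing about why the lookup fails on
these systems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAG_SIZE  8   /* hypothetical magazine capacity */
#define TOY_MAX_ALLOC 16  /* hypothetical size of the live-allocation table */

/* Toy stand-in for the domain's record of live allocations. */
struct toy_domain {
    unsigned long live[TOY_MAX_ALLOC];
    size_t nr_live;
};

/* Toy stand-in for a per-CPU magazine of PFNs queued for release. */
struct toy_magazine {
    unsigned long pfns[TOY_MAG_SIZE];
    size_t size;
};

/* Record pfn as a live allocation; false if the table is full. */
static bool toy_alloc(struct toy_domain *d, unsigned long pfn)
{
    if (d->nr_live == TOY_MAX_ALLOC)
        return false;
    d->live[d->nr_live++] = pfn;
    return true;
}

/* Return the index of pfn in the live table, or -1 if it is not there. */
static long toy_find(const struct toy_domain *d, unsigned long pfn)
{
    for (size_t i = 0; i < d->nr_live; i++)
        if (d->live[i] == pfn)
            return (long)i;
    return -1;
}

/* Queue pfn in the magazine; the caller must not overfill it. */
static void toy_cache_free(struct toy_magazine *m, unsigned long pfn)
{
    assert(m->size < TOY_MAG_SIZE);
    m->pfns[m->size++] = pfn;
}

/*
 * Flush the magazine. Each cached PFN is expected to match a live
 * allocation; the return value counts the ones that do not. In this toy
 * model a non-zero count is the situation a BUG_ON(!iova)-style check
 * would treat as fatal.
 */
static size_t toy_flush(struct toy_magazine *m, struct toy_domain *d)
{
    size_t missing = 0;

    for (size_t i = 0; i < m->size; i++) {
        long idx = toy_find(d, m->pfns[i]);

        if (idx < 0) {
            missing++;
            continue;
        }
        /* Drop the entry by moving the last live PFN into its slot. */
        d->live[idx] = d->live[--d->nr_live];
    }
    m->size = 0;
    return missing;
}
```

In this model, allocating 0x100 and 0x200, then queueing 0x100 and a
never-allocated PFN 0 before calling toy_flush(), reports one missing entry.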