Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson, Benjamin Herrenschmidt
Cc: Jason Gunthorpe, Leon Romanovsky, davem@davemloft.net,
 saeedm@mellanox.com, ogerlitz@mellanox.com, tariqt@mellanox.com,
 bhelgaas@google.com, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
 alex.williamson@redhat.com, linux-pci@vger.kernel.org,
 linux-rdma@vger.kernel.org, sbest@redhat.com, paulus@samba.org
Date: Wed, 9 Jan 2019 17:32:02 +1100

On 09/01/2019 16:30, David Gibson wrote:
> On Wed, Jan 09, 2019 at 04:09:02PM +1100, Benjamin Herrenschmidt wrote:
>> On Mon, 2019-01-07 at 21:01 -0700, Jason Gunthorpe wrote:
>>>
>>>> In a very cryptic way that requires manual parsing using non-public
>>>> docs sadly but yes. From the look of it, it's a completion timeout.
>>>>
>>>> Looks to me like we don't get a response to a config space access
>>>> during the change of D state. I don't know if it's the write of the D3
>>>> state itself or the read back though (it's probably detected on the
>>>> read back or a subsequent read, but that doesn't tell me which specific
>>>> one failed).
>>>
>>> If it is just one card doing it (again, check you have latest
>>> firmware) I wonder if it is a sketchy PCI-E electrical link that is
>>> causing a long re-training cycle? Can you tell if the PCI-E link is
>>> permanently gone or does it eventually return?
>>
>> No, it's 100% reproducible on systems with that specific card model,
>> not card instance, and maybe different systems/cards as well, I'll let
>> David & Alexey comment further on that.
>
> Well, it's 100% reproducible on a particular model of system
> (garrison) with a particular model of card. I've had some suggestions
> that it fails with some other systems and card models, but nothing
> confirmed - the one other system model I've been able to try, which
> also had a newer card model, didn't reproduce the problem.

I have just moved the "Mellanox Technologies MT27700 Family [ConnectX-4]"
from the garrison to a firestone machine, and there it does not produce an
EEH with the same kernel and skiboot (both upstream + my debug). Hm. So I
cannot really blame the card, but I cannot see what in skiboot could cause
the difference either. I even tried disabling the NPU so that garrison
would look like firestone; it is still EEH'ing.

>>> Does the card work in Gen 3 when it starts? Is there any indication of
>>> PCI-E link errors?
>>
>> Nope.
>>
>>> Every time or sometimes?
>>>
>>> POWER8 firmware is good? If the link does eventually come back, is
>>> the POWER8's D3 resumption timeout long enough?
>>>
>>> If this doesn't lead to an obvious conclusion you'll probably need to
>>> connect to IBM's Mellanox support team to get more information from
>>> the card side.
>>
>> We are IBM :-) So far, it seems to be that the card is doing something
>> not quite right, but we don't know what. We might need to engage
>> Mellanox themselves.
>
> Possibly. On the other hand, I've had it reported that this is a
> software regression at least with downstream Red Hat kernels. I
> haven't yet been able to eliminate factors that might be confusing
> that, or try to find a working version upstream.

Do you have tarballs handy? I'd diff...

--
Alexey