Subject: Re: mlx4: panic during shutdown
From: Tariq Toukan
To: Sebastian Ott, Yishai Hadas
Date: Thu, 20 Oct 2016 12:07:01 +0300
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Sebastian,

Thanks for the report.
We've encountered this as well, and are trying to find the correct way
to solve it.

On 19/10/2016 5:35 PM, Sebastian Ott wrote:
> Hi,
>
> After a userspace update (fedora 23->24) I reproducibly run into the
> following oops during shutdown (on s390):
>
> [ 71.054832] Unable to handle kernel pointer dereference in virtual kernel address space
> [ 71.054835] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
> [ 71.054838] Fault in home space mode while using kernel ASCE.
> [ 71.054847] AS:0000000000f70007 R3:0000000000000024
> [ 71.054883] Oops: 0038 ilc:3 [#1] PREEMPT SMP
> [ 71.054887] Modules linked in: mlx4_ib ib_core mlx4_en ptp pps_core mlx4_core [...]
> [ 71.054912] CPU: 8 PID: 809 Comm: kworker/8:6 Not tainted 4.8.0-02896-g7137af2-dirty #6
> [ 71.054913] Hardware name: IBM 2964 N96 704 (LPAR)
> [ 71.054919] Workqueue: events linkwatch_event
> [ 71.054921] task: 00000000dbea0008 task.stack: 00000000dbea4000
> [ 71.054923] Krnl PSW : 0704e00180000000 000003ff8007a496 (mlx4_en_get_phys_port_id+0x66/0xb0 [mlx4_en])
> [ 71.054933] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000080 0000000000000268 000000000000004e 00000000001c33e0
> [ 71.054937] 000003ff8007a486 0000000000882790 6b6b6b6b6b6b6b6b 0000000000000010
> [ 71.054939] 00000000dbea7b18 6b6b6b6b6b6b6b6b 00000000dbea7b18 00000000e72e0000
> [ 71.054941] 00000000f15ec900 0000000000000000 000003ff8007a486 00000000dbea79c8
> [ 71.054950] Krnl Code: 000003ff8007a486: e310b81c0d14 lgf %r1,55324(%r11)
> 000003ff8007a48c: a71b004b aghi %r1,75
> #000003ff8007a490: eb110003000d sllg %r1,%r1,3
> >000003ff8007a496: e31190000002 ltg %r1,0(%r1,%r9)
> 000003ff8007a49c: a7840015 brc 8,3ff8007a4c6
> 000003ff8007a4a0: 9208a020 mvi 32(%r10),8
> 000003ff8007a4a4: 4130a007 la %r3,7(%r10)
> 000003ff8007a4a8: a7290008 lghi %r2,8
> [ 71.054965] Call Trace:
> [ 71.054969] ([<000003ff8007a486>] mlx4_en_get_phys_port_id+0x56/0xb0 [mlx4_en])
> [ 71.054971] ([<0000000000760b94>] rtnl_fill_ifinfo+0x4ec/0xc90)
> [ 71.054974] ([<0000000000764fae>] rtmsg_ifinfo_build_skb+0x96/0xe8)
> [ 71.054976] ([<0000000000765038>] rtmsg_ifinfo+0x38/0x78)
> [ 71.054978] ([<000000000074150e>] netdev_state_change+0x5e/0x70)
> [ 71.054981] ([<0000000000765ca6>] linkwatch_do_dev+0x66/0xc8)
> [ 71.054983] ([<0000000000765fd6>] __linkwatch_run_queue+0x13e/0x190)
> [ 71.054985] ([<0000000000766070>] linkwatch_event+0x48/0x58)
> [ 71.054988] ([<0000000000162a2e>] process_one_work+0x3fe/0x820)
> [ 71.054990] ([<00000000001630e6>] worker_thread+0x296/0x460)
> [ 71.054992] ([<000000000016b41a>] kthread+0x112/0x120)
> [ 71.054996] ([<00000000008762b2>] kernel_thread_starter+0x6/0xc)
> [ 71.054998] ([<00000000008762ac>] kernel_thread_starter+0x0/0xc)
> [ 71.055000] INFO: lockdep is turned off.
> [ 71.055001] Last Breaking-Event-Address:
> [ 71.055004] [<0000000000294480>] printk+0xc8/0xd0
> [ 71.055006]
> [ 71.055008] Kernel panic - not syncing: Fatal exception: panic_on_oops
>
>
> This was observed with 4.8 but it's also reproducible on 4.9-rc1.
> In mlx4_en_get_phys_port_id (which looks like it's called from userspace
> via sysfs) the data behind mlx4_en_priv->mdev is already freed.
>
> The problem probably is that the lifetime of mlx4_en_priv->mdev seems to
> be shorter than that of struct net_device (and mlx4_en_get_phys_port_id
> can be called as long as struct net_device exists).

Right.
This happens because we've already freed some resources.

One possible solution is to add a netif_device_present check in
dev_get_phys_port_id.
Something like this:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6601,6 +6601,8 @@ int dev_get_phys_port_id(struct net_device *dev,
         if (!ops->ndo_get_phys_port_id)
                 return -EOPNOTSUPP;
+        if (!netif_device_present(dev))
+                return -ENODEV;
         return ops->ndo_get_phys_port_id(dev, ppid);
 }
 EXPORT_SYMBOL(dev_get_phys_port_id);

However, this causes other issues when combined with an MTU change.
During an MTU change, netif_device_present returns false for a while,
causing an unexpected failure of dev_get_phys_port_id.

>
> Regards,
> Sebastian
>

Regards,
Tariq Toukan
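
To make the trade-off with the netif_device_present check easier to see,
here is a minimal userspace sketch of it. This is not kernel code: every
fake_* name is a hypothetical stand-in, and the model assumes the shutdown
path marks the device as not present before the resources behind mdev are
released.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the resources behind mlx4_en_priv->mdev. */
struct fake_mdev {
    unsigned char port_guid[8];
};

/* Hypothetical stand-in for struct net_device plus its private data. */
struct fake_netdev {
    bool present;            /* models what netif_device_present() reports */
    struct fake_mdev *mdev;  /* NULL once the resources are released */
};

/* Models the driver callback, which dereferences mdev unconditionally. */
static int fake_ndo_get_phys_port_id(const struct fake_netdev *dev,
                                     unsigned char id[8])
{
    memcpy(id, dev->mdev->port_guid, 8);
    return 0;
}

/* Models dev_get_phys_port_id() with the presence check from the diff. */
int fake_dev_get_phys_port_id(const struct fake_netdev *dev,
                              unsigned char id[8])
{
    assert(dev != NULL);
    if (!dev->present)
        return -ENODEV;
    return fake_ndo_get_phys_port_id(dev, id);
}

/* Assumed shutdown order: mark absent first, then release resources. */
void fake_teardown(struct fake_netdev *dev)
{
    dev->present = false;
    dev->mdev = NULL;
}

/* MTU change: reported absent for a while, but mdev stays valid. */
void fake_mtu_change_begin(struct fake_netdev *dev)
{
    dev->present = false;
}

void fake_mtu_change_end(struct fake_netdev *dev)
{
    dev->present = true;
}
```

In this model, a query after fake_teardown() gets -ENODEV instead of
touching released data, which is the intended effect. A query between
fake_mtu_change_begin() and fake_mtu_change_end() also gets -ENODEV even
though mdev is still valid, which is the unexpected failure described
above.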