From: Logan Gunthorpe
To: Matan Barak, Yishai Hadas, Doug Ledford, linux-rdma@vger.kernel.org
Cc: Sean Hefty, Hal Rosenstock, Jason Gunthorpe, Stephen Bates, linux-kernel@vger.kernel.org
Date: Fri, 28 Jul 2017 11:38:13 -0600
Subject: BUG: NULL pointer dereference at ib_uverbs_comp_handler+0x20
Message-ID: <216b770e-fc08-68a6-c1bf-be96d52e325e@deltatee.com>

Hi,

My system has been failing with recent kernels (4.12.x and 4.13-rc2) with a NULL pointer dereference; the stack trace is at the end of this email. This happens when simply running 'ib_write_bw -R ' with a Chelsio T6 (cxgb4).
I've bisected (log attached) and found the offending commit to be:

commit 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
Author: Matan Barak
IB/core: Change completion channel to use the reworked objects schema

Reverting this commit (and the dependent commits db1b5ddd53365 and e0fcc61113c, which fix other bugs introduced by it) from v4.12.3 fixes the issue.

I did the bisect with the userspace libraries in Debian Stretch, but I also hit this bug with rdma-core v14. I was pretty sure v4.12 kernels had worked for me in the past, but that was likely only before I upgraded from Jessie to Stretch.

Thanks,

Logan

PS. As a side rant, this bug was found after a very *frustrating* day of what was supposed to be a 20-minute task: getting my RDMA cards plugged in again. I tried both the CX4s and the T6s (and I'm still not sure whether my CX4s work yet). Instead, it turns out there's a whole mess of kernel bugs I had to go up against. I went back and forth between different versions of the userspace libraries because I was sure 4.11 worked, but it turned out that 4.11.10+, 4.12.x and who knows what other stable kernels are currently broken by the bug fixed in [1]. And there was a whole other bug, fixed in the 4.12-rc series, that broke things and that I had to carefully bisect around to find the one reported above. So frustrating!!
[1] 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3

--
[ 53.320439] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 54.738579] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
[ 54.752719] PGD 0
[ 54.752721] P4D 0
[ 54.755049]
[ 54.759109] Oops: 0002 [#1] SMP
[ 54.762699] Modules linked in:
[ 54.766195] CPU: 0 PID: 5 Comm: kworker/u16:0 Not tainted 4.13.0-rc2.direct #708
[ 54.774536] Hardware name: Supermicro SYS-7047GR-TRF/X9DRG-QF, BIOS 3.0a 12/05/2013
[ 54.783182] Workqueue: iw_cxgb4 process_work
[ 54.788036] task: ffff880276a5ee80 task.stack: ffffc900000c4000
[ 54.794728] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
[ 54.800552] RSP: 0018:ffffc900000c7c70 EFLAGS: 00010046
[ 54.806473] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[ 54.814524] RDX: 0000000000000001 RSI: 0000000000000058 RDI: 0000000000000058
[ 54.822583] RBP: ffff880470484600 R08: 0000000000000001 R09: 0000000000000001
[ 54.830663] R10: 0000000000000040 R11: ffff88047420b400 R12: 0000000000000282
[ 54.838744] R13: ffffc900000c7dc0 R14: 0000000000000001 R15: ffff880470484600
[ 54.846825] FS: 0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
[ 54.855997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.862522] CR2: 0000000000000058 CR3: 0000000001e0a000 CR4: 00000000000406f0
[ 54.870602] Call Trace:
[ 54.873442] ? ib_uverbs_comp_handler+0x20/0xe0
[ 54.878610] ? flush_qp+0x6e/0x2b0
[ 54.882514] ? c4iw_modify_qp+0x11c2/0x1870
[ 54.887295] ? close_con_rpl+0xe7/0x170
[ 54.891686] ? kfree_skb+0x33/0x90
[ 54.895592] ? skb_dequeue+0x52/0x60
[ 54.899690] ? process_work+0x4a/0x60
[ 54.903887] ? process_one_work+0x1c2/0x3e0
[ 54.908664] ? worker_thread+0x47/0x3d0
[ 54.913056] ? kthread+0xfc/0x130
[ 54.916864] ? create_worker+0x180/0x180
[ 54.921353] ? kthread_create_on_node+0x40/0x40
[ 54.926521] ? ret_from_fork+0x22/0x30
[ 54.930811] Code: c0 74 05 e8 b3 1c 73 ff 48 89 d8 5b c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 9c 5b fa 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 05 48 89 d8 5b c3 89 c6 e8 9c 09 73 ff 48
[ 54.952099] RIP: _raw_spin_lock_irqsave+0x10/0x30 RSP: ffffc900000c7c70
[ 54.959598] CR2: 0000000000000058
[ 54.963405] ---[ end trace 896cfe0234c949d2 ]---
[ 102.633421] random: crng init done

Attachment: bisect.log

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [2ea659a9ef488125eb46da6eb571de5eae5c43f6] Linux 4.12-rc1
git bisect bad 2ea659a9ef488125eb46da6eb571de5eae5c43f6
# good: [221656e7c4ce342b99c31eca96c1cbb6d1dce45f] Merge tag 'sound-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 221656e7c4ce342b99c31eca96c1cbb6d1dce45f
# bad: [c6a677c6f37bb7abc85ba7e3465e82b9f7eb1d91] Merge tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad c6a677c6f37bb7abc85ba7e3465e82b9f7eb1d91
# bad: [e579dde654fc2c6b0d3e4b77a9a4b2d2405c510e] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect bad e579dde654fc2c6b0d3e4b77a9a4b2d2405c510e
# bad: [a96480723c287c502b02659f4b347aecaa651ea1] Merge tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect bad a96480723c287c502b02659f4b347aecaa651ea1
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# bad: [1684096b1ed813f621fb6cbd06e72235c1c2a0ca] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
git bisect bad 1684096b1ed813f621fb6cbd06e72235c1c2a0ca
# bad: [e821303c428eedcc20746224d590b11c7000a7e5] iw_cxgb4: Use dsgl by default
git bisect bad e821303c428eedcc20746224d590b11c7000a7e5
# bad: [515ed4f3aab4e8a0855d0cdfd9753a419ccfb297] IB/IPoIB: Separate control and data related initializations
git bisect bad 515ed4f3aab4e8a0855d0cdfd9753a419ccfb297
# bad: [f7b42633720deb5ca8f4bcb175c7dc2933057e7f] IB/hfi1: Ensure VL index is within bounds
git bisect bad f7b42633720deb5ca8f4bcb175c7dc2933057e7f
# bad: [8688426ba6464f7079649f52cf9108856c419415] IB/hfi1: Cache registers during state change
git bisect bad 8688426ba6464f7079649f52cf9108856c419415
# good: [cf8966b3477d5e6545393bb4499f2051ea554c62] IB/core: Add support for fd objects
git bisect good cf8966b3477d5e6545393bb4499f2051ea554c62
# bad: [771a52584096c45e4565e8aabb596eece9d73d61] IB/IPoIB: ibX: failed to create mcg debug file
git bisect bad 771a52584096c45e4565e8aabb596eece9d73d61
# bad: [cd6ce4a5737829052abc4ffc8befd0adfff8998d] IB/hns: Explicitly include linux/of.h
git bisect bad cd6ce4a5737829052abc4ffc8befd0adfff8998d
# bad: [1e7710f3f6563940bb6bbc94aa8eadfd344a86af] IB/core: Change completion channel to use the reworked objects schema
git bisect bad 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
# first bad commit: [1e7710f3f6563940bb6bbc94aa8eadfd344a86af] IB/core: Change completion channel to use the reworked objects schema