From: Christian Borntraeger
Subject: regression 4.15-rc: kernel oops in dm-multipath
To: Alasdair Kergon, Mike Snitzer, dm-devel@redhat.com, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org
Date: Fri, 22 Dec 2017 10:53:21 +0100
Message-Id: <7dc463af-554f-1778-7e24-757de143d6b7@de.ibm.com>

Since 4.15-rc1 I get the following during boot relatively often (but not 100% reproducible).
It seems to be two oopses, with their output interleaved:
[ 5.851954] device-mapper: multipath service-time: version 0.3.0 loaded
[ 5.902244] Unable to handle kernel pointer dereference in virtual kernel address space
[ 5.902272] Failing address: 000003ff82196000 TEID: 000003ff82196803
[ 5.902275] Fault in home space mode while using kernel ASCE.
[ 5.902283] AS:000000000135c007 R3:00000002105e0007 S:0000000000000020
[ 5.902390] Oops: 0010 ilc:3 [#1] SMP
[ 5.902437] Modules linked in: dm_service_time mlx4_ib mlx4_en ptp ib_core pps_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common mlx4_core eadm_sch dm_multipath dm_mod zcrypt_cex4 zcrypt rng_core
[ 5.902818] Unable to handle kernel pointer dereference in virtual kernel address space
[ 5.902829] Failing address: 000003ff8218e000 TEID: 000003ff8218e803
[ 5.902840] Fault in home space mode while using
[ 5.902867] vhost_net sch_fq_codel tun
[ 5.902899] kernel
[ 5.902917] vhost tap ip_tables
[ 5.902940] ASCE.
[ 5.902955] AS:000000000135c007 R3:00000002105e0007
[ 5.902972] x_tables autofs4
[ 5.902987] S:0000000000000020
[ 5.903012] CPU: 0 PID: 742 Comm: systemd-udevd Not tainted 4.15.0-rc3+ #11
[ 5.903024] Hardware name: IBM 2964 NC9 704 (LPAR)
[ 5.903035] Krnl PSW : 0000000047407382 00000000702c2011 (multipath_busy+0x9a/0x128 [dm_multipath])
[ 5.903085]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI: 0 EA:3
[ 5.903112] Krnl GPRS: 0000000000000001 000003ff82195a72 0000000000000000 ffffffff00000000
[ 5.903133]            000003ff800cff9c 0000000000000000 0000000000000800 00000001fa508730
[ 5.903154]            00000001f1f48000 000003e000000000 00000001f808c030 00000001e76afb00
[ 5.903173]            00000001f1f48000 00000001f89efc58 00000001f89efa08 00000001f89ef9c8
[ 5.903191] Krnl Code: 000003ff800f4e30: e310b0200004       lg      %r1,32(%r11)
[ 5.903191]            000003ff800f4e36: e31010000004       lg      %r1,0(%r1)
[ 5.903191]           #000003ff800f4e3c: e31011100004       lg      %r1,272(%r1)
[ 5.903191]           >000003ff800f4e42: e32016980004       lg      %r2,1688(%r1)
[ 5.903191]            000003ff800f4e48: c0e5fffff972       brasl   %r14,3ff800f412c
[ 5.903191]            000003ff800f4e4e: ec28000d007e       cij     %r2,0,8,3ff800f4e68
[ 5.903191]            000003ff800f4e54: a7180001           lhi     %r1,1
[ 5.903191]            000003ff800f4e58: e3b0b0000004       lg      %r11,0(%r11)
[ 5.903308] Call Trace:
[ 5.903319] ([<00000001f89ef9c0>] 0x1f89ef9c0)
[ 5.903342]  [<000003ff800cff3e>] dm_old_request_fn+0x56/0x1d0 [dm_mod]
[ 5.903367]  [<0000000000734f66>] __blk_run_queue+0x86/0x108
[ 5.903385]  [<0000000000736132>] queue_unplugged+0x8a/0x200
[ 5.903404]  [<000000000073ca0c>] blk_flush_plug_list+0x284/0x2f0
[ 5.903417]  [<000000000073d234>] blk_finish_plug+0x3c/0x60
[ 5.903426]  [<0000000000313dd8>] __do_page_cache_readahead+0x2e8/0x3d0
[ 5.903441]  [<0000000000314512>] force_page_cache_readahead+0xb2/0x150
[ 5.903454]  [<00000000002ff1f0>] generic_file_read_iter+0x6b0/0xa28
[ 5.903477]  [<00000000003b7e98>] __vfs_read+0x100/0x178
[ 5.903490]  [<00000000003b7f9a>] vfs_read+0x8a/0x148
[ 5.903506]  [<00000000003b864e>] SyS_read+0x66/0xd8
[ 5.903520]  [<0000000000ae9144>] system_call+0x290/0x2b0
[ 5.903523] INFO: lockdep is turned off.
[ 5.903527] Last Breaking-Event-Address:
[ 5.903541]  [<000003ff800f4e18>] multipath_busy+0x70/0x128 [dm_multipath]
[ 5.903552]
[ 5.903562] Oops: 0010 ilc:3 [#2]
[ 5.903566] Kernel panic - not syncing: Fatal exception: panic_on_oops

The faulting code seems to be:

        list_for_each_entry(pgpath, &pg->pgpaths, list) {
     854:       e3 b0 b0 00 00 04       lg      %r11,0(%r11)
     85a:       ec ba 00 21 80 64       cgrje   %r11,%r10,89c
                if (pgpath->is_active) {
     860:       91 80 b0 f8             tm      248(%r11),128
     864:       a7 84 ff f8             je      854
                        struct request_queue *q = bdev_get_queue(pgpath->path.dev->bdev);
     868:       e3 10 b0 20 00 04       lg      %r1,32(%r11)
bool blk_poll(struct request_queue *q, blk_qc_t cookie);

static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
{
        return bdev->bd_disk->queue;    /* this is never NULL */
     86e:       e3 10 10 00 00 04       lg      %r1,0(%r1)
     874:       e3 10 11 10 00 04       lg      %r1,272(%r1)
                        return blk_lld_busy(q);
     87a:       e3 20 16 98 00 04       lg      %r2,1688(%r1)
     880:       c0 e5 00 00 00 00       brasl   %r14,880

Any quick ideas?