From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sreekanth Reddy, Tomas Henzl, Bart Van Assche, "Martin K. Petersen"
Subject: [PATCH 4.18 106/123] scsi: mpt3sas: Fix calltrace observed while running IO & reset
Date: Mon, 3 Sep 2018 18:57:30 +0200
Message-Id: <20180903165724.003783639@linuxfoundation.org>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180903165719.499675257@linuxfoundation.org>
References: <20180903165719.499675257@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sreekanth Reddy

commit e70183143cc472960bc60dfee1b7bbe1949feffb upstream.

The kernel WARNING and BUG below were observed while running IOs
together with a host reset issued from an application:

mpt3sas_cm0: diag reset: SUCCESS
------------[ cut here ]------------
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
 [] dump_stack+0x19/0x1b
 [] __warn+0xd8/0x100
 [] warn_slowpath_null+0x1d/0x20
 [] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
 [] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
 [] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
 [] ? vprintk_default+0x29/0x40
 [] ? printk+0x60/0x77
 [] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
 [] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [] ? do_sync_write+0x93/0xe0
 [] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [] do_vfs_ioctl+0x350/0x560
 [] ? __sb_end_write+0x31/0x60
 [] SyS_ioctl+0xa1/0xc0
 [] ? system_call_after_swapgs+0xa2/0x146
 [] system_call_fastpath+0x1c/0x21
 [] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1476!
invalid opcode: 0000 [#1] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
RIP: 0010:[] [] blk_requeue_request+0x90/0xa0
RSP: 0018:ffff903c6b783dc0 EFLAGS: 00010087
RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
FS: 00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
Call Trace:
 [] __scsi_queue_insert+0xbf/0x110
 [] scsi_io_completion+0x5da/0x6a0
 [] scsi_finish_command+0xdc/0x140
 [] scsi_softirq_done+0x132/0x160
 [] blk_done_softirq+0x96/0xc0
 [] __do_softirq+0xf5/0x280
 [] call_softirq+0x1c/0x30
 [] do_softirq+0x65/0xa0
 [] irq_exit+0x105/0x110
 [] smp_apic_timer_interrupt+0x48/0x60
 [] apic_timer_interrupt+0x162/0x170
 [] ? scsi_done+0x21/0x60
 [] ? delay_tsc+0x38/0x60
 [] __const_udelay+0x2d/0x30
 [] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
 [] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
 [] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
 [] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [] ? do_sync_write+0x93/0xe0
 [] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [] do_vfs_ioctl+0x350/0x560
 [] ? __sb_end_write+0x31/0x60
 [] SyS_ioctl+0xa1/0xc0
 [] ? system_call_after_swapgs+0xa2/0x146
 [] system_call_fastpath+0x1c/0x21
 [] ? system_call_after_swapgs+0xae/0x146
Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP [] blk_requeue_request+0x90/0xa0
 RSP

As part of the host reset operation, the driver flushes out all IOs
that are outstanding at the driver level, completing them with a
DID_RESET result. To find the commands that are outstanding at the
driver level, the driver loops over every smid from one up to the HBA
queue depth and calls mpt3sas_scsih_scsi_lookup_get() to get the scmd,
as shown below:

	for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
		scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
		if (!scmd)
			continue;

However, mpt3sas_scsih_scsi_lookup_get() also returns some scsi cmnds
that are not outstanding at the driver level. Possibly the request was
constructed at the block layer because QUEUE_FLAG_QUIESCED is not set.
Using scsi_block_requests() and scsi_unblock_requests() in the driver
does not help, because they only block further IO from the scsi layer
and not from the block layer. These commands are then flushed with the
DID_RESET host byte, which results in the kernel BUG above.

This issue was introduced by commit dbec4c9040ed ("scsi: mpt3sas:
lockless command submission").

To fix this issue, mpt3sas_scsih_scsi_lookup_get() now checks whether
the smid is zero before it returns the scmd pointer to the caller.
(Note: whenever a scsi cmnd is being processed at the driver level, its
smid is non-zero; smids always start from one.) mpt3sas_base_clear_st()
now also resets st->smid to zero when a tracker is cleared. If the smid
is zero, the lookup returns NULL, so the driver no longer flushes those
scsi cmnds with the DID_RESET host byte and the issue is not observed.

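The effect of the extra smid check on the flush loop can be sketched
with a small userspace model. This is only an illustration under stated
assumptions, not driver code: every model_* name, MODEL_DEPTH and
CB_IDX_FREE are hypothetical, and the model simply assumes that a
request built by the block layer but never submitted to the driver has
a tracker with smid == 0 and a cb_idx other than 0xFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names only loosely mirror the driver. */
#define MODEL_DEPTH 4        /* stand-in for ioc->scsiio_depth */
#define CB_IDX_FREE 0xFF     /* cb_idx value of a cleared tracker */

struct model_tracker {
	unsigned short smid;     /* 0: not being processed by the driver */
	unsigned char cb_idx;
};

struct model_cmd {
	int allocated;           /* a request exists for this tag */
	struct model_tracker st; /* per-command private data */
};

static struct model_cmd model_cmds[MODEL_DEPTH];

/* Assumption: the block layer built a request the driver never saw. */
void model_block_alloc(unsigned short tag)
{
	assert(tag < MODEL_DEPTH);
	model_cmds[tag].allocated = 1;
	model_cmds[tag].st.smid = 0;
	model_cmds[tag].st.cb_idx = 0;
}

/* The driver accepts the command: smid is tag + 1, so never zero. */
void model_driver_submit(unsigned short tag)
{
	assert(tag < MODEL_DEPTH && model_cmds[tag].allocated);
	model_cmds[tag].st.smid = (unsigned short)(tag + 1);
	model_cmds[tag].st.cb_idx = 1;
}

/* Models the patched clear: smid is reset along with cb_idx. */
void model_clear_st(struct model_tracker *st)
{
	st->cb_idx = CB_IDX_FREE;
	st->smid = 0;
}

/* check_smid = 0 models the old lookup, 1 the patched lookup. */
struct model_cmd *model_lookup_get(unsigned short smid, int check_smid)
{
	struct model_cmd *cmd;

	if (smid < 1 || smid > MODEL_DEPTH)
		return NULL;
	cmd = &model_cmds[smid - 1];
	if (!cmd->allocated)
		return NULL;
	if (cmd->st.cb_idx == CB_IDX_FREE ||
	    (check_smid && cmd->st.smid == 0))
		return NULL;
	return cmd;
}

/* Counts the commands the flush loop would complete with DID_RESET. */
int model_count_flushed(int check_smid)
{
	int count = 0;

	for (unsigned short smid = 1; smid <= MODEL_DEPTH; smid++) {
		if (!model_lookup_get(smid, check_smid))
			continue;
		count++;
	}
	return count;
}
```

With one command submitted to the driver and one only allocated, the
old lookup makes the loop pick up both, while the patched lookup picks
up only the command the driver actually owns.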
[mkp: amended with updated fix from Sreekanth]

Signed-off-by: Sreekanth Reddy
Fixes: dbec4c9040ed ("scsi: mpt3sas: lockless command submission")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Tomas Henzl
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen
Signed-off-by: Greg Kroah-Hartman
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3284,6 +3284,7 @@ void mpt3sas_base_clear_st(struct MPT3SA
 	st->cb_idx = 0xFF;
 	st->direct_io = 0;
 	atomic_set(&ioc->chain_lookup[st->smid - 1].chain_offset, 0);
+	st->smid = 0;
 }
 
 /**
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -1489,7 +1489,7 @@ mpt3sas_scsih_scsi_lookup_get(struct MPT
 		scmd = scsi_host_find_tag(ioc->shost, unique_tag);
 		if (scmd) {
 			st = scsi_cmd_priv(scmd);
-			if (st->cb_idx == 0xFF)
+			if (st->cb_idx == 0xFF || st->smid == 0)
 				scmd = NULL;
 		}
 	}