Hi, I came across a problem when un-mounting CIFS mounts that bisected
back to the commit "[ef2cc88e2a205b8a11a19e78db63a70d3728cdf5] Merge tag
'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi".
I had been tracking down why Firefox was refusing to load some tabs, but
also noticed that CIFS umount operations at shutdown were giving the
dmesg output listed below.
If I booted in single-user mode, then did the CIFS mount:
mount.cifs //192.168.1.104/homes homes -o vers=2.1,credentials=/usr/local/bin/credentials
followed by
umount /mnt/homes
I still received the dmesg output "refcount_t: underflow;
use-after-free" with the first bad commit and not with the last good commit.
The .config changed considerably when bisecting from the last good commit
"5.3.0+ [7807759e4ad8d46347a5d52a0910269320b81e65]"
to the first bad commit
"5.4.0+ [ef2cc88e2a205b8a11a19e78db63a70d3728cdf5]"
so I've attached the .config files as well.
Kernel 5.4.0 did not have this problem.
Machine has a 4-core AMD Athlon(tm) II X4 640 Processor with 8GiB RAM.
Compiler: gcc (Debian 9.2.1-21) 9.2.1 20191130
git bisect start
# bad: [63de37476ebd1e9bab6a9e17186dc5aa1da9ea99] Merge tag 'tag-chrome-platform-for-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
git bisect bad 63de37476ebd1e9bab6a9e17186dc5aa1da9ea99
# good: [21b26d2679584c6a60e861aa3e5ca09a6bab0633] Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good 21b26d2679584c6a60e861aa3e5ca09a6bab0633
# good: [a5255bc31673c72e264d837cd13cd3085d72cb58] Merge tag 'dmaengine-5.5-rc1' of git://git.infradead.org/users/vkoul/slave-dma
git bisect good a5255bc31673c72e264d837cd13cd3085d72cb58
# bad: [97eeb4d9d755605385fa329da9afa38729f3413c] Merge tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 97eeb4d9d755605385fa329da9afa38729f3413c
# good: [937d6eefc716a9071f0e3bada19200de1bb9d048] Merge tag 'docs-5.5a' of git://git.lwn.net/linux
git bisect good 937d6eefc716a9071f0e3bada19200de1bb9d048
# good: [cef1538456ba1f965289c778c418460106d0c1c7] scsi: pm80xx: Initialize variable used as return status
git bisect good cef1538456ba1f965289c778c418460106d0c1c7
# good: [b16be561876ed9b72dcb2bf8c48b30f573f63c1c] xfs: use unsigned int for all size values in struct xfs_da_geometry
git bisect good b16be561876ed9b72dcb2bf8c48b30f573f63c1c
# good: [e8777b27ca8a6946bd69ad6a0f282e519c895e4b] xfs: avoid time_t in user api
git bisect good e8777b27ca8a6946bd69ad6a0f282e519c895e4b
# bad: [9b326948c23908692d7dfe56ed149840d3829eaa] Merge tag 'firewire-update' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
git bisect bad 9b326948c23908692d7dfe56ed149840d3829eaa
# good: [aa5334c4f3014940f11bf876e919c956abef4089] scsi: scsi_debug: num_tgts must be >= 0
git bisect good aa5334c4f3014940f11bf876e919c956abef4089
# good: [80647a89eaf3f2549741648f3230cd6ff68c23b4] scsi: target: core: Release SPC-2 reservations when closing a session
git bisect good 80647a89eaf3f2549741648f3230cd6ff68c23b4
# good: [65309ef6b258f5a7b57c1033a82ba2aba5c434cc] scsi: bnx2fc: timeout calculation invalid for bnx2fc_eh_abort()
git bisect good 65309ef6b258f5a7b57c1033a82ba2aba5c434cc
# good: [7807759e4ad8d46347a5d52a0910269320b81e65] firewire: core: code cleanup after vm_map_pages_zero introduction
git bisect good 7807759e4ad8d46347a5d52a0910269320b81e65
# bad: [ef2cc88e2a205b8a11a19e78db63a70d3728cdf5] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad ef2cc88e2a205b8a11a19e78db63a70d3728cdf5
# first bad commit: [ef2cc88e2a205b8a11a19e78db63a70d3728cdf5] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad
ef2cc88e2a205b8a11a19e78db63a70d3728cdf5 is the first bad commit
commit ef2cc88e2a205b8a11a19e78db63a70d3728cdf5
Merge: 937d6eefc716 65309ef6b258
Author: Linus Torvalds <[email protected]>
Date: Mon Dec 2 13:37:02 2019 -0800
Merge tag 'scsi-misc' of
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes.
The major core changes are Al Viro's reworking of sg's handling of
copy to/from user, Ming Lei's removal of the host busy counter to
avoid contention in the multiqueue case and Damien Le Moal's fixing of
residual tracking across error handling"
* tag 'scsi-misc' of
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (251 commits)
scsi: bnx2fc: timeout calculation invalid for bnx2fc_eh_abort()
scsi: target: core: Fix a pr_debug() argument
scsi: iscsi: Don't send data to unbound connection
scsi: target: iscsi: Wait for all commands to finish before
freeing a session
scsi: target: core: Release SPC-2 reservations when closing a session
scsi: target: core: Document target_cmd_size_check()
scsi: bnx2i: fix potential use after free
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
scsi: NCR5380: Add disconnect_mask module parameter
scsi: NCR5380: Unconditionally clear ICR after do_abort()
scsi: NCR5380: Call scsi_set_resid() on command completion
scsi: scsi_debug: num_tgts must be >= 0
scsi: lpfc: use hdwq assigned cpu for allocation
scsi: arcmsr: fix indentation issues
scsi: qla4xxx: fix double free bug
scsi: pm80xx: Modified the logic to collect fatal dump
scsi: pm80xx: Tie the interrupt name to the module instance
scsi: pm80xx: Controller fatal error through sysfs
scsi: pm80xx: Do not request 12G sas speeds
scsi: pm80xx: Cleanup command when a reset times out
...
.../devicetree/bindings/ufs/ti,j721e-ufs.yaml | 68 ++
.../devicetree/bindings/ufs/ufshcd-pltfrm.txt | 1 +
Documentation/scsi/scsi_mid_low_api.txt | 3 +-
drivers/ata/pata_arasan_cf.c | 1 -
drivers/s390/scsi/Makefile | 2 +-
drivers/s390/scsi/zfcp_aux.c | 12 +-
drivers/s390/scsi/zfcp_dbf.c | 8 +-
drivers/s390/scsi/zfcp_def.h | 4 +-
drivers/s390/scsi/zfcp_diag.c | 305 +++++++
drivers/s390/scsi/zfcp_diag.h | 101 +++
drivers/s390/scsi/zfcp_erp.c | 4 +-
drivers/s390/scsi/zfcp_ext.h | 1 +
drivers/s390/scsi/zfcp_fsf.c | 73 +-
drivers/s390/scsi/zfcp_fsf.h | 21 +-
drivers/s390/scsi/zfcp_scsi.c | 4 +-
drivers/s390/scsi/zfcp_sysfs.c | 170 +++-
drivers/scsi/NCR5380.c | 37 +-
drivers/scsi/aacraid/aachba.c | 11 +-
drivers/scsi/aacraid/aacraid.h | 23 +-
drivers/scsi/aacraid/comminit.c | 5 +
drivers/scsi/aacraid/commsup.c | 21 +-
drivers/scsi/aacraid/linit.c | 35 +-
drivers/scsi/aacraid/src.c | 10 +
drivers/scsi/arcmsr/arcmsr_hba.c | 6 +-
drivers/scsi/arm/acornscsi.c | 4 +-
drivers/scsi/atari_scsi.c | 6 +-
drivers/scsi/atp870u.c | 2 +-
drivers/scsi/bfa/bfad.c | 3 +-
drivers/scsi/bfa/bfad_attr.c | 4 +-
drivers/scsi/bnx2fc/57xx_hsi_bnx2fc.h | 2 +-
drivers/scsi/bnx2fc/bnx2fc_io.c | 2 +-
drivers/scsi/bnx2i/bnx2i_iscsi.c | 2 +-
drivers/scsi/csiostor/csio_hw.c | 20 +-
drivers/scsi/csiostor/csio_init.c | 7 +-
drivers/scsi/csiostor/csio_lnode.c | 18 +-
drivers/scsi/csiostor/csio_mb.c | 2 +-
drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 2 -
drivers/scsi/cxgbi/libcxgbi.c | 28 -
drivers/scsi/cxlflash/main.c | 2 -
drivers/scsi/esas2r/esas2r_flash.c | 1 +
drivers/scsi/fnic/fnic_scsi.c | 3 +-
drivers/scsi/fnic/vnic_dev.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas.h | 67 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 376 +++++---
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 6 +-
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 13 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 30 +-
drivers/scsi/hosts.c | 19 +-
drivers/scsi/ips.c | 2 +-
drivers/scsi/isci/port_config.c | 2 +-
drivers/scsi/isci/remote_device.c | 2 +-
drivers/scsi/iscsi_tcp.c | 8 +
drivers/scsi/lpfc/lpfc.h | 40 +-
drivers/scsi/lpfc/lpfc_attr.c | 298 +++++--
drivers/scsi/lpfc/lpfc_bsg.c | 18 +-
drivers/scsi/lpfc/lpfc_crtn.h | 7 +
drivers/scsi/lpfc/lpfc_ct.c | 28 +-
drivers/scsi/lpfc/lpfc_debugfs.c | 118 ++-
drivers/scsi/lpfc/lpfc_els.c | 57 +-
drivers/scsi/lpfc/lpfc_hbadisc.c | 200 +++--
drivers/scsi/lpfc/lpfc_hw4.h | 31 +-
drivers/scsi/lpfc/lpfc_init.c | 954 ++++++++++++++++-----
drivers/scsi/lpfc/lpfc_logmsg.h | 17 +
drivers/scsi/lpfc/lpfc_mbox.c | 1 +
drivers/scsi/lpfc/lpfc_mem.c | 3 -
drivers/scsi/lpfc/lpfc_nportdisc.c | 149 +++-
drivers/scsi/lpfc/lpfc_nvme.c | 85 +-
drivers/scsi/lpfc/lpfc_nvmet.c | 103 +--
drivers/scsi/lpfc/lpfc_nvmet.h | 2 -
drivers/scsi/lpfc/lpfc_scsi.c | 43 +-
drivers/scsi/lpfc/lpfc_sli.c | 391 +++++++--
drivers/scsi/lpfc/lpfc_sli.h | 3 +-
drivers/scsi/lpfc/lpfc_sli4.h | 42 +-
drivers/scsi/lpfc/lpfc_version.h | 2 +-
drivers/scsi/mac_scsi.c | 2 +-
drivers/scsi/megaraid/megaraid_sas.h | 3 +
drivers/scsi/megaraid/megaraid_sas_base.c | 8 +-
drivers/scsi/megaraid/megaraid_sas_fp.c | 7 +-
drivers/scsi/mpt3sas/mpt3sas_base.c | 36 +-
drivers/scsi/mpt3sas/mpt3sas_base.h | 15 +-
drivers/scsi/mpt3sas/mpt3sas_ctl.c | 344 +++++++-
drivers/scsi/mpt3sas/mpt3sas_ctl.h | 9 +
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 +-
drivers/scsi/mpt3sas/mpt3sas_trigger_diag.c | 12 +-
drivers/scsi/mvsas/mv_sas.c | 2 +-
drivers/scsi/ncr53c8xx.c | 2 +-
drivers/scsi/nsp32.c | 2 +-
drivers/scsi/pcmcia/Kconfig | 2 +-
drivers/scsi/pcmcia/nsp_cs.c | 2 -
drivers/scsi/pm8001/pm8001_ctl.c | 20 +
drivers/scsi/pm8001/pm8001_hwi.c | 131 ++-
drivers/scsi/pm8001/pm8001_init.c | 36 +-
drivers/scsi/pm8001/pm8001_sas.c | 70 +-
drivers/scsi/pm8001/pm8001_sas.h | 24 +-
drivers/scsi/pm8001/pm80xx_hwi.c | 451 +++++++---
drivers/scsi/pm8001/pm80xx_hwi.h | 3 +
drivers/scsi/qedf/qedf_dbg.h | 2 +-
drivers/scsi/qedf/qedf_main.c | 8 +
drivers/scsi/qedi/qedi_dbg.h | 2 +-
drivers/scsi/qla2xxx/qla_attr.c | 4 +-
drivers/scsi/qla2xxx/qla_def.h | 34 +-
drivers/scsi/qla2xxx/qla_fw.h | 2 +
drivers/scsi/qla2xxx/qla_gbl.h | 1 +
drivers/scsi/qla2xxx/qla_gs.c | 66 +-
drivers/scsi/qla2xxx/qla_init.c | 140 +--
drivers/scsi/qla2xxx/qla_inline.h | 12 +
drivers/scsi/qla2xxx/qla_iocb.c | 106 ++-
drivers/scsi/qla2xxx/qla_isr.c | 36 +-
drivers/scsi/qla2xxx/qla_mbx.c | 15 +-
drivers/scsi/qla2xxx/qla_mid.c | 11 +-
drivers/scsi/qla2xxx/qla_nvme.c | 4 +-
drivers/scsi/qla2xxx/qla_os.c | 174 ++--
drivers/scsi/qla2xxx/qla_target.c | 2 +-
drivers/scsi/qla2xxx/qla_tmpl.c | 29 +-
drivers/scsi/qla2xxx/qla_version.h | 2 +-
drivers/scsi/qla4xxx/ql4_mbx.c | 3 -
drivers/scsi/scsi.c | 6 +-
drivers/scsi/scsi_debug.c | 9 +-
drivers/scsi/scsi_lib.c | 45 +-
drivers/scsi/scsi_logging.c | 10 +-
drivers/scsi/scsi_priv.h | 2 +-
drivers/scsi/scsi_sysfs.c | 22 +-
drivers/scsi/scsi_trace.c | 124 +--
drivers/scsi/sd.c | 4 +
drivers/scsi/sg.c | 91 +-
drivers/scsi/smartpqi/smartpqi.h | 77 +-
drivers/scsi/smartpqi/smartpqi_init.c | 437 ++++++----
drivers/scsi/smartpqi/smartpqi_sas_transport.c | 22 +-
drivers/scsi/sun3_scsi.c | 4 +-
drivers/scsi/ufs/Kconfig | 10 +
drivers/scsi/ufs/Makefile | 1 +
drivers/scsi/ufs/ti-j721e-ufs.c | 90 ++
drivers/scsi/ufs/ufs-hisi.c | 5 +-
drivers/scsi/ufs/ufs-mediatek.c | 3 +
drivers/scsi/ufs/ufs-qcom.c | 53 ++
drivers/scsi/ufs/ufs-qcom.h | 3 +
drivers/scsi/ufs/ufs-sysfs.c | 15 +-
drivers/scsi/ufs/ufs_bsg.c | 1 +
drivers/scsi/ufs/ufshcd-dwc.c | 2 +-
drivers/scsi/ufs/ufshcd-pltfrm.c | 1 -
drivers/scsi/ufs/ufshcd.c | 214 +++--
drivers/scsi/ufs/ufshcd.h | 12 +
drivers/scsi/ufs/ufshci.h | 2 +-
drivers/scsi/zorro_esp.c | 11 +-
drivers/target/iscsi/cxgbit/cxgbit_ddp.c | 3 -
drivers/target/iscsi/iscsi_target.c | 24 +-
drivers/target/iscsi/iscsi_target_auth.c | 232 +++--
drivers/target/iscsi/iscsi_target_auth.h | 17 +-
drivers/target/iscsi/iscsi_target_parameters.h | 3 -
drivers/target/target_core_fabric_lib.c | 2 +-
drivers/target/target_core_tpg.c | 12 -
drivers/target/target_core_transport.c | 28 +
drivers/target/target_core_user.c | 6 +-
drivers/target/target_core_xcopy.c | 1 -
drivers/usb/storage/ene_ub6250.c | 2 +-
drivers/usb/storage/transport.c | 3 +-
drivers/usb/storage/uas.c | 1 -
include/scsi/iscsi_proto.h | 1 +
include/scsi/scsi_cmnd.h | 5 +-
include/scsi/scsi_device.h | 5 +-
include/scsi/scsi_host.h | 19 +-
include/target/target_core_base.h | 1 -
include/uapi/linux/chio.h | 11 +-
163 files changed, 5656 insertions(+), 1987 deletions(-)
create mode 100644 Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml
create mode 100644 drivers/s390/scsi/zfcp_diag.c
create mode 100644 drivers/s390/scsi/zfcp_diag.h
create mode 100644 drivers/scsi/ufs/ti-j721e-ufs.c
[ 690.077830] ------------[ cut here ]------------
[ 690.077894] refcount_t: underflow; use-after-free.
[ 690.077936] WARNING: CPU: 3 PID: 5402 at lib/refcount.c:28
refcount_warn_saturate+0xa7/0xf0
[ 690.077976] Modules linked in: md4 sha512_generic cmac nls_utf8 cifs
libdes libarc4 dns_resolver fscache bnep bluetooth hmac drbg ansi_cprng
ecdh_generic ecc nfc rfkill snd_hrtimer cpufreq_userspace
cpufreq_conservative cpufreq_powersave binfmt_misc fuse max6650
hwmon_vid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi parport_pc ppdev lp parport
edac_mce_amd snd_emu10k1_synth snd_hda_codec_hdmi snd_emux_synth
snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event
kvm_amd snd_seq snd_hda_intel ccp snd_intel_dspcfg snd_emu10k1
snd_hda_codec radeon snd_util_mem snd_ac97_codec snd_hda_core ac97_bus
snd_rawmidi snd_hwdep snd_seq_device snd_pcm_oss ttm snd_mixer_oss
rng_core kvm snd_pcm drm_kms_helper drm snd_timer irqbypass snd
i2c_algo_bit soundcore wmi_bmof fb_sys_fops syscopyarea sysfillrect
sysimgblt k10temp evdev pcspkr emu10k1_gp gameport serio_raw
acpi_cpufreq asus_atk0110 sp5100_tco wmi button ext4 crc16 mbcache jbd2
btrfs
[ 690.078010] xor zstd_decompress zstd_compress raid6_pq libcrc32c
crc32c_generic hid_generic usbhid hid sg sr_mod cdrom sd_mod ata_generic
uas usb_storage ohci_pci firewire_ohci ahci libahci pata_atiixp
firewire_core crc_itu_t libata ehci_pci ohci_hcd i2c_piix4 ehci_hcd
scsi_mod r8169 realtek usbcore libphy
[ 690.078549] CPU: 3 PID: 5402 Comm: umount Not tainted 5.4.0+ #3676
[ 690.078579] Hardware name: System manufacturer System Product
Name/M3A78 PRO, BIOS 1701 01/27/2011
[ 690.078625] RIP: 0010:refcount_warn_saturate+0xa7/0xf0
[ 690.078651] Code: 05 81 ec d2 00 01 e8 38 bc cb ff 0f 0b c3 80 3d 6f
ec d2 00 00 75 94 48 c7 c7 38 33 86 9e c6 05 5f ec d2 00 01 e8 19 bc cb
ff <0f> 0b c3 80 3d 4e ec d2 00 00 0f 85 71 ff ff ff 48 c7 c7 90 33 86
[ 690.078738] RSP: 0018:ffffac82c1cbfd50 EFLAGS: 00010282
[ 690.078764] RAX: 0000000000000000 RBX: ffff9e12dfe78ed8 RCX:
0000000000000006
[ 690.078798] RDX: 0000000000000007 RSI: 0000000000000096 RDI:
ffff9e12e7cd88f0
[ 690.078832] RBP: ffff9e12dfe78ee8 R08: 000000000000039c R09:
ffff9e12deae4940
[ 690.078866] R10: 0000000000aaaaaa R11: 0000000000000000 R12:
000000000000001c
[ 690.081634] R13: ffff9e12dff9b000 R14: 0000000000000000 R15:
ffff9e12de774780
[ 690.084396] FS: 00007ff5551cc080(0000) GS:ffff9e12e7cc0000(0000)
knlGS:0000000000000000
[ 690.087180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 690.089975] CR2: 0000561df46bf058 CR3: 000000021e5dc000 CR4:
00000000000006e0
[ 690.092793] Call Trace:
[ 690.095750] close_shroot+0x86/0xc0 [cifs]
[ 690.098580] SMB2_tdis+0xa3/0x1b0 [cifs]
[ 690.101377] ? del_timer+0x54/0x80
[ 690.104157] ? __cancel_work_timer+0x120/0x1b0
[ 690.106947] cifs_put_tcon.part.0+0xbd/0x210 [cifs]
[ 690.109731] cifs_put_tlink+0x45/0x60 [cifs]
[ 690.112505] cifs_umount+0x52/0xc0 [cifs]
[ 690.115245] deactivate_locked_super+0x3b/0x90
[ 690.117978] cleanup_mnt+0x104/0x160
[ 690.120697] task_work_run+0x84/0xa0
[ 690.123403] do_syscall_64+0x3b6/0x410
[ 690.126104] ? do_user_addr_fault+0x206/0x4c0
[ 690.128807] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 690.131507] RIP: 0033:0x7ff555423307
[ 690.134197] Code: eb 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44
00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 eb 0b 00 f7 d8 64 89 01 48
[ 690.136988] RSP: 002b:00007ffc50e1c8f8 EFLAGS: 00000246 ORIG_RAX:
00000000000000a6
[ 690.139763] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
00007ff555423307
[ 690.142534] RDX: 0000561df46b86d8 RSI: 0000000000000000 RDI:
0000561df46bf300
[ 690.145291] RBP: 00007ffc50e1c930 R08: 0000000000000001 R09:
00007ffc50e1b6b0
[ 690.148043] R10: fffffffffffff11b R11: 0000000000000246 R12:
0000561df40e3660
[ 690.150783] R13: 00007ffc50e1db20 R14: 0000000000000000 R15:
0000000000000000
[ 690.153513] ---[ end trace 1dba57f292a4b0d2 ]---
Arthur Marsh wrote on 12/5/19 2:14 PM:
> Hi, I came across a problem when un-mounting CIFS mounts that bisected
> back to the commit "[ef2cc88e2a205b8a11a19e78db63a70d3728cdf5] Merge tag
> 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi".
This still happens with 5.5.0-rc1:
[ 301.679819] ------------[ cut here ]------------
[ 301.680095] refcount_t: underflow; use-after-free.
[ 301.680192] WARNING: CPU: 1 PID: 3569 at lib/refcount.c:28
refcount_warn_saturate+0xb4/0xf3
[ 301.680298] Modules linked in: md4 sha512_generic cmac hmac cifs
libdes dns_resolver fscache libarc4 nfc rfkill cpufreq_conservative
cpufreq_userspace cpufreq_powersave binfmt_misc nls_utf8 nls_cp437 vfat
fat lm75 it87 hwmon_vid tun cuse fuse ib_iser rdma_cm iw_cm ib_cm
ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
parport_pc ppdev lp parport radeon ttm snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel
snd_intel_dspcfg drm_kms_helper snd_hda_codec drm i2c_algo_bit
snd_hda_core fb_sys_fops iTCO_wdt iTCO_vendor_support intel_powerclamp
syscopyarea sysfillrect snd_hwdep sysimgblt snd_pcm evdev snd_timer
pcspkr snd serio_raw rng_core soundcore button acpi_cpufreq ext4 crc16
mbcache jbd2 sr_mod cdrom sg sd_mod ata_generic psmouse ata_piix tg3
pata_it821x firewire_ohci uhci_hcd i2c_i801 lpc_ich firewire_core
mfd_core crc_itu_t libata ehci_pci scsi_mod ehci_hcd usbcore usb_common
ptp pps_core libphy
[ 301.681361] CPU: 1 PID: 3569 Comm: umount Not tainted 5.5.0-rc1 #1692
[ 301.681444] Hardware name: /8I945P Pro, BIOS F10 04/06/2006
[ 301.681521] EIP: refcount_warn_saturate+0xb4/0xf3
[ 301.681586] Code: 24 f8 58 8b c1 e8 ab 91 d0 ff 0f 0b c9 c3 80 3d c3
be 9d c1 00 75 91 c6 05 c3 be 9d c1 01 c7 04 24 4c 59 8b c1 e8 8b 91 d0
ff <0f> 0b c9 c3 80 3d c1 be 9d c1 00 0f 85 6d ff ff ff c6 05 c1 be 9d
[ 301.681807] EAX: 00000026 EBX: ed5ae6d8 ECX: 00000007 EDX: f62950cc
[ 301.681889] ESI: ed5ae6e4 EDI: ed5d7000 EBP: ed503e2c ESP: ed503e28
[ 301.681971] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010292
[ 301.682057] CR0: 80050033 CR2: 0058f00c CR3: 2a20a000 CR4: 000006d0
[ 301.682139] Call Trace:
[ 301.682240] close_shroot+0x97/0xda [cifs]
[ 301.682351] SMB2_tdis+0x7c/0x176 [cifs]
[ 301.682456] ? _get_xid+0x58/0x91 [cifs]
[ 301.682563] cifs_put_tcon.part.0+0x99/0x202 [cifs]
[ 301.682637] ? ida_free+0x99/0x10a
[ 301.682727] ? cifs_umount+0x3d/0x9d [cifs]
[ 301.682829] cifs_put_tlink+0x3a/0x50 [cifs]
[ 301.682929] cifs_umount+0x44/0x9d [cifs]
[ 301.683023] cifs_kill_sb+0x16/0x19 [cifs]
[ 301.683084] deactivate_locked_super+0x28/0x5c
[ 301.683147] deactivate_super+0x30/0x33
[ 301.683204] cleanup_mnt+0x8a/0x103
[ 301.683256] __cleanup_mnt+0xb/0xd
[ 301.683308] task_work_run+0x6c/0x8b
[ 301.683364] exit_to_usermode_loop+0xbe/0xc0
[ 301.683427] do_fast_syscall_32+0x291/0x364
[ 301.683493] entry_SYSENTER_32+0xaa/0x102
[ 301.683549] EIP: 0xb7fa9af1
[ 301.683592] Code: 00 89 d3 5b 5e 5f 5d c3 b8 00 09 3d 00 eb c4 8b 04
24 c3 8b 14 24 c3 8b 1c 24 c3 8b 34 24 c3 90 90 51 52 55 89 e5 0f 34 cd
80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[ 301.683811] EAX: 00000000 EBX: 0058ccf0 ECX: 00000000 EDX: b7f12000
[ 301.683892] ESI: bfe75280 EDI: b7f12000 EBP: bfe74168 ESP: bfe740fc
[ 301.683973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
[ 301.687861] irq event stamp: 8202
[ 301.691737] hardirqs last enabled at (8201): [<c10c741c>]
console_unlock+0x45a/0x598
[ 301.695687] hardirqs last disabled at (8202): [<c100171f>]
trace_hardirqs_off_thunk+0xc/0x1d
[ 301.699620] softirqs last enabled at (8132): [<c14f9770>]
peernet2id+0x3c/0x56
[ 301.703583] softirqs last disabled at (8130): [<c14f9756>]
peernet2id+0x22/0x56
[ 301.707502] ---[ end trace fbe0ba2a50b93358 ]---
On Sun, Dec 8, 2019 at 5:49 PM Arthur Marsh
<[email protected]> wrote:
>
> This still happens with 5.5.0-rc1:
Does it happen 100% of the time?
Your bisection result looks pretty nonsensical - not that it's
impossible (anything is possible), but it really doesn't look very
likely. Which makes me think maybe it's slightly timing-sensitive or
something?
Would you mind trying to re-do the bisection, and for each kernel try
the mount thing at least a few times before you decide a kernel is
good?
Bisection is very powerful, but if _any_ of the kernels you marked
good weren't really good (they just happened to not trigger the
problem), bisection ends up giving completely the wrong answer. And
with that bisection commit, there's not even a hint of what could have
gone wrong.
Linus
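To illustrate the point about a single wrongly marked "good" kernel, here is a minimal sketch in C. Everything in it is hypothetical: commits are plain integers, FIRST_BAD is an arbitrary choice, and the rule that the bug only shows up on every third run is an assumption made so the example is repeatable. It is not a model of the actual bisection reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical history: commits 0..15, every commit >= FIRST_BAD has the bug. */
enum { FIRST_BAD = 5 };

/* One test run of `commit`; `attempt` counts the runs on that commit from 0. */
static bool run_triggers_bug(int commit, int attempt)
{
    if (commit < FIRST_BAD)
        return false;            /* good kernels never trigger */
    /* Assumption: on a bad kernel the bug shows only on every third run. */
    return attempt % 3 == 2;
}

/* A commit is judged bad if any of `tries` runs triggers the bug. */
static bool looks_bad(int commit, int tries)
{
    for (int attempt = 0; attempt < tries; attempt++)
        if (run_triggers_bug(commit, attempt))
            return true;
    return false;
}

/*
 * Plain binary search between a known-good and a known-bad commit.
 * Returns the commit that would be reported as "first bad".
 */
static int bisect(int good, int bad, int tries)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (looks_bad(mid, tries))
            bad = mid;
        else
            good = mid;          /* a false "good" sends the search the wrong way */
    }
    return bad;
}
```

Under these assumptions, bisect(0, 15, 1) ends at commit 15, the far end of the range, because every bad kernel was tested once and happened to pass. bisect(0, 15, 3) ends at commit 5, the real first bad commit.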
On Sun, Dec 08, 2019 at 06:23:02PM -0800, Linus Torvalds wrote:
> On Sun, Dec 8, 2019 at 5:49 PM Arthur Marsh
> <[email protected]> wrote:
> >
> > This still happens with 5.5.0-rc1:
>
> Does it happen 100% of the time?
>
> Your bisection result looks pretty nonsensical - not that it's
> impossible (anything is possible), but it really doesn't look very
> likely. Which makes me think maybe it's slightly timing-sensitive or
> something?
>
> Would you mind trying to re-do the bisection, and for each kernel try
> the mount thing at least a few times before you decide a kernel is
> good?
>
> Bisection is very powerful, but if _any_ of the kernels you marked
> good weren't really good (they just happened to not trigger the
> problem), bisection ends up giving completely the wrong answer. And
> with that bisection commit, there's not even a hint of what could have
> gone wrong.
FWIW, the thing that is IME absolutely incompatible with bisection
is CONFIG_GCC_PLUGIN_RANDSTRUCT. It can affect frequencies badly
enough, even in the cases when the bug isn't directly dependent
upon that thing.
I suspect that nonsense bisects spewed by CI bots lately (bisect on
x86 oops ending up at commit limited to arch/parisc, etc.) are at
least partially due to that kind of garbage...
On Sun, Dec 8, 2019 at 6:52 PM Al Viro <[email protected]> wrote:
>
> FWIW, the thing that is IME absolutely incompatible with bisection
> is CONFIG_GCC_PLUGIN_RANDSTRUCT. It can affect frequencies badly
> enough, even in the cases when the bug isn't directly dependent
> upon that thing.
It will easily affect timing in major ways, yes.
You're right that at least the CI bots might want to disable it for
bisecting. Or force a particular seed for RANDSTRUCT - I seem to
recall that there was some way to make it be at least repeatable for
any particular structure.
Linus
On Sun, Dec 8, 2019 at 7:10 PM Linus Torvalds
<[email protected]> wrote:
>
> You're right that at least the CI bots might want to disable it for
> bisecting. Or force a particular seed for RANDSTRUCT - I seem to
> recall that there was some way to make it be at least repeatable for
> any particular structure.
Yeah, if you build in the same directory and don't do a "distclean" or
"git clean -dqfx", the seed remains in
scripts/gcc-plugins/randomize_layout_seed.h across builds, and the
result then _should_ be repeatable.
But yes, it's the kind of noise that is likely not worth fighting for
bisection (or any random bug hunting) at all, and turning off
RANDSTRUCT is probably the right thing to do.
But I don't think Arthur had RANDSTRUCT enabled, so that should be fine.
Linus
On Sun, Dec 8, 2019 at 8:23 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sun, Dec 8, 2019 at 5:49 PM Arthur Marsh
> <[email protected]> wrote:
> >
> > This still happens with 5.5.0-rc1:
>
> Does it happen 100% of the time?
I can reproduce it, although it was a little more difficult since WiFi doesn't
work on rc1 on some of my hardware (due to the 802.11 driver regression oops).
I was able to reproduce it against Samba on localhost.
> Your bisection result looks pretty nonsensical - not that it's
> impossible (anything is possible), but it really doesn't look very
> likely. Which makes me think maybe it's slightly timing-sensitive or
> something?
The bisection result is implausible. I just did some experiments, and
it looks far more likely that it is related to commit
72e73c78c446e ("cifs: close the shared root handle on tree disconnect"),
so I added Ronnie to the cc. That patch added a call (at unmount time)
to close_shroot.
The idea of that patch made sense: although tree disconnect (and then
logoff of the session) will indirectly free any open handles on the
server for that session, it is a little cleaner to close the cached
root SMB3 file handle explicitly.
void close_shroot(struct cached_fid *cfid)
{
	mutex_lock(&cfid->fid_mutex);
	kref_put(&cfid->refcount, smb2_close_cached_fid);
	mutex_unlock(&cfid->fid_mutex);
}
With the one-line change from last week's patch taken out (the call to
close_shroot from SMB2_tdis, i.e. tree disconnect, at umount), I don't
see the problem, so it is far more likely that it is related to that
commit. The problem seems to be related to servers which don't support
directory leases. I will spin up a patch to fix this if Ronnie hasn't
already fixed it.
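A minimal user-space sketch of how an unconditional put can produce the "underflow; use-after-free" warning. It assumes the cached root handle was never opened, so no initial reference was ever taken; that is a guess consistent with the directory-lease observation, not a confirmed diagnosis. All model_* names are hypothetical stand-ins, not the kernel's kref or cached_fid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kref; not the kernel type. */
struct model_kref {
    int count;           /* outstanding references */
    bool underflowed;    /* set when a put finds nothing to drop */
};

/* Hypothetical stand-in for the cached root handle. */
struct model_cached_fid {
    struct model_kref refcount;
    int releases;        /* how many times the release step ran */
};

static void model_kref_init(struct model_kref *k)
{
    assert(k != NULL);
    k->count = 1;
    k->underflowed = false;
}

/* Drop one reference; returns true if the caller must release the object. */
static bool model_kref_put(struct model_kref *k)
{
    if (k->count <= 0) {
        /* Where the kernel would warn about a refcount underflow. */
        k->underflowed = true;
        return false;
    }
    return --k->count == 0;
}

/* Models the handle being opened and cached: takes the initial reference. */
static void model_open_root(struct model_cached_fid *cfid)
{
    model_kref_init(&cfid->refcount);
}

/* Models close_shroot(): an unconditional put. */
static void model_close_root(struct model_cached_fid *cfid)
{
    if (model_kref_put(&cfid->refcount))
        cfid->releases++;
}
```

Calling model_close_root() on a zero-initialised handle that never went through model_open_root() sets underflowed and releases nothing. The open-then-close sequence releases exactly once.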
Arthur,
If you want to avoid the issue which causes the oops, you can remove
this one-line change (see below). I (or Ronnie) will post a patch
for this, but I wanted to discuss it with him briefly first.
# git show 72e73c78c446e
commit 72e73c78c446e3c009a29b017c7fa3d79463e2aa
Author: Ronnie Sahlberg <[email protected]>
Date: Thu Nov 7 17:00:38 2019 +1000
cifs: close the shared root handle on tree disconnect
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 05149862aea4..acb70f67efc9 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1807,6 +1807,8 @@ SMB2_tdis(const unsigned int xid, struct cifs_tcon *tcon)
 	if ((tcon->need_reconnect) || (tcon->ses->need_reconnect))
 		return 0;
 
+	close_shroot(&tcon->crfid);
+
 	rc = smb2_plain_req_init(SMB2_TREE_DISCONNECT, tcon, (void **) &req,
 				 &total_len);
 	if (rc)
Hi, I ran the last good kernel with several boot-up, cifs mount, un-mount, shut down cycles without encountering the problem.
After applying the patch from <[email protected]>:
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 0ab6b1200288..d2658f51ff60 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1847,7 +1847,8 @@ SMB2_tdis(const unsigned int xid, struct cifs_tcon *tcon)
 	if ((tcon->need_reconnect) || (tcon->ses->need_reconnect))
 		return 0;
 
-	close_shroot(&tcon->crfid);
+	if (tcon->crfid.is_valid)
+		close_shroot(&tcon->crfid);
to kernel 5.5.0-rc1 I no longer experience the problem.
Regards,
Arthur.
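For anyone following along, a rough user-space sketch of the effect of the is_valid check in the patch above. The struct and function names are hypothetical, and clearing is_valid when the last reference is dropped is an assumption of this model rather than something shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tcon->crfid. */
struct model_crfid {
    bool is_valid;    /* true only while a cached root handle is open */
    int refcount;     /* outstanding references */
    int underflows;   /* puts that found no reference to drop */
};

/* Models an unconditional put on the cached handle. */
static void model_close_shroot(struct model_crfid *c)
{
    assert(c != NULL);
    if (c->refcount <= 0) {
        c->underflows++;          /* where the kernel warning would fire */
        return;
    }
    if (--c->refcount == 0)
        c->is_valid = false;      /* assumption: release invalidates the handle */
}

/* Tree disconnect before the patch: always drops a reference. */
static void model_tdis_unguarded(struct model_crfid *c)
{
    model_close_shroot(c);
}

/* Tree disconnect with the patch: only drops a reference that exists. */
static void model_tdis_guarded(struct model_crfid *c)
{
    if (c->is_valid)
        model_close_shroot(c);
}
```

In this model a handle that was never opened (is_valid false, refcount 0) records one underflow with the unguarded path and none with the guarded one, while a valid handle is still closed normally by the guarded path.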