The patch stats this week look a little bit more normal than last tim,
probably simply because it's also a normal-sized rc4 rather than the
unusually small rc3.
So this time around, about 60% of the patch is drivers (networking,
HID, gpu, usb, mtd..) which is the usual distribution. The rest being
a random mix of networking, filesystem fixes, arch updates,
Documentation etc. And some fixes to the new xarray code.
Nothing looks particularly odd or scary, although we do have some
known stuff still pending. For example, the STIBP fixes are still
being discussed and fine-tuned and haven't been merged yet. And
there's a few mm fixes being talked about. Nothing that should keep
people from testing the 4.20 rc's, though, so go out and test.
One thing I did forget to mention last rc, but did come up in some of
the pull request threads, and that people might have noticed that way:
I've stopped doing the manual pull request acknowledgement emails,
because Konstantin's automation to do it has gone live and is working
well. It's worth pointing out, though, that the automation only works
for pull requests that have been cc'd to mailing lists that are being
tracked by the lore.kernel.org archives, and have an email address
that matches "linux-*". So that's obviously mainly LKML, but it does
trigger for linux-block too, for example.
The reason I'm mentioning it is that if you're not seeing the pull
request automation emails, it might be because you didn't cc a list
that is getting tracked..
Linus
---
Aaro Koskinen (1):
MIPS: OCTEON: cavium_octeon_defconfig: re-enable OCTEON USB driver
Aaron Ma (2):
usb: xhci: fix uninitialized completion when USB3 port got wrong status
usb: xhci: fix timeout for transition from RExit to U0
Adrian Hunter (1):
mmc: sdhci-pci: Workaround GLK firmware failing to restore the
tuning value
Alexander Stein (1):
can: flexcan: Always use last mailbox for TX
Andreas Fiedler (1):
net: gemini: Fix copy/paste error
Andrew Morton (1):
drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo
Andy Shevchenko (2):
usb: dwc3: core: Clean up ULPI device
MAINTAINERS: Do maintain Intel GPIO drivers via separate tree
Anup Patel (1):
RISC-V: Build flat and compressed kernel images
Arnd Bergmann (2):
media: v4l: fix uapi mpeg slice params definition
mt76: fix building without CONFIG_LEDS_CLASS
Arthur Kiyanovski (3):
net: ena: fix crash during failed resume from hibernation
net: ena: fix crash during ena_remove()
net: ena: update driver version from 2.0.1 to 2.0.2
Aya Levin (1):
net/mlx4: Fix UBSAN warning of signed integer overflow
Bartosz Golaszewski (2):
MAINTAINERS: add myself as co-maintainer of gpiolib
gpio: mockup: fix indicated direction
Benjamin Tissoires (7):
Revert "HID: input: simplify/fix high-res scroll event handling"
Revert "HID: logitech: fix a used uninitialized GCC warning"
Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
Revert "HID: logitech: Add function to enable HID++ 1.0
"scrolling acceleration""
Revert "HID: input: Create a utility class for counting scroll events"
Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
Benson Leung (1):
HID: input: Ignore battery reported by Symbol DS4308
Bill Kuzeja (1):
scsi: qla2xxx: Timeouts occur on surprise removal of QLogic adapter
Boris Brezillon (2):
drm/vc4: Fix NULL pointer dereference in the async update path
drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates
Brenda J. Butler (1):
tc-testing: tdc.py: Guard against lack of returncode in executed command
Brian Foster (1):
xfs: fix shared extent data corruption due to missing cow reservation
Brian Norris (1):
ath10k: don't assume 'vif' is non-NULL in flush()
Chanho Min (1):
exec: make de_thread() freezable
Cherian, George (1):
xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
Chris Wilson (2):
drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
drm/i915: Write GPU relocs harder with gen3
Christophe JAILLET (1):
net: lantiq: Fix returned value in case of error in 'xrx200_probe()'
Colin Ian King (1):
test_firmware: fix error return getting clobbered
Connor McAdams (2):
ALSA: hda/ca0132 - Add new ZxR quirk
ALSA: hda/ca0132 - fix AE-5 pincfg
Dan Carpenter (3):
ath9k: Fix a locking bug in ath9k_add_interface()
uio: Fix an Oops on load
usb: dwc2: pci: Fix an error code in probe
Dave Chinner (11):
xfs: uncached buffer tracing needs to print bno
xfs: fix transient reference count error in
xfs_buf_resubmit_failed_buffers
xfs: finobt AG reserves don't consider last AG can be a runt
xfs: extent shifting doesn't fully invalidate page cache
xfs: flush removing page cache in xfs_reflink_remap_prep
xfs: delalloc -> unwritten COW fork allocation can go wrong
iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
iomap: sub-block dio needs to zeroout beyond EOF
iomap: dio data corruption and spurious errors when pipes fill
vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
iomap: readpages doesn't zero page tail beyond EOF
Dave Gerlach (1):
cpufreq: ti-cpufreq: Only register platform_device when supported
David Abdurachmanov (2):
riscv: fix warning in arch/riscv/include/asm/module.h
riscv: add asm/unistd.h UAPI header
David Ahern (1):
ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF
David Herrmann (1):
Revert "HID: uhid: use strlcpy() instead of strncpy()"
David Howells (1):
rxrpc: Fix life check
David S. Miller (1):
Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs"
Davide Caratti (3):
net/sched: act_pedit: fix memory leak when IDR allocation fails
net/sched: act_police: fix race condition on state variables
net/sched: act_police: add missing spinlock initialization
Denis Bolotin (5):
qed: Fix PTT leak in qed_drain()
qed: Fix overriding offload_tc by protocols without APP TLV
qed: Fix reading wrong value in loop condition
qed: Fix bitmap_weight() check
qed: Fix QM getters to always return a valid pq
Denis Drozdov (1):
net/mlx5e: IPoIB, Reset QP after channels are closed
Dennis Wassenberg (1):
usb: core: Fix hub port connection events lost
Dexuan Cui (1):
Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
Emmanuel Grumbach (2):
iwlwifi: mvm: support sta_statistics() even on older firmware
iwlwifi: mvm: fix regulatory domain update when the firmware starts
Emmanuel Pescosta (1):
usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB
Eric Biggers (2):
HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
xfs: make xfs_file_remap_range() static
Eric Dumazet (3):
net_sched: sch_fq: ensure maxrate fq parameter applies to EDT flows
net-gro: reset skb->pkt_type in napi_reuse_skb()
tcp: defer SACK compression after DupThresh
Eugeniu Rosca (1):
dt-bindings: can: rcar_can: document r8a77965 support
Evan Quan (1):
drm/amd/powerplay: disable Vega20 DS related features
Ezequiel Garcia (1):
media: Rename vb2_m2m_request_queue -> v4l2_m2m_request_queue
Fabio Estevam (1):
dt-bindings: dsa: Fix typo in "probed"
Fabrizio Castro (2):
can: rcar_can: Fix erroneous registration
dt-bindings: can: rcar_can: Add r8a774a1 support
Felipe Balbi (1):
usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
Felix Kuehling (1):
drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
Filippo Sironi (1):
amd/iommu: Fix Guest Virtual APIC Log Tail Address Register
Ganesh Goudar (1):
cxgb4: fix thermal zone build error
Geert Uytterhoeven (1):
iommu/ipmmu-vmsa: Fix crash on early domain free
Gerd Hoffmann (1):
udmabuf: set read/write flag when exporting
Greathouse, Joseph (1):
drm/amd/pp: handle negative values when reading OD
Greg Kroah-Hartman (1):
MAINTAINERS: Add Sasha as a stable branch maintainer
Gustavo A. R. Silva (1):
drivers/misc/sgi-gru: fix Spectre v1 vulnerability
Hangbin Liu (2):
net/ipv6: re-do dad when interface has IFF_NOARP flag change
team: no need to do team_notify_peers or team_mcast_rejoin when
disabling port
Hans Verkuil (3):
media: vicodec: lower minimum height to 360
media: cec: check for non-OK/NACK conditions while claiming a LA
media: cec: increase debug level for 'queue full'
Hans Wippel (2):
net/smc: abort CLC connection in smc_release
net/smc: add SMC-D shutdown signal
Hans de Goede (1):
ACPI / platform: Add SMB0001 HID to forbidden_id_list
Heiner Kallweit (2):
MAINTAINERS: Add myself as third phylib maintainer
MAINTAINERS: add myself as co-maintainer for r8169
Huacai Chen (1):
MIPS: Let early memblock_alloc*() allocate memories bottom-up
Ilya Dryomov (1):
libceph: fall back to sendmsg for slab pages
Jack Morgenstein (1):
net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command
James Smart (1):
nvme-fc: resolve io failures during connect
Jason Wang (2):
virtio-net: disable guest csum during XDP set
virtio-net: fail XDP set if guest csum is negotiated
Jimmy Assarsson (2):
can: kvaser_usb: Fix potential uninitialized variable use
can: kvaser_usb: Fix accessing freed memory in kvaser_usb_start_xmit()
Jiri Olsa (2):
tools cpupower debug: Allow to use outside build flags
tools cpupower: Override CFLAGS assignments
Johan Hovold (3):
gnss: serial: fix synchronous write timeout
gnss: sirf: fix synchronous write timeout
mtd: rawnand: atmel: fix OF child-node lookup
John Stultz (1):
wlcore: Fixup "Add support for optional wakeirq"
Jon Maloy (2):
tipc: fix lockdep warning when reinitilaizing sockets
tipc: don't assume linear buffer when reading ancillary data
Juliet Kim (1):
net/ibmnvic: Fix deadlock problem in reset
Kai-Heng Feng (4):
USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub
USB: quirks: Add no-lpm quirk for Raydium touchscreens
HID: multitouch: Add pointstick support for Cirque Touchpad
HID: i2c-hid: Disable runtime PM for LG touchscreen
Karsten Graul (1):
net/smc: use queue pair number when matching link group
Keerthy (2):
opp: ti-opp-supply: Dynamically update u_volt_min
opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call
Kenneth Feng (1):
drm/amdgpu: Enable HDP memory light sleep
Konstantin Khlebnikov (1):
tools/power/cpupower: fix compilation with STATIC=true
Kuppuswamy Sathyanarayanan (1):
usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
Linus Torvalds (1):
Linux 4.20-rc4
Lorenzo Bianconi (3):
mt76: fix uninitialized mutex access setting rts threshold
net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
Lu Baolu (1):
iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()
Luc Van Oostenryck (1):
MAINTAINERS: change Sparse's maintainer
Luca Coelho (1):
iwlwifi: mvm: don't use SAR Geo if basic SAR is not used
Lucas Bates (1):
tc-testing: tdc.py: ignore errors when decoding stdout/stderr
Lukas Wunner (1):
can: hi311x: Use level-triggered interrupt
Maarten Jacobs (1):
usb: cdc-acm: add entry for Hiro (Conexant) modem
Marc Kleine-Budde (5):
can: flexcan: remove not needed struct flexcan_priv::tx_mb and
struct flexcan_priv::tx_mb_idx
can: dev: can_get_echo_skb(): factor out non sending code to
__can_get_echo_skb()
can: dev: __can_get_echo_skb(): replace struct can_frame by
canfd_frame to access frame length
can: dev: __can_get_echo_skb(): Don't crash the kernel if
can_priv::echo_skb is accessed out of bounds
can: dev: __can_get_echo_skb(): print error message, if trying
to echo non existing skb
Martin Schiller (2):
net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
Mathias Nyman (3):
xhci: Fix leaking USB3 shared_hcd at xhci removal
xhci: handle port status events for removed USB3 hcd
usb: xhci: Prevent bus suspend if a port connect change or
polling state is detected
Matt Chen (1):
iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE
Matthew Cover (1):
tuntap: fix multiqueue rx
Matthew Wilcox (19):
XArray: Fix xa_for_each with a single element at 0
XArray: Export __xa_foo to non-GPL modules
nilfs2: Use xa_erase_irq
XArray: Regularise xa_reserve
XArray: Unify xa_cmpxchg and __xa_cmpxchg
XArray: Turn xa_erase into an exported function
XArray: Add xa_store_bh() and xa_store_irq()
XArray: Unify xa_store and __xa_store
XArray: Handle NULL pointers differently for allocation
XArray: Fix Documentation
XArray: Correct xa_store_range
XArray tests: Correct some 64-bit assumptions
dax: Remove optimisation from dax_lock_mapping_entry
dax: Make sure the unlocking entry isn't locked
dax: Reinstate RCU protection of inode
dax: Fix dax_unlock_mapping_entry for PMD pages
dax: Fix huge page faults
dax: Avoid losing wakeup in dax_lock_mapping_entry
XArray tests: Add missing locking
Mattias Jacobsson (1):
USB: misc: appledisplay: add 20" Apple Cinema Display
Mauro Carvalho Chehab (2):
v4l2-controls: add a missing include
media: dm365_ipipeif: better annotate a fall though
Maxime Chevallier (1):
net: mvneta: Don't advertise 2.5G modes
Michael Chan (5):
bnxt_en: Fix RSS context allocation.
bnxt_en: Fix rx_l4_csum_errors counter on 57500 devices.
bnxt_en: Disable RDMA support on the 57500 chips.
bnxt_en: Workaround occasional TX timeout on 57500 A0.
bnxt_en: Add software "missed_irqs" counter.
Michal Kalderon (1):
qed: Fix rdma_info structure allocation
Moshe Shemesh (1):
net/mlx5e: RX, verify received packet size in Linear Striding RQ
Nathan Chancellor (2):
media: tc358743: Remove unnecessary self assignment
misc: atmel-ssc: Fix section annotation on atmel_ssc_get_driver_data
Nicholas Kazlauskas (2):
drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
drm/amd/display: Support amdgpu "max bpc" connector property (v2)
Nikolay Aleksandrov (1):
net: bridge: fix vlan stats use-after-free on destruction
Oleksij Rempel (4):
can: rx-offload: introduce can_rx_offload_get_echo_skb() and
can_rx_offload_queue_sorted() functions
can: flexcan: handle tx-complete CAN frames via rx-offload infrastructure
can: rx-offload: rename can_rx_offload_irq_queue_err_skb() to
can_rx_offload_queue_tail()
can: flexcan: use can_rx_offload_queue_sorted() for flexcan_irq_bus_*()
Olga Kornievskaia (1):
NFSv4.2 copy do not allocate memory under the lock
Oliver Hartkopp (1):
can: raw: check for CAN FD capable netdev in raw_sendmsg()
Olof Johansson (2):
mtd: rawnand: qcom: Namespace prefix some commands
RISC-V: Fix raw_copy_{to,from}_user()
Or Gerlitz (3):
net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded
net/mlx5e: Claim TC hw offloads support only under a proper build config
net/mlx5e: Always use the match level enum when parsing TC rule match
Pan Bian (1):
iommu/vt-d: Use memunmap to free memremap
Pankaj Bansal (1):
can: flexcan: Unlock the MB unconditionally
Paolo Abeni (1):
net: don't keep lonely packets forever in the gro hash
Patrick Stählin (1):
RISC-V: recognize S/U mode bits in print_isa
Paul Burton (1):
MIPS: Loongson3,SGI-IP27: Simplify max_low_pfn calculation
Paul Kocialkowski (1):
drm/fb-helper: Blacklist writeback when adding connectors to fbdev
Petr Machata (1):
net: skb_scrub_packet(): Scrub offload_fwd_mark
Quentin Schulz (1):
net: phy: mscc: fix deadlock in vsc85xx_default_config
Raed Salem (1):
net/mlx5: IPSec, Fix the SA context hash key
Rafał Miłecki (2):
brcmutil: really fix decoding channel info for 160 MHz bandwidth
brcmfmac: fix reporting support for 160 MHz channels
Rajat Jain (1):
mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL
Robert Jarzmik (1):
gpio: pxa: fix legacy non pinctrl aware builds again
Robin Murphy (2):
dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
swiotlb: Skip cache maintenance on map error
Rodrigo Rivas Costa (1):
HID: steam: remove input device when a hid client is running.
Roi Dayan (1):
net/mlx5e: Apply the correct check for supporting TC esw rules split
Sabrina Dubroca (1):
ip_tunnel: don't force DF when MTU is locked
Sakari Ailus (5):
media: v4l: event: Add subscription to list before calling "add" operation
media: docs: Document metadata format in struct v4l2_format
media: omap3isp: Unregister media device as first
media: ipu3-cio2: Unregister device nodes first, then release resources
media: ipu3-cio2: Use cio2_queues_exit
Sandeep Singh (1):
xhci: Add check for invalid byte size error when UAS devices are
connected.
Sebastian Parschauer (2):
HID: Add quirk for Microsoft PIXART OEM mouse
HID: Add quirk for Primax PIXART OEM mice
Sergey Matyukevich (1):
arm64: sysreg: fix sparse warnings
Shahar S Matityahu (1):
iwlwifi: fix D3 debug data buffer memory leak
Shay Agroskin (4):
net/mlx5e: Fix a bug in turning off FEC policy in unsupported speeds
net/mlx5e: Fix wrong field name in FEC related functions
net/mlx5e: Removed unnecessary warnings in FEC caps query
net/mlx5e: Fix failing ethtool query on FEC query error
Shen Jing (1):
Revert "usb: gadget: ffs: Fix BUG when userland exits with
submitted AIO transfers"
Siva Reddy Kallam (1):
tg3: Add PHY reset for 5717/5719/5720 in change ring and flow
control paths
Slavomir Kaslev (1):
socket: do a generic_file_splice_read when proto_ops has no splice_read
Srinivas Kandagatla (2):
slimbus: ngd: remove unnecessary check
nvmem: core: fix regression in of_nvmem_cell_get()
Stephen Mallon (1):
tcp: Fix SOF_TIMESTAMPING_RX_HARDWARE to use the latest
timestamp during TCP coalescing
Sudarsana Reddy Kalluru (1):
bnx2x: Assign unique DMAE channel number for FW DMAE transactions.
Sven Eckelmann (2):
batman-adv: Use explicit tvlv padding for ELP packets
batman-adv: Expand merged fragment buffer for full packet
Takashi Iwai (4):
ALSA: oss: Use kvzalloc() for local buffer allocations
ALSA: hda/realtek - Add quirk entry for HP Pavilion 15
ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap()
drm/amdgpu: Add missing firmware entry for HAINAN
Tal Gilboa (1):
net/dim: Update DIM start sample after each DIM iteration
Tariq Toukan (1):
net/mlx4_core: Fix uninitialized variable compilation warning
Thinh Nguyen (1):
usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
Thomas Falcon (2):
ibmvnic: Fix RX queue buffer cleanup
ibmvnic: Update driver queues after change in ring size support
Thomas Zimmermann (1):
drm/ast: Remove existing framebuffers before loading driver
Thor Thayer (2):
MAINTAINERS: Replace Vince Bridgers as Altera TSE maintainer
mtd: spi-nor: Fix Cadence QSPI page fault kernel panic
Tigran Mkrtchyan (1):
flexfiles: use per-mirror specified stateid for IO
Toke Høiland-Jørgensen (1):
MAINTAINERS: Add entry for CAKE qdisc
Trond Myklebust (1):
NFSv4: Fix a NFSv4 state manager deadlock
[email protected] (5):
mtd: spi-nor: don't drop sfdp data if optional parsers fail
mtd: spi-nor: fix iteration over smpt array
mtd: spi-nor: don't overwrite errno in spi_nor_get_map_in_use()
mtd: spi_nor: pass DMA-able buffer to spi_nor_read_raw()
mtd: spi-nor: fix selection of uniform erase type in flexible conf
Ursula Braun (3):
s390/ism: clear dmbe_mask bit before SMC IRQ handling
net/smc: atomic SMCD cursor handling
net/smc: use after free fix in smc_wr_tx_put_slot()
Valentine Fatiev (1):
net/mlx5e: Fix selftest for small MTUs
Vasundhara Volam (1):
bnxt_en: Fix filling time in bnxt_fill_coredump_record()
Ville Syrjälä (3):
drm/i915: Disable LP3 watermarks on all SNB machines
drm/i915: Force a LUT update in intel_initial_commit()
drm/i915: Add rotation readout for plane initial config
Vincent Chen (1):
net: faraday: ftmac100: remove netif_running(netdev) check
before disabling interrupts
Vladimir Zapolskiy (1):
gpio: don't free unallocated ida on gpiochip_add_data_with_key()
error path
Wei Li (1):
scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC
Will Deacon (2):
Documentation/security-bugs: Postpone fix publication in exceptional cases
arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
Willem de Bruijn (1):
packet: copy user buffers before orphan or clone
Xin Long (6):
l2tp: fix a sock refcnt leak in l2tp_tunnel_register
ipv6: fix a dst leak when removing its exception
sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit
sctp: not allow to set asoc prsctp_enable by sockopt
Revert "sctp: remove sctp_transport_pmtu_check"
sctp: not increase stream's incnt before sending addstrm_in request
Y.C. Chen (2):
drm/ast: change resolution may cause screen blurred
drm/ast: fixed cursor may disappear sometimes
Yangtao Li (1):
net: amd: add missing of_node_put()
YueHaibing (2):
can: ucan: remove set but not used variable 'udev'
can: ucan: remove duplicated include from ucan.c
Yuval Avnery (1):
net/mlx5e: Adjust to max number of channles when re-attaching
* Linus Torvalds <[email protected]> wrote:
> The patch stats this week look a little bit more normal than last tim,
> probably simply because it's also a normal-sized rc4 rather than the
> unusually small rc3.
So there's a new regression in v4.20-rc4, my desktop produces this
lockdep splat:
[ 1772.588771] WARNING: pkexec/4633 still has locks held!
[ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted
[ 1772.588775] ------------------------------------
[ 1772.588776] 1 lock held by pkexec/4633:
[ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70
[ 1772.588786] stack backtrace:
[ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1
[ 1772.588792] Call Trace:
[ 1772.588800] dump_stack+0x85/0xcb
[ 1772.588803] flush_old_exec+0x116/0x890
[ 1772.588807] ? load_elf_phdrs+0x72/0xb0
[ 1772.588809] load_elf_binary+0x291/0x1620
[ 1772.588815] ? sched_clock+0x5/0x10
[ 1772.588817] ? search_binary_handler+0x6d/0x240
[ 1772.588820] search_binary_handler+0x80/0x240
[ 1772.588823] load_script+0x201/0x220
[ 1772.588825] search_binary_handler+0x80/0x240
[ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60
[ 1772.588832] ? strncpy_from_user+0x40/0x180
[ 1772.588835] __x64_sys_execve+0x34/0x40
[ 1772.588838] do_syscall_64+0x60/0x1c0
The warning gets triggered by an ancient lockdep check in the freezer:
(gdb) list *0xffffffff812ece06
0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57).
52 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
54 */
55 static inline bool try_to_freeze_unsafe(void)
56 {
57 might_sleep();
58 if (likely(!freezing(current)))
59 return false;
60 return __refrigerator(false);
61 }
I reviewed the ->cred_guard_mutex code, and the mutex is held across all
of exec() - and we always did this.
But there's this recent -rc4 commit:
> Chanho Min (1):
> exec: make de_thread() freezable
c22397888f1e: exec: make de_thread() freezable
I believe this commit is bogus, you cannot call try_to_freeze() from
de_thread(), because it's holding the ->cred_guard_mutex.
Also, I'm 3 times grumpy:
#1: I think this commit was never tested with lockdep turned on, as I
think splat this should trigger 100% of the time with the ELF
binfmt loader.
#2: Less than 4 days passed between the commit on Nov 19 and it being
upstream by Nov 23. *Please* give it more testing if you change
something as central as fs/exec.c ...
#3 Also, please also proof-read changelogs before committing them:
>> The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
>> [...]
>>
>> In our machine, it causes freeze timeout as bellows.
That's *three* typos in a single commit:
s/casue/cause
s/In our/On our
s/bellows/below
...
Grump! :-)
Note that I haven't tested the revert yet, but the code and the breakage
looks pretty obvious. (I'll boot the revert, will follow up if that
didn't solve the problem.)
Thanks,
Ingo
Signed-off-by: Ingo Molnar <[email protected]>
This reverts commit c22397888f1eed98cd59f0a88f2a5f6925f80e15.
---
fs/exec.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index acc3a5536384..fc281b738a98 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,7 +62,6 @@
#include <linux/oom.h>
#include <linux/compat.h>
#include <linux/vmalloc.h>
-#include <linux/freezer.h>
#include <linux/uaccess.h>
#include <asm/mmu_context.h>
@@ -1084,7 +1083,7 @@ static int de_thread(struct task_struct *tsk)
while (sig->notify_count) {
__set_current_state(TASK_KILLABLE);
spin_unlock_irq(lock);
- freezable_schedule();
+ schedule();
if (unlikely(__fatal_signal_pending(tsk)))
goto killed;
spin_lock_irq(lock);
@@ -1112,7 +1111,7 @@ static int de_thread(struct task_struct *tsk)
__set_current_state(TASK_KILLABLE);
write_unlock_irq(&tasklist_lock);
cgroup_threadgroup_change_end(tsk);
- freezable_schedule();
+ schedule();
if (unlikely(__fatal_signal_pending(tsk)))
goto killed;
}
On Mon 03-12-18 08:47:00, Ingo Molnar wrote:
[...]
> I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> of exec() - and we always did this.
Yes, this is something that has been pointed out during the review. Oleg
has argued that making this path freezable is really hard and that we
should be changing de_thread to sleep withtou cred_guard_mutex long term
anyway (http://lkml.kernel.org/r/[email protected]).
Failing suspend seems like a real problem while the lockdep one doesn't
really reflect any real deadlock, right? So while the patch is not
perfect it shouldn't make the situation much worse. Lockdep splat is
certainly annoying but is it any worse than a suspend failing?
Now, I wouldn't mind to revert this because the code is really old and
we haven't seen many bug reports about failing suspend yet. But what is
the actual plan to make this work properly? Use
freezable_schedule_unsafe instead? Freezer code has some
fundamental design issues which are quite hard to get over.
--
Michal Hocko
SUSE Labs
On 12/03, Ingo Molnar wrote:
>
> So there's a new regression in v4.20-rc4, my desktop produces this
> lockdep splat:
>
> [ 1772.588771] WARNING: pkexec/4633 still has locks held!
> [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted
> [ 1772.588775] ------------------------------------
> [ 1772.588776] 1 lock held by pkexec/4633:
> [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70
> [ 1772.588786] stack backtrace:
> [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1
> [ 1772.588792] Call Trace:
> [ 1772.588800] dump_stack+0x85/0xcb
> [ 1772.588803] flush_old_exec+0x116/0x890
> [ 1772.588807] ? load_elf_phdrs+0x72/0xb0
> [ 1772.588809] load_elf_binary+0x291/0x1620
> [ 1772.588815] ? sched_clock+0x5/0x10
> [ 1772.588817] ? search_binary_handler+0x6d/0x240
> [ 1772.588820] search_binary_handler+0x80/0x240
> [ 1772.588823] load_script+0x201/0x220
> [ 1772.588825] search_binary_handler+0x80/0x240
> [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60
> [ 1772.588832] ? strncpy_from_user+0x40/0x180
> [ 1772.588835] __x64_sys_execve+0x34/0x40
> [ 1772.588838] do_syscall_64+0x60/0x1c0
My fault.
Michal didn't like this patch, but I managed to confuse him ;)
I even mentioned that freezable_schedule() is obviously not right with
->cred_guard_mutex held, but I somehow I thought that this patch "doesn't
make things worse".
> I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> of exec() - and we always did this.
Yes, and this was always wrong. For example, this test-case hangs:
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <pthread.h>
void *thread(void *arg)
{
ptrace(PTRACE_TRACEME, 0,0,0);
return NULL;
}
int main(void)
{
int pid = fork();
if (!pid) {
pthread_t pt;
pthread_create(&pt, NULL, thread, NULL);
pthread_join(pt, NULL);
execlp("echo", "echo", "passed", NULL);
}
sleep(1);
// or anything else which needs ->cred_guard_mutex,
// say open(/proc/$pid/mem)
ptrace(PTRACE_ATTACH, pid, 0,0);
kill(pid, SIGCONT);
return 0;
}
we really need to narrow the (huge) scope of ->cred_guard_mutex in exec paths.
my attempt to fix this was nacked, and nobody suggested a better solution so far.
> This reverts commit c22397888f1eed98cd59f0a88f2a5f6925f80e15.
Thanks.
> ---
> fs/exec.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index acc3a5536384..fc281b738a98 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -62,7 +62,6 @@
> #include <linux/oom.h>
> #include <linux/compat.h>
> #include <linux/vmalloc.h>
> -#include <linux/freezer.h>
>
> #include <linux/uaccess.h>
> #include <asm/mmu_context.h>
> @@ -1084,7 +1083,7 @@ static int de_thread(struct task_struct *tsk)
> while (sig->notify_count) {
> __set_current_state(TASK_KILLABLE);
> spin_unlock_irq(lock);
> - freezable_schedule();
> + schedule();
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> spin_lock_irq(lock);
> @@ -1112,7 +1111,7 @@ static int de_thread(struct task_struct *tsk)
> __set_current_state(TASK_KILLABLE);
> write_unlock_irq(&tasklist_lock);
> cgroup_threadgroup_change_end(tsk);
> - freezable_schedule();
> + schedule();
> if (unlikely(__fatal_signal_pending(tsk)))
> goto killed;
> }
On Monday, December 3, 2018 8:47:00 AM CET Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
> > The patch stats this week look a little bit more normal than last tim,
> > probably simply because it's also a normal-sized rc4 rather than the
> > unusually small rc3.
>
> So there's a new regression in v4.20-rc4, my desktop produces this
> lockdep splat:
>
> [ 1772.588771] WARNING: pkexec/4633 still has locks held!
> [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted
> [ 1772.588775] ------------------------------------
> [ 1772.588776] 1 lock held by pkexec/4633:
> [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70
> [ 1772.588786] stack backtrace:
> [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1
> [ 1772.588792] Call Trace:
> [ 1772.588800] dump_stack+0x85/0xcb
> [ 1772.588803] flush_old_exec+0x116/0x890
> [ 1772.588807] ? load_elf_phdrs+0x72/0xb0
> [ 1772.588809] load_elf_binary+0x291/0x1620
> [ 1772.588815] ? sched_clock+0x5/0x10
> [ 1772.588817] ? search_binary_handler+0x6d/0x240
> [ 1772.588820] search_binary_handler+0x80/0x240
> [ 1772.588823] load_script+0x201/0x220
> [ 1772.588825] search_binary_handler+0x80/0x240
> [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60
> [ 1772.588832] ? strncpy_from_user+0x40/0x180
> [ 1772.588835] __x64_sys_execve+0x34/0x40
> [ 1772.588838] do_syscall_64+0x60/0x1c0
>
> The warning gets triggered by an ancient lockdep check in the freezer:
>
> (gdb) list *0xffffffff812ece06
> 0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57).
> 52 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
> 53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock
> 54 */
> 55 static inline bool try_to_freeze_unsafe(void)
> 56 {
> 57 might_sleep();
> 58 if (likely(!freezing(current)))
> 59 return false;
> 60 return __refrigerator(false);
> 61 }
>
> I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> of exec() - and we always did this.
>
> But there's this recent -rc4 commit:
>
> > Chanho Min (1):
> > exec: make de_thread() freezable
>
> c22397888f1e: exec: make de_thread() freezable
>
> I believe this commit is bogus, you cannot call try_to_freeze() from
> de_thread(), because it's holding the ->cred_guard_mutex.
>
> Also, I'm 3 times grumpy:
Well, sorry about that.
> #1: I think this commit was never tested with lockdep turned on, as I
> think splat this should trigger 100% of the time with the ELF
> binfmt loader.
It has been there in my local builds/tests, so I must have overlooked the
splat.
> #2: Less than 4 days passed between the commit on Nov 19 and it being
> upstream by Nov 23. *Please* give it more testing if you change
> something as central as fs/exec.c ...
It has been under review for quite a bit more time though and I had hoped
that there would be some reports from linux-next coverage if there were
problems with it, but nothig has been reported till now.
> #3 Also, please also proof-read changelogs before committing them:
I do that as a rule and I tend to fix up such things in changelogs.
Not sure why I didn't do that this time. Because of travel perhaps.
> >> The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for
> >> [...]
> >>
> >> In our machine, it causes freeze timeout as bellows.
>
> That's *three* typos in a single commit:
>
> s/casue/cause
> s/In our/On our
> s/bellows/below
>
> ...
>
> Grump! :-)
>
> Note that I haven't tested the revert yet, but the code and the breakage
> looks pretty obvious. (I'll boot the revert, will follow up if that
> didn't solve the problem.)
I can queue up a revert unless anyone beats me to that.
Thanks,
Rafael
On Monday, December 3, 2018 9:39:42 AM CET Michal Hocko wrote:
> On Mon 03-12-18 08:47:00, Ingo Molnar wrote:
> [...]
> > I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> > of exec() - and we always did this.
>
> Yes, this is something that has been pointed out during the review. Oleg
> has argued that making this path freezable is really hard and that we
> should be changing de_thread to sleep withtou cred_guard_mutex long term
> anyway (http://lkml.kernel.org/r/[email protected]).
>
> Failing suspend seems like a real problem while the lockdep one doesn't
> really reflect any real deadlock, right? So while the patch is not
> perfect it shouldn't make the situation much worse. Lockdep splat is
> certainly annoying but is it any worse than a suspend failing?
>
> Now, I wouldn't mind to revert this because the code is really old and
> we haven't seen many bug reports about failing suspend yet. But what is
> the actual plan to make this work properly? Use
> freezable_schedule_unsafe instead? Freezer code has some
> fundamental design issues which are quite hard to get over.
I agree and we just need to look deeper into this.
I had hoped that this would work since you and Oleg agreed with it, but since
it doesn't, let's do a revert for now and get back to this later.
Thanks,
Rafael
On 12/03, Michal Hocko wrote:
>
> Now, I wouldn't mind to revert this because the code is really old and
> we haven't seen many bug reports about failing suspend yet. But what is
> the actual plan to make this work properly?
I don't see a simple solution...
But we need to fix exec/de_thread anyway, then we can probably reconsider
this patch.
Oleg.
On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> On 12/03, Michal Hocko wrote:
> >
> > Now, I wouldn't mind to revert this because the code is really old and
> > we haven't seen many bug reports about failing suspend yet. But what is
> > the actual plan to make this work properly?
>
> I don't see a simple solution...
>
> But we need to fix exec/de_thread anyway, then we can probably reconsider
> this patch.
My concern is that de_thread fix might be too disruptive for stable
kernels while we might want to have a simple enough fix for the the
suspend issue in the meantime. That was actually the primary reason I've
acked the hack even though I didn't like it.
So can we find a way to shut the lockdep up when this is not really a
deadlock? Or maybe this really is one and then we need a real fix for
stable as well.
--
Michal Hocko
SUSE Labs
On Mon 2018-12-03 13:38:57, Michal Hocko wrote:
> On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > On 12/03, Michal Hocko wrote:
> > >
> > > Now, I wouldn't mind to revert this because the code is really old and
> > > we haven't seen many bug reports about failing suspend yet. But what is
> > > the actual plan to make this work properly?
> >
> > I don't see a simple solution...
> >
> > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > this patch.
>
> My concern is that de_thread fix might be too disruptive for stable
> kernels while we might want to have a simple enough fix for the the
> suspend issue in the meantime. That was actually the primary reason I've
> acked the hack even though I didn't like it.
Do we care about failing sleep in stable? Does someone hit the issue there?
This sounds like issue only Android is hitting, and they run very
heavily patched kernels, far away from mainline or stable.
> So can we find a way to shut the lockdep up when this is not really a
> deadlock? Or maybe this really is one and then we need a real fix for
> stable as well.
Shutting up the lockdep is of course possible...
...but lets first see what good solution looks like.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 12/03, Michal Hocko wrote:
>
> On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > On 12/03, Michal Hocko wrote:
> > >
> > > Now, I wouldn't mind to revert this because the code is really old and
> > > we haven't seen many bug reports about failing suspend yet. But what is
> > > the actual plan to make this work properly?
> >
> > I don't see a simple solution...
> >
> > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > this patch.
>
> My concern is that de_thread fix might be too disruptive for stable
> kernels while we might want to have a simple enough fix for the the
> suspend issue in the meantime. That was actually the primary reason I've
> acked the hack even though I didn't like it.
>
> So can we find a way to shut the lockdep up
You have already mentioned freezable_schedule_unsafe(), but I agree with
"DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION" above it ...
> when this is not really a
> deadlock?
not a deadlock, but lockdep is obviously right, suspend still can fail if
another task needs the same mutex.
Again, we have already discussed this, in my opinion we should blame exec/
de_thread for this and other problems.
> Or maybe this really is one and then we need a real fix for
> stable as well.
Well, strace -f can hang if it races with mt exec, it would be nice to have
a fix for stable.
Oleg.
On Mon 03-12-18 14:10:06, Pavel Machek wrote:
> On Mon 2018-12-03 13:38:57, Michal Hocko wrote:
> > On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > > On 12/03, Michal Hocko wrote:
> > > >
> > > > Now, I wouldn't mind to revert this because the code is really old and
> > > > we haven't seen many bug reports about failing suspend yet. But what is
> > > > the actual plan to make this work properly?
> > >
> > > I don't see a simple solution...
> > >
> > > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > > this patch.
> >
> > My concern is that de_thread fix might be too disruptive for stable
> > kernels while we might want to have a simple enough fix for the the
> > suspend issue in the meantime. That was actually the primary reason I've
> > acked the hack even though I didn't like it.
>
> Do we care about failing sleep in stable? Does someone hit the issue there?
>
> This sounds like issue only Android is hitting, and they run very
> heavily patched kernels, far away from mainline or stable.
But the underlying issue is the same and independent on their patches
AFAIU. And is this really a common problem to care about in stable? I
dunno to be honest but it sounds annoying for sure. Failing suspend is
something that doesn't make your day when you are in hurry and want
find out only later when your laptop heats up your bag ;)
--
Michal Hocko
SUSE Labs
On Mon 2018-12-03 14:53:51, Michal Hocko wrote:
> On Mon 03-12-18 14:10:06, Pavel Machek wrote:
> > On Mon 2018-12-03 13:38:57, Michal Hocko wrote:
> > > On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > > > On 12/03, Michal Hocko wrote:
> > > > >
> > > > > Now, I wouldn't mind to revert this because the code is really old and
> > > > > we haven't seen many bug reports about failing suspend yet. But what is
> > > > > the actual plan to make this work properly?
> > > >
> > > > I don't see a simple solution...
> > > >
> > > > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > > > this patch.
> > >
> > > My concern is that de_thread fix might be too disruptive for stable
> > > kernels while we might want to have a simple enough fix for the the
> > > suspend issue in the meantime. That was actually the primary reason I've
> > > acked the hack even though I didn't like it.
> >
> > Do we care about failing sleep in stable? Does someone hit the issue there?
> >
> > This sounds like issue only Android is hitting, and they run very
> > heavily patched kernels, far away from mainline or stable.
>
> But the underlying issue is the same and independent on their patches
> AFAIU. And is this really a common problem to care about in stable? I
> dunno to be honest but it sounds annoying for sure. Failing suspend is
> something that doesn't make your day when you are in hurry and want
> find out only later when your laptop heats up your bag ;)
In general, yes. In practice, if it happens 1 in 1000000 suspends, you
don't care that much (but Android cares).
Do we actually have reports of this happening for people outside
Android?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon 03-12-18 15:14:59, Pavel Machek wrote:
> On Mon 2018-12-03 14:53:51, Michal Hocko wrote:
> > On Mon 03-12-18 14:10:06, Pavel Machek wrote:
> > > On Mon 2018-12-03 13:38:57, Michal Hocko wrote:
> > > > On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > > > > On 12/03, Michal Hocko wrote:
> > > > > >
> > > > > > Now, I wouldn't mind to revert this because the code is really old and
> > > > > > we haven't seen many bug reports about failing suspend yet. But what is
> > > > > > the actual plan to make this work properly?
> > > > >
> > > > > I don't see a simple solution...
> > > > >
> > > > > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > > > > this patch.
> > > >
> > > > My concern is that de_thread fix might be too disruptive for stable
> > > > kernels while we might want to have a simple enough fix for the the
> > > > suspend issue in the meantime. That was actually the primary reason I've
> > > > acked the hack even though I didn't like it.
> > >
> > > Do we care about failing sleep in stable? Does someone hit the issue there?
> > >
> > > This sounds like issue only Android is hitting, and they run very
> > > heavily patched kernels, far away from mainline or stable.
> >
> > But the underlying issue is the same and independent on their patches
> > AFAIU. And is this really a common problem to care about in stable? I
> > dunno to be honest but it sounds annoying for sure. Failing suspend is
> > something that doesn't make your day when you are in hurry and want
> > find out only later when your laptop heats up your bag ;)
>
> In general, yes. In practice, if it happens 1 in 1000000 suspends, you
> don't care that much (but Android cares).
This argument just doesn't make any sense. Rare bugs are maybe even more
annoying because you do not expect them to happen. But I would be more
interested to see whether they are any downside. Is there any actual
risk to silence the lockup detector that you can see?
> Do we actually have reports of this happening for people outside
> Android?
Not that I am aware of.
--
Michal Hocko
SUSE Labs
On Mon 2018-12-03 15:17:37, Michal Hocko wrote:
> On Mon 03-12-18 15:14:59, Pavel Machek wrote:
> > On Mon 2018-12-03 14:53:51, Michal Hocko wrote:
> > > On Mon 03-12-18 14:10:06, Pavel Machek wrote:
> > > > On Mon 2018-12-03 13:38:57, Michal Hocko wrote:
> > > > > On Mon 03-12-18 13:31:49, Oleg Nesterov wrote:
> > > > > > On 12/03, Michal Hocko wrote:
> > > > > > >
> > > > > > > Now, I wouldn't mind to revert this because the code is really old and
> > > > > > > we haven't seen many bug reports about failing suspend yet. But what is
> > > > > > > the actual plan to make this work properly?
> > > > > >
> > > > > > I don't see a simple solution...
> > > > > >
> > > > > > But we need to fix exec/de_thread anyway, then we can probably reconsider
> > > > > > this patch.
> > > > >
> > > > > My concern is that de_thread fix might be too disruptive for stable
> > > > > kernels while we might want to have a simple enough fix for the the
> > > > > suspend issue in the meantime. That was actually the primary reason I've
> > > > > acked the hack even though I didn't like it.
> > > >
> > > > Do we care about failing sleep in stable? Does someone hit the issue there?
> > > >
> > > > This sounds like issue only Android is hitting, and they run very
> > > > heavily patched kernels, far away from mainline or stable.
> > >
> > > But the underlying issue is the same and independent on their patches
> > > AFAIU. And is this really a common problem to care about in stable? I
> > > dunno to be honest but it sounds annoying for sure. Failing suspend is
> > > something that doesn't make your day when you are in hurry and want
> > > find out only later when your laptop heats up your bag ;)
> >
> > In general, yes. In practice, if it happens 1 in 1000000 suspends, you
> > don't care that much (but Android cares).
>
> This argument just doesn't make any sense. Rare bugs are maybe even
> more
I guess argumenting about this just does not make sense. Just bear in
mind -stable is not for theoretical problems.
> annoying because you do not expect them to happen. But I would be more
> interested to see whether they are any downside. Is there any actual
> risk to silence the lockup detector that you can see?
As someone else noticed:
a) the bug can be triggered outside suspend
b) the lockdep report is real. You'll still get suspend failure, but
you now need two processes to trigger it
> > Do we actually have reports of this happening for people outside
> > Android?
>
> Not that I am aware of.
Good.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, Dec 3, 2018 at 6:17 AM Michal Hocko <[email protected]> wrote:
>
> This argument just doesn't make any sense. Rare bugs are maybe even more
> annoying because you do not expect them to happen.
Absolutely.
And valid lockdep complaints are a real issue too.
So I don't think there's any question that this should be reverted,
the only question is whether I take the revert directly or get it from
the PM tree.
It does sound like the de_thread() behavior needs more work. Maybe,
for example, we need to make it clear that zapped threads are *not*
frozen, and instead the freezer will wait for them to exit?
Hmm?
Linus
On Mon 03-12-18 09:06:18, Linus Torvalds wrote:
> On Mon, Dec 3, 2018 at 6:17 AM Michal Hocko <[email protected]> wrote:
> >
> > This argument just doesn't make any sense. Rare bugs are maybe even more
> > annoying because you do not expect them to happen.
>
> Absolutely.
>
> And valid lockdep complaints are a real issue too.
No questions about that. But in this case there is no real deadlock so
the splat is disputable. And we have a "do not use" freezable_schedule_unsafe
which shouldn't tigger the splat yet it would make the suspend issue go
away AFIU. Not a great fix and we should target for something better
long term but good enough as a poor's man stop gap fix.
> So I don't think there's any question that this should be reverted,
> the only question is whether I take the revert directly or get it from
> the PM tree.
>
> It does sound like the de_thread() behavior needs more work. Maybe,
> for example, we need to make it clear that zapped threads are *not*
> frozen, and instead the freezer will wait for them to exit?
I was actually proposing something like that during the review
discussion. Basically set PF_NOFREEZE all the threads and wake them up.
But that is not so simple because they might be at any path when they
sleep (this is the biggest design flaw of the freezer IMHO).
--
Michal Hocko
SUSE Labs
On Mon 2018-12-03 09:06:18, Linus Torvalds wrote:
> On Mon, Dec 3, 2018 at 6:17 AM Michal Hocko <[email protected]> wrote:
> >
> > This argument just doesn't make any sense. Rare bugs are maybe even more
> > annoying because you do not expect them to happen.
>
> Absolutely.
>
> And valid lockdep complaints are a real issue too.
>
> So I don't think there's any question that this should be reverted,
> the only question is whether I take the revert directly or get it from
> the PM tree.
I agree this to be reverted. (And the problem is bigger than lockdep,
and apparently not limited to suspend).
But we merged hacky solution because we wanted something simple for
stable, and I don't think that was good idea. (And I'm not sure this
is worth fixing in stable, as we don't have reports of people hitting
that in that kernels).
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Rafael J. Wysocki <[email protected]> wrote:
> > Note that I haven't tested the revert yet, but the code and the breakage
> > looks pretty obvious. (I'll boot the revert, will follow up if that
> > didn't solve the problem.)
>
> I can queue up a revert unless anyone beats me to that.
Thanks!
I have 1+ days uptime now on the system, no splat or other problems, as
expected:
Tested-by: Ingo Molnar <[email protected]>
I have the revert locally in -tip, will drop it once your commit hits
upstream.
Ingo
* Michal Hocko <[email protected]> wrote:
> > Do we actually have reports of this happening for people outside
> > Android?
>
> Not that I am aware of.
I'd say outside of Android 99% of the use of hibernation is the fail-safe
that distributions offer on laptops with very low battery levels: the
emergency hibernation when there's almost no power left anymore.
Do these hibernation failure messages typically make it to persistent
logs before the system uses power?
In practice if that is buggy the kernel won't hibernate and the laptop
will run out of power and the user will conclude "ugh, I shouldn't have
left my laptop turned on" - without looking into the logs and reporting,
as they'll perceive it as a user failure not a system failure.
I certainly saw random Linux laptops fail to hibernate over the years and
didn't report it, so if the distribution doesn't do the reporting
automatically then chances are we'll never see it.
Thanks,
Ingo
On Tue 04-12-18 10:02:28, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > > Do we actually have reports of this happening for people outside
> > > Android?
> >
> > Not that I am aware of.
>
> I'd say outside of Android 99% of the use of hibernation is the fail-safe
> that distributions offer on laptops with very low battery levels: the
> emergency hibernation when there's almost no power left anymore.
>
> Do these hibernation failure messages typically make it to persistent
> logs before the system uses power?
I dunno. I do not use hibernation. I am a heavy user of the suspend
though. I s2ram all the time. And I have certainly experienced cases
where suspend has failed and I onlyi found out later when I've picked up
my laptop from my heat up bag. Nothing fatal has resulted from that but
this is certainly annoying.
--
Michal Hocko
SUSE Labs
* Oleg Nesterov <[email protected]> wrote:
> > I reviewed the ->cred_guard_mutex code, and the mutex is held across all
> > of exec() - and we always did this.
>
> Yes, and this was always wrong. For example, this test-case hangs:
>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/ptrace.h>
> #include <pthread.h>
>
> void *thread(void *arg)
> {
> ptrace(PTRACE_TRACEME, 0,0,0);
> return NULL;
> }
>
> int main(void)
> {
> int pid = fork();
>
> if (!pid) {
> pthread_t pt;
> pthread_create(&pt, NULL, thread, NULL);
> pthread_join(pt, NULL);
> execlp("echo", "echo", "passed", NULL);
> }
>
> sleep(1);
> // or anything else which needs ->cred_guard_mutex,
> // say open(/proc/$pid/mem)
> ptrace(PTRACE_ATTACH, pid, 0,0);
> kill(pid, SIGCONT);
>
> return 0;
> }
>
> we really need to narrow the (huge) scope of ->cred_guard_mutex in exec paths.
>
> my attempt to fix this was nacked, and nobody suggested a better solution so far.
Any link to your patch and the NAK?
Thanks,
Ingo
* Michal Hocko <[email protected]> wrote:
> I dunno. I do not use hibernation. I am a heavy user of the suspend
> though. I s2ram all the time. And I have certainly experienced cases
> where suspend has failed and I onlyi found out later when I've picked
> up my laptop from my heat up bag. Nothing fatal has resulted from that
> but this is certainly annoying.
Hm, so I (mistakenly) thought freezing was mostly limited to hibernation
and to a few weird cases when in flight DMA must not be suspended - but
I'm wrong and in practice we always freeze tasks during s2ram, right?
And indeed:
config SUSPEND_FREEZER
bool "Enable freezer for suspend to RAM/standby" \
if ARCH_WANTS_FREEZER_CONTROL || BROKEN
depends on SUSPEND
default y
which is essentially always enabled on x86.
TIL ...
s2ram is obviously a huge deal.
Just a newbie question: any chance to not do any freezing at all on
modern laptops when doing s2ram, or at least only warn if it fails and
try to suspend?
Because in practice losing power due to failed freezing *will* result in
data loss, in about 90% of the time ...
So I don't even know what we are trying to protect against by refusing to
freeze - avoiding a 0.001% data loss risk against a 90% data loss risk?
Thanks,
Ingo
On Tue 04-12-18 10:33:10, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > I dunno. I do not use hibernation. I am a heavy user of the suspend
> > though. I s2ram all the time. And I have certainly experienced cases
> > where suspend has failed and I onlyi found out later when I've picked
> > up my laptop from my heat up bag. Nothing fatal has resulted from that
> > but this is certainly annoying.
>
> Hm, so I (mistakenly) thought freezing was mostly limited to hibernation
> and to a few weird cases when in flight DMA must not be suspended - but
> I'm wrong and in practice we always freeze tasks during s2ram, right?
Yup.
> And indeed:
>
> config SUSPEND_FREEZER
> bool "Enable freezer for suspend to RAM/standby" \
> if ARCH_WANTS_FREEZER_CONTROL || BROKEN
> depends on SUSPEND
> default y
>
> which is essentially always enabled on x86.
>
> TIL ...
>
> s2ram is obviously a huge deal.
>
> Just a newbie question: any chance to not do any freezing at all on
> modern laptops when doing s2ram, or at least only warn if it fails and
> try to suspend?
AFAIU both suspend and hibernation require the system to enter quiescent
state with no task potentially interfering with suspended devices. And
in this particular case those de-thread-ed threads will certainly not
interfere so silencing the lockdep sounds like a reasonable workaround.
--
Michal Hocko
SUSE Labs
On Tue, Dec 4, 2018 at 1:58 AM Michal Hocko <[email protected]> wrote:
>
> AFAIU both suspend and hibernation require the system to enter quiescent
> state with no task potentially interfering with suspended devices. And
> in this particular case those de-thread-ed threads will certainly not
> interfere so silencing the lockdep sounds like a reasonable workaround.
I still think it would be better to simply not freeze killed user processes.
We already have things like
if (test_tsk_thread_flag(p, TIF_MEMDIE))
return false;
exactly because we do not want to freeze processes that are about to
die due to being killed. Very similar situation: we don't want to
freeze those processes, because doing so would halt them from freeing
the resources that may be needed for suspend or hibernate.
How about something like we set PF_NOFREEZE when we set PF_EXITING? At
that point we've pretty much turned into a kernel thread, no?
Linus
On Tue 04-12-18 09:31:11, Linus Torvalds wrote:
> On Tue, Dec 4, 2018 at 1:58 AM Michal Hocko <[email protected]> wrote:
> >
> > AFAIU both suspend and hibernation require the system to enter quiescent
> > state with no task potentially interfering with suspended devices. And
> > in this particular case those de-thread-ed threads will certainly not
> > interfere so silencing the lockdep sounds like a reasonable workaround.
>
> I still think it would be better to simply not freeze killed user processes.
>
> We already have things like
>
> if (test_tsk_thread_flag(p, TIF_MEMDIE))
> return false;
>
> exactly because we do not want to freeze processes that are about to
> die due to being killed. Very similar situation: we don't want to
> freeze those processes, because doing so would halt them from freeing
> the resources that may be needed for suspend or hibernate.
Yeah, we do not freeze oom victims but that is really far from trivial
to achieve. In order to do so we post-pone quiescent state until we know
all the TIF_MEMDIE threads are gone. See oom_killer_disable. In short it
is a painfull code.
> How about something like we set PF_NOFREEZE when we set PF_EXITING? At
> that point we've pretty much turned into a kernel thread, no?
Hmm, that doesn't sound like a bad idea but I am not sure it will
help because those threads we are waiting for might be block way before
they reached PF_EXITING. I would expect that threads that actually reach
that state usually die shortly. Unless I am missing something we are
talking about basically two cases here. Those threads are frozen while
de_thread waits for them because zap_other_threads cannot wake them from
the fridge or they are blocked in uninterruptible sleep somewhere in the
kernel. The later is the fundamental problem. The former could be hacked
around (check PF_FROZEN and use uninterruptible wake), I guess but I
haven't thought that through yet.
--
Michal Hocko
SUSE Labs
Hi!
> * Michal Hocko <[email protected]> wrote:
>
> > > Do we actually have reports of this happening for people outside
> > > Android?
> >
> > Not that I am aware of.
>
> I'd say outside of Android 99% of the use of hibernation is the fail-safe
> that distributions offer on laptops with very low battery levels: the
> emergency hibernation when there's almost no power left anymore.
Android does not use hibernation AFAICT. Just s2ram.
> Do these hibernation failure messages typically make it to persistent
> logs before the system uses power?
I'd say so. If you have enough energy left for hibernation, you also
have enough energy left to write the logs and sync.
> In practice if that is buggy the kernel won't hibernate and the laptop
> will run out of power and the user will conclude "ugh, I shouldn't have
> left my laptop turned on" - without looking into the logs and reporting,
> as they'll perceive it as a user failure not a system failure.
>
> I certainly saw random Linux laptops fail to hibernate over the years and
> didn't report it, so if the distribution doesn't do the reporting
> automatically then chances are we'll never see it.
There are many reasons while hibernation can fail. Buggy drivers,
tasks in D state... And there are some when hibernation can fail "by
design". If you swap does not have enough space to store the data, for
example.
Hibernation was designed to be non-intrusive, and reliable as in "if
it hibernates it will also resume ok", but not reliable as in "it will
always hibernate".
I see that is problematic for "hibernate on battery low".
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, Dec 4, 2018 at 10:17 AM Michal Hocko <[email protected]> wrote:
>
> > How about something like we set PF_NOFREEZE when we set PF_EXITING? At
> > that point we've pretty much turned into a kernel thread, no?
>
> Hmm, that doesn't sound like a bad idea but I am not sure it will
> help because those threads we are waiting for might be block way before
> they reached PF_EXITING.
Yeah, looks that way. We've got the whole "zap_other_threads() ->
actually starting the exit" window, which is probably much bigger than
the "start the exit -> release_task" window.
So we'd have to mark things non-freezable at zap time, not at exit
time, and that's a lot more questionable.
Looking at this, I'm agreeing that ot would be better to just try to
narrow down the cred_guard_mutex use a lot.
Oleg, if you had patch that got push-back for that, maybe this problem
is now the impetus for people to say "yeah, that's not nice but we
clearly need to do it".
I'm not finding any old emails on this, but considering I redid my
email setup recently, that doesn't necessarily mean much.
Linus
On Tue 2018-12-04 10:33:10, Ingo Molnar wrote:
>
> * Michal Hocko <[email protected]> wrote:
>
> > I dunno. I do not use hibernation. I am a heavy user of the suspend
> > though. I s2ram all the time. And I have certainly experienced cases
> > where suspend has failed and I onlyi found out later when I've picked
> > up my laptop from my heat up bag. Nothing fatal has resulted from that
> > but this is certainly annoying.
>
> Hm, so I (mistakenly) thought freezing was mostly limited to hibernation
> and to a few weird cases when in flight DMA must not be suspended - but
> I'm wrong and in practice we always freeze tasks during s2ram, right?
>
> And indeed:
>
> config SUSPEND_FREEZER
> bool "Enable freezer for suspend to RAM/standby" \
> if ARCH_WANTS_FREEZER_CONTROL || BROKEN
> depends on SUSPEND
> default y
>
> which is essentially always enabled on x86.
>
> TIL ...
pavel@amd:~$ wtf til
Gee... I don't know what til means...
?
> s2ram is obviously a huge deal.
>
> Just a newbie question: any chance to not do any freezing at all on
> modern laptops when doing s2ram, or at least only warn if it fails and
> try to suspend?
Not really.
> Because in practice losing power due to failed freezing *will* result in
> data loss, in about 90% of the time ...
Ugh. What are you talking about?
I don't know how you use your machines, but 95% of s2ram's I do,
machines are running on AC power, and I'll notice that freezer
failure, because the machine keeps making noise when it should be
sleeping.
There's big difference between "sync; forced_poweroff" and
"forced_poweroff", which are annoying but quite harmless (editors keep
backups, web browser stores session periodically, and filesystems have
journal...) and real data corruption as in "Pavel found another bug in
fsck.ext3, which is great, but his filesystem is corrupted, which is
not so great".
(BTW one of those bugs is unfixable; I managed to corrupt ext3 in such
a way that fsck is not able to automatically repair it, and likely
never will be. Fun?)
> So I don't even know what we are trying to protect against by refusing to
> freeze - avoiding a 0.001% data loss risk against a 90% data loss risk?
Where did you get those 0.001% and 90% numbers?
I don't see freezer failures too often.
I see machine that is sleeping, but fails to resume, maybe once in
month. That's "sync; forced_poweroff" type failure. Not nice,
but... Unfortunately fairly hard to debug. And not worse than usual
"hard crashes" I see at about same frequency.
I did have real filesystem corruption, at least twice while debugging
hibernation. Trust me. You don't want to have that, and you certainly
don't want your users to have that.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, Dec 4, 2018 at 11:33 AM Linus Torvalds
<[email protected]> wrote:
>
> Looking at this, I'm agreeing that ot would be better to just try to
> narrow down the cred_guard_mutex use a lot.
Ho humm. This is a crazy idea, but I don't see why it wouldn't work.
How about we:
- stop holding on to cred_guard_mutex entirely in the exec path
and instead just do:
- prepare_bprm_creds takes a ref to our old creds, and saves it off in the bprm
- security_bprm_{committing,committed}_creds() can do it's "is this a
valid transition" using the saved-off old creds instead of the current
creds
because honestly, the *only* reason we hold on to that lock is for the
insane and not really interesting case of "somebody tried to use
ptrace to change the creds in-flight during the exec".
Or maybe we could just add a task state flag that says "in exec, you
can't modify the creds in this window, because we're about to switch
to new creds".
Again, no *normal* situation will even notice or care, I think. We
hold the cred lock purely to make sure that the sequence from
prepare_exec_creds -> install_exec_creds is "atomic" wrt credentials,
and it already is for all the normal cases since this is all inside a
single execve system call.
Linus
On Tue 2018-12-04 09:31:11, Linus Torvalds wrote:
> On Tue, Dec 4, 2018 at 1:58 AM Michal Hocko <[email protected]> wrote:
> >
> > AFAIU both suspend and hibernation require the system to enter quiescent
> > state with no task potentially interfering with suspended devices. And
> > in this particular case those de-thread-ed threads will certainly not
> > interfere so silencing the lockdep sounds like a reasonable workaround.
>
> I still think it would be better to simply not freeze killed user processes.
>
> We already have things like
>
> if (test_tsk_thread_flag(p, TIF_MEMDIE))
> return false;
>
> exactly because we do not want to freeze processes that are about to
> die due to being killed. Very similar situation: we don't want to
> freeze those processes, because doing so would halt them from freeing
> the resources that may be needed for suspend or hibernate.
>
> How about something like we set PF_NOFREEZE when we set PF_EXITING? At
> that point we've pretty much turned into a kernel thread, no?
I'd be careful about that. Exiting task needs to write to disk (space
of unlinked but open files needs to be freed), so we can't just ignore
them.
And given that ptrace example (where it deadlocks w/o freezer anywhere
nearby), I'd say attempt to simplify the locking should be made, first.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, Dec 4, 2018 at 11:49 AM Linus Torvalds
<[email protected]> wrote:
>
> because honestly, the *only* reason we hold on to that lock is for the
> insane and not really interesting case of "somebody tried to use
> ptrace to change the creds in-flight during the exec".
No, sorry, me confused. Not somebody trying to change them, it's just
ptrace_attach() trying to change _our_ state during this sequence, and
relying on it all being atomic.
So taking a ref is unnecessary and pointless. It's not the creds that
change, it's that we really want to delay ptrace_attach().
We could maybe set that "we're busy now" flag, and have
ptrace_attach() do something like
if (task_is_busy(task)) {
sched_yield();
return -ERESTARTSYS;
}
or something like that.
Linus
Ingo, et al,
Sorry, I am sick and can't participate this discussion right now,
On 12/04, Ingo Molnar wrote:
>
> * Oleg Nesterov <[email protected]> wrote:
>
> > we really need to narrow the (huge) scope of ->cred_guard_mutex in exec paths.
> >
> > my attempt to fix this was nacked, and nobody suggested a better solution so far.
>
> Any link to your patch and the NAK?
See https://lore.kernel.org/lkml/[email protected]/
No questions, the patch wasn't pretty. And imo we need to rework the security
hooks in the long term.
Oleg.
> From: Oleg Nesterov [mailto:[email protected]]
> Sent: Wednesday, December 05, 2018 11:36 PM
> To: Ingo Molnar
> Cc: Linus Torvalds; Linux List Kernel Mailing; Rafael J. Wysocki; Chanho Min;
> Thomas Gleixner; Peter Zijlstra; Pavel Machek; Michal Hocko
> Subject: Re: [PATCH] Revert "exec: make de_thread() freezable (was: Re: Linux
> 4.20-rc4)
>
> Ingo, et al,
>
> Sorry, I am sick and can't participate this discussion right now,
>
> On 12/04, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <[email protected]> wrote:
> >
> > > we really need to narrow the (huge) scope of ->cred_guard_mutex in exec
> paths.
> > >
> > > my attempt to fix this was nacked, and nobody suggested a better solution
> so far.
> >
> > Any link to your patch and the NAK?
>
> See https://lore.kernel.org/lkml/[email protected]/
>
> No questions, the patch wasn't pretty. And imo we need to rework the security
> hooks in the long term.
>
> Oleg.
I am sorry for the reverting this patch. It's also my fault that
I didn't check lockdep. But, We decided to keep this patch in our product.
Freeze fail is a real problem we've had for the last two years,
whereas lockdep's notice is not a real problem.
We hope this issue will be resolved soon.
Special thanks to Oleg,
Chanho
Hi!
> > On 12/04, Ingo Molnar wrote:
> > >
> > > * Oleg Nesterov <[email protected]> wrote:
> > >
> > > > we really need to narrow the (huge) scope of ->cred_guard_mutex in exec
> > paths.
> > > >
> > > > my attempt to fix this was nacked, and nobody suggested a better solution
> > so far.
> > >
> > > Any link to your patch and the NAK?
> >
> > See https://lore.kernel.org/lkml/[email protected]/
> >
> > No questions, the patch wasn't pretty. And imo we need to rework the security
> > hooks in the long term.
> >
> > Oleg.
>
> I am sorry for the reverting this patch. It's also my fault that
> I didn't check lockdep. But, We decided to keep this patch in our product.
> Freeze fail is a real problem we've had for the last two years,
> whereas lockdep's notice is not a real problem.
> We hope this issue will be resolved soon.
I guess it makes sense for your usage.
How often do you see the failures without the patch?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> >
> > I am sorry for the reverting this patch. It's also my fault that
> > I didn't check lockdep. But, We decided to keep this patch in our product.
> > Freeze fail is a real problem we've had for the last two years,
> > whereas lockdep's notice is not a real problem.
> > We hope this issue will be resolved soon.
>
> I guess it makes sense for your usage.
>
> How often do you see the failures without the patch?
Very rare, it happens about 1 in 1000 suspends.
Chanho,