2017-12-03 16:22:59

by Linus Torvalds

Subject: Linux 4.15-rc2

It's Sunday, but a few hours earlier than usual, since I'm on the east
coast, three hours ahead of my normal release schedule.

It's a slightly bigger rc2 than I would have wished for, but this
early in the release process I don't worry about it. The appended
shortlog gives the details, it's fixes all over the place -
architectures, drivers, filesystems, networking, core kernel.

One thing I'll point out is that I'm trying to get some kernel ASLR
leaks plugged, and as part of that we now hash any pointers printed by
"%p" by default. That won't affect a lot of people, but where it is a
debugging problem (rather than leaking interesting kernel pointers),
we will have to fix things up.
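
As a rough userspace illustration of the idea (a sketch only, not the
kernel's implementation): a hashed "%p" prints a stable identifier derived
from the address and a secret key, so two printouts of the same pointer
still match each other, while the raw address is only shown by an explicit
specifier such as the new "%px" mentioned in the shortlog. Every name
below (demo_key, demo_mix, demo_ptr_id, demo_format_hashed,
demo_format_raw) is hypothetical, the fixed key stands in for a secret
that would really be generated randomly at boot, and the mixing function
is a simple stand-in rather than the hash the kernel uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical secret. A real implementation would pick this randomly
 * at boot and never expose it. */
static const uint64_t demo_key = 0x9e3779b97f4a7c15ULL;

/* Stand-in keyed mixer (splitmix64-style finalizer). This is NOT the
 * kernel's hash; it only illustrates "address + key -> opaque value". */
static uint64_t demo_mix(uint64_t x, uint64_t key)
{
    x ^= key;
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Map a pointer to a 32-bit identifier: stable for a given key, but it
 * does not reveal the address itself. */
static uint32_t demo_ptr_id(const void *ptr)
{
    return (uint32_t)demo_mix((uint64_t)(uintptr_t)ptr, demo_key);
}

/* What a hashed "%p" conceptually prints: the identifier, as 8 hex digits. */
static int demo_format_hashed(char *buf, size_t len, const void *ptr)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%08" PRIx32, demo_ptr_id(ptr));
}

/* What an explicit raw specifier conceptually prints: the real address. */
static int demo_format_raw(char *buf, size_t len, const void *ptr)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%016" PRIxPTR, (uintptr_t)ptr);
}
```

Calling demo_format_hashed() twice on the same object gives the same
string, which is usually all a debug message needs in order to correlate
two log lines. Only call sites that genuinely need the address would
switch to the raw variant.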

It can be a small annoyance, but the alternatives (trying to actually
find all the cases where we might be leaking) were worse. But let's
see if anybody even notices - a lot of the pointer printouts are stale
debug information from when some driver was originally written, and
aren't actually really interesting.

There will probably be some more leak fixes during this rc process,
we'll see how that all sorts out.

Linus

---

Abhishek Goel (2):
cpupowerutils: bench - Fix cpu online check
cpupower : Fix cpupower working when cpu0 is offline

Adrian Hunter (4):
mmc: block: Fix missing blk_put_request()
mmc: block: Check return value of blk_get_request()
mmc: core: Do not leave the block driver in a suspended state
mmc: block: Ensure that debugfs files are removed

Ahmad Fatoum (1):
e1000: Fix off-by-one in debug message

Alex Deucher (4):
drm/amdgpu/gfx7: cache raster_config values
drm/amdgpu: used cached gca values for cik_read_register
Revert "drm/amdgpu: fix rmmod KCQ disable failed error"
drm/amdgpu: drop experimental flag for raven

Amritha Nambiar (1):
i40e: Fix reporting incorrect error codes

Andrew Elble (2):
nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
nfsd: check for use of the closed special stateid

Andrew Jiang (1):
drm/amd/display: Don't reject 3D timings

Andrew Waterman (3):
RISC-V: Add VDSO entries for clock_get/gettimeofday/getcpu
RISC-V: Flush I$ when making a dirty page executable
RISC-V: Allow userspace to flush the instruction cache

Andrey Grodzovsky (1):
drm/amd/display: Switch to drm_atomic_helper_wait_for_flip_done

Andrey Gusakov (6):
drm/bridge: tc358767: do no fail on hi-res displays
drm/bridge: tc358767: filter out too high modes
drm/bridge: tc358767: fix DP0_MISC register set
drm/bridge: tc358767: fix timing calculations
drm/bridge: tc358767: fix AUXDATAn registers access
drm/bridge: tc358767: fix 1-lane behavior

Andy Shevchenko (1):
scripts/bloat-o-meter: don't fail with division by 0

Antoine Tenart (4):
net: mvpp2: fix the txq_init error path
net: mvpp2: cleanup probed ports in the probe error path
net: mvpp2: check ethtool sets the Tx ring size is to a valid min value
net: phy: marvell10g: fix the PHY id mask

Ard Biesheuvel (2):
arm64: module-plts: factor out PLT generation code for ftrace
arm64: ftrace: emit ftrace-mod.o contents through code

Arnd Bergmann (2):
drm/i915: fix intel_backlight_device_register declaration
drm/omap: displays: panel-dpi: add backlight dependency

Bartosz Golaszewski (1):
eeprom: at24: correctly set the size for at24mac402

Bastian Stender (2):
mmc: core: prepend 0x to pre_eol_info entry in sysfs
mmc: core: prepend 0x to OCR entry in sysfs

Benjamin Gaignard (1):
ethernet: dwmac-stm32: Fix copyright

Bhawanpreet Lakha (1):
drm/amd/display: Add null check for 24BPP (xfm and dpp)

Bhumika Goyal (3):
sunrpc: make the function arg as const
NFSD: make cache_detail structures const
SUNRPC: make cache_detail structures const

Bjorn Andersson (1):
mmc: sdhci-msm: Optionally wait for signal level changes

Carlos Maiolino (1):
xfs: Properly retry failed dquot items in case of error during
buffer writeback

Changbin Du (1):
drm/i915/gvt: Fix unsafe locking caused by spin_unlock_bh

Chao Yu (1):
quota: propagate error from __dquot_initialize

Charlene Liu (2):
drm/amd/display: fix seq issue: turn on clock before programming afmt.
drm/amd/display: try to find matching audio inst for enc inst first

Chris Wilson (3):
drm/i915: Clear breadcrumb node when cancelling signaling
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915/fbdev: Serialise early hotplug events with async fbdev config

Christian Borntraeger (1):
s390/debug: use pK for kernel pointers

Christian König (2):
drm/amdgpu: don't try to move pinned BOs
drm/ttm: fix populate_and_map() functions once more

Christoph Hellwig (1):
move libgcc.h to include/linux

Christophe JAILLET (2):
bnxt_en: Fix an error handling path in 'bnxt_get_module_eeprom()'
drm/omap: Fix error handling path in 'omap_dmm_probe()'

Chun-Yeow Yeoh (1):
mac80211: fix the update of path metric for RANN frame

Cihangir Akturk (1):
drm: mali-dp: switch to drm_*_get(), drm_*_put() helpers

Colin Ian King (9):
nvme: fix spelling mistake: "requeing" -> "requeuing"
drm/i915/gvt: ensure -ve return value is handled correctly
i2c: i2c-boardinfo: fix memory leaks on devinfo
ambassador: fix incorrect indentation of assignment statement
atm: fore200e: use %pK to format kernel addresses instead of %x
atm: lanai: use %p to format kernel addresses instead of %x
atm: suni: remove extraneous space to fix indentation
drm/amd/display: fix memory leaks on error exit return
net: via: via-rhine: use %p to format void * address instead of %x

Dan Carpenter (1):
omapdrm: hdmi4_cec: signedness bug in hdmi4_cec_init()

Dan Williams (11):
mm: fix device-dax pud write-faults triggered by get_user_pages()
mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
mm: replace pud_write with pud_access_permitted in fault + gup paths
mm: replace pmd_write with pmd_access_permitted in fault + gup paths
mm: replace pte_write with pte_access_permitted in fault + gup paths
mm, hugetlbfs: introduce ->split() to vm_operations_struct
device-dax: implement ->split() to catch invalid munmap attempts
mm: introduce get_user_pages_longterm
mm: fail get_vaddr_frames() for filesystem-dax mappings
v4l2: disable filesystem-dax mapping support
IB/core: disable memory registration of filesystem-dax vmas

Darrick J. Wong (5):
xfs: always free inline data before resetting inode fork during ifree
xfs: log recovery should replay deferred ops in order
xfs: ubsan fixes
xfs: remove unused parameter from xfs_writepage_map
xfs: scrub inode mode properly

Dave Airlie (1):
drm/ttm: don't attempt to use hugepages if dma32 requested (v2)

Dave Martin (1):
arm64: fpsimd: Fix failure to restore FPSIMD state after signals

David Disseldorp (1):
null_blk: fix dev->badblocks leak

David Hildenbrand (1):
KVM: x86: fix em_fxstor() sleeping while in atomic

David Howells (14):
rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
rxrpc: Don't set upgrade by default in sendmsg()
rxrpc: Provide a different lockdep key for call->user_mutex for
kernel calls
rxrpc: Delay terminal ACK transmission on a client call
rxrpc: Split the call params from the operation params
rxrpc: Fix call timeouts
rxrpc: Don't transmit DELAY ACKs immediately on proposal
rxrpc: Express protocol timeouts in terms of RTT
rxrpc: Add a timeout for detecting lost ACKs/lost DATA
rxrpc: Add keepalive for a call
rxrpc: Fix service endpoint expiry
rxrpc: Fix conn expiry timers
afs: Fix permit refcounting
afs: Properly reset afs_vnode (inode) fields

David S. Miller (1):
sparc64: Fix boot on T4 and later.

David Sterba (2):
btrfs: add missing device::flush_bio puts
btrfs: dev_alloc_list is not protected by RCU, use normal list_del

Dmitry V. Levin (1):
uapi: fix linux/kfd_ioctl.h userspace compilation errors

Dmytro Laktyushkin (3):
drm/amd/display: fix split recout calculation
drm/amd/display: fix split recout offset
drm/amd/display: fix split viewport rounding error

Dr. David Alan Gilbert (2):
KVM: lapic: Split out x2apic ldr calculation
KVM: lapic: Fixup LDR on load in x2apic

Eduardo Otubo (1):
xen-netfront: remove warning when unloading module

Eric Anholt (1):
drm/bridge: Fix lvds-encoder since the panel_bridge rework.

Eric Dumazet (1):
net/packet: fix a race in packet_bind() and packet_notifier()

Eric Sandeen (3):
xfs: fix leaks on corruption errors in xfs_bmap.c
xfs: fix uninitialized variable in xfs_scrub_quota
xfs: calculate correct offset in xfs_scrub_quota_item

Eric Yang (1):
drm/amd/display: Add timing validation against dongle cap

Eyal Moscovici (1):
KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs

Felix Kuehling (2):
drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
drm/amdkfd: Fix SDMA oversubsription handling

Filipe Manana (3):
Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
Btrfs: fix reported number of inode blocks after buffered append writes
Btrfs: incremental send, fix wrong unlink path after renaming file

Geert Uytterhoeven (1):
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit

Greg Kroah-Hartman (19):
s390: block: add SPDX identifiers to the remaining files
s390: crypto: add SPDX identifiers to the remaining files
s390: cio: add SPDX identifiers to the remaining files
s390: char: add SPDX identifiers to the remaining files
s390: net: add SPDX identifiers to the remaining files
s390: scsi: zfcp_aux: add SPDX identifier
s390: virtio: add SPDX identifiers to the remaining files
s390: crypto: Remove redundant license text
s390: drivers: Remove redundant license text
s390: kernel: add SPDX identifiers to the remaining files
s390: crypto: add SPDX identifiers to the remaining files
s390: mm: add SPDX identifiers to the remaining files
s390: pci: add SPDX identifiers to the remaining files
s390: appldata: add SPDX identifiers to the remaining files
s390: add SPDX identifiers to the remaining files
s390: kernel: Remove redundant license text
s390: include: Remove redundant license text
s390: crypto: Remove redundant license text
s390: Remove redundant license text

Gustavo A R Silva (1):
i40e/virtchnl: fix application of sizeof to pointer

Gustavo A. R. Silva (1):
net: openvswitch: datapath: fix data type in queue_gso_packets

Hans Verkuil (1):
drm/bridge: adv7511/33: Fix adv7511_cec_init() failure handling

Hans de Goede (4):
ACPI / bus: Leave modalias empty for devices which are not present
drm/i915: Fix false-positive assert_rpm_wakelock_held in
i915_pmic_bus_access_notifier v2
drm/i915: Re-register PMIC bus access notifier on runtime resume
i2c: i801: Fix Failed to allocate irq -2147483648 error

Harald Freudenberger (1):
s390/zcrypt: Fix wrong comparison leading to strange load balancing

Harry Wentland (6):
drm/amd/display: Fix amdgpu_dm bugs found by smatch
drm/amd/display: Bunch of smatch error and warning fixes in DC
drm/amd/display: Fix use before NULL check in validate_timing
drm/amd/display: Fix hubp check in set_cursor_position
drm/amd/display: Fix potential NULL and mem leak in create_links
drm/amd/display: Fix couple more inconsistent NULL checks in dc_resource

Heiko Carstens (2):
s390: rework __switch_to() to allow larger task_struct offsets
s390/disassembler: remove confusing code

Heiner Kallweit (2):
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: check at24_read/write arguments

Hersen Wu (2):
drm/amd/display: Handle as MST first and then DP dongle if sink
support both
drm/amd/display: USB-C / thunderbolt dock specific workaround

Huacai Chen (1):
bcache: Fix building error on MIPS

Hyong-Youb Kim (1):
myri10ge: Update MAINTAINERS

Ian Kent (2):
autofs: revert "autofs: take more care to not update last_used
on path walk"
autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"

Israel Rukshin (1):
nvme-rdma: Use mr pool

Jakub Kicinski (1):
cls_bpf: don't decrement net's refcount when offload fails

James Hogan (1):
cpufreq: Add Loongson machine dependencies

James Smart (1):
nvmet-fc: correct ref counting error when deferred rcv used

Jan H. Schönherr (1):
KVM: Let KVM_SET_SIGNAL_MASK work as advertised

Janakarajan Natarajan (1):
KVM: x86: Fix CPUID function for word 6 (80000001_ECX)

Jean Delvare (1):
hwmon: Drop reference to Jean's tree

Jeff Layton (1):
reiserfs: remove unneeded i_version bump

Jeff Lien (1):
nvme-pci: add quirk for delay before CHK RDY for WDC SN200

Jens Axboe (2):
nvme-fc: don't use bit masks for set/test_bit() numbers
blktrace: fix trace mutex deadlock

Jerry (Fangzhi) Zuo (1):
drm/amd/display: Check aux channel before MST resume

Jesse Chan (1):
cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE

Jiang Biao (1):
fs/mbcache.c: make count_objects() more robust

Jinbum Park (1):
arm64: pgd: Mark pgd_cache as __ro_after_init

Jiri Pirko (1):
net: sched: cbq: create block for q->link.block

Johannes Berg (2):
cfg80211: select CRYPTO_SHA256 if needed
mac80211: use QoS NDP for AP probing

John Johansen (1):
apparmor: fix oops in audit_signal_cb hook

Jon Maloy (1):
tipc: eliminate access after delete in group_filter_msg()

Joonas Lahtinen (1):
drm/i915: Disable THP until we have a GPU read BW W/A

Jordan Lazare (1):
drm/amd/display: Revert noisy assert messages

Jorgen Hansen (2):
VSOCK: Don't call vsock_stream_has_data in atomic context
VSOCK: Don't set sk_state to TCP_CLOSE before testing it

Josef Bacik (2):
btrfs: clear space cache inode generation always
btrfs: fix deadlock when writing out space cache

Kai-Heng Feng (1):
nvme-pci: disable APST on Samsung SSD 960 EVO + ASUS PRIME B350M-A

Kees Cook (1):
exec: avoid RLIMIT_STACK races with prlimit()

Keith Busch (2):
nvme: Fix NULL dereference on reservation request
nvme: Suppress static analyis warning

Kirill A. Shutemov (3):
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
mm, thp: Do not make pmd/pud dirty without a reason
mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine

Laurent Pinchart (1):
drm: omapdrm: Fix DPI on platforms using the DSI VDDS

Leo (Sunpeng) Li (3):
drm/amd/display: Should disable when new stream is null
drm/amd/display: Do DC mode-change check when adding CRTCs
drm/amd/display: Do not put drm_atomic_state on resume

Leo Liu (1):
drm/amdgpu: move UVD/VCE and VCN structure out from union

Linus Torvalds (6):
Rename superblock flags (MS_xyz -> SB_xyz)
proc: don't report kernel addresses in /proc/<pid>/stack
Revert "mm, thp: Do not make pmd/pud dirty without a reason"
kallsyms: take advantage of the new '%px' format
vsprintf: don't use 'restricted_pointer()' when not restricting
Linux 4.15-rc2

Liran Alon (6):
KVM: x86: pvclock: Handle first-time write to pvclock-page
contains random junk
KVM: nVMX/nSVM: Don't intercept #UD when running L2
KVM: x86: Exit to user-mode on #UD intercept when emulator requires
KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
KVM: x86: Don't re-execute instruction when not passing CR2 value
KVM: nVMX: Fix vmx_check_nested_events() return value in case an
event was reinjected to L2

Liu Bo (3):
Btrfs: add write_flags for compression bio
Btrfs: bail out gracefully rather than BUG_ON
Btrfs: fix list_add corruption and soft lockups in fsync

Liu, Changcheng (1):
scripts/faddr2line: extend usage on generic arch

Liviu Dudau (3):
drm: hdlcd: Update PM code to save/restore console.
drm: mali-dp: Separate static internal data into a read-only structure.
drm: mali-dp: Disable planes when their CRTC gets disabled.

Lucas Stach (2):
drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive
drm/imx: always call wait_for_flip_done in commit_tail

Lv Zheng (1):
ACPI / EC: Fix regression related to PM ops support in ECDT device

Maarten Lankhorst (2):
drm/vblank: Pass crtc_id to page_flip_ioctl.
drm/fb_helper: Disable all crtc's when initial setup fails.

Mahesh Salgaonkar (1):
powerpc/powernv: Fix kexec crashes caused by tlbie tracing

Marcos Paulo de Souza (1):
blktrace: Use blk_trace_bio_get_cgid inside blk_add_trace_bio

Mark Rutland (1):
arm64: mm: cleanup stale AIVIVT references

Martin Schwidefsky (4):
s390: fix alloc_pgste check in init_new_context again
s390: sthyi: add SPDX identifiers to the remaining files
s390: revert ELF_ET_DYN_BASE base changes
s390/gs: add compat regset for the guarded storage broadcast control block

Max Gurtovoy (1):
nvme-rdma: fix memory leak during queue allocation

Michael Ellerman (1):
powerpc/kexec: Fix kexec/kdump in P9 guest kernels

Michael Lyle (1):
bcache: check return value of register_shrinker

Michal Hocko (3):
xfs: fortify xfs_alloc_buftarg error handling
mm, memory_hotplug: do not back off draining pcp free pages from
kworker context
Revert "mm/page-writeback.c: print a warning if the vm dirtiness
settings are illogical"

Michel Dänzer (2):
drm/amdgpu: Set adev->vcn.irq.num_types for VCN
drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map

Mika Westerberg (1):
net: thunderbolt: Stop using zero to mean no valid DMA mapping

Mike Kravetz (1):
mm/cma: fix alloc_contig_range ret code/potential leak

Mike Maloney (1):
packet: fix crash in fanout_demux_rollover()

Mikulas Patocka (1):
block: remove useless assignment in bio_split

Minwoo Im (2):
nvme-pci: avoid hmb desc array idx out-of-bound when hmmaxd set.
nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()

Mirza Krak (1):
drm/rockchip: dw-mipi-dsi: fix possible un-balanced runtime PM enable

Nadav Amit (1):
fs/hugetlbfs/inode.c: change put_page/unlock_page order in
hugetlbfs_fallocate()

Naofumi Honda (1):
nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat

Nikita Leshenko (5):
KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
KVM: x86: ioapic: Don't fire level irq when Remote IRR set
KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
KVM: x86: ioapic: Clear Remote IRR when entry is switched to
edge-triggered
KVM: x86: ioapic: Preserve read-only values in the redirection table

Nikolay Borisov (1):
btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

OGAWA Hirofumi (1):
fs/fat/inode.c: fix sb_rdonly() change

Oded Gabbay (2):
microblaze: add missing include to mmu_context_mm.h
drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd

Olof Johansson (8):
RISC-V: use generic serial.h
RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
RISC-V: io.h: type fixes for warnings
RISC-V: move empty_zero_page definition to C and export it
RISC-V: Export some expected symbols for modules
RISC-V: Provide stub of setup_profiling_timer()
RISC-V: Use define for get_cycles like other architectures
RISC-V: Add missing include

Ondrej Mosnáček (1):
crypto: skcipher - Fix skcipher_walk_aead_common

Palmer Dabbelt (11):
RISC-V: Remove __vdso_cmpxchg{32,64} symbol versions
RISC-V: Remove unused arguments from ATOMIC_OP
RISC-V: Comment on why {,cmp}xchg is ordered how it is
RISC-V: Remove __smp_bp__{before,after}_atomic
RISC-V: Remove smb_mb__{before,after}_spinlock()
RISC-V: __test_and_op_bit_ord should be strongly ordered
RISC-V: Add READ_ONCE in arch_spin_is_locked()
RISC-V: `sfence.vma` orderes the instruction cache
RISC-V: remove spin_unlock_wait()
RISC-V: Clean up an unused include
RISC-V: __io_writes should respect the length argument

Paolo Abeni (1):
sch_sfq: fix null pointer dereference at timer expiration

Paolo Bonzini (2):
KVM: x86: inject exceptions produced by x86_decode_insn
KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP

Paul Mackerras (1):
KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT
guests on radix hosts

Peter Rosin (1):
hwmon: (jc42) optionally try to disable the SMBUS timeout

Peter Ujfalusi (1):
omapdrm: hdmi4: Correct the SoC revision matching

Petr Machata (4):
mlxsw: spectrum_router: Offload decap only for up tunnels
mlxsw: spectrum_router: Demote tunnels on VRF migration
mlxsw: spectrum_router: Handle encap to demoted tunnels
mlxsw: spectrum_router: Update nexthop RIF on update

Pierre-Hugues Husson (1):
drm/bridge: synopsys/dw-hdmi: Enable cec clock

Qu Wenruo (2):
btrfs: Fix wild memory access in compression level parser
btrfs: tree-checker: Fix false panic for sanity test

Randy Dunlap (2):
block: genhd.c: fix message typo
drm/amdkfd: fix amdkfd use-after-free GP fault

Robert Lippert (1):
hwmon: (pmbus) Use 64bit math for DIRECT format values

Roman Kapl (1):
net: sched: crash on blocks with goto chain action

Roman Li (2):
drm/amd/display: Fix S3 topology change
drm/amd/display: fix gamma setting

Rui Hua (1):
bcache: recover data from backing when data is clean

Russell King (1):
ARM: avoid faulting on qemu

Sagi Grimberg (7):
nvme-fabrics: introduce init command check for a queue that is not alive
nvme-fc: check if queue is ready in queue_rq
nvme-loop: check if queue is ready in queue_rq
nvme-rdma: don't suppress send completions
nvme-rdma: don't complete requests before a send work request
has completed
nvme-rdma: wait for local invalidation before completing a request
nvme-rdma: Check remotely invalidated rkey matches our expected rkey

Sara Sharon (1):
mac80211: tear down RX aggregations first

Sasha Neftin (1):
e1000e: fix the use of magic numbers for buffer overrun issue

Shakeel Butt (1):
mm, memcg: fix mem_cgroup_swapout() for THPs

Shirish S (1):
drm/amd/display: check plane state before validating fbc

Srishti Sharma (2):
drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
drm/arm: Replace instances of drm_dev_unref with drm_dev_put.

Stefan Schake (1):
drm/vc4: Account for interrupts in flight

Stephan Mueller (2):
crypto: algif_aead - skip SGL entries with NULL page
crypto: af_alg - remove locking in async callback

Stephen Hemminger (1):
uapi: add SPDX identifier to vm_sockets_diag.h

Sunil Goutham (1):
net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts

Takashi Iwai (1):
Revert "ALSA: usb-audio: Fix potential zero-division at parsing FU"

Tang Junhui (1):
bcache: add a comment in journal bucket reading

Tetsuo Handa (1):
quota: Check for register_shrinker() failure.

Thomas Meyer (1):
auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM

Thomas Richter (1):
s390/topology: fix compile error in file arch/s390/kernel/smp.c

Tobin C. Harding (5):
docs: correct documentation for %pK
vsprintf: refactor %pK code out of pointer()
printk: hash addresses printed with %p
vsprintf: add printk specifier %px
kasan: use %px to print addresses instead of %p

Trond Myklebust (11):
nfsd: Fix stateid races between OPEN and CLOSE
nfsd: Fix another OPEN stateid race
nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
nfsd: Ensure we don't recognise lock stateids after freeing them
nfsd4: move find_lock_stateid
nfsd: Fix race in lock stateid creation
nfsd: Ensure we check stateid validity in the seqid operation checks
nfsd: Fix races with check_stateid_generation()
NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
SUNRPC: Allow connect to return EHOSTUNREACH
SUNRPC: Handle ENETDOWN errors

Ulf Hansson (1):
mmc: sdhci: Avoid swiotlb buffer being full

Vaibhav Jain (3):
cxl: Check if vphb exists before iterating over AFU devices
powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
powerpc: Do not assign thread.tidr if already assigned

Vasily Averin (9):
nfsd: remove net pointer from debug messages
lockd: remove net pointer from messages
grace: replace BUG_ON by WARN_ONCE in exit_net hook
lockd: added cleanup checks in exit_net hook
lockd: lost rollback of set_grace_period() in lockd_down_net()
race of lockd inetaddr notifiers vs nlmsvc_rqst change
race of nfsd inetaddr notifiers vs nn->nfsd_serv change
nlm_shutdown_hosts_net() cleanup
lockd: fix "list_add double add" caused by legacy signal interface

Vasily Gorbik (1):
s390/disassembler: correct disassembly lines alignment

Vasyl Gomonovych (1):
lmc: Use memdup_user() as a cleanup

Ville Syrjälä (4):
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/i915: Fix init_clock_gating for resume
drm/i915: Don't try indexed reads to alternate slave addresses
drm/i915: Prevent zero length "index" write

Vitor Massaru Iha (1):
drm: Fix checkpatch issue: "WARNING: braces {} are not necessary
for single statement blocks."

Vivien Didelot (1):
net: dsa: fix 'increment on 0' warning

Wang Nan (1):
mm, oom_reaper: gather each vma to prevent leaking TLB entry

Wanpeng Li (6):
KVM: X86: Fix operand/address-size during instruction decoding
KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
KVM: X86: Fix softlockup when get the current kvmclock
KVM: VMX: Fix rflags cache during vCPU reset
KVM: VMX: Fix vmx->nested freeing when no SMI handler

Weinan Li (1):
drm/i915/gvt: remove skl_misc_ctl_write handler

Will Deacon (1):
arm64: context: Fix comments and remove pointless smp_wmb()

Xiaolin Zhang (1):
drm/i915/gvt: enabled pipe A default on creating vgpu

Xin Long (11):
sctp: use sizeof(__u16) for each stream number length instead of
magic number
sctp: only allow the out stream reset when the stream outq is empty
sctp: only allow the asoc reset when the asoc outq is empty
sctp: avoid flushing unsent queue when doing asoc reset
sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
sctp: force the params with right types for sctp csum apis
sctp: remove extern from stream sched
sctp: use right member as the param of list_for_each_entry
bonding: use nla_get_u64 to extract the value for
IFLA_BOND_AD_ACTOR_SYSTEM
vxlan: use __be32 type for the param vni in __vxlan_fdb_delete

Xiong Zhang (1):
drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition

Xu YiPing (1):
arm64: perf: remove unsupported events for Cortex-A73

Yan Markman (1):
net: mvpp2: do not disable GMAC padding

Yisheng Xie (1):
kmemleak: add scheduling point to kmemleak_scan()

Yury Norov (1):
arm64: cpu_ops: Add missing 'const' qualifiers

Zhu Yanjun (1):
forcedeth: replace pci_unmap_page with dma_unmap_page

Zi Yan (1):
mm: migrate: fix an incorrect call of prep_transhuge_page()

chenjie (1):
mm/madvise.c: fix madvise() infinite loop under special circumstances

fred gao (1):
drm/i915/gvt: Move request alloc to dispatch_workload path only

shaoyunl (1):
drm/amdkfd: Fix SDMA ring buffer size calculation

weiping zhang (7):
bdi: convert bdi_debug_register to int
bdi: add error handle for bdi_debug_register
block: add WARN_ON if bdi register fail
blk-wbt: remove duplicated setting in wbt_init
blk-sysfs: remove NULL pointer checking in queue_wb_lat_store
blk-wbt: move wbt_clear_stat to common place in wbt_done
blk-wbt: fix comments typo

zhangliping (1):
openvswitch: fix the incorrect flow action alloc size


2017-12-04 22:25:47

by Rafael J. Wysocki

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Sunday, December 3, 2017 5:22:56 PM CET Linus Torvalds wrote:
> It's Sunday, but a few hours earlier than usual, since I'm on the east
> coast, three hours ahead of my normal release schedule.
>
> It's a slightly bigger rc2 than I would have wished for, but this
> early in the release process I don't worry about it. The appended
> shortlog gives the details, it's fixes all over the place -
> architectures, drivers, filesystems, networking, core kernel.
>
> One thing I'll point out is that I'm trying to get some kernel ASLR
> leaks plugged, and as part of that we now hash any pointers printed by
> "%p' by default. That won't affect a lot of people, but where it is a
> debugging problem (rather than leaking interesting kernel pointers),
> we will have to fix things up.
>
> It can be a small annoyance, but the alternatives (trying to actually
> find all the cases where we might be leaking) were worse. But let's
> see if anybody even notices - a lot of the pointer printouts are stale
> debug information from when some driver was originally written, and
> aren't actually really interesting.
>
> There will probably be some more leak fixes during this rc process,
> we'll see how that all sorts out.

So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
systems I have tested, so it is probably safe to assume it to be
broken everywhere.

I'm quite confident that this is not something that went in through the
PM tree, because I was running those changes on the systems that turn
out to be broken now.

It looks like the ACPI waking vector mechanism stopped working, so
I'm suspecting some x86 changes having to do with virtual-to-physical
address mapping.

I've just started bisection.

Thanks,
Rafael

2017-12-04 22:36:23

by Linus Torvalds

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> systems I have tested, so it is probably safe to assume it to be
> broken everywhere.

Oh, it's definitely not broken everywhere, because I use it myself,
and was traveling last week due to my mom's bday.

HOWEVER.

Some of the x86 work seems to have broken it for some configurations.
In particular, do you have a big "everything enabled" kernel config -
particularly lockdep and irqflags tracing enabled?

Andy has a patch, but it hasn't made it to me yet (probably because
the x86 people are very busy with the kaiser work):

https://lkml.org/lkml/2017/11/30/546

(also note his follow-up "fix the commit message" note, but that one
doesn't actually affect the code itself).

Does that patch fix it for you?

Linus

2017-12-04 22:38:59

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Mon, 4 Dec 2017, Linus Torvalds wrote:

> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
>
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
>
> HOWEVER.
>
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
>
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):

Picking it up right now.

Thanks,

tglx

2017-12-04 22:41:40

by Rafael J. Wysocki

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Linus Torvalds wrote:
>
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> >
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> >
> > HOWEVER.
> >
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> >
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):

This definitely fixes the problem at least on one of the affected machines.

> Picking it up right now.

Cool, thanks!

Rafael

2017-12-05 00:26:31

by Rafael J. Wysocki

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> >
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > >
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > >
> > > HOWEVER.
> > >
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > >
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
>
> This definitely fixes the problem at least on one of the affected machines.

I can confirm that Andy's patch fixes it on all systems that had this
issue here.

Thanks,
Rafael

2017-12-06 12:15:15

by Michal Hocko

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
>
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
>
> HOWEVER.
>
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
>
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):
>
> https://lkml.org/lkml/2017/11/30/546
>
> (also note his follow-up "fix the commit message" note, but that one
> doesn't actually affect the code itself).

merging tip/x86/urgent on top of your tree fixed this problem for me,
but I am seeing something else
[ 131.711412] ACPI: Preparing to enter system sleep state S3
[ 131.755328] ACPI: EC: event blocked
[ 131.755328] ACPI: EC: EC stopped
[ 131.755328] PM: Saving platform NVS memory
[ 131.755344] Disabling non-boot CPUs ...
[ 131.779330] IRQ 124: no longer affine to CPU1
[ 131.780334] smpboot: CPU 1 is now offline
[ 131.804465] smpboot: CPU 2 is now offline
[ 131.827291] IRQ 122: no longer affine to CPU3
[ 131.827292] IRQ 123: no longer affine to CPU3
[ 131.828293] smpboot: CPU 3 is now offline
[ 131.830991] ACPI: Low-level resume complete
[ 131.831092] ACPI: EC: EC started
[ 131.831093] PM: Restoring platform NVS memory
[ 131.831864] do_IRQ: 0.55 No irq handler for vector
[ 131.831884] Enabling non-boot CPUs ...
[ 131.831909] x86: Booting SMP configuration:
[ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 131.832913] cache: parent cpu1 should not be sleeping
[ 131.833058] CPU1 is up
[ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 131.833864] cache: parent cpu2 should not be sleeping
[ 131.833983] CPU2 is up
[ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 131.834776] cache: parent cpu3 should not be sleeping
[ 131.834923] CPU3 is up

"No irq handler" part looks a bit scary (maybe related to lost affinity
messages?) but the following messages look quite as well. Is this
something known? The system seems to be up and running without any
visible issues.
--
Michal Hocko
SUSE Labs

2017-12-06 12:23:37

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 6 Dec 2017, Michal Hocko wrote:
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector

Hmm, that's really odd.

> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping

This is an old one.

> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

I assume it's due to the affinity break, just that we don't know right now
on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
the others are offline already, but ....

I'll think about how we can figure out what's going on.

Thanks,

tglx

2017-12-06 12:31:04

by Maarten Lankhorst

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

Op 06-12-17 om 13:15 schreef Michal Hocko:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
>> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
>>> systems I have tested, so it is probably safe to assume it to be
>>> broken everywhere.
>> Oh, it's definitely not broken everywhere, because I use it myself,
>> and was traveling last week due to my mom's bday.
>>
>> HOWEVER.
>>
>> Some of the x86 work seems to have broken it for some configurations.
>> In particular, do you have a big "everything enabled" kernel config -
>> particularly lockdep and irqflags tracing enabled?
>>
>> Andy has a patch, but it hasn't made it to me yet (probably because
>> the x86 people are very busy with the kaiser work):
>>
>> https://lkml.org/lkml/2017/11/30/546
>>
>> (also note his follow-up "fix the commit message" note, but that one
>> doesn't actually affect the code itself).
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping
> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
Symptoms are similar..

~Maarten

2017-12-06 12:46:41

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> Symptoms are similar..

Well, the spurious interrupt is one thing, but you obviously lose
interrupts for some reason.

Did you ever manage to get the data out which I asked for?

Thanks,

tglx

2017-12-06 13:09:40

by Maarten Lankhorst

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>> messages?) but the following messages look quite as well. Is this
>>> something known? The system seems to be up and running without any
>>> visible issues.
>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>> Symptoms are similar..
> Well, the spurious interrupt is one thing, but you obviously lose
> interrupts for some reason.
>
> Did you ever manage to get the data out which I asked for?
>
> Thanks,
>
> tglx
>
Yes, sent this out about an hour ago

https://lkml.org/lkml/2017/12/6/215

Cheers,

Maarten

2017-12-06 14:05:22

by Rafael J. Wysocki

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [ 131.711412] ACPI: Preparing to enter system sleep state S3
> > [ 131.755328] ACPI: EC: event blocked
> > [ 131.755328] ACPI: EC: EC stopped
> > [ 131.755328] PM: Saving platform NVS memory
> > [ 131.755344] Disabling non-boot CPUs ...
> > [ 131.779330] IRQ 124: no longer affine to CPU1
> > [ 131.780334] smpboot: CPU 1 is now offline
> > [ 131.804465] smpboot: CPU 2 is now offline
> > [ 131.827291] IRQ 122: no longer affine to CPU3
> > [ 131.827292] IRQ 123: no longer affine to CPU3
> > [ 131.828293] smpboot: CPU 3 is now offline
> > [ 131.830991] ACPI: Low-level resume complete
> > [ 131.831092] ACPI: EC: EC started
> > [ 131.831093] PM: Restoring platform NVS memory
> > [ 131.831864] do_IRQ: 0.55 No irq handler for vector
>
> Hmm, that's really odd.
>
> > [ 131.831884] Enabling non-boot CPUs ...
> > [ 131.831909] x86: Booting SMP configuration:
> > [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [ 131.832913] cache: parent cpu1 should not be sleeping
>
> This is an old one.
>
> > [ 131.833058] CPU1 is up
> > [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [ 131.833864] cache: parent cpu2 should not be sleeping
> > [ 131.833983] CPU2 is up
> > [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [ 131.834776] cache: parent cpu3 should not be sleeping
> > [ 131.834923] CPU3 is up
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but ....

This is resume from S3, so the firmware might do something odd to the other
CPUs, but in case it didn't (which is quite likely or we would have seen more
of these messages), they are offline and in mwait_play_dead(), so IMO it is
safe to assume that this was CPU0.
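
The log line itself may also carry the CPU number. A minimal sketch, assuming the x86 do_IRQ() message is formatted as "<cpu>.<vector>" (an assumption about the format string, not something stated in this thread); parse_do_irq_line() and struct spurious_irq are hypothetical names used only for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: fields of a "do_IRQ: <cpu>.<vector> ..." line. */
struct spurious_irq {
	int cpu;
	int vector;
};

/*
 * Assumption: the message is printed as
 * "do_IRQ: <cpu>.<vector> No irq handler for vector".
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_do_irq_line(const char *line, struct spurious_irq *out)
{
	/* Skip the dmesg timestamp, if any, by searching for the tag. */
	const char *p = strstr(line, "do_IRQ:");

	if (!p)
		return -1;
	if (sscanf(p, "do_IRQ: %d.%d", &out->cpu, &out->vector) != 2)
		return -1;
	return 0;
}
```

Under that assumption, "do_IRQ: 0.55 No irq handler for vector" parses to CPU 0 and vector 55, which would be consistent with the reasoning above.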

And this appears to have happened at arch_suspend_enable_irqs() time,
which is just local_irq_enable() on x86 running on CPU0.

> I'll think about it how we can figure out what's going on.

It looks like an interrupt that has triggered right after we've enabled
interrupts on the boot CPU.

Thanks,
Rafael

2017-12-06 14:15:09

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>> messages?) but the following messages look quite as well. Is this
> >>> something known? The system seems to be up and running without any
> >>> visible issues.
> >> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >> Symptoms are similar..
> > Well, the spurious interrupt is one thing, but you obviously lose
> > interrupts for some reason.
> >
> > Did you ever manage to get the data out which I asked for?
> >
> > Thanks,
> >
> > tglx
> >
> Yes, sent this out about an hour ago
>
> https://lkml.org/lkml/2017/12/6/215

Weird. Did not reach me

2017-12-07 07:55:11

by Michal Hocko

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> >
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> >
> > HOWEVER.
> >
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> >
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):
> >
> > https://lkml.org/lkml/2017/11/30/546
> >
> > (also note his follow-up "fix the commit message" note, but that one
> > doesn't actually affect the code itself).
>
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [ 131.711412] ACPI: Preparing to enter system sleep state S3
> [ 131.755328] ACPI: EC: event blocked
> [ 131.755328] ACPI: EC: EC stopped
> [ 131.755328] PM: Saving platform NVS memory
> [ 131.755344] Disabling non-boot CPUs ...
> [ 131.779330] IRQ 124: no longer affine to CPU1
> [ 131.780334] smpboot: CPU 1 is now offline
> [ 131.804465] smpboot: CPU 2 is now offline
> [ 131.827291] IRQ 122: no longer affine to CPU3
> [ 131.827292] IRQ 123: no longer affine to CPU3
> [ 131.828293] smpboot: CPU 3 is now offline
> [ 131.830991] ACPI: Low-level resume complete
> [ 131.831092] ACPI: EC: EC started
> [ 131.831093] PM: Restoring platform NVS memory
> [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> [ 131.831884] Enabling non-boot CPUs ...
> [ 131.831909] x86: Booting SMP configuration:
> [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 131.832913] cache: parent cpu1 should not be sleeping
> [ 131.833058] CPU1 is up
> [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [ 131.833864] cache: parent cpu2 should not be sleeping
> [ 131.833983] CPU2 is up
> [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 131.834776] cache: parent cpu3 should not be sleeping
> [ 131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Hmm, there is still something bad going on during resume. My laptop
hasn't woken up from s2ram this morning. The screen was powered on
but the system hasn't come up.

The last thing that made it into the kernel log on fs is this

Dec 6 19:32:29 tiehlicka kernel: [21898.084685] PM: suspend entry (deep)

which won't tell us much, I suspect. I've tried a dozen s2ram cycles and it
hasn't reproduced, so it smells like a timing issue.
--
Michal Hocko
SUSE Labs

2017-12-07 13:33:24

by Maarten Lankhorst

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
>>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>>>> messages?) but the following messages look quite as well. Is this
>>>>> something known? The system seems to be up and running without any
>>>>> visible issues.
>>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>>>> Symptoms are similar..
>>> Well, the spurious interrupt is one thing, but you obviously lose
>>> interrupts for some reason.
>>>
>>> Did you ever manage to get the data out which I asked for?
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>> Yes, sent this out about an hour ago
>>
>> https://lkml.org/lkml/2017/12/6/215
> Weird. Did not reach me
>
But do you have any idea?

~Maarten

2017-12-08 10:30:59

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 7 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> >>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >>>> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>>>> messages?) but the following messages look quite as well. Is this
> >>>>> something known? The system seems to be up and running without any
> >>>>> visible issues.
> >>>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >>>> Symptoms are similar..
> >>> Well, the spurious interrupt is one thing, but you obviously lose
> >>> interrupts for some reason.
> >>>
> >>> Did you ever manage to get the data out which I asked for?
> >>>
> >>> Thanks,
> >>>
> >>> tglx
> >>>
> >> Yes, sent this out about an hour ago
> >>
> >> https://lkml.org/lkml/2017/12/6/215
> > Weird. Did not reach me
> >
> But do you have any idea?

Can you please provide the full trace, dmesg and the full output of
.../debug/irq/... ?

Thanks,

tglx

2017-12-09 10:33:30

by Pavel Machek

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > > >
> > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > broken everywhere.
> > > >
> > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > and was traveling last week due to my mom's bday.
> > > >
> > > > HOWEVER.
> > > >
> > > > Some of the x86 work seems to have broken it for some configurations.
> > > > In particular, do you have a big "everything enabled" kernel config -
> > > > particularly lockdep and irqflags tracing enabled?
> > > >
> > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > the x86 people are very busy with the kaiser work):
> >
> > This definitely fixes the problem at least on one of the affected machines.
>
> I can confirm that the Andy's patch fixes it on all systems that had this
> issue here.

I believe I have the issue here, too (-next on thinkpad x60). Which
patch is expected to fix it? Let me try recent -next...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-09 11:41:38

by Pavel Machek

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Sat 2017-12-09 11:33:25, Pavel Machek wrote:
> On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > >
> > > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > > > >
> > > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > > broken everywhere.
> > > > >
> > > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > > and was traveling last week due to my mom's bday.
> > > > >
> > > > > HOWEVER.
> > > > >
> > > > > Some of the x86 work seems to have broken it for some configurations.
> > > > > In particular, do you have a big "everything enabled" kernel config -
> > > > > particularly lockdep and irqflags tracing enabled?
> > > > >
> > > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > > the x86 people are very busy with the kaiser work):
> > >
> > > This definitely fixes the problem at least on one of the affected machines.
> >
> > I can confirm that the Andy's patch fixes it on all systems that had this
> > issue here.
>
> I believe I have the issue here, too (-next on thinkpad x60). Which
> patch is expected to fix it? Let me try recent -next...

Still there AFAICT.
Pavel


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



2017-12-10 20:30:54

by Michal Hocko

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu 07-12-17 08:55:08, Michal Hocko wrote:
> On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> > On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > >
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > >
> > > HOWEVER.
> > >
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > >
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> > >
> > > https://lkml.org/lkml/2017/11/30/546
> > >
> > > (also note his follow-up "fix the commit message" note, but that one
> > > doesn't actually affect the code itself).
> >
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [ 131.711412] ACPI: Preparing to enter system sleep state S3
> > [ 131.755328] ACPI: EC: event blocked
> > [ 131.755328] ACPI: EC: EC stopped
> > [ 131.755328] PM: Saving platform NVS memory
> > [ 131.755344] Disabling non-boot CPUs ...
> > [ 131.779330] IRQ 124: no longer affine to CPU1
> > [ 131.780334] smpboot: CPU 1 is now offline
> > [ 131.804465] smpboot: CPU 2 is now offline
> > [ 131.827291] IRQ 122: no longer affine to CPU3
> > [ 131.827292] IRQ 123: no longer affine to CPU3
> > [ 131.828293] smpboot: CPU 3 is now offline
> > [ 131.830991] ACPI: Low-level resume complete
> > [ 131.831092] ACPI: EC: EC started
> > [ 131.831093] PM: Restoring platform NVS memory
> > [ 131.831864] do_IRQ: 0.55 No irq handler for vector
> > [ 131.831884] Enabling non-boot CPUs ...
> > [ 131.831909] x86: Booting SMP configuration:
> > [ 131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [ 131.832913] cache: parent cpu1 should not be sleeping
> > [ 131.833058] CPU1 is up
> > [ 131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [ 131.833864] cache: parent cpu2 should not be sleeping
> > [ 131.833983] CPU2 is up
> > [ 131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [ 131.834776] cache: parent cpu3 should not be sleeping
> > [ 131.834923] CPU3 is up
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>
> Hmm, there is still something bad going on during resume. My laptop
> haven't woken up from s2ram this morning. The screen was powered on
> but the system hasn't come up.

It's been a few days and I haven't seen this problem again. And I am doing
s2ram all the time...
--
Michal Hocko
SUSE Labs

2017-12-13 15:58:04

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

So I was finally able to figure out what the hell is going on:

Suspend:

- The device suspend code puts the graphics card into a power
  state != PCI_D0.

- Offline non boot CPUs

- Break interrupt affinity. Allocate new vector on CPU 0, compose and
  write MSI message which ends up in:

  __pci_write_msi_msg(entry, msg)
  {
        if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
                /* Don't touch the hardware now */
        } else {
                ....
        }
        entry->msg = *msg;
  }

So because the device is not in PCI_D0 the message is not written. It's
written in the device resume path.
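
The effect of that deferral can be shown with a small user-space model. This is only a sketch under stated assumptions: every type and function name ending in _sketch is hypothetical, the pci_dev_is_disconnected() check is left out, and the resume step is reduced to "return to D0, then replay the cached message":

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's struct msi_msg / struct pci_dev. */
struct msi_msg_sketch {
	uint32_t address_lo;
	uint32_t data;		/* carries the vector in this model */
};

enum power_state_sketch { STATE_D0, STATE_D3 };

struct device_sketch {
	enum power_state_sketch current_state;
	struct msi_msg_sketch hw_msg;		/* what the device registers hold */
	struct msi_msg_sketch cached_msg;	/* software copy (entry->msg) */
};

/* Mirrors the quoted logic: skip the hardware when not in D0, always cache. */
void write_msi_msg_sketch(struct device_sketch *dev,
			  const struct msi_msg_sketch *msg)
{
	if (dev->current_state == STATE_D0)
		dev->hw_msg = *msg;
	dev->cached_msg = *msg;
}

/* Simplified resume path: back to D0, then replay the cached message. */
void resume_device_sketch(struct device_sketch *dev)
{
	dev->current_state = STATE_D0;
	dev->hw_msg = dev->cached_msg;
}

/* True while the hardware still targets the old address/vector. */
bool hw_msg_is_stale_sketch(const struct device_sketch *dev)
{
	return dev->hw_msg.address_lo != dev->cached_msg.address_lo ||
	       dev->hw_msg.data != dev->cached_msg.data;
}
```

In this model, a message written while the device is in STATE_D3 leaves hw_msg pointing at the old target until resume_device_sketch() runs, so any interrupt raised in between would still use the old vector.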

Resume:
[ 139.670446] ACPI: Low-level resume complete
[ 139.670541] PM: Restoring platform NVS memory
[ 139.672462] do_IRQ: 0.55 No irq handler for vector
[ 139.672475] Enabling non-boot CPUs ...

So the spurious interrupt happens early and way before the device resume
code writes the new MSI message.

I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.

And this has certainly nothing to do with the vector management changes,
but I can't figure out yet what causes that spurious interrupt to be sent.

Any ideas welcome.

Thanks,

tglx

2017-12-13 16:23:45

by Bjorn Helgaas

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
>
> Suspend:
>
> - The device suspend code puts the graphics card into a power
> state != PCI_D0.
>
> - Offline non boot CPUs
>
> - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> write MSI message which ends up in:
>
> __pci_write_msi_msg(entry, msg)
> {
> if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> /* Don't touch the hardware now */
> } else {
> ....
> }
> entry->msg = *msg;
> }
>
> So because the device is not in PCI_D0 the message is not written. It's
> written in the device resume path.

I'm not a PM guru, but this ordering seems fragile. If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?

Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.

> Resume:
> [ 139.670446] ACPI: Low-level resume complete
> [ 139.670541] PM: Restoring platform NVS memory
> [ 139.672462] do_IRQ: 0.55 No irq handler for vector
> [ 139.672475] Enabling non-boot CPUs ...
>
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
>
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
>
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
>
> Any ideas welcome.
>
> Thanks,
>
> tglx
>

2017-12-13 16:42:11

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
>
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> >
> > Suspend:
> >
> > - The device suspend code puts the graphics card into a power
> > state != PCI_D0.
> >
> > - Offline non boot CPUs
> >
> > - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> > write MSI message which ends up in:
> >
> > __pci_write_msi_msg(entry, msg)
> > {
> > if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> > /* Don't touch the hardware now */
> > } else {
> > ....
> > }
> > entry->msg = *msg;
> > }
> >
> > So because the device is not in PCI_D0 the message is not written. It's
> > written in the device resume path.
>
> I'm not a PM guru, but this ordering seems fragile. If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
>
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.

Definitely. That has been fragile forever, but what puzzles me is that I
can't figure out what now causes that spurious interrupt to surface out of
the blue.

Thanks,

tglx

2017-12-13 17:45:53

by Linus Torvalds

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
>
> Definitely. That was fragile forever but puzzles me is that I can't figure
> out what now causes that spurious interrupt to surface out of the blue.

Perhaps just timing?

How hard would it be to change the ordering to just redirect irqs first?

Linus

2017-12-13 18:19:27

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Linus Torvalds wrote:

> On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> >
> > Definitely. That was fragile forever but puzzles me is that I can't figure
> > out what now causes that spurious interrupt to surface out of the blue.
>
> Perhaps just timing?

That's what I'm trying to figure out right now, because that is the only
sensible explanation left. The whole machinery of suspend is exactly the
same with and without the vector changes. I instrumented all functions
involved and the picture is the same. I do not even see any fundamental
timing differences where one would say: That's it.

What puzzles me even more is that in the range of commits I'm fiddling with
there is no other change than the vector management stuff and the point
where it breaks makes no sense at all. The point Maarten bisected it to
works nicely here, so that might just point to a very subtle timing issue.

> How hard would it be to change the ordering to just redirect irqs first?

The whole interrupt redirection happens when the non boot CPUs are brought
down, which is the very last step before the actual suspend happens.

We could probably do that earlier, but that's something Rafael needs to
answer ultimately.

Thanks,

tglx

2017-12-13 20:52:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > >
> > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > figure out what now causes that spurious interrupt to surface out of the blue.
> >
> > Perhaps just timing?
>
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
>
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.

After doing more debugging on this it turns out that this looks like a
legacy interrupt coming in. The vector number is always 55, which is legacy
IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
masked and vector 55 is completely unused.
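
The vector-to-IRQ arithmetic can be checked with a small user-space sketch.
It assumes the x86 definitions of FIRST_EXTERNAL_VECTOR and ISA_IRQ_VECTOR()
from arch/x86/include/asm/irq_vectors.h; the two helper functions are
hypothetical names used only for illustration:

```c
#include <assert.h>

/* Assumed to match arch/x86/include/asm/irq_vectors.h. */
#define FIRST_EXTERNAL_VECTOR	0x20
#define ISA_IRQ_VECTOR(irq)	(((FIRST_EXTERNAL_VECTOR + 16) & ~15) + (irq))

/* Hypothetical helper: vector used for legacy (PIC) IRQ 0..15. */
static int legacy_irq_to_vector(int irq)
{
	assert(irq >= 0 && irq < 16);
	return ISA_IRQ_VECTOR(irq);
}

/* Hypothetical helper: legacy IRQ behind a vector, or -1 if none. */
static int vector_to_legacy_irq(int vector)
{
	int base = ISA_IRQ_VECTOR(0);

	if (vector < base || vector >= base + 16)
		return -1;
	return vector - base;
}
```

Under these assumptions the legacy range starts at vector 0x30, so IRQ 7
lands on 0x37, which is 55.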

More questions than answers. Still investigating.

Thanks,

tglx

2017-12-13 21:06:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> >
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > > >
> > > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > > figure out what now causes that spurious interrupt to surface out of the blue.
> > >
> > > Perhaps just timing?
> >
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I even do not see any fundamental
> > timing differences where one would say: That's it.
> >
> > What puzzles me even more is that in the range of commits I'm fiddling with
> > there is no other change than the vector management stuff and the point
> > where it breaks makes no sense at all. The point Maarten bisected it to
> > works nicely here, so that might just point to a very subtle timing issue.
>
> After doing more debugging on this it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
>
> More questions than answers. Still investigating.

And it does not explain Maarten's report, which gets a spurious vector 33 on
CPU4 after the non boot cpus have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.

Hrmpf. Even more mystery to solve.

Thanks,

tglx

2017-12-13 22:40:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > >
> > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > figure out what now causes that spurious interrupt to surface out of the blue.
> >
> > Perhaps just timing?
>
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
>
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.
>
> > How hard would it be to change the ordering to just redirect irqs first?
>
> The whole interrupt redirection happens when the non boot CPUs are brought
> down, which is the very last step before the actual suspend happens.
>
> We could probably do that earlier, but that's something Rafael needs to
> answer ultimately.

Well, that's both flattering and concerning. ;-)

Anyway, yes, we can do that earlier AFAICS. Action handlers are not going to
run after we've called suspend_device_irqs() which happens before the final
stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
gets the interrupt from that point on (it is either wakeup or unwanted then).

Thanks,
Rafael

2017-12-13 22:49:31

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wednesday, December 13, 2017 10:06:40 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > > > >
> > > > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > > > figure out what now causes that spurious interrupt to surface out of the blue.
> > > >
> > > > Perhaps just timing?
> > >
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > >
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> >
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> >
> > More questions than answers. Still investigating.
>
> And it does not explain Maarten's report, which gets a spurious vector 33 on
> CPU4 after the non boot cpus have been brought online again. And that's the
> vector which was assigned before the affinity was moved by unplugging CPU4.
>
> Hrmpf. Even more mystery to solve.

Any chance to look at /proc/interrupts from a machine where that can be
reproduced?

I'm also curious if that can be reproduced by doing CPU offline/online
without suspending?

2017-12-13 23:26:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, Dec 13, 2017 at 11:39 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
>> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>>
>> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
>> > >
>> > > Definitely. That was fragile forever, but what puzzles me is that I can't
>> > > figure out what now causes that spurious interrupt to surface out of the blue.
>> >
>> > Perhaps just timing?
>>
>> That's what I'm trying to figure out right now, because that is the only
>> sensible explanation left. The whole machinery of suspend is exactly the
>> same with and without the vector changes. I instrumented all functions
>> involved and the picture is the same. I even do not see any fundamental
>> timing differences where one would say: That's it.
>>
>> What puzzles me even more is that in the range of commits I'm fiddling with
>> there is no other change than the vector management stuff and the point
>> where it breaks makes no sense at all. The point Maarten bisected it to
>> works nicely here, so that might just point to a very subtle timing issue.
>>
>> > How hard would it be to change the ordering to just redirect irqs first?
>>
>> The whole interrupt redirection happens when the non boot CPUs are brought
>> down, which is the very last step before the actual suspend happens.
>>
>> We could probably do that earlier, but that's something Rafael needs to
>> answer ultimately.
>
> Well, that's both flattering and concerning. ;-)
>
> Anyway, yes, we can do that earlier AFAICS. Action handlers are not going to
> run after we've called suspend_device_irqs() which happens before the final
> stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
> gets the interrupt from that point on (it is either wakeup or unwanted then).

There is a catch that we don't and likely should not do that for
suspend-to-idle, but since we have pm_suspend_target_state now, that
case can be distinguished from the "full suspend" one readily.
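
A rough user-space sketch of how that distinction might look. The constants
mirror the PM_SUSPEND_* values from include/linux/suspend.h;
want_early_irq_redirect() is a hypothetical helper, not an existing kernel
function, and the target state is passed in as a parameter rather than read
from pm_suspend_target_state:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the PM_SUSPEND_* values in include/linux/suspend.h. */
enum suspend_state_model {
	PM_SUSPEND_ON = 0,
	PM_SUSPEND_TO_IDLE = 1,
	PM_SUSPEND_STANDBY = 2,
	PM_SUSPEND_MEM = 3,
};

/*
 * Hypothetical policy: redirect interrupts to the boot CPU early only
 * for a "full" suspend, never for suspend-to-idle.
 */
static bool want_early_irq_redirect(enum suspend_state_model target)
{
	assert(target >= PM_SUSPEND_ON && target <= PM_SUSPEND_MEM);

	return target == PM_SUSPEND_STANDBY || target == PM_SUSPEND_MEM;
}
```

Whether standby should be treated like suspend-to-RAM here is an assumption
of the sketch, not something settled above.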

Thanks,
Rafael

2017-12-14 11:54:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > >
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > > > >
> > > > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > > > figure out what now causes that spurious interrupt to surface out of the blue.
> > > >
> > > > Perhaps just timing?
> > >
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > >
> > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> >
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> >
> > More questions than answers. Still investigating.

At least that one could be explained by the changes. In the previous
management scheme the IOAPIC interrupts were always allocated even when the
interrupt was not in use. The new scheme no longer does that because
people complained about the vector waste (16 vectors on each CPU) and it
got rid of all the special casing of IRQ0-15.

So the old scheme silently consumed the spurious vector. I added debug code
to that effect to 4.14 and on that machine IRQ7 is triggered at the same
point post resume and the core code drops it silently because the interrupt
is marked masked and no action assigned.

So the only difference to today is that the new code complains, while the
old one does an extra mask of the already masked IOAPIC pin and silently
returns.
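
The difference can be modelled in a few lines of user-space C. This is only
an illustration of the behaviour described above, not the kernel's actual
dispatch code; the struct, the enum and deliver() are all made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VECTORS 256

/* Simplified stand-in for an interrupt descriptor. */
struct irq_desc_model {
	bool masked;
	bool has_action;
};

enum outcome { HANDLED, DROPPED_SILENTLY, REPORTED_SPURIOUS };

/*
 * Model of one CPU's vector table: a NULL slot means the vector is
 * unused, so an interrupt arriving there is reported as spurious.
 */
static enum outcome deliver(struct irq_desc_model *table[NR_VECTORS],
			    int vector)
{
	struct irq_desc_model *desc;

	assert(vector >= 0 && vector < NR_VECTORS);
	desc = table[vector];

	if (!desc)
		return REPORTED_SPURIOUS;
	if (desc->masked || !desc->has_action)
		return DROPPED_SILENTLY;
	return HANDLED;
}
```

In this model the old scheme corresponds to a table where slot 55 always
points at a masked descriptor without an action, and the new scheme to a
table where slot 55 is empty.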

After quite some investigation I found out that it's independent of the
graphics thing. That's a genuine issue on that platform which, for unknown
reasons, seems to emit random legacy vectors which were never ever used. I
verified that both the IOAPIC and the PIC are masked, so they cannot send
crap. Though it turned out that the silly firmware unmasks the PIC and
leaves it that way when it returns from suspend. Now there is a race
whether the kernel resume path manages to mask the PIC again early enough
before something triggers IRQ7 or not. Adding/removing debug code makes the
problem come and go. So I really don't worry about that one and rather
prefer to have the spurious interrupt printed than silently consumed by
chance.

Now the graphics issue is a different story. That only happens on
hibernation after doing the snapshot. There all non boot cpus are onlined
again and after that the devices are 'thawed'. The following reenable of
interrupts fails because i915 is not in PCI_D0 state.

Suspend:

irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <- Device not in PCI_D0
....
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
pci_pm_resume_noirq <-dpm_run_callback
pci_pm_resume_noirq <-dpm_run_callback
pci_pm_default_resume_early <-pci_pm_resume_noirq
pci_pm_default_resume_early <-pci_pm_resume_noirq
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
device_pm_callback_end: i915 0000:00:02.0, err=0

Hibernate:

irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <- Device not in PCI_D0
....
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
pci_pm_thaw_noirq <-dpm_run_callback
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written <--- Device is not in PCI_D0
device_pm_callback_end: i915 0000:00:02.0, err=0

So that code path fails to set the new affinity because at the point where
the MSI msg should be written the device state is != PCI_D0.
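
A simplified user-space model of that behaviour, with the values from the
trace above. All type and function names are made up for the illustration;
the model only assumes that a write to a device which is not in D0 updates
the cached message but leaves the hardware registers alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model power states; the numeric values are not important here. */
enum pci_power_model { PCI_D0 = 0, PCI_D3hot = 3 };

struct msi_msg_model {
	uint32_t address_lo;
	uint32_t data;
};

struct pci_dev_model {
	enum pci_power_model current_state;
	struct msi_msg_model cached;	/* software copy of the message */
	struct msi_msg_model hw;	/* what the device registers hold */
};

/* Returns true if the message actually reached the hardware. */
static bool write_msi_msg(struct pci_dev_model *dev,
			  const struct msi_msg_model *msg)
{
	bool written = false;

	assert(dev && msg);

	if (dev->current_state == PCI_D0) {
		dev->hw = *msg;
		written = true;
	}
	/* The software copy is updated either way. */
	dev->cached = *msg;
	return written;
}
```

As long as the device stays out of D0, the hardware keeps targeting the old
vector even though the cached message already carries the new affinity.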

Now, what's different vs. 4.14:

The 4.14 code accidentally had the irq descriptor for this vector still
populated in the old CPU due to the convoluted way the vector allocation
worked. I still have to investigate if one of those cases is actually
leaking the descriptor, which would be a fatal bug.

But the new code does a proper cleanup and does not repopulate it on the
offline CPU. So that unearths the issue. I'm handing that over to the PM
folks to look at. I got lost in that maze of callbacks.

Thanks,

tglx

2017-12-14 12:13:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > >
> > > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <[email protected]> wrote:
> > > > > >
> > > > > > Definitely. That was fragile forever, but what puzzles me is that I can't
> > > > > > figure out what now causes that spurious interrupt to surface out of the blue.
> > > > >
> > > > > Perhaps just timing?
> > > >
> > > > That's what I'm trying to figure out right now, because that is the only
> > > > sensible explanation left. The whole machinery of suspend is exactly the
> > > > same with and without the vector changes. I instrumented all functions
> > > > involved and the picture is the same. I even do not see any fundamental
> > > > timing differences where one would say: That's it.
> > > >
> > > > What puzzles me even more is that in the range of commits I'm fiddling with
> > > > there is no other change than the vector management stuff and the point
> > > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > > works nicely here, so that might just point to a very subtle timing issue.
> > >
> > > After doing more debugging on this it turns out that this looks like a
> > > legacy interrupt coming in. The vector number is always 55, which is legacy
> > > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > > masked and vector 55 is completely unused.
> > >
> > > More questions than answers. Still investigating.
>
> At least that one could be explained by the changes. In the previous
> management scheme the IOAPIC interrupts were always allocated even when the
> interrupt was not in use. The new scheme no longer does that because
> people complained about the vector waste (16 vectors on each CPU) and it
> got rid of all the special casing of IRQ0-15.
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.
>
> After quite some investigation I found out that it's independent of the
> graphics thing. That's a genuine issue on that platform which, for unknown
> reasons, seems to emit random legacy vectors which were never ever used. I
> verified that both the IOAPIC and the PIC are masked, so they cannot send
> crap. Though it turned out that the silly firmware unmasks the PIC and
> leaves it that way when it returns from suspend. Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not. Adding/removing debug code makes the
> problem come and go. So I really don't worry about that one and rather
> prefer to have the spurious interrupt printed than silently consumed by
> chance.

OK

> Now the graphics issue is a different story. That only happens on
> hibernation after doing the snapshot. There all non boot cpus are onlined
> again and after that the devices are 'thawed'. The following reenable of
> interrupts fails because i915 is not in PCI_D0 state.
>
> Suspend:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <- Device not in PCI_D0
> ....
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> pci_pm_resume_noirq <-dpm_run_callback
> pci_pm_resume_noirq <-dpm_run_callback
> pci_pm_default_resume_early <-pci_pm_resume_noirq
> pci_pm_default_resume_early <-pci_pm_resume_noirq
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> device_pm_callback_end: i915 0000:00:02.0, err=0

So this works, because we power up the device during resume even if it
had been suspended (via runtime PM) before the suspend started.

> Hibernate:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <- Device not in PCI_D0
> ....
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> pci_pm_thaw_noirq <-dpm_run_callback
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> device_pm_callback_end: i915 0000:00:02.0, err=0

And here we try to leave the device alone which is OK for devices in D0,
but not for suspended ones.

It looks like we need to power them up at "thaw" time too, or at least
I don't see how to address that differently.

> So that code path fails to set the new affinity because at the point where
> the MSI msg should be written the device state is != PCI_D0.
>
> Now, what's different vs. 4.14:
>
> The 4.14 code accidentally had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I still have to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.
>
> But the new code does a proper cleanup and does not repopulate it on the
> offline CPU. So that unearths the issue. I'm handing that over to the PM
> folks to look at. I got lost in that maze of callbacks.

OK, thanks so much for getting to the bottom of this!

Rafael

2017-12-14 12:30:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > Now the graphics issue is a different story. That only happens on
> > hibernation after doing the snapshot. There all non boot cpus are onlined
> > again and after that the devices are 'thawed'. The following reenable of
> > interrupts fails because i915 is not in PCI_D0 state.
> >
> > Suspend:
> >
> > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > ....
> > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> > pci_pm_resume_noirq <-dpm_run_callback
> > pci_pm_resume_noirq <-dpm_run_callback
> > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> > device_pm_callback_end: i915 0000:00:02.0, err=0
>
> So this works, because we power up the device during resume even if it
> had been suspended (via runtime PM) before the suspend started.
>
> > Hibernate:
> >
> > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > ....
> > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> > pci_pm_thaw_noirq <-dpm_run_callback
> > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> > device_pm_callback_end: i915 0000:00:02.0, err=0
>
> And here we try to leave the device alone which is OK for devices in D0,
> but not for suspended ones.
>
> It looks like we need to power them up at "thaw" time too, or at least
> I don't see how to address that differently.

The question is whether the code which brings the device out of D0 should
write the message unconditionally. That would be sufficient I think.

Thanks,

tglx

2017-12-14 13:24:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 14 Dec 2017, Thomas Gleixner wrote:
> Now, what's different vs. 4.14:
>
> The 4.14 code accidentally had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I still have to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.

It doesn't leak. It repopulates it at the same place out of sheer luck.

Thanks,

tglx

2017-12-14 15:31:27

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thursday, December 14, 2017 1:30:37 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > > Now the graphics issue is a different story. That only happens on
> > > hibernation after doing the snapshot. There all non boot cpus are onlined
> > > again and after that the devices are 'thawed'. The following reenable of
> > > interrupts fails because i915 is not in PCI_D0 state.
> > >
> > > Suspend:
> > >
> > > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > > ....
> > > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [resume]
> > > pci_pm_resume_noirq <-dpm_run_callback
> > > pci_pm_resume_noirq <-dpm_run_callback
> > > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > > pci_pm_default_resume_early <-pci_pm_resume_noirq
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a <-- Set the new affinity
> > > device_pm_callback_end: i915 0000:00:02.0, err=0
> >
> > So this works, because we power up the device during resume even if it
> > had been suspended (via runtime PM) before the suspend started.
> >
> > > Hibernate:
> > >
> > > irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <- Device not in PCI_D0
> > > ....
> > > device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> > > pci_pm_thaw_noirq <-dpm_run_callback
> > > __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> > > __pci_write_msi_msg: Not written <--- Device is not in PCI_D0
> > > device_pm_callback_end: i915 0000:00:02.0, err=0
> >
> > And here we try to leave the device alone which is OK for devices in D0,
> > but not for suspended ones.
> >
> > It looks like we need to power them up at "thaw" time too, or at least
> > I don't see how to address that differently.
>
> The question is whether the code which brings the device out of D0 should
> write the message unconditionally. That would be sufficient I think.

It doesn't have to do that.

The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
in fact requires the device to be in D0, so the caller should put it into
D0 instead of trying to "update" its power state.

[Note that the PCI layer doesn't put devices into low-power states during the
hibernation's "freeze" transition, but drivers can legitimately do that in
their "freeze" callbacks which was overlooked in that code and that's what
i915 does.]

So IMO what we need is the change below. I'm going to test it shortly,
but please give it a go too.

---
drivers/pci/pci-driver.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);

-	pci_update_current_state(pci_dev, PCI_D0);
+	/*
+	 * pci_restore_state() requires the device to be in D0 (because of MSI
+	 * restoration among other things), so force it into D0 in case the
+	 * driver's "freeze" callbacks put it into a low-power state directly.
+	 */
+	pci_set_power_state(pci_dev, PCI_D0);
 	pci_restore_state(pci_dev);

 	if (drv && drv->pm && drv->pm->thaw_noirq)
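
The effect of the change can be illustrated with a small user-space model.
It is a sketch under simplifying assumptions, not the PCI core: "updating"
the state only copies what the hardware reports, "setting" it actually
moves the device, and restoring state succeeds only in D0. All names are
hypothetical stand-ins for the kernel functions they are modelled on:

```c
#include <assert.h>
#include <stdbool.h>

enum pci_power_model { PCI_D0 = 0, PCI_D3hot = 3 };

struct pci_dev_model {
	enum pci_power_model hw_state;		/* real device power state */
	enum pci_power_model current_state;	/* what software believes */
	bool msi_restored;
};

/* Model of "update": only re-reads the hardware, never changes it. */
static void update_current_state(struct pci_dev_model *dev)
{
	dev->current_state = dev->hw_state;
}

/* Model of "set": actually transitions the device. */
static void set_power_state(struct pci_dev_model *dev,
			    enum pci_power_model state)
{
	dev->hw_state = state;
	dev->current_state = state;
}

/* Model of state restoration: the MSI message is written only in D0. */
static void restore_state(struct pci_dev_model *dev)
{
	dev->msi_restored = (dev->current_state == PCI_D0);
}

/* force_d0 == false models the old thaw path, true the patched one. */
static bool thaw_noirq(struct pci_dev_model *dev, bool force_d0)
{
	assert(dev);

	if (force_d0)
		set_power_state(dev, PCI_D0);
	else
		update_current_state(dev);
	restore_state(dev);
	return dev->msi_restored;
}
```

For a device that its driver left in a low-power state during "freeze", the
old path returns without restoring the MSI message, while the patched path
brings the device into D0 first.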

2017-12-14 15:52:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> in fact requires the device to be in D0, so the caller should put it into
> D0 instead of trying to "update" its power state.
>
> [Note that the PCI layer doesn't put devices into low-power states during the
> hibernation's "freeze" transition, but drivers can legitimately do that in
> their "freeze" callbacks which was overlooked in that code and that's what
> i915 does.]
>
> So IMO what we need is the change below. I'm going to test it shortly,
> but please give it a go too.

So now this looks more reasonable:

irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
__pci_write_msi_msg: Not written
...
device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
pci_pm_thaw_noirq <-dpm_run_callback
__pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
device_pm_callback_end: i915 0000:00:02.0, err=0
...
resume_irqs: Resume 125
...
irq_handler_entry: irq=125 name=i915

Thanks,

tglx

> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_resume_early(dev);
>
> -	pci_update_current_state(pci_dev, PCI_D0);
> +	/*
> +	 * pci_restore_state() requires the device to be in D0 (because of MSI
> +	 * restoration among other things), so force it into D0 in case the
> +	 * driver's "freeze" callbacks put it into a low-power state directly.
> +	 */
> +	pci_set_power_state(pci_dev, PCI_D0);
>  	pci_restore_state(pci_dev);
>
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
>
>

2017-12-14 15:54:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> > in fact requires the device to be in D0, so the caller should put it into
> > D0 instead of trying to "update" its power state.
> >
> > [Note that the PCI layer doesn't put devices into low-power states during the
> > hibernation's "freeze" transition, but drivers can legitimately do that in
> > their "freeze" callbacks which was overlooked in that code and that's what
> > i915 does.]
> >
> > So IMO what we need is the change below. I'm going to test it shortly,
> > but please give it a go too.
>
> So now this looks more reasonable:
>
> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> __pci_write_msi_msg: Not written
> ...
> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
> pci_pm_thaw_noirq <-dpm_run_callback
> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
> device_pm_callback_end: i915 0000:00:02.0, err=0
> ...
> resume_irqs: Resume 125
> ...
> irq_handler_entry: irq=125 name=i915

Cool.

Let me respin it with a changelog etc then.

Thanks,
Rafael

2017-12-14 16:18:04

by Maarten Lankhorst

[permalink] [raw]
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On 14-12-17 at 16:54, Rafael J. Wysocki wrote:
> On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
>> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
>>> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
>>> in fact requires the device to be in D0, so the caller should put it into
>>> D0 instead of trying to "update" its power state.
>>>
>>> [Note that the PCI layer doesn't put devices into low-power states during the
>>> hibernation's "freeze" transition, but drivers can legitimately do that in
>>> their "freeze" callbacks which was overlooked in that code and that's what
>>> i915 does.]
>>>
>>> So IMO what we need is the change below. I'm going to test it shortly,
>>> but please give it a go too.
>> So now this looks more reasonable:
>>
>> irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>> __pci_write_msi_msg: Not written
>> ...
>> device_pm_callback_start: i915 0000:00:02.0, parent: pci0000:00, noirq bus [thaw]
>> pci_pm_thaw_noirq <-dpm_run_callback
>> __pci_write_msi_msg: 0000:00:02.0 00000000fee0100c 0000412a
>> device_pm_callback_end: i915 0000:00:02.0, err=0
>> ...
>> resume_irqs: Resume 125
>> ...
>> irq_handler_entry: irq=125 name=i915
> Cool.
>
> Let me respin it with a changelog etc then.
>
> Thanks,
> Rafael
>
>
The machine I was using to reproduce the bug appears to be fixed with this patch, so I have now sent
it to Intel's trybot for results.

https://patchwork.freedesktop.org/series/35367/

Thanks for looking at the bug!

~Maarten

2017-12-14 19:03:32

by Linus Torvalds

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <[email protected]> wrote:
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.

Great debugging, and it looks like Rafael has a patch that already got
positive testing.

I just wanted to pipe up about that "irq7", because judging from your
email it seems like you think it's a real irq:

> Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not.

..and that's not how the PIC works.

In fact, "legacy irq 7" is the _normal_ and very traditional spurious
interrupt, and it's documented. If the PIC gets an interrupt from
_any_ source, but the interrupt goes away before the PIC gets an
acknowledge from the CPU (and by "acknowledge", I'm not talking about
the explicit software IRQ ACK, I'm talking about the hardware
protocol, between the PIC and the CPU), the PIC will then report irq 7
as the interrupt - regardless of what the original was.

The reason is almost always something like

- CPU interrupts are disabled or masked

- driver does a write to the external hardware that causes an
interrupt to be raised

- CPU doesn't react to the irq due to the disabled/masked nature

- but the driver then does something that masks the interrupt again

- interrupts are enabled/unmasked on the CPU

- CPU now acks the interrupt, but the PIC no longer sees any
interrupt source, so the PIC (that has to reply with *something*)
replies with that documented spurious irq7.

To confuse things further, irq7 is not _exclusively_ the spurious
interrupt. You can definitely take real hardware and connect it to pin7
of the PIC, and get real irq7 reports.

And to confuse things even *more*, this "irq7" thing is per-PIC, and
the PC model obviously has the whole "nested PIC" thing where the
second PIC is connected to irq2 of the first PIC. So there are *two*
different "spurious interrupt" reports, one for each PIC.
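
As an illustration of the mechanism described above, here is a minimal
user-space sketch of a single 8259-style PIC. It is a toy model, not kernel
code: the struct and every toy_* name are made up for this example, and it
ignores edge latching, cascading and EOI handling. It only shows that the
acknowledge cycle has to return some pin, and that it falls back to pin 7
(without setting the in-service bit) when the request has already gone away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 8259-style PIC (hypothetical, heavily simplified). */
struct toy_pic {
	uint8_t lines;	/* current level of each input pin, bit n = pin n */
	uint8_t mask;	/* bit n set = pin n is masked at the PIC */
	uint8_t isr;	/* in-service bits, set only for real interrupts */
};

/* A device raises or drops its interrupt line. */
static void toy_pic_set_line(struct toy_pic *pic, unsigned int pin, bool level)
{
	assert(pin < 8);
	if (level)
		pic->lines |= (uint8_t)(1u << pin);
	else
		pic->lines &= (uint8_t)~(1u << pin);
}

/* True if the PIC would currently signal an interrupt to the CPU. */
static bool toy_pic_int_asserted(const struct toy_pic *pic)
{
	return (pic->lines & (uint8_t)~pic->mask) != 0;
}

/*
 * Hardware acknowledge cycle: the CPU asks "which interrupt?".
 * The PIC has to answer with something. If no unmasked request is
 * left, it answers with pin 7 and leaves the in-service bit clear.
 */
static unsigned int toy_pic_inta(struct toy_pic *pic)
{
	uint8_t pending = pic->lines & (uint8_t)~pic->mask;

	for (unsigned int pin = 0; pin < 8; pin++) {
		if (pending & (1u << pin)) {
			pic->isr |= (uint8_t)(1u << pin);
			return pin;	/* lowest pin number wins */
		}
	}
	return 7;	/* spurious: isr bit 7 stays clear */
}

/* Lets a handler tell a real irq7 from the spurious one. */
static bool toy_pic_irq_was_real(const struct toy_pic *pic, unsigned int pin)
{
	assert(pin < 8);
	return (pic->isr & (1u << pin)) != 0;
}
```

In this model, raising pin 3, dropping it again and then running
toy_pic_inta() reports pin 7 with no in-service bit set, while keeping pin 3
raised reports pin 3. Checking the in-service bit is what distinguishes a
device that is really wired to pin 7 from the spurious report.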

Anyway, to avoid this issue, drivers should strive to

(a) actually take the interrupt when doing things that can cause
them, and have the interrupt handler do whatever it is that causes the
interrupt to go away (ie: "normal operation")

(b) if you play games with clearing the source of the interrupt
_without_ taking the interrupt, you should strive to basically mask
the interrupt first.

So to do (b) you can do something like

mask_device_interrupt(dev);
read_from_device_to_synchronize(dev);

instead of (or perhaps _before_) disabling interrupts at a CPU level.
Suspend/resume obviously does tend to play games with these kinds of
things where you are no longer in "normal operation" and you do setup
without having interrupts actually enabled.
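
To make the two-step pattern above concrete, here is a small user-space
sketch. Everything in it is hypothetical (the toy_dev layout, the register
semantics and all toy_* names); it assumes a bus that posts writes, so the
mask write only reaches the device once a later read from the same device
forces it out. That is presumably what the "read to synchronize" step is
for, but treat it as an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device behind a bus that posts (buffers) writes. */
struct toy_dev {
	uint32_t irq_enable;	/* device register: nonzero = may raise its line */
	bool event_pending;	/* device-internal interrupt cause */
	bool write_posted;	/* a write is still sitting in the bus */
	uint32_t posted_value;	/* value of that buffered write */
};

/* Posted write: returns before the device has seen the value. */
static void toy_write_irq_enable(struct toy_dev *dev, uint32_t val)
{
	dev->posted_value = val;
	dev->write_posted = true;
}

/* A read cannot pass earlier writes, so it pushes them to the device first. */
static uint32_t toy_read_irq_enable(struct toy_dev *dev)
{
	if (dev->write_posted) {
		dev->irq_enable = dev->posted_value;
		dev->write_posted = false;
	}
	return dev->irq_enable;
}

/* Level of the interrupt line as seen by the interrupt controller. */
static bool toy_irq_line(const struct toy_dev *dev)
{
	return dev->irq_enable != 0 && dev->event_pending;
}

/* Counterpart of the mask + read-to-synchronize pair in this toy model. */
static void toy_quiesce_irq(struct toy_dev *dev)
{
	toy_write_irq_enable(dev, 0);
	(void)toy_read_irq_enable(dev);	/* flush the posted write */
}
```

With only the write, toy_irq_line() still reports the line as raised; after
toy_quiesce_irq() it is down, so the interrupt source can be cleared later
without the controller ever seeing a request that disappears.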

Or you can just decide that spurious interrupts are ok, and ignore the
issue. But they *can* be very confusing, and obviously in this case
that confusion then seems to have caused actual problems.

Linus

2017-12-14 22:36:28

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 14 Dec 2017, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <[email protected]> wrote:
> I just wanted to pipe up about that "irq7", because judging from your
> email it seems like you think it's a real irq:
>
> > Now there is a race
> > whether the kernel resume path manages to mask the PIC again early enough
> > before something triggers IRQ7 or not.
>
> ..and that's not how the PIC works.
>
> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
> interrupt, and it's documented. If the PIC gets an interrupt from
> _any_ source, but the interrupt goes away before the PIC gets an
> acknowledge from the CPU (and by "acknowledge", I'm not talking about
> the explicit software IRQ ACK, I'm talking about the hardware
> protocol, between the PIC and the CPU), the PIC will then report irq 7
> as the interrupt - regardless of what the original was.
>
> The reason is almost always something like
>
> - CPU interrupts are disabled or masked
>
> - driver does a write to the external hardware that causes an
> interrupt to be raised

Which should be a non issue because _ALL_ PIC irq lines are masked at the
PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
sports similar behaviour the PIC should not ever observe that scenario.

But, because the silly firmware comes out of suspend with all PIC lines
unmasked for whatever reason, the PIC can observe that IRQ being raised and
the CPU not handling it. So yes, I forgot about 7 being magic, but I still
think it's the firmware which causes it by unmasking the PIC irqs.

Thanks,

tglx


2017-12-14 22:47:39

by Linus Torvalds

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <[email protected]> wrote:
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

Yes, that sounds quite likely.

Linus

2017-12-15 00:34:48

by Rafael J. Wysocki

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, Dec 14, 2017 at 11:36 PM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 14 Dec 2017, Linus Torvalds wrote:
>> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner <[email protected]> wrote:
>> I just wanted to pipe up about that "irq7", because judging from your
>> email it seems like you think it's a real irq:
>>
>> > Now there is a race
>> > whether the kernel resume path manages to mask the PIC again early enough
>> > before something triggers IRQ7 or not.
>>
>> ..and that's not how the PIC works.
>>
>> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
>> interrupt, and it's documented. If the PIC gets an interrupt from
>> _any_ source, but the interrupt goes away before the PIC gets an
>> acknowledge from the CPU (and by "acknowledge", I'm not talking about
>> the explicit software IRQ ACK, I'm talking about the hardware
>> protocol, between the PIC and the CPU), the PIC will then report irq 7
>> as the interrupt - regardless of what the original was.
>>
>> The reason is almost always something like
>>
>> - CPU interrupts are disabled or masked
>>
>> - driver does a write to the external hardware that causes an
>> interrupt to be raised
>
> Which should be a non issue because _ALL_ PIC irq lines are masked at the
> PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
> sports similar behaviour the PIC should not ever observe that scenario.
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

That's my understanding too.

Thanks,
Rafael

2017-12-15 02:08:06

by Rafael J. Wysocki

Subject: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()

From: Rafael J. Wysocki <[email protected]>

It is incorrect to call pci_restore_state() for devices in low-power
states (D1-D3), as that involves the restoration of MSI setup which
requires MMIO to be operational and that is only the case in D0.

However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
callbacks put the device into a low-power state, so fix it by making
it force devices into D0 via pci_set_power_state() instead of trying
to "update" their power state which is pointless.

Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
Cc: 4.13+ <[email protected]> # 4.13+
Reported-by: Thomas Gleixner <[email protected]>
Reported-by: Maarten Lankhorst <[email protected]>
Tested-by: Thomas Gleixner <[email protected]>
Tested-by: Maarten Lankhorst <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---

The bug is not as old as I thought, actually.

Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
forever, but it started to be problematic in 4.13, when we started
to call pci_restore_state() in addition to it to fix another issue.

---
drivers/pci/pci-driver.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	pci_update_current_state(pci_dev, PCI_D0);
+	/*
+	 * pci_restore_state() requires the device to be in D0 (because of MSI
+	 * restoration among other things), so force it into D0 in case the
+	 * driver's "freeze" callbacks put it into a low-power state directly.
+	 */
+	pci_set_power_state(pci_dev, PCI_D0);
 	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)
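
For readers who want the difference between "updating" and "setting" the
power state spelled out, here is a user-space toy model. It is not the PCI
core: the toy_pci_dev struct, the two power states and all toy_* functions
are invented for this sketch, and the assumption that register writes are
simply dropped outside D0 is a simplification of the changelog above.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_pstate { TOY_D0 = 0, TOY_D3 = 3 };

/* Hypothetical, heavily simplified stand-in for a PCI device. */
struct toy_pci_dev {
	enum toy_pstate hw_state;	/* state the hardware is really in */
	enum toy_pstate cached_state;	/* state the core believes it is in */
	uint32_t msi_data;		/* device-side MSI message register */
	uint32_t saved_msi_data;	/* value saved before the freeze */
};

/* Only refresh the cached view; the device itself stays where it is. */
static void toy_update_current_state(struct toy_pci_dev *dev)
{
	dev->cached_state = dev->hw_state;
}

/* Actually move the device to the requested state. */
static void toy_set_power_state(struct toy_pci_dev *dev, enum toy_pstate state)
{
	dev->hw_state = state;
	dev->cached_state = state;
}

/* In this model, register writes only land while the device is in D0. */
static bool toy_restore_state(struct toy_pci_dev *dev)
{
	if (dev->hw_state != TOY_D0)
		return false;	/* MSI message is not written */
	dev->msi_data = dev->saved_msi_data;
	return true;
}
```

If a "freeze" callback left the toy device in TOY_D3, calling
toy_update_current_state() followed by toy_restore_state() leaves msi_data
untouched, while toy_set_power_state(dev, TOY_D0) followed by
toy_restore_state() restores it. That is the ordering the patch enforces.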

2017-12-15 09:05:40

by Thomas Gleixner

Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3

On Thu, 14 Dec 2017, Linus Torvalds wrote:

> On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner <[email protected]> wrote:
> >
> > But, because the silly firmware comes out of suspend with all PIC lines
> > unmasked for whatever reason, the PIC can observe that IRQ being raised and
> > the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> > think it's the firmware which causes it by unmasking the PIC irqs.
>
> Yes, that sounds quite likely.

And just for the record I was able to figure out which interrupt comes in
and goes away again. It's the only level triggered interrupt, which is the
ACPI interrupt.....

Thanks,

tglx

2017-12-15 14:28:15

by Rafael J. Wysocki

Subject: Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()

On Fri, Dec 15, 2017 at 3:07 AM, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
>
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
>
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <[email protected]> # 4.13+
> Reported-by: Thomas Gleixner <[email protected]>
> Reported-by: Maarten Lankhorst <[email protected]>
> Tested-by: Thomas Gleixner <[email protected]>
> Tested-by: Maarten Lankhorst <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> The bug is not as old as I thought, actually.
>
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.

Bjorn, any concerns about this one?

> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> + * pci_restore_state() requires the device to be in D0 (because of MSI
> + * restoration among other things), so force it into D0 in case the
> + * driver's "freeze" callbacks put it into a low-power state directly.
> + */
> + pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
>

2017-12-15 18:31:03

by Bjorn Helgaas

Subject: Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()

On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> It is incorrect to call pci_restore_state() for devices in low-power
> states (D1-D3), as that involves the restoration of MSI setup which
> requires MMIO to be operational and that is only the case in D0.
>
> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
> callbacks put the device into a low-power state, so fix it by making
> it force devices into D0 via pci_set_power_state() instead of trying
> to "update" their power state which is pointless.
>
> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
> Cc: 4.13+ <[email protected]> # 4.13+
> Reported-by: Thomas Gleixner <[email protected]>
> Reported-by: Maarten Lankhorst <[email protected]>
> Tested-by: Thomas Gleixner <[email protected]>
> Tested-by: Maarten Lankhorst <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

Acked-by: Bjorn Helgaas <[email protected]>

Let me know if you want me to take this. I don't have anything
currently queued up that touches pci-driver.c, so I'm happy if you
take it yourself.

> ---
>
> The bug is not as old as I thought, actually.
>
> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
> forever, but it started to be problematic in 4.13, when we started
> to call pci_restore_state() in addition to it to fix another issue.
>
> ---
> drivers/pci/pci-driver.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_resume_early(dev);
>
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> + * pci_restore_state() requires the device to be in D0 (because of MSI
> + * restoration among other things), so force it into D0 in case the
> + * driver's "freeze" callbacks put it into a low-power state directly.
> + */
> + pci_set_power_state(pci_dev, PCI_D0);
> pci_restore_state(pci_dev);
>
> if (drv && drv->pm && drv->pm->thaw_noirq)
>

2017-12-15 23:44:06

by Rafael J. Wysocki

Subject: Re: [PATCH] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()

On Fri, Dec 15, 2017 at 7:30 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, Dec 15, 2017 at 03:07:18AM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <[email protected]>
>>
>> It is incorrect to call pci_restore_state() for devices in low-power
>> states (D1-D3), as that involves the restoration of MSI setup which
>> requires MMIO to be operational and that is only the case in D0.
>>
>> However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
>> callbacks put the device into a low-power state, so fix it by making
>> it force devices into D0 via pci_set_power_state() instead of trying
>> to "update" their power state which is pointless.
>>
>> Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
>> Cc: 4.13+ <[email protected]> # 4.13+
>> Reported-by: Thomas Gleixner <[email protected]>
>> Reported-by: Maarten Lankhorst <[email protected]>
>> Tested-by: Thomas Gleixner <[email protected]>
>> Tested-by: Maarten Lankhorst <[email protected]>
>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> Acked-by: Bjorn Helgaas <[email protected]>
>
> Let me know if you want me to take this. I don't have anything
> currently queued up that touches pci-driver.c, so I'm happy if you
> take it yourself.

I will take it.

Depending on what Yu finds, we may need an additional fix to make the
Purley system work.

>> ---
>>
>> The bug is not as old as I thought, actually.
>>
>> Yes, we did the pci_update_current_state() in pci_pm_thaw_noirq()
>> forever, but it started to be problematic in 4.13, when we started
>> to call pci_restore_state() in addition to it to fix another issue.
>>
>> ---
>> drivers/pci/pci-driver.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/pci/pci-driver.c
>> ===================================================================
>> --- linux-pm.orig/drivers/pci/pci-driver.c
>> +++ linux-pm/drivers/pci/pci-driver.c
>> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>> if (pci_has_legacy_pm_support(pci_dev))
>> return pci_legacy_resume_early(dev);
>>
>> - pci_update_current_state(pci_dev, PCI_D0);
>> + /*
>> + * pci_restore_state() requires the device to be in D0 (because of MSI
>> + * restoration among other things), so force it into D0 in case the
>> + * driver's "freeze" callbacks put it into a low-power state directly.
>> + */
>> + pci_set_power_state(pci_dev, PCI_D0);
>> pci_restore_state(pci_dev);
>>
>> if (drv && drv->pm && drv->pm->thaw_noirq)
>>

2018-02-21 19:14:58

by Eugene Syromiatnikov

Subject: Re: Linux 4.15-rc2

On Sun, Dec 03, 2017 at 11:22:56AM -0500, Linus Torvalds wrote:
>
> Linus Torvalds (6):
> Rename superblock flags (MS_xyz -> SB_xyz)

This commit, while claiming that it changes internal flags, also touches
a UAPI header (include/uapi/linux/bfs_fs.h), specifically the macro
BFS_UNCLEAN. I expect that either this macro should be moved to a private
header or (if this check is expected to be available to userspace)
SB_RDONLY should at least also be defined there.
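
A rough sketch of the second option, for illustration only. The structs and
the TOY_BFS_UNCLEAN macro below are simplified stand-ins modelled loosely on
the real check, not copies of the header, and the fallback definition is an
assumption about how such a header could stay usable from userspace. The
value 1 is the same as the long-standing MS_RDONLY mount flag.

```c
#include <stdint.h>

/*
 * Assumption of this sketch: userspace does not see the kernel-internal
 * SB_* flags, so the header provides a fallback for the one flag it uses.
 */
#ifndef SB_RDONLY
#define SB_RDONLY 1
#endif

/* Simplified stand-ins for the on-disk and in-memory superblocks. */
struct toy_bfs_super {
	uint32_t s_from;	/* start of an in-progress block move, or all ones */
	uint32_t s_to;		/* destination of that move, or all ones */
};

struct toy_super_block {
	unsigned long s_flags;
};

/* "Unclean": a move was started, never finished, and the fs is writable. */
#define TOY_BFS_UNCLEAN(bfs_sb, sb) \
	(((bfs_sb)->s_from != UINT32_MAX) && ((bfs_sb)->s_to == UINT32_MAX) && \
	 !((sb)->s_flags & SB_RDONLY))
```

Without the fallback, any userspace program that expands the macro fails to
compile because SB_RDONLY is undeclared, which is the problem described
above.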