2016-12-26 23:09:30

by Sedat Dilek

Subject: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

[ Add some pm | i915 | x86 folks ]

Hi,

I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
and I see some call-traces.
It is reproducible on suspend and resume.

I cannot say which area touches the problem or if these are several
independent problems.

For a full dmesg-log see attachments (my linux-config is attached, too).

Here are some hunks...

[ 29.003601] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1032
[ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
[ 29.003610] 1 lock held by Xorg/1469:
[ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
[<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
[ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
4.10.0-rc1-1-iniza-small #1
[ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 29.003656] Call Trace:
[ 29.003663] dump_stack+0x85/0xc2
[ 29.003666] ___might_sleep+0x196/0x260
[ 29.003668] __might_sleep+0x53/0xb0
[ 29.003671] __pm_runtime_resume+0x7a/0x90
[ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
[ 29.003711] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
[ 29.003733] i915_vma_bind+0xaf/0x1e0 [i915]
[ 29.003752] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
[ 29.003755] ? find_get_entry+0x5/0x240
[ 29.003774] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
[ 29.003796] ? __i915_vma_do_pin+0x334/0x590 [i915]
[ 29.003815] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
[ 29.003833] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
[ 29.003851] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
[ 29.003854] ? __might_fault+0x4e/0xb0
[ 29.003872] i915_gem_execbuffer2+0xc5/0x260 [i915]
[ 29.003873] ? __might_fault+0x4e/0xb0
[ 29.003888] drm_ioctl+0x206/0x450 [drm]
[ 29.003913] ? i915_gem_execbuffer+0x340/0x340 [i915]
[ 29.003918] ? __fget+0x5/0x200
[ 29.003922] do_vfs_ioctl+0x91/0x6f0
[ 29.003925] ? __fget+0x111/0x200
[ 29.003926] ? __fget+0x5/0x200
[ 29.003929] SyS_ioctl+0x79/0x90
[ 29.003934] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 29.003936] RIP: 0033:0x7fb9e09e7bb7
[ 29.003938] RSP: 002b:00007ffe2dba2ea8 EFLAGS: 00003202 ORIG_RAX:
0000000000000010
[ 29.003941] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb9e09e7bb7
[ 29.003942] RDX: 00007ffe2dba2fa8 RSI: 0000000040406469 RDI: 0000000000000009
[ 29.003944] RBP: 00007ffe2dba2dc0 R08: 0000000000000040 R09: 0101010101010101
[ 29.003945] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
[ 29.003947] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000

[ 2846.098249] ------------[ cut here ]------------
[ 2846.098259] WARNING: CPU: 1 PID: 13 at fs/sysfs/group.c:237
sysfs_remove_group+0x8e/0x90
[ 2846.098260] sysfs group 'thermal_throttle' not found for kobject 'cpu1'
[ 2846.098261] Modules linked in: ccm arc4 iwldvm mac80211 i915
snd_hda_codec_hdmi bnep snd_hda_codec_realtek snd_hda_codec_generic
uvcvideo snd_hda_intel snd_hda_codec rfcomm snd_hwdep
videobuf2_vmalloc snd_hda_core i2c_algo_bit videobuf2_memops kvm_intel
drm_kms_helper joydev snd_pcm videobuf2_v4l2 iwlwifi videobuf2_core
syscopyarea kvm snd_seq_midi videodev sysfillrect snd_seq_midi_event
btusb sysimgblt snd_rawmidi btrtl fb_sys_fops snd_seq btbcm drm
snd_timer btintel irqbypass snd_seq_device cfg80211 psmouse bluetooth
snd parport_pc soundcore serio_raw ppdev samsung_laptop lpc_ich wmi
video mac_hid acpi_cpufreq lp parport binfmt_misc hid_generic usbhid
hid r8169 mii
[ 2846.098325] CPU: 1 PID: 13 Comm: cpuhp/1 Tainted: G W
4.10.0-rc1-1-iniza-small #1
[ 2846.098327] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2846.098329] Call Trace:
[ 2846.098336] dump_stack+0x85/0xc2
[ 2846.098342] __warn+0xd1/0xf0
[ 2846.098347] ? thresh_event_valid+0x80/0x80
[ 2846.098351] warn_slowpath_fmt+0x4f/0x60
[ 2846.098355] ? mutex_unlock+0x12/0x20
[ 2846.098358] ? kernfs_find_and_get_ns+0x4a/0x60
[ 2846.098361] sysfs_remove_group+0x8e/0x90
[ 2846.098366] thermal_throttle_offline+0x1e/0x30
[ 2846.098370] cpuhp_invoke_callback+0x1ff/0x810
[ 2846.098374] ? finish_task_switch+0x6d/0x280
[ 2846.098379] ? virtnet_clean_affinity.isra.30.part.31+0xa0/0xa0
[ 2846.098383] cpuhp_down_callbacks+0x42/0x80
[ 2846.098387] cpuhp_thread_fun+0x93/0x100
[ 2846.098389] smpboot_thread_fn+0x17e/0x260
[ 2846.098394] kthread+0x128/0x160
[ 2846.098396] ? sort_range+0x30/0x30
[ 2846.098399] ? kthread_create_on_node+0x40/0x40
[ 2846.098402] ? kthread_create_on_node+0x40/0x40
[ 2846.098405] ret_from_fork+0x2a/0x40
[ 2846.098409] ---[ end trace 93de6a721f4e1f64 ]---

[ 2846.471779] ------------[ cut here ]------------
[ 2846.471799] WARNING: CPU: 3 PID: 3475 at
drivers/gpu/drm/drm_irq.c:1237 drm_wait_one_vblank+0x14f/0x1a0 [drm]
[ 2846.471800] vblank wait timed out on crtc 0
[ 2846.471801] Modules linked in: ccm arc4 iwldvm mac80211 i915
snd_hda_codec_hdmi bnep snd_hda_codec_realtek snd_hda_codec_generic
uvcvideo snd_hda_intel snd_hda_codec rfcomm snd_hwdep
videobuf2_vmalloc snd_hda_core i2c_algo_bit videobuf2_memops kvm_intel
drm_kms_helper joydev snd_pcm videobuf2_v4l2 iwlwifi videobuf2_core
syscopyarea kvm snd_seq_midi videodev sysfillrect snd_seq_midi_event
btusb sysimgblt snd_rawmidi btrtl fb_sys_fops snd_seq btbcm drm
snd_timer btintel irqbypass snd_seq_device cfg80211 psmouse bluetooth
snd parport_pc soundcore serio_raw ppdev samsung_laptop lpc_ich wmi
video mac_hid acpi_cpufreq lp parport binfmt_misc hid_generic usbhid
hid r8169 mii
[ 2846.471852] CPU: 3 PID: 3475 Comm: kworker/u16:55 Tainted: G
W 4.10.0-rc1-1-iniza-small #1
[ 2846.471853] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2846.471857] Workqueue: events_unbound async_run_entry_fn
[ 2846.471858] Call Trace:
[ 2846.471864] dump_stack+0x85/0xc2
[ 2846.471868] __warn+0xd1/0xf0
[ 2846.471870] warn_slowpath_fmt+0x4f/0x60
[ 2846.471873] ? finish_wait+0x6a/0x80
[ 2846.471884] drm_wait_one_vblank+0x14f/0x1a0 [drm]
[ 2846.471886] ? wake_up_atomic_t+0x30/0x30
[ 2846.471922] ironlake_crtc_enable+0x782/0xc10 [i915]
[ 2846.471950] intel_update_crtc+0x5a/0x100 [i915]
[ 2846.471975] intel_update_crtcs+0x63/0x80 [i915]
[ 2846.471999] intel_atomic_commit_tail+0x346/0x1da0 [i915]
[ 2846.472002] ? mark_held_locks+0x6d/0x90
[ 2846.472005] ? _raw_spin_unlock_irqrestore+0x36/0x60
[ 2846.472029] intel_atomic_commit+0x48e/0x570 [i915]
[ 2846.472044] drm_atomic_commit+0x4b/0x50 [drm]
[ 2846.472070] __intel_display_resume+0x6a/0xb0 [i915]
[ 2846.472094] intel_display_resume+0xe3/0x110 [i915]
[ 2846.472098] ? pci_pm_thaw+0x90/0x90
[ 2846.472116] i915_drm_resume+0xe0/0x180 [i915]
[ 2846.472134] i915_pm_resume+0x20/0x30 [i915]
[ 2846.472137] pci_pm_resume+0x64/0xa0
[ 2846.472139] dpm_run_callback+0x95/0x2a0
[ 2846.472142] device_resume+0x87/0x190
[ 2846.472144] async_resume+0x1d/0x50
[ 2846.472145] async_run_entry_fn+0x37/0xe0
[ 2846.472148] process_one_work+0x1d3/0x690
[ 2846.472151] ? process_one_work+0x154/0x690
[ 2846.472154] worker_thread+0x69/0x4c0
[ 2846.472156] kthread+0x128/0x160
[ 2846.472159] ? process_one_work+0x690/0x690
[ 2846.472161] ? kthread_create_on_node+0x40/0x40
[ 2846.472163] ret_from_fork+0x2a/0x40
[ 2846.472166] ---[ end trace 93de6a721f4e1f67 ]---

[ 2846.527767] ------------[ cut here ]------------
[ 2846.527801] WARNING: CPU: 3 PID: 3475 at
drivers/gpu/drm/i915/intel_display.c:14182
intel_atomic_commit_tail+0x1d2e/0x1da0 [i915]
[ 2846.527803] pipe A vblank wait timed out
[ 2846.527803] Modules linked in: ccm arc4 iwldvm mac80211 i915
snd_hda_codec_hdmi bnep snd_hda_codec_realtek snd_hda_codec_generic
uvcvideo snd_hda_intel snd_hda_codec rfcomm snd_hwdep
videobuf2_vmalloc snd_hda_core i2c_algo_bit videobuf2_memops kvm_intel
drm_kms_helper joydev snd_pcm videobuf2_v4l2 iwlwifi videobuf2_core
syscopyarea kvm snd_seq_midi videodev sysfillrect snd_seq_midi_event
btusb sysimgblt snd_rawmidi btrtl fb_sys_fops snd_seq btbcm drm
snd_timer btintel irqbypass snd_seq_device cfg80211 psmouse bluetooth
snd parport_pc soundcore serio_raw ppdev samsung_laptop lpc_ich wmi
video mac_hid acpi_cpufreq lp parport binfmt_misc hid_generic usbhid
hid r8169 mii
[ 2846.527848] CPU: 3 PID: 3475 Comm: kworker/u16:55 Tainted: G
W 4.10.0-rc1-1-iniza-small #1
[ 2846.527849] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 2846.527852] Workqueue: events_unbound async_run_entry_fn
[ 2846.527853] Call Trace:
[ 2846.527857] dump_stack+0x85/0xc2
[ 2846.527860] __warn+0xd1/0xf0
[ 2846.527862] warn_slowpath_fmt+0x4f/0x60
[ 2846.527865] ? finish_wait+0x6a/0x80
[ 2846.527893] intel_atomic_commit_tail+0x1d2e/0x1da0 [i915]
[ 2846.527896] ? mark_held_locks+0x6d/0x90
[ 2846.527898] ? _raw_spin_unlock_irqrestore+0x36/0x60
[ 2846.527901] ? wake_up_atomic_t+0x30/0x30
[ 2846.527926] intel_atomic_commit+0x48e/0x570 [i915]
[ 2846.527941] drm_atomic_commit+0x4b/0x50 [drm]
[ 2846.527968] __intel_display_resume+0x6a/0xb0 [i915]
[ 2846.527993] intel_display_resume+0xe3/0x110 [i915]
[ 2846.527996] ? pci_pm_thaw+0x90/0x90
[ 2846.528014] i915_drm_resume+0xe0/0x180 [i915]
[ 2846.528032] i915_pm_resume+0x20/0x30 [i915]
[ 2846.528035] pci_pm_resume+0x64/0xa0
[ 2846.528037] dpm_run_callback+0x95/0x2a0
[ 2846.528039] device_resume+0x87/0x190
[ 2846.528041] async_resume+0x1d/0x50
[ 2846.528043] async_run_entry_fn+0x37/0xe0
[ 2846.528046] process_one_work+0x1d3/0x690
[ 2846.528048] ? process_one_work+0x154/0x690
[ 2846.528051] worker_thread+0x69/0x4c0
[ 2846.528053] kthread+0x128/0x160
[ 2846.528056] ? process_one_work+0x690/0x690
[ 2846.528058] ? kthread_create_on_node+0x40/0x40
[ 2846.528060] ret_from_fork+0x2a/0x40
[ 2846.528063] ---[ end trace 93de6a721f4e1f68 ]---

If you need further information, please let me know.

Hope this helps to narrow down the issue(s).

Regards,
- Sedat -


Attachments:
dmesg_4.10.0-rc1-1-iniza-small_after-suspend-resume.txt (78.88 kB)
config-4.10.0-rc1-1-iniza-small (136.62 kB)
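In call traces like the ones in this report, a frame prefixed with `?` is an address that was found on the stack but that the unwinder could not confirm as part of the call chain. The following is a minimal Python sketch for filtering such a trace down to the confirmed frames. It is not part of the original report: the name `reliable_frames` and the regular expression are assumptions chosen for illustration, and the pattern only covers the line layout seen in the excerpts above.

```python
import re

# Matches one frame line of a kernel call trace as it appears in dmesg, e.g.
#   [ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
#   [ 29.003755] ? find_get_entry+0x5/0x240
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"             # dmesg timestamp
    r"(?P<guess>\?\s+)?"                # optional '?' marker (unconfirmed frame)
    r"(?P<symbol>[\w.]+)"               # function name, may contain '.isra.N'
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"        # offset/size
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"  # optional module name, e.g. [i915]
)


def reliable_frames(lines):
    """Return (symbol, module) pairs for frames that are not marked with '?'.

    `module` is None for frames that belong to the core kernel image.
    Lines that are not frame lines (headers, register dumps) are ignored.
    """
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match and not match.group("guess"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames
```

Feeding it the lines of the first trace would yield a list starting with `dump_stack`, `___might_sleep`, `__might_sleep`, `__pm_runtime_resume`, and `intel_runtime_pm_get` (module `i915`), which makes it easier to compare the traces from different boots.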

2016-12-27 07:42:57

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 12:09 AM, Sedat Dilek <[email protected]> wrote:
> [ Add some pm | i915 | x86 folks ]
>
> Hi,
>
> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
> and I see some call-traces.
> It is reproducible on suspend and resume.
>
> I cannot say which area touches the problem or if these are several
> independent problems.
>
> For a full dmesg-log see attachments (my linux-config is attached, too).
>
> Here some hunks...
>
[...]

[ cpu/hotplug ]

I got the tglx brainfart patch and it fixes the cpu/hotplug call-trace.

[ pm? | i915? ]

I see this call-trace twice.

After booting and starting into Xorg/unity and...

[ 29.772847] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1032
[ 29.772853] in_atomic(): 1, irqs_disabled(): 0, pid: 1480, name: Xorg
[ 29.772856] 1 lock held by Xorg/1480:
[ 29.772857] #0: (&dev->struct_mutex){+.+.+.}, at:
[<ffffffffa0571c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
[ 29.772898] CPU: 2 PID: 1480 Comm: Xorg Not tainted
4.10.0-rc1-2-iniza-small #1
[ 29.772899] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 29.772900] Call Trace:
[ 29.772907] dump_stack+0x85/0xc2
[ 29.772910] ___might_sleep+0x196/0x260
[ 29.772912] __might_sleep+0x53/0xb0
[ 29.772915] __pm_runtime_resume+0x7a/0x90
[ 29.772934] intel_runtime_pm_get+0x25/0x90 [i915]
[ 29.772954] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
[ 29.772976] i915_vma_bind+0xaf/0x1e0 [i915]
[ 29.772995] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
[ 29.772997] ? find_get_entry+0x5/0x240
[ 29.773016] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
[ 29.773038] ? __i915_vma_do_pin+0x334/0x590 [i915]
[ 29.773056] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
[ 29.773075] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
[ 29.773101] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
[ 29.773105] ? __might_fault+0x4e/0xb0
[ 29.773132] i915_gem_execbuffer2+0xc5/0x260 [i915]
[ 29.773135] ? __might_fault+0x4e/0xb0
[ 29.773155] drm_ioctl+0x206/0x450 [drm]
[ 29.773182] ? i915_gem_execbuffer+0x340/0x340 [i915]
[ 29.773187] ? __fget+0x5/0x200
[ 29.773191] do_vfs_ioctl+0x91/0x6f0
[ 29.773193] ? __fget+0x111/0x200
[ 29.773195] ? __fget+0x5/0x200
[ 29.773198] SyS_ioctl+0x79/0x90
[ 29.773203] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 29.773205] RIP: 0033:0x7fc6b8986bb7
[ 29.773207] RSP: 002b:00007ffc1133a418 EFLAGS: 00003202 ORIG_RAX:
0000000000000010
[ 29.773210] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007fc6b8986bb7
[ 29.773212] RDX: 00007ffc1133a518 RSI: 0000000040406469 RDI: 0000000000000009
[ 29.773213] RBP: 00007ffc1133a330 R08: 0000000000000040 R09: 0101010101010101
[ 29.773215] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
[ 29.773216] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000

After suspend/resume...

[ 153.131294] PM: resume of devices complete after 783.799 msecs
[ 153.133556] Restarting tasks ... done.
[ 154.668836] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1032
[ 154.668848] in_atomic(): 1, irqs_disabled(): 0, pid: 1480, name: Xorg
[ 154.668854] 1 lock held by Xorg/1480:
[ 154.668856] #0: (&dev->struct_mutex){+.+.+.}, at:
[<ffffffffa0571c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
[ 154.668939] CPU: 2 PID: 1480 Comm: Xorg Tainted: G W
4.10.0-rc1-2-iniza-small #1
[ 154.668942] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 154.668945] Call Trace:
[ 154.668958] dump_stack+0x85/0xc2
[ 154.668965] ___might_sleep+0x196/0x260
[ 154.668970] __might_sleep+0x53/0xb0
[ 154.668976] __pm_runtime_resume+0x7a/0x90
[ 154.669025] intel_runtime_pm_get+0x25/0x90 [i915]
[ 154.669073] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
[ 154.669127] i915_vma_bind+0xaf/0x1e0 [i915]
[ 154.669175] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
[ 154.669181] ? free_hot_cold_page+0x1c1/0x390
[ 154.669227] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
[ 154.669275] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
[ 154.669318] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
[ 154.669362] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
[ 154.669368] ? __might_fault+0x4e/0xb0
[ 154.669412] i915_gem_execbuffer2+0xc5/0x260 [i915]
[ 154.669415] ? __might_fault+0x4e/0xb0
[ 154.669443] drm_ioctl+0x206/0x450 [drm]
[ 154.669486] ? i915_gem_execbuffer+0x340/0x340 [i915]
[ 154.669493] ? __fget+0x5/0x200
[ 154.669498] do_vfs_ioctl+0x91/0x6f0
[ 154.669502] ? __fget+0x111/0x200
[ 154.669505] ? __fget+0x5/0x200
[ 154.669510] SyS_ioctl+0x79/0x90
[ 154.669518] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 154.669522] RIP: 0033:0x7fc6b8986bb7
[ 154.669525] RSP: 002b:00007ffc1133a418 EFLAGS: 00003202 ORIG_RAX:
0000000000000010
[ 154.669530] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc6b8986bb7
[ 154.669532] RDX: 00007ffc1133a518 RSI: 0000000040406469 RDI: 0000000000000009
[ 154.669535] RBP: 00007ffc1133a680 R08: 0000000000000040 R09: 0101010101010101
[ 154.669537] R10: 0025ebd590960000 R11: 0000000000003202 R12: 000055b0264c9260
[ 154.669540] R13: 000055b027899350 R14: 0000000000000000 R15: 00007ffc1133a880

Again, I am attaching my dmesg-log, my linux-config and the patches I applied
on top of Linux v4.10-rc1.

Thanks.

- Sedat -


Attachments:
dmesg_4.10.0-rc1-2-iniza-small_after-suspend-resume.txt (67.47 kB)
4.10.0-rc1-2-iniza-small.patch (9.08 kB)
config-4.10.0-rc1-2-iniza-small (136.62 kB)

2016-12-27 10:24:58

by Chris Wilson

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 12:09:22AM +0100, Sedat Dilek wrote:
> [ Add some pm | i915 | x86 folks ]
>
> Hi,
>
> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
> and I see some call-traces.
> It is reproducible on suspend and resume.
>
> I cannot say which area touches the problem or if these are several
> independent problems.
>
> For a full dmesg-log see attachments (my linux-config is attached, too).
>
> Here some hunks...
>
> [ 29.003601] BUG: sleeping function called from invalid context at
> drivers/base/power/runtime.c:1032
> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
> [ 29.003610] 1 lock held by Xorg/1469:
> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
> 4.10.0-rc1-1-iniza-small #1
> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 29.003656] Call Trace:
> [ 29.003663] dump_stack+0x85/0xc2
> [ 29.003666] ___might_sleep+0x196/0x260
> [ 29.003668] __might_sleep+0x53/0xb0
> [ 29.003671] __pm_runtime_resume+0x7a/0x90
> [ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
> [ 29.003711] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
> [ 29.003733] i915_vma_bind+0xaf/0x1e0 [i915]
> [ 29.003752] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
> [ 29.003755] ? find_get_entry+0x5/0x240
> [ 29.003774] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
> [ 29.003796] ? __i915_vma_do_pin+0x334/0x590 [i915]
> [ 29.003815] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
> [ 29.003833] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
> [ 29.003851] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
> [ 29.003854] ? __might_fault+0x4e/0xb0
> [ 29.003872] i915_gem_execbuffer2+0xc5/0x260 [i915]
> [ 29.003873] ? __might_fault+0x4e/0xb0
> [ 29.003888] drm_ioctl+0x206/0x450 [drm]
> [ 29.003913] ? i915_gem_execbuffer+0x340/0x340 [i915]
> [ 29.003918] ? __fget+0x5/0x200
> [ 29.003922] do_vfs_ioctl+0x91/0x6f0
> [ 29.003925] ? __fget+0x111/0x200
> [ 29.003926] ? __fget+0x5/0x200
> [ 29.003929] SyS_ioctl+0x79/0x90
> [ 29.003934] entry_SYSCALL_64_fastpath+0x23/0xc6
> [ 29.003936] RIP: 0033:0x7fb9e09e7bb7
> [ 29.003938] RSP: 002b:00007ffe2dba2ea8 EFLAGS: 00003202 ORIG_RAX:
> 0000000000000010
> [ 29.003941] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb9e09e7bb7
> [ 29.003942] RDX: 00007ffe2dba2fa8 RSI: 0000000040406469 RDI: 0000000000000009
> [ 29.003944] RBP: 00007ffe2dba2dc0 R08: 0000000000000040 R09: 0101010101010101
> [ 29.003945] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
> [ 29.003947] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000

This should be independent of suspend/resume and should be fixed with
https://patchwork.freedesktop.org/patch/116373/

commit ebc0808fa2da0548a78e715858024cb81cd732bc
Author: Chris Wilson <[email protected]>
Date: Tue Oct 18 13:02:51 2016 +0100

drm/i915: Restrict pagefault disabling to just around copy_from_user()

It is in drm-intel-next-fixes, so it should be picked up for 4.10 in due
course.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
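
The "sleeping function called from invalid context" report includes a status line that says why the context was considered invalid. In the trace quoted above, `in_atomic()` is 1 and `irqs_disabled()` is 0, so the task was in an atomic section with interrupts still enabled. A minimal Python sketch for pulling those fields out of a dmesg line follows. It is an illustration only: the name `parse_might_sleep_header` and the regular expression are assumptions, based solely on the line layout shown in this thread.

```python
import re

# Status line printed as part of the report, e.g.
#   [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
HEADER_RE = re.compile(
    r"in_atomic\(\):\s*(?P<in_atomic>\d+),\s*"
    r"irqs_disabled\(\):\s*(?P<irqs_disabled>\d+),\s*"
    r"pid:\s*(?P<pid>\d+),\s*"
    r"name:\s*(?P<name>\S+)"
)


def parse_might_sleep_header(line):
    """Parse the status line of a 'sleeping function' report.

    Returns a dict with boolean 'in_atomic' and 'irqs_disabled', integer
    'pid' and string 'name', or None if the line does not match.
    """
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "in_atomic": int(match.group("in_atomic")) != 0,
        "irqs_disabled": int(match.group("irqs_disabled")) != 0,
        "pid": int(match.group("pid")),
        "name": match.group("name"),
    }
```

Applied to the reports in this thread, both the boot-time instance and the post-resume instance parse to the same combination (atomic, interrupts enabled, process `Xorg`), which is consistent with the remark that the problem is independent of suspend/resume.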

2016-12-27 15:10:09

by Daniel Vetter

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 10:24:42AM +0000, Chris Wilson wrote:
> On Tue, Dec 27, 2016 at 12:09:22AM +0100, Sedat Dilek wrote:
> > [ Add some pm | i915 | x86 folks ]
> >
> > Hi,
> >
> > I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
> > and I see some call-traces.
> > It is reproducible on suspend and resume.
> >
> > I cannot say which area touches the problem or if these are several
> > independent problems.
> >
> > For a full dmesg-log see attachments (my linux-config is attached, too).
> >
> > Here some hunks...
> >
> > [ 29.003601] BUG: sleeping function called from invalid context at
> > drivers/base/power/runtime.c:1032
> > [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
> > [ 29.003610] 1 lock held by Xorg/1469:
> > [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
> > [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
> > [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
> > 4.10.0-rc1-1-iniza-small #1
> > [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> > [ 29.003656] Call Trace:
> > [ 29.003663] dump_stack+0x85/0xc2
> > [ 29.003666] ___might_sleep+0x196/0x260
> > [ 29.003668] __might_sleep+0x53/0xb0
> > [ 29.003671] __pm_runtime_resume+0x7a/0x90
> > [ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
> > [ 29.003711] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
> > [ 29.003733] i915_vma_bind+0xaf/0x1e0 [i915]
> > [ 29.003752] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
> > [ 29.003755] ? find_get_entry+0x5/0x240
> > [ 29.003774] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
> > [ 29.003796] ? __i915_vma_do_pin+0x334/0x590 [i915]
> > [ 29.003815] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
> > [ 29.003833] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
> > [ 29.003851] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
> > [ 29.003854] ? __might_fault+0x4e/0xb0
> > [ 29.003872] i915_gem_execbuffer2+0xc5/0x260 [i915]
> > [ 29.003873] ? __might_fault+0x4e/0xb0
> > [ 29.003888] drm_ioctl+0x206/0x450 [drm]
> > [ 29.003913] ? i915_gem_execbuffer+0x340/0x340 [i915]
> > [ 29.003918] ? __fget+0x5/0x200
> > [ 29.003922] do_vfs_ioctl+0x91/0x6f0
> > [ 29.003925] ? __fget+0x111/0x200
> > [ 29.003926] ? __fget+0x5/0x200
> > [ 29.003929] SyS_ioctl+0x79/0x90
> > [ 29.003934] entry_SYSCALL_64_fastpath+0x23/0xc6
> > [ 29.003936] RIP: 0033:0x7fb9e09e7bb7
> > [ 29.003938] RSP: 002b:00007ffe2dba2ea8 EFLAGS: 00003202 ORIG_RAX:
> > 0000000000000010
> > [ 29.003941] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb9e09e7bb7
> > [ 29.003942] RDX: 00007ffe2dba2fa8 RSI: 0000000040406469 RDI: 0000000000000009
> > [ 29.003944] RBP: 00007ffe2dba2dc0 R08: 0000000000000040 R09: 0101010101010101
> > [ 29.003945] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
> > [ 29.003947] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000
>
> This should be independent of suspend/resume and should be fixed with
> https://patchwork.freedesktop.org/patch/116373/
>
> commit ebc0808fa2da0548a78e715858024cb81cd732bc
> Author: Chris Wilson <[email protected]>
> Date: Tue Oct 18 13:02:51 2016 +0100
>
> drm/i915: Restrict pagefault disabling to just around copy_from_user()
>
> It is in drm-intel-next-fixes, so should be picked up 4.10 in due
> course.

Also note that our CI is unhappy with -rc1, and it was not due to i915
patches. So very likely something else is also broken.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2016-12-27 15:55:41

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 4:10 PM, Daniel Vetter <[email protected]> wrote:
> On Tue, Dec 27, 2016 at 10:24:42AM +0000, Chris Wilson wrote:
>> On Tue, Dec 27, 2016 at 12:09:22AM +0100, Sedat Dilek wrote:
>> > [ Add some pm | i915 | x86 folks ]
>> >
>> > Hi,
>> >
>> > I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>> > and I see some call-traces.
>> > It is reproducible on suspend and resume.
>> >
>> > I cannot say which area touches the problem or if these are several
>> > independent problems.
>> >
>> > For a full dmesg-log see attachments (my linux-config is attached, too).
>> >
>> > Here some hunks...
>> >
>> > [ 29.003601] BUG: sleeping function called from invalid context at
>> > drivers/base/power/runtime.c:1032
>> > [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>> > [ 29.003610] 1 lock held by Xorg/1469:
>> > [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>> > [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>> > [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>> > 4.10.0-rc1-1-iniza-small #1
>> > [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> > [ 29.003656] Call Trace:
>> > [ 29.003663] dump_stack+0x85/0xc2
>> > [ 29.003666] ___might_sleep+0x196/0x260
>> > [ 29.003668] __might_sleep+0x53/0xb0
>> > [ 29.003671] __pm_runtime_resume+0x7a/0x90
>> > [ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
>> > [ 29.003711] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
>> > [ 29.003733] i915_vma_bind+0xaf/0x1e0 [i915]
>> > [ 29.003752] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
>> > [ 29.003755] ? find_get_entry+0x5/0x240
>> > [ 29.003774] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
>> > [ 29.003796] ? __i915_vma_do_pin+0x334/0x590 [i915]
>> > [ 29.003815] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
>> > [ 29.003833] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
>> > [ 29.003851] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
>> > [ 29.003854] ? __might_fault+0x4e/0xb0
>> > [ 29.003872] i915_gem_execbuffer2+0xc5/0x260 [i915]
>> > [ 29.003873] ? __might_fault+0x4e/0xb0
>> > [ 29.003888] drm_ioctl+0x206/0x450 [drm]
>> > [ 29.003913] ? i915_gem_execbuffer+0x340/0x340 [i915]
>> > [ 29.003918] ? __fget+0x5/0x200
>> > [ 29.003922] do_vfs_ioctl+0x91/0x6f0
>> > [ 29.003925] ? __fget+0x111/0x200
>> > [ 29.003926] ? __fget+0x5/0x200
>> > [ 29.003929] SyS_ioctl+0x79/0x90
>> > [ 29.003934] entry_SYSCALL_64_fastpath+0x23/0xc6
>> > [ 29.003936] RIP: 0033:0x7fb9e09e7bb7
>> > [ 29.003938] RSP: 002b:00007ffe2dba2ea8 EFLAGS: 00003202 ORIG_RAX:
>> > 0000000000000010
>> > [ 29.003941] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb9e09e7bb7
>> > [ 29.003942] RDX: 00007ffe2dba2fa8 RSI: 0000000040406469 RDI: 0000000000000009
>> > [ 29.003944] RBP: 00007ffe2dba2dc0 R08: 0000000000000040 R09: 0101010101010101
>> > [ 29.003945] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
>> > [ 29.003947] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000
>>
>> This should be independent of suspend/resume and should be fixed with
>> https://patchwork.freedesktop.org/patch/116373/
>>
>> commit ebc0808fa2da0548a78e715858024cb81cd732bc
>> Author: Chris Wilson <[email protected]>
>> Date: Tue Oct 18 13:02:51 2016 +0100
>>
>> drm/i915: Restrict pagefault disabling to just around copy_from_user()
>>
>> It is in drm-intel-next-fixes, so should be picked up 4.10 in due
>> course.
>
> Also note that our CI is unhappy with -rc1, and it was not due to i915
> patches. So very likely something else is also broken.
>

Can you explain what "CI" means and its function in drm-intel development?

The mentioned patch is 4/4 of a series [1].
The single patch does not apply on top of Linux v4.10-rc1.

1/4 does not apply, etc.

2/4 is already in Linus' tree [2].
Can you explain why the other three did not get into v4.10-rc1?

So, what is your advice for testing Chris' patch?

- Sedat -

[1] https://patchwork.freedesktop.org/series/13950/
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4bcbe2a90a1127a6dad72fbda27e77705d9e0f4

2016-12-27 16:09:20

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 4:55 PM, Sedat Dilek <[email protected]> wrote:
> On Tue, Dec 27, 2016 at 4:10 PM, Daniel Vetter <[email protected]> wrote:
>> On Tue, Dec 27, 2016 at 10:24:42AM +0000, Chris Wilson wrote:
>>> On Tue, Dec 27, 2016 at 12:09:22AM +0100, Sedat Dilek wrote:
>>> > [ Add some pm | i915 | x86 folks ]
>>> >
>>> > Hi,
>>> >
>>> > I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>> > and I see some call-traces.
>>> > It is reproducible on suspend and resume.
>>> >
>>> > I cannot say which area touches the problem or if these are several
>>> > independent problems.
>>> >
>>> > For a full dmesg-log see attachments (my linux-config is attached, too).
>>> >
>>> > Here some hunks...
>>> >
>>> > [ 29.003601] BUG: sleeping function called from invalid context at
>>> > drivers/base/power/runtime.c:1032
>>> > [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>> > [ 29.003610] 1 lock held by Xorg/1469:
>>> > [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>> > [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>> > [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>> > 4.10.0-rc1-1-iniza-small #1
>>> > [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>> > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>> > [ 29.003656] Call Trace:
>>> > [ 29.003663] dump_stack+0x85/0xc2
>>> > [ 29.003666] ___might_sleep+0x196/0x260
>>> > [ 29.003668] __might_sleep+0x53/0xb0
>>> > [ 29.003671] __pm_runtime_resume+0x7a/0x90
>>> > [ 29.003691] intel_runtime_pm_get+0x25/0x90 [i915]
>>> > [ 29.003711] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
>>> > [ 29.003733] i915_vma_bind+0xaf/0x1e0 [i915]
>>> > [ 29.003752] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
>>> > [ 29.003755] ? find_get_entry+0x5/0x240
>>> > [ 29.003774] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
>>> > [ 29.003796] ? __i915_vma_do_pin+0x334/0x590 [i915]
>>> > [ 29.003815] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
>>> > [ 29.003833] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
>>> > [ 29.003851] i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
>>> > [ 29.003854] ? __might_fault+0x4e/0xb0
>>> > [ 29.003872] i915_gem_execbuffer2+0xc5/0x260 [i915]
>>> > [ 29.003873] ? __might_fault+0x4e/0xb0
>>> > [ 29.003888] drm_ioctl+0x206/0x450 [drm]
>>> > [ 29.003913] ? i915_gem_execbuffer+0x340/0x340 [i915]
>>> > [ 29.003918] ? __fget+0x5/0x200
>>> > [ 29.003922] do_vfs_ioctl+0x91/0x6f0
>>> > [ 29.003925] ? __fget+0x111/0x200
>>> > [ 29.003926] ? __fget+0x5/0x200
>>> > [ 29.003929] SyS_ioctl+0x79/0x90
>>> > [ 29.003934] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> > [ 29.003936] RIP: 0033:0x7fb9e09e7bb7
>>> > [ 29.003938] RSP: 002b:00007ffe2dba2ea8 EFLAGS: 00003202 ORIG_RAX:
>>> > 0000000000000010
>>> > [ 29.003941] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007fb9e09e7bb7
>>> > [ 29.003942] RDX: 00007ffe2dba2fa8 RSI: 0000000040406469 RDI: 0000000000000009
>>> > [ 29.003944] RBP: 00007ffe2dba2dc0 R08: 0000000000000040 R09: 0101010101010101
>>> > [ 29.003945] R10: 0000000000000000 R11: 0000000000003202 R12: 0000000000000008
>>> > [ 29.003947] R13: 00000000000000f5 R14: 0000000000000000 R15: 0000000000000000
>>>
>>> This should be independent of suspend/resume and should be fixed with
>>> https://patchwork.freedesktop.org/patch/116373/
>>>
>>> commit ebc0808fa2da0548a78e715858024cb81cd732bc
>>> Author: Chris Wilson <[email protected]>
>>> Date: Tue Oct 18 13:02:51 2016 +0100
>>>
>>> drm/i915: Restrict pagefault disabling to just around copy_from_user()
>>>
>>> It is in drm-intel-next-fixes, so should be picked up 4.10 in due
>>> course.
>>
>> Also note that our CI is unhappy with -rc1, and it was not due to i915
>> patches. So very likely something else is also broken.
>>
>
> Can you explain what "CI" means and its function in drm-intel development?
>
> The mentioned patch is 4/4 of a series [1].
> The single patch does not apply on top of Linux v4.10-rc1.
>
> 1/4 does not apply, etc.
>
> 2/4 is already in Linus tree [2].
> Can you explain why the other 3 did not got into v4.10-rc1?
>
> So, what is your advise to test Chris' patch?
>
> - Sedat -
>
> [1] https://patchwork.freedesktop.org/series/13950/
> [2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4bcbe2a90a1127a6dad72fbda27e77705d9e0f4

Linux v4.10-rc1 already contains Chris' 4/4 patch [1]:

commit ebc0808fa2da0548a78e715858024cb81cd732bc
drm/i915: Restrict pagefault disabling to just around copy_from_user()

/me confused.

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ebc0808fa2da0548a78e715858024cb81cd732bc

2016-12-27 21:13:39

by Pavel Machek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

Hi!

> [ Add some pm | i915 | x86 folks ]
>
> Hi,
>
> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
> and I see some call-traces.
> It is reproducible on suspend and resume.
>
> I cannot say which area touches the problem or if these are several
> independent problems.
>
> For a full dmesg-log see attachments (my linux-config is attached, too).
>
> Here some hunks...
>
> [ 29.003601] BUG: sleeping function called from invalid context at
> drivers/base/power/runtime.c:1032
> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
> [ 29.003610] 1 lock held by Xorg/1469:
> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
> 4.10.0-rc1-1-iniza-small #1
> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 29.003656] Call Trace:

Just a note, at least 2 machines here refuse to resume with
v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
common cause...

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.29 kB)
signature.asc (181.00 B)
Digital signature

2016-12-28 08:07:16

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
> Hi!
>
>> [ Add some pm | i915 | x86 folks ]
>>
>> Hi,
>>
>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>> and I see some call-traces.
>> It is reproducible on suspend and resume.
>>
>> I cannot say which area touches the problem or if these are several
>> independent problems.
>>
>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>
>> Here some hunks...
>>
>> [ 29.003601] BUG: sleeping function called from invalid context at
>> drivers/base/power/runtime.c:1032
>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>> [ 29.003610] 1 lock held by Xorg/1469:
>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>> 4.10.0-rc1-1-iniza-small #1
>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>> [ 29.003656] Call Trace:
>
> Just a note, at least 2 machines here refuse to resume with
> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
> common cause...
>

[ Correct linux-pm ML and add Mika & Jani ]

Thanks for the feedback.

There are some cpu/hotplug fixes post-v4.10-rc1.
Give that a try.

Yesterday, after answers from drm-intel folks I have seen that a
cpu/hotplug commit [1] was reverted in
drm-intel.git#drm-intel-nightly.
I haven't tried that.

It's good when Thomas knows of this and gets in contact with drm-intel folks.

Regards,
- Sedat -

[1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
[2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly

P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"

This reverts commit dc280d93623927570da279e99393879dbbab39e7
Author: Thomas Gleixner <[email protected]>
Date: Wed Dec 21 20:19:49 2016 +0100
cpu/hotplug: Prevent overwriting of callbacks

It started hanging all machines in CI s3 test:
https://intel-gfx-ci.01.org/CI/igt@[email protected]

Bisected-by: Mika Kuoppala <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>

- EOT -

2016-12-28 08:30:46

by Jani Nikula

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>> Hi!
>>
>>> [ Add some pm | i915 | x86 folks ]
>>>
>>> Hi,
>>>
>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>> and I see some call-traces.
>>> It is reproducible on suspend and resume.
>>>
>>> I cannot say which area touches the problem or if these are several
>>> independent problems.
>>>
>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>
>>> Here some hunks...
>>>
>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>> drivers/base/power/runtime.c:1032
>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>> [ 29.003610] 1 lock held by Xorg/1469:
>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>> 4.10.0-rc1-1-iniza-small #1
>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>> [ 29.003656] Call Trace:
>>
>> Just a note, at least 2 machines here refuse to resume with
>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>> common cause...
>>
>
> [ Correct linux-pm ML and add Mika & Jani ]
>
> Thanks for the feedback.
>
> There are some cpu/hotplug fixes post-v4.10-rc1.
> Give that a try.
>
> Yesterday, after answers from drm-intel folks I have seen that a
> cpu/hotplug commit [1] was reverted in
> drm-intel.git#drm-intel-nightly.
> I haven't tried that.
>
> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>
> Regards,
> - Sedat -
>
> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>
> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>
> This reverts commit dc280d93623927570da279e99393879dbbab39e7
> Author: Thomas Gleixner <[email protected]>
> Date: Wed Dec 21 20:19:49 2016 +0100
> cpu/hotplug: Prevent overwriting of callbacks
>
> It started hanging all machines in CI s3 test:
> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>
> Bisected-by: Mika Kuoppala <[email protected]>
> Signed-off-by: Jani Nikula <[email protected]>

Thomas -

Indeed, basically all of the boxes in the intel-gfx CI hang at the
suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
of callbacks"), and after the revert in the tree that feeds to the CI,
we're back on track.

I found [1], was hoping to get feedback from Mika whether that helps
before reporting. Chris also suggested [2] as a quick fix but I don't
know if anyone tried that.

BR,
Jani.


[1] https://lkml.org/lkml/2016/12/26/156
[2] http://paste.debian.net/904973/


--
Jani Nikula, Intel Open Source Technology Center

2016-12-28 09:04:37

by Saarinen, Jani

Subject: RE: [Intel-gfx] [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

Hi,
> -----Original Message-----
> From: Intel-gfx [mailto:[email protected]] On Behalf
> Of Sedat Dilek
> Sent: Tuesday, December 27, 2016 6:07 PM
> To: Chris Wilson <[email protected]>; Sedat Dilek
> <[email protected]>; Wysocki, Rafael J <[email protected]>;
> Thomas Gleixner <[email protected]>; intel-gfx <intel-
> [email protected]>; Linux PM List <[email protected]
> foundation.org>; LKML <[email protected]>; the arch/x86
> maintainers <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Subject: Re: [Intel-gfx] [Linux v4.10.0-rc1] call-traces after suspend-resume
> (pm? i915? cpu/hotplug?)
>
> >> Also note that our CI is unhappy with -rc1, and it was not due to
> >> i915 patches. So very likely something else is also broken.
> >>
> >
> > Can you explain what "CI" means and its function in drm-intel development?
> >
See https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-intel.html (section "Pre-Merge Testing") and
https://01.org/linuxgraphics/documentation/reproducing-patchwork-test-results
The CI results themselves (also linked from the documentation above) are at https://intel-gfx-ci.01.org/CI/

Br

Jani Saarinen
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo




2016-12-28 10:01:01

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>> Hi!
>>>
>>>> [ Add some pm | i915 | x86 folks ]
>>>>
>>>> Hi,
>>>>
>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>> and I see some call-traces.
>>>> It is reproducible on suspend and resume.
>>>>
>>>> I cannot say which area touches the problem or if these are several
>>>> independent problems.
>>>>
>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>
>>>> Here some hunks...
>>>>
>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>> drivers/base/power/runtime.c:1032
>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>> 4.10.0-rc1-1-iniza-small #1
>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>> [ 29.003656] Call Trace:
>>>
>>> Just a note, at least 2 machines here refuse to resume with
>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>> common cause...
>>>
>>
>> [ Correct linux-pm ML and add Mika & Jani ]
>>
>> Thanks for the feedback.
>>
>> There are some cpu/hotplug fixes post-v4.10-rc1.
>> Give that a try.
>>
>> Yesterday, after answers from drm-intel folks I have seen that a
>> cpu/hotplug commit [1] was reverted in
>> drm-intel.git#drm-intel-nightly.
>> I haven't tried that.
>>
>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>
>> Regards,
>> - Sedat -
>>
>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>
>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>
>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>> Author: Thomas Gleixner <[email protected]>
>> Date: Wed Dec 21 20:19:49 2016 +0100
>> cpu/hotplug: Prevent overwriting of callbacks
>>
>> It started hanging all machines in CI s3 test:
>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>
>> Bisected-by: Mika Kuoppala <[email protected]>
>> Signed-off-by: Jani Nikula <[email protected]>
>
> Thomas -
>
> Indeed, basically all of the boxes in the intel-gfx CI hang at the
> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
> of callbacks"), and after the revert in the tree that feeds to the CI,
> we're back on track.
>
> I found [1], was hoping to get feedback from Mika whether that helps
> before reporting. Chris also suggested [2] as a quick fix but I don't
> know if anyone tried that.
>

Hi Jani,

I know you were not CCed in the original thread, please see [5].

The patchset from Thomas you mention [1] does fix one of the problems
I have seen, please see [6].
With these post-v4.10-rc1 patches applied a clean revert of Revert
"cpu/hotplug: Prevent overwriting of callbacks" is not possible.

Can you give a clear statement if the quick-fix from Chris is in
combination with the above revert or not?
Against v4.10-rc1?
Tested together with the patchset of Thomas?

Thanks.

Regards,
- Sedat -

[5] http://marc.info/?t=148279390200001&r=1&w=2
[6] http://marc.info/?l=linux-kernel&m=148282459901267&w=2


> BR,
> Jani.
>
>
> [1] https://lkml.org/lkml/2016/12/26/156
> [2] http://paste.debian.net/904973/
>
>
> --
> Jani Nikula, Intel Open Source Technology Center

2016-12-28 11:00:51

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>> Hi!
>>>>
>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>> and I see some call-traces.
>>>>> It is reproducible on suspend and resume.
>>>>>
>>>>> I cannot say which area touches the problem or if these are several
>>>>> independent problems.
>>>>>
>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>
>>>>> Here some hunks...
>>>>>
>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>> drivers/base/power/runtime.c:1032
>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>> [ 29.003656] Call Trace:
>>>>
>>>> Just a note, at least 2 machines here refuse to resume with
>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>> common cause...
>>>>
>>>
>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>
>>> Thanks for the feedback.
>>>
>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>> Give that a try.
>>>
>>> Yesterday, after answers from drm-intel folks I have seen that a
>>> cpu/hotplug commit [1] was reverted in
>>> drm-intel.git#drm-intel-nightly.
>>> I haven't tried that.
>>>
>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>
>>> Regards,
>>> - Sedat -
>>>
>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>
>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>
>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>> Author: Thomas Gleixner <[email protected]>
>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>> cpu/hotplug: Prevent overwriting of callbacks
>>>
>>> It started hanging all machines in CI s3 test:
>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>
>>> Bisected-by: Mika Kuoppala <[email protected]>
>>> Signed-off-by: Jani Nikula <[email protected]>
>>
>> Thomas -
>>
>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>> of callbacks"), and after the revert in the tree that feeds to the CI,
>> we're back on track.
>>
>> I found [1], was hoping to get feedback from Mika whether that helps
>> before reporting. Chris also suggested [2] as a quick fix but I don't
>> know if anyone tried that.
>>
>
> Hi Jani,
>
> I know you were not CCed in the original thread, please see [5].
>
> The patchset from Thomas you mention [1] does fix one of the problems
> I have seen, please see [6].
> With these post-v4.10-rc1 patches applied a clean revert of Revert
> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>
> Can you give a clear statement if the quick-fix from Chris is in
> combination with the above revert or not?
> Against v4.10-rc1?
> Tested together with the patchset of Thomas?
>
> Thanks.
>
> Regards,
> - Sedat -
>
> [5] http://marc.info/?t=148279390200001&r=1&w=2
> [6] http://marc.info/?l=linux-kernel&m=148282459901267&w=2
>
>
>> BR,
>> Jani.
>>
>>
>> [1] https://lkml.org/lkml/2016/12/26/156
>> [2] http://paste.debian.net/904973/
>>
>>
>> --
>> Jani Nikula, Intel Open Source Technology Center

I tried Chris' patch on top of the latest Linus tree.
It does not fix the problem seen on boot and after suspend/resume;
see attachments.

- Sedat -


Attachments:
dmesg_4.10.0-rc1-3-iniza-small_after-suspend-resume.txt (72.72 kB)
config-4.10.0-rc1-3-iniza-small (136.62 kB)
4.10.0-rc1-3-iniza-small.patch (32.49 kB)

2016-12-28 22:32:36

by Rafael J. Wysocki

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>> Hi!
>>>>
>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>> and I see some call-traces.
>>>>> It is reproducible on suspend and resume.
>>>>>
>>>>> I cannot say which area touches the problem or if these are several
>>>>> independent problems.
>>>>>
>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>
>>>>> Here some hunks...
>>>>>
>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>> drivers/base/power/runtime.c:1032
>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>> [ 29.003656] Call Trace:
>>>>
>>>> Just a note, at least 2 machines here refuse to resume with
>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>> common cause...
>>>>
>>>
>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>
>>> Thanks for the feedback.
>>>
>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>> Give that a try.
>>>
>>> Yesterday, after answers from drm-intel folks I have seen that a
>>> cpu/hotplug commit [1] was reverted in
>>> drm-intel.git#drm-intel-nightly.
>>> I haven't tried that.
>>>
>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>
>>> Regards,
>>> - Sedat -
>>>
>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>
>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>
>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>> Author: Thomas Gleixner <[email protected]>
>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>> cpu/hotplug: Prevent overwriting of callbacks
>>>
>>> It started hanging all machines in CI s3 test:
>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>
>>> Bisected-by: Mika Kuoppala <[email protected]>
>>> Signed-off-by: Jani Nikula <[email protected]>
>>
>> Thomas -
>>
>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>> of callbacks"), and after the revert in the tree that feeds to the CI,
>> we're back on track.
>>
>> I found [1], was hoping to get feedback from Mika whether that helps
>> before reporting. Chris also suggested [2] as a quick fix but I don't
>> know if anyone tried that.
>>
>
> Hi Jani,
>
> I know you were not CCed in the original thread, please see [5].
>
> The patchset from Thomas you mention [1] does fix one of the problems
> I have seen, please see [6].
> With these post-v4.10-rc1 patches applied a clean revert of Revert
> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>
> Can you give a clear statement if the quick-fix from Chris is in
> combination with the above revert or not?
> Against v4.10-rc1?
> Tested together with the patchset of Thomas?

Please test the Linus' tree from today, it should work.

Thanks,
Rafael

2016-12-29 00:33:57

by Doug Smythies

Subject: RE: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On 2016.12.28 14:33 Rafael J. Wysocki wrote:
> On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
>> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>>
>>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>>
>>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>>> Author: Thomas Gleixner <[email protected]>
>>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>>> cpu/hotplug: Prevent overwriting of callbacks

With respect to kernel 4.10-rc1 and the above referenced commit:

On my computer rdmsr was not working, and therefore many tools
that use it (e.g. turbostat) were also broken.

I bisected the kernel down to the same above referenced commit.
After finding some potentially related e-mails, I built a
4.10-rc1+ kernel with these:

0dad3a3 x86/mce/AMD: Make the init code more robust
b9d9d69 smp/hotplug: Undo tglxs brainfart
b4b8664 arm64: don't pull uaccess.h into *.S
7ce7d89 Linux 4.10-rc1

And now rdmsr is working fine, as is turbostat.


2016-12-29 00:43:46

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Wed, Dec 28, 2016 at 11:32 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
>> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>>> Hi!
>>>>>
>>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>>> and I see some call-traces.
>>>>>> It is reproducible on suspend and resume.
>>>>>>
>>>>>> I cannot say which area touches the problem or if these are several
>>>>>> independent problems.
>>>>>>
>>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>>
>>>>>> Here some hunks...
>>>>>>
>>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>>> drivers/base/power/runtime.c:1032
>>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>>> [ 29.003656] Call Trace:
>>>>>
>>>>> Just a note, at least 2 machines here refuse to resume with
>>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>>> common cause...
>>>>>
>>>>
>>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>>
>>>> Thanks for the feedback.
>>>>
>>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>>> Give that a try.
>>>>
>>>> Yesterday, after answers from drm-intel folks I have seen that a
>>>> cpu/hotplug commit [1] was reverted in
>>>> drm-intel.git#drm-intel-nightly.
>>>> I haven't tried that.
>>>>
>>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>>
>>>> Regards,
>>>> - Sedat -
>>>>
>>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>>
>>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>>
>>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>>> Author: Thomas Gleixner <[email protected]>
>>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>>> cpu/hotplug: Prevent overwriting of callbacks
>>>>
>>>> It started hanging all machines in CI s3 test:
>>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>>
>>>> Bisected-by: Mika Kuoppala <[email protected]>
>>>> Signed-off-by: Jani Nikula <[email protected]>
>>>
>>> Thomas -
>>>
>>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>>> of callbacks"), and after the revert in the tree that feeds to the CI,
>>> we're back on track.
>>>
>>> I found [1], was hoping to get feedback from Mika whether that helps
>>> before reporting. Chris also suggested [2] as a quick fix but I don't
>>> know if anyone tried that.
>>>
>>
>> Hi Jani,
>>
>> I know you were not CCed in the original thread, please see [5].
>>
>> The patchset from Thomas you mention [1] does fix one of the problems
>> I have seen, please see [6].
>> With these post-v4.10-rc1 patches applied a clean revert of Revert
>> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>>
>> Can you give a clear statement if the quick-fix from Chris is in
>> combination with the above revert or not?
>> Against v4.10-rc1?
>> Tested together with the patchset of Thomas?
>
> Please test the Linus' tree from today, it should work.
>

Latest Linus tree (v4.10-rc1-17-g2d706e790f05) does not fix it.

- Sedat -

2016-12-29 07:28:09

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

[ Hope I have not dropped people from the CC list ]

Anyway, I tried a bit more with drm-intel-nightly (drm-tip:
2016y-12m-28d-13h-55m-14s UTC integration manifest) on top of Linux
v4.10-rc1.
This has the cpuhp-revert of drm-intel folks but not the
post-cpuhp-fixes of Thomas.

Now I only see the following on boot...

[ 29.558350] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1032
[ 29.558356] in_atomic(): 1, irqs_disabled(): 0, pid: 1507, name: Xorg
[ 29.558359] 1 lock held by Xorg/1507:
[ 29.558360] #0: (&dev->struct_mutex){+.+.+.}, at:
[<ffffffffa0660df3>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
[ 29.558456] CPU: 3 PID: 1507 Comm: Xorg Not tainted
4.10.0-rc1-4-iniza-amd64 #1
[ 29.558461] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 29.558464] Call Trace:
[ 29.558472] dump_stack+0x85/0xc2
[ 29.558478] ___might_sleep+0x196/0x260
[ 29.558484] __might_sleep+0x53/0xb0
[ 29.558488] __pm_runtime_resume+0x7a/0x90
[ 29.558511] intel_runtime_pm_get+0x25/0x90 [i915]
[ 29.558534] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
[ 29.558558] i915_vma_bind+0xaf/0x1e0 [i915]
[ 29.558580] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
[ 29.558584] ? find_get_entry+0x5/0x240
[ 29.558605] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
[ 29.558638] ? i915_vma_bind+0xaf/0x1e0 [i915]
[ 29.558673] ? __i915_vma_do_pin+0x30f/0x550 [i915]
[ 29.558706] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
[ 29.558727] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
[ 29.558748] i915_gem_do_execbuffer.isra.38+0xb50/0x1a80 [i915]
[ 29.558752] ? __might_fault+0x4e/0xb0
[ 29.558772] i915_gem_execbuffer2+0xc5/0x260 [i915]
[ 29.558775] ? __might_fault+0x4e/0xb0
[ 29.558791] drm_ioctl+0x20b/0x460 [drm]
[ 29.558811] ? i915_gem_execbuffer+0x340/0x340 [i915]
[ 29.558816] ? __fget+0x5/0x200
[ 29.558819] do_vfs_ioctl+0x91/0x6f0
[ 29.558822] ? __fget+0x111/0x200
[ 29.558824] ? __fget+0x5/0x200
[ 29.558827] SyS_ioctl+0x79/0x90
[ 29.558832] entry_SYSCALL_64_fastpath+0x23/0xc6

...and after suspend/resume (looks like the same call-trace)...

[ 108.502938] PM: resume of devices complete after 792.728 msecs
[ 108.506825] Restarting tasks ... done.
[ 110.410210] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 110.412520] ata1.00: configured for UDMA/133
[ 110.909622] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:1032
[ 110.909639] in_atomic(): 1, irqs_disabled(): 0, pid: 1507, name: Xorg
[ 110.909649] 1 lock held by Xorg/1507:
[ 110.909652] #0: (&dev->struct_mutex){+.+.+.}, at:
[<ffffffffa0660df3>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
[ 110.909775] CPU: 1 PID: 1507 Comm: Xorg Tainted: G W
4.10.0-rc1-4-iniza-amd64 #1
[ 110.909779] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 110.909783] Call Trace:
[ 110.909800] dump_stack+0x85/0xc2
[ 110.909810] ___might_sleep+0x196/0x260
[ 110.909818] __might_sleep+0x53/0xb0
[ 110.909827] __pm_runtime_resume+0x7a/0x90
[ 110.909902] intel_runtime_pm_get+0x25/0x90 [i915]
[ 110.909976] aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
[ 110.910057] i915_vma_bind+0xaf/0x1e0 [i915]
[ 110.910133] i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
[ 110.910144] ? free_hot_cold_page+0x1c1/0x390
[ 110.910216] i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
[ 110.910226] ? free_pages+0x13/0x20
[ 110.910305] ? __i915_vma_do_pin+0x30f/0x550 [i915]
[ 110.910380] ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
[ 110.910453] ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
[ 110.910506] i915_gem_do_execbuffer.isra.38+0xb50/0x1a80 [i915]
[ 110.910513] ? __might_fault+0x4e/0xb0
[ 110.910559] i915_gem_execbuffer2+0xc5/0x260 [i915]
[ 110.910563] ? __might_fault+0x4e/0xb0
[ 110.910594] drm_ioctl+0x20b/0x460 [drm]
[ 110.910640] ? i915_gem_execbuffer+0x340/0x340 [i915]
[ 110.910646] ? __fget+0x5/0x200
[ 110.910651] do_vfs_ioctl+0x91/0x6f0
[ 110.910655] ? __fget+0x111/0x200
[ 110.910658] ? __fget+0x5/0x200
[ 110.910663] SyS_ioctl+0x79/0x90
[ 110.910671] entry_SYSCALL_64_fastpath+0x23/0xc6
[ 110.910675] RIP: 0033:0x7fb9370e9bb7
[ 110.910678] RSP: 002b:00007fff34e97208 EFLAGS: 00003202 ORIG_RAX:
0000000000000010
[ 110.910684] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb9370e9bb7
[ 110.910686] RDX: 00007fff34e97308 RSI: 0000000040406469 RDI: 0000000000000009
[ 110.910689] RBP: 0000563782f58c78 R08: 0000000000000040 R09: 0101010101010101
[ 110.910691] R10: 00007fff34e90000 R11: 0000000000003202 R12: 00007fff34e96420
[ 110.910693] R13: 0000000000000032 R14: 0000000000000010 R15: 0000000000000031

Here is the snippet referred to above...

[ drivers/base/power/runtime.c ]
...
/**
 * __pm_runtime_resume - Entry point for runtime resume operations.
 * @dev: Device to resume.
 * @rpmflags: Flag bits.
 *
 * If the RPM_GET_PUT flag is set, increment the device's usage count. Then
 * carry out a resume, either synchronous or asynchronous.
 *
 * This routine may be called in atomic context if the RPM_ASYNC flag is set,
 * or if pm_runtime_irq_safe() has been called.
 */
int __pm_runtime_resume(struct device *dev, int rpmflags)
{
        unsigned long flags;
        int retval;

        might_sleep_if(!(rpmflags & RPM_ASYNC) &&
                        !dev->power.irq_safe); <--- XXX: Line #1032

        if (rpmflags & RPM_GET_PUT)
                atomic_inc(&dev->power.usage_count);

        spin_lock_irqsave(&dev->power.lock, flags);
        retval = rpm_resume(dev, rpmflags);
        spin_unlock_irqrestore(&dev->power.lock, flags);

        return retval;
}
EXPORT_SYMBOL_GPL(__pm_runtime_resume);
...

Hope this helps.

- Sedat -


Attachments:
dmesg_4.10.0-rc1-4-iniza-amd64_after-suspend-resume.txt (67.68 kB)
config-4.10.0-rc1-4-iniza-amd64 (136.78 kB)

2016-12-29 09:50:54

by Jani Nikula

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Thu, 29 Dec 2016, Sedat Dilek <[email protected]> wrote:
> On Wed, Dec 28, 2016 at 11:32 PM, Rafael J. Wysocki <[email protected]> wrote:
>> On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
>>> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>>>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>>>> Hi!
>>>>>>
>>>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>>>> and I see some call-traces.
>>>>>>> It is reproducible on suspend and resume.
>>>>>>>
>>>>>>> I cannot say which area touches the problem or if these are several
>>>>>>> independent problems.
>>>>>>>
>>>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>>>
>>>>>>> Here some hunks...
>>>>>>>
>>>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>>>> drivers/base/power/runtime.c:1032
>>>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>>>> [ 29.003656] Call Trace:
>>>>>>
>>>>>> Just a note, at least 2 machines here refuse to resume with
>>>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>>>> common cause...
>>>>>>
>>>>>
>>>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>>>> Give that a try.
>>>>>
>>>>> Yesterday, after answers from drm-intel folks I have seen that a
>>>>> cpu/hotplug commit [1] was reverted in
>>>>> drm-intel.git#drm-intel-nightly.
>>>>> I haven't tried that.
>>>>>
>>>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>>>
>>>>> Regards,
>>>>> - Sedat -
>>>>>
>>>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>>>
>>>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>>>
>>>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>>>> Author: Thomas Gleixner <[email protected]>
>>>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>>>> cpu/hotplug: Prevent overwriting of callbacks
>>>>>
>>>>> It started hanging all machines in CI s3 test:
>>>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>>>
>>>>> Bisected-by: Mika Kuoppala <[email protected]>
>>>>> Signed-off-by: Jani Nikula <[email protected]>
>>>>
>>>> Thomas -
>>>>
>>>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>>>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>>>> of callbacks"), and after the revert in the tree that feeds to the CI,
>>>> we're back on track.
>>>>
>>>> I found [1], was hoping to get feedback from Mika whether that helps
>>>> before reporting. Chris also suggested [2] as a quick fix but I don't
>>>> know if anyone tried that.
>>>>
>>>
>>> Hi Jani,
>>>
>>> I know you were not CCed in the original thread, please see [5].
>>>
>>> The patchset from Thomas you mention [1] does fix one of the problems
>>> I have seen, please see [6].
>>> With these post-v4.10-rc1 patches applied a clean revert of Revert
>>> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>>>
>>> Can you give a clear statement if the quick-fix from Chris is in
>>> combination with the above revert or not?
>>> Against v4.10-rc1?
>>> Tested together with the patchset of Thomas?
>>
>> Please test the Linus' tree from today, it should work.
>>
>
> Latest Linus tree (v4.10-rc1-17-g2d706e790f05) does not fix it.

It seems to me there is more than one bug at play here.

BR,
Jani.


>
> - Sedat -

--
Jani Nikula, Intel Open Source Technology Center

2016-12-29 12:00:00

by Mika Kuoppala

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

Sedat Dilek <[email protected]> writes:

> On Wed, Dec 28, 2016 at 11:32 PM, Rafael J. Wysocki <[email protected]> wrote:
>> On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
>>> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>>>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>>>> Hi!
>>>>>>
>>>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>>>> and I see some call-traces.
>>>>>>> It is reproducible on suspend and resume.
>>>>>>>
>>>>>>> I cannot say which area touches the problem or if these are several
>>>>>>> independent problems.
>>>>>>>
>>>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>>>
>>>>>>> Here some hunks...
>>>>>>>
>>>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>>>> drivers/base/power/runtime.c:1032
>>>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>>>> [ 29.003656] Call Trace:
>>>>>>
>>>>>> Just a note, at least 2 machines here refuse to resume with
>>>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>>>> common cause...
>>>>>>
>>>>>
>>>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>>>> Give that a try.
>>>>>
>>>>> Yesterday, after answers from drm-intel folks I have seen that a
>>>>> cpu/hotplug commit [1] was reverted in
>>>>> drm-intel.git#drm-intel-nightly.
>>>>> I haven't tried that.
>>>>>
>>>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>>>
>>>>> Regards,
>>>>> - Sedat -
>>>>>
>>>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>>>
>>>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>>>
>>>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>>>> Author: Thomas Gleixner <[email protected]>
>>>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>>>> cpu/hotplug: Prevent overwriting of callbacks
>>>>>
>>>>> It started hanging all machines in CI s3 test:
>>>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>>>
>>>>> Bisected-by: Mika Kuoppala <[email protected]>
>>>>> Signed-off-by: Jani Nikula <[email protected]>
>>>>
>>>> Thomas -
>>>>
>>>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>>>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>>>> of callbacks"), and after the revert in the tree that feeds to the CI,
>>>> we're back on track.
>>>>
>>>> I found [1], was hoping to get feedback from Mika whether that helps
>>>> before reporting. Chris also suggested [2] as a quick fix but I don't
>>>> know if anyone tried that.
>>>>
>>>
>>> Hi Jani,
>>>
>>> I know you were not CCed in the original thread, please see [5].
>>>
>>> The patchset from Thomas you mention [1] does fix one of the problems
>>> I have seen, please see [6].
>>> With these post-v4.10-rc1 patches applied a clean revert of Revert
>>> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>>>
>>> Can you give a clear statement if the quick-fix from Chris is in
>>> combination with the above revert or not?
>>> Against v4.10-rc1?
>>> Tested together with the patchset of Thomas?
>>
>> Please test the Linus' tree from today, it should work.
>>
>
> Latest Linus tree (v4.10-rc1-17-g2d706e790f05) does not fix it.
>

Latest Linus tree 2d706e790f0508dff4fb72eca9b4892b79757feb fixes our S3
problems. It survives gem_exec_suspend --r basic-S3 on kabylake.

It contains the fix to the bisected commit:

commit b9d9d6911bd5c370ad4b3aa57d758c093d17aed5
Author: Thomas Gleixner <[email protected]>
Date: Mon Dec 26 22:58:19 2016 +0100

smp/hotplug: Undo tglxs brainfart


-Mika

2016-12-30 11:19:08

by Sedat Dilek

Subject: Re: [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)

On Thu, Dec 29, 2016 at 12:58 PM, Mika Kuoppala
<[email protected]> wrote:
> Sedat Dilek <[email protected]> writes:
>
>> On Wed, Dec 28, 2016 at 11:32 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> On Wed, Dec 28, 2016 at 11:00 AM, Sedat Dilek <[email protected]> wrote:
>>>> On Wed, Dec 28, 2016 at 9:29 AM, Jani Nikula <[email protected]> wrote:
>>>>> On Wed, 28 Dec 2016, Sedat Dilek <[email protected]> wrote:
>>>>>> On Tue, Dec 27, 2016 at 10:13 PM, Pavel Machek <[email protected]> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>>> [ Add some pm | i915 | x86 folks ]
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system
>>>>>>>> and I see some call-traces.
>>>>>>>> It is reproducible on suspend and resume.
>>>>>>>>
>>>>>>>> I cannot say which area touches the problem or if these are several
>>>>>>>> independent problems.
>>>>>>>>
>>>>>>>> For a full dmesg-log see attachments (my linux-config is attached, too).
>>>>>>>>
>>>>>>>> Here some hunks...
>>>>>>>>
>>>>>>>> [ 29.003601] BUG: sleeping function called from invalid context at
>>>>>>>> drivers/base/power/runtime.c:1032
>>>>>>>> [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg
>>>>>>>> [ 29.003610] 1 lock held by Xorg/1469:
>>>>>>>> [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at:
>>>>>>>> [<ffffffffa0623c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
>>>>>>>> [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted
>>>>>>>> 4.10.0-rc1-1-iniza-small #1
>>>>>>>> [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
>>>>>>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
>>>>>>>> [ 29.003656] Call Trace:
>>>>>>>
>>>>>>> Just a note, at least 2 machines here refuse to resume with
>>>>>>> v4.10-rc1. One has intel graphics, one has AMD. It may or may not have
>>>>>>> common cause...
>>>>>>>
>>>>>>
>>>>>> [ Correct linux-pm ML and add Mika & Jani ]
>>>>>>
>>>>>> Thanks for the feedback.
>>>>>>
>>>>>> There are some cpu/hotplug fixes post-v4.10-rc1.
>>>>>> Give that a try.
>>>>>>
>>>>>> Yesterday, after answers from drm-intel folks I have seen that a
>>>>>> cpu/hotplug commit [1] was reverted in
>>>>>> drm-intel.git#drm-intel-nightly.
>>>>>> I haven't tried that.
>>>>>>
>>>>>> It's good when Thomas knows of this and gets in contact with drm-intel folks.
>>>>>>
>>>>>> Regards,
>>>>>> - Sedat -
>>>>>>
>>>>>> [1] https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-nightly&id=e558f178f5390185b7324ff4b816b52c6ae3a928
>>>>>> [2] https://cgit.freedesktop.org/drm-intel/log/?h=drm-intel-nightly
>>>>>>
>>>>>> P.S.: Revert "cpu/hotplug: Prevent overwriting of callbacks"
>>>>>>
>>>>>> This reverts commit dc280d93623927570da279e99393879dbbab39e7
>>>>>> Author: Thomas Gleixner <[email protected]>
>>>>>> Date: Wed Dec 21 20:19:49 2016 +0100
>>>>>> cpu/hotplug: Prevent overwriting of callbacks
>>>>>>
>>>>>> It started hanging all machines in CI s3 test:
>>>>>> https://intel-gfx-ci.01.org/CI/igt@[email protected]
>>>>>>
>>>>>> Bisected-by: Mika Kuoppala <[email protected]>
>>>>>> Signed-off-by: Jani Nikula <[email protected]>
>>>>>
>>>>> Thomas -
>>>>>
>>>>> Indeed, basically all of the boxes in the intel-gfx CI hang at the
>>>>> suspend/resume test with dc280d936239 ("cpu/hotplug: Prevent overwriting
>>>>> of callbacks"), and after the revert in the tree that feeds to the CI,
>>>>> we're back on track.
>>>>>
>>>>> I found [1], was hoping to get feedback from Mika whether that helps
>>>>> before reporting. Chris also suggested [2] as a quick fix but I don't
>>>>> know if anyone tried that.
>>>>>
>>>>
>>>> Hi Jani,
>>>>
>>>> I know you were not CCed in the original thread, please see [5].
>>>>
>>>> The patchset from Thomas you mention [1] does fix one of the problems
>>>> I have seen, please see [6].
>>>> With these post-v4.10-rc1 patches applied a clean revert of Revert
>>>> "cpu/hotplug: Prevent overwriting of callbacks" is not possible.
>>>>
>>>> Can you give a clear statement if the quick-fix from Chris is in
>>>> combination with the above revert or not?
>>>> Against v4.10-rc1?
>>>> Tested together with the patchset of Thomas?
>>>
>>> Please test the Linus' tree from today, it should work.
>>>
>>
>> Latest Linus tree (v4.10-rc1-17-g2d706e790f05) does not fix it.
>>
>
> Latest Linus tree 2d706e790f0508dff4fb72eca9b4892b79757feb fixes our S3
> problems. It survives gem_exec_suspend --r basic-S3 on kabylake.
>
> It contains the fix to the bisected commit:
>
> commit b9d9d6911bd5c370ad4b3aa57d758c093d17aed5
> Author: Thomas Gleixner <[email protected]>
> Date: Mon Dec 26 22:58:19 2016 +0100
>
> smp/hotplug: Undo tglxs brainfart
>
>

This is good news!

I still see another issue, and it seems to be independent of Thomas'
"brainfart" patch.

Will post a separate email on the other issue.

- Sedat -