2016-10-12 09:56:22

by Paul Bolle

[permalink] [raw]
Subject: drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On a laptop that tracks the latest stable release (Ie, it now runs
v4.8.1) I see this WARNING
    WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

Full trace pasted below. I never saw this WARNING before v4.8. Since
v4.8 I've had it in all (four, actually) boots.

What am I expected to do about this WARNING?

Thanks,


Paul Bolle

WARNING: CPU: 3 PID: 1368 at drivers/gpu/drm/i915/intel_display.c:14178 skl_max_scale.part.120+0x75/0x80 [i915]
WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)
Modules linked in:
rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 cmac nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep vfat fat arc4 snd_hda_codec_hdmi snd_soc_skl dell_led snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal coretemp kvm_intel snd_compress snd_pcm_dmaengine ac97_bus kvm snd_hda_intel iwlmvm snd_hda_codec mac80211 iTCO_wdt
iTCO_vendor_support uvcvideo snd_hda_core snd_hwdep snd_seq irqbypass dell_laptop i2c_designware_platform i2c_designware_core dell_wmi crct10dif_pclmul dell_smbios dcdbas crc32_pclmul snd_seq_device iwlwifi videobuf2_vmalloc videobuf2_memops ghash_clmulni_intel snd_pcm videobuf2_v4l2 videobuf2_core cfg80211 videodev media joydev pcspkr mei_me rtsx_pci_ms memstick snd_timer i2c_i801 i2c_smbus mei snd btusb soundcore shpchp hci_uart btrtl btbcm btqca idma64 btintel bluetooth intel_pch_thermal processor_thermal_device intel_lpss_pci intel_soc_dts_iosf wmi pinctrl_sunrisepoint intel_lpss_acpi rfkill pinctrl_intel intel_lpss int3400_thermal acpi_als int3403_thermal int340x_thermal_zone kfifo_buf acpi_thermal_rel intel_hid industrialio sparse_keymap acpi_pad tpm_tis tpm_tis_core tpm nfsd auth_rpcgss
nfs_acl lockd grace sunrpc hid_multitouch i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper crc32c_intel drm serio_raw nvme rtsx_pci nvme_core i2c_hid video fjes
CPU: 3 PID: 1368 Comm: Xorg Not tainted 4.8.1-1.local1.fc24.x86_64 #1
Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.4 06/14/2016
0000000000000286 00000000df2f374c ffffa31528d53910 ffffffffb83e5cfd
ffffa31528d53960 0000000000000000 ffffa31528d53950 ffffffffb80a7d5b
00003762c72b3010 ffffa3151e4d8cc0 ffffa31526c23800 ffffa31526e60000
Call Trace:
[<ffffffffb83e5cfd>] dump_stack+0x63/0x86
[<ffffffffb80a7d5b>] __warn+0xcb/0xf0
[<ffffffffb80a7ddf>] warn_slowpath_fmt+0x5f/0x80
[<ffffffffb83f5247>] ? sort+0x147/0x220
[<ffffffffc0132754>] ? drm_atomic_helper_normalize_zpos+0x264/0x300 [drm_kms_helper]
[<ffffffffc01f20e5>] skl_max_scale.part.120+0x75/0x80 [i915]
[<ffffffffc01f21b6>] intel_check_primary_plane+0xc6/0xe0 [i915]
[<ffffffffc0132754>] ? drm_atomic_helper_normalize_zpos+0x264/0x300 [drm_kms_helper]
[<ffffffffc01e33d2>] intel_plane_atomic_check+0x132/0x1f0 [i915]
[<ffffffffc012f524>] drm_atomic_helper_check_planes+0x84/0x200 [drm_kms_helper]
[<ffffffffc01fdd07>] intel_atomic_check+0x9a7/0x11a0 [i915]
[<ffffffffb822c84a>] ? __kmalloc_track_caller+0x17a/0x210
[<ffffffffc00bbf07>] drm_atomic_check_only+0x187/0x610 [drm]
[<ffffffffc00bb6c8>] ? drm_atomic_get_crtc_state+0x88/0x100 [drm]
[<ffffffffc00bc3a7>] drm_atomic_commit+0x17/0x60 [drm]
[<ffffffffc012e90c>] drm_atomic_helper_update_plane+0xec/0x130 [drm_kms_helper]
[<ffffffffc00ab16b>] __setplane_internal+0x22b/0x270 [drm]
[<ffffffffc00ab2e9>] drm_mode_cursor_universal+0x139/0x240 [drm]
[<ffffffffc00ab46e>] drm_mode_cursor_common+0x7e/0x180 [drm]
[<ffffffffc00af9de>] drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[<ffffffffc00a1f0a>] drm_ioctl+0x1da/0x4b0 [drm]
[<ffffffffc00af9d0>] ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[<ffffffffb81161bd>] ? enqueue_hrtimer+0x3d/0x80
[<ffffffffb8266693>] do_vfs_ioctl+0xa3/0x5e0
[<ffffffffb86bc151>] ? __sys_recvmsg+0x51/0x90
[<ffffffffb8266c49>] SyS_ioctl+0x79/0x90
[<ffffffffb87fc4f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4


2016-10-12 11:08:54

by Joonas Lahtinen

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On ke, 2016-10-12 at 11:56 +0200, Paul Bolle wrote:
> On a laptop that tracks the latest stable release (Ie, it now runs
> v4.8.1) I see this WARNING
>     WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)
>
> Full trace pasted below. I never saw this WARNING before v4.8. Since
> v4.8 I've had it in all (four, actually) boots.
>
> What am I expected to do about this WARNING?
>

Bisecting the offending commit between v4.8 and v4.8.1 would be a good
start.

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

2016-10-12 12:08:08

by Paul Bolle

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On Wed, 2016-10-12 at 14:08 +0300, Joonas Lahtinen wrote:
> Bisecting the offending commit between v4.8 and v4.8.1 would be a good
> start.

That would be between v4.7 and v4.8. (I guess my report was ambiguous.)

That might take some time. Because bisecting always takes a long time
and especially since hitting this WARNING sometimes takes over an hour.
Anyhow, please prod me if I stay silent for too long.

Thanks,


Paul Bolle

2016-10-12 14:35:09

by Jani Nikula

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On Wed, 12 Oct 2016, Paul Bolle <[email protected]> wrote:
> On Wed, 2016-10-12 at 14:08 +0300, Joonas Lahtinen wrote:
>> Bisecting the offending commit between v4.8 and v4.8.1 would be a good
>> start.
>
> That would be between v4.7 and v4.8. (I guess my report was ambiguous.)
>
> That might take some time. Because bisecting always takes a long time
> and especially since hitting this WARNING sometimes takes over an hour.
> Anyhow, please prod me if I stay silent for too long.

In the mean time, please file a bug over at [1] so we don't lose track.

BR,
Jani.

[1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel

--
Jani Nikula, Intel Open Source Technology Center

2016-10-12 14:47:30

by Paul Bolle

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On Wed, 2016-10-12 at 17:34 +0300, Jani Nikula wrote:
> In the mean time, please file a bug over at [1] so we don't lose
> track.

Done:  https://bugs.freedesktop.org/show_bug.cgi?id=98214


Paul Bolle

2016-10-12 18:49:48

by Paul Bolle

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On Wed, 2016-10-12 at 14:06 +0200, Paul Bolle wrote:
> That might take some time. Because bisecting always takes a long time
> and especially since hitting this WARNING sometimes takes over an hour.
> Anyhow, please prod me if I stay silent for too long.

For the record: I just had to power cycle this laptop because it got
into that lovely state where it just locks without accepting any input
(no, I don't have netconsole enabled).

Assuming this lockup is related: this could be more urgent than I
thought.


Paul Bolle

2016-10-13 18:06:59

by Paul Bolle

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

[Adding Matt]

On Wed, 2016-10-12 at 14:08 +0300, Joonas Lahtinen wrote:
> Bisecting the offending commit between v4.8 and v4.8.1 would be a good
> start.

0) Why use a personal notebook when one can just post any half baked
idea to lkml?

1) I stumbled on https://bugzilla.redhat.com/show_bug.cgi?id=1361357 (s
ame hardware, rather similar trace).

2) That report was about the first Fedora (rawhide) kernel release
after this commit http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/co
mmit/?h=f25&id=7d2c2f2d91793b5da452bee9bea4fa32051c8608 ("Bring in
patch-series from drm-next to fix skl_update_other_pipe_wm issues").

3) That seventeen part series ended up in v4.8. The last commit of that
series is commit 5b483747a925 ("drm/i915: Remove wm_config from
dev_priv/intel_atomic_state").

4) So, with a little bit of luck, my bisect might only need to look at
those seventeen commits. Still, it will probably be next week before I
have a day or two to actually perform that bisect.

To be continued,


Paul Bolle

2016-10-24 20:48:57

by Paul Bolle

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

[Detailed post, but please give it a quick scan.]

On Wed, 2016-10-12 at 14:06 +0200, Paul Bolle wrote:
> On Wed, 2016-10-12 at 14:08 +0300, Joonas Lahtinen wrote:
> > Bisecting the offending commit between v4.8 and v4.8.1 would be a good
> > start.
>
> That would be between v4.7 and v4.8. (I guess my report was
> ambiguous.)
>
> That might take some time. Because bisecting always takes a long time
> and especially since hitting this WARNING sometimes takes over an hour.
> Anyhow, please prod me if I stay silent for too long.

0) So I've lost my courage to do a bisect when my first attempt landed
me in v4.6-rc3. This is about for issue popping up between v4.7 and
v4.8-rc1.

1) So I used the most reliable debugging tool that I actually
understand: printk():

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index fbcfed63a76e..791de414cf1e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -14771,10 +14771,16 @@ skl_max_scale(struct intel_crtc *intel_crtc, struct intel_crtc_state *crtc_state
return DRM_PLANE_HELPER_NO_SCALING;

crtc_clock = crtc_state->base.adjusted_mode.crtc_clock;
- cdclk = to_intel_atomic_state(crtc_state->base.state)->cdclk;
+ if (WARN_ON_ONCE(!crtc_clock))
+ return DRM_PLANE_HELPER_NO_SCALING;

- if (WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock))
+ cdclk = to_intel_atomic_state(crtc_state->base.state)->cdclk;
+ if (WARN_ON_ONCE(cdclk < crtc_clock)) {
+ printk(KERN_DEBUG "i915: cdclk < crtc_clock: %d < %d\n", cdclk, crtc_clock);
return DRM_PLANE_HELPER_NO_SCALING;
+ }
+
+ printk_ratelimited(KERN_DEBUG "i915: cdclk >= crtc_clock: %d >= %d\n", cdclk, crtc_clock);

/*
* skl max scale is lower of:

2) This taught me that crtc_clock always is 373250 on my machine. cdclk
mostly is 450000, but every now and then it briefly is 337500.

3) Now the interesting pattern is that cdclk drops to 337500 only after
two quick calls of skl_max_scale() with cdclk set to 450000, and a
roughly 300ms pause before the third call of that function. Example:

<7>[23758.501933] i915: cdclk >= crtc_clock: 450000 >= 373250
<7>[23758.515211] i915: cdclk >= crtc_clock: 450000 >= 373250
<7>[23758.869057] i915: cdclk < crtc_clock: 337500 < 373250

This pattern of cdclk being 337500 after roughly 300ms is surprisingly
stable.

4) So _perhaps_ there's some roughly 300ms timeout, somehow, somewhere,
that sets cdclk to 337500 and triggers this issue. Ideas?

To be continued,


Paul Bolle

2016-10-25 08:37:29

by Jani Nikula

[permalink] [raw]
Subject: Re: [Intel-gfx] drm/i915: WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock)

On Mon, 24 Oct 2016, Paul Bolle <[email protected]> wrote:
> [Detailed post, but please give it a quick scan.]

Please file the information in the bug you filed. Please attach dmesg
(again, on the bug) with drm.debug=14 and running your patch.

BR,
Jani.

>
> On Wed, 2016-10-12 at 14:06 +0200, Paul Bolle wrote:
>> On Wed, 2016-10-12 at 14:08 +0300, Joonas Lahtinen wrote:
>> > Bisecting the offending commit between v4.8 and v4.8.1 would be a good
>> > start.
>>
>> That would be between v4.7 and v4.8. (I guess my report was
>> ambiguous.)
>>
>> That might take some time. Because bisecting always takes a long time
>> and especially since hitting this WARNING sometimes takes over an hour.
>> Anyhow, please prod me if I stay silent for too long.
>
> 0) So I've lost my courage to do a bisect when my first attempt landed
> me in v4.6-rc3. This is about for issue popping up between v4.7 and
> v4.8-rc1.
>
> 1) So I used the most reliable debugging tool that I actually
> understand: printk():
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index fbcfed63a76e..791de414cf1e 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -14771,10 +14771,16 @@ skl_max_scale(struct intel_crtc *intel_crtc, struct intel_crtc_state *crtc_state
> return DRM_PLANE_HELPER_NO_SCALING;
>
> crtc_clock = crtc_state->base.adjusted_mode.crtc_clock;
> - cdclk = to_intel_atomic_state(crtc_state->base.state)->cdclk;
> + if (WARN_ON_ONCE(!crtc_clock))
> + return DRM_PLANE_HELPER_NO_SCALING;
>
> - if (WARN_ON_ONCE(!crtc_clock || cdclk < crtc_clock))
> + cdclk = to_intel_atomic_state(crtc_state->base.state)->cdclk;
> + if (WARN_ON_ONCE(cdclk < crtc_clock)) {
> + printk(KERN_DEBUG "i915: cdclk < crtc_clock: %d < %d\n", cdclk, crtc_clock);
> return DRM_PLANE_HELPER_NO_SCALING;
> + }
> +
> + printk_ratelimited(KERN_DEBUG "i915: cdclk >= crtc_clock: %d >= %d\n", cdclk, crtc_clock);
>
> /*
> * skl max scale is lower of:
>
> 2) This taught me that crtc_clock always is 373250 on my machine. cdclk
> mostly is 450000, but every now and then it briefly is 337500.
>
> 3) Now the interesting pattern is that cdclk drops to 337500 only after
> two quick calls of skl_max_scale() with cdclk set to 450000, and a
> roughly 300ms pause before the third call of that function. Example:
>
> <7>[23758.501933] i915: cdclk >= crtc_clock: 450000 >= 373250
> <7>[23758.515211] i915: cdclk >= crtc_clock: 450000 >= 373250
> <7>[23758.869057] i915: cdclk < crtc_clock: 337500 < 373250
>
> This pattern of cdclk being 337500 after roughly 300ms is surprisingly
> stable.
>
> 4) So _perhaps_ there's some roughly 300ms timeout, somehow, somewhere,
> that sets cdclk to 337500 and triggers this issue. Ideas?
>
> To be continued,
>
>
> Paul Bolle

--
Jani Nikula, Intel Open Source Technology Center