Date: Thu, 31 Mar 2016 01:14:30 +0200
From: Florian Zumbiehl <florz@florz.de>
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
        linux-kernel@vger.kernel.org, Daniel Vetter <daniel.vetter@intel.com>,
        Jani Nikula <jani.nikula@linux.intel.com>,
        David Airlie <airlied@linux.ie>
Subject: i915 ERRORs and WARN_ON()s (was: blank screen on boot with
 i915/DRM_FBDEV_EMULATION)
Message-ID: <20160330231430.GC6652@florz.florz.de>
References: <20160326112122.GF13320@florz.florz.de>
 <20160329120118.GW2510@phenom.ffwll.local>
 <20160329164457.GA6652@florz.florz.de>
 <20160330062957.GB2510@phenom.ffwll.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160330062957.GB2510@phenom.ffwll.local>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5914
Lines: 88

Hi,

> We've fixed piles of those in recent kernels, but didn't backport all the
> fixes (since usually it's a silent failure, but it can correlate with
> black screens).

Not quite completely, it seems ...

I have built drm-intel-nightly (f261f82359), and I'm getting this:

| [   15.855007] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
| [   15.855007] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* pipe B underrun
| [   15.855007] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
| [   15.863175] [drm] RC6 disabled, disabling runtime PM support
| [   15.863543] [drm] initialized overlay support
| [   15.933338] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
| [   15.997130] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
| [   16.061856] [drm:intel_set_cpu_fifo_underrun_reporting [i915]] *ERROR* pipe A underrun
| [   16.725274] [drm] Initialized i915 1.6.0 20160330 for 0000:00:02.0 on minor 0
| [   16.805727] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

> > | [ 2520.457732] WARNING: CPU: 0 PID: 3193 at drivers/gpu/drm/i915/i915_gem.c:4508 i915_gem_free_object+0x277/0x280 [i915]()
> > | [ 2520.457736] WARN_ON(obj->frontbuffer_bits)
> 
> Hm, this one should be fixed, and the patches should all be correctly
> marked for stable. Either there's a backlog somewhere, or we failed.
> 
> Would be great if you can test a drm-intel-nightly build (or 4.6-rc1) for
> either and confirm that they're gone. And for the later we really should
> hunt down the bugfix if it's stuck.

| [  141.999803] ------------[ cut here ]------------
| [  141.999916] WARNING: CPU: 0 PID: 3349 at drivers/gpu/drm/i915/i915_gem.c:4568 i915_gem_free_object+0x25f/0x270 [i915]
| [  141.999923] WARN_ON(obj->frontbuffer_bits)
| [  141.999928] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CT iptable_raw xt_nat xt_tcpudp xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables dummy tun nfsd exportfs nfs lockd grace sunrpc ipv6 fbcon bitblit softcursor font loop mousedev i915 i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea snd_intel8x0 drm snd_ac97_codec ac97_bus i2c_core snd_pcm_oss fb snd_mixer_oss fbdev snd_pcm ipw2200 snd_timer snd libipw soundcore lib80211 nsc_ircc thinkpad_acpi cfg80211 pcmcia psmouse sdhci_pci irda uhci_hcd ehci_pci sdhci crc_ccitt ehci_hcd serio_raw e1000 mmc_core nvram evdev usbcore parport_pc yenta_socket hwmon parport pcmcia_rsrc video usb_common pcmcia_core backlight ac battery acpi_cpufreq intel_agp processor button intel_gtt agpgart twofish_generic twofish_i586 twofish_common xts gf128mul dm_crypt dm_mod thermal
| [  142.000114] CPU: 0 PID: 3349 Comm: Xorg Not tainted 4.6.0-rc1+ #1
| [  142.000120] Hardware name: IBM 23716JG/23716JG, BIOS 1UETD3WW (2.08 ) 12/21/2006
| [  142.000127]  c11b8f7a c1037247 f8dea59b c0051dc4 00000d15 f8dda000 000011d8 f8d3ff2f
| [  142.000141]  f8d3ff2f 000011d8 f3ef1dcc f3ef1e30 f3ef1dcc f3ef1dc0 c1037309 00000009
| [  142.000154]  00000000 c0051dac f8dea59b c0051dc4 f8d3ff2f f8dda000 000011d8 f8dea59b
| [  142.000168] Call Trace:
| [  142.000185]  [<c11b8f7a>] ? dump_stack+0xa/0x20
| [  142.000197]  [<c1037247>] ? __warn+0xe7/0x100
| [  142.000269]  [<f8d3ff2f>] ? i915_gem_free_object+0x25f/0x270 [i915]
| [  142.000337]  [<f8d3ff2f>] ? i915_gem_free_object+0x25f/0x270 [i915]
| [  142.000347]  [<c1037309>] ? warn_slowpath_fmt+0x39/0x40
| [  142.000416]  [<f8d3ff2f>] ? i915_gem_free_object+0x25f/0x270 [i915]
| [  142.000452]  [<f892ed83>] ? drm_gem_object_free+0x23/0x40 [drm]
| [  142.000478]  [<f892f58f>] ? drm_gem_object_handle_unreference_unlocked+0xcf/0xe0 [drm]
| [  142.000504]  [<f892f5e7>] ? drm_gem_object_release_handle+0x47/0x90 [drm]
| [  142.000529]  [<f892f67e>] ? drm_gem_handle_delete+0x4e/0x80 [drm]
| [  142.000554]  [<f892f8d0>] ? drm_gem_handle_create+0x30/0x30 [drm]
| [  142.000580]  [<f89302c0>] ? drm_ioctl+0x230/0x570 [drm]
| [  142.000606]  [<f892f8d0>] ? drm_gem_handle_create+0x30/0x30 [drm]
| [  142.000618]  [<c10b34a3>] ? unmap_page_range+0x433/0x530
| [  142.000627]  [<c11be1c3>] ? __rb_erase_color+0xf3/0x250
| [  142.000637]  [<c10b7116>] ? unlink_file_vma+0x36/0x70
| [  142.000645]  [<c10b1db9>] ? tlb_finish_mmu+0x9/0x30
| [  142.000671]  [<f8930090>] ? drm_ioctl_permit+0x80/0x80 [drm]
| [  142.000682]  [<c10e7250>] ? do_vfs_ioctl+0x80/0x6a0
| [  142.000690]  [<c11bf570>] ? timerqueue_del+0x20/0x70
| [  142.000699]  [<c10cbde5>] ? kmem_cache_free+0x95/0xa0
| [  142.000708]  [<c10b6d0e>] ? remove_vma+0x3e/0x50
| [  142.000717]  [<c10b9019>] ? do_munmap+0x219/0x2d0
| [  142.000726]  [<c10e78b3>] ? SyS_ioctl+0x43/0x80
| [  142.000735]  [<c1001272>] ? do_fast_syscall_32+0x82/0x110
| [  142.000745]  [<c134644f>] ? sysenter_past_esp+0x40/0x6a
| [  142.000777] ---[ end trace c0ddddf77cdb5434 ]---

This happens each time an Xv window disappears from view, sometimes with
slight variations in the stack trace. Do you need full debug info or a bunch
more stack traces, or is this enough to get an idea?
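
For comparing those variations, here is a rough sketch of how the traces in
a saved dmesg log could be grouped by their call chain. It is only an
illustration: group_traces is a made-up helper, and it assumes the trace
format shown above, with frames printed as "[<address>] symbol+0xoffset"
between the "cut here" and "end trace" markers.

```python
import re
from collections import Counter

# Matches a call-trace frame such as " [<c11b8f7a>] ? dump_stack+0xa/0x20"
# and captures the symbol name. The "[<address>]" form is an assumption
# based on the trace quoted in this mail.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z0-9_.]+)\+0x")


def group_traces(lines):
    """Count WARN traces by their sequence of function names.

    Addresses, offsets and timestamps are ignored, so two traces that
    differ only in those end up in the same group.
    """
    counts = Counter()
    frames = None  # None while outside a "cut here" ... "end trace" block
    for line in lines:
        if "[ cut here ]" in line:
            frames = []
        elif "[ end trace" in line:
            if frames is not None:
                counts[tuple(frames)] += 1
            frames = None
        elif frames is not None:
            match = FRAME.search(line)
            if match:
                frames.append(match.group(1))
    return counts
```

Feeding it the lines of a saved log, e.g. group_traces(open("dmesg.txt")),
gives one entry per distinct call chain together with how often it occurred,
which should make it easier to tell whether the variations are real or only
differences in the unreliable ("?") frames.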

> > Also, I have occasional X server crashes (every few weeks or so) which
> > started with 4.1.9, I think (I had 3.11.0 before that), and I had some kind
> > of problem with Xv not working anymore until reboot with 4.1.9 which hasn't
> > happened with 4.4.5 yet ... do you think any of those would be worth
> > further investigation? If so, any suggestions as to how to split it all
> > into separate issues/how to go about it?
> 
> No idea about X stuff, not my expertise ;-)

Well, I would guess that a problem that persists until reboot smells like a
kernel/driver bug? Also, IIRC, there was no (major) X server upgrade when I
went from 3.11.0 to 4.1.9, so chances are the crashes are a kernel/driver
bug as well ;-)

Regards, Florian