Since rebasing our dev trees on v4.15-rc2, a bunch of our systems are failing to resume from S3. I've bisected it to the following commit:
commit ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 (refs/bisect/bad)
Author: Andy Lutomirski <[email protected]>
Date: Wed Nov 22 20:39:16 2017 -0800
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
When we revert this commit on the tip of our tree (https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next), we still observe the issue, so there must be more going on.
I only observe this issue when CONFIG_TRACE_IRQFLAGS is on, but disabling it would also mean disabling things like CONFIG_LOCKDEP and CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which is less than ideal.
Attached are my .config, dmesg.log (up until suspend) and a system configuration log with lshw, lspci, lsmod, and distro info.
I couldn't find a ton of info online when searching for this. The only two things I found were a mention of the same or a similar issue by the Intel IGT guys and a revert from Greg K-H for the 4.14 stable tree:
* https://bugs.freedesktop.org/show_bug.cgi?id=103936
* https://patchwork.kernel.org/patch/10090797/
Has anyone else seen this?
Please let me know if there's anything else I can do to help find the root cause and fix for this.
Thanks,
Harry
> On Jan 5, 2018, at 1:00 PM, Harry Wentland <[email protected]> wrote:
> Has anyone else seen this?
>
> Please let me know if there's anything else I can do to help find the root cause and fix for this.
It's a known issue, and it should be fixed in newer -rc kernels.
On 2018-01-05 04:28 PM, Andy Lutomirski wrote:
> It's a known issue, and it should be fixed in newer -rc kernels.
>
I'm still seeing this on v4.15-rc6. Will I need rc7 or the latest x86 merges in Linus's master?
Thanks,
Harry
> On Jan 5, 2018, at 1:51 PM, Harry Wentland <[email protected]> wrote:
>
>> On 2018-01-05 04:28 PM, Andy Lutomirski wrote:
>> It's a known issue, and it should be fixed in newer -rc kernels.
>>
>
> I'm still seeing this on v4.15-rc6. Will I need rc7 or the latest x86 merges in Linus's master?
>
No, it should be fine in -rc6. I don't suppose you could try bisecting but reverting the native_load_gs_index change on each iteration?
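Andy's suggestion amounts to re-running the bisection while keeping ca37e57bbe0c reverted at every step, so that it cannot mask whatever else is breaking resume. Below is a minimal sketch of a helper for `git bisect run`, assuming Python 3.9+ and a kernel checkout. The script name `bisect_with_revert.py`, its functions, and the test command are all hypothetical; the test command stands in for whatever builds the kernel and checks resume from S3 on a given setup.

```python
"""Hypothetical `git bisect run` helper: test each step with a suspect commit reverted.

Usage (hypothetical):
    git bisect run python3 bisect_with_revert.py <suspect-sha> <test-cmd> [args...]
"""
import subprocess
import sys

# Exit code that tells `git bisect run` to skip the current commit.
SKIP = 125


def revert_plan(suspect: str, is_ancestor: bool) -> list[list[str]]:
    """Return the git commands that neutralize `suspect` at this bisection step.

    If `suspect` is not yet in HEAD's history there is nothing to revert.
    `--no-commit` applies the revert to the working tree and index only,
    so the commit graph being bisected is left untouched.
    """
    if not is_ancestor:
        return []
    return [["git", "revert", "--no-commit", suspect]]


def git_ok(args: list[str]) -> bool:
    """Run a git command and report whether it exited with status 0."""
    return subprocess.run(args).returncode == 0


def bisect_step(suspect: str, test_cmd: list[str]) -> int:
    """Revert `suspect` if present, run the test, and return a bisect exit code."""
    # `git merge-base --is-ancestor A B` exits 0 iff A is an ancestor of B.
    is_ancestor = git_ok(["git", "merge-base", "--is-ancestor", suspect, "HEAD"])
    try:
        for cmd in revert_plan(suspect, is_ancestor):
            if not git_ok(cmd):
                return SKIP  # revert did not apply cleanly; skip this step
        code = subprocess.run(test_cmd).returncode
        # Codes outside 0..127 (including negative "killed by signal" codes)
        # would make `git bisect run` abort, so report them as "bad" instead.
        return code if 0 <= code < 128 else 1
    finally:
        # Discard the temporary revert so `git bisect` can check out the next step.
        subprocess.run(["git", "reset", "--hard", "--quiet", "HEAD"])


# Only act when invoked with arguments (e.g. by `git bisect run`).
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(bisect_step(sys.argv[1], sys.argv[2:]))
```

For example: `git bisect run python3 /path/outside/tree/bisect_with_revert.py ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 ./build-and-test-s3.sh`, where `build-and-test-s3.sh` is a placeholder. Keep the helper outside the kernel tree; `git reset --hard` only discards changes to tracked files, but there is no reason to risk it.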
On 2018-01-05 06:07 PM, Andy Lutomirski wrote:
> No, it should be fine in -rc6. I don't suppose you could try bisecting but reverting the native_load_gs_index change on each iteration?
(fixing up the amd-gfx address)
I'll give that a try on Monday when I'm back at work.
Harry
On 2018-01-06 09:55 AM, Harry Wentland wrote:
> I'll give that a try on Monday when I'm back at work.
Apologies for the very late response. I mostly blame the flu.
Anyway, things are working now. I tested rc5, rc6, and rc8+, and all look good.
I'm not sure why I saw issues with rc6 before; I must've missed something.
Harry
> _______________________________________________
> amd-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>