May 7, 2013 09:25:40 PM, Bjorn Helgaas wrote:
> [+cc Phillip]
>
>> I would suspect that Windows' complaint about the BIOS mucking up the MTRRs
>> is likely the best hint. Likely Windows is detecting the problem and fixing
>> it up on resume, thus it only complains about "reduced resume performance".
>> If the MTRRs are messed up, then quite likely parts of RAM have become
>> uncacheable, causing performance to get randomly slaughtered in various
>> ways.
>>
>> From looking at the code it's not clear if we are checking/restoring the
>> MTRR contents after resume. If not, maybe we should be.
>
>I agree; the MTRR warning is a good hint. Artem?
>
>Phillip, I cc'd you because you have similar hardware and your
>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1131468 report is
>slightly similar. Have you seen anything like this "reduced
>performance after resume" issue? If so, can you collect /proc/mtrr
>contents before and after suspending?
>
As Robert Hancock correctly noted, the Linux kernel lacks the code to check
for MTRR changes after resume; I'm not enough of a kernel hacker to write such code ;-)

Likewise, there's no code to see whether RAM pages have become uncacheable,
and I have no idea how to check that either.

According to /proc/mtrr, nothing changes on resume; only Windows detects
the discrepancy between MTRR regions on resume. dmesg contains no warnings
or errors (aside from the usual ACPI SATA warnings, but those happen right at
boot, so I highly doubt the ACPI or SATA layers can be the culprit, since USB
exhibits a similar performance degradation).

In short, there's little to nothing that I can check.

That bug report has nothing to do with my problem: my PC suspends and
resumes more or less correctly, and everything works (albeit some parts don't
work as they should). That person also has a very outdated BIOS (1904, from
08/15/2011). I wouldn't be surprised if a BIOS update solved his problem.

Best regards,
Artem
On Tue, May 7, 2013 at 8:59 AM, Artem S. Tashkinov wrote:
> May 7, 2013 09:25:40 PM, Bjorn Helgaas wrote:
>> [+cc Phillip]
>>
>>> I would suspect that Windows' complaint about the BIOS mucking up the MTRRs
>>> is likely the best hint. Likely Windows is detecting the problem and fixing
>>> it up on resume, thus it only complains about "reduced resume performance".
>>> If the MTRRs are messed up, then quite likely parts of RAM have become
>>> uncacheable, causing performance to get randomly slaughtered in various
>>> ways.
>>>
>>> From looking at the code it's not clear if we are checking/restoring the
>>> MTRR contents after resume. If not, maybe we should be.
>>
>>I agree; the MTRR warning is a good hint. Artem?
>>
>>Phillip, I cc'd you because you have similar hardware and your
>>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1131468 report is
>>slightly similar. Have you seen anything like this "reduced
>>performance after resume" issue? If so, can you collect /proc/mtrr
>>contents before and after suspending?
>>
>
> As Robert Hancock correctly noted, the Linux kernel lacks the code to check
> for MTRR changes after resume; I'm not enough of a kernel hacker to write such code ;-)
>
> Likewise, there's no code to see whether RAM pages have become uncacheable,
> and I have no idea how to check that either.
>
> According to /proc/mtrr, nothing changes on resume; only Windows detects
> the discrepancy between MTRR regions on resume. dmesg contains no warnings
> or errors (aside from the usual ACPI SATA warnings, but those happen right at
> boot, so I highly doubt the ACPI or SATA layers can be the culprit, since USB
> exhibits a similar performance degradation).
>
> In short, there's little to nothing that I can check.
I'm not trying to be ungrateful, but maybe you could actually collect
the info we've asked for and attach it to the bugzilla. It's hard for
me to get excited about digging into this when all I see is "nothing
changes in MTRR" and "it's probably not X." I really need some
concrete data to help rule things out and suggest other things to
investigate.
Maybe we won't be able to make progress on this until other people
start hitting similar issues and we can find patterns.
Bjorn
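For the before/after comparison requested above, a small script can diff two saved copies of /proc/mtrr. This is a minimal Python sketch, assuming the snapshots were taken with something like "cat /proc/mtrr > mtrr-before.txt" before suspending and the same command into mtrr-after.txt after resuming (the file names and the script name mtrr_diff.py are hypothetical). The regular expression assumes the usual "regNN: base=... size=... count=N: type" line layout; any line it does not recognize is skipped rather than reported.

```python
import re
import sys

# Matches /proc/mtrr lines such as:
#   reg00: base=0x000000000 (    0MB), size= 4096MB, count=1: write-back
MTRR_LINE = re.compile(
    r"^reg(?P<reg>\d+):\s*base=(?P<base>0x[0-9a-fA-F]+).*?"
    r"size=\s*(?P<size>\d+[KMG]?B).*?:\s*(?P<type>[\w-]+)\s*$"
)


def parse_mtrr(text):
    """Map register number -> (base address, size string, memory type)."""
    entries = {}
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m:
            entries[int(m.group("reg"))] = (
                int(m.group("base"), 16), m.group("size"), m.group("type"))
    return entries


def diff_mtrr(before, after):
    """List every register whose base, size or type changed, appeared or vanished."""
    changes = []
    for reg in sorted(set(before) | set(after)):
        old, new = before.get(reg), after.get(reg)
        if old != new:
            changes.append(f"reg{reg:02d}: {old} -> {new}")
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names):
    #   python3 mtrr_diff.py mtrr-before.txt mtrr-after.txt
    with open(sys.argv[1]) as f:
        before = parse_mtrr(f.read())
    with open(sys.argv[2]) as f:
        after = parse_mtrr(f.read())
    for change in diff_mtrr(before, after) or ["no differences"]:
        print(change)
```

An empty diff only shows that what /proc/mtrr reports did not change; whether that output reflects the live registers is a separate question.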
On Tue, May 7, 2013 at 9:59 AM, Artem S. Tashkinov wrote:
> May 7, 2013 09:25:40 PM, Bjorn Helgaas wrote:
>> [+cc Phillip]
>>
>>> I would suspect that Windows' complaint about the BIOS mucking up the MTRRs
>>> is likely the best hint. Likely Windows is detecting the problem and fixing
>>> it up on resume, thus it only complains about "reduced resume performance".
>>> If the MTRRs are messed up, then quite likely parts of RAM have become
>>> uncacheable, causing performance to get randomly slaughtered in various
>>> ways.
>>>
>>> From looking at the code it's not clear if we are checking/restoring the
>>> MTRR contents after resume. If not, maybe we should be.
>>
>>I agree; the MTRR warning is a good hint. Artem?
>>
>>Phillip, I cc'd you because you have similar hardware and your
>>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1131468 report is
>>slightly similar. Have you seen anything like this "reduced
>>performance after resume" issue? If so, can you collect /proc/mtrr
>>contents before and after suspending?
>>
>
> As Robert Hancock correctly noted, the Linux kernel lacks the code to check
> for MTRR changes after resume; I'm not enough of a kernel hacker to write such code ;-)
>
> Likewise, there's no code to see whether RAM pages have become uncacheable,
> and I have no idea how to check that either.
>
> According to /proc/mtrr, nothing changes on resume; only Windows detects
> the discrepancy between MTRR regions on resume. dmesg contains no warnings
> or errors (aside from the usual ACPI SATA warnings, but those happen right at
> boot, so I highly doubt the ACPI or SATA layers can be the culprit, since USB
> exhibits a similar performance degradation).
I'm not sure if reading /proc/mtrr actually reads the registers out of
the CPU each time, or whether we just return the cached values we read
out during initial boot-up. If the latter, then this output isn't
really useful as there's no guarantee the values are still intact.
>
> In short, there's little to nothing that I can check.
>
> That bug report has nothing to do with my problem: my PC suspends and
> resumes more or less correctly, and everything works (albeit some parts don't
> work as they should). That person also has a very outdated BIOS (1904, from
> 08/15/2011). I wouldn't be surprised if a BIOS update solved his problem.
>
> Best regards,
>
> Artem
On Tue, May 7, 2013 at 12:05 PM, Robert Hancock wrote:
> On Tue, May 7, 2013 at 9:59 AM, Artem S. Tashkinov wrote:
>> May 7, 2013 09:25:40 PM, Bjorn Helgaas wrote:
>>> [+cc Phillip]
>>>
>>>> I would suspect that Windows' complaint about the BIOS mucking up the MTRRs
>>>> is likely the best hint. Likely Windows is detecting the problem and fixing
>>>> it up on resume, thus it only complains about "reduced resume performance".
>>>> If the MTRRs are messed up, then quite likely parts of RAM have become
>>>> uncacheable, causing performance to get randomly slaughtered in various
>>>> ways.
>>>>
>>>> From looking at the code it's not clear if we are checking/restoring the
>>>> MTRR contents after resume. If not, maybe we should be.
>>>
>>>I agree; the MTRR warning is a good hint. Artem?
>>>
>>>Phillip, I cc'd you because you have similar hardware and your
>>>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1131468 report is
>>>slightly similar. Have you seen anything like this "reduced
>>>performance after resume" issue? If so, can you collect /proc/mtrr
>>>contents before and after suspending?
>>>
>>
>> As Robert Hancock correctly noted, the Linux kernel lacks the code to check
>> for MTRR changes after resume; I'm not enough of a kernel hacker to write such code ;-)
>>
>> Likewise, there's no code to see whether RAM pages have become uncacheable,
>> and I have no idea how to check that either.
>>
>> According to /proc/mtrr, nothing changes on resume; only Windows detects
>> the discrepancy between MTRR regions on resume. dmesg contains no warnings
>> or errors (aside from the usual ACPI SATA warnings, but those happen right at
>> boot, so I highly doubt the ACPI or SATA layers can be the culprit, since USB
>> exhibits a similar performance degradation).
>
> I'm not sure if reading /proc/mtrr actually reads the registers out of
> the CPU each time, or whether we just return the cached values we read
> out during initial boot-up. If the latter, then this output isn't
> really useful as there's no guarantee the values are still intact.
Good point. From what I can tell, on Artem's system with "CPU0:
Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz," we would be using
generic_mtrr_ops, and generic_get_mtrr() appears to read from the
MSRs, so I think it should be useful.
Bjorn
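As an independent cross-check that bypasses /proc/mtrr entirely, the variable-range MTRRs can be read straight from the MSRs via the msr driver (modprobe msr; needs root). This Python sketch assumes the MSR numbers from the Intel SDM (IA32_MTRRCAP at 0xFE, IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn at 0x200 + 2n and 0x201 + 2n) and a 36-bit physical address width; check "address sizes" in /proc/cpuinfo and adjust phys_bits if yours differs. It reads CPU 0 only and ignores the fixed-range MTRRs and IA32_MTRR_DEF_TYPE, so it is a partial view, not a full dump.

```python
import os
import struct

# MSR numbers as given in the Intel SDM (assumed; verify for your CPU).
IA32_MTRRCAP = 0xFE          # bits 7:0 = VCNT, number of variable-range MTRRs
IA32_MTRR_PHYSBASE0 = 0x200  # PHYSBASEn = 0x200 + 2n, PHYSMASKn = 0x201 + 2n
MEM_TYPES = {0: "uncachable", 1: "write-combining", 4: "write-through",
             5: "write-protect", 6: "write-back"}


def read_msr(cpu, msr):
    """Read one 64-bit MSR through the msr driver (root + 'modprobe msr')."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR number.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def decode_variable_mtrr(base_msr, mask_msr, phys_bits=36):
    """Decode one PHYSBASE/PHYSMASK pair; None if the range is not enabled."""
    if not (mask_msr >> 11) & 1:      # PHYSMASK bit 11 is the "valid" bit
        return None
    addr_bits = (1 << phys_bits) - 1
    base = base_msr & addr_bits & ~0xFFF
    mask = mask_msr & addr_bits & ~0xFFF
    size = (~mask & addr_bits) + 1    # contiguous mask -> power-of-two size
    type_code = base_msr & 0xFF       # PHYSBASE bits 7:0 hold the memory type
    return base, size, MEM_TYPES.get(type_code, f"reserved({type_code})")


def dump_variable_mtrrs(cpu=0, phys_bits=36):
    vcnt = read_msr(cpu, IA32_MTRRCAP) & 0xFF
    for n in range(vcnt):
        base_msr = read_msr(cpu, IA32_MTRR_PHYSBASE0 + 2 * n)
        mask_msr = read_msr(cpu, IA32_MTRR_PHYSBASE0 + 2 * n + 1)
        entry = decode_variable_mtrr(base_msr, mask_msr, phys_bits)
        if entry is not None:
            base, size, mem_type = entry
            print(f"reg{n:02d}: base=0x{base:09x} size={size >> 20}MB {mem_type}")


if __name__ == "__main__" and os.access("/dev/cpu/0/msr", os.R_OK):
    dump_variable_mtrrs()
```

Running it before suspend and again after resume, and comparing the two outputs, would show whether the register contents themselves changed.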
On Tue, May 7, 2013 at 10:20 PM, Bjorn Helgaas wrote:
>> I'm not sure if reading /proc/mtrr actually reads the registers out of
>> the CPU each time, or whether we just return the cached values we read
>> out during initial boot-up. If the latter, then this output isn't
>> really useful as there's no guarantee the values are still intact.
>
> Good point. From what I can tell, on Artem's system with "CPU0:
> Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz," we would be using
> generic_mtrr_ops, and generic_get_mtrr() appears to read from the
> MSRs, so I think it should be useful.
FWIW, that motherboard suffers from a PCI to PCIE bridge problem. It might
have been fixed by BIOS upgrades by now, but I'm not sure.

It might also suffer (depending on the revision) from the Sandy Bridge SATA
issue. If it's affected, the SATA controller is a ticking bomb.

I have a P8H67-V motherboard, but I haven't seen any suspend-related issues.

If this is totally unrelated, I'm sorry for wasting your time. I just thought it
might be good to know.
Thanks
Patrik Jakobsson
On Tue, May 7, 2013 at 2:48 PM, Patrik Jakobsson wrote:
> On Tue, May 7, 2013 at 10:20 PM, Bjorn Helgaas wrote:
>>> I'm not sure if reading /proc/mtrr actually reads the registers out of
>>> the CPU each time, or whether we just return the cached values we read
>>> out during initial boot-up. If the latter, then this output isn't
>>> really useful as there's no guarantee the values are still intact.
>>
>> Good point. From what I can tell, on Artem's system with "CPU0:
>> Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz," we would be using
>> generic_mtrr_ops, and generic_get_mtrr() appears to read from the
>> MSRs, so I think it should be useful.
>
> FWIW, that motherboard suffers from a PCI to PCIE bridge problem. It might
> have been fixed by BIOS upgrades by now, but I'm not sure.
>
> It might also suffer (depending on the revision) from the Sandy Bridge SATA
> issue. If it's affected, the SATA controller is a ticking bomb.
>
> I have a P8H67-V motherboard, but I haven't seen any suspend-related issues.
>
> If this is totally unrelated, I'm sorry for wasting your time. I just thought it
> might be good to know.
Thanks for chiming in. I'm not familiar with either of the issues you
mentioned. Do you have any references where I could read up on them?
Artem's system has a PCIe-to-PCI bridge (not a PCI-to-PCIe bridge) at
05:00.0, but it leads to [bus 06] and there's nothing on bus 06, so I
don't think that's the problem.
And the issue affects both USB and a hard drive, so I suspect it's
more than just SATA. Artem, did you identify the PCI devices leading
to your USB and hard drive? I can't remember if I've actually seen
that.
Bjorn
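One way to answer that question is to resolve the sysfs symlinks for the disk and for the USB bus and pick out the PCI addresses on the way to the root; lspci -tv shows the same topology as a tree. A Python sketch, assuming the disk is sda and the USB bus of interest is usb1 (both are only example names; substitute the affected devices):

```python
import os
import re

# sysfs names PCI functions as domain:bus:device.function, e.g. 0000:00:1f.2
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_chain(sysfs_path):
    """PCI functions on a resolved sysfs path, outermost (nearest root) first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]


def pci_chain_for(link):
    # /sys/block/<disk> and /sys/bus/usb/devices/<bus> are symlinks into /sys/devices
    return pci_chain(os.path.realpath(link))


if __name__ == "__main__":
    # Example device names (hypothetical); substitute the actual disk and USB bus.
    for link in ("/sys/block/sda", "/sys/bus/usb/devices/usb1"):
        if os.path.exists(link):
            chain = pci_chain_for(link)
            print(link, "->", " -> ".join(chain) if chain else "no PCI parent")
```

The last address in each chain is the controller itself; anything before it is a bridge or root port the traffic passes through.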
On Wed, May 8, 2013 at 12:02 AM, Bjorn Helgaas wrote:
> On Tue, May 7, 2013 at 2:48 PM, Patrik Jakobsson wrote:
>> On Tue, May 7, 2013 at 10:20 PM, Bjorn Helgaas wrote:
>>>> I'm not sure if reading /proc/mtrr actually reads the registers out of
>>>> the CPU each time, or whether we just return the cached values we read
>>>> out during initial boot-up. If the latter, then this output isn't
>>>> really useful as there's no guarantee the values are still intact.
>>>
>>> Good point. From what I can tell, on Artem's system with "CPU0:
>>> Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz," we would be using
>>> generic_mtrr_ops, and generic_get_mtrr() appears to read from the
>>> MSRs, so I think it should be useful.
>>
>> FWIW, that motherboard suffers from a PCI to PCIE bridge problem. It might
>> have been fixed by BIOS upgrades by now, but I'm not sure.
>>
>> It might also suffer (depending on the revision) from the Sandy Bridge SATA
>> issue. If it's affected, the SATA controller is a ticking bomb.
>>
>> I have a P8H67-V motherboard, but I haven't seen any suspend-related issues.
>>
>> If this is totally unrelated, I'm sorry for wasting your time. I just thought it
>> might be good to know.
>
> Thanks for chiming in. I'm not familiar with either of the issues you
> mentioned. Do you have any references where I could read up on them?
I think this is the official statement from Intel on the SATA issue:
http://newsroom.intel.com/community/intel_newsroom/blog/2011/01/31/intel-identifies-chipset-design-error-implementing-solution
And here's a link to a discussion about the PCIe-to-PCI bridge stuff:
https://lkml.org/lkml/2012/1/30/216
> Artem's system has a PCIe-to-PCI bridge (not a PCI-to-PCIe bridge) at
> 05:00.0, but it leads to [bus 06] and there's nothing on bus 06, so I
> don't think that's the problem.
I meant what you said ;) and yes, it seems unrelated. Both my P8H67 and a
P8P67 I've built behave nicely if nothing is connected.
> And the issue affects both USB and a hard drive, so I suspect it's
> more than just SATA. Artem, did you identify the PCI devices leading
> to your USB and hard drive? I can't remember if I've actually seen
> that.