2010-06-08 18:12:17

by Yuhong Bao

Subject: Windows side agrees that lowmem corruption is a problem too


Remember the lowmem corruption problems that led to the code that displays this being added to Linux:

AMI BIOS detected: BIOS may corrupt low RAM, working around it.

which was IMO way too broad. Good news, the Windows side agrees that this is a problem too:

http://www.microsoft.com/whdc/system/platform/firmware/mem-corrupt.mspx

Yuhong Bao


2010-06-08 18:28:24

by Yuhong Bao

Subject: RE: Windows side agrees that lowmem corruption is a problem too


Adding mingo and gregkh to CC list.

> Remember the lowmem corruption problems that led to the code that displays this being added to Linux:
> AMI BIOS detected: BIOS may corrupt low RAM, working around it.
> which was IMO way too broad. Good news, the Windows side agrees that this is a problem too:
> http://www.microsoft.com/whdc/system/platform/firmware/mem-corrupt.mspx
>
> Yuhong Bao



2010-06-08 19:06:22

by H. Peter Anvin

Subject: Re: Windows side agrees that lowmem corruption is a problem too

On 06/08/2010 11:28 AM, Yuhong Bao wrote:
>
> Adding mingo and gregkh to CC list.
>> Remember the lowmem corruption problems that led to the code that displays this being added to Linux:
>> AMI BIOS detected: BIOS may corrupt low RAM, working around it.
>> which was IMO way too broad. Good news, the Windows side agrees that this is a problem too:
>> http://www.microsoft.com/whdc/system/platform/firmware/mem-corrupt.mspx
>>
>> Yuhong Bao
>

Hardly "way too broad". I'm starting to think we should enable it
unconditionally, given the number of machines which have exhibited that
problem. As shown in the whitepaper, Vista/Win7 even avoid using
< 1 MB for a lot of things, presumably for this reason.

If it only was suspend, it would be one thing, but from what I've seen
it has been known to happen at other times too (e.g. HDMI cable insertion!)

-hpa

2010-06-08 19:08:54

by Ingo Molnar

Subject: Re: Windows side agrees that lowmem corruption is a problem too


* H. Peter Anvin wrote:

> On 06/08/2010 11:28 AM, Yuhong Bao wrote:
> >
> > Adding mingo and gregkh to CC list.
> >> Remember the lowmem corruption problems that led to the code that displays this being added to Linux:
> >> AMI BIOS detected: BIOS may corrupt low RAM, working around it.
> >> which was IMO way too broad. Good news, the Windows side agrees that this is a problem too:
> >> http://www.microsoft.com/whdc/system/platform/firmware/mem-corrupt.mspx
> >>
> >> Yuhong Bao
> >
>
> Hardly "way too broad". I'm starting to think we should enable it
> unconditionally, given the number of machines which have exhibited that
> problem. As shown in the whitepaper, Vista/Win7 even avoid using
> < 1 MB for a lot of things, presumably for this reason.
>
> If it only was suspend, it would be one thing, but from what I've seen
> it has been known to happen at other times too (e.g. HDMI cable insertion!)

Yep, patterns of some silly OSD bitmap showed up in one of the corruptions -
firmware displaying a 'you inserted a cable' kind of icon somewhere and
messing up the SMM code or so ...

I agree that dis-using <1M by default is probably the sanest option.

Ingo

2010-06-08 19:22:47

by Ondrej Zary

Subject: Re: Windows side agrees that lowmem corruption is a problem too

On Tuesday 08 June 2010 21:08:48 Ingo Molnar wrote:
> * H. Peter Anvin wrote:
> > On 06/08/2010 11:28 AM, Yuhong Bao wrote:
> > > Adding mingo and gregkh to CC list.
> > >
> > >> Remember the lowmem corruption problems that led to the code that
> > >> displays this being added to Linux:
> > >> AMI BIOS detected: BIOS may corrupt low RAM, working around it.
> > >> which was IMO way too broad. Good news, the Windows side agrees that
> > >> this is a problem too:
> > >> http://www.microsoft.com/whdc/system/platform/firmware/mem-corrupt.mspx
> > >>
> > >> Yuhong Bao
> >
> > Hardly "way too broad". I'm starting to think we should enable it
> > unconditionally, given the number of machines which have exhibited that
> > problem. As shown in the whitepaper, Vista/Win7 even avoid using
> > < 1 MB for a lot of things, presumably for this reason.
> >
> > If it only was suspend, it would be one thing, but from what I've seen
> > it has been known to happen at other times too (e.g. HDMI cable
> > insertion!)
>
> Yep, patterns of some silly OSD bitmap showed up in one of the corruptions -
> firmware displaying a 'you inserted a cable' kind of icon somewhere and
> messing up the SMM code or so ...
>
> I agree that dis-using <1M by default is probably the sanest option.

But please limit it to newer systems only (DMI present && year > 200?). There
are many old machines running fine. Losing 1MB from 16MB is a bad thing.

--
Ondrej Zary

2010-06-08 19:31:34

by Yuhong Bao

Subject: RE: Windows side agrees that lowmem corruption is a problem too


> But please limit it to newer systems only (DMI present && year > 200?). There
> are many old machines running fine. Losing 1MB from 16MB is a bad thing.

I'd just check the amount of extended memory available.

Yuhong Bao
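
The two gating ideas floated so far (a DMI date cutoff, or a check on the amount of extended memory) could be sketched roughly as below. This is only an illustration: the function names are hypothetical, and both thresholds are left as parameters because the thread does not give a concrete cutoff year or memory size.

```python
def new_enough_by_dmi(dmi_year, cutoff_year):
    """Date-based gate: DMI must be present and newer than the cutoff.

    dmi_year is None when the machine exposes no DMI BIOS date.
    cutoff_year is a caller-chosen value; the thread leaves it open.
    """
    return dmi_year is not None and dmi_year > cutoff_year


def enough_extended_memory(extended_kib, min_extended_kib):
    """Size-based gate: only give up low memory when there is plenty
    of extended memory (above 1 MB) to make up for it.

    min_extended_kib is a caller-chosen value; the thread leaves it open.
    """
    return extended_kib >= min_extended_kib
```

Either predicate keeps the workaround off on small, old machines such as the 16MB system mentioned above; which one is the better signal is exactly what the thread is debating.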

2010-06-08 20:31:51

by H. Peter Anvin

Subject: Re: Windows side agrees that lowmem corruption is a problem too

On 06/08/2010 12:22 PM, Ondrej Zary wrote:
>>
>> Yep, patterns of some silly OSD bitmap showed up in one of the corruptions -
>> firmware displaying a 'you inserted a cable' kind of icon somewhere and
>> messing up the SMM code or so ...
>>
>> I agree that dis-using <1M by default is probably the sanest option.
>
> But please limit it to newer systems only (DMI present && year > 200?). There
> are many old machines running fine. Losing 1MB from 16MB is a bad thing.
>

Disusing 64K is something we can do unconditionally (especially since
we're only talking about 60K -- 15 pages -- of actually usable memory
anyway.)

Dropping all the low 0.6 MB (which is what it really is) is probably
unacceptable by default, but perhaps it makes sense to use it only for
ZONE_DMA or something.

-hpa
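
As a rough check of the figures in the message above, the arithmetic can be written out as follows. The snippet assumes 4 KiB x86 pages, assumes that the one page missing from the 64K is page 0, and reads "0.6 MB" as the 640 KiB of conventional memory; the thread does not spell out any of these.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB x86 base pages
KIB = 1024
MIB = 1024 * KIB

low_64k = 64 * KIB

# Assumption: page 0 is already off limits, which is one way to get
# from 64K to the "60K -- 15 pages" quoted above.
usable_bytes = low_64k - PAGE_SIZE
usable_pages = usable_bytes // PAGE_SIZE

# Assumption: "0.6 MB" refers to the 640 KiB of conventional memory.
conventional = 640 * KIB
remaining_after_64k = conventional - low_64k

print(usable_bytes // KIB, "KiB usable in", usable_pages, "pages")
print(conventional / MIB, "MiB of conventional memory")
print(remaining_after_64k // KIB, "KiB left above the first 64K")
```

Under these assumptions the 64K reservation costs 15 usable pages, while everything else below 640 KiB amounts to a little over half a megabyte.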

2010-06-08 20:49:34

by Yuhong Bao

Subject: RE: Windows side agrees that lowmem corruption is a problem too


> Disusing 64K is something we can do unconditionally (especially since
> we're only talking about 60K -- 15 pages -- of actually usable memory
> anyway.)

Unless you really have no extended memory, agreed.

> Dropping all the low 0.6 MB (which is what it really is) is probably
> unacceptable by default, but perhaps it makes sense to use it only for
> ZONE_DMA or something.

For example, for really old 8-bit ISA devices that can only address 20 bits of address space and do not use the system 8237 DMA controller.

Yuhong Bao

2010-06-08 21:53:57

by Alan Cox

Subject: Re: Windows side agrees that lowmem corruption is a problem too

> > I agree that dis-using <1M by default is probably the sanest option.
>
> But please limit it to newer systems only (DMI present && year > 200?). There
> are many old machines running fine. Losing 1MB from 16MB is a bad thing.

Losing the low 1MB is a bad thing anyway for things like firmware flashing
and other weird crap that needs low pages (floppy controllers etc).

Losing 64K (but reporting corruption in it in a big scary way) is
probably sensible for distributions, but it's a config item so it's policy
so that wouldn't be a problem.

It has to be painful to the vendors so they get complaints, reports and
support call costs. Otherwise they won't have the correct incentives to
fix their mess.

Alan

2010-06-08 21:58:18

by H. Peter Anvin

Subject: Re: Windows side agrees that lowmem corruption is a problem too

On 06/08/2010 02:56 PM, Alan Cox wrote:
>>> I agree that dis-using <1M by default is probably the sanest option.
>>
>> But please limit it to newer systems only (DMI present && year > 200?). There
>> are many old machines running fine. Losing 1MB from 16MB is a bad thing.
>
> Losing the low 1MB is a bad thing anyway for things like firmware flashing
> and other weird crap that needs low pages (floppy controllers etc).
>
> Losing 64K (but reporting corruption in it in a big scary way) is
> probably sensible for distributions, but it's a config item so it's policy
> so that wouldn't be a problem.
>
> It has to be painful to the vendors so they get complaints, reports and
> support call costs. Otherwise they won't have the correct incentives to
> fix their mess.

We have already functionally lost 64K on all existing machines... I
think the current blacklist covers 90% or more of all systems in
existence, and we keep filling in the few holes that remain.

Adding the remaining half-megabyte of RAM really shouldn't be done
unconditionally, but as I said it could plausibly be reserved for
ZONE_DMA only.

-hpa

2010-06-09 01:08:32

by Yuhong Bao

Subject: RE: Windows side agrees that lowmem corruption is a problem too


>> It has to be painful to the vendors so they get complaints, reports and
>> support call costs. Otherwise they won't have the correct incentives to
>> fix their mess.

Notice that Windows 7 logs an event log entry when this is detected during sleep, even though they don't use the low meg at all.

> We have already functionally lost 64K on all existing machines... I
> think the current blacklist covers 90% or more of all systems in
> existence, and we keep filling in the few holes that remain.

Which was why I said it was too broad.

If you really mean to do it unconditionally, just do it unconditionally.

Yuhong Bao


2010-06-11 01:15:30

by Robert Hancock

Subject: Re: Windows side agrees that lowmem corruption is a problem too

On 06/08/2010 02:31 PM, H. Peter Anvin wrote:
> On 06/08/2010 12:22 PM, Ondrej Zary wrote:
>>>
>>> Yep, patterns of some silly OSD bitmap showed up in one of the corruptions -
>>> firmware displaying a 'you inserted a cable' kind of icon somewhere and
>>> messing up the SMM code or so ...
>>>
>>> I agree that dis-using <1M by default is probably the sanest option.
>>
>> But please limit it to newer systems only (DMI present && year > 200?). There
>> are many old machines running fine. Losing 1MB from 16MB is a bad thing.
>>
>
> Disusing 64K is something we can do unconditionally (especially since
> we're only talking about 60K -- 15 pages -- of actually usable memory
> anyway.)
>
> Dropping all the low 0.6 MB (which is what it really is) is probably
> unacceptable by default, but perhaps it makes sense to use it only for
> ZONE_DMA or something.

According to the document, "Neither Windows Vista nor Windows 7 stores
operating system code and data in the lowest 1 MB of physical memory,
regardless of whether Windows is running on real or virtualized
hardware", so doing the same in general might not be a bad thing (unless
we have less than a certain amount of RAM).

They're also checksumming the low 1MB and writing an event log entry if
corruption is detected after sleep events, so if WHQL tests start
checking for that, maybe these bugs will start going away on new
machines. Of course, on some machines the corruption apparently happens
at other times as well, so who knows...
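
The detection idea described above (checksum the low 1 MB, then re-check after a sleep event) can be sketched roughly as follows. This is an illustration only, not how Windows or Linux implements it: the memory is a simulated bytearray, one CRC32 per 4 KiB page is an arbitrary choice, and the helper names are made up.

```python
import zlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_checksums(mem):
    """Return one CRC32 per PAGE_SIZE chunk of mem."""
    return [zlib.crc32(mem[off:off + PAGE_SIZE])
            for off in range(0, len(mem), PAGE_SIZE)]


def corrupted_pages(before, after):
    """Return the indices of pages whose checksum changed."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


low_mem = bytearray(1024 * 1024)     # simulated low 1 MB, all zeros
baseline = page_checksums(low_mem)   # snapshot taken before "suspend"

low_mem[0x9000] = 0xFF               # simulated firmware scribble

damaged = corrupted_pages(baseline, page_checksums(low_mem))
print("corrupted pages:", damaged)
```

A check like this only catches corruption that happens between the snapshot and the re-check, which is why corruption at other times (such as the cable-insertion case mentioned earlier in the thread) would need periodic re-scanning or outright avoidance of the region.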