2002-04-24 03:24:15

by Hong-Gunn Chew

Subject: File corruption when running VMware.

I have a repeatable problem when running VMware workstation 3.00 and
3.01. The cause is still unknown, and could be VMware itself, the
hardware or the kernel.

When running VMware, a file read from disk can be corrupted and will
stay corrupted in memory in the disk cache. It can be reproduced by
checking the md5sum of a large file (>200MB), with different results
each time when VMware is running.
Further tests show the corruption occurs only at the 3rd byte of a
16-byte block, and only the LSB is affected.
Load on the machine is minimal and VMware is at the BIOS setup screen.
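
One way to pin down a pattern like this is to XOR two reads of the same
data and count where the differences fall inside each 16-byte block. The
following is only a minimal sketch of that idea, not the tool used for
the tests above; the names diff_summary, compare_buffers and
print_summary are made up for illustration, and offsets are zero-based,
so the "3rd byte" shows up as offset 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Summary of how two copies of the same data differ (illustrative type). */
struct diff_summary {
    size_t total;          /* number of differing bytes */
    size_t by_offset[16];  /* differing bytes per position in a 16-byte block */
    unsigned char bits;    /* OR of all XOR masks: every bit that ever flipped */
};

/* Compare two buffers of equal length byte by byte. */
struct diff_summary compare_buffers(const unsigned char *a,
                                    const unsigned char *b, size_t len)
{
    struct diff_summary s = {0};
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        unsigned char x = (unsigned char)(a[i] ^ b[i]);
        if (x) {
            s.total++;
            s.by_offset[i % 16]++;  /* position within the 16-byte block */
            s.bits |= x;            /* remember which bits differed */
        }
    }
    return s;
}

/* Print the non-empty counters in a human-readable form. */
void print_summary(const struct diff_summary *s)
{
    int i;

    printf("differing bytes: %zu, flipped bits mask: 0x%02x\n",
           s->total, (unsigned)s->bits);
    for (i = 0; i < 16; i++)
        if (s->by_offset[i])
            printf("  offset %2d in block: %zu\n", i, s->by_offset[i]);
}
```

If the pattern described above holds, by_offset[2] would be the only
nonzero counter and bits would be 0x01.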

Has anyone encountered this problem before? I can provide any
additional information that might be useful.

Cheers,
Hong-Gunn

System configuration:
CPU: P4 2.0A 2.0GHz
RAM: 4x256MB RDRAM PC800
MB: ASUS P4-TE firmware:1005
Intel i850
Disk: IBM Deskstar 120GXP 80GB
Graphics: ATI 7500 OEM

Distri: RedHat 7.2
Kernel: 2.4.18
X: Xfree 4.2
glibc: 2.2.4-19.3


2002-04-24 05:02:16

by Rik van Riel

Subject: Re: File corruption when running VMware.

On Wed, 24 Apr 2002, Hong-Gunn Chew wrote:

> I have a repeatable problem when running VMware workstation 3.00 and
> 3.01. The cause is still unknown, and could be VMware itself, the
> hardware or the kernel.

If you can reproduce it without VMware or with only the
open source part of VMware (ie without any of the binary
only parts) we might have a chance of debugging it.

If it only happens when you're using the binary only
parts of VMware you're probably better off talking to
the VMware people.

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

2002-04-24 10:08:43

by Petr Vandrovec

Subject: Re: File corruption when running VMware.

On 24 Apr 02 at 12:54, Hong-Gunn Chew wrote:

> Further tests show the corruption occurs only at the 3rd byte of a
> 16-byte block, and only the LSB is affected.
> Load on the machine is minimal and VMware is at the BIOS setup screen.

Never seen that, and nobody has reported it. Are you sure that the
vmmon/vmnet modules chosen by vmware-config.pl are the correct ones?
If you experience any problems, you should run 'vmware-config.pl --compile';
it will cause vmmon/vmnet to be built from scratch even if your
kernel looks like one for which a precompiled module is available.

And next question - do you use vmmon/vmnet from VMware, or from my
site?
Petr

2002-04-26 15:31:05

by Petr Vandrovec

Subject: Re: File corruption when running VMware.

On 24 Apr 02 at 2:01, Rik van Riel wrote:
> On Wed, 24 Apr 2002, Hong-Gunn Chew wrote:
>
> > I have a repeatable problem when running VMware workstation 3.00 and
> > 3.01. The cause is still unknown, and could be VMware itself, the
> > hardware or the kernel.
>
> If you can reproduce it without VMware or with only the
> open source part of VMware (ie without any of the binary
> only parts) we might have a chance of debugging it.

Hi again,
one of 2.4.x kernel images available in SuSE's 8.0 has patched&enabled
support for page tables in high memory, and this quickly revealed
incompatibility between VMware's vmmon page table handling and
ptes above directly mapped range.

So if you have >890MB of RAM and your kernel is compiled with support
for pte in high memory, please stop using VMware, or reconfigure your
kernel to not use pte in high memory (4GB config without pte-in-highmem
is OK). Using pte-in-highmem with vmmon will cause kernel oopses and/or
memory corruption :-(

If you do not have >890MB of memory, then the reason for your memory corruption
is still unknown to me.
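
To see which case applies, one can look at the HighTotal line that
highmem-enabled kernels report in /proc/meminfo. The following is a
rough sketch under that assumption; meminfo_kb and has_highmem are
illustrative names, and the text passed in is expected to be the
contents of /proc/meminfo. Note that a nonzero HighTotal only says the
kernel manages high memory. It does not say whether page tables are
placed there, which depends on how the kernel was patched and
configured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the number from a "Key:   12345 kB" line in meminfo-style
 * text, or -1 if no line starts with that key.
 */
long meminfo_kb(const char *text, const char *key)
{
    const char *line = text;
    size_t klen;

    assert(text != NULL && key != NULL);
    klen = strlen(key);
    while (line != NULL && *line != '\0') {
        /* The key must sit at the start of a line, followed by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}

/* Nonzero if the text reports any high memory at all. */
int has_highmem(const char *text)
{
    return meminfo_kb(text, "HighTotal") > 0;
}
```

A machine with 1GB of RAM and a highmem kernel would be expected to
report a HighTotal well above zero, and one booted with a small enough
mem= limit would not.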
Best regards,
Petr Vandrovec

2002-04-26 16:15:02

by Hong-Gunn Chew

Subject: RE: File corruption when running VMware.

Hi Petr,

> Hi again,
> one of 2.4.x kernel images available in SuSE's 8.0 has
> patched&enabled
> support for page tables in high memory, and this quickly
> revealed incompatibility between VMware's vmmon page table
> handling and ptes above directly mapped range.
>
> So if you have >890MB of RAM and your kernel is compiled
> with support for pte in high memory, please stop using
> VMware, or reconfigure your
> kernel to not use pte in high memory (4GB config without
> pte-in-highmem is OK). Using pte-in-highmem with vmmon will
> cause kernel oopses and/or
> memory corruption :-(

I do have 1GB of memory. I will try to reconfigure my kernel
and see if there's still a problem.
Thanks for the info!

Cheers,
Hong-Gunn

2002-04-26 23:01:18

by Andrea Arcangeli

Subject: Re: File corruption when running VMware.

On Fri, Apr 26, 2002 at 05:30:37PM +0200, Petr Vandrovec wrote:
> On 24 Apr 02 at 2:01, Rik van Riel wrote:
> > On Wed, 24 Apr 2002, Hong-Gunn Chew wrote:
> >
> > > I have a repeatable problem when running VMware workstation 3.00 and
> > > 3.01. The cause is still unknown, and could be VMware itself, the
> > > hardware or the kernel.
> >
> > If you can reproduce it without VMware or with only the
> > open source part of VMware (ie without any of the binary
> > only parts) we might have a chance of debugging it.
>
> Hi again,
> one of 2.4.x kernel images available in SuSE's 8.0 has patched&enabled
> support for page tables in high memory, and this quickly revealed
> incompatibility between VMware's vmmon page table handling and
> ptes above directly mapped range.
>
> So if you have >890MB of RAM and your kernel is compiled with support
> for pte in high memory, please stop using VMware, or reconfigure your
> kernel to not use pte in high memory (4GB config without pte-in-highmem
> is OK). Using pte-in-highmem with vmmon will cause kernel oopses and/or

Passing mem=850M to the kernel from lilo at boot will be enough.

> memory corruption :-(
>
> If you do not have >890MB of memory, then the reason for your memory corruption
> is still unknown to me.
> Best regards,
> Petr Vandrovec


Andrea

2002-04-26 23:56:08

by Andrea Arcangeli

Subject: Re: File corruption when running VMware.

On Sat, Apr 27, 2002 at 01:01:34AM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 26, 2002 at 05:30:37PM +0200, Petr Vandrovec wrote:
> > On 24 Apr 02 at 2:01, Rik van Riel wrote:
> > > On Wed, 24 Apr 2002, Hong-Gunn Chew wrote:
> > >
> > > > I have a repeatable problem when running VMware workstation 3.00 and
> > > > 3.01. The cause is still unknown, and could be VMware itself, the
> > > > hardware or the kernel.
> > >
> > > If you can reproduce it without VMware or with only the
> > > open source part of VMware (ie without any of the binary
> > > only parts) we might have a chance of debugging it.
> >
> > Hi again,
> > one of 2.4.x kernel images available in SuSE's 8.0 has patched&enabled
> > support for page tables in high memory, and this quickly revealed
> > incompatibility between VMware's vmmon page table handling and
> > ptes above directly mapped range.
> >
> > So if you have >890MB of RAM and your kernel is compiled with support
> > for pte in high memory, please stop using VMware, or reconfigure your
> > kernel to not use pte in high memory (4GB config without pte-in-highmem
> > is OK). Using pte-in-highmem with vmmon will cause kernel oopses and/or
>
> Passing mem=850M to the kernel from lilo at boot will be enough.

I downloaded your latest driver from your site (vmware-ws-any-update16
package) and I adjusted it this way:

--- vmware-ws-any-update16/vmmon-only/linux/hostif.c.~1~ Sun Mar 31 20:44:35 2002
+++ vmware-ws-any-update16/vmmon-only/linux/hostif.c Sat Apr 27 01:12:50 2002
@@ -176,7 +176,7 @@
unsigned long pagenr;
pgd_t *pgd;
pmd_t *pmd;
- pte_t *pte;
+ pte_t *ptep, pte;

pgd = pgd_offset(current->mm, addr);
if (pgd_none(*pgd))
@@ -184,10 +184,12 @@
pmd = pmd_offset(pgd, addr);
if (pmd_none(*pmd))
return 0;
- pte = pte_offset(pmd, addr);
- if (!pte_present(*pte))
+ ptep = pte_offset_atomic(pmd, addr);
+ pte = *ptep;
+ pte_kunmap(ptep);
+ if (!pte_present(pte))
return 0;
- pagenr = pte_pagenr(*pte);
+ pagenr = pte_pagenr(pte);
return pagenr;
#else
int pdoffset = PFN_2_PDOFF(ppn);
--- vmware-ws-any-update16/vmnet-only/vmnetInt.h.~1~ Sat Mar 23 04:27:54 2002
+++ vmware-ws-any-update16/vmnet-only/vmnetInt.h Sat Apr 27 01:16:43 2002
@@ -96,10 +96,8 @@
#endif


-#ifndef KERNEL_2_5_5
-# define pte_offset_map(_dir, _address) pte_offset(_dir, _address)
-# define pte_unmap(_pte)
-#endif
+# define pte_offset_map(_dir, _address) pte_offset_atomic(_dir, _address)
+# define pte_unmap(_pte) pte_kunmap(_pte);


#ifndef KERNEL_2_4_8


I'm running the patched driver right now with vmware 3.0 workstation
on my main desktop with 1G using 2.4.19-pre7 as kernel (pte-highmem
enabled of course). If I'll find any instability of the host OS I'll let
you know, so far it looks solid.

Hope this helps,

Andrea

2002-04-27 00:00:46

by Andrea Arcangeli

Subject: Re: File corruption when running VMware.

On Sat, Apr 27, 2002 at 01:56:23AM +0200, Andrea Arcangeli wrote:
> [..] using 2.4.19-pre7 [..]

Typo, sorry: s/pre7/pre7aa2/ obviously (the driver patched this way wouldn't
compile against vanilla pre7).

Andrea

2002-04-27 15:36:50

by Andrea Arcangeli

Subject: Re: File corruption when running VMware.

On Sat, Apr 27, 2002 at 01:56:23AM +0200, Andrea Arcangeli wrote:
> enabled of course). If I'll find any instability of the host OS I'll let
> you know, so far it looks solid.

The instability appears only during poweron/resume. I left it running
for a long time and it was solid, but after restarting/stopping it a
few times it still showed stability problems. If the poweron doesn't
reboot the machine, then it is solid (that's why I didn't notice it
yesterday). Also, correcting the #if 0 in the patch or adapting the
lower part doesn't help. The big question is: are those the only two
places touching the pagetables? I also wonder why you're using cr3
instead of the pointer in current->mm; I assume they're different
and that you swap cr3 inside the vmware module during context
switches of tasks?

thanks,

Andrea