2001-11-08 16:41:13

by Frank de Lange

Subject: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

Hi'all...

[ disclaimer: yeah, vmware, yeah tainted kernel, yeah yeah ]

It seems 2.4.14 and vmware 3.0.x don't like each other very much on my SMP (yeah
Abit, yeah yeah I know) box. I've seen several hangs (nothing logged, no
warning, no nothing) using this combination. The same box, running the same
vmware but 2.4.13-ac instead does not complain...

Sooooo.... there seems to be something going on there. As vmware loads its own
kernel modules (licensed under who knows what? The source is available and
hackable), it could be a bug in those modules. Then again, as it does not occur
on the -ac series, it could be in the kernel as well. As there's nothing to be
seen in the logs (it just freezes solid), there's nothing more to report
currently...

Cheers//Frank

--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]


2001-11-08 20:09:03

by Petr Vandrovec

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On 8 Nov 01 at 17:39, Frank de Lange wrote:
>
> It seems 2.4.14 and vmware 3.0.x don't like each other very much on my SMP (yeah
> Abit, yeah yeah I know) box. I've seen several hangs (nothing logged, no
> warning, no nothing) using this combination. The same box, running the same
> vmware but 2.4.13-ac instead does not complain...

Yeah. Use Alan's kernels with VMware. These are the ones I test daily
and for which I can say that they work (== do not use VMware with
2.4.13-ac8, vmmon will not restore the correct %cr2 value under some
conditions; use -ac7 until it is clear whether the non-standard %cr2 usage
is going to stay or not).

> Sooooo.... there seems to be something going on there. As vmware loads its own
> kernel modules (licensed under who knows what? The source is available and
> hackable), it could be a bug in those modules. Then again, as it does not occur
> on the -ac series, it could be in the kernel as well. As there's nothing to be
> seen in the logs (it just freezes solid), there's nothing more to report
> currently...

Is it really a solid freeze (what does alt-sysrq-s,u,s,b do)?
Thanks,
Petr Vandrovec
[email protected]
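
For readers unfamiliar with the sequence Petr asks about: alt-sysrq-s,u,s,b
refers to the kernel's Magic SysRq keys (sync, remount read-only, sync again,
reboot). They only work if the kernel was built with Magic SysRq support and
is still servicing keyboard interrupts, which is why the sequence doubles as a
test of whether the freeze is really solid. Below is a minimal Python sketch
that decodes the sequence; the names SYSRQ_ACTIONS and describe_sysrq are made
up for illustration and are not part of any kernel or VMware interface.

```python
# Magic SysRq command keys, as described in the kernel's sysrq documentation.
# Only the keys used in this thread are listed; this is not the full table.
SYSRQ_ACTIONS = {
    "s": "sync all mounted filesystems",
    "u": "remount all mounted filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting",
}


def describe_sysrq(sequence):
    """Return the action for each key in a comma-separated SysRq sequence."""
    keys = [k.strip().lower() for k in sequence.split(",") if k.strip()]
    return [SYSRQ_ACTIONS.get(k, "not covered by this sketch") for k in keys]


if __name__ == "__main__":
    # Decode the sequence from the message above, one step per line.
    for step, action in enumerate(describe_sysrq("s,u,s,b"), start=1):
        print(f"{step}. {action}")
```

If the machine reacts to none of these keys, the kernel is no longer handling
keyboard input at all, which matches the "solid freeze" described in the
original report.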

2001-11-08 20:38:27

by Frank de Lange

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On Thu, Nov 08, 2001 at 09:08:10PM +0000, Petr Vandrovec wrote:
> Is it really a solid freeze (what does alt-sysrq-s,u,s,b do)?

Solid as a rock, nothing responds anymore. You can sit an elephant on the
keyboard and it won't respond.

Only the big white switch helps (fsck'ing 80 gigs gives me enough time to make
a good cup of coffee... time for ext3 in the main kernel series...)

Have you investigated the problems any further? I mean, does it hang in the
vmware module (probably vmmon as it does not seem to be related to network or
other peripheral activity), or is it somewhere in the main kernel code?

[ maybe I should give up and just install that kernel debugger... ]

Cheers//Frank

2001-11-08 21:24:59

by Petr Vandrovec

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On 8 Nov 01 at 21:36, Frank de Lange wrote:
> On Thu, Nov 08, 2001 at 09:08:10PM +0000, Petr Vandrovec wrote:
> > Is it really a solid freeze (what does alt-sysrq-s,u,s,b do)?
>
> Solid as a rock, nothing responds anymore. You can sit an elephant on the
> keyboard and it won't respond.
>
> Only the big white switch helps (fsck'ing 80 gigs gives me enough time to make
> a good cup of coffee... time for ext3 in the main kernel series...)

Journaling will not do any good in such a case, as a damaged kernel
could write damaged data to your hard disk. You should run a full fsck
after every such lockup even if you are using a journaled filesystem,
unless you are 100% sure that the kernel really stopped doing anything
rather than starting to do strange things.

> Have you investigated the problems any further? I mean, does it hang in the
> vmware module (probably vmmon as it does not seem to be related to network or
> other peripheral activity), or is it somewhere in the main kernel code?

As there are no loops in vmmon, it is mathematically provable that it
did not end up in an endless loop with interrupts disabled inside vmmon ;-)
But it could die anywhere else.

Probably it is time for me to try Linus's kernel, but I have had such perfect
experience with Alan's that I'm a bit reluctant to do that.
Best regards,
Petr Vandrovec
[email protected]

2001-11-08 21:37:29

by Frank de Lange

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On Thu, Nov 08, 2001 at 10:24:05PM +0000, Petr Vandrovec wrote:
> Journaling will not do any good in such a case, as a damaged kernel
> could write damaged data to your hard disk. You should run a full fsck
> after every such lockup even if you are using a journaled filesystem,
> unless you are 100% sure that the kernel really stopped doing anything
> rather than starting to do strange things.

Hmmm, with ext3 that would not help you very much I think, given that the
journal is replayed before the fsck is performed (fsck can replay a journal
file). So if there's garbage in the journal, it might make its way into the
filesystem. Or it might confuse fsck...

I use reiserfs on another disk in the same box, which does not suffer from the
long fsck times, but does put a quite heavy load on the CPU during intense file
operations. Simple tests with ext3 seem to indicate that it suffers less from
this problem.

> Probably it is time for me to try Linus's kernel, but I have had such perfect
> experience with Alan's that I'm a bit reluctant to do that.

Yeah, same here. I decided to try a Linus kernel 'cause of some unexplained and
unwarranted slowdowns - especially in interactive applications.

With mixed results: the slowdowns seem to be gone, but other problems appear
(like the vmware hangs). I'm back to -ac for the moment (running 2.4.13-ac5,
waiting for others to bend or break the IDE patches :-)

Cheers//Frank

2001-11-08 22:35:31

by Petr Vandrovec

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On 8 Nov 01 at 22:24, Frank de Lange wrote:
> > Only the big white switch helps (fsck'ing 80 gigs gives me enough time to make
> > a good cup of coffee... time for ext3 in the main kernel series...)

I tried Win2k and Netware6 (both 128MB) virtual machines running on
top of 2.4.15-pre1. My machine has 256MB RAM, SMP PIII/800. It may
even have worked faster than under Alan's kernel, especially when
I tried to crash it by creating a 256MB Netware6 virtual machine, which
stresses a 256MB system a bit. To my surprise the system still stayed very
responsive, without any crash, of course. Only the usual complaints when
running Netware6 as guest: 'rtc: lost some interrupts at 512Hz.'.

So I was not able to reproduce it. As I said, my system has 256MB of memory,
two PIII/800, 40GB IDE, some tulip-based network card, and in the background
it grabs some TV pictures using two bt878 pieces, so a pretty standard box,
running unstable Debian with XFree 4.1 on mga g450.

There was no other activity during the testing AFAIK.

If you see something different from your box, or from your VMs, tell me.
But adding some SCSI adapter is beyond the PCI slots of my box. I also
assume that you are using the released VMware version, build 1455.
Best regards,
Petr Vandrovec
[email protected]

2001-11-08 22:41:51

by Frank de Lange

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

On Thu, Nov 08, 2001 at 11:34:38PM +0000, Petr Vandrovec wrote:
> If you see something different from your box, or from your VMs, tell me.
> But adding some SCSI adapter is beyond the PCI slots of my box. I also
> assume that you are using the released VMware version, build 1455.

Yeah, using VMware build 1455 on ABit BP-6 with 2 * Celeron 466@466, 768 MB RAM
(dirt cheap nowadays so why not...), 2 * Maxtor 40GB IDE on BX controller, HPT
controller not in use, Matrox G400. I've seen the rtc: blah errors, stressed
the box to its limits with VM's with Linux/WinXP, and every now and then...

it just freezes... (only when using a Linus kernel, and only when using VMware)

I'll try 2.4.15pre, maybe it helps...

Cheers//Frank

2001-11-08 23:13:25

by Alan

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

> and for which I can say that they work (== do not use VMware with
> 2.4.13-ac8, vmmon will not restore the correct %cr2 value under some
> conditions; use -ac7 until it is clear whether the non-standard %cr2 usage
> is going to stay or not).

%cr2 doesn't work out. Don't worry about it.

Alan

2001-11-09 13:56:40

by Alessandro Suardi

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

Frank de Lange wrote:
>
> On Thu, Nov 08, 2001 at 11:34:38PM +0000, Petr Vandrovec wrote:
> > If you see something different from your box, or from your VMs, tell me.
> > But adding some SCSI adapter is beyond the PCI slots of my box. I also
> > assume that you are using the released VMware version, build 1455.
>
> Yeah, using VMware build 1455 on ABit BP-6 with 2 * Celeron 466@466, 768 MB RAM
> (dirt cheap nowadays so why not...), 2 * Maxtor 40GB IDE on BX controller, HPT
> controller not in use, Matrox G400. I've seen the rtc: blah errors, stressed
> the box to its limits with VM's with Linux/WinXP, and every now and then...
>
> it just freezes... (only when using a Linus kernel, and only when using VMware)
>
> I'll try 2.4.15pre, maybe it helps...

2.4.14 + VMware 1455 froze on my laptop the other day: hard hang, no
keyboard, had to keep the power button pressed for several seconds to shut
the box down. Ext3 kept the startup time nicely short, though :)

I can't go much further right now: I zapped the original NT install, which
I hadn't performed properly, but my PageUp/PageDown keys aren't working
anymore (hardware bug, sigh) and I can't scroll past the NT EULA as
it wants PageDown :( Next week I'm getting the keyboard fixed, so I
will resume testing.

In the various attempts I made, though, the full installation went
through twice, whilst I had only 1 freeze.

This is a Dell Latitude CPx750J, 256MB RAM, 20GB IBM disk, Xircom
RBEM56G100TX network card.

--alessandro

"we live as we dream alone / to break the spell we mix with the others
we were not born in isolation / but sometimes it seems that way"
(R.E.M., live intro to 'World Leader Pretend')

2001-11-09 19:27:59

by Todd M. Roy

Subject: Re: hang with 2.4.14 & vmware 3.0.x, anyone else seen this?

As a matter of fact, yes. In fact, I think this was actually the problem
I encountered a few days ago that I reported (I thought it was a loopback
problem). I'm running on a DELL Optiplex GX1 though.

-- todd --

> On Thu, Nov 08, 2001 at 11:34:38PM +0000, Petr Vandrovec wrote:
> > If you see something different from your box, or from your VMs, tell me.
> > But adding some SCSI adapter is beyond the PCI slots of my box. I also
> > assume that you are using the released VMware version, build 1455.
>
> Yeah, using VMware build 1455 on ABit BP-6 with 2 * Celeron 466@466, 768 MB RAM
> (dirt cheap nowadays so why not...), 2 * Maxtor 40GB IDE on BX controller, HPT
> controller not in use, Matrox G400. I've seen the rtc: blah errors, stressed
> the box to its limits with VM's with Linux/WinXP, and every now and then...
>
> it just freezes... (only when using a Linus kernel, and only when using VMware)
>
> I'll try 2.4.15pre, maybe it helps...