2008-11-29 07:24:50

by Martin Steigerwald

Subject: hangs with MTRR_SANITIZER? (was: Re: truncated files)


Hi!

CC'd to the linux-kernel mailing list, as this might be something that
goes beyond any possible TuxOnIce or XFS issues. I know I am using
TuxOnIce, which is not part of the mainline kernel. And I am even using an
unofficial patch - which I will use again unchanged for the non
MTRR_SANITIZER kernel, in order to know whether it's the MTRR_SANITIZER
thing. And anyway, before knowing whether it might be MTRR_SANITIZER
related I need to run the non MTRR_SANITIZER kernel for at least a week
and go through quite a few hibernate cycles. If someone else had issues
with MTRR_SANITIZER I would like to hear about it. Also, if someone thinks
I am completely off track in trying to track this down, I would appreciate
a hint.


Links to my posts on xfs and tuxonice-devel mailing lists:

http://oss.sgi.com/pipermail/xfs/2008-November/037399.html

http://lists.tuxonice.net/pipermail/tuxonice-devel/2008-November/000421.html


On Friday 28 November 2008, Martin Steigerwald wrote:
> On Friday 28 November 2008, Martin Steigerwald wrote:
> > On Wednesday 26 November 2008, Dave Chinner wrote:
> > > On Wed, Nov 26, 2008 at 09:49:18AM +0100, Martin Steigerwald wrote:
> > > > On Tuesday 25 November 2008, Dave Chinner wrote:
> > > > > On Tue, Nov 25, 2008 at 10:44:14PM +0100, Martin Steigerwald wrote:
> > > > > > Hi!
> > > > > >
> > > > > > Today on one attempt to hibernate via tuxonice the machine
> > > > > > appeared dead. I am
> > > > >
> > > > > ^^^^^^^^^
> > > > > When (not if) suspend to disk/resume fails, you get to keep all
> > > > > the broken pieces of your filesystem. It works most of the
> > > > > time, but it has some fundamentally broken corner cases that
> > > > > you probably just hit....
> > > >
> > > > Well I use TuxOnIce for a reason! I had uptimes of up to 70 days
> > > > with it already. And they are usually only interrupted by kernel
> > > > updates or manual shutdowns. I was never convinced by in-kernel
> > > > solutions for hibernate.
> > >
> > > Sure, though I'm not convinced that TuxOnIce is any better because
> > > it still uses the same fundamental design as the in-kernel ones.
> >
> > Might be.
> >
> > But something is fishy here. I had it a second time today. This time
> > I know for sure that the machine froze hard. The mouse pointer froze
> > and the machine didn't even respond to a ping anymore. Nothing in the
> > logs - doesn't surprise me.
> >
> > I didn't have this issue with 2.6.26, and I also don't think I had it
> > with 2.6.27.5. I will downgrade to 2.6.27.5 now.
>
> I wonder about those truncated files nonetheless, as I don't think that
> KDE is writing config files all the time. Well, I might be wrong, but I
> didn't even change the KDE configuration around the time of the crash...
> OTOH XFS uses an in-memory inode size and should be safe regarding the
> point in time when it writes the size to disk, as far as I read here.
> Well, this time at least the file "kdeglobals" was affected again, and
> this file might be written rather often.

Okay, it seems I thought about this a bit more in my dreams last night. I
think it even hangs before much of the hibernation has started.

I had this pre-hibernate script:

shambhala:/etc/acpi> bzr cat -r304 hibernate-tuxonice.sh
#!/bin/sh

# Detect changes in the network environment and handle the time server
/etc/init.d/ifplugd stop
ifdown eth0
/etc/init.d/chrony stop

# Good night
hibernate

# Detect changes in the network environment and handle the time server
/etc/init.d/chrony start
/etc/init.d/ifplugd start


Thus it's logical that pinging the machine did not work anymore, since the
first thing the script does is disable the network, in order to reliably
detect changes in the network environment between snapshot cycles.

Then it froze hard even before the desktop was locked. AFAIR not even the
hibernate / suspend LED of my ThinkPad T42 started to blink. Thus I
guess it froze well before any serious hibernation work started. It also
hadn't switched to the console yet. So it might actually fail somewhere
along that path.

My idea, at least, is that it could have to do with:

│ CONFIG_MTRR_SANITIZER:

│ Convert MTRR layout from continuous to discrete, so X drivers can
│ add writeback entries.

│ Can be disabled with disable_mtrr_cleanup on the kernel command line.
│ The largest mtrr entry size for a continous block can be set with
│ mtrr_chunk_size.

│ If unsure, say N.

Especially as some earlier description of this config option adds an
important detail:

> > +config MTRR_SANITIZER
> > + def_bool y
> > + prompt "MTRR cleanup support"
> > + depends on MTRR
> > + help
> > + Convert MTRR layout from continuous to discrete, so some X driver
> > + could add WB entries.
> > +
> > + Say N here if you see bootup problems (boot crash, boot hang,
> > + spontaneous reboots).

This one at least sounds interesting, especially in combination with the
sentence about X drivers. But then it only speaks about boot-related
issues. And here it hung shortly after I pressed Fn-F12 to start a
snapshot cycle.


shambhala:/etc/acpi> lspci -nn | grep -i vga
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50]

It hasn't hung yet on my IBM ThinkPad T23 (SuperSavage) or on my
workstation at work (newer ATI Radeon). So that might be another hint.

On the first occurrence the machine also stopped responding to user input
very early, even before serious hibernation work started.

> > +
> > + could be disabled with disable_mtrr_cleanup. also mtrr_chunk_size

http://lkml.org/lkml/2008/4/28/685

Thus I am compiling 2.6.27.7 without MTRR_SANITIZER support, but otherwise
unchanged, and will test whether those hangs are gone. It didn't happen
with 2.6.27.5 though. That might just be coincidence, or it might hint at
the BIOS corruption prevention patch that came in between 2.6.27.5 and
2.6.27.7. But actually I doubt that the BIOS corruption prevention patch
is at play here.
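
Whether a given kernel has the option built in, and whether the cleanup
was switched off at boot via disable_mtrr_cleanup, can also be checked
without recompiling. A minimal sketch, assuming the distribution installs
the kernel config as /boot/config-<version> (a common convention, not
something stated in this thread); the function names are made up for
illustration:

```shell
#!/bin/sh
# Succeeds if the given kernel config file enables CONFIG_MTRR_SANITIZER.
mtrr_sanitizer_enabled() {
    grep -q '^CONFIG_MTRR_SANITIZER=y' "$1"
}

# Succeeds if the given kernel command line contains disable_mtrr_cleanup
# as a separate word.
mtrr_cleanup_disabled() {
    case " $1 " in
        *" disable_mtrr_cleanup "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example calls (the config path is an assumption, see above):
#   mtrr_sanitizer_enabled "/boot/config-$(uname -r)" && echo "built in"
#   mtrr_cleanup_disabled "$(cat /proc/cmdline)" && echo "disabled at boot"
```

Booting the existing kernel once with disable_mtrr_cleanup would be a
quicker first test than a rebuild, though only the rebuilt kernel rules
the option out completely.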


To avoid the truncated files problem, I will try this:

shambhala:/etc/acpi> cat hibernate-tuxonice.sh
#!/bin/sh

# To be safe, write out all pending changes right at the start
sync

# Detect changes in the network environment and handle the time server
/etc/init.d/ifplugd stop
ifdown eth0
/etc/init.d/chrony stop

# To be safe, write out all pending changes once more here
sync

# Good night
hibernate

# Detect changes in the network environment and handle the time server
/etc/init.d/chrony start
/etc/init.d/ifplugd start

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


2008-11-29 23:02:57

by Martin Steigerwald

Subject: Re: hangs with MTRR_SANITIZER? - no, something else

On Saturday 29 November 2008, Martin Steigerwald wrote:
> Hi!
>
> CC'd to the linux-kernel mailing list, as this might be something that
> goes beyond any possible TuxOnIce or XFS issues. I know I am using
> TuxOnIce, which is not part of the mainline kernel. And I am even using
> an unofficial patch - which I will use again unchanged for the non
> MTRR_SANITIZER kernel, in order to know whether it's the MTRR_SANITIZER
> thing. And anyway, before knowing whether it might be MTRR_SANITIZER
> related I need to run the non MTRR_SANITIZER kernel for at least a week
> and go through quite a few hibernate cycles. If someone else had issues
> with MTRR_SANITIZER I would like to hear about it. Also, if someone
> thinks I am completely off track in trying to track this down, I would
> appreciate a hint.

Ok, it's not MTRR_SANITIZER. It hung again on hibernate, again before any
serious hibernation work had started. I will add debug output to my
pre-hibernate script, as it might already hang in there, maybe while
disabling the network. I want to know whether it hangs before calling the
hibernate script or after it. I think I will go for the latest official
hibernate patch instead of using the unofficial one, although I am not
convinced that it makes much of a difference. Let's see.
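
One way such debug output could look is sketched below. This is only an
illustration, not the script that was actually used: the log path and the
helper names log_step and run_step are assumptions. Each marker is synced
to disk right away so that the last line survives a hard freeze.

```shell
#!/bin/sh
# Hypothetical log location; override with HIBERNATE_DEBUG_LOG if desired.
LOG="${HIBERNATE_DEBUG_LOG:-/var/log/hibernate-debug.log}"

# Append a timestamped marker and flush it to disk immediately.
log_step() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$LOG"
    sync
}

# Run one command, logging a marker before it starts and after it returns.
run_step() {
    log_step "start: $*"
    status=0
    "$@" || status=$?
    log_step "done ($status): $*"
    return "$status"
}

# Example use inside the pre-hibernate script:
#   run_step /etc/init.d/ifplugd stop
#   run_step ifdown eth0
#   run_step /etc/init.d/chrony stop
#   run_step hibernate
```

After a freeze and reboot, a trailing "start:" line without a matching
"done" line names the step that never returned.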

The syncs I added to my pre-hibernate script seem to have helped. The KDE
configuration is intact. As a safeguard I rsync ~/.kde to a backup
directory before hibernating anyway.
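
A minimal sketch of such a backup step, assuming rsync is installed; the
function name backup_dir and the destination path in the example call are
hypothetical, and the exact rsync options used originally are not stated
here:

```shell
#!/bin/sh
# Copy the contents of a source directory into a backup directory.
backup_dir() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The trailing slashes make rsync copy the contents of src into dest.
    # -a preserves permissions, ownership and timestamps.
    rsync -a "$src/" "$dest/"
}

# Example call before hibernating (destination path is an assumption):
#   backup_dir "$HOME/.kde" "$HOME/backup/kde"
```

Note that this keeps a single generation only: a file that is already
truncated when the backup runs would overwrite its good copy.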

Let's see what ideas I come up with to continue this Sherlock Holmes
game ;)

I am puzzled that it only happens on my ThinkPad T42, and neither on the
T23 nor on the Dell workstation - so far.

Goodnight ;-),
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2008-12-08 09:00:50

by Martin Steigerwald

Subject: Re: [TuxOnIce-devel] hangs with MTRR_SANITIZER? - solved

On Sunday, 30 November 2008, Martin Steigerwald wrote:
> On Saturday 29 November 2008, Martin Steigerwald wrote:
> > Hi!
> >
> > CC'd to the linux-kernel mailing list, as this might be something that
> > goes beyond any possible TuxOnIce or XFS issues. I know I am using
> > TuxOnIce, which is not part of the mainline kernel. And I am even using
> > an unofficial patch - which I will use again unchanged for the non
> > MTRR_SANITIZER kernel, in order to know whether it's the MTRR_SANITIZER
> > thing. And anyway, before knowing whether it might be MTRR_SANITIZER
> > related I need to run the non MTRR_SANITIZER kernel for at least a week
> > and go through quite a few hibernate cycles. If someone else had issues
> > with MTRR_SANITIZER I would like to hear about it. Also, if someone
> > thinks I am completely off track in trying to track this down, I would
> > appreciate a hint.
>
> Ok, it's not MTRR_SANITIZER. It hung again on hibernate, again before any
> serious hibernation work had started. I will add debug output to my
> pre-hibernate script, as it might already hang in there, maybe while
> disabling the network. I want to know whether it hangs before calling the
> hibernate script or after it. I think I will go for the latest official
> hibernate patch instead of using the unofficial one, although I am not
> convinced that it makes much of a difference. Let's see.
>
> The syncs I added to my pre-hibernate script seem to have helped. The KDE
> configuration is intact. As a safeguard I rsync ~/.kde to a backup
> directory before hibernating anyway.

Okay... that's solved now.

Conclusions:

1) There was no XFS problem, as the sync I added at the beginning of my
pre-hibernate script avoided the truncated files the one time I still had
the hang. Thus those appear to have been IO in flight.

2) It's not MTRR_SANITIZER, as explained above, nor any other mainline
problem.

3) Instead the problems were gone when I replaced the unofficial TuxOnIce
rc7a 2.6.26 to 2.6.27 forward port patch I used[1] with the official, but
still not officially released, current TuxOnIce patch for 2.6.27[2].

So sorry for the noise. I just learned to prefer official upstream patches,
whether they are officially released or not. I can ask whether they appear
to be stable before trying one. ;)

[1] http://lists.tuxonice.net/pipermail/tuxonice-devel/2008-November/000357.html

[2] http://www.tuxonice.net/downloads/all/current-tuxonice-2.6.27.patch.bz2

Ciao,
--
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90

