2011-03-23 16:31:34

by Richard Weinberger

Subject: Corrupted files after suspend to disk

Hi,

I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
running Linux 2.6.37.4.
After resuming from s2disk some files are corrupted.
But when I reboot my netbook everything seems good again.

When I first saw the problem, the ls command always segfaulted.
I did a reboot and it worked again.

A few days later zypper crashed. After a reboot it worked again.
And today ssh crashed. I looked a bit closer and saw it crashed
somewhere within libcrypto.
So I made a copy of libcrypto and rebooted.
After the reboot ssh worked again, but libcrypto and the copy of it had
different sha1 sums!
WTF?!

Is this a known issue?

dmesgs and config are attached.

The distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
(default from suse).
I'm using ext3 as root filesystem.
What else do you need?

--
Thanks,
//richard


Attachments:
dmesg_bad.txt (140.50 kB)
dmesg_good.txt (52.45 kB)
config (121.21 kB)

2011-03-23 20:36:37

by Rafael J. Wysocki

Subject: Re: Corrupted files after suspend to disk

On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> Hi,
>
> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
> running Linux 2.6.37.4.
> After resuming from s2disk some files are corrupted.
> But when I reboot my netbook everything seems good again.
>
> When I saw the problem the first time the ls command segfaulted always.
> I did a reboot and it worked again.
>
> A few days later zypper crashed. After a reboot it worked again.
> And today ssh crashed. I looked a bit closer and saw it crashed
> somewhere within libcrypto.
> So I made copy libcrypto and rebooted.
> After the reboot ssh worked again but libcrypto and the copy of it hat
> a different sha1 sum!
> WTF?!
>
> Is this a known issue?

No.

> dmesgs and config are attached.
>
> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> (default from suse).
> I'm using ext3 as root filesystem.
> What else do you need?

Whatever you can do to narrow down the problem. At the moment I only know
that it's there.

Thanks,
Rafael

2011-03-23 21:49:46

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

2011/3/23 Rafael J. Wysocki <[email protected]>:
> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> Hi,
>>
>> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> running Linux 2.6.37.4.
>> After resuming from s2disk some files are corrupted.
>> But when I reboot my netbook everything seems good again.
>>
>> When I saw the problem the first time the ls command segfaulted always.
>> I did a reboot and it worked again.
>>
>> A few days later zypper crashed. After a reboot it worked again.
>> And today ssh crashed. I looked a bit closer and saw it crashed
>> somewhere within libcrypto.
>> So I made copy libcrypto and rebooted.
>> After the reboot ssh worked again but libcrypto and the copy of it hat
>> a different sha1 sum!
>> WTF?!
>>
>> Is this a known issue?
>
> No.
>
>> dmesgs and config are attached.
>>
>> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> (default from suse).
>> I'm using ext3 as root filesystem.
>> What else do you need?
>
> Whatever you can do to narrow down the problem. At the moment I only know
> that it's there.

I can reproduce the problem now.
After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
It's always a very basic lib like libcrypto, libglib which is used all
the time on my system.
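
A minimal sketch of the kind of check described above, in simplified form: record a SHA-1 for every regular file under a directory, then compare a later scan against that baseline. The names `sha1_of_file`, `build_manifest` and `changed_files` are hypothetical, and this is not how aide itself is implemented.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-1."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks (common in /lib) so each target is hashed only once.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def changed_files(baseline, current):
    """Return sorted paths whose digest differs from, or is missing in, the later scan."""
    return sorted(
        path for path, digest in baseline.items()
        if current.get(path) != digest
    )
```

One possible use is to call `build_manifest("/lib")` once after a clean boot and again after each resume, then pass both results to `changed_files`. The reads are ordinary buffered reads, so they go through the page cache and see the same data that running programs see.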

Maybe it's an issue like this one?
https://lkml.org/lkml/2010/12/2/339

> Thanks,
> Rafael
>

--
Thanks,
//richard

2011-03-23 22:11:27

by Rafael J. Wysocki

Subject: Re: Corrupted files after suspend to disk

On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> 2011/3/23 Rafael J. Wysocki <[email protected]>:
> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> Hi,
> >>
> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
> >> running Linux 2.6.37.4.
> >> After resuming from s2disk some files are corrupted.
> >> But when I reboot my netbook everything seems good again.
> >>
> >> When I saw the problem the first time the ls command segfaulted always.
> >> I did a reboot and it worked again.
> >>
> >> A few days later zypper crashed. After a reboot it worked again.
> >> And today ssh crashed. I looked a bit closer and saw it crashed
> >> somewhere within libcrypto.
> >> So I made copy libcrypto and rebooted.
> >> After the reboot ssh worked again but libcrypto and the copy of it hat
> >> a different sha1 sum!
> >> WTF?!
> >>
> >> Is this a known issue?
> >
> > No.
> >
> >> dmesgs and config are attached.
> >>
> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> >> (default from suse).
> >> I'm using ext3 as root filesystem.
> >> What else do you need?
> >
> > Whatever you can do to narrow down the problem. At the moment I only know
> > that it's there.
>
> I can reproduce the problem now.
> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
> It's always a very basic lib like libcrypto, libglib which is used all
> the time on my system.

Those files are never intentionally modified, right?

> Maybe it's an issue like this one?
> https://lkml.org/lkml/2010/12/2/339

It might have been, if that patch hadn't been merged before 2.6.37.

Is the system 32-bit or 64-bit?

Rafael

2011-03-23 22:16:28

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> Hi,
>> >>
>> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >> running Linux 2.6.37.4.
>> >> After resuming from s2disk some files are corrupted.
>> >> But when I reboot my netbook everything seems good again.
>> >>
>> >> When I saw the problem the first time the ls command segfaulted always.
>> >> I did a reboot and it worked again.
>> >>
>> >> A few days later zypper crashed. After a reboot it worked again.
>> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >> somewhere within libcrypto.
>> >> So I made copy libcrypto and rebooted.
>> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >> a different sha1 sum!
>> >> WTF?!
>> >>
>> >> Is this a known issue?
>> >
>> > No.
>> >
>> >> dmesgs and config are attached.
>> >>
>> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >> (default from suse).
>> >> I'm using ext3 as root filesystem.
>> >> What else do you need?
>> >
>> > Whatever you can do to narrow down the problem. At the moment I only know
>> > that it's there.
>>
>> I can reproduce the problem now.
>> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> It's always a very basic lib like libcrypto, libglib which is used all
>> the time on my system.
>
> Those files are never intentionally modified, right?
>
>> Maybe it's an issue like this one?
>> https://lkml.org/lkml/2010/12/2/339
>
> It might have if that patch hadn't been merged before 2.6.37.
>
> Is the system 32-bit or 64-bit?

It's a 32-bit system.
cmp shows that the corrupted files differ in many bytes (not scattered).
The corrupted bytes are always 0 or 252.
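
A rough sketch of how such a comparison could be summarized, assuming a known-good copy and a corrupted copy are both available as byte strings. The helper names `diff_runs` and `corrupted_values` are hypothetical. The first groups differing bytes into contiguous runs, the second counts which values appear at the differing positions (252 is 0xFC).

```python
from collections import Counter


def diff_runs(good, bad):
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common length of the two inputs is compared.
    """
    runs = []
    start = None
    limit = min(len(good), len(bad))
    for offset in range(limit):
        if good[offset] != bad[offset]:
            if start is None:
                start = offset
        elif start is not None:
            runs.append((start, offset - start))
            start = None
    if start is not None:
        runs.append((start, limit - start))
    return runs


def corrupted_values(good, bad):
    """Count the byte values found in `bad` at positions that differ from `good`."""
    return Counter(b for g, b in zip(good, bad) if g != b)
```

A small number of long runs, rather than many single-byte runs, would match the "not scattered" observation, and the counter shows at a glance whether only 0 and 252 occur.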

> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Thanks,
//richard

2011-03-23 22:22:49

by Rafael J. Wysocki

Subject: Re: Corrupted files after suspend to disk

On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> >> Hi,
> >> >>
> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
> >> >> running Linux 2.6.37.4.
> >> >> After resuming from s2disk some files are corrupted.
> >> >> But when I reboot my netbook everything seems good again.
> >> >>
> >> >> When I saw the problem the first time the ls command segfaulted always.
> >> >> I did a reboot and it worked again.
> >> >>
> >> >> A few days later zypper crashed. After a reboot it worked again.
> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
> >> >> somewhere within libcrypto.
> >> >> So I made copy libcrypto and rebooted.
> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
> >> >> a different sha1 sum!
> >> >> WTF?!
> >> >>
> >> >> Is this a known issue?
> >> >
> >> > No.
> >> >
> >> >> dmesgs and config are attached.
> >> >>
> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> >> >> (default from suse).
> >> >> I'm using ext3 as root filesystem.
> >> >> What else do you need?
> >> >
> >> > Whatever you can do to narrow down the problem. At the moment I only know
> >> > that it's there.
> >>
> >> I can reproduce the problem now.
> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
> >> It's always a very basic lib like libcrypto, libglib which is used all
> >> the time on my system.
> >
> > Those files are never intentionally modified, right?
> >
> >> Maybe it's an issue like this one?
> >> https://lkml.org/lkml/2010/12/2/339
> >
> > It might have if that patch hadn't been merged before 2.6.37.
> >
> > Is the system 32-bit or 64-bit?
>
> It's a 32-bit system.
> cmp shows that the corrupted files differ in many bytes (not scattered).
> The corrupted bytes are always 0 or 252.

Do I understand correctly that the files apparently corrupted after resume
are not corrupted any more when you reboot?

Rafael

2011-03-23 22:30:41

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >> >> running Linux 2.6.37.4.
>> >> >> After resuming from s2disk some files are corrupted.
>> >> >> But when I reboot my netbook everything seems good again.
>> >> >>
>> >> >> When I saw the problem the first time the ls command segfaulted always.
>> >> >> I did a reboot and it worked again.
>> >> >>
>> >> >> A few days later zypper crashed. After a reboot it worked again.
>> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >> >> somewhere within libcrypto.
>> >> >> So I made copy libcrypto and rebooted.
>> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >> >> a different sha1 sum!
>> >> >> WTF?!
>> >> >>
>> >> >> Is this a known issue?
>> >> >
>> >> > No.
>> >> >
>> >> >> dmesgs and config are attached.
>> >> >>
>> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >> >> (default from suse).
>> >> >> I'm using ext3 as root filesystem.
>> >> >> What else do you need?
>> >> >
>> >> > Whatever you can do to narrow down the problem. At the moment I only know
>> >> > that it's there.
>> >>
>> >> I can reproduce the problem now.
>> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> >> It's always a very basic lib like libcrypto, libglib which is used all
>> >> the time on my system.
>> >
>> > Those files are never intentionally modified, right?
>> >
>> >> Maybe it's an issue like this one?
>> >> https://lkml.org/lkml/2010/12/2/339
>> >
>> > It might have if that patch hadn't been merged before 2.6.37.
>> >
>> > Is the system 32-bit or 64-bit?
>>
>> It's a 32-bit system.
>> cmp shows that the corrupted files differ in many bytes (not scattered).
>> The corrupted bytes are always 0 or 252.
>
> Do I understand correctly that the files apparently corrupted after resume
> are not corrupted any more when you reboot?

Yes.
Seems like a cache issue.


> Rafael

--
Thanks,
//richard

2011-03-23 22:34:03

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> Hi,
>> >>
>> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >> running Linux 2.6.37.4.
>> >> After resuming from s2disk some files are corrupted.
>> >> But when I reboot my netbook everything seems good again.
>> >>
>> >> When I saw the problem the first time the ls command segfaulted always.
>> >> I did a reboot and it worked again.
>> >>
>> >> A few days later zypper crashed. After a reboot it worked again.
>> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >> somewhere within libcrypto.
>> >> So I made copy libcrypto and rebooted.
>> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >> a different sha1 sum!
>> >> WTF?!
>> >>
>> >> Is this a known issue?
>> >
>> > No.
>> >
>> >> dmesgs and config are attached.
>> >>
>> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >> (default from suse).
>> >> I'm using ext3 as root filesystem.
>> >> What else do you need?
>> >
>> > Whatever you can do to narrow down the problem. At the moment I only know
>> > that it's there.
>>
>> I can reproduce the problem now.
>> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> It's always a very basic lib like libcrypto, libglib which is used all
>> the time on my system.
>
> Those files are never intentionally modified, right?

Sorry, I've overlooked this question.
Yes, they have never been modified.
I've double checked it.

>> Maybe it's an issue like this one?
>> https://lkml.org/lkml/2010/12/2/339
>
> It might have if that patch hadn't been merged before 2.6.37.
>
> Is the system 32-bit or 64-bit?
>
> Rafael

--
Thanks,
//richard

2011-03-23 23:00:22

by Rafael J. Wysocki

Subject: Re: Corrupted files after suspend to disk

On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
> >> >> >> running Linux 2.6.37.4.
> >> >> >> After resuming from s2disk some files are corrupted.
> >> >> >> But when I reboot my netbook everything seems good again.
> >> >> >>
> >> >> >> When I saw the problem the first time the ls command segfaulted always.
> >> >> >> I did a reboot and it worked again.
> >> >> >>
> >> >> >> A few days later zypper crashed. After a reboot it worked again.
> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
> >> >> >> somewhere within libcrypto.
> >> >> >> So I made copy libcrypto and rebooted.
> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
> >> >> >> a different sha1 sum!
> >> >> >> WTF?!
> >> >> >>
> >> >> >> Is this a known issue?
> >> >> >
> >> >> > No.
> >> >> >
> >> >> >> dmesgs and config are attached.
> >> >> >>
> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> >> >> >> (default from suse).
> >> >> >> I'm using ext3 as root filesystem.
> >> >> >> What else do you need?
> >> >> >
> >> >> > Whatever you can do to narrow down the problem. At the moment I only know
> >> >> > that it's there.
> >> >>
> >> >> I can reproduce the problem now.
> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
> >> >> It's always a very basic lib like libcrypto, libglib which is used all
> >> >> the time on my system.
> >> >
> >> > Those files are never intentionally modified, right?
> >> >
> >> >> Maybe it's an issue like this one?
> >> >> https://lkml.org/lkml/2010/12/2/339
> >> >
> >> > It might have if that patch hadn't been merged before 2.6.37.
> >> >
> >> > Is the system 32-bit or 64-bit?
> >>
> >> It's a 32-bit system.
> >> cmp shows that the corrupted files differ in many bytes (not scattered).
> >> The corrupted bytes are always 0 or 252.
> >
> > Do I understand correctly that the files apparently corrupted after resume
> > are not corrupted any more when you reboot?
>
> Yes.
> Seems like a cache issue.

There are a couple of things you can check before we start asking other people for
help.

First, it would be good to know if things change when you save the image
into a swap file instead of the swap partition you've been using so far
(I believe it's documented quite well how to do that).

Second, please verify if using the built-in save/load hibernate code leads
to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
to verify that).

Of course, please test the above separately. :-)
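
For the second test, a minimal sketch of triggering the built-in hibernate path from a script, assuming it runs as root. The function name `hibernate` is hypothetical, and the `state_path` parameter exists only so the function can be pointed at a scratch file for a dry run. Reading `/sys/power/state` lists the sleep states the kernel supports, so the sketch checks for `disk` before writing it.

```python
def hibernate(state_path="/sys/power/state"):
    """Trigger hibernation through the kernel's built-in code path.

    Equivalent to `echo disk > /sys/power/state`; needs root on a real system.
    """
    with open(state_path) as state:
        supported = state.read().split()
    if "disk" not in supported:
        raise RuntimeError("hibernation ('disk') is not listed in " + state_path)
    # On a real system this write normally returns only after the machine has resumed.
    with open(state_path, "w") as state:
        state.write("disk\n")
```

Calling `hibernate()` in a loop, with a file check after each resume, would be one way to run the roughly 20 cycles that were needed to reproduce the corruption.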

Thanks,
Rafael

2011-03-24 10:16:57

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >> >> >> running Linux 2.6.37.4.
>> >> >> >> After resuming from s2disk some files are corrupted.
>> >> >> >> But when I reboot my netbook everything seems good again.
>> >> >> >>
>> >> >> >> When I saw the problem the first time the ls command segfaulted always.
>> >> >> >> I did a reboot and it worked again.
>> >> >> >>
>> >> >> >> A few days later zypper crashed. After a reboot it worked again.
>> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >> >> >> somewhere within libcrypto.
>> >> >> >> So I made copy libcrypto and rebooted.
>> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >> >> >> a different sha1 sum!
>> >> >> >> WTF?!
>> >> >> >>
>> >> >> >> Is this a known issue?
>> >> >> >
>> >> >> > No.
>> >> >> >
>> >> >> >> dmesgs and config are attached.
>> >> >> >>
>> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >> >> >> (default from suse).
>> >> >> >> I'm using ext3 as root filesystem.
>> >> >> >> What else do you need?
>> >> >> >
>> >> >> > Whatever you can do to narrow down the problem. At the moment I only know
>> >> >> > that it's there.
>> >> >>
>> >> >> I can reproduce the problem now.
>> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> >> >> It's always a very basic lib like libcrypto, libglib which is used all
>> >> >> the time on my system.
>> >> >
>> >> > Those files are never intentionally modified, right?
>> >> >
>> >> >> Maybe it's an issue like this one?
>> >> >> https://lkml.org/lkml/2010/12/2/339
>> >> >
>> >> > It might have if that patch hadn't been merged before 2.6.37.
>> >> >
>> >> > Is the system 32-bit or 64-bit?
>> >>
>> >> It's a 32-bit system.
>> >> cmp shows that the corrupted files differ in many bytes (not scattered).
>> >> The corrupted bytes are always 0 or 252.
>> >
>> > Do I understand correctly that the files apparently corrupted after resume
>> > are not corrupted any more when you reboot?
>>
>> Yes.
>> Seems like a cache issue.
>
> There's a couple things you can check before we start asking other people for
> help.
>
> First, it would be good to know if things change when you save the image
> into a swap file instead of the swap partition you've been using so far
> (I believe it's documented quite well how to do that).
>
> Second, please verify if using the built-in save/load hibernate code leads
> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
> to verify that).
>
> Of course, please test the above separately. :-)

Ok, I'll test this when I'm at home.

BTW: dropping the caches helps, when some files seem corrupted.
Today /usr/bin/okular was broken.
After setting vm.drop_caches=1 it worked again.
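
A small sketch of that check, assuming root privileges: hash a file, drop the page cache (the same effect as setting vm.drop_caches=1), and hash it again. If the digest changes, the earlier read came from stale cached pages rather than from what is on disk. The function names are hypothetical, and `control_path` is a parameter only so the sketch can be tried against a scratch file.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_page_cache(control_path="/proc/sys/vm/drop_caches"):
    """Ask the kernel to drop clean page-cache pages (needs root on the real file)."""
    # Flush dirty pages first; drop_caches only discards clean ones.
    os.sync()
    with open(control_path, "w") as control:
        control.write("1\n")


def digest_changes_after_drop(path, control_path="/proc/sys/vm/drop_caches"):
    """Return True if the file's SHA-1 differs before and after dropping caches."""
    before = sha1_of_file(path)
    drop_page_cache(control_path)
    after = sha1_of_file(path)
    return before != after
```

For example, `digest_changes_after_drop("/usr/bin/okular")` returning True right after a resume would point at the cached copy rather than the on-disk file.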

> Thanks,
> Rafael

--
Thanks,
//richard

2011-03-24 19:37:58

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Thu, Mar 24, 2011 at 11:16 AM, richard -rw- weinberger
<[email protected]> wrote:
> On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
>> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>>> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>>> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>>> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>>> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>>> >> >> >> running Linux 2.6.37.4.
>>> >> >> >> After resuming from s2disk some files are corrupted.
>>> >> >> >> But when I reboot my netbook everything seems good again.
>>> >> >> >>
>>> >> >> >> When I saw the problem the first time the ls command segfaulted always.
>>> >> >> >> I did a reboot and it worked again.
>>> >> >> >>
>>> >> >> >> A few days later zypper crashed. After a reboot it worked again.
>>> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
>>> >> >> >> somewhere within libcrypto.
>>> >> >> >> So I made copy libcrypto and rebooted.
>>> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>>> >> >> >> a different sha1 sum!
>>> >> >> >> WTF?!
>>> >> >> >>
>>> >> >> >> Is this a known issue?
>>> >> >> >
>>> >> >> > No.
>>> >> >> >
>>> >> >> >> dmesgs and config are attached.
>>> >> >> >>
>>> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>>> >> >> >> (default from suse).
>>> >> >> >> I'm using ext3 as root filesystem.
>>> >> >> >> What else do you need?
>>> >> >> >
>>> >> >> > Whatever you can do to narrow down the problem. At the moment I only know
>>> >> >> > that it's there.
>>> >> >>
>>> >> >> I can reproduce the problem now.
>>> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>>> >> >> It's always a very basic lib like libcrypto, libglib which is used all
>>> >> >> the time on my system.
>>> >> >
>>> >> > Those files are never intentionally modified, right?
>>> >> >
>>> >> >> Maybe it's an issue like this one?
>>> >> >> https://lkml.org/lkml/2010/12/2/339
>>> >> >
>>> >> > It might have if that patch hadn't been merged before 2.6.37.
>>> >> >
>>> >> > Is the system 32-bit or 64-bit?
>>> >>
>>> >> It's a 32-bit system.
>>> >> cmp shows that the corrupted files differ in many bytes (not scattered).
>>> >> The corrupted bytes are always 0 or 252.
>>> >
>>> > Do I understand correctly that the files apparently corrupted after resume
>>> > are not corrupted any more when you reboot?
>>>
>>> Yes.
>>> Seems like a cache issue.
>>
>> There's a couple things you can check before we start asking other people for
>> help.
>>
>> First, it would be good to know if things change when you save the image
>> into a swap file instead of the swap partition you've been using so far
>> (I believe it's documented quite well how to do that).
>>
>> Second, please verify if using the built-in save/load hibernate code leads
>> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
>> to verify that).
>>
>> Of course, please test the above separately. :-)
>
> Ok, I'll test this when I'm at home.
>
> BTW: dropping the caches helps, when some files seem corrupted.
> Today /usr/bin/okular was broken.
> After setting vm.drop_caches=1 it worked again.
>
>> Thanks,
>> Rafael
>
> --
> Thanks,
> //richard
>

On Linux 2.6.38 I'm unable to reproduce the issue.
Only 2.6.37 seems to be affected.
So, I'm moving over to 2.6.38. :)

--
Thanks,
//richard

2011-03-24 22:30:21

by Rafael J. Wysocki

Subject: Re: Corrupted files after suspend to disk

On Thursday, March 24, 2011, richard -rw- weinberger wrote:
> On Thu, Mar 24, 2011 at 11:16 AM, richard -rw- weinberger
> <[email protected]> wrote:
> > On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
> >> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >>> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> >>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >>> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> >>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >>> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
> >>> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> >>> >> >> >> Hi,
> >>> >> >> >>
> >>> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
> >>> >> >> >> running Linux 2.6.37.4.
> >>> >> >> >> After resuming from s2disk some files are corrupted.
> >>> >> >> >> But when I reboot my netbook everything seems good again.
> >>> >> >> >>
> >>> >> >> >> When I saw the problem the first time the ls command segfaulted always.
> >>> >> >> >> I did a reboot and it worked again.
> >>> >> >> >>
> >>> >> >> >> A few days later zypper crashed. After a reboot it worked again.
> >>> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
> >>> >> >> >> somewhere within libcrypto.
> >>> >> >> >> So I made copy libcrypto and rebooted.
> >>> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
> >>> >> >> >> a different sha1 sum!
> >>> >> >> >> WTF?!
> >>> >> >> >>
> >>> >> >> >> Is this a known issue?
> >>> >> >> >
> >>> >> >> > No.
> >>> >> >> >
> >>> >> >> >> dmesgs and config are attached.
> >>> >> >> >>
> >>> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> >>> >> >> >> (default from suse).
> >>> >> >> >> I'm using ext3 as root filesystem.
> >>> >> >> >> What else do you need?
> >>> >> >> >
> >>> >> >> > Whatever you can do to narrow down the problem. At the moment I only know
> >>> >> >> > that it's there.
> >>> >> >>
> >>> >> >> I can reproduce the problem now.
> >>> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
> >>> >> >> It's always a very basic lib like libcrypto, libglib which is used all
> >>> >> >> the time on my system.
> >>> >> >
> >>> >> > Those files are never intentionally modified, right?
> >>> >> >
> >>> >> >> Maybe it's an issue like this one?
> >>> >> >> https://lkml.org/lkml/2010/12/2/339
> >>> >> >
> >>> >> > It might have if that patch hadn't been merged before 2.6.37.
> >>> >> >
> >>> >> > Is the system 32-bit or 64-bit?
> >>> >>
> >>> >> It's a 32-bit system.
> >>> >> cmp shows that the corrupted files differ in many bytes (not scattered).
> >>> >> The corrupted bytes are always 0 or 252.
> >>> >
> >>> > Do I understand correctly that the files apparently corrupted after resume
> >>> > are not corrupted any more when you reboot?
> >>>
> >>> Yes.
> >>> Seems like a cache issue.
> >>
> >> There's a couple things you can check before we start asking other people for
> >> help.
> >>
> >> First, it would be good to know if things change when you save the image
> >> into a swap file instead of the swap partition you've been using so far
> >> (I believe it's documented quite well how to do that).
> >>
> >> Second, please verify if using the built-in save/load hibernate code leads
> >> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
> >> to verify that).
> >>
> >> Of course, please test the above separately. :-)
> >
> > Ok, I'll test this when I'm at home.
> >
> > BTW: dropping the caches helps, when some files seem corrupted.
> > Today /usr/bin/okular was broken.
> > After setting vm.drop_caches=1 it worked again.
> >
> >> Thanks,
> >> Rafael
> >
> > --
> > Thanks,
> > //richard
> >
>
> On Linux 2.6.38 I'm unable to reproduce the issue.
> Only 2.6.37 seems to be affected.
> So, I'm moving over to 2.6.38. :)

OK, thanks for the report. :-)

Rafael

2012-02-16 10:52:29

by Richard Weinberger

Subject: Re: Corrupted files after suspend to disk

On Thu, Mar 24, 2011 at 11:30 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, March 24, 2011, richard -rw- weinberger wrote:
>> On Thu, Mar 24, 2011 at 11:16 AM, richard -rw- weinberger
>> <[email protected]> wrote:
>> > On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
>> >> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> >>> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> >> >> Hi,
>> >>> >> >> >>
>> >>> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >>> >> >> >> running Linux 2.6.37.4.
>> >>> >> >> >> After resuming from s2disk some files are corrupted.
>> >>> >> >> >> But when I reboot my netbook everything seems good again.
>> >>> >> >> >>
>> >>> >> >> >> When I saw the problem the first time the ls command segfaulted always.
>> >>> >> >> >> I did a reboot and it worked again.
>> >>> >> >> >>
>> >>> >> >> >> A few days later zypper crashed. After a reboot it worked again.
>> >>> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >>> >> >> >> somewhere within libcrypto.
>> >>> >> >> >> So I made copy libcrypto and rebooted.
>> >>> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >>> >> >> >> a different sha1 sum!
>> >>> >> >> >> WTF?!
>> >>> >> >> >>
>> >>> >> >> >> Is this a known issue?
>> >>> >> >> >
>> >>> >> >> > No.
>> >>> >> >> >
>> >>> >> >> >> dmesgs and config are attached.
>> >>> >> >> >>
>> >>> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >>> >> >> >> (default from suse).
>> >>> >> >> >> I'm using ext3 as root filesystem.
>> >>> >> >> >> What else do you need?
>> >>> >> >> >
>> >>> >> >> > Whatever you can do to narrow down the problem. At the moment I only know
>> >>> >> >> > that it's there.
>> >>> >> >>
>> >>> >> >> I can reproduce the problem now.
>> >>> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> >>> >> >> It's always a very basic lib like libcrypto, libglib which is used all
>> >>> >> >> the time on my system.
>> >>> >> >
>> >>> >> > Those files are never intentionally modified, right?
>> >>> >> >
>> >>> >> >> Maybe it's an issue like this one?
>> >>> >> >> https://lkml.org/lkml/2010/12/2/339
>> >>> >> >
>> >>> >> > It might have if that patch hadn't been merged before 2.6.37.
>> >>> >> >
>> >>> >> > Is the system 32-bit or 64-bit?
>> >>> >>
>> >>> >> It's a 32-bit system.
>> >>> >> cmp shows that the corrupted files differ in many bytes (not scattered).
>> >>> >> The corrupted bytes are always 0 or 252.
>> >>> >
>> >>> > Do I understand correctly that the files apparently corrupted after resume
>> >>> > are not corrupted any more when you reboot?
>> >>>
>> >>> Yes.
>> >>> Seems like a cache issue.
>> >>
>> >> There's a couple things you can check before we start asking other people for
>> >> help.
>> >>
>> >> First, it would be good to know if things change when you save the image
>> >> into a swap file instead of the swap partition you've been using so far
>> >> (I believe it's documented quite well how to do that).
>> >>
>> >> Second, please verify if using the built-in save/load hibernate code leads
>> >> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
>> >> to verify that).
>> >>
>> >> Of course, please test the above separately. :-)
>> >
>> > Ok, I'll test this when I'm at home.
>> >
>> > BTW: dropping the caches helps, when some files seem corrupted.
>> > Today /usr/bin/okular was broken.
>> > After setting vm.drop_caches=1 it worked again.
>> >
>> >> Thanks,
>> >> Rafael
>> >>
>> >
>> > --
>> > Thanks,
>> > //richard
>> >
>>
>> On Linux 2.6.38 I'm unable to reproduce the issue.
>> Only 2.6.37 seems to be affected.
>> So, I'm moving over to 2.6.38. :)
>
> OK, thanks for the report. :-)
>
> Rafael

Bad news:
I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
Now with my shiny new Lenovo x121e I have the same issue! :-(

OpenSUSE 12.1, kernel 3.2.7.
After a few suspend2disk iterations random files are corrupted.
But only cached files. A reboot solves the problem.

--
Thanks,
//richard
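
For anyone trying to reproduce this, one way to notice corruption after a resume is to record checksums before hibernating and compare them afterwards, much like the aide run mentioned earlier in the thread. The following is only a minimal Python sketch of that idea, not what aide does internally; the function names (sha1_of, snapshot, changed_files) are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every regular file under root to its SHA-1 (take this before hibernating)."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha1_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def changed_files(root, baseline):
    """Return the files whose current SHA-1 differs from the baseline (run after resume)."""
    current = snapshot(root)
    return sorted(name for name, digest in baseline.items()
                  if current.get(name) != digest)
```

Typical use would be baseline = snapshot("/lib") before the suspend cycle and changed_files("/lib", baseline) after resume. Files that are legitimately updated in between (package upgrades, for example) would show up as well, so the directory should be one that is not expected to change.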

2012-02-16 16:30:41

by Dave Jones

[permalink] [raw]
Subject: Re: Corrupted files after suspend to disk

On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:

> >> >> Of course, please test the above separately. :-)
> >> >
> >> > Ok, I'll test this when I'm at home.
> >> >
> >> > BTW: dropping the caches helps, when some files seem corrupted.
> >> > Today /usr/bin/okular was broken.
> >> > After setting vm.drop_caches=1 it worked again.
> >>
> >> On Linux 2.6.38 I'm unable to reproduce the issue.
> >> Only 2.6.37 seems to be affected.
> >> So, I'm moving over to 2.6.38. :)
> >
> Bad news:
> I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> Now with my shiny new Lenovo x121e I have the same issue! :-(
>
> OpenSUSE 12.1, kernel 3.2.7.
> After a few suspend2disk iterations random files are corrupted.
> But only cached files. A reboot solves the problem.

FWIW, we've been seeing a number of hard to diagnose failures
with suspend to disk for the last few releases in Fedora.
Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
for a while, but there's no smoking gun that really explains what's
getting into these states. Further complicating things, it
doesn't seem to be 100% reproducible.

Dave

2012-02-16 21:47:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Corrupted files after suspend to disk

On Thursday, February 16, 2012, Dave Jones wrote:
> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>
> > >> >> Of course, please test the above separately. :-)
> > >> >
> > >> > Ok, I'll test this when I'm at home.
> > >> >
> > >> > BTW: dropping the caches helps, when some files seem corrupted.
> > >> > Today /usr/bin/okular was broken.
> > >> > After setting vm.drop_caches=1 it worked again.
> > >>
> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
> > >> Only 2.6.37 seems to be affected.
> > >> So, I'm moving over to 2.6.38. :)
> > >
> > Bad news:
> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> > Now with my shiny new Lenovo x121e I have the same issue! :-(
> >
> > OpenSUSE 12.1, kernel 3.2.7.
> > After a few suspend2disk iterations random files are corrupted.
> > But only cached files. A reboot solves the problem.
>
> FWIW, we've been seeing a number of hard to diagnose failures
> with suspend to disk for the last few releases in Fedora.
> Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> for a while, but there's no smoking gun that really explains what's
> getting into these states. Further complicating things, it
> doesn't seem to be 100% reproducible.

I wonder if that's reproducible with the filesystems freezing patch I posted
some time ago (it will need some rebasing to apply to the current mainline or
3.2.y).

I also think that this problem discovered by Alan Stern may be involved:

http://marc.info/?l=linux-pm&m=132940331030253&w=4

Thanks,
Rafael

2012-02-16 22:26:26

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] Corrupted files after suspend to disk

On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:

> > FWIW, we've been seeing a number of hard to diagnose failures
> > with suspend to disk for the last few releases in Fedora.
> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> > for a while, but there's no smoking gun that really explains what's
> > getting into these states. Further complicating things, it
> > doesn't seem to be 100% reproducible.
>
> I wonder if that's reproducible with the filesystems freezing patch I posted
> some time ago (it will need some rebasing to apply to the current mainline or
> 3.2.y).
>
> I also think that this problem discovered by Alan Stern may be involved:
>
> http://marc.info/?l=linux-pm&m=132940331030253&w=4

Probably not, unless the filesystems in question are on a USB drive.
Still, if anyone wants to test it, there's a patch here:

http://marc.info/?l=linux-pm&m=132941053601190&w=4

Alan Stern

2012-02-16 23:07:44

by Richard Weinberger

[permalink] [raw]
Subject: Re: [linux-pm] Corrupted files after suspend to disk

On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
> On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
>
>> > FWIW, we've been seeing a number of hard to diagnose failures
>> > with suspend to disk for the last few releases in Fedora.
>> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
>> > for a while, but there's no smoking gun that really explains what's
>> > getting into these states. Further complicating things, it
>> > doesn't seem to be 100% reproducible.
>>
>> I wonder if that's reproducible with the filesystems freezing patch I posted
>> some time ago (it will need some rebasing to apply to the current mainline or
>> 3.2.y).

Where can I find this patch?
I'll happily test it.
But it may take some time as the bug is not easy to reproduce.

>> I also think that this problem discovered by Alan Stern may be involved:
>>
>> http://marc.info/?l=linux-pm&m=132940331030253&w=4
>
> Probably not, unless the filesystems in question are on a USB drive.

The filesystems are not on a USB device.

--
Thanks,
//richard

2012-02-16 23:12:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] Corrupted files after suspend to disk

On Friday, February 17, 2012, richard -rw- weinberger wrote:
> On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
> > On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
> >
> >> > FWIW, we've been seeing a number of hard to diagnose failures
> >> > with suspend to disk for the last few releases in Fedora.
> >> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> >> > for a while, but there's no smoking gun that really explains what's
> >> > getting into these states. Further complicating things, it
> >> > doesn't seem to be 100% reproducible.
> >>
> >> I wonder if that's reproducible with the filesystems freezing patch I posted
> >> some time ago (it will need some rebasing to apply to the current mainline or
> >> 3.2.y).
>
> Where can I find this patch?
> I'll happily test it.
> But it may take some time as the bug is not easy to reproduce.

This is the last version posted:

http://marc.info/?l=linux-kernel&m=132775832509351&w=4

However, it may only help if you use the kernel-based hibernation, i.e.
"echo disk > /sys/power/state" (that may be worth testing without the
patch too, but Fedora is using this AFAICS, so it probably has that
problem too).

Thanks,
Rafael

2012-02-16 23:22:51

by Richard Weinberger

[permalink] [raw]
Subject: Re: [linux-pm] Corrupted files after suspend to disk

On Fri, Feb 17, 2012 at 12:16 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, February 17, 2012, richard -rw- weinberger wrote:
>> On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
>> > On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
>> >
>> >> > FWIW, we've been seeing a number of hard to diagnose failures
>> >> > with suspend to disk for the last few releases in Fedora.
>> >> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
>> >> > for a while, but there's no smoking gun that really explains what's
>> >> > getting into these states. Further complicating things, it
>> >> > doesn't seem to be 100% reproducible.
>> >>
>> >> I wonder if that's reproducible with the filesystems freezing patch I posted
>> >> some time ago (it will need some rebasing to apply to the current mainline or
>> >> 3.2.y).
>>
>> Where can I find this patch?
>> I'll happily test it.
>> But it may take some time as the bug is not easy to reproduce.
>
> This is the last version posted:
>
> http://marc.info/?l=linux-kernel&m=132775832509351&w=4
>
> However, it may only help if you use the kernel-based hibernation, i.e.
> "echo disk > /sys/power/state" (that may be worth testing without the
> patch too, but Fedora is using this AFAICS, so it probably has that
> problem too).

Okay, I'll use kernel-based hibernation from now on.
If the problem still occurs I'll apply your patch.

Stay tuned!

--
Thanks,
//richard
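
If the corruption shows up again during this testing, it may help to characterise it the way cmp did earlier in the thread, where the differing bytes were reported to be contiguous rather than scattered and always 0 or 252. Below is a minimal Python sketch for comparing a known-good copy against a suspect one; the function names (diff_runs, bad_byte_values) are hypothetical and this is not a replacement for cmp itself.

```python
from collections import Counter


def diff_runs(good, bad):
    """Return (offset, length) for each contiguous run of differing bytes."""
    runs = []
    start = None
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the shorter input
        runs.append((start, min(len(good), len(bad)) - start))
    return runs


def bad_byte_values(good, bad):
    """Count which byte values the suspect copy holds at the differing offsets."""
    return Counter(b for g, b in zip(good, bad) if g != b)
```

Both functions take bytes objects, e.g. the result of open(path, "rb").read() on each copy. A few long runs would point at whole blocks or pages being wrong, while a histogram dominated by one or two values would match the earlier observation.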

2012-02-16 23:27:35

by Eric Sandeen

[permalink] [raw]
Subject: Re: Corrupted files after suspend to disk

On 2/16/12 8:30 AM, Dave Jones wrote:
> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>
> > >> >> Of course, please test the above separately. :-)
> > >> >
> > >> > Ok, I'll test this when I'm at home.
> > >> >
> > >> > BTW: dropping the caches helps, when some files seem corrupted.
> > >> > Today /usr/bin/okular was broken.
> > >> > After setting vm.drop_caches=1 it worked again.
> > >>
> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
> > >> Only 2.6.37 seems to be affected.
> > >> So, I'm moving over to 2.6.38. :)
> > >
> > Bad news:
> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> > Now with my shiny new Lenovo x121e I have the same issue! :-(
> >
> > OpenSUSE 12.1, kernel 3.2.7.
> > After a few suspend2disk iterations random files are corrupted.
> > But only cached files. A reboot solves the problem.

Just to be clear - you see _data_ corruption in files, but only
until a reboot, and after that they are ok? Ok, reading above
about using drop_caches that sounds like the case.

That sounds different from what I saw in the bug Dave mentions
below, but possibly related root cause, I suppose.

-Eric

> FWIW, we've been seeing a number of hard to diagnose failures
> with suspend to disk for the last few releases in Fedora.
> Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> for a while, but there's no smoking gun that really explains what's
> getting into these states. Further complicating things, it
> doesn't seem to be 100% reproducible.
>
> Dave
>

2012-02-16 23:37:19

by Richard Weinberger

[permalink] [raw]
Subject: Re: Corrupted files after suspend to disk

On Fri, Feb 17, 2012 at 12:27 AM, Eric Sandeen <[email protected]> wrote:
> On 2/16/12 8:30 AM, Dave Jones wrote:
>> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>>
>> > >> >> Of course, please test the above separately. :-)
>> > >> >
>> > >> > Ok, I'll test this when I'm at home.
>> > >> >
>> > >> > BTW: dropping the caches helps, when some files seem corrupted.
>> > >> > Today /usr/bin/okular was broken.
>> > >> > After setting vm.drop_caches=1 it worked again.
>> > >>
>> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
>> > >> Only 2.6.37 seems to be affected.
>> > >> So, I'm moving over to 2.6.38. :)
>> > >
>> > Bad news:
>> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
>> > Now with my shiny new Lenovo x121e I have the same issue! :-(
>> >
>> > OpenSUSE 12.1, kernel 3.2.7.
>> > After a few suspend2disk iterations random files are corrupted.
>> > But only cached files. A reboot solves the problem.
>
> Just to be clear - you see _data_ corruption in files, but only
> until a reboot, and after that they are ok? Ok, reading above
> about using drop_caches that sounds like the case.

Yes.
A reboot always solved the data corruption.
drop_caches solved it in 99% of all cases.

On-disk data was never corrupted.

--
Thanks,
//richard
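
The distinction above (bad data in the page cache, good data on disk) can be checked roughly by re-reading a suspect file after dropping the page cache. The following Python sketch assumes a known-good SHA-1 is available for the file; the names drop_page_cache and classify are made up for illustration, and writing to /proc/sys/vm/drop_caches needs root. A "still-bad" result does not prove on-disk corruption, since dropping the caches did not help in every case here either.

```python
import hashlib
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"  # same knob as the vm.drop_caches sysctl


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_page_cache(control_file=DROP_CACHES):
    """Flush dirty pages, then ask the kernel to drop the clean page cache."""
    os.sync()
    with open(control_file, "w") as fh:
        fh.write("1\n")


def classify(path, known_good_sha1, control_file=DROP_CACHES):
    """Return 'ok', 'cache-only' or 'still-bad' for one suspect file."""
    if sha1_of(path) == known_good_sha1:
        return "ok"
    drop_page_cache(control_file)
    if sha1_of(path) == known_good_sha1:
        return "cache-only"   # the re-read after dropping caches matches again
    return "still-bad"        # may need a reboot, or the disk copy really differs
```

The control_file parameter exists only so the sketch can be tried against a scratch file without root.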