Hi,
I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
running Linux 2.6.37.4.
After resuming from s2disk some files are corrupted.
But when I reboot my netbook everything seems good again.
The first time I saw the problem, the ls command always segfaulted.
I did a reboot and it worked again.
A few days later zypper crashed. After a reboot it worked again.
And today ssh crashed. I looked a bit closer and saw it crashed
somewhere within libcrypto.
So I made a copy of libcrypto and rebooted.
After the reboot ssh worked again, but libcrypto and the copy of it had
different sha1 sums!
WTF?!
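For anyone who wants to repeat the check described above, here is a minimal sketch of the comparison. It is not taken from the report: the helper names `sha1_of` and `same_content` are made up for illustration, and the paths in the usage comment are placeholders, since the exact library file and the location of the copy are not given.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files hash to the same SHA-1 digest."""
    return sha1_of(path_a) == sha1_of(path_b)


# Hypothetical usage (both paths are placeholders):
#   same_content("/lib/<library file>", "/root/<copy made before reboot>")
```

If the copy was taken while the file looked corrupted and the original reads fine after a reboot, `same_content` would be expected to return False, which matches the differing sums reported here.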
Is this a known issue?
dmesgs and config are attached.
The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
(default from suse).
I'm using ext3 as root filesystem.
What else do you need?
--
Thanks,
//richard
On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> Is this a known issue?
No.
> dmesgs and config are attached.
>
> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
> (default from suse).
> I'm using ext3 as root filesystem.
> What else do you need?
Whatever you can do to narrow down the problem. At the moment I only know
that it's there.
Thanks,
Rafael
2011/3/23 Rafael J. Wysocki <[email protected]>:
> Whatever you can do to narrow down the problem. At the moment I only know
> that it's there.
I can reproduce the problem now.
After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
It's always a very basic lib, like libcrypto or libglib, which is used all
the time on my system.
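The kind of check described above can be approximated without aide. The following is only a rough stand-in, not how aide works internally: it records a SHA-1 digest for every regular file under a directory and, after some suspend/resume cycles, reports the files whose digest changed. The names `snapshot` and `changed_files` are invented for this sketch.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular (non-symlink) file under `root` to its SHA-1 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            digests[path] = digest.hexdigest()
    return digests


def changed_files(before, after):
    """Return the sorted paths present in both snapshots whose digests differ."""
    return sorted(p for p in before if p in after and before[p] != after[p])


# Hypothetical usage: take `before = snapshot("/lib")` on a freshly booted
# system, run the suspend/resume iterations, then compare with
# `changed_files(before, snapshot("/lib"))`.
```

Because the files are never modified on purpose, any path returned by `changed_files` would be a candidate for the corruption described here.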
Maybe it's an issue like this one?
https://lkml.org/lkml/2010/12/2/339
> Thanks,
> Rafael
>
--
Thanks,
//richard
On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> I can reproduce the problem now.
> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
> It's always a very basic lib like libcrypto, libglib which is used all
> the time on my system.
Those files are never intentionally modified, right?
> Maybe it's an issue like this one?
> https://lkml.org/lkml/2010/12/2/339
It might have been, if that patch hadn't been merged before 2.6.37.
Is the system 32-bit or 64-bit?
Rafael
On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> Is the system 32-bit or 64-bit?
It's a 32-bit system.
cmp shows that the corrupted files differ in many bytes, and the differing
bytes are clustered rather than scattered.
The corrupted bytes are always 0 or 252.
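To make the observation above easier to reproduce, here is a small sketch that does roughly what was read off the cmp output: it lists the contiguous runs of differing bytes and counts which byte values appear in the corrupted copy. The function names are hypothetical. Offsets are 0-based here and byte values are reported in decimal, whereas `cmp -l` prints 1-based byte numbers and octal values.

```python
from collections import Counter


def diff_runs(good, bad):
    """Compare two equal-length byte strings.

    Returns (runs, values):
      runs   -- list of (start, end) offsets, end exclusive, of differing bytes
      values -- Counter of the byte values found in `bad` at differing offsets
    """
    if len(good) != len(bad):
        raise ValueError("inputs differ in length")
    runs = []
    values = Counter()
    start = None
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            values[b] += 1
            if start is None:
                start = offset  # a new run of differing bytes begins
        elif start is not None:
            runs.append((start, offset))  # the current run just ended
            start = None
    if start is not None:
        runs.append((start, len(good)))  # run extends to the end of the data
    return runs, values


def diff_files(good_path, bad_path):
    """Read both files fully and compare them with diff_runs()."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    return diff_runs(good, bad)
```

A few long runs together with only one or two distinct values in `values` would correspond to the pattern described in this message.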
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Thanks,
//richard
On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> It's a 32-bit system.
> cmp shows that the corrupted files differ in many bytes (not scattered).
> The corrupted bytes are always 0 or 252.
Do I understand correctly that the files apparently corrupted after resume
are not corrupted any more when you reboot?
Rafael
On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> Do I understand correctly that the files apparently corrupted after resume
> are not corrupted any more when you reboot?
Yes.
Seems like a cache issue.
> Rafael
--
Thanks,
//richard
On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
> Those files are never intentionally modified, right?
Sorry, I overlooked this question.
Yes, they have never been modified.
I've double-checked it.
>> Maybe it's an issue like this one?
>> https://lkml.org/lkml/2010/12/2/339
>
> It might have if that patch hadn't been merged before 2.6.37.
>
> Is the system 32-bit or 64-bit?
>
> Rafael
--
Thanks,
//richard
On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
> Yes.
> Seems like a cache issue.
There are a couple of things you can check before we start asking other people for
help.
First, it would be good to know if things change when you save the image
into a swap file instead of the swap partition you've been using so far
(I believe it's documented quite well how to do that).
Second, please verify if using the built-in save/load hibernate code leads
to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
to verify that).
Of course, please test the above separately. :-)
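For scripted test runs, the second check can also be triggered from a program instead of the shell. The sketch below simply writes "disk" to the same sysfs file as the echo command above. The function names are invented for illustration, root privileges are assumed, and the path is a parameter only so the code can be tried against an ordinary file.

```python
def supports_hibernate(state_path="/sys/power/state"):
    """Return True if "disk" is among the sleep states listed in `state_path`."""
    with open(state_path) as f:
        return "disk" in f.read().split()


def request_hibernate(state_path="/sys/power/state"):
    """Write "disk" to `state_path`, like `echo disk > /sys/power/state`.

    On a real system this needs root, and the write only returns after the
    machine has resumed.
    """
    with open(state_path, "w") as f:
        f.write("disk\n")
```

This goes through the kernel's built-in hibernation code rather than s2disk, which is the point of the comparison being asked for.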
Thanks,
Rafael
On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
> There's a couple things you can check before we start asking other people for
> help.
>
> First, it would be good to know if things change when you save the image
> into a swap file instead of the swap partition you've been using so far
> (I believe it's documented quite well how to do that).
>
> Second, please verify if using the built-in save/load hibernate code leads
> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
> to verify that).
>
> Of course, please test the above separately. :-)
Ok, I'll test this when I'm at home.
BTW: dropping the caches helps when some files seem corrupted.
Today /usr/bin/okular was broken.
After setting vm.drop_caches=1 it worked again.
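The workaround above can be scripted as follows. This is a sketch under assumptions: it needs root, the function name is made up, and the control file path is a parameter only so the code can be exercised against a scratch file. Writing 1 asks the kernel to drop the clean page cache, which is what vm.drop_caches=1 does.

```python
import os


def drop_page_cache(control_path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop the clean page cache."""
    os.sync()  # write dirty pages back first; only clean pages can be dropped
    with open(control_path, "w") as f:
        f.write("1\n")  # 1 = page cache only, as with vm.drop_caches=1
```

If a file hashes differently before and after this call, that would suggest the on-disk data is intact and only the cached pages were bad, in line with the cache theory mentioned earlier in the thread.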
> Thanks,
> Rafael
--
Thanks,
//richard
On Thu, Mar 24, 2011 at 11:16 AM, richard -rw- weinberger
<[email protected]> wrote:
> Ok, I'll test this when I'm at home.
>
> BTW: dropping the caches helps, when some files seem corrupted.
> Today /usr/bin/okular was broken.
> After setting vm.drop_caches=1 it worked again.
>
>> Thanks,
>> Rafael
>
> --
> Thanks,
> //richard
>
On Linux 2.6.38 I'm unable to reproduce the issue.
Only 2.6.37 seems to be affected.
So, I'm moving over to 2.6.38. :)
--
Thanks,
//richard
On Thursday, March 24, 2011, richard -rw- weinberger wrote:
> On Linux 2.6.38 I'm unable to reproduce the issue.
> Only 2.6.37 seems to be affected.
> So, I'm moving over to 2.6.38. :)
OK, thanks for the report. :-)
Rafael
On Thu, Mar 24, 2011 at 11:30 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, March 24, 2011, richard -rw- weinberger wrote:
>> On Thu, Mar 24, 2011 at 11:16 AM, richard -rw- weinberger
>> <[email protected]> wrote:
>> > On Thu, Mar 24, 2011 at 12:00 AM, Rafael J. Wysocki <[email protected]> wrote:
>> >> On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> On Wed, Mar 23, 2011 at 11:22 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> On Wed, Mar 23, 2011 at 11:11 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> >> 2011/3/23 Rafael J. Wysocki <[email protected]>:
>> >>> >> >> > On Wednesday, March 23, 2011, richard -rw- weinberger wrote:
>> >>> >> >> >> Hi,
>> >>> >> >> >>
>> >>> >> >> >> I'm facing a very strange problem on my netbook (Lenovo Ideapad S10)
>> >>> >> >> >> running Linux 2.6.37.4.
>> >>> >> >> >> After resuming from s2disk some files are corrupted.
>> >>> >> >> >> But when I reboot my netbook everything seems good again.
>> >>> >> >> >>
>> >>> >> >> >> When I saw the problem the first time the ls command segfaulted always.
>> >>> >> >> >> I did a reboot and it worked again.
>> >>> >> >> >>
>> >>> >> >> >> A few days later zypper crashed. After a reboot it worked again.
>> >>> >> >> >> And today ssh crashed. I looked a bit closer and saw it crashed
>> >>> >> >> >> somewhere within libcrypto.
>> >>> >> >> >> So I made copy libcrypto and rebooted.
>> >>> >> >> >> After the reboot ssh worked again but libcrypto and the copy of it hat
>> >>> >> >> >> a different sha1 sum!
>> >>> >> >> >> WTF?!
>> >>> >> >> >>
>> >>> >> >> >> Is this a known issue?
>> >>> >> >> >
>> >>> >> >> > No.
>> >>> >> >> >
>> >>> >> >> >> dmesgs and config are attached.
>> >>> >> >> >>
>> >>> >> >> >> The used distribution is openSUSE 11.4 with suspend-0.80.20100129-7.1
>> >>> >> >> >> (default from suse).
>> >>> >> >> >> I'm using ext3 as root filesystem.
>> >>> >> >> >> What else do you need?
>> >>> >> >> >
>> >>> >> >> > Whatever you can do to narrow down the problem. ?At the moment I only know
>> >>> >> >> > that it's there.
>> >>> >> >>
>> >>> >> >> I can reproduce the problem now.
>> >>> >> >> After ~20 suspend and resume iterations aide finds corrupted files in /lib/.
>> >>> >> >> It's always a very basic lib like libcrypto, libglib which is used all
>> >>> >> >> the time on my system.
>> >>> >> >
>> >>> >> > Those files are never intentionally modified, right?
>> >>> >> >
>> >>> >> >> Maybe it's an issue like this one?
>> >>> >> >> https://lkml.org/lkml/2010/12/2/339
>> >>> >> >
>> >>> >> > It might have if that patch hadn't been merged before 2.6.37.
>> >>> >> >
>> >>> >> > Is the system 32-bit or 64-bit?
>> >>> >>
>> >>> >> It's a 32-bit system.
>> >>> >> cmp shows that the corrupted files differ in many bytes (not scattered).
>> >>> >> The corrupted bytes are always 0 or 252.
>> >>> >
>> >>> > Do I understand correctly that the files apparently corrupted after resume
>> >>> > are not corrupted any more when you reboot?
>> >>>
>> >>> Yes.
>> >>> Seems like a cache issue.
>> >>
>> >> There's a couple things you can check before we start asking other people for
>> >> help.
>> >>
>> >> First, it would be good to know if things change when you save the image
>> >> into a swap file instead of the swap partition you've been using so far
>> >> (I believe it's documented quite well how to do that).
>> >>
>> >> Second, please verify if using the built-in save/load hibernate code leads
>> >> to the same issue (you can hibernate by doing "echo disk > /sys/power/state"
>> >> to verify that).
>> >>
>> >> Of course, please test the above separately. :-)
>> >
>> > Ok, I'll test this when I'm at home.
>> >
>> > BTW: dropping the caches helps, when some files seem corrupted.
>> > Today /usr/bin/okular was broken.
>> > After setting vm.drop_caches=1 it worked again.
>> >
>> >> Thanks,
>> >> Rafael
>> >>
>> >
>> > --
>> > Thanks,
>> > //richard
>> >
>>
>> On Linux 2.6.38 I'm unable to reproduce the issue.
>> Only 2.6.37 seems to be affected.
>> So, I'm moving over to 2.6.38. :)
>
> OK, thanks for the report. :-)
>
> Rafael
Bad news:
I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
Now with my shiny new Lenovo x121e I have the same issue! :-(
OpenSUSE 12.1, kernel 3.2.7.
After a few suspend2disk iterations random files are corrupted.
But only cached files. A reboot solves the problem.
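
A minimal sketch of how such corruption could be detected across a hibernation cycle: it records SHA-1 sums for every regular file under a directory and lists the files whose sums changed between two snapshots. The function names and the /lib path in the usage note are only illustrative assumptions, not the tooling used for the report above.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every regular (non-symlink) file under root to its SHA-1 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                try:
                    sums[path] = sha1_of(path)
                except OSError:
                    # Unreadable files are skipped rather than aborting the scan.
                    continue
    return sums


def changed_files(before, after):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

Typical use would be to call snapshot("/lib") before hibernating, call it again after resume, and pass both results to changed_files(). Files that are never intentionally modified should not show up in the result.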
--
Thanks,
//richard
On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
> >> >> Of course, please test the above separately. :-)
> >> >
> >> > Ok, I'll test this when I'm at home.
> >> >
> >> > BTW: dropping the caches helps, when some files seem corrupted.
> >> > Today /usr/bin/okular was broken.
> >> > After setting vm.drop_caches=1 it worked again.
> >>
> >> On Linux 2.6.38 I'm unable to reproduce the issue.
> >> Only 2.6.37 seems to be affected.
> >> So, I'm moving over to 2.6.38. :)
> >
> Bad news:
> I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> Now with my shiny new Lenovo x121e I have the same issue! :-(
>
> OpenSUSE 12.1, kernel 3.2.7.
> After a few suspend2disk iterations random files are corrupted.
> But only cached files. A reboot solves the problem.
FWIW, we've been seeing a number of hard to diagnose failures
with suspend to disk for the last few releases in Fedora.
Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
for a while, but there's no smoking gun that really explains what's
getting into these states. Further complicating things, is that it
doesn't seem to be 100% reproducible.
Dave
On Thursday, February 16, 2012, Dave Jones wrote:
> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>
> > >> >> Of course, please test the above separately. :-)
> > >> >
> > >> > Ok, I'll test this when I'm at home.
> > >> >
> > >> > BTW: dropping the caches helps, when some files seem corrupted.
> > >> > Today /usr/bin/okular was broken.
> > >> > After setting vm.drop_caches=1 it worked again.
> > >>
> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
> > >> Only 2.6.37 seems to be affected.
> > >> So, I'm moving over to 2.6.38. :)
> > >
> > Bad news:
> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> > Now with my shiny new Lenovo x121e I have the same issue! :-(
> >
> > OpenSUSE 12.1, kernel 3.2.7.
> > After a few suspend2disk iterations random files are corrupted.
> > But only cached files. A reboot solves the problem.
>
> FWIW, we've been seeing a number of hard to diagnose failures
> with suspend to disk for the last few releases in Fedora.
> Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> for a while, but there's no smoking gun that really explains what's
> getting into these states. Further complicating things, is that it
> doesn't seem to be 100% reproducible.
I wonder if that's reproducible with the filesystems freezing patch I posted
some time ago (it will need some rebasing to apply to the current mainline or
3.2.y).
I also think that this problem discovered by Alan Stern may be involved:
http://marc.info/?l=linux-pm&m=132940331030253&w=4
Thanks,
Rafael
On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
> > FWIW, we've been seeing a number of hard to diagnose failures
> > with suspend to disk for the last few releases in Fedora.
> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> > for a while, but there's no smoking gun that really explains what's
> > getting into these states. Further complicating things, is that it
> > doesn't seem to be 100% reproducible.
>
> I wonder if that's reproducible with the filesystems freezing patch I posted
> some time ago (it will need some rebasing to apply to the current mainline or
> 3.2.y).
>
> I also think that this problem discovered by Alan Stern may be involved:
>
> http://marc.info/?l=linux-pm&m=132940331030253&w=4
Probably not, unless the filesystems in question are on a USB drive.
Still, if anyone wants to test it, there's a patch here:
http://marc.info/?l=linux-pm&m=132941053601190&w=4
Alan Stern
On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
> On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
>
>> > FWIW, we've been seeing a number of hard to diagnose failures
>> > with suspend to disk for the last few releases in Fedora.
>> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
>> > for a while, but there's no smoking gun that really explains what's
>> > getting into these states. Further complicating things, is that it
>> > doesn't seem to be 100% reproducible.
>>
>> I wonder if that's reproducible with the filesystems freezing patch I posted
>> some time ago (it will need some rebasing to apply to the current mainline or
>> 3.2.y).
Where can I find this patch?
I'll happily test it.
But it may take some time as the bug is not easy to reproduce.
>> I also think that this problem discovered by Alan Stern may be involved:
>>
>> http://marc.info/?l=linux-pm&m=132940331030253&w=4
>
> Probably not, unless the filesystems in question are on a USB drive.
The filesystems are not on a USB device.
--
Thanks,
//richard
On Friday, February 17, 2012, richard -rw- weinberger wrote:
> On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
> > On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
> >
> >> > FWIW, we've been seeing a number of hard to diagnose failures
> >> > with suspend to disk for the last few releases in Fedora.
> >> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> >> > for a while, but there's no smoking gun that really explains what's
> >> > getting into these states. Further complicating things, is that it
> >> > doesn't seem to be 100% reproducible.
> >>
> >> I wonder if that's reproducible with the filesystems freezing patch I posted
> >> some time ago (it will need some rebasing to apply to the current mainline or
> >> 3.2.y).
>
> Where can I find this patch?
> I'll happily test it.
> But it may take some time as the bug is not easy to reproduce.
This is the last version posted:
http://marc.info/?l=linux-kernel&m=132775832509351&w=4
However, it may only help if you use the kernel-based hibernation, i.e.
"echo disk > /sys/power/state" (that may be worth testing without the
patch too, but Fedora is using this AFAICS, so it probably has that
problem too).
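
For scripted test loops, the same thing can be done from Python. This is only a sketch under assumptions: the helper names are made up for illustration, it needs root on a real system, and the write does not return until the machine has resumed (or the hibernation attempt has failed).

```python
def hibernation_supported(state_path="/sys/power/state"):
    """Return True if "disk" is among the sleep states listed by the kernel."""
    with open(state_path) as fh:
        return "disk" in fh.read().split()


def hibernate(state_path="/sys/power/state"):
    """Request kernel-based hibernation, like: echo disk > /sys/power/state"""
    with open(state_path, "w") as fh:
        fh.write("disk\n")
```

The state_path parameter is there so that the helpers can be pointed at an ordinary file when trying them out without actually hibernating the machine.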
Thanks,
Rafael
On Fri, Feb 17, 2012 at 12:16 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, February 17, 2012, richard -rw- weinberger wrote:
>> On Thu, Feb 16, 2012 at 11:26 PM, Alan Stern <[email protected]> wrote:
>> > On Thu, 16 Feb 2012, Rafael J. Wysocki wrote:
>> >
>> >> > FWIW, we've been seeing a number of hard to diagnose failures
>> >> > with suspend to disk for the last few releases in Fedora.
>> >> > Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
>> >> > for a while, but there's no smoking gun that really explains what's
>> >> > getting into these states. Further complicating things, is that it
>> >> > doesn't seem to be 100% reproducible.
>> >>
>> >> I wonder if that's reproducible with the filesystems freezing patch I posted
>> >> some time ago (it will need some rebasing to apply to the current mainline or
>> >> 3.2.y).
>>
>> Where can I find this patch?
>> I'll happily test it.
>> But it may take some time as the bug is not easy to reproduce.
>
> This is the last version posted:
>
> http://marc.info/?l=linux-kernel&m=132775832509351&w=4
>
> However, it may only help if you use the kernel-based hibernation, i.e.
> "echo disk > /sys/power/state" (that may be worth testing without the
> patch too, but Fedora is using this AFAICS, so it probably has that
> problem too).
Okay, I'll use kernel-based hibernation from now on.
If the problem still occurs I'll apply your patch.
Stay tuned!
--
Thanks,
//richard
On 2/16/12 8:30 AM, Dave Jones wrote:
> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>
> > >> >> Of course, please test the above separately. :-)
> > >> >
> > >> > Ok, I'll test this when I'm at home.
> > >> >
> > >> > BTW: dropping the caches helps, when some files seem corrupted.
> > >> > Today /usr/bin/okular was broken.
> > >> > After setting vm.drop_caches=1 it worked again.
> > >>
> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
> > >> Only 2.6.37 seems to be affected.
> > >> So, I'm moving over to 2.6.38. :)
> > >
> > Bad news:
> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
> > Now with my shiny new Lenovo x121e I have the same issue! :-(
> >
> > OpenSUSE 12.1, kernel 3.2.7.
> > After a few suspend2disk iterations random files are corrupted.
> > But only cached files. A reboot solves the problem.
Just to be clear - you see _data_ corruption in files, but only
until a reboot, and after that they are ok? Ok, reading above
about using drop_caches that sounds like the case.
That sounds different from what I saw in the bug Dave mentions
below, but possibly related root cause, I suppose.
-Eric
> FWIW, we've been seeing a number of hard to diagnose failures
> with suspend to disk for the last few releases in Fedora.
> Eric Sandeen has been chasing https://bugzilla.redhat.com/show_bug.cgi?id=744275
> for a while, but there's no smoking gun that really explains what's
> getting into these states. Further complicating things, is that it
> doesn't seem to be 100% reproducible.
>
> Dave
>
On Fri, Feb 17, 2012 at 12:27 AM, Eric Sandeen <[email protected]> wrote:
> On 2/16/12 8:30 AM, Dave Jones wrote:
>> On Thu, Feb 16, 2012 at 11:52:27AM +0100, richard -rw- weinberger wrote:
>>
>> > >> >> Of course, please test the above separately. :-)
>> > >> >
>> > >> > Ok, I'll test this when I'm at home.
>> > >> >
>> > >> > BTW: dropping the caches helps, when some files seem corrupted.
>> > >> > Today /usr/bin/okular was broken.
>> > >> > After setting vm.drop_caches=1 it worked again.
>> > >>
>> > >> On Linux 2.6.38 I'm unable to reproduce the issue.
>> > >> Only 2.6.37 seems to be affected.
>> > >> So, I'm moving over to 2.6.38. :)
>> > >
>> > Bad news:
>> > I saw the issue on 3.x too but thought it's because my IdeaPad s10 is crap.
>> > Now with my shiny new Lenovo x121e I have the same issue! :-(
>> >
>> > OpenSUSE 12.1, kernel 3.2.7.
>> > After a few suspend2disk iterations random files are corrupted.
>> > But only cached files. A reboot solves the problem.
>
> Just to be clear - you see _data_ corruption in files, but only
> until a reboot, and after that they are ok? Ok, reading above
> about using drop_caches that sounds like the case.
Yes.
A reboot always solved the data corruption.
drop_caches solved it in 99% of all cases.
On-disk data was never corrupted.
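
One way to tell a bad cached copy from bad on-disk data, sketched here under assumptions: hash the file, ask the kernel to drop the clean page cache (the same effect as setting vm.drop_caches=1), then hash it again. If the two sums differ, the first read came from a corrupted cached copy. The function names are hypothetical, and writing to drop_caches needs root.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_page_cache(control_path="/proc/sys/vm/drop_caches"):
    """Write "1" to drop_caches so the kernel frees clean page cache pages."""
    os.sync()  # flush dirty pages first; only clean pages can be dropped
    with open(control_path, "w") as fh:
        fh.write("1\n")


def cached_copy_was_bad(path, drop=drop_page_cache):
    """Hash path, drop the page cache, hash again; True if the digests differ."""
    cached = sha1_of(path)
    drop()
    fresh = sha1_of(path)
    return cached != fresh
```

The drop argument is a plain callable, so the comparison logic can be exercised without root by passing a stand-in function. A True result for a file that nobody wrote to in between would match the behaviour described above.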
--
Thanks,
//richard