2009-07-10 10:58:40

by Thomas Fjellstrom

Subject: Possible Suspend to Ram bug?

I recently got a new OCZ Vertex 30G SSD and have noticed that it flips out
the second time Linux wakes up from "suspend to ram".

The system will run fine for days or weeks, so long as it doesn't wake up
from a second StR.

Here is an example error that I get from the device (my root / device):

[42018.455204] sd 0:0:0:0: [sda] Unhandled error code
[42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[42018.455215] end_request: I/O error, dev sda, sector 12583031
[42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to read inode block - inode=391005, block=1572871
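
As a side note for readers, the sector in the end_request line and the block in the EXT3-fs line can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes 512-byte sectors and a 4 KiB ext3 block size, neither of which is stated in the report, and the function name is made up for this example. Under those assumptions the two numbers differ by a constant offset of 63 sectors, which would be consistent with both messages describing the same failed read, but nothing in the thread confirms that.

```python
import re

SECTOR_SIZE = 512          # assumption: sector unit used in the I/O error message
ASSUMED_BLOCK_SIZE = 4096  # assumption: the ext3 block size is not given in the report

# The two relevant lines from the report, unwrapped.
LOG = """\
[42018.455215] end_request: I/O error, dev sda, sector 12583031
[42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to read inode block - inode=391005, block=1572871
"""


def implied_partition_offset(log, block_size=ASSUMED_BLOCK_SIZE):
    """Return (failed sector) - (first sector of the failed fs block).

    Hypothetical helper: if both messages refer to the same read, the
    result is the offset, in sectors, between the disk-level sector
    number and the start of the filesystem.
    """
    sector = int(re.search(r"sector (\d+)", log).group(1))
    block = int(re.search(r"block=(\d+)", log).group(1))
    sectors_per_block = block_size // SECTOR_SIZE
    return sector - block * sectors_per_block


if __name__ == "__main__":
    print(implied_partition_offset(LOG))  # prints 63 with the assumed sizes
```

With a different block size the offset changes, so the result is only as good as the assumed sizes.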

At that point using my / fs is pretty much impossible. Every single app fails
to launch with an I/O error; about the only command I can run in that state is
"dmesg" in an existing konsole.

I'm currently using 2.6.29-2-amd64 from Debian, and am running on a Gigabyte
MA790FXT-UD5P, with an AMD Phenom II X4 810 CPU and 4G of RAM.

One interesting thing to note: the file system on the Vertex SSD reports as
clean to fsck on the next boot, while my /home, which is on a Seagate 7200.12
drive, reports several orphaned inodes (every single time). That happens
regardless of whether I use ALT+SYSRQ+S/U to try to sync everything. Also,
ALT+SYSRQ+B doesn't work at that point; only ALT+SYSRQ+O or the system
power/reset buttons work.

I'm attaching the full log I was able to save from dmesg (over nfs, luckily
that worked).

--
Thomas Fjellstrom


Attachments:
error.log.gz (6.73 kB)

2009-07-14 10:17:16

by Thomas Fjellstrom

Subject: Re: Possible Suspend to Ram bug?

On Fri July 10 2009, Thomas Fjellstrom wrote:
> I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it will
> flip out the second time linux wakes up from "suspend to ram".
>
> The system will run fine for days or weeks, so long as it isn't waking up a
> second StR.
>
> Here is an example error that I get from the device (my root / device):
>
> [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> driverbyte=DRIVER_OK,SUGGEST_OK
> [42018.455215] end_request: I/O error, dev sda, sector 12583031
> [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to
> read inode block - inode=391005, block=1572871
>
> At that point using my / fs is pretty much impossible. Every single app
> fails to launch with an I/O Error, about the only command I can run in that
> state is "dmesg" in an existing konsole.
>
> I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
>
> One interesting thing to note, the file system on the Vertex SSD reports as
> clean to fsck on the next boot, while my /home which is on a Seagate
> 7200.12 drive reports with several orphaned inodes (every single time). And
> that's regardless if I use ALT+SYSRQ+S/U to try and sync everything. Also,
> ALT+SYSRQ+B doesn't work at that point, only ALT+SYSRQ+O or using the
> system power/reset buttons will work.
>
> I'm attaching the full log I was able to save from dmesg (over nfs, luckily
> that worked).

Anyone have a clue what might be wrong?

--
Thomas Fjellstrom

2009-07-14 15:53:29

by Jiri Kosina

Subject: Re: Possible Suspend to Ram bug?

On Tue, 14 Jul 2009, Thomas Fjellstrom wrote:

> On Fri July 10 2009, Thomas Fjellstrom wrote:
> > I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it will
> > flip out the second time linux wakes up from "suspend to ram".
> >
> > The system will run fine for days or weeks, so long as it isn't waking up a
> > second StR.
> >
> > Here is an example error that I get from the device (my root / device):
> >
> > [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> > [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > [42018.455215] end_request: I/O error, dev sda, sector 12583031
> > [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable to
> > read inode block - inode=391005, block=1572871
> >
> > At that point using my / fs is pretty much impossible. Every single app
> > fails to launch with an I/O Error, about the only command I can run in that
> > state is "dmesg" in an existing konsole.
> >
> > I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> > Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
> >
> > One interesting thing to note, the file system on the Vertex SSD reports as
> > clean to fsck on the next boot, while my /home which is on a Seagate
> > 7200.12 drive reports with several orphaned inodes (every single time). And
> > that's regardless if I use ALT+SYSRQ+S/U to try and sync everything. Also,
> > ALT+SYSRQ+B doesn't work at that point, only ALT+SYSRQ+O or using the
> > system power/reset buttons will work.
> >
> > I'm attaching the full log I was able to save from dmesg (over nfs, luckily
> > that worked).
> Anyone have a clue what might be wrong?

First, please try to reproduce with a recent kernel (2.6.30 at least,
2.6.31-rc3 preferably).

--
Jiri Kosina
SUSE Labs

2009-07-15 10:08:55

by Thomas Fjellstrom

Subject: Re: Possible Suspend to Ram bug?

On Tue July 14 2009, Jiri Kosina wrote:
> On Tue, 14 Jul 2009, Thomas Fjellstrom wrote:
> > On Fri July 10 2009, Thomas Fjellstrom wrote:
> > > I've recently gotten a new OCZ Vertex 30G SSD and have noticed that it
> > > will flip out the second time linux wakes up from "suspend to ram".
> > >
> > > The system will run fine for days or weeks, so long as it isn't waking
> > > up a second StR.
> > >
> > > Here is an example error that I get from the device (my root / device):
> > >
> > > [42018.455204] sd 0:0:0:0: [sda] Unhandled error code
> > > [42018.455208] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET
> > > driverbyte=DRIVER_OK,SUGGEST_OK
> > > [42018.455215] end_request: I/O error, dev sda, sector 12583031
> > > [42018.455221] EXT3-fs error (device sda2): ext3_get_inode_loc: unable
> > > to read inode block - inode=391005, block=1572871
> > >
> > > At that point using my / fs is pretty much impossible. Every single app
> > > fails to launch with an I/O Error, about the only command I can run in
> > > that state is "dmesg" in an existing konsole.
> > >
> > > I'm currently using 2.6.29-2-amd64 from debian, and am running on a
> > > Gigabyte MA790FXT-UD5P, with a AMD Phenom II X4 810 cpu, and 4G ram.
> > >
> > > One interesting thing to note, the file system on the Vertex SSD
> > > reports as clean to fsck on the next boot, while my /home which is on a
> > > Seagate 7200.12 drive reports with several orphaned inodes (every
> > > single time). And that's regardless if I use ALT+SYSRQ+S/U to try and
> > > sync everything. Also, ALT+SYSRQ+B doesn't work at that point, only
> > > ALT+SYSRQ+O or using the system power/reset buttons will work.
> > >
> > > I'm attaching the full log I was able to save from dmesg (over nfs,
> > > luckily that worked).
> >
> > Anyone have a clue what might be wrong?
>
> First please try to reproduce with recent kernel (2.6.30 at least,
> 2.6.31-rc3 preferrably).

I'll try with Debian's 2.6.30 first. But there's a small issue with that: .30
and .31 seem to have some performance regressions, according to sites like
Phoronix.

--
Thomas Fjellstrom

2009-07-15 11:31:14

by Jiri Kosina

Subject: Re: Possible Suspend to Ram bug?

On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:

> I'll try with debian's 2.6.30 first. But there's a small issue with
> that, .30 and .31 seem to have some performance regressions according to
> sites like phoronix.

You know, there are lies, then horrible lies, then benchmarks, and
benchmarks done wrong.

Just do your own measurements under your particular workload, and if you
see any performance regression, just report it.
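
For anyone wanting to follow that advice, a minimal sketch of such a measurement might look like the following. It simply times a command several times under each kernel and summarizes the wall-clock durations; the helper names and the placeholder command are assumptions for illustration, not anything prescribed in this thread, and a real comparison would substitute the actual workload and run it on both kernels.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing workload raise instead of being timed silently.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report median, minimum and maximum so that outliers stay visible."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with the real task to compare.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    print(summarize(time_command(cmd)))
```

Running the same script after booting each kernel and comparing the medians gives numbers specific to the workload at hand, which is what a regression report would need.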

--
Jiri Kosina
SUSE Labs

2009-07-15 11:45:19

by Thomas Fjellstrom

Subject: Re: Possible Suspend to Ram bug?

On Wed July 15 2009, Jiri Kosina wrote:
> On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:
> > I'll try with debian's 2.6.30 first. But there's a small issue with
> > that, .30 and .31 seem to have some performance regressions according to
> > sites like phoronix.
>
> You know, there are lies, then horrible lies, then benchmarks, and
> benchmarks done wrong.
>
> Just do your own measurements under your particular workload, and if you
> see any performance regression, just report it.

The benchmarks they run are pretty much what I'd do to test, so I'd more than
likely get the same results, and waste a bunch of time.

2.6.30 did seem to fix the SSD error, but the first time I suspended, my r8169
decided to flip out. I had to rmmod and modprobe it to get the network back
up.

[ 867.780034] ------------[ cut here ]------------
[ 867.780165] WARNING: at /home/blank/debian/kernel/tmp/linux-2.6-2.6.30/debian/build/source_amd64_none/net/sched/sch_generic.c:226 dev_watchdog+0xc7/0x164()
[ 867.780373] Hardware name: GA-MA790FXT-UD5P
[ 867.780488] NETDEV WATCHDOG: eth1 (r8169): transmit timed out
[ 867.780610] Modules linked in: nvidia(P) powernow_k8 cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc it87 hwmon_vid adt7473 firewire_sbp2 loop snd_hda$
[ 867.785472] Pid: 0, comm: swapper Tainted: P 2.6.30-1-amd64 #1
[ 867.785598] Call Trace:
[ 867.785699] <IRQ> [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[ 867.785877] [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[ 867.786005] [<ffffffff8024236b>] ? warn_slowpath_common+0x77/0xa3
[ 867.786130] [<ffffffff804228e3>] ? dev_watchdog+0x0/0x164
[ 867.786252] [<ffffffff802423f3>] ? warn_slowpath_fmt+0x51/0x59
[ 867.786377] [<ffffffff802342fe>] ? enqueue_task+0x5c/0x65
[ 867.786499] [<ffffffff802546f7>] ? autoremove_wake_function+0x9/0x2e
[ 867.786626] [<ffffffff804228b7>] ? netif_tx_lock+0x3d/0x69
[ 867.786749] [<ffffffff8040f3fc>] ? netdev_drivername+0x3b/0x40
[ 867.786873] [<ffffffff804229aa>] ? dev_watchdog+0xc7/0x164
[ 867.786993] [<ffffffff80235601>] ? __wake_up+0x30/0x44
[ 867.787116] [<ffffffff804228e3>] ? dev_watchdog+0x0/0x164
[ 867.787239] [<ffffffff8024aa2b>] ? run_timer_softirq+0x193/0x210
[ 867.787364] [<ffffffff8025b465>] ? getnstimeofday+0x55/0xaf
[ 867.787487] [<ffffffff80246f55>] ? __do_softirq+0xac/0x173
[ 867.787609] [<ffffffff80210bcc>] ? call_softirq+0x1c/0x30
[ 867.787730] [<ffffffff802125fa>] ? do_softirq+0x3a/0x7e
[ 867.787849] [<ffffffff80246cd2>] ? irq_exit+0x3f/0x80
[ 867.787968] [<ffffffff80220e63>] ? smp_apic_timer_interrupt+0x87/0x94
[ 867.788105] [<ffffffff802105d3>] ? apic_timer_interrupt+0x13/0x20
[ 867.788231] <EOI> [<ffffffff80227518>] ? native_safe_halt+0x2/0x3
[ 867.788410] [<ffffffff80216995>] ? default_idle+0x40/0x68
[ 867.788531] [<ffffffff8025d714>] ? clockevents_notify+0x2b/0x75
[ 867.788656] [<ffffffff80216d48>] ? c1e_idle+0xe5/0x10d
[ 867.788776] [<ffffffff8020edda>] ? cpu_idle+0x50/0x91
[ 867.788894] ---[ end trace 521854739609a619 ]---
[ 867.804550] r8169: eth1: link up
[ 915.796566] r8169: eth1: link up
[ 963.796491] r8169: eth1: link up
[ 989.420829] r8169: eth1: link up

At this point I was getting repeated "link up" messages, and even though
ifconfig said the network was up, there was no actual connectivity. As
mentioned, only rmmod+modprobe of r8169 fixed the problem. It doesn't seem to
happen often though.

I'll update if I see any more issues.

--
Thomas Fjellstrom

2009-07-15 15:04:58

by Pavel Machek

Subject: Re: Possible Suspend to Ram bug?

On Wed 2009-07-15 05:45:14, Thomas Fjellstrom wrote:
> On Wed July 15 2009, Jiri Kosina wrote:
> > On Wed, 15 Jul 2009, Thomas Fjellstrom wrote:
> > > I'll try with debian's 2.6.30 first. But there's a small issue with
> > > that, .30 and .31 seem to have some performance regressions according to
> > > sites like phoronix.
> >
> > You know, there are lies, then horrible lies, then benchmarks, and
> > benchmarks done wrong.
> >
> > Just do your own measurements under your particular workload, and if you
> > see any performance regression, just report it.
>
> The benchmarks they run are pretty much what I'd do to test, so I'd more than
> likely get the same results, and waste a bunch of time.

So you'd rather have us waste a bunch of time?

> 2.6.30 did seem to fix the ssd error. but the first time I suspended, my r8169
> decided to flip out. I had to rmmod and modprobe it to get the network back
> up.
>
> [ 867.780034] ------------[ cut here ]------------
> [ 867.780165] WARNING: at
> /home/blank/debian/kernel/tmp/linux-2.6-2.6.30/debian/build/source_amd64_none/net/sched/sch_generic.c:226
> dev_watchdog+0xc7/0x164()
> [ 867.780373] Hardware name: GA-MA790FXT-UD5P
> [ 867.780488] NETDEV WATCHDOG: eth1 (r8169): transmit timed out
> [ 867.780610] Modules linked in: nvidia(P) powernow_k8 cpufreq_conservative
> cpufreq_stats cpufreq_userspace cpufreq_powersave nfsd exportfs nfs lockd
> fscache nfs_acl auth_rpcgss sunrpc it87 hwmon_vid adt7473 firewire_sbp2 loop
> snd_hda$
> [ 867.785472] Pid: 0, comm: swapper Tainted: P

....and some more.


> At this point I was getting repeated "link up" messages and even though
> ifconfig said the network was up, there was no actual connectivity. as
> mentioned only rmmod+modprobe of r8169 fixed the problem. It doesn't seem to
> happen often though.

Reproduce it without taints, then you can report a regression in network...
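
For context, the "Tainted: P" in the trace comes from the kernel's taint mask, which is exposed as a number in /proc/sys/kernel/tainted. The sketch below decodes a few of the low bits as they are documented for current kernels; the table is deliberately partial, the helper names are made up for this example, and it is an assumption that the bit layout matches the 2.6.30 kernel discussed here. A value of 0 means the kernel is not tainted.

```python
# Partial table of taint bits (assumption: layout as documented for
# current kernels; higher bits are intentionally omitted).
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force-loaded"),
    3: ("R", "module was force-unloaded"),
    7: ("D", "kernel oopsed or hit a BUG"),
    9: ("W", "kernel issued a warning"),
}


def decode_taint(value):
    """Return the flag letters for the known bits set in a taint mask."""
    return [
        letter
        for bit, (letter, _description) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint mask from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    mask = read_taint()
    print(mask, decode_taint(mask))
```

Checking that the mask reads 0 right after boot, before reproducing the problem, is one way to confirm that no tainting module such as the nvidia driver has been loaded.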

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-07-16 22:53:17

by Thomas Fjellstrom

Subject: Re: Possible Suspend to Ram bug?

On Wed July 15 2009, Thomas Fjellstrom wrote:
>
> I'll update if i see anymore issues.

Earlier today my dvdrw went missing; I probably have the dmesg for that
someplace. Just now my sdb device (/home and swap) had issues after a
resume. I've attached the log (it's a bit long).

--
Thomas Fjellstrom


Attachments:
error.dmesg.gz (22.82 kB)