2006-08-28 17:04:41

by Jeff Chua

Subject: megaraid_sas suspend ok, resume oops

Anyone working on suspend/resume for the Megaraid SAS RAID card?

This is on a DELL 2950.

Suspend/resume (to disk) has been running great on my IBM x60s, but
when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
suspended ok, but when resuming, the megaraid driver crashed.


Thanks,
Jeff.


2006-08-28 21:55:53

by Pavel Machek

Subject: Re: megaraid_sas suspend ok, resume oops

Hi!

> Anyone working on suspend/resume for the Megaraid SAS RAID card?
>
> This is on a DELL 2950.
>
> Suspend/resume (to disk) has been running great on my IBM x60s, but
> when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
> suspended ok, but when resuming, the megaraid driver crashed.

Debug megaraid driver, then ;-). Really, without any details, no, I
do not think we can help.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-08-29 08:12:36

by Jens Axboe

Subject: Re: megaraid_sas suspend ok, resume oops

On Tue, Aug 29 2006, Jeff Chua wrote:
> Anyone working on suspend/resume for the Megaraid SAS RAID card?
>
> This is on a DELL 2950.
>
> Suspend/resume (to disk) has been running great on my IBM x60s, but
> when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
> suspended ok, but when resuming, the megaraid driver crashed.

And what exactly is your intention with this email? It can't be getting
the bug fixed, since there's exactly 0% information to help people doing
so :-)

IOW, provide the oops from the resume crash at least.

--
Jens Axboe

2006-08-29 12:22:55

by Jeff Chua

Subject: Re: megaraid_sas suspend ok, resume oops

On 8/29/06, Jens Axboe <[email protected]> wrote:
> On Tue, Aug 29 2006, Jeff Chua wrote:
> > Anyone working on suspend/resume for the Megaraid SAS RAID card?
> >
> > This is on a DELL 2950.
> >
> > Suspend/resume (to disk) has been running great on my IBM x60s, but
> > when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
> > suspended ok, but when resuming, the megaraid driver crashed.
>
> And what exactly is your intention with this email? It can't be getting
> the bug fixed, since there's exactly 0% information to help people doing
> so :-)
>
> IOW, provide the oops from the resume crash at least.

The intent is to see if someone is already working on it; if so,
there's no point posting an oops that has already been taken care
of. I'm trying not to send unnecessary info.

I'll try to get the oops in the next few days when I get a chance.
Currently traveling.


Another point ... on IBM x60s notebook, setting ...

High Memory Support (64GB)
CONFIG_HIGHMEM64G=y
CONFIG_RESOURCES_64BIT=y
CONFIG_X86_PAE=y

will cause resume to "REBOOT" sometimes (maybe 6 out of 10).

I was trying to compile a kernel that would run both on the DELL with
16GB RAM, and on my IBM notebook with 2GB RAM.

But without 64 bit support, my notebook will suspend/resume many times
without failing (with the 5 ahci patches from Pavel Machek)....

>>This is the take 5 of AHCI suspend/resume patches.
>>The patch is against libata #upstream.
>>Signed-off-by: Forrest Zhao <[email protected]>
>>Signed-off-by: Hannes Reinecke <[email protected]>
>>Signed-off-by: Jens Axboe <[email protected]>
>>Signed-off-by: Pavel Machek <[email protected]>

In this case there's no oops: suspend is ok, but upon resume the screen
goes blank and the machine reboots. It happens with or without X, and
also from the console right after a fresh startup.

Sorry, I don't know how to get oops info, but whatever I can do to
trace down the bug, please show me how.

I'll do anything to make Linux more robust!!! Please lead the way.

Thanks,
Jeff.
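
To confirm which of the three options named in the message above are actually set in a given build, a small check over the kernel `.config` text can help. This is only a minimal sketch: the helper name `enabled_options` and the sample text are hypothetical, and the only assumption about the file format is the usual `NAME=y` / `# NAME is not set` convention.

```python
import re

# Option names taken from the report above; the helper itself is hypothetical.
PAE_OPTIONS = ("CONFIG_HIGHMEM64G", "CONFIG_X86_PAE", "CONFIG_RESOURCES_64BIT")


def enabled_options(config_text, names=PAE_OPTIONS):
    """Return the entries of `names` that are set to 'y' in kernel .config text."""
    enabled = []
    for name in names:
        # A set option appears as NAME=y; an unset one as "# NAME is not set".
        pattern = rf"^{re.escape(name)}=y\s*$"
        if re.search(pattern, config_text, re.MULTILINE):
            enabled.append(name)
    return enabled


# Made-up sample, not taken from any real machine.
sample = """\
CONFIG_HIGHMEM64G=y
# CONFIG_HIGHMEM4G is not set
CONFIG_RESOURCES_64BIT=y
CONFIG_X86_PAE=y
"""

print(enabled_options(sample))
```

Running the same check against the config of a kernel that resumes reliably and one that does not is a quick way to confirm that these options are the only difference between the two builds.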

2006-08-29 23:45:57

by Nigel Cunningham

Subject: Re: megaraid_sas suspend ok, resume oops

Hi.

On Tue, 2006-08-29 at 20:22 +0800, Jeff Chua wrote:
> On 8/29/06, Jens Axboe <[email protected]> wrote:
> > On Tue, Aug 29 2006, Jeff Chua wrote:
> > > Anyone working on suspend/resume for the Megaraid SAS RAID card?
> > >
> > > This is on a DELL 2950.
> > >
> > > Suspend/resume (to disk) has been running great on my IBM x60s, but
> > > when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
> > > suspended ok, but when resuming, the megaraid driver crashed.
> >
> > And what exactly is your intention with this email? It can't be getting
> > the bug fixed, since there's exactly 0% information to help people doing
> > so :-)
> >
> > IOW, provide the oops from the resume crash at least.
>
> The intent is to see if someone is already working on it; if so,
> there's no point posting an oops that has already been taken care
> of. I'm trying not to send unnecessary info.
>
> I'll try to get the oops in the next few days when I get a chance.
> Currently traveling.
>
>
> Another point ... on IBM x60s notebook, setting ...
>
> High Memory Support (64GB)
> CONFIG_HIGHMEM64G=y
> CONFIG_RESOURCES_64BIT=y
> CONFIG_X86_PAE=y
>
> will cause resume to "REBOOT" sometimes (maybe 6 out of 10).
>
> I was trying to compile a kernel that would run both on the DELL with
> 16GB RAM, and on my IBM notebook with 2GB RAM.
>
> But without 64 bit support, my notebook will suspend/resume many times
> without failing (with the 5 ahci patches from Pavel Machek)....

Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
at the moment, I'm afraid.

It's not impossible, we just haven't seen it as a priority worth putting
time into. Do you really have more than 4GB of RAM and want to suspend
to disk?

Regards,

Nigel

2006-08-30 01:45:46

by Jeff Chua

Subject: Re: megaraid_sas suspend ok, resume oops

On 8/30/06, Nigel Cunningham <[email protected]> wrote:
>
> Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
> at the moment, I'm afraid.
>
> It's not impossible, we just haven't seen it as a priority worth putting
> time into. Do you really have more than 4GB of RAM and want to suspend
> to disk?

It'll be really "nice" to have. Currently all the production systems
simply shut down all databases and applications and bring the systems
to a halt. But I'm thinking of implementing suspend_to_disk instead of
shutting down the database and applications, so when power resumes,
the system can carry on where it left off. Nice, very nice feature to
have.
It's "nice" because nobody has tried, and if this works, I don't see
why not use it for all machines in a data center.

The DELL 2950 has 16GB of RAM, and will be running an Oracle database.

2006-08-30 02:30:42

by Nigel Cunningham

Subject: Re: megaraid_sas suspend ok, resume oops

Hi.

On Wed, 2006-08-30 at 09:45 +0800, Jeff Chua wrote:
> On 8/30/06, Nigel Cunningham <[email protected]> wrote:
> >
> > Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
> > at the moment, I'm afraid.
> >
> > It's not impossible, we just haven't seen it as a priority worth putting
> > time into. Do you really have more than 4GB of RAM and want to suspend
> > to disk?
>
> It'll be really "nice" to have. Currently all the production systems
> simply shut down all databases and applications and bring the systems
> to a halt. But I'm thinking of implementing suspend_to_disk instead of
> shutting down the database and applications, so when power resumes,
> the system can carry on where it left off. Nice, very nice feature to
> have.
> It's "nice" because nobody has tried, and if this works, I don't see
> why not use it for all machines in a data center.
>
> The DELL 2950 has 16GB of RAM, and will be running an Oracle database.

Ok. I'll give it a go then, but I'll tell you now that it will probably
take a while as I have a lot on my plate. Feel free to poke me :)

Nigel

2006-08-30 08:41:30

by Jeff Chua

Subject: Re: megaraid_sas suspend ok, resume oops

On 8/30/06, Nigel Cunningham <[email protected]> wrote:

> Ok. I'll give it a go then, but I'll tell you now that it will probably
> take a while as I have a lot on my plate. Feel free to poke me :)

I'll help test. I have two 2950s with 16GB RAM, 8xSAS HD, and my notebook
with 2GB RAM.

Thanks,
Jeff.

2006-08-30 08:51:19

by Rafael J. Wysocki

Subject: Re: megaraid_sas suspend ok, resume oops

Hi,

On Wednesday 30 August 2006 01:45, Nigel Cunningham wrote:
> On Tue, 2006-08-29 at 20:22 +0800, Jeff Chua wrote:
> > On 8/29/06, Jens Axboe <[email protected]> wrote:
> > > On Tue, Aug 29 2006, Jeff Chua wrote:
> > > > Anyone working on suspend/resume for the Megaraid SAS RAID card?
> > > >
> > > > This is on a DELL 2950.
> > > >
> > > > Suspend/resume (to disk) has been running great on my IBM x60s, but
> > > > when I tried the same kernel (2.6.18-rc4) on the DELL 2950, it
> > > > suspended ok, but when resuming, the megaraid driver crashed.
> > >
> > > And what exactly is your intention with this email? It can't be getting
> > > the bug fixed, since there's exactly 0% information to help people doing
> > > so :-)
> > >
> > > IOW, provide the oops from the resume crash at least.
> >
> > The intent is to see if someone is already working on it; if so,
> > there's no point posting an oops that has already been taken care
> > of. I'm trying not to send unnecessary info.
> >
> > I'll try to get the oops in the next few days when I get a chance.
> > Currently traveling.
> >
> >
> > Another point ... on IBM x60s notebook, setting ...
> >
> > High Memory Support (64GB)
> > CONFIG_HIGHMEM64G=y
> > CONFIG_RESOURCES_64BIT=y
> > CONFIG_X86_PAE=y
> >
> > will cause resume to "REBOOT" sometimes (maybe 6 out of 10).
> >
> > I was trying to compile a kernel that would run both on the DELL with
> > 16GB RAM, and on my IBM notebook with 2GB RAM.
> >
> > But without 64 bit support, my notebook will suspend/resume many times
> > without failing (with the 5 ahci patches from Pavel Machek)....
>
> Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
> at the moment, I'm afraid.
>
> It's not impossible, we just haven't seen it as a priority worth putting
> time into.

It looks like the Fedora default config has HIGHMEM64G set, so I'll be looking
at it shortly.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-08-31 15:08:24

by Pavel Machek

Subject: Fedora vs. swsusp (was Re: megaraid_sas suspend ok, resume oops)

Hi!

> Another point ... on IBM x60s notebook, setting ...
>
> High Memory Support (64GB)
> CONFIG_HIGHMEM64G=y
> CONFIG_RESOURCES_64BIT=y
> CONFIG_X86_PAE=y

>
> will cause resume to "REBOOT" sometimes (maybe 6 out of
> 10).

Okay, I guess that explains the 'swsusp br0ken on Fedora' reports I was
seeing.

I wonder if swsusp *ever* worked reliably with highmem64...

...wait a moment, highmem64 implies PAE implies we can no longer use
swsusp's trick with copying initial pgdir, and we'll need to use
x86-64-like code, no?
Pavel
--
Thanks for all the (sleeping) penguins.
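
For readers unfamiliar with why PAE changes the picture, the sketch below shows how a 32-bit virtual address is decomposed under classic two-level paging (10/10/12 bits) versus PAE (2/9/9/12 bits). It does not reproduce swsusp's resume code or its pgdir trick; it only illustrates that the hardware page-table layouts differ, which is one reason code written against the two-level layout cannot be reused unchanged. The function names and dictionary keys are made up for the example.

```python
def split_classic(vaddr):
    """Classic 32-bit x86 paging: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return {
        "pgd": (vaddr >> 22) & 0x3FF,   # 1024 page-directory entries
        "pte": (vaddr >> 12) & 0x3FF,   # 1024 page-table entries
        "offset": vaddr & 0xFFF,        # 4 KiB page
    }


def split_pae(vaddr):
    """PAE paging: 2-bit PDPT index, 9-bit directory index, 9-bit table index, 12-bit offset."""
    return {
        "pdpt": (vaddr >> 30) & 0x3,    # 4 page-directory-pointer entries
        "pmd": (vaddr >> 21) & 0x1FF,   # 512 page-directory entries
        "pte": (vaddr >> 12) & 0x1FF,   # 512 page-table entries
        "offset": vaddr & 0xFFF,        # 4 KiB page
    }


addr = 0xC0100000  # an arbitrary address chosen for illustration
print(split_classic(addr))
print(split_pae(addr))
```

The same address lands in different slots of differently shaped tables, and PAE adds a third level on top, so a copy of a single top-level directory no longer describes the same thing in both modes.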

2006-08-31 17:01:54

by Rafael J. Wysocki

Subject: Re: Fedora vs. swsusp (was Re: megaraid_sas suspend ok, resume oops)

On Tuesday 29 August 2006 22:39, Pavel Machek wrote:
> Hi!
>
> > Another point ... on IBM x60s notebook, setting ...
> >
> > High Memory Support (64GB)
> > CONFIG_HIGHMEM64G=y
> > CONFIG_RESOURCES_64BIT=y
> > CONFIG_X86_PAE=y
>
> >
> > will cause resume to "REBOOT" sometimes (maybe 6 out of
> > 10).
>
> Okay, I guess that explains the 'swsusp br0ken on Fedora' reports I was
> seeing.
>
> I wonder if swsusp *ever* worked reliably with highmem64...
>
> ...wait a moment, highmem64 implies PAE implies we can no longer use
> swsusp's trick with copying initial pgdir, and we'll need to use
> x86-64-like code, no?

I think so.

Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-09-02 13:30:53

by Dave Jones

Subject: Re: megaraid_sas suspend ok, resume oops

On Wed, Aug 30, 2006 at 10:54:56AM +0200, Rafael J. Wysocki wrote:

> > > But without 64 bit support, my notebook will suspend/resume many times
> > > without failing (with the 5 ahci patches from Pavel Machek)....
> >
> > Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
> > at the moment, I'm afraid.
> >
> > It's not impossible, we just haven't seen it as a priority worth putting
> > time into.
>
> It looks like the Fedora default config has HIGHMEM64G set, so I'll be looking
> at it shortly.

There is no 'Fedora default config'. We ship a number of different kernels,
some of which enable PAE, some disable it.

For FC5, the installer installs a PAE kernel if you have >4GB, or SMP.
For FC6, it'll only install one if you have >4GB.
(or possibly if you have an NX capable CPU, I forget if we enabled that
magick in the installer)

Precluding NX support + swsusp kinda sucks, but I guess it's a tiny subset of users.

Dave

--
http://www.codemonkey.org.uk
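
The installer rules described in the message above can be written down as a small decision function. This is only a restatement of that description, not the installer's actual code: the function name and parameters are hypothetical, and the NX rule for FC6 is modelled as an optional flag because the message itself is unsure whether it was enabled.

```python
def installs_pae_kernel(release, ram_gb, smp, nx_capable=False, fc6_nx_rule=False):
    """Return True if, per the description above, a PAE kernel would be installed.

    release:      "FC5" or "FC6"
    ram_gb:       installed RAM in GB
    smp:          True on a multiprocessor machine
    fc6_nx_rule:  assumption; the message is unsure whether FC6 also keys on NX
    """
    if release == "FC5":
        return ram_gb > 4 or smp
    if release == "FC6":
        return ram_gb > 4 or (fc6_nx_rule and nx_capable)
    raise ValueError(f"unknown release: {release}")


# A 2GB SMP notebook gets a PAE kernel on FC5 but not on FC6.
print(installs_pae_kernel("FC5", ram_gb=2, smp=True))
print(installs_pae_kernel("FC6", ram_gb=2, smp=True))
```

The FC5 rule is what puts machines with well under 4GB of RAM on a PAE kernel, and so onto the configuration that swsusp does not handle.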


2006-09-02 19:44:23

by Rafael J. Wysocki

Subject: Re: megaraid_sas suspend ok, resume oops

On Saturday 02 September 2006 15:30, Dave Jones wrote:
> On Wed, Aug 30, 2006 at 10:54:56AM +0200, Rafael J. Wysocki wrote:
>
> > > > But without 64 bit support, my notebook will suspend/resume many times
> > > > without failing (with the 5 ahci patches from Pavel Machek)....
> > >
> > > Neither swsusp (as far as I know) nor suspend2 supports CONFIG_HIGHMEM64G
> > > at the moment, I'm afraid.
> > >
> > > It's not impossible, we just haven't seen it as a priority worth putting
> > > time into.
> >
> > It looks like the Fedora default config has HIGHMEM64G set, so I'll be looking
> > at it shortly.
>
> There is no 'Fedora default config'. We ship a number of different kernels,
> some of which enable PAE, some disable it.
>
> For FC5, the installer installs a PAE kernel if you have >4GB, or SMP.

Ah, that's why people get hit by it if they have less than 4GB. ;-)

> For FC6, it'll only install one if you have >4GB.
> (or possibly if you have an NX capable CPU, I forget if we enabled that
> magick in the installer)
>
> Precluding NX support + swsusp kinda sucks, but I guess it's a tiny subset of users.

Well, I think the majority of NX-capable CPUs are also x86_64, in which case
I'd recommend using a 64-bit kernel anyway.

Thanks for the clarification.

I was afraid the issue would be urgent, but it doesn't seem so now. I'd like to
postpone fixing it until we can create suspend images larger than 350 meg on
i386 boxes with highmem (the patch is ready to go to -mm after 2.6.19-rc1 as
2.6.20 material).

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller


2006-09-02 20:11:24

by Dave Jones

Subject: Re: megaraid_sas suspend ok, resume oops

On Sat, Sep 02, 2006 at 09:47:05PM +0200, Rafael J. Wysocki wrote:

> > Precluding NX support + swsusp kinda sucks, but I guess it's a tiny subset of users.
>
> Well, I think the majority of NX-capable CPUs are also x86_64, in which case
> I'd recommend using a 64-bit kernel anyway.

There's a fairly large number of these "Core Duo" systems out there :)
Hopefully these are the last CPUs lacking longmode that Intel will make.
Aside from these, the only other 32-bit only CPUs with NX are the newer VIA C3s.

For the Fedora users it's not that big a deal not being able to take advantage
of NX, as we fall back to using the old segment limit tricks that exec-shield
does to emulate NX, without having to worry about PAE headaches.

Given the only other use of PAE is >4GB support, and these systems typically
max out at 4GB due to the limited number of memory slots, it's not really that
big a problem.

> I was afraid the issue would be urgent, but it doesn't seem so now. I'd like to
> postpone fixing it until we can create suspend images larger than 350 meg on
> i386 boxes with highmem (the patch is ready to go to -mm after 2.6.19-rc1 as
> 2.6.20 material).

Sounds good to me.

Dave

--
http://www.codemonkey.org.uk
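
Whether a given machine falls into the "NX but no long mode" group discussed above can be read from the `flags` line of `/proc/cpuinfo`, where `nx`, `lm` and `pae` are the relevant flag names on x86 Linux. The following is a minimal sketch with hypothetical helper names; the sample text is made up for illustration and is not output from any particular CPU.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def classify(flags):
    """Report the three capabilities relevant to the PAE/NX discussion."""
    return {
        "nx": "nx" in flags,         # no-execute page protection
        "long_mode": "lm" in flags,  # 64-bit capable
        "pae": "pae" in flags,       # physical address extension
    }


# Made-up sample resembling a 32-bit-only CPU that still offers NX.
sample = "processor\t: 0\nflags\t\t: fpu vme pae nx sse2\n"
print(classify(cpu_flags(sample)))

# On a live system the text would come from the real file:
#   with open("/proc/cpuinfo") as f:
#       print(classify(cpu_flags(f.read())))
```

A result with `nx` true and `long_mode` false identifies the machines for which a 64-bit kernel is not an option, so NX under Linux requires a PAE kernel.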
