2008-11-23 16:18:04

by Fabio Comolli

Subject: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi.
As the subject says, I have a strange regression in latest git.
Sometimes resume from hibernation hangs _after_ the resume stage.
When the problem happens I usually have to power-cycle my laptop.

The system has managed to recover from the hang only twice, and this time I
found the following in the logs:

Nov 23 16:43:14 hawking kernel: sd 0:0:0:0: [sda] Starting disk
Nov 23 16:43:14 hawking kernel: Restarting tasks ... done.
Nov 23 16:43:52 hawking kernel: ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 23 16:43:52 hawking kernel: ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
Nov 23 16:43:52 hawking kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Nov 23 16:43:52 hawking kernel: res 40/00:03:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Nov 23 16:43:52 hawking kernel: ata1.01: status: { DRDY }
Nov 23 16:43:52 hawking kernel: ata1: soft resetting link
Nov 23 16:43:57 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
Nov 23 16:43:57 hawking kernel: ata1.01: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 23 16:43:57 hawking kernel: ata1.01: revalidation failed (errno=-5)
Nov 23 16:43:57 hawking kernel: ata1: soft resetting link
Nov 23 16:44:08 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
Nov 23 16:44:08 hawking kernel: ata1.01: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 23 16:44:08 hawking kernel: ata1.01: revalidation failed (errno=-5)
Nov 23 16:44:08 hawking kernel: ata1: soft resetting link
Nov 23 16:44:38 hawking kernel: ata1.00: configured for UDMA/100
Nov 23 16:44:38 hawking kernel: ata1.01: configured for MWDMA2
Nov 23 16:44:38 hawking kernel: ata1: EH complete
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors: (80.0 GB/74.5 GiB)
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write Protect is off
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors: (80.0 GB/74.5 GiB)
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write Protect is off
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
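
The timestamps in a log like this one show where the time went during the hang. A minimal Python sketch, assuming classic syslog timestamps and a year of 2008 (syslog omits the year, so it is taken from the message date), could list the silent periods. The function names and the 10-second threshold are illustrative choices, not part of any existing tool.

```python
from datetime import datetime

# Excerpt of the log above with wrapped lines joined. The year is an
# assumption: syslog does not record it.
LOG = """\
Nov 23 16:43:14 hawking kernel: Restarting tasks ... done.
Nov 23 16:43:52 hawking kernel: ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 23 16:43:57 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
Nov 23 16:44:08 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
Nov 23 16:44:38 hawking kernel: ata1.00: configured for UDMA/100
Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write Protect is off
"""

def parse_timestamp(line, year=2008):
    # The first 15 characters of a classic syslog line are "Mon DD HH:MM:SS".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def find_gaps(text, threshold_s=10):
    """Return (seconds, previous_line, next_line) for each silence >= threshold."""
    lines = [l for l in text.splitlines() if l.strip()]
    gaps = []
    for prev, cur in zip(lines, lines[1:]):
        delta = (parse_timestamp(cur) - parse_timestamp(prev)).total_seconds()
        if delta >= threshold_s:
            gaps.append((int(delta), prev, cur))
    return gaps

gaps = find_gaps(LOG)
for seconds, prev, cur in gaps:
    print(f"{seconds:3d}s of silence before: {cur[16:]}")
```

On this excerpt the longest silence is the 38 seconds between "Restarting tasks ... done." and the first ata1.01 exception, followed by the waits around the IDENTIFY timeouts.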

This problem:

* never happened in 2.6.27.4
* happened many times in 2.6.27.5 and .6
* never happened in 2.6.27.7
* happened for the first time with the current -rc series just now (the
logs above are from this occurrence)

This is the first 2.6.28-rc kernel I tried.

I have to say that my kernel is tainted (fglrx and cisco_ipsec), but I
also managed to reproduce the hangs with non-tainted ones.

I already tried some bisection between 2.6.27.4 and 2.6.27.5 but with
no result, probably because I marked as "good" kernels that weren't
good at all. Unfortunately this bug happens at truly random times.

During the bisection I always used non-tainted kernels.

If anyone has ideas, I can provide some more data; I can also try
another bisection series but I think it would take very long to get
some good results (I mean, some days of testing between two different
kernels just to reproduce the bug).
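
The cost of bisecting an intermittent bug can be estimated before starting. As a rough sketch, assume each hibernate/resume cycle on a bad kernel fails independently with a fixed probability. The thread gives no such rate, so the values below are hypothetical. Under that assumption, the number of clean cycles needed before a kernel can be called "good" with a chosen risk of being wrong is:

```python
import math

def clean_cycles_needed(p_fail, max_false_good=0.05):
    """Smallest n with (1 - p_fail)**n <= max_false_good.

    p_fail is the assumed chance that one cycle on a bad kernel hangs;
    max_false_good is the accepted chance of calling a bad kernel good.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))

# Hypothetical failure rates, for illustration only.
for p in (0.5, 0.2, 0.1, 0.05):
    print(f"failure rate {p:.2f}: {clean_cycles_needed(p)} clean cycles")
```

If the hang shows up in only one cycle out of ten, about 29 clean cycles are needed per bisection step, which is consistent with the estimate of several days of testing per kernel.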

Regards,
Fabio


2008-11-23 18:14:49

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Sunday, 23 of November 2008, Fabio Comolli wrote:
> Hi.
> As the subject says, I have a strange regression in latest git.
> Sometimes resume from hibernating hangs _after_ the resume stage.
> When the problem happens I usually have to powercycle my laptop.
>
> The system managed to recover from the hang only twice and this time I
> found in the logs:
>
> Nov 23 16:43:14 hawking kernel: sd 0:0:0:0: [sda] Starting disk
> Nov 23 16:43:14 hawking kernel: Restarting tasks ... done.
> Nov 23 16:43:52 hawking kernel: ata1.01: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x6 frozen
> Nov 23 16:43:52 hawking kernel: ata1.01: cmd
> a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
> Nov 23 16:43:52 hawking kernel: cdb 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
> Nov 23 16:43:52 hawking kernel: res
> 40/00:03:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
> Nov 23 16:43:52 hawking kernel: ata1.01: status: { DRDY }
> Nov 23 16:43:52 hawking kernel: ata1: soft resetting link
> Nov 23 16:43:57 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
> Nov 23 16:43:57 hawking kernel: ata1.01: failed to IDENTIFY (I/O
> error, err_mask=0x4)
> Nov 23 16:43:57 hawking kernel: ata1.01: revalidation failed (errno=-5)
> Nov 23 16:43:57 hawking kernel: ata1: soft resetting link
> Nov 23 16:44:08 hawking kernel: ata1.01: qc timeout (cmd 0xa1)
> Nov 23 16:44:08 hawking kernel: ata1.01: failed to IDENTIFY (I/O
> error, err_mask=0x4)
> Nov 23 16:44:08 hawking kernel: ata1.01: revalidation failed (errno=-5)
> Nov 23 16:44:08 hawking kernel: ata1: soft resetting link
> Nov 23 16:44:38 hawking kernel: ata1.00: configured for UDMA/100
> Nov 23 16:44:38 hawking kernel: ata1.01: configured for MWDMA2
> Nov 23 16:44:38 hawking kernel: ata1: EH complete
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] 156301488 512-byte
> hardware sectors: (80.0 GB/74.5 GiB)
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write Protect is off
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write cache:
> enabled, read cache: enabled, doesn't support DPO or FUA
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] 156301488 512-byte
> hardware sectors: (80.0 GB/74.5 GiB)
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write Protect is off
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Nov 23 16:44:39 hawking kernel: sd 0:0:0:0: [sda] Write cache:
> enabled, read cache: enabled, doesn't support DPO or FUA
>
> This problem:
>
> * never happened in 2.6.27.4
> * happened many times in 2.6.27.5 and .6
> * never happened in 2.6.27.7

Please test 2.6.27.7 a bit more. It would be helpful to verify it's OK.

> * happened the first time with the current -rc series just now (the
> logs are related to this one)
>
> This is the first 2.6.28-rc kernel I tried.

Do you mean 2.6.28-rc6?

Rafael

2008-11-23 18:25:05

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi

>>
>> This problem:
>>
>> * never happened in 2.6.27.4
>> * happened many times in 2.6.27.5 and .6
>> * never happened in 2.6.27.7
>
> Please test 2.6.27.7 a bit more. It would be helpful to verify it's OK.
>

Just reproduced the problem with 2.6.27.7. It behaved differently: the
system hung for about 4 minutes and then revived. No logs
unfortunately.

>> * happened the first time with the current -rc series just now (the
>> logs are related to this one)
>>
>> This is the first 2.6.28-rc kernel I tried.
>
> Do you mean 2.6.28-rc6?
>

Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.

> Rafael

Regards,
Fabio



2008-11-23 20:23:52

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi.

On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
> Hi
>
> Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.
>

Ok, this time things went much better. The bisection pointed to:

------------------------------------------------------------------------------------------------------------------------------------------------
fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
Author: Jay Fenlason <[email protected]>
Date: Mon Oct 27 23:28:14 2008 +0100

firewire: fix struct fw_node memory leak

commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream

With the bus_resets patch applied, it is easy to see this memory leak
by repeatedly resetting the firewire bus while running slabtop in
another window. Just watch kmalloc-32 grow and grow...

Signed-off-by: Jay Fenlason <[email protected]>
Signed-off-by: Stefan Richter <[email protected]>

:040000 040000 01cadbd5f5fb81ce4f5e2023573204c4fbec3a28 809a53f4be87bd8be133ebd7564e1139a0cfa45b M drivers
------------------------------------------------------------------------------------------------------------------------------------------------

And this is the log:

------------------------------------------------------------------------------------------------------------------------------------------------
fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect log
git-bisect start
# good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
git-bisect good 056c71459d3acf9fefcb2dc67abeef10e649d508
# bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
git-bisect bad 788a5f3f70e2a9c46020bdd3a195f2a866441c5d
# bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make memory reserve code more robust
git-bisect bad 7bdb542c453c14e54af9ebe5c4a827e4a678c47d
# good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix inclusion of tcp options in syn-ack
git-bisect good f29062d0ec12ee3a58c67228dc829574b4ced378
# good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix ioctl() return code
git-bisect good 882491755d4c819de5bb593f04d06692185760aa
# bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
git-bisect bad baae4f5fd7a75bdfa70d374b738963053df2bcaa
# bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct fw_node memory leak
git-bisect bad ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
# good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive more than 256 bus resets
git-bisect good b6021579f54e5b6b31f03fe24de1208a2feb4aec
------------------------------------------------------------------------------------------------------------------------------------------------
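
With an intermittent failure, a "bad" verdict is certain once the hang has been seen, while a "good" verdict only means the hang has not been seen yet. A small Python sketch can pull the verdicts out of a bisect log like the one above and list the "good" marks, which are the ones worth retesting if the result looks implausible. The parsing assumes the "# good: [sha] subject" comment format shown in this log, and the helper name is made up for illustration.

```python
import re

# Verdict comment lines from the bisect log above, with wrapped subjects joined.
BISECT_LOG = """\
# good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
# bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
# bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make memory reserve code more robust
# good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix inclusion of tcp options in syn-ack
# good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix ioctl() return code
# bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
# bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct fw_node memory leak
# good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive more than 256 bus resets
"""

MARK = re.compile(r"^# (good|bad): \[([0-9a-f]{40})\] (.*)$")

def parse_marks(text):
    """Return a list of (verdict, sha, subject) in the order they were recorded."""
    marks = []
    for line in text.splitlines():
        m = MARK.match(line)
        if m:
            marks.append(m.groups())
    return marks

marks = parse_marks(BISECT_LOG)
last_bad = [m for m in marks if m[0] == "bad"][-1]
good_marks = [m for m in marks if m[0] == "good"]

print(f"last commit marked bad: {last_bad[1][:12]} {last_bad[2]}")
print("verdicts that could be false negatives:")
for _, sha, subject in good_marks:
    print(f"  {sha[:12]} {subject}")
```

A single wrong "good" mark is enough to steer the bisection toward a later, unrelated commit, so the list of "good" marks shows where extra test cycles would be best spent.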

The commit reverted cleanly and now I'm running 2.6.27.7 with that
commit reverted. I'll post my results.

Hope this helps. Regards,
Fabio

>> Rafael
>
> Regards,
> Fabio
>

2008-11-23 23:32:36

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Sunday, 23 of November 2008, Fabio Comolli wrote:
> Hi.
>
> On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
> > Hi
> >
> > Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.
> >
>
> Ok, this time things went much better. The bisection pointed to:
>
> ------------------------------------------------------------------------------------------------------------------------------------------------
> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
> ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
> commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> Author: Jay Fenlason <[email protected]>
> Date: Mon Oct 27 23:28:14 2008 +0100
>
> firewire: fix struct fw_node memory leak
>
> commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream
>
> With the bus_resets patch applied, it is easy to see this memory leak
> by repeatedly resetting the firewire bus while running slabtop in
> another window. Just watch kmalloc-32 grow and grow...
>
> Signed-off-by: Jay Fenlason <[email protected]>
> Signed-off-by: Stefan Richter <[email protected]>
>
> :040000 040000 01cadbd5f5fb81ce4f5e2023573204c4fbec3a28
> 809a53f4be87bd8be133ebd7564e1139a0cfa45b M drivers
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> And this is the log:
>
> ------------------------------------------------------------------------------------------------------------------------------------------------
> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect log
> git-bisect start
> # good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
> git-bisect good 056c71459d3acf9fefcb2dc67abeef10e649d508
> # bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
> git-bisect bad 788a5f3f70e2a9c46020bdd3a195f2a866441c5d
> # bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make
> memory reserve code more robust
> git-bisect bad 7bdb542c453c14e54af9ebe5c4a827e4a678c47d
> # good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix
> inclusion of tcp options in syn-ack
> git-bisect good f29062d0ec12ee3a58c67228dc829574b4ced378
> # good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix
> ioctl() return code
> git-bisect good 882491755d4c819de5bb593f04d06692185760aa
> # bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
> git-bisect bad baae4f5fd7a75bdfa70d374b738963053df2bcaa
> # bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct
> fw_node memory leak
> git-bisect bad ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> # good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive
> more than 256 bus resets
> git-bisect good b6021579f54e5b6b31f03fe24de1208a2feb4aec
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> The commit reverted cleanly and now I'm running 2.6.27.7 with that
> commit reverted. I'll post my results.
>
> Hope this helps. Regards,

Yes, thanks for bisecting this (CCs added).

Rafael

2008-11-24 07:19:54

by Stefan Richter

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Rafael J. Wysocki wrote:
> On Sunday, 23 of November 2008, Fabio Comolli wrote:
>> Hi.
>>
>> On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
>>> Hi
>>>
>>> Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.
>>>
>> Ok, this time things went much better. The bisection pointed to:
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------
>> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
>> ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
>> commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
>> Author: Jay Fenlason <[email protected]>
>> Date: Mon Oct 27 23:28:14 2008 +0100
>>
>> firewire: fix struct fw_node memory leak
>>
>> commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream
>>
>> With the bus_resets patch applied, it is easy to see this memory leak
>> by repeatedly resetting the firewire bus while running slabtop in
>> another window. Just watch kmalloc-32 grow and grow...
>>
>> Signed-off-by: Jay Fenlason <[email protected]>
>> Signed-off-by: Stefan Richter <[email protected]>
>>
>> :040000 040000 01cadbd5f5fb81ce4f5e2023573204c4fbec3a28
>> 809a53f4be87bd8be133ebd7564e1139a0cfa45b M drivers
>> ------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> And this is the log:
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------
>> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect log
>> git-bisect start
>> # good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
>> git-bisect good 056c71459d3acf9fefcb2dc67abeef10e649d508
>> # bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
>> git-bisect bad 788a5f3f70e2a9c46020bdd3a195f2a866441c5d
>> # bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make
>> memory reserve code more robust
>> git-bisect bad 7bdb542c453c14e54af9ebe5c4a827e4a678c47d
>> # good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix
>> inclusion of tcp options in syn-ack
>> git-bisect good f29062d0ec12ee3a58c67228dc829574b4ced378
>> # good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix
>> ioctl() return code
>> git-bisect good 882491755d4c819de5bb593f04d06692185760aa
>> # bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
>> git-bisect bad baae4f5fd7a75bdfa70d374b738963053df2bcaa
>> # bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct
>> fw_node memory leak
>> git-bisect bad ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
>> # good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive
>> more than 256 bus resets
>> git-bisect good b6021579f54e5b6b31f03fe24de1208a2feb4aec
>> ------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> The commit reverted cleanly and now I'm running 2.6.27.7 with that
>> commit reverted. I'll post my results.
>>
>> Hope this helps. Regards,
>
> Yes, thanks for bisecting this (CCs added).

The commit which was pointed to in the bisection does nothing else than
free some data in firewire-core.

(Note to myself and Jay: See http://lkml.org/lkml/2008/11/23/123 and
http://lkml.org/lkml/2008/11/23/153 for the history of this bug.)

Fabio, please test a _bad_ kernel with firewire drivers unloaded before
hibernation. Also, please enable in the "Kernel hacking" kernel config
menu: "Kernel debugging", "Debug slab memory allocations". Thanks,
--
Stefan Richter
-=====-==--- =-== ==---
http://arcgraph.de/sr/
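
Whether a build has the requested debug options can be checked against its .config. The sketch below is a Python illustration under two assumptions: that "Kernel debugging" and "Debug slab memory allocations" correspond to the symbols CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_SLAB, and that the sample text stands in for a real .config, which would be read from the build tree instead.

```python
# Hypothetical excerpt of a kernel .config, used here in place of a real file.
SAMPLE_CONFIG = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_SLAB=y
"""

# Assumed symbol names for the two menu entries mentioned above.
WANTED = ("CONFIG_DEBUG_KERNEL", "CONFIG_DEBUG_SLAB")

def enabled_options(config_text):
    """Return the set of options set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments ("# CONFIG_X is not set").
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled

missing = [opt for opt in WANTED if opt not in enabled_options(SAMPLE_CONFIG)]
print("missing:", ", ".join(missing) if missing else "none")
```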

2008-11-24 08:57:16

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi

On Mon, Nov 24, 2008 at 8:18 AM, Stefan Richter
<[email protected]> wrote:
>
> The commit which was pointed to in the bisection does nothing else than
> free some data in firewire-core.
>
> (Note to myself and Jay: See http://lkml.org/lkml/2008/11/23/123 and
> http://lkml.org/lkml/2008/11/23/153 for the history of this bug.)
>
> Fabio, please test a _bad_ kernel with firewire drivers unloaded before
> hibernation.

That's what I'm already doing right now.

> Also, please enable in the "Kernel hacking" kernel config
> menu: "Kernel debugging", "Debug slab memory allocations". Thanks,

Will do. Thanks

> --
> Stefan Richter
> -=====-==--- =-== ==---
> http://arcgraph.de/sr/
>

Regards,
Fabio

2008-11-25 23:16:20

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Monday, 24 of November 2008, Stefan Richter wrote:
> Rafael J. Wysocki wrote:
> > On Sunday, 23 of November 2008, Fabio Comolli wrote:
> >> Hi.
> >>
> >> On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
> >>> Hi
> >>>
> >>> Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.
> >>>
> >> Ok, this time things went much better. The bisection pointed to:
> >>
> >> ------------------------------------------------------------------------------------------------------------------------------------------------
> >> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
> >> ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
> >> commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> >> Author: Jay Fenlason <[email protected]>
> >> Date: Mon Oct 27 23:28:14 2008 +0100
> >>
> >> firewire: fix struct fw_node memory leak
> >>
> >> commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream
> >>
> >> With the bus_resets patch applied, it is easy to see this memory leak
> >> by repeatedly resetting the firewire bus while running slabtop in
> >> another window. Just watch kmalloc-32 grow and grow...
> >>
> >> Signed-off-by: Jay Fenlason <[email protected]>
> >> Signed-off-by: Stefan Richter <[email protected]>
> >>
> >> :040000 040000 01cadbd5f5fb81ce4f5e2023573204c4fbec3a28
> >> 809a53f4be87bd8be133ebd7564e1139a0cfa45b M drivers
> >> ------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> And this is the log:
> >>
> >> ------------------------------------------------------------------------------------------------------------------------------------------------
> >> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect log
> >> git-bisect start
> >> # good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
> >> git-bisect good 056c71459d3acf9fefcb2dc67abeef10e649d508
> >> # bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
> >> git-bisect bad 788a5f3f70e2a9c46020bdd3a195f2a866441c5d
> >> # bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make
> >> memory reserve code more robust
> >> git-bisect bad 7bdb542c453c14e54af9ebe5c4a827e4a678c47d
> >> # good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix
> >> inclusion of tcp options in syn-ack
> >> git-bisect good f29062d0ec12ee3a58c67228dc829574b4ced378
> >> # good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix
> >> ioctl() return code
> >> git-bisect good 882491755d4c819de5bb593f04d06692185760aa
> >> # bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
> >> git-bisect bad baae4f5fd7a75bdfa70d374b738963053df2bcaa
> >> # bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct
> >> fw_node memory leak
> >> git-bisect bad ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> >> # good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive
> >> more than 256 bus resets
> >> git-bisect good b6021579f54e5b6b31f03fe24de1208a2feb4aec
> >> ------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> The commit reverted cleanly and now I'm running 2.6.27.7 with that
> >> commit reverted. I'll post my results.
> >>
> >> Hope this helps. Regards,
> >
> > Yes, thanks for bisecting this (CCs added).
>
> The commit which was pointed to in the bisection does nothing else than
> free some data in firewire-core.
>
> (Note to myself and Jay: See http://lkml.org/lkml/2008/11/23/123 and
> http://lkml.org/lkml/2008/11/23/153 for the history of this bug.)
>
> Fabio, please test a _bad_ kernel with firewire drivers unloaded before
> hibernation. Also, please enable in the "Kernel hacking" kernel config
> menu: "Kernel debugging", "Debug slab memory allocations". Thanks,

I _think_ it also happens on the Toshiba Portege R500 I'm testing at the
moment, but I have never been able to recover the box from the failure
(I think it is a panic BTW, because the caps lock LED starts to blink when it
happens).

I'm going to debug this a bit more in the next few days.

Thanks,
Rafael

2008-11-25 23:32:07

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Wednesday, 26 of November 2008, Rafael J. Wysocki wrote:
> On Monday, 24 of November 2008, Stefan Richter wrote:
> > Rafael J. Wysocki wrote:
> > > On Sunday, 23 of November 2008, Fabio Comolli wrote:
> > >> Hi.
> > >>
> > >> On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
> > >>> Hi
> > >>>
> > >>> Yes. I started another bisection session between 2.6.27.4 and 2.6.27.5.
> > >>>
> > >> Ok, this time things went much better. The bisection pointed to:
> > >>
> > >> ------------------------------------------------------------------------------------------------------------------------------------------------
> > >> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
> > >> ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
> > >> commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> > >> Author: Jay Fenlason <[email protected]>
> > >> Date: Mon Oct 27 23:28:14 2008 +0100
> > >>
> > >> firewire: fix struct fw_node memory leak
> > >>
> > >> commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream
> > >>
> > >> With the bus_resets patch applied, it is easy to see this memory leak
> > >> by repeatedly resetting the firewire bus while running slabtop in
> > >> another window. Just watch kmalloc-32 grow and grow...
> > >>
> > >> Signed-off-by: Jay Fenlason <[email protected]>
> > >> Signed-off-by: Stefan Richter <[email protected]>
> > >>
> > >> :040000 040000 01cadbd5f5fb81ce4f5e2023573204c4fbec3a28
> > >> 809a53f4be87bd8be133ebd7564e1139a0cfa45b M drivers
> > >> ------------------------------------------------------------------------------------------------------------------------------------------------
> > >>
> > >> And this is the log:
> > >>
> > >> ------------------------------------------------------------------------------------------------------------------------------------------------
> > >> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect log
> > >> git-bisect start
> > >> # good: [056c71459d3acf9fefcb2dc67abeef10e649d508] Linux 2.6.27.4
> > >> git-bisect good 056c71459d3acf9fefcb2dc67abeef10e649d508
> > >> # bad: [788a5f3f70e2a9c46020bdd3a195f2a866441c5d] Linux 2.6.27.5
> > >> git-bisect bad 788a5f3f70e2a9c46020bdd3a195f2a866441c5d
> > >> # bad: [7bdb542c453c14e54af9ebe5c4a827e4a678c47d] powerpc/numa: Make
> > >> memory reserve code more robust
> > >> git-bisect bad 7bdb542c453c14e54af9ebe5c4a827e4a678c47d
> > >> # good: [f29062d0ec12ee3a58c67228dc829574b4ced378] syncookies: fix
> > >> inclusion of tcp options in syn-ack
> > >> git-bisect good f29062d0ec12ee3a58c67228dc829574b4ced378
> > >> # good: [882491755d4c819de5bb593f04d06692185760aa] firewire: fix
> > >> ioctl() return code
> > >> git-bisect good 882491755d4c819de5bb593f04d06692185760aa
> > >> # bad: [baae4f5fd7a75bdfa70d374b738963053df2bcaa] firewire: fw-sbp2: fix races
> > >> git-bisect bad baae4f5fd7a75bdfa70d374b738963053df2bcaa
> > >> # bad: [ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b] firewire: fix struct
> > >> fw_node memory leak
> > >> git-bisect bad ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
> > >> # good: [b6021579f54e5b6b31f03fe24de1208a2feb4aec] firewire: Survive
> > >> more than 256 bus resets
> > >> git-bisect good b6021579f54e5b6b31f03fe24de1208a2feb4aec
> > >> ------------------------------------------------------------------------------------------------------------------------------------------------
> > >>
> > >> The commit reverted cleanly and now I'm running 2.6.27.7 with that
> > >> commit reverted. I'll post my results.
> > >>
> > >> Hope this helps. Regards,
> > >
> > > Yes, thanks for bisecting this (CCs added).
> >
> > The commit which was pointed to in the bisection does nothing else than
> > free some data in firewire-core.
> >
> > (Note to myself and Jay: See http://lkml.org/lkml/2008/11/23/123 and
> > http://lkml.org/lkml/2008/11/23/153 for the history of this bug.)
> >
> > Fabio, please test a _bad_ kernel with firewire drivers unloaded before
> > hibernation. Also, please enable in the "Kernel hacking" kernel config
> > menu: "Kernel debugging", "Debug slab memory allocations". Thanks,
>
> I _think_ it also happens on the Toshiba Portege R500 I'm testing at the
> moment, but I have never been able to recover the box from the failure
> (I think it is a panic BTW, because the caps lock LED starts to blink when it
> happens).
>
> I'm going to debug this a bit more in the next few days.

Also, on a possibly related note, I've just found a report from a Mac Mini user
who told me his machine hung during resume from hibernation if his external
firewire drive was connected to the port. He worked around the problem by
switching to the new firewire stack that worked for him.

Thanks,
Rafael

2008-11-26 01:03:46

by Stefan Richter

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Rafael J. Wysocki wrote:
>>>>> On Sun, Nov 23, 2008 at 7:24 PM, Fabio Comolli <[email protected]> wrote:
>>>>> The bisection pointed to:
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> fcomolli@hawking:~/software/GIT-TREES/linux-2.6.27.y> git bisect good
>>>>> ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b is first bad commit
>>>>> commit ff0f8d16839cd02dc95bd92c212cbd5d433a4d2b
>>>>> Author: Jay Fenlason <[email protected]>
>>>>> Date: Mon Oct 27 23:28:14 2008 +0100
>>>>>
>>>>> firewire: fix struct fw_node memory leak
>>>>>
>>>>> commit 77e557191701afa55ae7320d42ad6458a2ad292e upstream

(I still have a suspicion that this commit, or firewire even, is not the
actual culprit. But one never knows.)

> Also, on a possibly related note, I've just found a report from a Mac Mini user
> who told me his machine hung during resume from hibernation if his external
> firewire drive was connected to the port. He worked around the problem by
> switching to the new firewire stack that worked for him.

The above bisection result is about the new stack = drivers/firewire/.
The old stack is drivers/ieee1394/ and I prefix all its changes with
"ieee1394:".

Of course the old stack is supposed to hibernate + restore properly too.
I personally tested only suspend + resume though, and that's quite long
ago...
--
Stefan Richter
-=====-==--- =-== ==-=-
http://arcgraph.de/sr/

2008-11-26 08:19:32

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi

On Wed, Nov 26, 2008 at 2:03 AM, Stefan Richter
<[email protected]> wrote:
> Rafael J. Wysocki wrote:
>> Also, on a possibly related note, I've just found a report from a Mac Mini user
>> who told me his machine hung during resume from hibernation if his external
>> firewire drive was connected to the port. He worked around the problem by
>> switching to the new firewire stack that worked for him.
>
> The above bisection result is about the new stack = drivers/firewire/.
> The old stack is drivers/ieee1394/ and I prefix all its changes with
> "ieee1394:".
>
> Of course the old stack is supposed to hibernate + restore properly too.
> I personally tested only suspend + resume though, and that's quite long
> ago...

FWIW, I don't own any firewire devices, I only compiled the stack to
see if the port was recognised by the kernel. So nothing has ever been
connected to that port.

I haven't reproduced the problem so far without the drivers compiled
in (and with the debug option you suggested). I'll post some more info
in a few days.

> --
> Stefan Richter
> -=====-==--- =-== ==-=-
> http://arcgraph.de/sr/
>

Regards,
Fabio

2008-11-26 12:29:23

by Stefan Richter

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Fabio Comolli wrote:
> I haven't reproduced the problem so far without the drivers compiled
> in (and with the debug option you suggested). I'll post some more info
> in a few days.

Sounds like I urgently need to set up hibernation on a test PC here.
Thanks,
--
Stefan Richter
-=====-==--- =-== ==-=-
http://arcgraph.de/sr/

2008-11-26 19:46:01

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Ok, I reproduced the bug with 2.6.27.7 without firewire. So the
firewire stack is innocent after all.
FWIW, it happened after the resume, as soon as I plugged in the AC
adapter. The system became unresponsive for at least two minutes and
then "resurrected" as if nothing had happened. Nothing in the logs and the
dmesg is clean.

I have run out of ideas. I'm trying 2.6.27.4, the kernel that never showed
the problem.

I will also launch a full SMART test on my hd, in case it is the culprit.

Regards,
Fabio

On Wed, Nov 26, 2008 at 1:28 PM, Stefan Richter
<[email protected]> wrote:
> Fabio Comolli wrote:
>> I haven't reproduced the problem so far without the drivers compiled
>> in (and with the debug option you suggested). I'll post some more info
>> in a few days.
>
> Sounds like I urgently need to set up hibernation on a test PC here.
> Thanks,
> --
> Stefan Richter
> -=====-==--- =-== ==-=-
> http://arcgraph.de/sr/
>

2008-11-26 20:08:36

by Stefan Richter

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Fabio Comolli wrote:
> Ok, I reproduced the bug with 2.6.27.7 without firewire. So the
> firewire stack is innocent after all.

That's great news for me at least. ;-)

> FWIW, it happened after the resume, as soon as I plugged in the AC
> adapter. The system became unresponsive for at least two minutes and
> then "resurrected" as if nothing had happened. There is nothing in the
> logs and the dmesg is clean.
>
> I ran out of ideas. I'm trying 2.6.27.4, the kernel that never showed
> the problem.

The 2.6.27.5 changelog shows a bunch of ACPI changes. They may not be
responsible, but in my uninformed opinion these are the changes to look
at more closely. Since plain bisection did not work well for you, maybe
you should ask the maintainers involved in the ACPI patches for a
priority list of patches to unapply for long-term tests.

Also check the diffstat for changes to drivers which you use.

Furthermore, the few scheduler changes may play a role. But I
don't understand what they do, so I may be completely off.

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.27.5
http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fincr%2Fpatch-2.6.27.4-5.bz2
--
Stefan Richter
-=====-==--- =-== ==-=-
http://arcgraph.de/sr/

2008-11-26 23:03:36

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Wednesday, 26 of November 2008, Stefan Richter wrote:
> Fabio Comolli wrote:
> > Ok, I reproduced the bug with 2.6.27.7 without firewire. So the
> > firewire stack is innocent after all.
>
> That's great news for me at least. ;-)
>
> > FWIW, it happened after the resume, as soon as I plugged in the AC
> > adapter. The system became unresponsive for at least two minutes and
> > then "resurrected" as if nothing had happened. There is nothing in the
> > logs and the dmesg is clean.
> >
> > I ran out of ideas. I'm trying 2.6.27.4, the kernel that never showed
> > the problem.
>
> The 2.6.27.5 changelog shows a bunch of ACPI changes. They may not be
> responsible, but in my uninformed opinion these are the changes to look
> at more closely. Since plain bisection did not work well for you, maybe
> you should ask the maintainers involved in the ACPI patches for a
> priority list of patches to unapply for long-term tests.

Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
applied after 2.6.27.4 and retest.
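
A minimal sketch of how the candidate commits could be listed before
reverting them, assuming a git checkout of the stable tree with tags
named v2.6.27.4 and v2.6.27.7 and the EC driver at drivers/acpi/ec.c.
The tag names, the path, the helper names and the sample log lines are
assumptions for illustration, not taken from this thread:

```python
import subprocess


def ec_commits(oneline_log, prefix="ACPI: EC:"):
    """Return the commit ids whose subject starts with `prefix`.

    `oneline_log` is text in `git log --oneline` format, so the ids
    come back in the same order as the log (newest first).
    """
    picked = []
    for line in oneline_log.splitlines():
        commit_id, _, subject = line.strip().partition(" ")
        if subject.startswith(prefix):
            picked.append(commit_id)
    return picked


def log_between(old_tag, new_tag, path):
    """Run `git log --oneline old..new -- path` in the current repository."""
    result = subprocess.run(
        ["git", "log", "--oneline", f"{old_tag}..{new_tag}", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical sample input: the ids and subjects are made up.
SAMPLE_LOG = """\
aaaaaaa ACPI: EC: example subject one
bbbbbbb sched: example subject two
ccccccc ACPI: EC: example subject three
"""

if __name__ == "__main__":
    # In a real checkout one might instead call (assumed tags and path):
    #   log_between("v2.6.27.4", "v2.6.27.7", "drivers/acpi/ec.c")
    for commit_id in ec_commits(SAMPLE_LOG):
        print("git revert", commit_id)
```

The ids come out newest first, which is the order in which the reverts
would normally be applied.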

Thanks,
Rafael

2008-11-27 15:17:23

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi

On Thu, Nov 27, 2008 at 12:02 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, 26 of November 2008, Stefan Richter wrote:
>>
>> The 2.6.27.5 changelog shows a bunch of ACPI changes. They may not be
>> responsible, but in my uninformed opinion these are the changes to look
>> at more closely. Since plain bisection did not work well for you, maybe
>> you should ask the maintainers involved in the ACPI patches for a
>> priority list of patches to unapply for long-term tests.
>
> Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
> applied after 2.6.27.4 and retest.

Will do tonight. I see that there are some other "ACPI: EC:" commits
in 2.6.27.6 and 2.6.27.7.

I'll just build 2.6.27.7 with the ec.c file from 2.6.27.4 and test that if
the three commits introduced in 2.6.27.5 don't revert cleanly.

By the way, my HD passed a "smartctl -t long" test without any problems.

>
> Thanks,
> Rafael
>

Regards,
Fabio

2008-11-27 21:54:26

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi

On Thu, Nov 27, 2008 at 4:17 PM, Fabio Comolli <[email protected]> wrote:
> Hi
>
> On Thu, Nov 27, 2008 at 12:02 AM, Rafael J. Wysocki <[email protected]> wrote:
>> Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
>> applied after 2.6.27.4 and retest.
>
> Will do tonight. I see that there are some other "ACPI: EC:" commits
> in 2.6.27.6 and 2.6.27.7.
>
> I'll just build 2.6.27.7 with the ec.c file from 2.6.27.4 and test that if
> the three commits introduced in 2.6.27.5 don't revert cleanly.
>
> By the way, my HD passed a "smartctl -t long" test without any problems.
>

Reproduced with 2.6.27.7 with ec.c taken from 2.6.27.4 - after three
minutes of freeze the laptop came back to normal as if nothing had
happened.

>>
>> Thanks,
>> Rafael
>>

Regards,
Fabio

2008-12-05 00:25:12

by Rafael J. Wysocki

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

(CCing to linux-acpi at Rui's request)

On Thursday, 27 of November 2008, Fabio Comolli wrote:
> Hi
>
> On Thu, Nov 27, 2008 at 4:17 PM, Fabio Comolli <[email protected]> wrote:
> > Hi
> >
> > On Thu, Nov 27, 2008 at 12:02 AM, Rafael J. Wysocki <[email protected]> wrote:
> >> Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
> >> applied after 2.6.27.4 and retest.
> >
> > Will do tonight. I see that there are some other "ACPI: EC:" commits
> > in 2.6.27.6 and 2.6.27.7.
> >
> > I'll just build 2.6.27.7 with the ec.c file from 2.6.27.4 and test that if
> > the three commits introduced in 2.6.27.5 don't revert cleanly.
> >
> > By the way, my HD passed a "smartctl -t long" test without any problems.
> >
>
> Reproduced with 2.6.27.7 with ec.c taken from 2.6.27.4 - after three
> minutes of freeze the laptop came back to normal as if nothing had
> happened.

2008-12-12 18:57:01

by Pavel Machek

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

On Thu 2008-11-27 22:54:14, Fabio Comolli wrote:
> Hi
>
> On Thu, Nov 27, 2008 at 4:17 PM, Fabio Comolli <[email protected]> wrote:
> > Hi
> >
> > On Thu, Nov 27, 2008 at 12:02 AM, Rafael J. Wysocki <[email protected]> wrote:
> >> Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
> >> applied after 2.6.27.4 and retest.
> >
> > Will do tonight. I see that there are some other "ACPI: EC:" commits
> > in 2.6.27.6 and 2.6.27.7.
> >
> > I'll just build 2.6.27.7 with the ec.c file from 2.6.27.4 and test that if
> > the three commits introduced in 2.6.27.5 don't revert cleanly.
> >
> > By the way, my HD passed a "smartctl -t long" test without any problems.
> >
>
> Reproduced with 2.6.27.7 with ec.c taken from 2.6.27.4 - after three
> minutes of freeze the laptop came back to normal as if nothing had
> happened.

Is it possible to turn on printk timing, or something? If it is a
desktop machine, is there a chance to pull some data out of the machine
during resume (serial console, debug leds?).
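
With printk timing enabled, each kernel message carries a
"[seconds.microseconds]" prefix, so a long stall shows up as a jump
between two consecutive lines. A minimal sketch of locating the largest
jump in saved dmesg output follows; the function name is hypothetical
and the sample lines use invented timestamps:

```python
import re

# Matches the "[  123.456789]" prefix that printk timing adds to each line.
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(dmesg_text):
    """Return (gap_seconds, line_before, line_after) for the biggest
    jump between consecutive timestamped lines, or None if there are
    fewer than two timestamped lines."""
    best = None
    prev_time = None
    prev_line = None
    for line in dmesg_text.splitlines():
        match = _STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        now = float(match.group(1))
        if prev_time is not None:
            gap = now - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = now, line
    return best


# Hypothetical sample: message texts are illustrative, timestamps invented.
SAMPLE_DMESG = """\
[  100.000000] Restarting tasks ... done.
[  138.500000] ata1.01: exception Emask 0x0
[  143.500000] ata1.01: qc timeout (cmd 0xa1)
"""

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE_DMESG)
    print(f"{gap:.1f}s of silence between:")
    print("  " + before)
    print("  " + after)
```

This only narrows down when the stall happened; if nothing is logged
around it, the two lines on either side of the gap are all it can show.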

Could you use Linus' RTC debugging hack, then hard reset the machine after
1.5 minutes to see where it spends most of the time?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-12-12 19:01:46

by Fabio Comolli

Subject: Re: Regression in 2.6.28-rc and 2.6.27-stable - hibernate related

Hi Pavel

On Fri, Dec 12, 2008 at 7:58 PM, Pavel Machek <[email protected]> wrote:
> On Thu 2008-11-27 22:54:14, Fabio Comolli wrote:
>> Hi
>>
>> On Thu, Nov 27, 2008 at 4:17 PM, Fabio Comolli <[email protected]> wrote:
>> > Hi
>> >
>> > On Thu, Nov 27, 2008 at 12:02 AM, Rafael J. Wysocki <[email protected]> wrote:
>> >> Actually, yes, Fabio, you can try to revert all of the "ACPI: EC:" commits
>> >> applied after 2.6.27.4 and retest.
>> >
>> > Will do tonight. I see that there are some other "ACPI: EC:" commits
>> > in 2.6.27.6 and 2.6.27.7.
>> >
>> > I'll just build 2.6.27.7 with the ec.c file from 2.6.27.4 and test that if
>> > the three commits introduced in 2.6.27.5 don't revert cleanly.
>> >
>> > By the way, my HD passed a "smartctl -t long" test without any problems.
>> >
>>
>> Reproduced with 2.6.27.7 with ec.c taken from 2.6.27.4 - after three
>> minutes of freeze the laptop came back to normal as if nothing had
>> happened.
>
> Is it possible to turn on printk timing, or something?

It's already enabled.

> If it is a
> desktop machine, is there a chance to pull some data out of the machine
> during resume (serial console, debug leds?).

It's a laptop without any serial ports unfortunately.

>
> Could you use Linus' RTC debugging hack, then hard reset the machine after
> 1.5 minutes to see where it spends most of the time?

I'm almost done with a third bisection session (0 commits left to test). If
I don't get any useful results I will try it.

> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

Regards,
Fabio