2012-02-02 05:13:00

by Norbert Preining

Subject: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Dear all,

(please Cc)

since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
time, namely the detection of the CD drive hangs for 10sec (both
dmesg were taken on the same hardware on the same day):

with 3.3-rc1 and rc2:
[ 3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
[ 9.004013] ata2: link is slow to respond, please be patient (ready=0)
[ 13.652013] ata2: COMRESET failed (errno=-16)
[ 13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 13.975721] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 13.977166] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
[ 13.981294] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 13.982734] ata2.00: configured for UDMA/33
[ 13.987964] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
[ 13.991482] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[ 13.992971] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 13.994574] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 13.994672] sr 1:0:0:0: Attached scsi generic sg1 type 5
[ 14.316021] ata5: SATA link down (SStatus 0 SControl 300)
[ 14.636019] ata6: SATA link down (SStatus 0 SControl 300)

with 3.2:
[ 3.218156] sd 0:0:0:0: [sda] Attached SCSI disk
[ 3.484023] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 3.487708] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 3.489156] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
[ 3.494321] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 3.495761] ata2.00: configured for UDMA/33
[ 3.501057] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
[ 3.504515] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[ 3.505995] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 3.507611] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 3.507715] sr 1:0:0:0: Attached scsi generic sg1 type 5
[ 3.828025] ata5: SATA link down (SStatus 0 SControl 300)
[ 4.148020] ata6: SATA link down (SStatus 0 SControl 300)


Is there anything I can do here?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
LLANELLI (adj.)
Descriptive of the waggling movement of a person's hands when shaking
water from them or warming up for a piece of workshop theatre.
--- Douglas Adams, The Meaning of Liff
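
The stall in the two logs above can be located mechanically. The following is a minimal Python sketch, not something posted in the thread: it parses the bracketed dmesg timestamps and reports the time between two messages and the largest jump. The helper names (`parse`, `largest_gap`, `seconds_between`) are hypothetical, and the log excerpts are copied from the message above.

```python
import errno
import re

# dmesg prefixes each message with "[ seconds.microseconds]".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(dmesg_text):
    """Return a list of (timestamp_seconds, message) tuples."""
    entries = []
    for raw in dmesg_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gap(entries):
    """Return (gap_seconds, message_before, message_after) for the biggest jump."""
    pairs = zip(entries, entries[1:])
    return max(((t1 - t0, m0, m1) for (t0, m0), (t1, m1) in pairs),
               key=lambda item: item[0])


def seconds_between(entries, first, second):
    """Time from the first message containing `first` to the next containing `second`."""
    t_first = next(t for t, msg in entries if first in msg)
    t_second = next(t for t, msg in entries if second in msg and t >= t_first)
    return t_second - t_first


# Excerpts copied from the 3.3-rc1 and 3.2 logs in the report.
LOG_33 = """\
[ 3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
[ 9.004013] ata2: link is slow to respond, please be patient (ready=0)
[ 13.652013] ata2: COMRESET failed (errno=-16)
[ 13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
"""
LOG_32 = """\
[ 3.218156] sd 0:0:0:0: [sda] Attached SCSI disk
[ 3.484023] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
"""

for name, log in (("3.3-rc1", LOG_33), ("3.2", LOG_32)):
    span = seconds_between(parse(log), "Attached SCSI disk", "SATA link up")
    print(f"{name}: {span:.2f} s from sda attach to ata2 link up")

gap, before, after = largest_gap(parse(LOG_33))
print(f"largest jump in the 3.3-rc1 excerpt: {gap:.2f} s, ending at {after!r}")
print("errno 16 is", errno.errorcode[16])
```

On these excerpts the span from the sda attach to the ata2 link coming up is roughly 10.3 s on 3.3-rc1 against roughly 0.27 s on 3.2, which matches the reported 10 second hang. The `errno=-16` in the COMRESET line is the negated value of `EBUSY`.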


2012-02-02 08:38:38

by Srivatsa S. Bhat

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Adding some Cc's.

On 02/02/2012 10:42 AM, Norbert Preining wrote:

> Dear all,
>
> (please Cc)
>
> since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> time, namely the detection of the CD drive hangs for 10sec (both
> dmesg were taken on the same hardware on the same day):
>
> with 3.3-rc1 and rc2:
> [ 3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 9.004013] ata2: link is slow to respond, please be patient (ready=0)
> [ 13.652013] ata2: COMRESET failed (errno=-16)
> [ 13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 13.975721] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [ 13.977166] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> [ 13.981294] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [ 13.982734] ata2.00: configured for UDMA/33
> [ 13.987964] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
> [ 13.991482] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> [ 13.992971] cdrom: Uniform CD-ROM driver Revision: 3.20
> [ 13.994574] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [ 13.994672] sr 1:0:0:0: Attached scsi generic sg1 type 5
> [ 14.316021] ata5: SATA link down (SStatus 0 SControl 300)
> [ 14.636019] ata6: SATA link down (SStatus 0 SControl 300)
>
> with 3.2:
> [ 3.218156] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 3.484023] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 3.487708] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [ 3.489156] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> [ 3.494321] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> [ 3.495761] ata2.00: configured for UDMA/33
> [ 3.501057] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
> [ 3.504515] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> [ 3.505995] cdrom: Uniform CD-ROM driver Revision: 3.20
> [ 3.507611] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [ 3.507715] sr 1:0:0:0: Attached scsi generic sg1 type 5
> [ 3.828025] ata5: SATA link down (SStatus 0 SControl 300)
> [ 4.148020] ata6: SATA link down (SStatus 0 SControl 300)
>
>
> Is there anything I can do here?
>

2012-02-03 01:15:23

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Thu, 2012-02-02 at 14:08 +0530, Srivatsa S. Bhat wrote:
> Adding some Cc's.
>
> On 02/02/2012 10:42 AM, Norbert Preining wrote:
>
> > Dear all,
> >
> > (please Cc)
> >
> > since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> > time, namely the detection of the CD drive hangs for 10sec (both
> > dmesg were taken on the same hardware on the same day):
> >
> > with 3.3-rc1 and rc2:
> > [ 3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> > [ 9.004013] ata2: link is slow to respond, please be patient (ready=0)
> > [ 13.652013] ata2: COMRESET failed (errno=-16)
> > [ 13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

I'm looking into this problem.
But I can't reproduce this regression.

Could you attach the full 3.3-rc2 dmesg?

Thanks,
Lin Ming

> > [ 13.975721] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> > [ 13.977166] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> > [ 13.981294] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> > [ 13.982734] ata2.00: configured for UDMA/33
> > [ 13.987964] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
> > [ 13.991482] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> > [ 13.992971] cdrom: Uniform CD-ROM driver Revision: 3.20
> > [ 13.994574] sr 1:0:0:0: Attached scsi CD-ROM sr0
> > [ 13.994672] sr 1:0:0:0: Attached scsi generic sg1 type 5
> > [ 14.316021] ata5: SATA link down (SStatus 0 SControl 300)
> > [ 14.636019] ata6: SATA link down (SStatus 0 SControl 300)
> >
> > with 3.2:
> > [ 3.218156] sd 0:0:0:0: [sda] Attached SCSI disk
> > [ 3.484023] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 3.487708] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> > [ 3.489156] ata2.00: ATAPI: MATSHITADVD-RAM UJ862AS, 1.21, max UDMA/33
> > [ 3.494321] ata2.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
> > [ 3.495761] ata2.00: configured for UDMA/33
> > [ 3.501057] scsi 1:0:0:0: CD-ROM MATSHITA DVD-RAM UJ862AS 1.21 PQ: 0 ANSI: 5
> > [ 3.504515] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
> > [ 3.505995] cdrom: Uniform CD-ROM driver Revision: 3.20
> > [ 3.507611] sr 1:0:0:0: Attached scsi CD-ROM sr0
> > [ 3.507715] sr 1:0:0:0: Attached scsi generic sg1 type 5
> > [ 3.828025] ata5: SATA link down (SStatus 0 SControl 300)
> > [ 4.148020] ata6: SATA link down (SStatus 0 SControl 300)
> >
> >
> > Is there anything I can do here?
> >
>

2012-02-03 04:21:38

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 09:15 +0800, Lin Ming wrote:
> On Thu, 2012-02-02 at 14:08 +0530, Srivatsa S. Bhat wrote:
> > Adding some Cc's.
> >
> > On 02/02/2012 10:42 AM, Norbert Preining wrote:
> >
> > > Dear all,
> > >
> > > (please Cc)
> > >
> > > since going from 3.2 to 3.3-rc1 I see a kind of regression wrt booting
> > > time, namely the detection of the CD drive hangs for 10sec (both
> > > dmesg were taken on the same hardware on the same day):
> > >
> > > with 3.3-rc1 and rc2:
> > > [ 3.694012] sd 0:0:0:0: [sda] Attached SCSI disk
> > > [ 9.004013] ata2: link is slow to respond, please be patient (ready=0)
> > > [ 13.652013] ata2: COMRESET failed (errno=-16)
> > > [ 13.972022] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>
> I'm looking into this problem.
> But I can't reproduce this regression.
>
> Could you attach the full 3.3-rc2 dmesg?

And could you do a bisect?

First, you can check if commit 318893e is good or bad.

If it's bad, then you only need to do a bisect between v3.2 and 318893e.
Otherwise, you need to do a bisect between v3.2 and v3.3-rc1.

Thanks,
Lin Ming
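
The bisect requested here is a binary search over the commit history. The following is a minimal Python sketch of that idea under simplifying assumptions: the history is treated as a linear list (real `git bisect` works on a commit graph with merges), integers stand in for commits, and `bisect_first_bad` is a hypothetical helper, not a git API.

```python
def bisect_first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the usual bisect premise).
    Returns (first_bad_commit, number_of_kernels_tested).
    """
    lo, hi = 0, len(commits) - 1   # lo: last known good, hi: first known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1                # one build-and-boot cycle per probe
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Hypothetical history: commit 0 behaves like v3.2, commit 1024 like v3.3-rc1,
# and the delay first appears at commit 700.
history = list(range(1025))
first_bad, builds = bisect_first_bad(history, lambda commit: commit >= 700)
print(first_bad, builds)
```

Each probe halves the remaining range, so the number of kernels to build grows with the logarithm of the range. Testing a known intermediate commit first, as suggested above, is simply a hand-picked first probe.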

2012-02-03 04:29:42

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> Could you attach the full 3.3-rc2 dmesg?

Attached.

> First, you can check if commit 318893e is good or bad.

Will try.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
Another world, another day, another dawn.
--- Douglas Adams, The Hitchhikers Guide to the Galaxy

2012-02-03 04:30:03

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> Could you attach the full 3.3-rc2 dmesg?

Attached

> And could you do a bisect?

Will try.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
He dropped his voice still lower. In the stillness, a fly
would not have dared clear its throat.
--- Douglas Adams, The Hitchhikers Guide to the Galaxy


Attachments:
(No filename) (686.00 B)
dmesg-3.3-rc2 (57.36 kB)

2012-02-03 05:25:07

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> And could you do a bisect?

Done that. First failing commit is:
-----
commit 7faa33da9b7add01db9f1ad92c6a5d9145e940a7
Author: Tejun Heo <[email protected]>
Date: Fri Jul 22 11:41:26 2011 +0200

ahci: start engine only during soft/hard resets

Previous commit booted without delay.

I didn't try to revert that commit on top of HEAD.

Suggestions?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
Program aborting:
Close all that you have worked on.
You ask far too much.
--- Windows Error Haiku

2012-02-03 05:34:36

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 14:24 +0900, Norbert Preining wrote:
> On Fr, 03 Feb 2012, Lin Ming wrote:
> > And could you do a bisect?
>
> Done that. First failing commit is:
> -----
> commit 7faa33da9b7add01db9f1ad92c6a5d9145e940a7
> Author: Tejun Heo <[email protected]>
> Date: Fri Jul 22 11:41:26 2011 +0200
>
> ahci: start engine only during soft/hard resets
>
> Previous commit booted without delay.
>
> I didn't try to revert that commit on top of HEAD.

Please revert that commit to test.
That helps us to make sure we find the exact bad commit.

Thanks.

>
> Suggestions?
>
> Best wishes
>
> Norbert
> ------------------------------------------------------------------------
> Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
> JAIST, Japan TeX Live & Debian Developer
> DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
> ------------------------------------------------------------------------
> Program aborting:
> Close all that you have worked on.
> You ask far too much.
> --- Windows Error Haiku

2012-02-03 05:43:41

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fr, 03 Feb 2012, Lin Ming wrote:
> > I didn't try to revert that commit on top of HEAD.
>
> Please revert that commit to test.
> That helps us to make sure we find the exact bad commit.

Confirmed. Reverting 7faa33da9b7 on top of 6c073a7ee250 made
the boot delay go away. dmesg from this boot attached.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
PELUTHO (n.) A South American ball game. The balls are whacked against
a brick wall with a stout wooden bat until the prisoner confesses.
--- Douglas Adams, The Meaning of Liff


Attachments:
(No filename) (873.00 B)
dmesg-3.3-rc2-revert (41.32 kB)

2012-02-03 08:27:44

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Fri, 2012-02-03 at 14:43 +0900, Norbert Preining wrote:
> On Fr, 03 Feb 2012, Lin Ming wrote:
> > > I didn't try to revert that commit on top of HEAD.
> >
> > Please revert that commit to test.
> > That helps us to make sure we find the exact bad commit.
>
> Confirmed. Reverted 7faa33da9b7 on top of 6c073a7ee250 made
> the boot delay go away. dmesg from this boot attached.

I dug into the code, but I can't find where the problem is.

Anyway, does the DEBUG patch below help?
Let's always stop the engine during hard reset.

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index a72bfd0..8fef702 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -578,10 +578,6 @@ int ahci_stop_engine(struct ata_port *ap)
 
 	tmp = readl(port_mmio + PORT_CMD);
 
-	/* check if the HBA is idle */
-	if ((tmp & (PORT_CMD_START | PORT_CMD_LIST_ON)) == 0)
-		return 0;
-
 	/* setting HBA to idle */
 	tmp &= ~PORT_CMD_START;
 	writel(tmp, port_mmio + PORT_CMD);


>
> Best wishes
>
> Norbert
> ------------------------------------------------------------------------
> Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
> JAIST, Japan TeX Live & Debian Developer
> DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
> ------------------------------------------------------------------------
> PELUTHO (n.) A South American ball game. The balls are whacked against
> a brick wall with a stout wooden bat until the prisoner confesses.
> --- Douglas Adams, The Meaning of Liff
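
The DEBUG patch above removes an early return that skips the stop sequence when the port already looks idle. The following minimal Python sketch models only that bit test. It is a user-space illustration, not driver code, and the bit positions are an assumption taken from the AHCI port command register layout (ST at bit 0, CR at bit 15), to be checked against `drivers/ata/ahci.h`. The function names are hypothetical.

```python
# Assumed bit positions in the AHCI port command register (PxCMD).
PORT_CMD_START = 1 << 0      # ST: software asks the HBA to process the command list
PORT_CMD_LIST_ON = 1 << 15   # CR: HBA reports the command list engine is running


def engine_is_idle(port_cmd):
    """Mirror the check the DEBUG patch removes: idle means ST and CR are both clear."""
    return (port_cmd & (PORT_CMD_START | PORT_CMD_LIST_ON)) == 0


def stop_request(port_cmd):
    """Value the driver writes back to request a stop: the same word with ST cleared."""
    return port_cmd & ~PORT_CMD_START


for value in (0x0000, 0x0001, 0x8000, 0x8011):
    print(f"PORT_CMD={value:#06x} idle={engine_is_idle(value)} "
          f"write-back={stop_request(value):#06x}")
```

With the check in place, a port whose ST and CR bits are both clear returns immediately. With the patch applied, the driver always clears ST and writes the register back, even when that write changes nothing.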

2012-02-06 00:46:26

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hi Lin,


sorry for the delay, I was off over the weekend ...

On Fr, 03 Feb 2012, Lin Ming wrote:
> > Confirmed. Reverted 7faa33da9b7 on top of 6c073a7ee250 made
> > the boot delay go away. dmesg from this boot attached.
>
> Dig into the code, but I can't find where the problem is.
>
> Anyway, does below DEBUG patch help?
> Let's always stop the engine during hard reset.

If you meant:
"Try that patch on top of HEAD *without* reverting 7faa33da9b7?"
then I can report that it does NOT help. With *only* this patch I still
get 10sec delay, and otherwise nothing changes.


Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
YESNABY (n.)
A 'yes, maybe' which means 'no'.
--- Douglas Adams, The Meaning of Liff

2012-02-06 01:37:02

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, 2012-02-06 at 09:46 +0900, Norbert Preining wrote:
> Hi Lin,
>
>
> sorry for the delay, weekend I was off ...
>
> On Fr, 03 Feb 2012, Lin Ming wrote:
> > > Confirmed. Reverted 7faa33da9b7 on top of 6c073a7ee250 made
> > > the boot delay go away. dmesg from this boot attached.
> >
> > Dig into the code, but I can't find where the problem is.
> >
> > Anyway, does below DEBUG patch help?
> > Let's always stop the engine during hard reset.
>
> If you meant:
> "Try that patch on top of HEAD *without* reverting 7faa33da9b7?"
> then I can report that it does NOT help. With *only* this patch I still
> get 10sec delay, and otherwise nothing changes.

Does below help?

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index a72bfd0..33f7333 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -746,6 +746,9 @@ static void ahci_start_port(struct ata_port *ap)
 	/* enable FIS reception */
 	ahci_start_fis_rx(ap);
 
+	/* enable DMA */
+	ahci_start_engine(ap);
+
 	/* turn on LEDs */
 	if (ap->flags & ATA_FLAG_EM) {
 		ata_for_each_link(link, ap, EDGE) {



2012-02-06 02:41:06

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mo, 06 Feb 2012, Lin Ming wrote:
> > > Anyway, does below DEBUG patch help?
> > > Let's always stop the engine during hard reset.
> >
> > If you meant:
> > "Try that patch on top of HEAD *without* reverting 7faa33da9b7?"
> > then I can report that it does NOT help. With *only* this patch I still
> > get 10sec delay, and otherwise nothing changes.
>
> Does below help?

In which combination - with or without the first patch?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
HERSTMONCEUX (n.)
The correct name for the gold medallion worn by someone who is in the
habit of wearing their shirt open to the waist.
--- Douglas Adams, The Meaning of Liff

2012-02-06 02:49:55

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, 2012-02-06 at 11:40 +0900, Norbert Preining wrote:
> On Mo, 06 Feb 2012, Lin Ming wrote:
> > > > Anyway, does below DEBUG patch help?
> > > > Let's always stop the engine during hard reset.
> > >
> > > If you meant:
> > > "Try that patch on top of HEAD *without* reverting 7faa33da9b7?"
> > > then I can report that it does NOT help. With *only* this patch I still
> > > get 10sec delay, and otherwise nothing changes.
> >
> > Does below help?
>
> In which combination - with or without the first patch?

Without.

Only this patch on top of HEAD *without* reverting 7faa33da9b7.

>
> Best wishes
>
> Norbert
> ------------------------------------------------------------------------
> Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
> JAIST, Japan TeX Live & Debian Developer
> DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
> ------------------------------------------------------------------------
> HERSTMONCEUX (n.)
> The correct name for the gold medallion worn by someone who is in the
> habit of wearing their shirt open to the waist.
> --- Douglas Adams, The Meaning of Liff

2012-02-06 03:15:40

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mo, 06 Feb 2012, Lin Ming wrote:
> Only this patch on top of HEAD *without* reverting 7faa33da9b7.

Works. Am I right that it differs from 7faa33da9b7 only in that
the latter also changes:
@@ -2019,7 +2022,7 @@ static int ahci_port_suspend(struct ata_port *ap, pm_message_t mesg)
 		ahci_power_down(ap);
 	else {
 		ata_port_err(ap, "%s (%d)\n", emsg, rc);
-		ata_port_freeze(ap);
+		ahci_start_port(ap);
 	}


Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
HOVE (adj.)
Descriptive of the expression seen on the face of one person in the
presence of another who clearly isn't going to stop talking for a very
long time.
--- Douglas Adams, The Meaning of Liff

2012-02-06 04:43:00

by Lin Ming

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, 2012-02-06 at 12:15 +0900, Norbert Preining wrote:
> On Mo, 06 Feb 2012, Lin Ming wrote:
> > Only this patch on top of HEAD *without* reverting 7faa33da9b7.
>
> Works. Am I right that it differs from 7faa33da9b7 only in that
> the latter also changes:

Right.

Tejun,

This regression is caused by 7faa33da9b7(ahci: start engine only during
soft/hard resets).

But I can't reproduce it.

What's your guess?

Thanks,
Lin Ming

> @@ -2019,7 +2022,7 @@ static int ahci_port_suspend(struct ata_port *ap, pm_message_t mesg)
> ahci_power_down(ap);
> else {
> ata_port_err(ap, "%s (%d)\n", emsg, rc);
> - ata_port_freeze(ap);
> + ahci_start_port(ap);
> }
>
>
> Best wishes
>
> Norbert
> ------------------------------------------------------------------------
> Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
> JAIST, Japan TeX Live & Debian Developer
> DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
> ------------------------------------------------------------------------
> HOVE (adj.)
> Descriptive of the expression seen on the face of one person in the
> presence of another who clearly isn't going to stop talking for a very
> long time.
> --- Douglas Adams, The Meaning of Liff

2012-02-06 16:19:51

by Tejun Heo

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hello,

(cc'ing Jian Peng, hi)

On Mon, Feb 06, 2012 at 12:42:56PM +0800, Lin Ming wrote:
> On Mon, 2012-02-06 at 12:15 +0900, Norbert Preining wrote:
> > On Mo, 06 Feb 2012, Lin Ming wrote:
> > > Only this patch on top of HEAD *without* reverting 7faa33da9b7.
> >
> > Works. Am I right that it differs from 7faa33da9b7 only in that
> > the latter also changes:
>
> Right.
>
> Tejun,
>
> This regression is caused by 7faa33da9b7(ahci: start engine only during
> soft/hard resets).
>
> But I can't reproduce it.
>
> What's your guess?

Urgh.... yeah, following the standard can sometimes be a silly thing to do.
Jian, I think we'll have to add a flag for your controller and revert
to the original behavior for others. How can your controller be
distinguished?

Thanks.

--
tejun

2012-02-06 19:20:49

by Brian Norris

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, Feb 6, 2012 at 8:19 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> (cc'ing Jian Peng, hi)

Hello,

Jian Peng is no longer working on this project; I have taken over for his work.

> On Mon, Feb 06, 2012 at 12:42:56PM +0800, Lin Ming wrote:
>> On Mon, 2012-02-06 at 12:15 +0900, Norbert Preining wrote:
>> > On Mo, 06 Feb 2012, Lin Ming wrote:
>> > > Only this patch on top of HEAD *without* reverting 7faa33da9b7.
>> >
>> > Works. Am I right that it differs from 7faa33da9b7 only in that
>> > the latter also changes:
>>
>> Right.
>>
>> Tejun,
>>
>> This regression is caused by 7faa33da9b7(ahci: start engine only during
>> soft/hard resets).
>>
>> But I can't reproduce it.
>>
>> What's your guess?
>
> Urgh.... yeah, following standard can sometimes be silly thing to do.
> Jian, I think we'll have to add a flag for your controller and revert
> to the original behavior for others. How can your controller be
> distinguished?

Our controller utilizes the ahci_platform.c driver and does not have a
distinguishing interface ID (e.g., PCI ID). It is maintained in an
out-of-tree distribution, but we would like to have a
standards-compliant solution in mainline so that we do not have to
support a fork of the main codebase.

Is there any possibility of debugging this regression instead of
effectively reverting it for mainline? Or perhaps can I have better
information regarding the hardware on which this regression is seen?
With some time, I can try to debug it further.

I see the following options:
(1) implement a flag that can be passed through ahci_platform; this
would not be very useful, as we would still have to tweak the driver
out of tree.
(2) Drop the fix entirely. This is a spec. violation, but we can
simply try to maintain the fix out-of-tree.
(3) Debug Norbert's hardware problems.

(1) or (2) aren't ideal, and they are effectively very similar. The
only advantage to (1) is that there will be at least some illusion of
our controller's support in future kernel development.

Brian

2012-02-06 22:46:36

by Norbert Preining

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hi Brian,

On Mo, 06 Feb 2012, Brian Norris wrote:
> Is there any possibility of debugging this regression instead of
> effectively reverting it for mainline? Or perhaps can I have better
> information regarding the hardware on which this regression is seen?

Sure, Sony VAIO Z11 laptop. I have attached several dmesg outputs
to emails in this thread, which should give you a good idea what
the hardware is. For the ahci:
$ lspci -v -s 00:1f.2
00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03) (prog-if 01 [AHCI 1.0])
Subsystem: Sony Corporation Device 9025
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 44
I/O ports at 7128 [size=8]
I/O ports at 713c [size=4]
I/O ports at 7120 [size=8]
I/O ports at 7138 [size=4]
I/O ports at 7020 [size=32]
Memory at daa25000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ahci

If you need any other information, please let me know.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TeX Live & Debian Developer
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
FIUNARY (n.)
The safe place you put something and then forget where it was.
--- Douglas Adams, The Meaning of Liff

2012-02-08 09:11:14

by Valdis Klētnieks

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, 06 Feb 2012 11:20:45 PST, Brian Norris said:

> (3) Debug Norbert's hardware problems.

It's not just Norbert - my Dell Latitude E6500 trips over this as well. Didn't
we go through this same song-and-dance *before* with this same exact
patch? Yes we did, back in May 2011:

http://lkml.indiana.edu/hypermail/linux/kernel/1105.1/02970.html

Somebody hand me a stake and mallet to pound through the heart of this
patch so it *stays* dead already.. Geez.. ;)

Norbert's box has this:
00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03) (prog-if 01 [AHCI 1.0])

While mine apparently has a different PCI ID:
lspci -nn -v -s 00:1f.2
00:1f.2 RAID bus controller [0104]: Intel Corporation Mobile 82801 SATA RAID Controller [8086:282a] (rev 03)
Subsystem: Dell Device [1028:024f]
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 45
I/O ports at 6e70 [size=8]
I/O ports at 6e78 [size=4]
I/O ports at 6e80 [size=8]
I/O ports at 6e88 [size=4]
I/O ports at 6ea0 [size=32]
Memory at fed1c800 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ahci



from dmesg:
[ 1.295050] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.296037] ata1.00: ATA-8: WDC WD1600BJKT-75F4T0, 11.01A11, max UDMA/133
[ 1.296041] ata1.00: 312581808 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.297051] ata1.00: configured for UDMA/133
[ 1.297773] scsi 0:0:0:0: Direct-Access ATA WDC WD1600BJKT-7 11.0 PQ: 0 ANSI: 5
[ 1.298623] sd 0:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
[ 1.298811] sd 0:0:0:0: [sda] Write Protect is off
[ 1.298816] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.298891] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.329096] sda: sda1 sda2
[ 1.330321] sd 0:0:0:0: [sda] Attached SCSI disk
[ 6.652045] ata2: link is slow to respond, please be patient (ready=0)
[ 11.344043] ata2: COMRESET failed (errno=-16)
[ 11.649049] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 11.650654] ata2.00: ATAPI: MATSHITA DVD+/-RW UJ892, 1.01, max UDMA/100
[ 11.653036] ata2.00: configured for UDMA/100
[ 11.656392] scsi 1:0:0:0: CD-ROM MATSHITA DVD+-RW UJ892 1.01 PQ: 0 ANSI: 5
[ 11.659202] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[ 11.659206] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 11.659794] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 11.965047] ata5: SATA link down (SStatus 0 SControl 300)
[ 12.270047] ata6: SATA link down (SStatus 0 SControl 300)




Attachments:
(No filename) (865.00 B)
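
The `SStatus` values in these link messages (113, 123, 0) are hexadecimal register dumps that encode link state. The following minimal Python sketch decodes them, assuming the standard SATA SStatus layout: DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8. The field tables are written from memory of that specification and should be checked against it. `decode_sstatus` is a hypothetical helper.

```python
# Field meanings assumed from the SATA SStatus register definition.
DET = {
    0x0: "no device detected",
    0x1: "device detected, no PHY communication",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
    0x3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0x0: "not present / no communication",
    0x1: "active",
    0x2: "partial",
    0x6: "slumber",
}


def decode_sstatus(hex_text):
    """Split an SStatus value, as printed in hex by dmesg, into its three fields."""
    value = int(hex_text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": (det, DET.get(det, "reserved")),
        "spd": (spd, SPD.get(spd, "reserved")),
        "ipm": (ipm, IPM.get(ipm, "reserved")),
    }


for raw in ("113", "123", "0"):
    print(raw, decode_sstatus(raw))
```

Read this way, `SStatus 113` on ata2 is an established Gen1 link, `SStatus 123` on ata1 is an established Gen2 link, and `SStatus 0` on ata5 and ata6 means nothing is attached. The optical drive therefore does come up normally after the failed COMRESET, only about ten seconds late.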

2012-02-13 17:44:54

by Tejun Heo

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hello,

On Mon, Feb 06, 2012 at 11:20:45AM -0800, Brian Norris wrote:
> I see the following options:
> (1) implement a flag that can be passed through ahci_platform; this
> would not be very useful, as we would still have to tweak the driver
> out of tree.

Yeah, please add module param to make this behavior conditional.

> (2) Drop the fix entirely. This is a spec. violation, but we can
> simply try to maintain the fix out-of-tree.

Nothing is perfect and real hardware should come before spec.

Thanks.

--
tejun

2012-02-15 05:15:34

by Brian Norris

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Mon, Feb 13, 2012 at 9:44 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Feb 06, 2012 at 11:20:45AM -0800, Brian Norris wrote:
>> I see the following options:
>> (1) implement a flag that can be passed through ahci_platform; this
>> would not be very useful, as we would still have to tweak the driver
>> out of tree.
>
> Yeah, please add module param to make this behavior conditional.

Perhaps a module param (for ahci_platform) that sets a flag in
ata_port_info? I'm not sure if/how I'm allowed to introduce new ATA
flags...

>> (2) Drop the fix entirely. This is a spec. violation, but we can
>> simply try to maintain the fix out-of-tree.
>
> Nothing is perfect and real hardware should come before spec.

You have ignored my option (3): to fix the observed problems. I
already tested this last patch against the previous regression
reports, found here:
http://marc.info/?l=linux-ide&m=130529205513940&w=2

However, I had not noticed that the delayed DVD drive recognition was
a separate issue. Since then, I've found that I tested the exact same
AHCI controller as in Valdis' report (a little different from
Norbert's), but the key difference is the MATSHITA DVD drive (both
Norbert and Valdis have MATSHITA). The same controller works with a
non-MATSHITA DVD drive.

I also noticed that Mark Lord commented that he needed the patch in
question. Is this a different controller that needs my same fix?
http://marc.info/?l=linux-ide&m=131967244009803&w=2

So it appears that we are weighing the MATSHITA DVD issues against the
issues seen by me and possibly Mark Lord. If the decision really
stands that finding a unified solution is impossible, then I can just
drop the issue and make conditional behavior.

Brian

2012-02-15 16:57:16

by Tejun Heo

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hello,

On Tue, Feb 14, 2012 at 09:15:32PM -0800, Brian Norris wrote:
> Perhaps a module param (for ahci_platform) that sets a flag in
> ata_port_info? I'm not sure if/how I'm allowed to introduce new ATA
> flags...

I think adding a module param directly to libahci.c should do. Just
add it after ignore_sss and apply it to all ahci's on the host.

> So it appears that we are weighing the MATSHITA DVD issues against the
> issues seen by me and possibly Mark Lord. If the decision really
> stands that finding a unified solution is impossible, then I can just
> drop the issue and make conditional behavior.

My memory is already fuzzy but strict adherence to the spec didn't
make a whole lot of sense to me. I think I wrote several times on the
issue already. Plus, we're talking about introducing regressions to
generic x86 setups for the sake of a few specific platforms. If
somebody can come up with a generic solution, fine, but for now, let's
just revert to the original behavior, please.

Thanks.

--
tejun

2012-02-15 18:29:35

by Jeff Garzik

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On 02/15/2012 11:57 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Feb 14, 2012 at 09:15:32PM -0800, Brian Norris wrote:
>> Perhaps a module param (for ahci_platform) that sets a flag in
>> ata_port_info? I'm not sure if/how I'm allowed to introduce new ATA
>> flags...
>
> I think adding a module param directly to libahci.c should do. Just
> add it after ignore_sss and apply it to all ahci's on the host.

A module parameter is not necessarily the best/only option.
ahci_platform already has infrastructure set up to deal with
platform-specific quirks. An internal flag seems more appropriate to
enable automatic handling of this on the specific platforms where it
applies (plus the revert Tejun has already mentioned).

Nothing wrong with debugging the regression further (Brian's option #3),
but in the meantime we need to be actively using the best known working
state, which means making the "fix" opt-in rather than unconditional or
opt-out.

Jeff




2012-02-15 18:31:48

by Tejun Heo

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

Hey, Jeff.

On Wed, Feb 15, 2012 at 01:29:29PM -0500, Jeff Garzik wrote:
> A module parameter is not necessarily the best/only option.
> ahci_platform already has infrastructure set up to deal with
> platform-specific quirks. An internal flag seems more appropriate
> to enable automatic handling of this on the specific platforms where
> it applies (plus the revert Tejun has already mentioned).

The problem is that there's no way to identify the controller in
question, so we can't do this automatically, so might just as well do
it in the simplest way for now. :(

Thanks.

--
tejun

2012-02-15 19:18:17

by Jeff Garzik

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On 02/15/2012 01:31 PM, Tejun Heo wrote:
> Hey, Jeff.
>
> On Wed, Feb 15, 2012 at 01:29:29PM -0500, Jeff Garzik wrote:
>> A module parameter is not necessarily the best/only option.
>> ahci_platform already has infrastructure set up to deal with
>> platform-specific quirks. An internal flag seems more appropriate
>> to enable automatic handling of this on the specific platforms where
>> it applies (plus the revert Tejun has already mentioned).
>
> The problem is that there's no way to identify the controller in
> question, so we can't do this automatically, so might just as well do
> it in the simplest way for now. :(

See ahci_devtype[] and ahci_port_info[] in ahci_platform.c for how to do
this. Brian would not have to tweak the driver out of tree as claimed;
we put all changes in tree, and the platform calls itself
"spec-strict-ahci" or whatever string you prefer.

Jeff


2012-02-15 23:39:11

by Brian Norris

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On Wed, Feb 15, 2012 at 11:18 AM, Jeff Garzik <[email protected]> wrote:
> On 02/15/2012 01:31 PM, Tejun Heo wrote:
>> The problem is that there's no way to identify the controller in
>> question, so we can't do this automatically, so might just as well do
>> it in the simplest way for now. :(
>
>
> See ahci_devtype[] and ahci_port_info[] in ahci_platform.c for how to do
> this. Brian would not have to tweak the driver out of tree as claimed; we
> put all changes in tree, and the platform calls itself "spec-strict-ahci" or
> whatever string you prefer.

OK, so it seems I would add a flag in include/linux/libata.h that
can be added to a new entry in ahci_port_info[]? It seems a little
awkward to put in "libata.h", plus I'm not sure if all the flags have
direct meaning to ATA or if they can be AHCI-specific.

Brian

2012-02-16 14:30:01

by Mark Lord

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On 12-02-15 01:31 PM, Tejun Heo wrote:
> Hey, Jeff.
>
> On Wed, Feb 15, 2012 at 01:29:29PM -0500, Jeff Garzik wrote:
>> A module parameter is not necessarily the best/only option.
>> ahci_platform already has infrastructure set up to deal with
>> platform-specific quirks. An internal flag seems more appropriate
>> to enable automatic handling of this on the specific platforms where
>> it applies (plus the revert Tejun has already mentioned).
>
> The problem is that there's no way to identify the controller in
> question, so we can't do this automatically, so might just as well do
> it in the simplest way for now. :(

Well, a module parameter is no good,
because that method would affect all attached controllers
rather than just the one(s) with the issue.

Something in sysfs perhaps.
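
The objection above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct host`, `HOST_FLAG_STRICT`, `strict_param` and the helper functions are hypothetical names made up for the example. It contrasts a single driver-wide switch (standing in for a module parameter) with a flag stored per controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-controller flag bit; name and value are assumptions. */
#define HOST_FLAG_STRICT (1u << 0)

/* Simplified stand-in for one AHCI controller instance. */
struct host {
    const char *name;
    unsigned int flags;
};

/* Models a driver-wide module parameter: one value shared by every host. */
bool strict_param = false;

/* Policy 1: consult the global parameter; the host itself is ignored. */
bool strict_via_param(const struct host *h)
{
    (void)h;
    return strict_param;
}

/* Policy 2: consult a flag carried by the individual host. */
bool strict_via_flag(const struct host *h)
{
    return (h->flags & HOST_FLAG_STRICT) != 0;
}

/* Count how many hosts would take the spec-strict path under a policy. */
size_t count_strict(const struct host *hosts, size_t n,
                    bool (*policy)(const struct host *))
{
    size_t count = 0;

    assert(policy != NULL);
    for (size_t i = 0; i < n; i++)
        if (policy(&hosts[i]))
            count++;
    return count;
}
```

With two hosts of which only one carries `HOST_FLAG_STRICT`, setting `strict_param` to true makes `count_strict()` report both hosts under the parameter policy, but only the flagged one under the flag policy.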

2012-02-16 16:19:23

by Jeff Garzik

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On 02/16/2012 09:20 AM, Mark Lord wrote:
> On 12-02-15 01:31 PM, Tejun Heo wrote:
>> Hey, Jeff.
>>
>> On Wed, Feb 15, 2012 at 01:29:29PM -0500, Jeff Garzik wrote:
>>> A module parameter is not necessarily the best/only option.
>>> ahci_platform already has infrastructure set up to deal with
>>> platform-specific quirks. An internal flag seems more appropriate
>>> to enable automatic handling of this on the specific platforms where
>>> it applies (plus the revert Tejun has already mentioned).
>>
>> The problem is that there's no way to identify the controller in
>> question, so we can't do this automatically, so might just as well do
>> it in the simplest way for now. :(
>
> Well, a module parameter is no good,
> because that method would affect all attached controllers
> rather than just the one(s) with the issue.
>
> Something in sysfs perhaps.

See the method already described in my previous message.
grep for IMX53_AHCI.

Jeff

2012-02-16 16:22:51

by Jeff Garzik

Subject: Re: Regression 3.2 -> 3.3-rc1 10 sec hang at boot and resume, COMRESET failed

On 02/15/2012 06:39 PM, Brian Norris wrote:
> On Wed, Feb 15, 2012 at 11:18 AM, Jeff Garzik<[email protected]> wrote:
>> On 02/15/2012 01:31 PM, Tejun Heo wrote:
>>> The problem is that there's no way to identify the controller in
>>> question, so we can't do this automatically, so might just as well do
>>> it in the simplest way for now. :(
>>
>>
>> See ahci_devtype[] and ahci_port_info[] in ahci_platform.c for how to do
>> this. Brian would not have to tweak the driver out of tree as claimed; we
>> put all changes in tree, and the platform calls itself "spec-strict-ahci" or
>> whatever string you prefer.
>
> OK, so it seems I would add a flag in include/linux/libata.h that
> can be added to a new entry in ahci_port_info[]? It seems a little
> awkward to put in "libata.h", plus I'm not sure if all the flags have
> direct meaning to ATA or if they can be AHCI-specific.

Nothing needs to go into libata.h. grep for IMX53_AHCI, and see how
that differentiates ata_port_info information. There you may vary
AHCI-specific flags (defined in ahci.h), port operations, or anything
else specific to the controller.

Create a flag, add it to ahci.h, guard the problematic spec-strict
behavior with said flag, and apply said flag to your controller in
ahci_platform.

Jeff
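
The recipe above can be sketched as a simplified userspace model. This is an assumption-laden illustration, not the actual ahci_platform.c: the struct layouts are reduced to the minimum, and `HFLAG_SPEC_STRICT`, `STRICT_AHCI`, `lookup_port_info()` and `uses_spec_strict()` are hypothetical names. Only `ahci_devtype[]`, `ahci_port_info[]`, `IMX53_AHCI` and the "spec-strict-ahci" string come from the thread; the "imx53-ahci" name string is assumed. The idea is that the platform picks its device name, the name selects a port-info entry, and the entry carries the opt-in flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AHCI-private flag (would live in ahci.h); name is assumed. */
#define HFLAG_SPEC_STRICT (1u << 0)

/* Device types; STRICT_AHCI is a hypothetical addition for this sketch. */
enum ahci_type { AHCI, IMX53_AHCI, STRICT_AHCI };

/* Reduced stand-in for per-type port information. */
struct port_info {
    unsigned int flags;
};

/* Reduced stand-in for a platform device ID entry: name -> type. */
struct devtype {
    const char *name;
    enum ahci_type type;
};

const struct devtype ahci_devtype[] = {
    { "ahci",             AHCI },
    { "imx53-ahci",       IMX53_AHCI },
    { "spec-strict-ahci", STRICT_AHCI },
    { NULL,               AHCI }        /* sentinel */
};

const struct port_info ahci_port_info[] = {
    [AHCI]        = { 0 },                  /* default: original behavior */
    [IMX53_AHCI]  = { 0 },
    [STRICT_AHCI] = { HFLAG_SPEC_STRICT },  /* opt-in to spec-strict path */
};

/* Map the name a platform registers under to its port-info entry. */
const struct port_info *lookup_port_info(const char *name)
{
    assert(name != NULL);
    for (const struct devtype *d = ahci_devtype; d->name; d++)
        if (strcmp(d->name, name) == 0)
            return &ahci_port_info[d->type];
    return NULL;    /* unknown device name */
}

/* Guard: returns 1 only for platforms that opted in via the flag. */
int uses_spec_strict(const char *name)
{
    const struct port_info *pi = lookup_port_info(name);

    return pi != NULL && (pi->flags & HFLAG_SPEC_STRICT) != 0;
}
```

In this model a generic "ahci" device keeps the original behavior, while a platform registering as "spec-strict-ahci" takes the guarded path, so the default stays at the best known working state and the fix is opt-in.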