2009-04-21 07:04:18

by Benny Halevy

Subject: Is sata_nv compatible with async scsi scan?

Hi Jeff,

Since 2.6.29 I'm having intermittent problems with booting kernels.
After supposedly waiting for scsi async scan to complete,
quite frequently I see errors from the init resume process
when it fails to find the swap partition and later the root
file system fails to load.

A workaround that I found to be helpful is booting the kernel
with scsi_mod.scan=sync so I suspect sata_nv has a problem
with asynchronous scanning.

The two machines this happens on have:
IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

and

IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)

This one: IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
seems to be less prone to the problem, though I tend to reboot it
much less frequently, so it's possible I just happened to be lucky with it.

Benny
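
To confirm that the workaround is actually in effect on a given boot, one option is to look for the parameter on the kernel command line. The following is only a userspace sketch under stated assumptions: the helper name scsi_scan_mode is made up for illustration, it is not kernel code and not something posted in this thread, and it does nothing more than search a command-line string (for example the contents of /proc/cmdline) for a scsi_mod.scan= token.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not kernel code): find the value given to
 * "scsi_mod.scan=" in a kernel command line and copy it into out,
 * truncating if out is too small.
 * Returns 1 if the parameter is present, 0 otherwise.
 */
int scsi_scan_mode(const char *cmdline, char *out, size_t outlen)
{
	static const char key[] = "scsi_mod.scan=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;
	int found = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';

	while (*p) {
		size_t toklen;

		/* Skip separators, then measure the next token. */
		p += strspn(p, " \t\n");
		toklen = strcspn(p, " \t\n");

		if (toklen >= keylen && strncmp(p, key, keylen) == 0) {
			size_t vallen = toklen - keylen;

			if (vallen >= outlen)
				vallen = outlen - 1;
			memcpy(out, p + keylen, vallen);
			out[vallen] = '\0';
			/* Design choice of this sketch: a later token overrides an earlier one. */
			found = 1;
		}
		p += toklen;
	}
	return found;
}
```

Given the string "root=/dev/sda1 ro scsi_mod.scan=sync quiet", the helper returns 1 and stores "sync" in out. If the parameter is absent, it returns 0 and leaves out empty.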


2009-04-21 10:28:21

by Jeff Garzik

Subject: Re: Is sata_nv compatible with async scsi scan?

Benny Halevy wrote:
> Hi Jeff,
>
> Since 2.6.29 I'm having intermittent problems with booting kernels.
> After supposedly waiting for scsi async scan to complete,
> quite frequently I see errors from the init resume process
> when it fails to find the swap partition and later the root
> file system fails to load.
>
> A workaround that I found to be helpful is booting the kernel
> with scsi_mod.scan=sync so I suspect sata_nv has a problem
> with asynchronous scanning.

Personally, I think the whole system is broken, so continue to use this
workaround until it gets fixed upstream. This sounds like some timing
issues related to waiting for the device probe to finish, something that
people keep breaking (witness my USB flash drive boot breakage).

Jeff


2009-04-21 11:26:27

by Matthew Wilcox

Subject: Re: Is sata_nv compatible with async scsi scan?

On Tue, Apr 21, 2009 at 06:27:56AM -0400, Jeff Garzik wrote:
> Benny Halevy wrote:
> >Hi Jeff,
> >
> >Since 2.6.29 I'm having intermittent problems with booting kernels.
> >After supposedly waiting for scsi async scan to complete,
> >quite frequently I see errors from the init resume process
> >when it fails to find the swap partition and later the root
> >file system fails to load.
> >
> >A workaround that I found to be helpful is booting the kernel
> >with scsi_mod.scan=sync so I suspect sata_nv has a problem
> >with asynchronous scanning.
>
> Personally, I think the whole system is broken, so continue to use this
> workaround until it gets fixed upstream. This sounds like some timing
> issues related to waiting for the device probe to finish, something that
> people keep breaking (witness my USB flash drive boot breakage).

No, it's 4ace92fc112c6069b4fcb95a31d3142d4a43ff2a.

Specifically, this bit:

@@ -179,6 +180,8 @@ int scsi_complete_async_scans(void)
 	spin_unlock(&async_scan_lock);
 
 	kfree(data);
+	/* Synchronize async operations globally */
+	async_synchronize_full();
 	return 0;
 }

Vegard Nossum has a patch that seems to have been ignored:

http://marc.info/?l=linux-kernel&m=123920746830420&w=2

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-04-21 12:30:50

by Benny Halevy

Subject: Re: Is sata_nv compatible with async scsi scan?

On Apr. 21, 2009, 14:26 +0300, Matthew Wilcox wrote:
> On Tue, Apr 21, 2009 at 06:27:56AM -0400, Jeff Garzik wrote:
>> Benny Halevy wrote:
>>> Hi Jeff,
>>>
>>> Since 2.6.29 I'm having intermittent problems with booting kernels.
>>> After supposedly waiting for scsi async scan to complete,
>>> quite frequently I see errors from the init resume process
>>> when it fails to find the swap partition and later the root
>>> file system fails to load.
>>>
>>> A workaround that I found to be helpful is booting the kernel
>>> with scsi_mod.scan=sync so I suspect sata_nv has a problem
>>> with asynchronous scanning.
>> Personally, I think the whole system is broken, so continue to use this
>> workaround until it gets fixed upstream. This sounds like some timing
>> issues related to waiting for the device probe to finish, something that
>> people keep breaking (witness my USB flash drive boot breakage).
>
> No, it's 4ace92fc112c6069b4fcb95a31d3142d4a43ff2a.
>
> Specifically, this bit:
>
> @@ -179,6 +180,8 @@ int scsi_complete_async_scans(void)
>  	spin_unlock(&async_scan_lock);
>  
>  	kfree(data);
> +	/* Synchronize async operations globally */
> +	async_synchronize_full();
>  	return 0;
>  }
>
> Vegard Nossum has a patch that seems to have been ignored:
>
> http://marc.info/?l=linux-kernel&m=123920746830420&w=2
>

Hmm, it might help somewhat but my test machine still failed
to boot 2 out of 5 times with this patch.

Benny