Hi Jeff,

Since 2.6.29 I'm having intermittent problems with booting kernels.
After supposedly waiting for scsi async scan to complete,
quite frequently I see errors from the init resume process
when it fails to find the swap partition and later the root
file system fails to load.

A workaround that I found to be helpful is booting the kernel
with scsi_mod.scan=sync so I suspect sata_nv has a problem
with asynchronous scanning.

The two machines this happens on have:
IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
and
IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)

This one: IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
seems to be less prone to the problem, though I tend to reboot it
much less frequently, so it's possible I just happened to be lucky with it.

Benny
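
For anyone trying the scsi_mod.scan=sync workaround described above, it
can help to confirm which scan mode a machine was actually booted with.
The following is a minimal Python sketch and not part of the original
report. The helper name scsi_scan_mode is hypothetical, it only inspects
the kernel command line (as exposed in /proc/cmdline on Linux), and it
assumes that the last occurrence of the parameter wins. It says nothing
about a default selected at kernel build time.

```python
from pathlib import Path
from typing import Optional

PARAM = "scsi_mod.scan"  # parameter named in the report above


def scsi_scan_mode(cmdline: str) -> Optional[str]:
    """Return the value of scsi_mod.scan= on a kernel command line, or None.

    Assumption: if the parameter appears more than once, the last one wins.
    """
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == PARAM:
            mode = value
    return mode


if __name__ == "__main__":
    proc = Path("/proc/cmdline")  # only present on Linux
    if proc.exists():
        mode = scsi_scan_mode(proc.read_text())
        print(mode if mode is not None else "not set on the command line")
    else:
        print("/proc/cmdline not available on this system")
```

A result of None only means the option was not passed at boot, so the
kernel's built-in default applies.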
Benny Halevy wrote:
> Hi Jeff,
>
> Since 2.6.29 I'm having intermittent problems with booting kernels.
> After supposedly waiting for scsi async scan to complete,
> quite frequently I see errors from the init resume process
> when it fails to find the swap partition and later the root
> file system fails to load.
>
> A workaround that I found to be helpful is booting the kernel
> with scsi_mod.scan=sync so I suspect sata_nv has a problem
> with asynchronous scanning.

Personally, I think the whole system is broken, so continue to use this
workaround until it gets fixed upstream. This sounds like some timing
issues related to waiting for the device probe to finish, something that
people keep breaking (witness my USB flash drive boot breakage).

Jeff
On Tue, Apr 21, 2009 at 06:27:56AM -0400, Jeff Garzik wrote:
> Benny Halevy wrote:
> >Hi Jeff,
> >
> >Since 2.6.29 I'm having intermittent problems with booting kernels.
> >After supposedly waiting for scsi async scan to complete,
> >quite frequently I see errors from the init resume process
> >when it fails to find the swap partition and later the root
> >file system fails to load.
> >
> >A workaround that I found to be helpful is booting the kernel
> >with scsi_mod.scan=sync so I suspect sata_nv has a problem
> >with asynchronous scanning.
>
> Personally, I think the whole system is broken, so continue to use this
> workaround until it gets fixed upstream. This sounds like some timing
> issues related to waiting for the device probe to finish, something that
> people keep breaking (witness my USB flash drive boot breakage).

No, it's 4ace92fc112c6069b4fcb95a31d3142d4a43ff2a.

Specifically, this bit:

@@ -179,6 +180,8 @@ int scsi_complete_async_scans(void)
 	spin_unlock(&async_scan_lock);
 
 	kfree(data);
+	/* Synchronize async operations globally */
+	async_synchronize_full();
 	return 0;
 }

Vegard Nossum has a patch that seems to have been ignored:

http://marc.info/?l=linux-kernel&m=123920746830420&w=2

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
On Apr. 21, 2009, 14:26 +0300, Matthew Wilcox wrote:
> On Tue, Apr 21, 2009 at 06:27:56AM -0400, Jeff Garzik wrote:
>> Benny Halevy wrote:
>>> Hi Jeff,
>>>
>>> Since 2.6.29 I'm having intermittent problems with booting kernels.
>>> After supposedly waiting for scsi async scan to complete,
>>> quite frequently I see errors from the init resume process
>>> when it fails to find the swap partition and later the root
>>> file system fails to load.
>>>
>>> A workaround that I found to be helpful is booting the kernel
>>> with scsi_mod.scan=sync so I suspect sata_nv has a problem
>>> with asynchronous scanning.
>> Personally, I think the whole system is broken, so continue to use this
>> workaround until it gets fixed upstream. This sounds like some timing
>> issues related to waiting for the device probe to finish, something that
>> people keep breaking (witness my USB flash drive boot breakage).
>
> No, it's 4ace92fc112c6069b4fcb95a31d3142d4a43ff2a.
>
> Specifically, this bit:
>
> @@ -179,6 +180,8 @@ int scsi_complete_async_scans(void)
>  	spin_unlock(&async_scan_lock);
>
>  	kfree(data);
> +	/* Synchronize async operations globally */
> +	async_synchronize_full();
>  	return 0;
>  }
>
> Vegard Nossum has a patch that seems to have been ignored:
>
> http://marc.info/?l=linux-kernel&m=123920746830420&w=2
>

Hmm, it might help somewhat but my test machine still failed
to boot 2 out of 5 times with this patch.

Benny