Hello,
Is it possible that OOM isn't handled very well if, say, my entire
file system structure is on a USB storage device?
I'm not an expert on this particular matter but I'm pretty sure
that I noticed GFP_KERNEL allocation being done on the write-out
path in the usb-storage kernel thread, leading to a deadlock
during OOM.
Suggestions are welcomed...
--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
On Sat, 17 Feb 2007, Dan Aloni wrote:
> Hello,
>
> Is it possible that OOM isn't handled very well if, say, my entire
> file system structure is on a USB storage device?
>
> I'm not an expert on this particular matter but I'm pretty sure
> that I noticed GFP_KERNEL allocation being done on the write-out
> path in the usb-storage kernel thread, leading to a deadlock
> during OOM.
Can you be any more specific than that? usb-storage should use only
GFP_NOIO in its I/O paths.
Alan Stern
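
To make the distinction behind this question concrete, here is a toy
userspace model of the allocation flags involved. It is only a sketch: the
MODEL_* names, the bit values, and both helper functions are hypothetical
and are not kernel API. The one thing it borrows from kernels of that era
is the composition of the masks: GFP_NOIO permits sleeping only, GFP_NOFS
adds block I/O, and GFP_KERNEL adds filesystem calls on top of that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these bit values are illustrative, not the kernel's. */
enum {
	MODEL_GFP_WAIT = 1u << 0,	/* allocator may sleep and reclaim */
	MODEL_GFP_IO   = 1u << 1,	/* reclaim may start block I/O */
	MODEL_GFP_FS   = 1u << 2	/* reclaim may call into filesystems */
};

#define MODEL_GFP_NOIO   (MODEL_GFP_WAIT)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* True if reclaim done on behalf of this allocation may issue new I/O. */
static inline bool model_reclaim_may_start_io(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_IO) != 0;
}

/*
 * An allocation made inside the I/O path can deadlock when reclaim is
 * allowed to start I/O, because that I/O would have to be completed by
 * the very thread that is now blocked in the allocator.
 */
static inline bool model_can_deadlock_in_io_path(unsigned int gfp_mask,
						 bool in_io_path)
{
	return in_io_path && model_reclaim_may_start_io(gfp_mask);
}

/* Small self-check showing how the two helpers are meant to be read. */
static inline void model_self_check(void)
{
	assert(model_can_deadlock_in_io_path(MODEL_GFP_KERNEL, true));
	assert(!model_can_deadlock_in_io_path(MODEL_GFP_NOIO, true));
}
```

In this model, a thread such as usb-storage that services write-out is
"in the I/O path", so only the NOIO mask is safe for it; the same
KERNEL-style mask is harmless in a caller that no write-out depends on.
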
Alan Stern wrote:
> On Sat, 17 Feb 2007, Dan Aloni wrote:
>
>> Hello,
>>
>> Is it possible that OOM isn't handled very well if, say, my entire
>> file system structure is on a USB storage device?
>>
>> I'm not an expert on this particular matter but I'm pretty sure
>> that I noticed GFP_KERNEL allocation being done on the write-out
>> path in the usb-storage kernel thread, leading to a deadlock
>> during OOM.
>
> Can you be any more specific than that? usb-storage should use only
> GFP_NOIO in its I/O paths.
You are right. I examined this state with kdb, and usb-storage was
waiting in usb_stor_bulk_transfer_sg, which does pass GFP_NOIO
in this scenario.
It looked suspicious though, because OOM handling was invoked
from many processes, yet it printed nothing about any process being
killed and didn't complain about having no processes to kill either.
(I'll look more into this; perhaps there's an OOM handling bug.)
BTW, soft-rebooting the machine in that state made the USB
storage device (LEXAR, JD LIGHTNING II) inaccessible to the
BIOS. I had to do a complete power cycle in order for the BIOS
to see it again.
--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
Dan Aloni wrote:
> Alan Stern wrote:
[...]
>> Can you be any more specific than that? usb-storage should use only
>> GFP_NOIO in its I/O paths.
>
> You are right. I examined this state with kdb, and usb-storage was
> waiting in usb_stor_bulk_transfer_sg, which does pass GFP_NOIO
> in this scenario.
>
> It looked suspicious though, because OOM handling was invoked
> from many processes, yet it printed nothing about any process being
> killed and didn't complain about having no processes to kill either.
Hmm, I'm pretty sure I stumbled over this (from select_bad_process()):
	/*
	 * This task already has access to memory reserves and is
	 * being killed. Don't allow any other task access to the
	 * memory reserve.
	 *
	 * Note: this may have a chance of deadlock if it gets
	 * blocked waiting for another task which itself is waiting
	 * for memory. Is there a better alternative?
	 */
	if (test_tsk_thread_flag(p, TIF_MEMDIE))
		return ERR_PTR(-1UL);
This might explain why the OOM handling behaved the way it did.
It would have been nice if it had at least printed "OOM: I'm in a deadlock,
please FIXME...".
--
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il
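
The behaviour described in the message above can be sketched with a small
userspace model. This is a simplification under stated assumptions, not the
kernel's code: struct model_task, the badness field, the MODEL_* names, and
both functions are hypothetical, and the real select_bad_process() performs
more checks than shown here. The sketch only illustrates the hypothesis that
once one task carries the TIF_MEMDIE-style flag, every later OOM invocation
bails out without killing anything and without reporting anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct model_task {
	const char *name;
	unsigned long badness;	/* higher score means a better victim */
	int memdie;		/* plays the role of TIF_MEMDIE */
};

/* Sentinel playing the role of ERR_PTR(-1UL): "a victim is already dying". */
static struct model_task model_abort_sentinel;
#define MODEL_ABORT (&model_abort_sentinel)

static inline struct model_task *
model_select_bad_process(struct model_task *tasks, size_t n)
{
	struct model_task *chosen = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* Someone is already being killed: pick nobody else. */
		if (tasks[i].memdie)
			return MODEL_ABORT;
		if (chosen == NULL || tasks[i].badness > chosen->badness)
			chosen = &tasks[i];
	}
	return chosen;
}

enum model_oom_result {
	MODEL_OOM_KILLED,	/* a new victim was chosen */
	MODEL_OOM_SILENT,	/* nothing killed, nothing reported */
	MODEL_OOM_NO_VICTIM	/* nothing killable; worth a complaint */
};

static inline enum model_oom_result
model_out_of_memory(struct model_task *tasks, size_t n)
{
	struct model_task *p = model_select_bad_process(tasks, n);

	if (p == MODEL_ABORT)
		return MODEL_OOM_SILENT;
	if (p == NULL)
		return MODEL_OOM_NO_VICTIM;
	p->memdie = 1;	/* the victim gains access to the memory reserves */
	return MODEL_OOM_KILLED;
}
```

In this model the first call marks the highest-scoring task, and every call
after that returns MODEL_OOM_SILENT for as long as the marked task stays in
the list. If that task is itself blocked waiting for memory, it never exits,
which matches the "invoked from many processes, nothing printed" symptom.
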
On Sat, 17 Feb 2007, Dan Aloni wrote:
> You are right. I examined this state with kdb, and usb-storage was
> waiting in usb_stor_bulk_transfer_sg, which does pass GFP_NOIO
> in this scenario.
...
> BTW, soft-rebooting the machine in that state made the USB
> storage device (LEXAR, JD LIGHTNING II) inaccessible to the
> BIOS. I had to do a complete power cycle in order for the BIOS
> to see it again.
That's the fault of the BIOS and/or the Lexar device. The BIOS ought to
do a complete reset of the USB controller and a reset of the device. If
the device remains unusable after that, there isn't anything we can do
about it.
Alan Stern