Hello I don't have enough knowledge on USB core but I've wondered
why GFP_NOIO has been used in xhci_alloc_dev for
xhci_alloc_virt_device. I found commit ("a6d940dd759b xhci: Use
GFP_NOIO during device reset"). But can we just change GFP_NOIO
to __GFP_RECLAIM | __GFP_FS ?
Please refer to below case.
I got a report from Lee YongTaek <[email protected]> that the
xhci_alloc_virt_device was too slow over 2 seconds only for one page
allocation.
1) It was because kernel version was v4.14 and DMA allocation was
done from CMA(Contiguous Memory Allocator) where CMA region was
almost filled with file page and CMA passes GFP down to page
isolation. And the page isolation only allows file page isolation only to
requests having __GFP_FS.
2) Historically CMA was changed at v4.19 to use GFP_KERNEL
regardless of GFP passed to DMA allocation through the
commit 6518202970c1 "(mm/cma: remove unsupported gfp_mask
parameter from cma_alloc()".
I think pre v4.19 the xhci_alloc_virt_device could be very slow
depending on CMA situation but free to USB deadlock issue. But as of
v4.19, I think, it will be fast but can face the deadlock issue.
Consequently I think to meet the both cases, I think USB can pass
__GFP_FS without __GFP_IO.
If __GFP_FS is passed from USB core, of course, the CMA patch also
need to be changed to pass GFP.
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 005e65922608..38abcd03a1a2 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3893,7 +3893,7 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct
usb_device *udev)
* xhci_discover_or_reset_device(), which may be called as part of
* mass storage driver error handling.
*/
- if (!xhci_alloc_virt_device(xhci, slot_id, udev, GFP_NOIO)) {
+ if (!xhci_alloc_virt_device(xhci, slot_id, udev, __GFP_RECLAIM
| __GFP_FS)) {
xhci_warn(xhci, "Could not allocate xHCI USB device
data structures\n");
goto disable_slot;
}
Thank you
On Sat, May 18, 2019 at 01:02:28AM +0900, Jaewon Kim wrote:
> Hello I don't have enough knowledge on USB core but I've wondered
> why GFP_NOIO has been used in xhci_alloc_dev for
> xhci_alloc_virt_device. I found commit ("a6d940dd759b xhci: Use
> GFP_NOIO during device reset"). But can we just change GFP_NOIO
> to __GFP_RECLAIM | __GFP_FS ?
No. __GFP_FS implies __GFP_IO; you can't set __GFP_FS and clear __GFP_IO.
It seems like the problem you have is using the CMA to do DMA allocation.
Why would you do such a thing?
> Please refer to below case.
>
> I got a report from Lee YongTaek <[email protected]> that the
> xhci_alloc_virt_device was too slow over 2 seconds only for one page
> allocation.
>
> 1) It was because kernel version was v4.14 and DMA allocation was
> done from CMA(Contiguous Memory Allocator) where CMA region was
> almost filled with file page and CMA passes GFP down to page
> isolation. And the page isolation only allows file page isolation only to
> requests having __GFP_FS.
>
> 2) Historically CMA was changed at v4.19 to use GFP_KERNEL
> regardless of GFP passed to DMA allocation through the
> commit 6518202970c1 "(mm/cma: remove unsupported gfp_mask
> parameter from cma_alloc()".
>
> I think pre v4.19 the xhci_alloc_virt_device could be very slow
> depending on CMA situation but free to USB deadlock issue. But as of
> v4.19, I think, it will be fast but can face the deadlock issue.
> Consequently I think to meet the both cases, I think USB can pass
> __GFP_FS without __GFP_IO.
>
> If __GFP_FS is passed from USB core, of course, the CMA patch also
> need to be changed to pass GFP.
Thank you for your comment.
In ARM64 architecture, default CMA region is commonly activated and it
could be used
if no specific memory region is defined. The USB driver in my platform
also uses the
default CMA region as DMA allocation. If using the CMA to do DMA
allocation is improper,
then do you think that the USB driver for my platform should be
changed not to use CMA?
According to my understanding, in CONFIG_DMA_CMA on both v4.14 and v5.0,
__GFP_DIRECT_RECLAIM will try CMA allocation first instead of normal
buddy allocation.
Then it will get default CMA region through dev_get_cma_area API.
Please refer to
dev_get_cma_area code below though I am using v4.14 for my platform.
git show v5.0:include/linux/dma-contiguous.h
61 #ifdef CONFIG_DMA_CMA
62
63 extern struct cma *dma_contiguous_default_area;
64
65 static inline struct cma *dev_get_cma_area(struct device *dev)
66 {
67 if (dev && dev->cma_area)
68 return dev->cma_area;
69 return dma_contiguous_default_area;
70 }
Thank you
2019년 5월 18일 (토) 오전 1:34, Matthew Wilcox <[email protected]>님이 작성:
>
> On Sat, May 18, 2019 at 01:02:28AM +0900, Jaewon Kim wrote:
> > Hello I don't have enough knowledge on USB core but I've wondered
> > why GFP_NOIO has been used in xhci_alloc_dev for
> > xhci_alloc_virt_device. I found commit ("a6d940dd759b xhci: Use
> > GFP_NOIO during device reset"). But can we just change GFP_NOIO
> > to __GFP_RECLAIM | __GFP_FS ?
>
> No. __GFP_FS implies __GFP_IO; you can't set __GFP_FS and clear __GFP_IO.
>
> It seems like the problem you have is using the CMA to do DMA allocation.
> Why would you do such a thing?
>
> > Please refer to below case.
> >
> > I got a report from Lee YongTaek <[email protected]> that the
> > xhci_alloc_virt_device was too slow over 2 seconds only for one page
> > allocation.
> >
> > 1) It was because kernel version was v4.14 and DMA allocation was
> > done from CMA(Contiguous Memory Allocator) where CMA region was
> > almost filled with file page and CMA passes GFP down to page
> > isolation. And the page isolation only allows file page isolation only to
> > requests having __GFP_FS.
> >
> > 2) Historically CMA was changed at v4.19 to use GFP_KERNEL
> > regardless of GFP passed to DMA allocation through the
> > commit 6518202970c1 "(mm/cma: remove unsupported gfp_mask
> > parameter from cma_alloc()".
> >
> > I think pre v4.19 the xhci_alloc_virt_device could be very slow
> > depending on CMA situation but free to USB deadlock issue. But as of
> > v4.19, I think, it will be fast but can face the deadlock issue.
> > Consequently I think to meet the both cases, I think USB can pass
> > __GFP_FS without __GFP_IO.
> >
> > If __GFP_FS is passed from USB core, of course, the CMA patch also
> > need to be changed to pass GFP.
>
>
Folks, you can't just pass arbitary GFP_ flags to dma allocation
routines, beause very often they are not just wrappers around
the page allocator.
So no, you can't just fine grained control the flags, but the
existing code is just as buggy.
Please switch to use memalloc_noio_save() instead.
On So, 2019-05-19 at 22:56 -0700, Christoph Hellwig wrote:
> Folks, you can't just pass arbitary GFP_ flags to dma allocation
> routines, beause very often they are not just wrappers around
> the page allocator.
>
> So no, you can't just fine grained control the flags, but the
> existing code is just as buggy.
>
> Please switch to use memalloc_noio_save() instead.
>
Hi,
we actually do. It is just higher up in the calling path:
int usb_reset_device(struct usb_device *udev)
{
int ret;
int i;
unsigned int noio_flag;
struct usb_port *port_dev;
struct usb_host_config *config = udev->actconfig;
struct usb_hub *hub = usb_hub_to_struct_hub(udev->parent);
if (udev->state == USB_STATE_NOTATTACHED ||
udev->state == USB_STATE_SUSPENDED) {
dev_dbg(&udev->dev, "device reset not allowed in state %d\n",
udev->state);
return -EINVAL;
}
if (!udev->parent) {
/* this requires hcd-specific logic; see ohci_restart() */
dev_dbg(&udev->dev, "%s for root hub!\n", __func__);
return -EISDIR;
}
port_dev = hub->ports[udev->portnum - 1];
/*
* Don't allocate memory with GFP_KERNEL in current
* context to avoid possible deadlock if usb mass
* storage interface or usbnet interface(iSCSI case)
* is included in current configuration. The easist
* approach is to do it for every device reset,
* because the device 'memalloc_noio' flag may have
* not been set before reseting the usb device.
*/
noio_flag = memalloc_noio_save();
So, do we need to audit the mem_flags again?
What are we supposed to use? GFP_KERNEL?
Regards
Oliver
On Mon, May 20, 2019 at 11:09:25AM +0200, Oliver Neukum wrote:
> we actually do. It is just higher up in the calling path:
Perfect!
> So, do we need to audit the mem_flags again?
> What are we supposed to use? GFP_KERNEL?
GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason,
that is the allocation is from irq context or under a spinlock. If you
think you have a case where you think you don't want to block, but it
is not because of the above reasons we need to have a chat about the
details.
On Mon, 20 May 2019, Christoph Hellwig wrote:
> On Mon, May 20, 2019 at 11:09:25AM +0200, Oliver Neukum wrote:
> > we actually do. It is just higher up in the calling path:
>
> Perfect!
>
> > So, do we need to audit the mem_flags again?
> > What are we supposed to use? GFP_KERNEL?
>
> GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason,
> that is the allocation is from irq context or under a spinlock. If you
> think you have a case where you think you don't want to block, but it
> is not because of the above reasons we need to have a chat about the
> details.
What if the allocation requires the kernel to swap some old pages out
to the backing store, but the backing store is on the device that the
driver is managing? The swap can't take place until the current I/O
operation is complete (assuming the driver can handle only one I/O
operation at a time), and the current operation can't complete until
the old pages are swapped out. Result: deadlock.
Isn't that the whole reason for using GFP_NOIO in the first place?
Alan Stern
On Mon, May 20, 2019 at 10:16:57AM -0400, Alan Stern wrote:
> What if the allocation requires the kernel to swap some old pages out
> to the backing store, but the backing store is on the device that the
> driver is managing? The swap can't take place until the current I/O
> operation is complete (assuming the driver can handle only one I/O
> operation at a time), and the current operation can't complete until
> the old pages are swapped out. Result: deadlock.
>
> Isn't that the whole reason for using GFP_NOIO in the first place?
It is, or rather was. As it has been incredibly painful to wire
up the gfp_t argument through some callstacks, most notably the
vmalloc allocator which is used by a lot of the DMA allocators on
non-coherent platforms, we now have the memalloc_noio_save and
memalloc_nofs_save functions that mark a thread as not beeing to
go into I/O / FS reclaim. So even if you use GFP_KERNEL you will
not dip into reclaim with those flags set on the thread.
On Mo, 2019-05-20 at 07:23 -0700, Christoph Hellwig wrote:
> On Mon, May 20, 2019 at 10:16:57AM -0400, Alan Stern wrote:
> > What if the allocation requires the kernel to swap some old pages out
> > to the backing store, but the backing store is on the device that the
> > driver is managing? The swap can't take place until the current I/O
> > operation is complete (assuming the driver can handle only one I/O
> > operation at a time), and the current operation can't complete until
> > the old pages are swapped out. Result: deadlock.
> >
> > Isn't that the whole reason for using GFP_NOIO in the first place?
>
> It is, or rather was. As it has been incredibly painful to wire
> up the gfp_t argument through some callstacks, most notably the
> vmalloc allocator which is used by a lot of the DMA allocators on
> non-coherent platforms, we now have the memalloc_noio_save and
> memalloc_nofs_save functions that mark a thread as not beeing to
> go into I/O / FS reclaim. So even if you use GFP_KERNEL you will
> not dip into reclaim with those flags set on the thread.
OK, but this leaves a question open. Will the GFP_NOIO actually
hurt, if it is used after memalloc_noio_save()?
Regards
Oliver
On Mo, 2019-05-20 at 10:16 -0400, Alan Stern wrote:
> On Mon, 20 May 2019, Christoph Hellwig wrote:
>
> > GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason,
> > that is the allocation is from irq context or under a spinlock. If you
> > think you have a case where you think you don't want to block, but it
> > is not because of the above reasons we need to have a chat about the
> > details.
>
> What if the allocation requires the kernel to swap some old pages out
> to the backing store, but the backing store is on the device that the
> driver is managing? The swap can't take place until the current I/O
> operation is complete (assuming the driver can handle only one I/O
> operation at a time), and the current operation can't complete until
> the old pages are swapped out. Result: deadlock.
>
> Isn't that the whole reason for using GFP_NOIO in the first place?
Hi,
lookig at this it seems to me that we are in danger of a deadlock
- during reset - devices cannot do IO while being reset
covered by the USB layer in usb_reset_device
- resume & restore - devices cannot do IO while suspended
covered by driver core in rpm_callback
- disconnect - a disconnected device cannot do IO
is this a theoretical case or should I do something to
the driver core?
How about changing configurations on USB?
Regards
Oliver
On Tue, May 21, 2019 at 10:54:37AM +0200, Oliver Neukum wrote:
> OK, but this leaves a question open. Will the GFP_NOIO actually
> hurt, if it is used after memalloc_noio_save()?
Unless we have a bug somewhere it should not make any difference,
neither positively nor negatively.
On Tue, 21 May 2019, Oliver Neukum wrote:
> On Mo, 2019-05-20 at 10:16 -0400, Alan Stern wrote:
> > On Mon, 20 May 2019, Christoph Hellwig wrote:
> >
> > > GFP_KERNEL if you can block, GFP_ATOMIC if you can't for a good reason,
> > > that is the allocation is from irq context or under a spinlock. If you
> > > think you have a case where you think you don't want to block, but it
> > > is not because of the above reasons we need to have a chat about the
> > > details.
> >
> > What if the allocation requires the kernel to swap some old pages out
> > to the backing store, but the backing store is on the device that the
> > driver is managing? The swap can't take place until the current I/O
> > operation is complete (assuming the driver can handle only one I/O
> > operation at a time), and the current operation can't complete until
> > the old pages are swapped out. Result: deadlock.
> >
> > Isn't that the whole reason for using GFP_NOIO in the first place?
>
> Hi,
>
> lookig at this it seems to me that we are in danger of a deadlock
>
> - during reset - devices cannot do IO while being reset
> covered by the USB layer in usb_reset_device
> - resume & restore - devices cannot do IO while suspended
> covered by driver core in rpm_callback
> - disconnect - a disconnected device cannot do IO
> is this a theoretical case or should I do something to
> the driver core?
>
> How about changing configurations on USB?
Changing configurations amounts to much the same as disconnecting,
because both operations destroy all the existing interfaces.
Disconnect can arise in two different ways.
Physical hot-unplug: All I/O operations will fail.
Rmmod or unbind: I/O operations will succeed.
The second case is probably okay. The first we can do nothing about.
However, in either case we do need to make sure that memory allocations
do not require any writebacks. This suggests that we need to call
memalloc_noio_save() from within usb_unbind_interface().
Alan Stern
On Di, 2019-05-21 at 10:00 -0400, Alan Stern wrote:
>
> Changing configurations amounts to much the same as disconnecting,
> because both operations destroy all the existing interfaces.
>
> Disconnect can arise in two different ways.
>
> Physical hot-unplug: All I/O operations will fail.
>
> Rmmod or unbind: I/O operations will succeed.
>
> The second case is probably okay. The first we can do nothing about.
> However, in either case we do need to make sure that memory allocations
> do not require any writebacks. This suggests that we need to call
> memalloc_noio_save() from within usb_unbind_interface().
I agree with the problem, but I fail to see why this issue would be
specific to USB. Shouldn't this be done in the device core layer?
Regards
Oliver
On Wed, 22 May 2019, Oliver Neukum wrote:
> On Di, 2019-05-21 at 10:00 -0400, Alan Stern wrote:
> >
> > Changing configurations amounts to much the same as disconnecting,
> > because both operations destroy all the existing interfaces.
> >
> > Disconnect can arise in two different ways.
> >
> > Physical hot-unplug: All I/O operations will fail.
> >
> > Rmmod or unbind: I/O operations will succeed.
> >
> > The second case is probably okay. The first we can do nothing about.
> > However, in either case we do need to make sure that memory allocations
> > do not require any writebacks. This suggests that we need to call
> > memalloc_noio_save() from within usb_unbind_interface().
>
> I agree with the problem, but I fail to see why this issue would be
> specific to USB. Shouldn't this be done in the device core layer?
Only for drivers that are on the block-device writeback path. The
device core doesn't know which drivers these are.
Alan Stern
On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote:
> On Wed, 22 May 2019, Oliver Neukum wrote:
>
> > I agree with the problem, but I fail to see why this issue would be
> > specific to USB. Shouldn't this be done in the device core layer?
>
> Only for drivers that are on the block-device writeback path. The
> device core doesn't know which drivers these are.
Neither does USB know. It is very hard to predict or even tell which
devices are block device drivers. I think we must assume that
any device may be affected.
Regards
Oliver
On So, 2019-05-19 at 22:56 -0700, Christoph Hellwig wrote:
> Folks, you can't just pass arbitary GFP_ flags to dma allocation
> routines, beause very often they are not just wrappers around
> the page allocator.
>
> So no, you can't just fine grained control the flags, but the
> existing code is just as buggy.
>
> Please switch to use memalloc_noio_save() instead.
Thinking about this again, we have a problem. We introduced
memalloc_noio_save() in 3.10 . Hence the code should have been
correct in v4.14. Which means that either
6518202970c1 "(mm/cma: remove unsupported gfp_mask
parameter from cma_alloc()"
is buggy, or the original issue with a delay of 2 seconds
still exist.
Do we need to do something?
Regards
Oliver
On Wed, 22 May 2019, Oliver Neukum wrote:
> On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote:
> > On Wed, 22 May 2019, Oliver Neukum wrote:
> >
> > > I agree with the problem, but I fail to see why this issue would be
> > > specific to USB. Shouldn't this be done in the device core layer?
> >
> > Only for drivers that are on the block-device writeback path. The
> > device core doesn't know which drivers these are.
>
> Neither does USB know. It is very hard to predict or even tell which
> devices are block device drivers. I think we must assume that
> any device may be affected.
All right. Would you like to submit a patch?
Alan Stern
On Thu, May 23, 2019 at 02:32:09PM +0200, Oliver Neukum wrote:
> > Please switch to use memalloc_noio_save() instead.
>
> Thinking about this again, we have a problem. We introduced
> memalloc_noio_save() in 3.10 . Hence the code should have been
> correct in v4.14. Which means that either
> 6518202970c1 "(mm/cma: remove unsupported gfp_mask
> parameter from cma_alloc()"
> is buggy, or the original issue with a delay of 2 seconds
> still exist.
>
> Do we need to do something?
cma_alloc calls into alloc_contig_range to do the actual allocation,
which then calls current_gfp_context() to pick up the adjustments
from memalloc_noio_save and friends. So at least in current mainline
we should be fine.
Am Donnerstag, den 23.05.2019, 10:01 -0400 schrieb Alan Stern:
> On Wed, 22 May 2019, Oliver Neukum wrote:
>
> > On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote:
> > > On Wed, 22 May 2019, Oliver Neukum wrote:
> > >
> > > > I agree with the problem, but I fail to see why this issue would be
> > > > specific to USB. Shouldn't this be done in the device core layer?
> > >
> > > Only for drivers that are on the block-device writeback path. The
> > > device core doesn't know which drivers these are.
> >
> > Neither does USB know. It is very hard to predict or even tell which
> > devices are block device drivers. I think we must assume that
> > any device may be affected.
>
> All right. Would you like to submit a patch?
Do you like this one?
Regards
Oliver
On Tue, 28 May 2019, Oliver Neukum wrote:
> Am Donnerstag, den 23.05.2019, 10:01 -0400 schrieb Alan Stern:
> > On Wed, 22 May 2019, Oliver Neukum wrote:
> >
> > > On Mi, 2019-05-22 at 10:56 -0400, Alan Stern wrote:
> > > > On Wed, 22 May 2019, Oliver Neukum wrote:
> > > >
> > > > > I agree with the problem, but I fail to see why this issue would be
> > > > > specific to USB. Shouldn't this be done in the device core layer?
> > > >
> > > > Only for drivers that are on the block-device writeback path. The
> > > > device core doesn't know which drivers these are.
> > >
> > > Neither does USB know. It is very hard to predict or even tell which
> > > devices are block device drivers. I think we must assume that
> > > any device may be affected.
> >
> > All right. Would you like to submit a patch?
>
> Do you like this one?
Hmmm. I might be inclined to move the start of the I/O-protected
region a little earlier. For example, the first
blocking_notifier_call_chain() might result in some memory allocations.
The end is okay; once bus_remove_device() has returned the driver will
be completely unbound, so there shouldn't be any pending I/O through
the device.
Alan Stern