There have been issues with get_user_pages and filesystem writeback.
The issues are better described in [1].
The solution being proposed wants to keep track of gup_pinned pages which will allow to take furthur steps to coordinate between subsystems using gup.
put_user_page() simply calls put_page inside for now. But the implementation will change once all call sites of put_page() are converted.
I currently do not have the driver to test. Could I have some suggestions to test this code? The solution is currently implemented in [2] and
it would be great if we could apply the patch on top of [2] and run some tests to check if any regressions occur.
[1] https://lwn.net/Articles/753027/
[2] https://github.com/johnhubbard/linux/tree/gup_dma_core
Cc: Matt Sickler <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: J?r?me Glisse <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Bharath Vedartham <[email protected]>
---
drivers/staging/kpc2000/kpc_dma/fileops.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 6166587..82c70e6 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -198,9 +198,7 @@ int kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
sg_free_table(&acd->sgt);
err_dma_map_sg:
err_alloc_sg_table:
- for (i = 0 ; i < acd->page_count ; i++){
- put_page(acd->user_pages[i]);
- }
+ put_user_pages(acd->user_pages, acd->page_count);
err_get_user_pages:
kfree(acd->user_pages);
err_alloc_userpages:
@@ -229,9 +227,7 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
- for (i = 0 ; i < acd->page_count ; i++){
- put_page(acd->user_pages[i]);
- }
+ put_user_pages(acd->user_pages, acd->page_count);
sg_free_table(&acd->sgt);
--
1.8.3.1
On 7/15/19 12:52 PM, Bharath Vedartham wrote:
> There have been issues with get_user_pages and filesystem writeback.
> The issues are better described in [1].
>
> The solution being proposed wants to keep track of gup_pinned pages which will allow to take furthur steps to coordinate between subsystems using gup.
>
> put_user_page() simply calls put_page inside for now. But the implementation will change once all call sites of put_page() are converted.
>
> I currently do not have the driver to test. Could I have some suggestions to test this code? The solution is currently implemented in [2] and
> it would be great if we could apply the patch on top of [2] and run some tests to check if any regressions occur.
Hi Bharath,
Process point: the above paragraph, and other meta-questions (about the patch, rather than part of the patch) should be placed either after the "---", or in a cover letter (git-send-email --cover-letter). That way, the patch itself is in a commit-able state.
One more below:
>
> [1] https://lwn.net/Articles/753027/
> [2] https://github.com/johnhubbard/linux/tree/gup_dma_core
>
> Cc: Matt Sickler <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
>
> Signed-off-by: Bharath Vedartham <[email protected]>
> ---
> drivers/staging/kpc2000/kpc_dma/fileops.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index 6166587..82c70e6 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -198,9 +198,7 @@ int kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
> sg_free_table(&acd->sgt);
> err_dma_map_sg:
> err_alloc_sg_table:
> - for (i = 0 ; i < acd->page_count ; i++){
> - put_page(acd->user_pages[i]);
> - }
> + put_user_pages(acd->user_pages, acd->page_count);
> err_get_user_pages:
> kfree(acd->user_pages);
> err_alloc_userpages:
> @@ -229,9 +227,7 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
>
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
>
> - for (i = 0 ; i < acd->page_count ; i++){
> - put_page(acd->user_pages[i]);
> - }
> + put_user_pages(acd->user_pages, acd->page_count);
>
> sg_free_table(&acd->sgt);
>
>
Because this is a common pattern, and because the code here doesn't likely need to set page dirty before the dma_unmap_sg call, I think the following would be better (it's untested), instead of the above diff hunk:
diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
index 48ca88bc6b0b..d486f9866449 100644
--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
@@ -211,16 +211,13 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
BUG_ON(acd->ldev == NULL);
BUG_ON(acd->ldev->pldev == NULL);
- for (i = 0 ; i < acd->page_count ; i++) {
- if (!PageReserved(acd->user_pages[i])) {
- set_page_dirty(acd->user_pages[i]);
- }
- }
-
dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
for (i = 0 ; i < acd->page_count ; i++) {
- put_page(acd->user_pages[i]);
+ if (!PageReserved(acd->user_pages[i])) {
+ put_user_pages_dirty(&acd->user_pages[i], 1);
+ else
+ put_user_page(acd->user_pages[i]);
}
sg_free_table(&acd->sgt);
Assuming that you make those two changes, you can add:
Reviewed-by: John Hubbard <[email protected]>
thanks,
--
John Hubbard
NVIDIA
On Mon, Jul 15, 2019 at 01:14:13PM -0700, John Hubbard wrote:
> On 7/15/19 12:52 PM, Bharath Vedartham wrote:
> > There have been issues with get_user_pages and filesystem writeback.
> > The issues are better described in [1].
> >
> > The solution being proposed wants to keep track of gup_pinned pages which will allow to take furthur steps to coordinate between subsystems using gup.
> >
> > put_user_page() simply calls put_page inside for now. But the implementation will change once all call sites of put_page() are converted.
> >
> > I currently do not have the driver to test. Could I have some suggestions to test this code? The solution is currently implemented in [2] and
> > it would be great if we could apply the patch on top of [2] and run some tests to check if any regressions occur.
>
> Hi Bharath,
>
> Process point: the above paragraph, and other meta-questions (about the patch, rather than part of the patch) should be placed either after the "---", or in a cover letter (git-send-email --cover-letter). That way, the patch itself is in a commit-able state.
>
> One more below:
Will fix that in the next version.
> >
> > [1] https://lwn.net/Articles/753027/
> > [2] https://github.com/johnhubbard/linux/tree/gup_dma_core
> >
> > Cc: Matt Sickler <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: J?r?me Glisse <[email protected]>
> > Cc: Ira Weiny <[email protected]>
> > Cc: John Hubbard <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> >
> > Signed-off-by: Bharath Vedartham <[email protected]>
> > ---
> > drivers/staging/kpc2000/kpc_dma/fileops.c | 8 ++------
> > 1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > index 6166587..82c70e6 100644
> > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > @@ -198,9 +198,7 @@ int kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
> > sg_free_table(&acd->sgt);
> > err_dma_map_sg:
> > err_alloc_sg_table:
> > - for (i = 0 ; i < acd->page_count ; i++){
> > - put_page(acd->user_pages[i]);
> > - }
> > + put_user_pages(acd->user_pages, acd->page_count);
> > err_get_user_pages:
> > kfree(acd->user_pages);
> > err_alloc_userpages:
> > @@ -229,9 +227,7 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
> >
> > dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> >
> > - for (i = 0 ; i < acd->page_count ; i++){
> > - put_page(acd->user_pages[i]);
> > - }
> > + put_user_pages(acd->user_pages, acd->page_count);
> >
> > sg_free_table(&acd->sgt);
> >
> >
>
> Because this is a common pattern, and because the code here doesn't likely need to set page dirty before the dma_unmap_sg call, I think the following would be better (it's untested), instead of the above diff hunk:
>
> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> index 48ca88bc6b0b..d486f9866449 100644
> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> @@ -211,16 +211,13 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>
> - for (i = 0 ; i < acd->page_count ; i++) {
> - if (!PageReserved(acd->user_pages[i])) {
> - set_page_dirty(acd->user_pages[i]);
> - }
> - }
> -
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
>
> for (i = 0 ; i < acd->page_count ; i++) {
> - put_page(acd->user_pages[i]);
> + if (!PageReserved(acd->user_pages[i])) {
> + put_user_pages_dirty(&acd->user_pages[i], 1);
> + else
> + put_user_page(acd->user_pages[i]);
> }
>
> sg_free_table(&acd->sgt);
I had my doubts on this. This definitley needs to be looked at by the
driver author.
> Assuming that you make those two changes, you can add:
>
> Reviewed-by: John Hubbard <[email protected]>
Great!
>
> thanks,
> --
> John Hubbard
> NVIDIA
It looks like Outlook is going to absolutely trash this email. Hopefully it comes through okay.
>> There have been issues with get_user_pages and filesystem writeback.
>> The issues are better described in [1].
>>
>> The solution being proposed wants to keep track of gup_pinned pages
>which will allow to take furthur steps to coordinate between subsystems
>using gup.
>>
>> put_user_page() simply calls put_page inside for now. But the
>implementation will change once all call sites of put_page() are converted.
>>
>> I currently do not have the driver to test. Could I have some suggestions to
>test this code? The solution is currently implemented in [2] and
>> it would be great if we could apply the patch on top of [2] and run some
>tests to check if any regressions occur.
>
>Because this is a common pattern, and because the code here doesn't likely
>need to set page dirty before the dma_unmap_sg call, I think the following
>would be better (it's untested), instead of the above diff hunk:
>
>diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c
>b/drivers/staging/kpc2000/kpc_dma/fileops.c
>index 48ca88bc6b0b..d486f9866449 100644
>--- a/drivers/staging/kpc2000/kpc_dma/fileops.c
>+++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
>@@ -211,16 +211,13 @@ void transfer_complete_cb(struct aio_cb_data
>*acd, size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>
>- for (i = 0 ; i < acd->page_count ; i++) {
>- if (!PageReserved(acd->user_pages[i])) {
>- set_page_dirty(acd->user_pages[i]);
>- }
>- }
>-
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
>
> for (i = 0 ; i < acd->page_count ; i++) {
>- put_page(acd->user_pages[i]);
>+ if (!PageReserved(acd->user_pages[i])) {
>+ put_user_pages_dirty(&acd->user_pages[i], 1);
>+ else
>+ put_user_page(acd->user_pages[i]);
> }
>
> sg_free_table(&acd->sgt);
I don't think I ever really knew the right way to do this.
The changes Bharath suggested look okay to me. I'm not sure about the check for PageReserved(), though. At first glance it appears to be equivalent to what was there before, but maybe I should learn what that Reserved page flag really means.
From [1], the only comment that seems applicable is
* - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
* not marked PG_reserved (as they might be in use by somebody else who does
* not respect the caching strategy).
These pages should be coming from anonymous (RAM, not file backed) memory in userspace. Sometimes it comes from hugepage backed memory, though I don't think that makes a difference. I should note that transfer_complete_cb handles both RAM to device and device to RAM DMAs, if that matters.
[1] https://elixir.bootlin.com/linux/v5.2/source/include/linux/page-flags.h#L17
On 7/15/19 2:47 PM, Matt Sickler wrote:
> It looks like Outlook is going to absolutely trash this email. Hopefully it comes through okay.
>
...
>>
>> Because this is a common pattern, and because the code here doesn't likely
>> need to set page dirty before the dma_unmap_sg call, I think the following
>> would be better (it's untested), instead of the above diff hunk:
>>
>> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c
>> b/drivers/staging/kpc2000/kpc_dma/fileops.c
>> index 48ca88bc6b0b..d486f9866449 100644
>> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
>> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
>> @@ -211,16 +211,13 @@ void transfer_complete_cb(struct aio_cb_data
>> *acd, size_t xfr_count, u32 flags)
>> BUG_ON(acd->ldev == NULL);
>> BUG_ON(acd->ldev->pldev == NULL);
>>
>> - for (i = 0 ; i < acd->page_count ; i++) {
>> - if (!PageReserved(acd->user_pages[i])) {
>> - set_page_dirty(acd->user_pages[i]);
>> - }
>> - }
>> -
>> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
>>
>> for (i = 0 ; i < acd->page_count ; i++) {
>> - put_page(acd->user_pages[i]);
>> + if (!PageReserved(acd->user_pages[i])) {
>> + put_user_pages_dirty(&acd->user_pages[i], 1);
>> + else
>> + put_user_page(acd->user_pages[i]);
>> }
>>
>> sg_free_table(&acd->sgt);
>
> I don't think I ever really knew the right way to do this.
>
> The changes Bharath suggested look okay to me. I'm not sure about the check for PageReserved(), though. At first glance it appears to be equivalent to what was there before, but maybe I should learn what that Reserved page flag really means.
> From [1], the only comment that seems applicable is
> * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
> * not marked PG_reserved (as they might be in use by somebody else who does
> * not respect the caching strategy).
>
> These pages should be coming from anonymous (RAM, not file backed) memory in userspace. Sometimes it comes from hugepage backed memory, though I don't think that makes a difference. I should note that transfer_complete_cb handles both RAM to device and device to RAM DMAs, if that matters.
>
> [1] https://elixir.bootlin.com/linux/v5.2/source/include/linux/page-flags.h#L17
>
I agree: the PageReserved check looks unnecessary here, from my outside-the-kpc_2000-team
perspective, anyway. Assuming that your analysis above is correct, you could collapse that
whole think into just:
@@ -211,17 +209,8 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
BUG_ON(acd->ldev == NULL);
BUG_ON(acd->ldev->pldev == NULL);
- for (i = 0 ; i < acd->page_count ; i++) {
- if (!PageReserved(acd->user_pages[i])) {
- set_page_dirty(acd->user_pages[i]);
- }
- }
-
dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
-
- for (i = 0 ; i < acd->page_count ; i++) {
- put_page(acd->user_pages[i]);
- }
+ put_user_pages_dirty(&acd->user_pages[i], acd->page_count);
sg_free_table(&acd->sgt);
(Also, Matt, I failed to Cc: you on a semi-related cleanup that I just sent out for this
driver, as long as I have your attention:
https://lore.kernel.org/r/[email protected]
)
thanks,
--
John Hubbard
NVIDIA
On 7/15/19 3:01 PM, John Hubbard wrote:
> On 7/15/19 2:47 PM, Matt Sickler wrote:
...
> I agree: the PageReserved check looks unnecessary here, from my outside-the-kpc_2000-team
> perspective, anyway. Assuming that your analysis above is correct, you could collapse that
> whole think into just:
>
> @@ -211,17 +209,8 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>
> - for (i = 0 ; i < acd->page_count ; i++) {
> - if (!PageReserved(acd->user_pages[i])) {
> - set_page_dirty(acd->user_pages[i]);
> - }
> - }
> -
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> -
> - for (i = 0 ; i < acd->page_count ; i++) {
> - put_page(acd->user_pages[i]);
> - }
> + put_user_pages_dirty(&acd->user_pages[i], acd->page_count);
Ahem, I typed too quickly. :) Please make that:
put_user_pages_dirty(acd->user_pages, acd->page_count);
thanks,
--
John Hubbard
NVIDIA
On Mon, Jul 15, 2019 at 03:01:43PM -0700, John Hubbard wrote:
> On 7/15/19 2:47 PM, Matt Sickler wrote:
> > It looks like Outlook is going to absolutely trash this email. Hopefully it comes through okay.
> >
> ...
> >>
> >> Because this is a common pattern, and because the code here doesn't likely
> >> need to set page dirty before the dma_unmap_sg call, I think the following
> >> would be better (it's untested), instead of the above diff hunk:
> >>
> >> diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> b/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> index 48ca88bc6b0b..d486f9866449 100644
> >> --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> >> @@ -211,16 +211,13 @@ void transfer_complete_cb(struct aio_cb_data
> >> *acd, size_t xfr_count, u32 flags)
> >> BUG_ON(acd->ldev == NULL);
> >> BUG_ON(acd->ldev->pldev == NULL);
> >>
> >> - for (i = 0 ; i < acd->page_count ; i++) {
> >> - if (!PageReserved(acd->user_pages[i])) {
> >> - set_page_dirty(acd->user_pages[i]);
> >> - }
> >> - }
> >> -
> >> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> >>
> >> for (i = 0 ; i < acd->page_count ; i++) {
> >> - put_page(acd->user_pages[i]);
> >> + if (!PageReserved(acd->user_pages[i])) {
> >> + put_user_pages_dirty(&acd->user_pages[i], 1);
> >> + else
> >> + put_user_page(acd->user_pages[i]);
> >> }
> >>
> >> sg_free_table(&acd->sgt);
> >
> > I don't think I ever really knew the right way to do this.
> >
> > The changes Bharath suggested look okay to me. I'm not sure about the check for PageReserved(), though. At first glance it appears to be equivalent to what was there before, but maybe I should learn what that Reserved page flag really means.
> > From [1], the only comment that seems applicable is
> > * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are
> > * not marked PG_reserved (as they might be in use by somebody else who does
> > * not respect the caching strategy).
> >
> > These pages should be coming from anonymous (RAM, not file backed) memory in userspace. Sometimes it comes from hugepage backed memory, though I don't think that makes a difference. I should note that transfer_complete_cb handles both RAM to device and device to RAM DMAs, if that matters.
Yes. file_operations->read passes a userspace buffer which AFAIK is
anonymous memory.
> > [1] https://elixir.bootlin.com/linux/v5.2/source/include/linux/page-flags.h#L17
> >
>
> I agree: the PageReserved check looks unnecessary here, from my outside-the-kpc_2000-team
> perspective, anyway. Assuming that your analysis above is correct, you could collapse that
> whole think into just:
Since the file_operations->read passes a userspace buffer, I doubt that
the pages of the userspace buffer will be reserved.
> @@ -211,17 +209,8 @@ void transfer_complete_cb(struct aio_cb_data *acd, size_t xfr_count, u32 flags)
> BUG_ON(acd->ldev == NULL);
> BUG_ON(acd->ldev->pldev == NULL);
>
> - for (i = 0 ; i < acd->page_count ; i++) {
> - if (!PageReserved(acd->user_pages[i])) {
> - set_page_dirty(acd->user_pages[i]);
> - }
> - }
> -
> dma_unmap_sg(&acd->ldev->pldev->dev, acd->sgt.sgl, acd->sgt.nents, acd->ldev->dir);
> -
> - for (i = 0 ; i < acd->page_count ; i++) {
> - put_page(acd->user_pages[i]);
> - }
> + put_user_pages_dirty(&acd->user_pages[i], acd->page_count);
>
> sg_free_table(&acd->sgt);
>
> (Also, Matt, I failed to Cc: you on a semi-related cleanup that I just sent out for this
> driver, as long as I have your attention:
>
> https://lore.kernel.org/r/[email protected]
> )
Matt will you be willing to pick this up for testing or do you want a
different patch?
> thanks,
> --
> John Hubbard
> NVIDIA
Thank you
Bharath