2020-05-19 04:59:33

by John Hubbard

[permalink] [raw]
Subject: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

Note that this effectively changes the code's behavior as well: it now
ultimately calls set_page_dirty_lock(), instead of SetPageDirty().This
is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [3]

Also, this deletes one of the two FIXME comments (about refcounting),
because there is nothing wrong with the refcounting at this point.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

[3] https://lore.kernel.org/r/[email protected]

Cc: "Kai Mäkisara" <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: [email protected]
Signed-off-by: John Hubbard <[email protected]>
---
drivers/scsi/st.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index c5f9b348b438..0369c7edfd94 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4922,7 +4922,7 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
unsigned long end = (uaddr + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
unsigned long start = uaddr >> PAGE_SHIFT;
const int nr_pages = end - start;
- int res, i, j;
+ int res, i;
struct page **pages;
struct rq_map_data *mdata = &STbp->map_data;

@@ -4944,7 +4944,7 @@ static int sgl_map_user_pages(struct st_buffer *STbp,

/* Try to fault in all of the necessary pages */
/* rw==READ means read from drive, write into memory area */
- res = get_user_pages_fast(uaddr, nr_pages, rw == READ ? FOLL_WRITE : 0,
+ res = pin_user_pages_fast(uaddr, nr_pages, rw == READ ? FOLL_WRITE : 0,
pages);

/* Errors and no page mapped should return here */
@@ -4964,8 +4964,7 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
return nr_pages;
out_unmap:
if (res > 0) {
- for (j=0; j < res; j++)
- put_page(pages[j]);
+ unpin_user_pages(pages, res);
res = 0;
}
kfree(pages);
@@ -4977,18 +4976,10 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
static int sgl_unmap_user_pages(struct st_buffer *STbp,
const unsigned int nr_pages, int dirtied)
{
- int i;
+ /* FIXME: cache flush missing for rw==READ */

- for (i=0; i < nr_pages; i++) {
- struct page *page = STbp->mapped_pages[i];
+ unpin_user_pages_dirty_lock(STbp->mapped_pages, nr_pages, dirtied);

- if (dirtied)
- SetPageDirty(page);
- /* FIXME: cache flush missing for rw==READ
- * FIXME: call the correct reference counting function
- */
- put_page(page);
- }
kfree(STbp->mapped_pages);
STbp->mapped_pages = NULL;

--
2.26.2


2020-05-19 20:17:14

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

On 2020-05-18 21:55, John Hubbard wrote:
> This code was using get_user_pages*(), in a "Case 2" scenario
> (DMA/RDMA), using the categorization from [1]. That means that it's
> time to convert the get_user_pages*() + put_page() calls to
> pin_user_pages*() + unpin_user_pages() calls.
>

Looks like I accidentally doubled a word on the subject line:
"convert convert".

I'd appreciate it a maintainer could remove one of those for
me, while applying the patch, assuming that we don't need a
v2 for other reasons.


thanks,
--
John Hubbard
NVIDIA

2020-05-20 02:01:57

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()


John,

> Looks like I accidentally doubled a word on the subject line: "convert
> convert".
>
> I'd appreciate it a maintainer could remove one of those for me, while
> applying the patch, assuming that we don't need a v2 for other
> reasons.

I can fix that up. But I'll give Kai a chance to review before I apply.

--
Martin K. Petersen Oracle Linux Engineering

2020-05-21 19:49:17

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

On 2020-05-18 21:55, John Hubbard wrote:
> This code was using get_user_pages*(), in a "Case 2" scenario
> (DMA/RDMA), using the categorization from [1]. That means that it's
> time to convert the get_user_pages*() + put_page() calls to
> pin_user_pages*() + unpin_user_pages() calls.
>
> There is some helpful background in [2]: basically, this is a small
> part of fixing a long-standing disconnect between pinning pages, and
> file systems' use of those pages.
>
> Note that this effectively changes the code's behavior as well: it now
> ultimately calls set_page_dirty_lock(), instead of SetPageDirty().This
> is probably more accurate.
>
> As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
> dealing with a file backed page where we have reference on the inode it
> hangs off." [3]
>
> Also, this deletes one of the two FIXME comments (about refcounting),
> because there is nothing wrong with the refcounting at this point.
>
> [1] Documentation/core-api/pin_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
>
> [3] https://lore.kernel.org/r/[email protected]

Kai, why is the st driver calling get_user_pages_fast() directly instead
of calling blk_rq_map_user()? blk_rq_map_user() is already used in
st_scsi_execute(). I think that the blk_rq_map_user() implementation is
also based on get_user_pages_fast(). See also iov_iter_get_pages_alloc()
in lib/iov_iter.c.

John, why are the get_user_pages_fast() calls in the st driver modified
but not the blk_rq_map_user() call? Are you sure that the modified code
is a "case 2" scenario and not a "case 1" scenario?

Thanks,

Bart.

2020-05-21 19:59:04

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

On 2020-05-21 12:47, Bart Van Assche wrote:
> On 2020-05-18 21:55, John Hubbard wrote:
>> This code was using get_user_pages*(), in a "Case 2" scenario
>> (DMA/RDMA), using the categorization from [1]. That means that it's
>> time to convert the get_user_pages*() + put_page() calls to
>> pin_user_pages*() + unpin_user_pages() calls.
>>
>> There is some helpful background in [2]: basically, this is a small
>> part of fixing a long-standing disconnect between pinning pages, and
>> file systems' use of those pages.
>>
>> Note that this effectively changes the code's behavior as well: it now
>> ultimately calls set_page_dirty_lock(), instead of SetPageDirty().This
>> is probably more accurate.
>>
>> As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
>> dealing with a file backed page where we have reference on the inode it
>> hangs off." [3]
>>
>> Also, this deletes one of the two FIXME comments (about refcounting),
>> because there is nothing wrong with the refcounting at this point.
>>
>> [1] Documentation/core-api/pin_user_pages.rst
>>
>> [2] "Explicit pinning of user-space pages":
>> https://lwn.net/Articles/807108/
>>
>> [3] https://lore.kernel.org/r/[email protected]
>
> Kai, why is the st driver calling get_user_pages_fast() directly instead
> of calling blk_rq_map_user()? blk_rq_map_user() is already used in
> st_scsi_execute(). I think that the blk_rq_map_user() implementation is
> also based on get_user_pages_fast(). See also iov_iter_get_pages_alloc()
> in lib/iov_iter.c.
>
> John, why are the get_user_pages_fast() calls in the st driver modified
> but not the blk_rq_map_user() call? Are you sure that the modified code
> is a "case 2" scenario and not a "case 1" scenario?
>

No, I am not sure. I thought this was a DMA case (I'm not a SCSI Tape user,
so it *seemed* reasonable that a DMA engine was involved), but if it's really
direct IO, then we need to just drop this patch entirely. Because: I need to
convert the block/biovec code, including iov_iter_get_pages_alloc() and
friends, in order to handle direct IO. I'm working on that but it's not
ready yet.

(I was trying to get the smaller, non-direct-IO cases converted first.)

Thanks for spotting the discrepancy, and apologies for the confusion on this
end.

Also, I doubt if it's worth it, but do you want a patch to change SetPageDirty()
to set_page_dirty_lock(), meanwhile? It seems like if that's never come up, then
it's mostly a theoretical bug.

thanks,
--
John Hubbard
NVIDIA

2020-05-21 21:00:28

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

On 2020-05-21 12:57, John Hubbard wrote:
> Also, I doubt if it's worth it, but do you want a patch to change
> SetPageDirty()
> to set_page_dirty_lock(), meanwhile? It seems like if that's never come
> up, then
> it's mostly a theoretical bug.

Hi John,

Since I do not use the st driver myself I will leave it to Kai to answer
that question.

Thanks,

Bart.

2020-05-22 08:51:02

by Kai Mäkisara (Kolumbus)

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()



> On 21. May 2020, at 22.47, Bart Van Assche <[email protected]> wrote:
>
> On 2020-05-18 21:55, John Hubbard wrote:
>> This code was using get_user_pages*(), in a "Case 2" scenario
>> (DMA/RDMA), using the categorization from [1]. That means that it's
>> time to convert the get_user_pages*() + put_page() calls to
>> pin_user_pages*() + unpin_user_pages() calls.
>>
>> There is some helpful background in [2]: basically, this is a small
>> part of fixing a long-standing disconnect between pinning pages, and
>> file systems' use of those pages.
>>
>> Note that this effectively changes the code's behavior as well: it now
>> ultimately calls set_page_dirty_lock(), instead of SetPageDirty().This
>> is probably more accurate.
>>
>> As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
>> dealing with a file backed page where we have reference on the inode it
>> hangs off." [3]
>>
>> Also, this deletes one of the two FIXME comments (about refcounting),
>> because there is nothing wrong with the refcounting at this point.
>>
>> [1] Documentation/core-api/pin_user_pages.rst
>>
>> [2] "Explicit pinning of user-space pages":
>> https://lwn.net/Articles/807108/
>>
>> [3] https://lore.kernel.org/r/[email protected]
>
> Kai, why is the st driver calling get_user_pages_fast() directly instead
> of calling blk_rq_map_user()? blk_rq_map_user() is already used in
> st_scsi_execute(). I think that the blk_rq_map_user() implementation is
> also based on get_user_pages_fast(). See also iov_iter_get_pages_alloc()
> in lib/iov_iter.c.
>
The reason is that the blk_ functions were not available when that part
of the code was done. Nobody has converted that to use the more
modern functions because the old method still works.

Thanks,
Kai

2020-05-26 18:19:53

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH] scsi: st: convert convert get_user_pages() --> pin_user_pages()

On 2020-05-22 01:32, "Kai Mäkisara (Kolumbus)" wrote:
>> On 21. May 2020, at 22.47, Bart Van Assche <[email protected]> wrote:
>>
>> On 2020-05-18 21:55, John Hubbard wrote:
>>> This code was using get_user_pages*(), in a "Case 2" scenario
>>> (DMA/RDMA), using the categorization from [1]. That means that it's
>>> time to convert the get_user_pages*() + put_page() calls to
>>> pin_user_pages*() + unpin_user_pages() calls.
>>>
>>> There is some helpful background in [2]: basically, this is a small
>>> part of fixing a long-standing disconnect between pinning pages, and
>>> file systems' use of those pages.
>>>
>>> Note that this effectively changes the code's behavior as well: it now
>>> ultimately calls set_page_dirty_lock(), instead of SetPageDirty().This
>>> is probably more accurate.
>>>
>>> As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
>>> dealing with a file backed page where we have reference on the inode it
>>> hangs off." [3]
>>>
>>> Also, this deletes one of the two FIXME comments (about refcounting),
>>> because there is nothing wrong with the refcounting at this point.
>>>
>>> [1] Documentation/core-api/pin_user_pages.rst
>>>
>>> [2] "Explicit pinning of user-space pages":
>>> https://lwn.net/Articles/807108/
>>>
>>> [3] https://lore.kernel.org/r/[email protected]
>>
>> Kai, why is the st driver calling get_user_pages_fast() directly instead
>> of calling blk_rq_map_user()? blk_rq_map_user() is already used in
>> st_scsi_execute(). I think that the blk_rq_map_user() implementation is
>> also based on get_user_pages_fast(). See also iov_iter_get_pages_alloc()
>> in lib/iov_iter.c.
>>
> The reason is that the blk_ functions were not available when that part
> of the code was done. Nobody has converted that to use the more
> modern functions because the old method still works.
>


Hi Kai, Bart,

Say, thinking about this some more, it seems like: as only sgl_unmap_user_pages()
is allowed to release the pages that were pinned in sgl_map_user_pages(), then
this is a clear case of Direct IO (not DMA/RDMA, as I first thought) that needs
to be handled via pin_user_pages_fast().

I'll send a v2 with the commit log updated to refer to Direct IO, because this
does still apply, after all.

thanks,
--
John Hubbard
NVIDIA