kmap() has been deprecated in favor of the kmap_local_page() due to high
cost, restricted mapping space, the overhead of a global lock for
synchronization, and making the process sleep in the absence of free
slots.
kmap_local_{page, folio}() is faster than kmap() and offers thread-local
and CPU-local mappings, can take pagefaults in a local kmap region and
preserves preemption by saving the mappings of outgoing tasks and
restoring those of the incoming one during a context switch.
The difference between kmap_local_page() and kmap_local_folio() consist
only in the first taking a pointer to a page and the second taking two
arguments, a pointer to a folio and the byte offset within the folio which
identifies the page.
The mappings are kept thread local in the functions 'vboxsf_read_folio',
'vboxsf_writepage', 'vboxsf_write_end' in file.c
Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Sumitra Sharma <[email protected]>
---
fs/vboxsf/file.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index 572aa1c43b37..5190619bc3c5 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
u8 *buf;
int err;
- buf = kmap(page);
+ buf = kmap_local_folio(folio, off);
err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
if (err == 0) {
@@ -245,7 +245,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
SetPageError(page);
}
- kunmap(page);
+ kunmap_local(buf);
unlock_page(page);
return err;
}
@@ -286,10 +286,10 @@ static int vboxsf_writepage(struct page *page, struct writeback_control *wbc)
if (!sf_handle)
return -EBADF;
- buf = kmap(page);
+ buf = kmap_local_page(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
off, &nwrite, buf);
- kunmap(page);
+ kunmap_local(buf);
kref_put(&sf_handle->refcount, vboxsf_handle_release);
@@ -320,10 +320,10 @@ static int vboxsf_write_end(struct file *file, struct address_space *mapping,
if (!PageUptodate(page) && copied < len)
zero_user(page, from + copied, len - copied);
- buf = kmap(page);
+ buf = kmap_local_page(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
pos, &nwritten, buf + from);
- kunmap(page);
+ kunmap_local(buf);
if (err) {
nwritten = 0;
--
2.25.1
Hi,
On 6/27/23 15:51, Sumitra Sharma wrote:
> kmap() has been deprecated in favor of the kmap_local_page() due to high
> cost, restricted mapping space, the overhead of a global lock for
> synchronization, and making the process sleep in the absence of free
> slots.
>
> kmap_local_{page, folio}() is faster than kmap() and offers thread-local
> and CPU-local mappings, can take pagefaults in a local kmap region and
> preserves preemption by saving the mappings of outgoing tasks and
> restoring those of the incoming one during a context switch.
>
> The difference between kmap_local_page() and kmap_local_folio() consist
> only in the first taking a pointer to a page and the second taking two
> arguments, a pointer to a folio and the byte offset within the folio which
> identifies the page.
>
> The mappings are kept thread local in the functions 'vboxsf_read_folio',
> 'vboxsf_writepage', 'vboxsf_write_end' in file.c
>
> Suggested-by: Ira Weiny <[email protected]>
> Signed-off-by: Sumitra Sharma <[email protected]>
Thanks, patch looks good to me:
Reviewed-by: Hans de Goede <[email protected]>
Regards,
Hans
> ---
> fs/vboxsf/file.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
> index 572aa1c43b37..5190619bc3c5 100644
> --- a/fs/vboxsf/file.c
> +++ b/fs/vboxsf/file.c
> @@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
> u8 *buf;
> int err;
>
> - buf = kmap(page);
> + buf = kmap_local_folio(folio, off);
>
> err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
> if (err == 0) {
> @@ -245,7 +245,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
> SetPageError(page);
> }
>
> - kunmap(page);
> + kunmap_local(buf);
> unlock_page(page);
> return err;
> }
> @@ -286,10 +286,10 @@ static int vboxsf_writepage(struct page *page, struct writeback_control *wbc)
> if (!sf_handle)
> return -EBADF;
>
> - buf = kmap(page);
> + buf = kmap_local_page(page);
> err = vboxsf_write(sf_handle->root, sf_handle->handle,
> off, &nwrite, buf);
> - kunmap(page);
> + kunmap_local(buf);
>
> kref_put(&sf_handle->refcount, vboxsf_handle_release);
>
> @@ -320,10 +320,10 @@ static int vboxsf_write_end(struct file *file, struct address_space *mapping,
> if (!PageUptodate(page) && copied < len)
> zero_user(page, from + copied, len - copied);
>
> - buf = kmap(page);
> + buf = kmap_local_page(page);
> err = vboxsf_write(sf_handle->root, sf_handle->handle,
> pos, &nwritten, buf + from);
> - kunmap(page);
> + kunmap_local(buf);
>
> if (err) {
> nwritten = 0;
On Tue, Jun 27, 2023 at 04:34:51PM +0200, Hans de Goede wrote:
> Hi,
>
> On 6/27/23 15:51, Sumitra Sharma wrote:
> > kmap() has been deprecated in favor of the kmap_local_page() due to high
> > cost, restricted mapping space, the overhead of a global lock for
> > synchronization, and making the process sleep in the absence of free
> > slots.
> >
> > kmap_local_{page, folio}() is faster than kmap() and offers thread-local
> > and CPU-local mappings, can take pagefaults in a local kmap region and
> > preserves preemption by saving the mappings of outgoing tasks and
> > restoring those of the incoming one during a context switch.
> >
> > The difference between kmap_local_page() and kmap_local_folio() consist
> > only in the first taking a pointer to a page and the second taking two
> > arguments, a pointer to a folio and the byte offset within the folio which
> > identifies the page.
> >
> > The mappings are kept thread local in the functions 'vboxsf_read_folio',
> > 'vboxsf_writepage', 'vboxsf_write_end' in file.c
> >
> > Suggested-by: Ira Weiny <[email protected]>
> > Signed-off-by: Sumitra Sharma <[email protected]>
>
> Thanks, patch looks good to me:
It doesn't look great to me, tbh. It's generally an antipattern to map
the page/folio up at the top and then pass the virtual address down to
the bottom. Usually we want to work in terms of physical addresses
as long as possible. I see the vmmdev_hgcm_function_parameter can
take physical addresses; does it work to simply use the phys_addr
instead of the linear_addr? I see this commentary:
/** Deprecated Doesn't work, use PAGELIST. */
VMMDEV_HGCM_PARM_TYPE_PHYSADDR = 3,
so, um, can we use
/** Physical addresses of locked pages for a buffer. */
VMMDEV_HGCM_PARM_TYPE_PAGELIST = 10,
and convert vboxsf_read_folio() to pass the folio down to vboxsf_read()
which converts it to a PAGELIST (however one does that)?
On Tue, Jun 27, 2023 at 06:51:15AM -0700, Sumitra Sharma wrote:
> +++ b/fs/vboxsf/file.c
> @@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
> u8 *buf;
> int err;
>
> - buf = kmap(page);
> + buf = kmap_local_folio(folio, off);
Did you test this? 'off' is the offset in the _file_. Whereas
kmap_local_folio() takes the offset within the _folio_. They have
different types (loff_t vs size_t) to warn you that they're different
things.
Hi Matthew,
On 6/27/23 19:48, Matthew Wilcox wrote:
> On Tue, Jun 27, 2023 at 06:51:15AM -0700, Sumitra Sharma wrote:
>> +++ b/fs/vboxsf/file.c
>> @@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
>> u8 *buf;
>> int err;
>>
>> - buf = kmap(page);
>> + buf = kmap_local_folio(folio, off);
>
> Did you test this? 'off' is the offset in the _file_. Whereas
> kmap_local_folio() takes the offset within the _folio_. They have
> different types (loff_t vs size_t) to warn you that they're different
> things.
Ah yes you are completely right, off is the offset in the file
and buf should point to the *start* of a mapping of the page.
Regards,
Hans
Hi Matthew,
On 6/27/23 19:46, Matthew Wilcox wrote:
> On Tue, Jun 27, 2023 at 04:34:51PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 6/27/23 15:51, Sumitra Sharma wrote:
>>> kmap() has been deprecated in favor of the kmap_local_page() due to high
>>> cost, restricted mapping space, the overhead of a global lock for
>>> synchronization, and making the process sleep in the absence of free
>>> slots.
>>>
>>> kmap_local_{page, folio}() is faster than kmap() and offers thread-local
>>> and CPU-local mappings, can take pagefaults in a local kmap region and
>>> preserves preemption by saving the mappings of outgoing tasks and
>>> restoring those of the incoming one during a context switch.
>>>
>>> The difference between kmap_local_page() and kmap_local_folio() consist
>>> only in the first taking a pointer to a page and the second taking two
>>> arguments, a pointer to a folio and the byte offset within the folio which
>>> identifies the page.
>>>
>>> The mappings are kept thread local in the functions 'vboxsf_read_folio',
>>> 'vboxsf_writepage', 'vboxsf_write_end' in file.c
>>>
>>> Suggested-by: Ira Weiny <[email protected]>
>>> Signed-off-by: Sumitra Sharma <[email protected]>
>>
>> Thanks, patch looks good to me:
>
> It doesn't look great to me, tbh. It's generally an antipattern to map
> the page/folio up at the top and then pass the virtual address down to
> the bottom. Usually we want to work in terms of physical addresses
> as long as possible. I see the vmmdev_hgcm_function_parameter can
> take physical addresses; does it work to simply use the phys_addr
> instead of the linear_addr? I see this commentary:
>
> /** Deprecated Doesn't work, use PAGELIST. */
> VMMDEV_HGCM_PARM_TYPE_PHYSADDR = 3,
>
> so, um, can we use
> /** Physical addresses of locked pages for a buffer. */
> VMMDEV_HGCM_PARM_TYPE_PAGELIST = 10,
>
> and convert vboxsf_read_folio() to pass the folio down to vboxsf_read()
> which converts it to a PAGELIST (however one does that)?
It has been a long time since I looked at this code in detail. I don't
think you can just use different types when making virtualbox hypervisor
calls and then expect the hypervisor to say sure that another way to
represent a memory buffer, I'll take that instead.
After I upstreamed vboxsf support VirtualBox upstream did do some
further optimizations to speed up vboxsf. So there may be something
there which allows passing a physical address to the hypervisor,
but I don't have the time to dive into this.
When I upstreamed this the idea was that VirtualBox upstream
would see the benefits of having the guest drivers upstream and would
help with upstream maintenance. But unfortunately this never materialized
and they are still doing their own out of tree thing even for
their guest drivers.
TL;DR: for now I believe that it is best to just keep the code as
is and do a straight forward folio conversion.
Regards,
Hans
Hi,
On 6/27/23 20:10, Hans de Goede wrote:
> Hi Matthew,
>
> On 6/27/23 19:46, Matthew Wilcox wrote:
>> On Tue, Jun 27, 2023 at 04:34:51PM +0200, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 6/27/23 15:51, Sumitra Sharma wrote:
>>>> kmap() has been deprecated in favor of the kmap_local_page() due to high
>>>> cost, restricted mapping space, the overhead of a global lock for
>>>> synchronization, and making the process sleep in the absence of free
>>>> slots.
>>>>
>>>> kmap_local_{page, folio}() is faster than kmap() and offers thread-local
>>>> and CPU-local mappings, can take pagefaults in a local kmap region and
>>>> preserves preemption by saving the mappings of outgoing tasks and
>>>> restoring those of the incoming one during a context switch.
>>>>
>>>> The difference between kmap_local_page() and kmap_local_folio() consist
>>>> only in the first taking a pointer to a page and the second taking two
>>>> arguments, a pointer to a folio and the byte offset within the folio which
>>>> identifies the page.
>>>>
>>>> The mappings are kept thread local in the functions 'vboxsf_read_folio',
>>>> 'vboxsf_writepage', 'vboxsf_write_end' in file.c
>>>>
>>>> Suggested-by: Ira Weiny <[email protected]>
>>>> Signed-off-by: Sumitra Sharma <[email protected]>
>>>
>>> Thanks, patch looks good to me:
>>
>> It doesn't look great to me, tbh. It's generally an antipattern to map
>> the page/folio up at the top and then pass the virtual address down to
>> the bottom. Usually we want to work in terms of physical addresses
>> as long as possible. I see the vmmdev_hgcm_function_parameter can
>> take physical addresses; does it work to simply use the phys_addr
>> instead of the linear_addr? I see this commentary:
>>
>> /** Deprecated Doesn't work, use PAGELIST. */
>> VMMDEV_HGCM_PARM_TYPE_PHYSADDR = 3,
>>
>> so, um, can we use
>> /** Physical addresses of locked pages for a buffer. */
>> VMMDEV_HGCM_PARM_TYPE_PAGELIST = 10,
>>
>> and convert vboxsf_read_folio() to pass the folio down to vboxsf_read()
>> which converts it to a PAGELIST (however one does that)?
>
>
> It has been a long time since I looked at this code in detail. I don't
> think you can just use different types when making virtualbox hypervisor
> calls and then expect the hypervisor to say sure that another way to
> represent a memory buffer, I'll take that instead.
Ok correction to this, drivers/virt/vboxguest/vboxguest_utils.c actually
already translates the VMMDEV_HGCM_PARM_TYPE_LINADDR_KERNEL_IN /
VMMDEV_HGCM_PARM_TYPE_LINADDR_KERNEL_OUT buffers used by vboxsf_write()/
vboxsf_read() to PAGELIST-s before passing them to the hypervisor
using page_to_phys(virt_to_page()) so we map a page and then call
page_to_phys(virt_to_page()) on it which indeed is quite inefficient.
That still leaves the problem that I have very little time to look into
this though ...
Regards,
Hans
Here's a more comprehensive read_folio patch. It's not at all
efficient, but then if we wanted an efficient vboxsf, we'd implement
vboxsf_readahead() and actually do an async call with deferred setting
of the uptodate flag. I can consult with anyone who wants to do all
this work.
I haven't even compiled this, just trying to show the direction this
should take.
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index 2307f8037efc..f1af9a7bd3d8 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -227,26 +227,31 @@ const struct inode_operations vboxsf_reg_iops = {
static int vboxsf_read_folio(struct file *file, struct folio *folio)
{
- struct page *page = &folio->page;
struct vboxsf_handle *sf_handle = file->private_data;
- loff_t off = page_offset(page);
- u32 nread = PAGE_SIZE;
- u8 *buf;
+ loff_t pos = folio_pos(folio);
+ size_t offset = 0;
int err;
- buf = kmap(page);
+ do {
+ u8 *buf = kmap_local_folio(folio, offset);
+ u32 nread = PAGE_SIZE;
- err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
- if (err == 0) {
- memset(&buf[nread], 0, PAGE_SIZE - nread);
- flush_dcache_page(page);
- SetPageUptodate(page);
- } else {
- SetPageError(page);
- }
+ err = vboxsf_read(sf_handle->root, sf_handle->handle, pos,
+ &nread, buf);
+ if (nread < PAGE_SIZE)
+ memset(&buf[nread], 0, PAGE_SIZE - nread);
+ kunmap_local(buf);
+ if (err)
+ break;
+ offset += PAGE_SIZE;
+ pos += PAGE_SIZE;
+ } while (offset < folio_size(folio);
- kunmap(page);
- unlock_page(page);
+ if (!err) {
+ flush_dcache_folio(folio);
+ folio_mark_uptodate(folio);
+ }
+ folio_unlock(folio);
return err;
}
On gioved? 29 giugno 2023 00:23:54 CEST Fabio M. De Francesco wrote:
[...]
>
> Shouldn't we call folio_lock() to lock the folio to be able to unlock with
> folio_unlock()?
Sorry, I wanted to write "If so, I can't find either a folio_lock() or a
page_lock() in this filesystem.".
Fabio
> If so, I can't find any neither a folio_lock() or a page_lock() in this
> filesystem.
>
> Again sorry for not understanding, can you please explain it?
>
> > return err;
> >
> > }
On mercoled? 28 giugno 2023 19:15:04 CEST Matthew Wilcox wrote:
> Here's a more comprehensive read_folio patch. It's not at all
> efficient, but then if we wanted an efficient vboxsf, we'd implement
> vboxsf_readahead() and actually do an async call with deferred setting
> of the uptodate flag.
> I can consult with anyone who wants to do all
> this work.
Interesting...
> I haven't even compiled this, just trying to show the direction this
> should take.
>
> diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
> index 2307f8037efc..f1af9a7bd3d8 100644
> --- a/fs/vboxsf/file.c
> +++ b/fs/vboxsf/file.c
> @@ -227,26 +227,31 @@ const struct inode_operations vboxsf_reg_iops = {
>
> static int vboxsf_read_folio(struct file *file, struct folio *folio)
> {
> - struct page *page = &folio->page;
> struct vboxsf_handle *sf_handle = file->private_data;
> - loff_t off = page_offset(page);
> - u32 nread = PAGE_SIZE;
> - u8 *buf;
> + loff_t pos = folio_pos(folio);
> + size_t offset = 0;
> int err;
>
> - buf = kmap(page);
> + do {
Please let me understand why you are calling vboxsf_read() in a loop, a
PAGE_SIZE at a time.
I have had only few minutes (whereas I'd need more time) to look at this code.
If I understand the current code it reads a single page at offset zero of a
folio and then memset() with zeros from &buf[nread] up to the end of the page.
Then it seems that this function currently assume that the folio doesn't need
to be read until "offset < folio_size(folio)" becomes false.
Does it imply that the folio is always one page sized? Doesn't it? I'm surely
missing some basics...
> + u8 *buf = kmap_local_folio(folio, offset);
> + u32 nread = PAGE_SIZE;
>
> - err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread,
buf);
> - if (err == 0) {
> - memset(&buf[nread], 0, PAGE_SIZE - nread);
> - flush_dcache_page(page);
> - SetPageUptodate(page);
> - } else {
> - SetPageError(page);
> - }
> + err = vboxsf_read(sf_handle->root, sf_handle->handle, pos,
> + &nread, buf);
> + if (nread < PAGE_SIZE)
> + memset(&buf[nread], 0, PAGE_SIZE - nread);
> + kunmap_local(buf);
> + if (err)
> + break;
> + offset += PAGE_SIZE;
> + pos += PAGE_SIZE;
> + } while (offset < folio_size(folio);
>
> - kunmap(page);
> - unlock_page(page);
> + if (!err) {
> + flush_dcache_folio(folio);
> + folio_mark_uptodate(folio);
> + }
> + folio_unlock(folio);
Shouldn't we call folio_lock() to lock the folio to be able to unlock with
folio_unlock()?
If so, I can't find any neither a folio_lock() or a page_lock() in this
filesystem.
Again sorry for not understanding, can you please explain it?
> return err;
> }
Thanks,
Fabio
Fabio M. De Francesco wrote:
> On gioved? 29 giugno 2023 00:23:54 CEST Fabio M. De Francesco wrote:
>
> [...]
>
> >
> > Shouldn't we call folio_lock() to lock the folio to be able to unlock with
> > folio_unlock()?
>
> Sorry, I wanted to write "If so, I can't find either a folio_lock() or a
> page_lock() in this filesystem.".
Check the documentation for read_folio:
... "The filesystem should unlock the folio once the read has
completed, whether it was successful or not." ...
Ira
Matthew Wilcox wrote:
> Here's a more comprehensive read_folio patch. It's not at all
> efficient, but then if we wanted an efficient vboxsf, we'd implement
> vboxsf_readahead() and actually do an async call with deferred setting
> of the uptodate flag. I can consult with anyone who wants to do all
> this work.
>
> I haven't even compiled this, just trying to show the direction this
> should take.
I'm a bit confused. Is this progressing toward having the folio passed
down to vboxsf_read() or just an expanded vboxsf_read_folio which can
handle larger folios as well as fix the kmap?
Ira
>
> diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
> index 2307f8037efc..f1af9a7bd3d8 100644
> --- a/fs/vboxsf/file.c
> +++ b/fs/vboxsf/file.c
> @@ -227,26 +227,31 @@ const struct inode_operations vboxsf_reg_iops = {
>
> static int vboxsf_read_folio(struct file *file, struct folio *folio)
> {
> - struct page *page = &folio->page;
> struct vboxsf_handle *sf_handle = file->private_data;
> - loff_t off = page_offset(page);
> - u32 nread = PAGE_SIZE;
> - u8 *buf;
> + loff_t pos = folio_pos(folio);
> + size_t offset = 0;
> int err;
>
> - buf = kmap(page);
> + do {
> + u8 *buf = kmap_local_folio(folio, offset);
> + u32 nread = PAGE_SIZE;
>
> - err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
> - if (err == 0) {
> - memset(&buf[nread], 0, PAGE_SIZE - nread);
> - flush_dcache_page(page);
> - SetPageUptodate(page);
> - } else {
> - SetPageError(page);
> - }
> + err = vboxsf_read(sf_handle->root, sf_handle->handle, pos,
> + &nread, buf);
> + if (nread < PAGE_SIZE)
> + memset(&buf[nread], 0, PAGE_SIZE - nread);
> + kunmap_local(buf);
> + if (err)
> + break;
> + offset += PAGE_SIZE;
> + pos += PAGE_SIZE;
> + } while (offset < folio_size(folio);
>
> - kunmap(page);
> - unlock_page(page);
> + if (!err) {
> + flush_dcache_folio(folio);
> + folio_mark_uptodate(folio);
> + }
> + folio_unlock(folio);
> return err;
> }
>
On Thu, Jun 29, 2023 at 12:23:54AM +0200, Fabio M. De Francesco wrote:
> > - buf = kmap(page);
> > + do {
>
> Please let me understand why you are calling vboxsf_read() in a loop, a
> PAGE_SIZE at a time.
Because kmap_local_folio() can only (guarantee to) map one page at a
time. Also vboxsf_read() is only tested with a single page at a time.
> If I understand the current code it reads a single page at offset zero of a
> folio and then memset() with zeros from &buf[nread] up to the end of the page.
> Then it seems that this function currently assume that the folio doesn't need
> to be read until "offset < folio_size(folio)" becomes false.
>
> Does it imply that the folio is always one page sized? Doesn't it? I'm surely
> missing some basics...
vboxsf does not yet claim to support large folios, so every folio that
it sees will be only a single page in size. Hopefully at some point
that will change. Again, somebody would need to test that. In the
meantime, if someone is going to the trouble of switching over to using
the folio API, let's actually include support for large folios.
> > - kunmap(page);
> > - unlock_page(page);
> > + if (!err) {
> > + flush_dcache_folio(folio);
> > + folio_mark_uptodate(folio);
> > + }
> > + folio_unlock(folio);
>
> Shouldn't we call folio_lock() to lock the folio to be able to unlock with
> folio_unlock()?
>
> If so, I can't find any neither a folio_lock() or a page_lock() in this
> filesystem.
>
> Again sorry for not understanding, can you please explain it?
Ira gave the minimal explanation, but a slightly fuller explanation is
that the folio is locked while it is being fetched from backing store.
That prevents both a second thread from reading from it while another
thread is bringing it uptodate, and two threads trying to bring it
uptodate at the same time.
Most filesystems have an asynchronous read_folio, so you don't see the
folio_unlock() in the read_folio() function; instead it's in the I/O
completion path. vboxsf is synchronous.
On Wed, Jun 28, 2023 at 07:23:54PM -0700, Ira Weiny wrote:
> Matthew Wilcox wrote:
> > Here's a more comprehensive read_folio patch. It's not at all
> > efficient, but then if we wanted an efficient vboxsf, we'd implement
> > vboxsf_readahead() and actually do an async call with deferred setting
> > of the uptodate flag. I can consult with anyone who wants to do all
> > this work.
> >
> > I haven't even compiled this, just trying to show the direction this
> > should take.
>
> I'm a bit confused. Is this progressing toward having the folio passed
> down to vboxsf_read() or just an expanded vboxsf_read_folio which can
> handle larger folios as well as fix the kmap?
This just handles large folios & switches to kmap_local_folio(). Doing
this properly is a _lot_ of work, and needs someone who can commit to
actually testing it.
On Tue, Jun 27, 2023 at 06:48:20PM +0100, Matthew Wilcox wrote:
> On Tue, Jun 27, 2023 at 06:51:15AM -0700, Sumitra Sharma wrote:
> > +++ b/fs/vboxsf/file.c
> > @@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
> > u8 *buf;
> > int err;
> >
> > - buf = kmap(page);
> > + buf = kmap_local_folio(folio, off);
>
> Did you test this? 'off' is the offset in the _file_. Whereas
> kmap_local_folio() takes the offset within the _folio_. They have
> different types (loff_t vs size_t) to warn you that they're different
> things.
>
Hi Matthew,
When creating this patch, I read and searched about the loff_t vs size_t.
By mistake, I implemented it in the wrong way.
Also, I did not test it and just compiled it. I apologise for doing so.
And for the other points you have put as feedback. I will take some time to understand
it. And would like to work on the changes you suggest.
Thanks & regards
Sumitra
Hi Sumitra,
On 6/29/23 06:30, Sumitra Sharma wrote:
> On Tue, Jun 27, 2023 at 06:48:20PM +0100, Matthew Wilcox wrote:
>> On Tue, Jun 27, 2023 at 06:51:15AM -0700, Sumitra Sharma wrote:
>>> +++ b/fs/vboxsf/file.c
>>> @@ -234,7 +234,7 @@ static int vboxsf_read_folio(struct file *file, struct folio *folio)
>>> u8 *buf;
>>> int err;
>>>
>>> - buf = kmap(page);
>>> + buf = kmap_local_folio(folio, off);
>>
>> Did you test this? 'off' is the offset in the _file_. Whereas
>> kmap_local_folio() takes the offset within the _folio_. They have
>> different types (loff_t vs size_t) to warn you that they're different
>> things.
>>
>
> Hi Matthew,
>
> When creating this patch, I read and searched about the loff_t vs size_t.
> By mistake, I implemented it in the wrong way.
>
> Also, I did not test it and just compiled it. I apologise for doing so.
>
> And for the other points you have put as feedback. I will take some time to understand
> it. And would like to work on the changes you suggest.
If you work further on this please make sure that you actually test your
changes. Submitting untested fs changes is a really bad idea. People don't
like it when their data gets corrupted.
Note vboxsf can be tested easily by setting up a VirtualBox guest and then
using the shared folder features. Note do *not* use the VirtualBox provided
guest utils they contain their own out of tree vboxsf implementation and
will use that.
To avoid this you could e.g. use Fedora inside the guest and install
the Fedora packaged vbox guest utils which does not replace the mainline
vboxsf.
Regards,
Hans
On Wed, Jun 28, 2023 at 06:15:04PM +0100, Matthew Wilcox wrote:
> Here's a more comprehensive read_folio patch. It's not at all
> efficient, but then if we wanted an efficient vboxsf, we'd implement
> vboxsf_readahead() and actually do an async call with deferred setting
> of the uptodate flag. I can consult with anyone who wants to do all
> this work.
>
> I haven't even compiled this, just trying to show the direction this
> should take.
>
> diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
> index 2307f8037efc..f1af9a7bd3d8 100644
> --- a/fs/vboxsf/file.c
> +++ b/fs/vboxsf/file.c
> @@ -227,26 +227,31 @@ const struct inode_operations vboxsf_reg_iops = {
>
> static int vboxsf_read_folio(struct file *file, struct folio *folio)
> {
> - struct page *page = &folio->page;
> struct vboxsf_handle *sf_handle = file->private_data;
> - loff_t off = page_offset(page);
> - u32 nread = PAGE_SIZE;
> - u8 *buf;
> + loff_t pos = folio_pos(folio);
> + size_t offset = 0;
> int err;
>
> - buf = kmap(page);
> + do {
> + u8 *buf = kmap_local_folio(folio, offset);
> + u32 nread = PAGE_SIZE;
>
> - err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
> - if (err == 0) {
> - memset(&buf[nread], 0, PAGE_SIZE - nread);
> - flush_dcache_page(page);
> - SetPageUptodate(page);
> - } else {
> - SetPageError(page);
> - }
> + err = vboxsf_read(sf_handle->root, sf_handle->handle, pos,
> + &nread, buf);
> + if (nread < PAGE_SIZE)
> + memset(&buf[nread], 0, PAGE_SIZE - nread);
> + kunmap_local(buf);
> + if (err)
> + break;
> + offset += PAGE_SIZE;
> + pos += PAGE_SIZE;
> + } while (offset < folio_size(folio);
>
> - kunmap(page);
> - unlock_page(page);
> + if (!err) {
> + flush_dcache_folio(folio);
> + folio_mark_uptodate(folio);
> + }
> + folio_unlock(folio);
> return err;
> }
>
Hi
So, after reading the comments, I understood that the problem presented
by Hans and Matthew is as follows:
1) In the current code, the buffers used by vboxsf_write()/vboxsf_read() are
translated to PAGELIST-s before passing to the hypervisor,
but inefficiently— it first maps a page in vboxsf_read_folio() and then
calls page_to_phys(virt_to_page()) in the function hgcm_call_init_linaddr().
The inefficiency in the current implementation arises due to the unnecessary
mapping of a page in vboxsf_read_folio() because the mapping output, i.e. the
linear address, is used deep down in file 'drivers/virt/vboxguest/vboxguest_utils.c'.
Hence, the mapping must be done in this file; to do so, the folio must be passed
until this point. It can be done by adding a new member, 'struct folio *folio',
in the 'struct vmmdev_hgcm_function_parameter64'.
The unused member 'phys_addr' in this struct can also be removed.
2) Expanding the vboxsf_read_folio so that it can handle larger folios.
Matthew already has suggested the changes, I have to read more on this.
Parallelly I will be setting up the testing environment to test the changes.
Please let me know if I am wrong anywhere.
Thanks & regards
Sumitra
Hi,
On 6/29/23 11:28, Sumitra Sharma wrote:
> On Wed, Jun 28, 2023 at 06:15:04PM +0100, Matthew Wilcox wrote:
>> Here's a more comprehensive read_folio patch. It's not at all
>> efficient, but then if we wanted an efficient vboxsf, we'd implement
>> vboxsf_readahead() and actually do an async call with deferred setting
>> of the uptodate flag. I can consult with anyone who wants to do all
>> this work.
>>
>> I haven't even compiled this, just trying to show the direction this
>> should take.
>>
>> diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
>> index 2307f8037efc..f1af9a7bd3d8 100644
>> --- a/fs/vboxsf/file.c
>> +++ b/fs/vboxsf/file.c
>> @@ -227,26 +227,31 @@ const struct inode_operations vboxsf_reg_iops = {
>>
>> static int vboxsf_read_folio(struct file *file, struct folio *folio)
>> {
>> - struct page *page = &folio->page;
>> struct vboxsf_handle *sf_handle = file->private_data;
>> - loff_t off = page_offset(page);
>> - u32 nread = PAGE_SIZE;
>> - u8 *buf;
>> + loff_t pos = folio_pos(folio);
>> + size_t offset = 0;
>> int err;
>>
>> - buf = kmap(page);
>> + do {
>> + u8 *buf = kmap_local_folio(folio, offset);
>> + u32 nread = PAGE_SIZE;
>>
>> - err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
>> - if (err == 0) {
>> - memset(&buf[nread], 0, PAGE_SIZE - nread);
>> - flush_dcache_page(page);
>> - SetPageUptodate(page);
>> - } else {
>> - SetPageError(page);
>> - }
>> + err = vboxsf_read(sf_handle->root, sf_handle->handle, pos,
>> + &nread, buf);
>> + if (nread < PAGE_SIZE)
>> + memset(&buf[nread], 0, PAGE_SIZE - nread);
>> + kunmap_local(buf);
>> + if (err)
>> + break;
>> + offset += PAGE_SIZE;
>> + pos += PAGE_SIZE;
>> + } while (offset < folio_size(folio);
>>
>> - kunmap(page);
>> - unlock_page(page);
>> + if (!err) {
>> + flush_dcache_folio(folio);
>> + folio_mark_uptodate(folio);
>> + }
>> + folio_unlock(folio);
>> return err;
>> }
>>
>
> Hi
>
> So, after reading the comments, I understood that the problem presented
> by Hans and Matthew is as follows:
>
> 1) In the current code, the buffers used by vboxsf_write()/vboxsf_read() are
> translated to PAGELIST-s before passing to the hypervisor,
> but inefficiently— it first maps a page in vboxsf_read_folio() and then
> calls page_to_phys(virt_to_page()) in the function hgcm_call_init_linaddr().
>
> The inefficiency in the current implementation arises due to the unnecessary
> mapping of a page in vboxsf_read_folio() because the mapping output, i.e. the
> linear address, is used deep down in file 'drivers/virt/vboxguest/vboxguest_utils.c'.
> Hence, the mapping must be done in this file; to do so, the folio must be passed
> until this point. It can be done by adding a new member, 'struct folio *folio',
> in the 'struct vmmdev_hgcm_function_parameter64'.
struct vmmdev_hgcm_function_parameter64 is defined in
include/uapi/linux/vbox_vmmdev_types.h
This is part of the userspace API of vboxguest which allows userspace
to make (some) VirtualBox hypervisor request through a chardev.
You can not just go and change this struct.
Note there already is a VMMDEV_HGCM_PARM_TYPE_PAGELIST type. So you
could do the conversion from folio to pagelist (see the vboxguest code
for how to do this) in the vboxsf_read() code (making it take a folio
pointer as arg) and then directly pass the pagelist to the vbg_hgcm_call().
> The unused member 'phys_addr' in this struct can also be removed.
Again no you can NOT just go and make changes to an uapi header.
Regards,
Hans
Hi,
On 6/29/23 16:42, Matthew Wilcox wrote:
> On Thu, Jun 29, 2023 at 02:28:44AM -0700, Sumitra Sharma wrote:
>> On Wed, Jun 28, 2023 at 06:15:04PM +0100, Matthew Wilcox wrote:
>>> Here's a more comprehensive read_folio patch. It's not at all
>>> efficient, but then if we wanted an efficient vboxsf, we'd implement
>>> vboxsf_readahead() and actually do an async call with deferred setting
>>> of the uptodate flag. I can consult with anyone who wants to do all
>>> this work.
>>
>> So, after reading the comments, I understood that the problem presented
>> by Hans and Matthew is as follows:
>>
>> 1) In the current code, the buffers used by vboxsf_write()/vboxsf_read() are
>> translated to PAGELIST-s before passing to the hypervisor,
>> but inefficiently— it first maps a page in vboxsf_read_folio() and then
>> calls page_to_phys(virt_to_page()) in the function hgcm_call_init_linaddr().
>
> It does ... and I'm not even sure that virt_to_page() works for kmapped
> pages. Has it been tested with a 32-bit guest with, say, 4-8GB of memory?
>
>> The inefficiency in the current implementation arises due to the unnecessary
>> mapping of a page in vboxsf_read_folio() because the mapping output, i.e. the
>> linear address, is used deep down in file 'drivers/virt/vboxguest/vboxguest_utils.c'.
>> Hence, the mapping must be done in this file; to do so, the folio must be passed
>> until this point. It can be done by adding a new member, 'struct folio *folio',
>> in the 'struct vmmdev_hgcm_function_parameter64'.
>
> That's not the way to do it (as Hans already said).
>
> The other problem is that vboxsf_read() is synchronous. It makes the
> call to the host, then waits for the outcome. What we really need is
> a vboxsf_readahead() that looks something like this:
>
> static void vboxsf_readahead(struct readahead_control *ractl)
> {
> unsigned int nr = readahead_count(ractl);
> req = vbg_req_alloc(... something involving nr ...);
> ... fill in the page array ...
> ... submit the request ...
> }
>
> You also need to set up a kthread that will sit on the hgcm_wq and handle
> the completions that come in (where you'd call folio_mark_uptodate() if
> the call is successful, folio_unlock() to indicate the I/O has completed,
> etc, etc).
>
> Then go back to read_folio() (which can be synchronous), and maybe factor
> out the parts of vboxsf_readahead() that can be reused for filling in
> the vbg_req.
>
> Hans might well have better ideas about this could be structured; I'm
> new to the vbox code.
I think that moving to directly passing vbox PAGELIST structs on
read is something which should not be too much work.
Moving to an async model is a lot more involved and has a much
larger chance of causing problems. So we need not only someone
to step up to not only make the change to async, but that person
also need to commit to help with maintaining vboxsf going forward,
esp. wrt debug + fix any problems stemming from the move to async
which are likely to only pop up much later as more and more
distros move to the new code.
Regards,
Hans
On Thu, Jun 29, 2023 at 02:28:44AM -0700, Sumitra Sharma wrote:
> On Wed, Jun 28, 2023 at 06:15:04PM +0100, Matthew Wilcox wrote:
> > Here's a more comprehensive read_folio patch. It's not at all
> > efficient, but then if we wanted an efficient vboxsf, we'd implement
> > vboxsf_readahead() and actually do an async call with deferred setting
> > of the uptodate flag. I can consult with anyone who wants to do all
> > this work.
>
> So, after reading the comments, I understood that the problem presented
> by Hans and Matthew is as follows:
>
> 1) In the current code, the buffers used by vboxsf_write()/vboxsf_read() are
> translated to PAGELIST-s before passing to the hypervisor,
> but inefficiently— it first maps a page in vboxsf_read_folio() and then
> calls page_to_phys(virt_to_page()) in the function hgcm_call_init_linaddr().
It does ... and I'm not even sure that virt_to_page() works for kmapped
pages. Has it been tested with a 32-bit guest with, say, 4-8GB of memory?
> The inefficiency in the current implementation arises due to the unnecessary
> mapping of a page in vboxsf_read_folio() because the mapping output, i.e. the
> linear address, is used deep down in file 'drivers/virt/vboxguest/vboxguest_utils.c'.
> Hence, the mapping must be done in this file; to do so, the folio must be passed
> until this point. It can be done by adding a new member, 'struct folio *folio',
> in the 'struct vmmdev_hgcm_function_parameter64'.
That's not the way to do it (as Hans already said).
The other problem is that vboxsf_read() is synchronous. It makes the
call to the host, then waits for the outcome. What we really need is
a vboxsf_readahead() that looks something like this:
static void vboxsf_readahead(struct readahead_control *ractl)
{
unsigned int nr = readahead_count(ractl);
req = vbg_req_alloc(... something involving nr ...);
... fill in the page array ...
... submit the request ...
}
You also need to set up a kthread that will sit on the hgcm_wq and handle
the completions that come in (where you'd call folio_mark_uptodate() if
the call is successful, folio_unlock() to indicate the I/O has completed,
etc, etc).
Then go back to read_folio() (which can be synchronous), and maybe factor
out the parts of vboxsf_readahead() that can be reused for filling in
the vbg_req.
Hans might well have better ideas about this could be structured; I'm
new to the vbox code.
On gioved? 29 giugno 2023 05:16:04 CEST Matthew Wilcox wrote:
> On Thu, Jun 29, 2023 at 12:23:54AM +0200, Fabio M. De Francesco wrote:
> > > - buf = kmap(page);
> > > + do {
> >
> > Please let me understand why you are calling vboxsf_read() in a loop, a
> > PAGE_SIZE at a time.
>
> Because kmap_local_folio() can only (guarantee to) map one page at a
> time.
Yes, one page at a time. This part served to introduce the _main_ question
that is the one you answered below (i.e., since the current code maps a page a
a time with no loops, do we need to manage folios spanning more than a single
page?)
> Also vboxsf_read() is only tested with a single page at a time.
>
> > If I understand the current code it reads a single page at offset zero of
a
> > folio and then memset() with zeros from &buf[nread] up to the end of the
> > page. Then it seems that this function currently assume that the folio
> > doesn't need to be read until "offset < folio_size(folio)" becomes false.
> >
> > Does it imply that the folio is always one page sized? Doesn't it? I'm
> > surely
> > missing some basics...
>
> vboxsf does not yet claim to support large folios, so every folio that
> it sees will be only a single page in size.
> Hopefully at some point
> that will change. Again, somebody would need to test that. In the
> meantime, if someone is going to the trouble of switching over to using
> the folio API, let's actually include support for large folios.
"[...] at some point that will change." wrt larger folios spanning multiple
pages is the answer that justifies the loop. I couldn't know about this plan.
Thanks for explaining that there is a plan towards that goal.
I think that Sumitra can address the task to re-use your patch to
vboxsf_read_folio() and then properly test it with VirtualBox.
Instead the much larger effort to implement vboxsf_readahead() and actually do
an async call with deferred setting of the uptodate flag will surely require a
considerable amount of time and whoever wanted to address it would need your
guidance.
You said that you are ready to provide consult, but I'm not sure whether
Sumitra would be allowed to spend a large part of her time to do an out of
scope task (wrt her internship).
If yes, I have nothing against. If not, I'm pretty sure that someone else can
set aside enough time to address this large task ;-)
>
> > > - kunmap(page);
> > > - unlock_page(page);
> > > + if (!err) {
> > > + flush_dcache_folio(folio);
> > > + folio_mark_uptodate(folio);
> > > + }
> > > + folio_unlock(folio);
> >
> > Shouldn't we call folio_lock() to lock the folio to be able to unlock with
> > folio_unlock()?
> >
> > If so, I can't find any neither a folio_lock() or a page_lock() in this
> > filesystem.
> >
> > Again sorry for not understanding, can you please explain it?
>
> Ira gave the minimal explanation, but a slightly fuller explanation is
> that the folio is locked while it is being fetched from backing store.
This explains why I could not easily find the call to lock it.
> That prevents both a second thread from reading from it while another
> thread is bringing it uptodate, and two threads trying to bring it
> uptodate at the same time.
>
> Most filesystems have an asynchronous read_folio, so you don't see the
> folio_unlock() in the read_folio() function; instead it's in the I/O
> completion path. vboxsf is synchronous.
And this explains different implementation between synchronous and
asynchronous reads
Again thanks,
Fabio