kmap() is being deprecated in favor of kmap_local_page().
There are two main problems with kmap(): (1) It comes with an overhead as
mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.
With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and are still valid.
However, there is a major constraint which might block some conversions
to kmap_local_page(): the kernel virtual address cannot be handed across
different threads. Ira pointed out to me that the kmap() and kunmap() in this
driver happen in two different workqueues. Therefore, kunmap_local() would
try to unmap an invalid address.
Let me explain why I'm sending an RFC. When I hit the above-mentioned
issues, I tried to refactor the code in ways where mapping and unmapping
happen in a single thread (so as not to break the rules of thread locality).
However, while reading this code again I think I noticed an important
prerequisite which may lead to a simpler solution... If I'm not wrong, it
looks like the pages are allocated in nvmet_tcp_map_data(), using the
GFP_KERNEL flag.
This would ensure that those pages _cannot_ come from HIGHMEM. If I'm not
missing something (again!), a plain page_address() could replace the kmap()
of sg_page(sg); furthermore, we shouldn't need the unmappings any longer.
Unfortunately, I don't know this protocol and I'm not experienced enough
with kernel development to understand this code properly.
Therefore, I have two questions: am I right in thinking that the pages
mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL? If so,
can anyone with more knowledge than mine please say whether my changes make
any sense?
Suggested-by: Ira Weiny <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/nvme/target/tcp.c | 19 +------------------
1 file changed, 1 insertion(+), 18 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index dc3b4dc8fe08..affba6d862fc 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -167,7 +167,6 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops;
static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
static void nvmet_tcp_finish_cmd(struct nvmet_tcp_cmd *cmd);
static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
-static void nvmet_tcp_unmap_pdu_iovec(struct nvmet_tcp_cmd *cmd);
static inline u16 nvmet_tcp_cmd_tag(struct nvmet_tcp_queue *queue,
struct nvmet_tcp_cmd *cmd)
@@ -309,19 +308,6 @@ static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd)
cmd->req.sg = NULL;
}
-static void nvmet_tcp_unmap_pdu_iovec(struct nvmet_tcp_cmd *cmd)
-{
- struct scatterlist *sg;
- int i;
-
- sg = &cmd->req.sg[cmd->sg_idx];
-
- for (i = 0; i < cmd->nr_mapped; i++)
- kunmap(sg_page(&sg[i]));
-
- cmd->nr_mapped = 0;
-}
-
static void nvmet_tcp_map_pdu_iovec(struct nvmet_tcp_cmd *cmd)
{
struct kvec *iov = cmd->iov;
@@ -338,7 +324,7 @@ static void nvmet_tcp_map_pdu_iovec(struct nvmet_tcp_cmd *cmd)
while (length) {
u32 iov_len = min_t(u32, length, sg->length - sg_offset);
- iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset;
+ iov->iov_base = page_address(sg_page(sg)) + sg->offset + sg_offset;
iov->iov_len = iov_len;
length -= iov_len;
@@ -1141,7 +1127,6 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
cmd->rbytes_done += ret;
}
- nvmet_tcp_unmap_pdu_iovec(cmd);
if (queue->data_digest) {
nvmet_tcp_prep_recv_ddgst(cmd);
return 0;
@@ -1411,7 +1396,6 @@ static void nvmet_tcp_restore_socket_callbacks(struct nvmet_tcp_queue *queue)
static void nvmet_tcp_finish_cmd(struct nvmet_tcp_cmd *cmd)
{
nvmet_req_uninit(&cmd->req);
- nvmet_tcp_unmap_pdu_iovec(cmd);
nvmet_tcp_free_cmd_buffers(cmd);
}
@@ -1424,7 +1408,6 @@ static void nvmet_tcp_uninit_data_in_cmds(struct nvmet_tcp_queue *queue)
if (nvmet_tcp_need_data_in(cmd))
nvmet_req_uninit(&cmd->req);
- nvmet_tcp_unmap_pdu_iovec(cmd);
nvmet_tcp_free_cmd_buffers(cmd);
}
--
2.37.1
Fabio,
On 8/16/22 02:18, Fabio M. De Francesco wrote:
> kmap() is being deprecated in favor of kmap_local_page().
>
> There are two main problems with kmap(): (1) It comes with an overhead as
> mapping space is restricted and protected by a global lock for
> synchronization and (2) it also requires global TLB invalidation when the
> kmap’s pool wraps and it might block when the mapping space is fully
> utilized until a slot becomes available.
>
So I believe this should give us better performance under heavy
workload?
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page faults, and can be called from any context (including interrupts).
> It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
> the tasks can be preempted and, when they are scheduled to run again, the
> kernel virtual addresses are restored and are still valid.
>
> However, there is a huge constraint which might block some conversions
> to kmap_local_page(): the kernel virtual address cannot be handed across
> different threads. Ira made me notice that the kmap() and kunmap() in this
> driver happen in two different workqueues. Therefore, kunmap_local() will
> try to unmap an invalid address.
>
> Let me explain why I'm sending an RFC. When I hit the above mentioned
> issues I tried to refactor the code in ways where mapping and unmapping
> happen in a single thread (to not break the rules of threads locality).
>
> However, while reading this code again I think I noticed an important
> prerequisite which may lead to a simpler solution... If I'm not wrong, it
> looks like the pages are allocated in nvmet_tcp_map_data(), using the
> GFP_KERNEL flag.
>
> This would assure that those pages _cannot_ come from HIGHMEM. If I'm not
> missing something (again!), a plain page_address() could replace the kmap()
> of sg_page(sg); furthermore, we shouldn't need the unmappings any longer.
>
> Unfortunately, I don't know this protocol and I'm not so experienced with
> kernel development to be able to understand this code properly.
>
> Therefore, I have two questions: am I right about thinking that the pages
> mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL? If so,
> can anyone with more knowledge than mine please say if my changes make any
> sense?
>
> Suggested-by: Ira Weiny <[email protected]>
> Signed-off-by: Fabio M. De Francesco <[email protected]>
Thanks a lot for the detailed explanation.
Quick question: what kind of performance benefits have you seen with
this change? We need to document the performance numbers, since the commit
log mentions that kmap_local_page() is faster than kmap().
In case you are not aware, please have a look at blktests to create
a simple loopback setup with the nvme-loop transport.
-ck
On Tuesday 16 August 2022 15:12:08 CEST Chaitanya Kulkarni wrote:
> Fabio,
>
> On 8/16/22 02:18, Fabio M. De Francesco wrote:
>
> > kmap() is being deprecated in favor of kmap_local_page().
> >
> > There are two main problems with kmap(): (1) It comes with an overhead as
> > mapping space is restricted and protected by a global lock for
> > synchronization and (2) it also requires global TLB invalidation when the
> > kmap’s pool wraps and it might block when the mapping space is fully
> > utilized until a slot becomes available.
> >
>
> so I believe this should give us better performance under heavy
> workload ?
>
Yes, correct. Can you please take a look at the official highmem documentation
(highmem.rst)? I reworked and extended it with two series of patches.
Everything about the deprecation of kmap() is explained there and in a patch
from Ira, "checkpatch: Add kmap and kmap_atomic to the deprecated list", which
you reviewed at https://lore.kernel.org/all/[email protected]/
> > With kmap_local_page() the mappings are per thread, CPU local, can take
> > page faults, and can be called from any context (including interrupts).
> > It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
> > the tasks can be preempted and, when they are scheduled to run again, the
> > kernel virtual addresses are restored and are still valid.
> >
> > However, there is a huge constraint which might block some conversions
> > to kmap_local_page(): the kernel virtual address cannot be handed across
> > different threads. Ira made me notice that the kmap() and kunmap() in this
> > driver happen in two different workqueues. Therefore, kunmap_local() will
> > try to unmap an invalid address.
> >
> > Let me explain why I'm sending an RFC. When I hit the above mentioned
> > issues I tried to refactor the code in ways where mapping and unmapping
> > happen in a single thread (to not break the rules of threads locality).
> >
> > However, while reading this code again I think I noticed an important
> > prerequisite which may lead to a simpler solution... If I'm not wrong, it
> > looks like the pages are allocated in nvmet_tcp_map_data(), using the
> > GFP_KERNEL flag.
> >
> > This would assure that those pages _cannot_ come from HIGHMEM. If I'm not
> > missing something (again!), a plain page_address() could replace the kmap()
> > of sg_page(sg); furthermore, we shouldn't need the unmappings any longer.
> >
> > Unfortunately, I don't know this protocol and I'm not so experienced with
> > kernel development to be able to understand this code properly.
> >
> > Therefore, I have two questions: am I right about thinking that the pages
> > mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL? If so,
> > can anyone with more knowledge than mine please say if my changes make any
> > sense?
> >
> > Suggested-by: Ira Weiny <[email protected]>
> > Signed-off-by: Fabio M. De Francesco <[email protected]>
>
>
> Thanks a lot for detailed explanation.
You are welcome!
> Quick question what kind of performance benefits you have seen with
> this change ? we need to document the performance numbers since commit
> log mentions here that kmap_local_page() is faster than kmap().
OK, but kmap_local_page() was discarded because it is not applicable here
without heavy refactoring.
> In case you are not aware please have a look at the blktests to create
> a simple loopback setup with nvme-loop transport.
I have nothing against learning how blktests works and running this tool.
I'll do as you requested.
However, please read the implementation of kmap():
#ifdef CONFIG_HIGHMEM
static inline void *kmap(struct page *page)
{
	void *addr;

	might_sleep();
	if (!PageHighMem(page))
		addr = page_address(page);
	else
		addr = kmap_high(page);
	kmap_flush_tlb((unsigned long)addr);
	return addr;
}
If the page is not from HIGHMEM, it is a simple page_address(), like in my
RFC patch.
#else /* CONFIG_HIGHMEM */
static inline void *kmap(struct page *page)
{
	might_sleep();
	return page_address(page);
}
Again, a plain page_address().
Furthermore, with a simple page_address() we avoid the calls to kunmap().
I think this implicitly says all we need to know about why we should prefer
page_address() whenever we are _sure_ that pages cannot come from HIGHMEM.
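To make the point concrete, here is a minimal before/after sketch (not the exact nvmet-tcp code, just the shape of the conversion):

```c
/*
 * Sketch only: it assumes every sg_page(sg) was allocated with
 * GFP_KERNEL, i.e. from lowmem, so the page has a permanent kernel
 * virtual address for its whole lifetime.
 */

/* Before: kmap() creates a mapping that must later be torn down with
 * kunmap() -- in this driver, from a different workqueue. */
iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset;

/* After: page_address() simply returns the linear-map address;
 * there is nothing to undo, so the whole unmap path can go away. */
iov->iov_base = page_address(sg_page(sg)) + sg->offset + sg_offset;
```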
Thanks for your comments and questions,
Fabio
> -ck
>
On Tue, Aug 16, 2022 at 11:18:08AM +0200, Fabio M. De Francesco wrote:
>
> Therefore, I have two questions: am I right about thinking that the pages
> mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL?
I think you are correct.
> If so, can anyone with more knowledge than mine please say if my changes make
> any sense?
I think it does make sense. I like the code simplification, though this use
wasn't really paying the kmap penalty since, as you mentioned, this is never
highmem.
You should also remove the cmd's 'nr_mapped' field while you're at it,
otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
>> Therefore, I have two questions: am I right about thinking that the pages
>> mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL?
>
> I think you are correct.
It is correct. It is the same model for the linux scsi target, sunrpc
etc.
>> If so, can anyone with more knowledge than mine please say if my changes make
>> any sense?
>
> I think it does make sense. I like the code simplification, though this use
> was't really paying the kmap penalty since, as you mentioned, this is never
> highmem.
Yes, it's the same code path. It would be great if we still had an
abstraction that does the right thing regardless of highmem or
not, like kmap provides, though.
> You should also remove the cmd's 'nr_mapped' field while you're at it,
> otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
Not remove nr_mapped because we use it to know the iovec entries, but
we can just remove the WARN statement.
On Wednesday 17 August 2022 11:44:09 CEST Sagi Grimberg wrote:
> >> Therefore, I have two questions: am I right about thinking that the pages
> >> mapped in nvmet_tcp_map_pdu_iovec() are allocated with GFP_KERNEL?
> >
> > I think you are correct.
>
> It is correct. It is the same model for the linux scsi target, sunrpc
> etc.
I'll try to address the comments from the last two messages from Keith and
Sagi in this email (I replied to Chaitanya yesterday).
First of all: good to know that it is the same model for other subsystems.
Thanks!
> >> If so, can anyone with more knowledge than mine please say if my changes
> >> make
> >> any sense?
> >
> > I think it does make sense.
Thanks, I'm glad I was not wrong :-)
> > I like the code simplification, though this use
> > wasn't really paying the kmap penalty since, as you mentioned, this is never
> > highmem.
Correct; however, everybody likes code simplification. I added a couple of
sentences to the kmap_local_page() documentation in highmem.rst. They clearly
state that, when users know that pages cannot come from HIGHMEM, they should
prefer page_address().
The changes to nvmet-tcp started as an attempt to convert kmap() / kunmap()
to kmap_local_page() / kunmap_local(), but they ended up in code shortening
and simplification with a plain use of page_address().
Obviously, due to my little experience with kernel development and less than
little knowledge of this protocol, I had to ask whether or not I was right in
identifying the site of the allocations.
The reasons why I had to use page_address() will be clearer reading what
follows...
> Yes, its the same code-path. Would be great if we still had an
> abstraction that would do the right thing regardless of highmem or
> not like kmap provides though.
It would be great, and it is already possible (this is why Thomas Gleixner
created the kmap_local_page() API), but here we have a major issue: kmap()
and kmap_atomic() have recently been deprecated and should no longer be used
in new code: https://lore.kernel.org/all/[email protected]/
("[PATCH] checkpatch: Add kmap and kmap_atomic to the deprecated list").
kmap_local_page() always does the right thing: users can call it with or
without HIGHMEM enabled, from atomic contexts (including interrupts) or from
preemptible contexts, and the mappings can take page faults.
It doesn't require a global lock for synchronization, doesn't require global
TLB invalidation when the kmap pool wraps, and doesn't block waiting for free
slots.
Nice, isn't it?
However, with nvmet-tcp we cannot easily use kmap_local_page() because of a
major constraint: the mapping is local to the thread. If users handed the
kernel virtual addresses returned by this function to other threads, the
pointers would be invalid.
Here the kmap() and kunmap() call sites are in two different workqueues.
Therefore, there is no way to convert kmap() to kmap_local_page() unless this
code is heavily refactored.
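For reference, a generic sketch of the only usage pattern kmap_local_page() supports (the helper name copy_from_page() is hypothetical, not part of this driver):

```c
/*
 * Generic sketch (not nvmet-tcp code): kmap_local_page() mappings must
 * be created and destroyed in the same thread, strictly nested (LIFO).
 */
static void copy_from_page(struct page *page, void *dst, size_t len)
{
	void *addr = kmap_local_page(page);	/* thread-local mapping */

	memcpy(dst, addr, len);
	kunmap_local(addr);			/* must run in the same thread */
}

/*
 * What a naive conversion of this driver would do instead:
 *
 *   workqueue A:  addr = kmap_local_page(page);  // slot in A's local stack
 *   workqueue B:  kunmap_local(addr);            // wrong thread: invalid
 */
```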
Knowing that the pages cannot come from HIGHMEM avoids this refactoring and
in the meantime allows us to delete the kmap() and kunmap() call sites.
> > You should also remove the cmd's 'nr_mapped' field while you're at it,
> > otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
>
> Not remove nr_mapped because we use it to know the iovec entries, but
> we can just remove the WARN statement.
Ah, OK. I'll take care of this too. That was not my first concern when I
wrote the RFC. The "real" patch must also address this detail.
@Chaitanya:
Since this is a mere simplification and shortening of code, I suppose I can
skip the performance tests. Ira and I still have hundreds of kmap() and
kmap_atomic() call sites to take care of, therefore we prefer to leave alone
everything that is not strictly necessary for removing the deprecated APIs.
Thanks to you all,
Fabio
>>> You should also remove the cmd's 'nr_mapped' field while you're at it,
>>> otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
>>
>> Not remove nr_mapped because we use it to know the iovec entries, but
>> we can just remove the WARN statement.
>
> It's only used locally within nvmet_tcp_map_pdu_iovec() after this change, so
> no need to carry it in struct nvmet_tcp_cmd anymore.
You are right, we can kill it.
On Wed, Aug 17, 2022 at 12:44:09PM +0300, Sagi Grimberg wrote:
>
> > You should also remove the cmd's 'nr_mapped' field while you're at it,
> > otherwise you'll hit the WARN in nvmet_tcp_free_cmd_buffers().
>
> Not remove nr_mapped because we use it to know the iovec entries, but
> we can just remove the WARN statement.
It's only used locally within nvmet_tcp_map_pdu_iovec() after this change, so
no need to carry it in struct nvmet_tcp_cmd anymore.
> @Chaitanya:
>
> Since this is a mere simplification and shorten of code, I suppose I can skip
> the performance tests. Ira and I have still hundreds of call sites with kmap()
> and kmap_atomic() which we should care of, therefore we prefer to leave alone
> everything that is not strictly necessary for the deprecated API deletions.
>
> Thanks to you all,
>
fine by me.
-ck