kmap() is being deprecated in favor of kmap_local_page().
There are two main problems with kmap(): (1) It comes with an overhead as
the mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.
With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and still valid.
Since its use in fs/ufs is safe everywhere, it should be preferred.
Therefore, replace kmap() with kmap_local_page() in fs/ufs. kunmap_local()
requires the mapping address, so return that address from ufs_get_page()
to be used in ufs_put_page().
This series could have not been ever made because nothing prevented the
previous patch from working properly but Al Viro made a long series of
very appreciated comments about how many unnecessary and redundant lines
of code I could have removed. He could see things I was entirely unable
to notice. Furthermore, he also provided solutions and details about how
I could decompose a single patch into a small series of three
independent units.[1][2][3]
I want to thank him so much for the patience, kindness and the time he
decided to spend to provide those analysis and write three messages full
of interesting insights.[1][2][3]
Changes from v1
1/3: No changes.
2/3: Restore the return of "err" that was mistakenly deleted
together with the removal of the "out" label in
ufs_add_link(). Thanks to Al Viro.[4]
Return the address of the kmap()'ed page instead of a
pointer to a pointer to the mapped page; a page_address()
had been overlooked in ufs_get_page(). Thanks to Al
Viro.[5]
3/3: Return the kernel virtual address got from the call to
kmap_local_page() after conversion from kmap(). Again
thanks to Al Viro.[6]
[1] https://lore.kernel.org/lkml/Y4E++JERgUMoqfjG@ZenIV/
[2] https://lore.kernel.org/lkml/Y4FG0O7VWTTng5yh@ZenIV/
[3] https://lore.kernel.org/lkml/Y4ONIFJatIGsVNpf@ZenIV/
[4] https://lore.kernel.org/lkml/Y5Zc0qZ3+zsI74OZ@ZenIV/
[5] https://lore.kernel.org/lkml/Y5ZZy23FFAnQDR3C@ZenIV/
[6] https://lore.kernel.org/lkml/Y5ZcMPzPG9h6C9eh@ZenIV/
The cover letter of the v1 is at
https://lore.kernel.org/lkml/[email protected]/
Fabio M. De Francesco (3):
fs/ufs: Use the offset_in_page() helper
fs/ufs: Change the signature of ufs_get_page()
fs/ufs: Replace kmap() with kmap_local_page()
fs/ufs/dir.c | 138 +++++++++++++++++++++++++++------------------------
1 file changed, 72 insertions(+), 66 deletions(-)
--
2.38.1
Change the signature of ufs_get_page() in order to prepare this function
to the conversion to the use of kmap_local_page(). Change also those call
sites which are required to conform its invocations to the new
signature.
Cc: Ira Weiny <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
fs/ufs/dir.c | 59 ++++++++++++++++++++++------------------------------
1 file changed, 25 insertions(+), 34 deletions(-)
diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 69f78583c9c1..d1cbb48e5d52 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -185,21 +185,21 @@ static bool ufs_check_page(struct page *page)
return false;
}
-static struct page *ufs_get_page(struct inode *dir, unsigned long n)
+static void *ufs_get_page(struct inode *dir, unsigned long n, struct page **page)
{
struct address_space *mapping = dir->i_mapping;
- struct page *page = read_mapping_page(mapping, n, NULL);
- if (!IS_ERR(page)) {
- kmap(page);
- if (unlikely(!PageChecked(page))) {
- if (!ufs_check_page(page))
+ *page = read_mapping_page(mapping, n, NULL);
+ if (!IS_ERR(*page)) {
+ kmap(*page);
+ if (unlikely(!PageChecked(*page))) {
+ if (!ufs_check_page(*page))
goto fail;
}
}
- return page;
+ return page_address(*page);
fail:
- ufs_put_page(page);
+ ufs_put_page(*page);
return ERR_PTR(-EIO);
}
@@ -227,15 +227,12 @@ ufs_next_entry(struct super_block *sb, struct ufs_dir_entry *p)
struct ufs_dir_entry *ufs_dotdot(struct inode *dir, struct page **p)
{
- struct page *page = ufs_get_page(dir, 0);
- struct ufs_dir_entry *de = NULL;
+ struct ufs_dir_entry *de = ufs_get_page(dir, 0, p);
- if (!IS_ERR(page)) {
- de = ufs_next_entry(dir->i_sb,
- (struct ufs_dir_entry *)page_address(page));
- *p = page;
- }
- return de;
+ if (!IS_ERR(de))
+ return ufs_next_entry(dir->i_sb, de);
+ else
+ return NULL;
}
/*
@@ -273,11 +270,10 @@ struct ufs_dir_entry *ufs_find_entry(struct inode *dir, const struct qstr *qstr,
start = 0;
n = start;
do {
- char *kaddr;
- page = ufs_get_page(dir, n);
- if (!IS_ERR(page)) {
- kaddr = page_address(page);
- de = (struct ufs_dir_entry *) kaddr;
+ char *kaddr = ufs_get_page(dir, n, &page);
+
+ if (!IS_ERR(kaddr)) {
+ de = (struct ufs_dir_entry *)kaddr;
kaddr += ufs_last_byte(dir, n) - reclen;
while ((char *) de <= kaddr) {
if (ufs_match(sb, namelen, name, de))
@@ -328,12 +324,10 @@ int ufs_add_link(struct dentry *dentry, struct inode *inode)
for (n = 0; n <= npages; n++) {
char *dir_end;
- page = ufs_get_page(dir, n);
- err = PTR_ERR(page);
- if (IS_ERR(page))
- goto out;
+ kaddr = ufs_get_page(dir, n, &page);
+ if (IS_ERR(kaddr))
+ return PTR_ERR(kaddr);
lock_page(page);
- kaddr = page_address(page);
dir_end = kaddr + ufs_last_byte(dir, n);
de = (struct ufs_dir_entry *)kaddr;
kaddr += PAGE_SIZE - reclen;
@@ -395,7 +389,6 @@ int ufs_add_link(struct dentry *dentry, struct inode *inode)
/* OFFSET_CACHE */
out_put:
ufs_put_page(page);
-out:
return err;
out_unlock:
unlock_page(page);
@@ -429,6 +422,7 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
unsigned chunk_mask = ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
bool need_revalidate = !inode_eq_iversion(inode, file->f_version);
unsigned flags = UFS_SB(sb)->s_flags;
+ struct page *page;
UFSD("BEGIN\n");
@@ -439,16 +433,14 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
char *kaddr, *limit;
struct ufs_dir_entry *de;
- struct page *page = ufs_get_page(inode, n);
-
- if (IS_ERR(page)) {
+ kaddr = ufs_get_page(inode, n, &page);
+ if (IS_ERR(kaddr)) {
ufs_error(sb, __func__,
"bad page in #%lu",
inode->i_ino);
ctx->pos += PAGE_SIZE - offset;
return -EIO;
}
- kaddr = page_address(page);
if (unlikely(need_revalidate)) {
if (offset) {
offset = ufs_validate_entry(sb, kaddr, offset, chunk_mask);
@@ -595,12 +587,11 @@ int ufs_empty_dir(struct inode * inode)
for (i = 0; i < npages; i++) {
char *kaddr;
struct ufs_dir_entry *de;
- page = ufs_get_page(inode, i);
- if (IS_ERR(page))
+ kaddr = ufs_get_page(inode, i, &page);
+ if (IS_ERR(kaddr))
continue;
- kaddr = page_address(page);
de = (struct ufs_dir_entry *)kaddr;
kaddr += ufs_last_byte(inode, i) - UFS_DIR_REC_LEN(1);
--
2.38.1
On Tue, Dec 13, 2022 at 12:19:05AM +0100, Fabio M. De Francesco wrote:
> +static void *ufs_get_page(struct inode *dir, unsigned long n, struct page **page)
> {
> struct address_space *mapping = dir->i_mapping;
> - struct page *page = read_mapping_page(mapping, n, NULL);
> - if (!IS_ERR(page)) {
> - kmap(page);
> - if (unlikely(!PageChecked(page))) {
> - if (!ufs_check_page(page))
> + *page = read_mapping_page(mapping, n, NULL);
> + if (!IS_ERR(*page)) {
> + kmap(*page);
> + if (unlikely(!PageChecked(*page))) {
> + if (!ufs_check_page(*page))
> goto fail;
> }
> }
> - return page;
> + return page_address(*page);
Er... You really don't want to do that when you've got ERR_PTR()
from read_mapping_page().
>
> fail:
> - ufs_put_page(page);
> + ufs_put_page(*page);
> return ERR_PTR(-EIO);
> }
IDGI...
static void *ufs_get_page(struct inode *dir, unsigned long n, struct page **p)
{
struct address_space *mapping = dir->i_mapping;
struct page *page = read_mapping_page(mapping, n, NULL);
if (!IS_ERR(page)) {
kmap(page);
if (unlikely(!PageChecked(page))) {
if (!ufs_check_page(page))
goto fail;
}
*p = page;
return page_address(page);
}
return ERR_CAST(page);
fail:
ufs_put_page(page);
return ERR_PTR(-EIO);
}
all there is to it... The only things you need to change are
1) type of function
2) make sure to store the page into that pointer to pointer to page on success
3) return page_address(page) instead of page on success
4) use ERR_CAST() to convert ERR_PTR() that is struct page * into equal
ERR_PTR() that is void * (the last one is optional, just makes the intent
more clear).
On marted? 13 dicembre 2022 08:10:58 CET Al Viro wrote:
> On Tue, Dec 13, 2022 at 12:19:05AM +0100, Fabio M. De Francesco wrote:
> > +static void *ufs_get_page(struct inode *dir, unsigned long n, struct page
> > **page)>
> > {
> >
> > struct address_space *mapping = dir->i_mapping;
> >
> > - struct page *page = read_mapping_page(mapping, n, NULL);
> > - if (!IS_ERR(page)) {
> > - kmap(page);
> > - if (unlikely(!PageChecked(page))) {
> > - if (!ufs_check_page(page))
> > + *page = read_mapping_page(mapping, n, NULL);
> > + if (!IS_ERR(*page)) {
> > + kmap(*page);
> > + if (unlikely(!PageChecked(*page))) {
> > + if (!ufs_check_page(*page))
> >
> > goto fail;
> >
> > }
> >
> > }
> >
> > - return page;
> > + return page_address(*page);
>
> Er... You really don't want to do that when you've got ERR_PTR()
> from read_mapping_page().
>
Sure :-)
Obviously, I'll fix it ASAP.
But I can't yet understand why that pointer was returned unconditionally in
the code which I'm changing with this patch (i.e., regardless of any potential
errors from read_mapping_page()) :-/
>
> > fail:
> > - ufs_put_page(page);
> > + ufs_put_page(*page);
> >
> > return ERR_PTR(-EIO);
> >
> > }
>
> IDGI...
>
> static void *ufs_get_page(struct inode *dir, unsigned long n, struct page
**p)
> {
> struct address_space *mapping = dir->i_mapping;
> struct page *page = read_mapping_page(mapping, n, NULL);
>
> if (!IS_ERR(page)) {
> kmap(page);
> if (unlikely(!PageChecked(page))) {
> if (!ufs_check_page(page))
> goto fail;
> }
> *p = page;
> return page_address(page);
> }
> return ERR_CAST(page);
>
> fail:
> ufs_put_page(page);
> return ERR_PTR(-EIO);
> }
>
> all there is to it... The only things you need to change are
> 1) type of function
> 2) make sure to store the page into that pointer to pointer to page on
success
> 3) return page_address(page) instead of page on success
> 4) use ERR_CAST() to convert ERR_PTR() that is struct page * into equal
> ERR_PTR() that is void * (the last one is optional, just makes the intent
> more clear).
I've never seen anything like this: you are spoon feeding me :-)
Please don't misunderstand: I'm OK with it and, above all, I'm surely
appreciating how much patience and time you are devolving to help me.
I'll send v3 ASAP (for sure within the next 24 hours).
Furthermore, if there are no other issues, I'd start ASAP with the
decomposition of the patch to fs/sysv into a series of three units and rework
the whole design in accordance to the suggestions you have been kindly
providing.
I'm sorry for responding and reacting so slowly. Aside from being slow because
of a fundamental lack of knowledge and experience, I can do Kernel development
(including studying new subjects and code, doing patches, doing reviews, and
the likes) about 7 hours a week. This is all the time I can set aside until I
quit one of my jobs at the end of January.
Again thanks,
Fabio
P.S.: While at this patch I'd like to gently ping you about another conversion
that I sent (and resent) months ago:
[RESEND PATCH] fs/aio: Replace kmap{,_atomic}() with kmap_local_page() at:
https://lore.kernel.org/lkml/[email protected]/
This patch to fs/aio.c has already received two "Reviewed-by" tags: the first
from Ira Weiny and the second from Jeff Moyer (you won't see the second there,
because I forgot to forward it when I sent again :-( ).
The tag from Jeff is in the following message (from another [RESEND PATCH]
thread):
https://lore.kernel.org/lkml/[email protected]/
In that same thread, I explained better I meant in the last sentence of the
commit message, because Jeff commented that it is ambiguous (I'm adding him in
the Cc list of recipients).
The original patch, submitted on Thu, 7 Jul 2022 01:33:28 +0200 is at:
https://lore.kernel.org/lkml/[email protected]/
I'd appreciate very much if you can spend some time on that patch too :)
On Tue, Dec 13, 2022 at 08:08:22PM +0100, Fabio M. De Francesco wrote:
> But I can't yet understand why that pointer was returned unconditionally in
> the code which I'm changing with this patch (i.e., regardless of any potential
> errors from read_mapping_page()) :-/
Because on error from read_mapping_page() you get page or ERR_PTR(-E...),
which is what you want the sucker to return on error.
Here you want to return page_address(page) on success or ERR_PTR(-E...) on
error.
For original calling conventions return page; worked both on success and
on error. With the new ones it's no longer true - giving page_address() an
error value (address in range (unsigned long)(-4095) to (unsigned long)(-1))
is *not* going to return the same error value.
It's even more obvious with your third patch - on read_mapping_page() failure
you want to return the same bit pattern it gave you, on its success you want
to pass the pointer it gave you to kmap_local_page() and return the address
you get from kmap_local_page(), right? Check what your patch ends up doing -
you store result of kmap_local_page() in kaddr, then return it. Including
the case when you have *not* called kmap_local_page() at all...
Again, warnings are occasionally useful...
Brain dump on ERR_PTR() and related things: kernel makes sure that the top 4K
of possible addresses are never used for objects of any type (that's
0xfffff000--0xffffffff on 32bit, 0xfffffffffffff000--0xffffffffffffffff
on 64bit). That gives us 4095 values that are never going to clash with
any valid pointer values. One way of thinking about those is "split NULL" -
instead of one value that is guaranteed to be distinct from address of
any object we have a small bunch of those and we can use the _choice_ of
such value to carry information. Similar to how NaN are used for real
numbers, actually.
If function returns a pointer on success and errno value on failure,
it can be declared to return a pointer type, using ERR_PTR(-22) to represent
"error #22", etc. These values can be easily recognized; IS_ERR(p) is
true for those and only for those. PTR_ERR() converts those to numbers -
PTR_ERR(ERR_PTR(-22)) is -22, etc.
On non-error values PTR_ERR() yields garbage; it won't break (or do any
calculations, actually - just cast to signed long), but the value is
going to be useless for anything.
IS_ERR() is annotated so that compiler knows that it's unlikely to be
true; i.e. if (IS_ERR(...)) {do something} won't have the body cluttering
the hot path.
You can compare them for (in)equality just fine -
if (p == ERR_PTR(-ENOMEM))
is fine; no need to bother with the things like
if (PTR_ERR(p) == -ENOMEM)
You obviously can't dereference them and (slightly less obviously) can't do
ordered comparisons. You definitely can't do pointer arithmetics on those.
These values are used with all pointer types; something like
struct foo *f()
{
struct foo *p;
char *s;
p = kmalloc(sizeof(struct foo), GFP_KERNEL);
if (!p)
return ERR_PTR(-ENOMEM);
s = bar();
// if bar() has failed, return the error it gave you
if (IS_ERR(s)) {
kfree(p);
return (void *)s; // UNIDIOMATIC
}
....
return p;
}
is valid, but it's better spelled as ERR_CAST(s) instead of (void *)s.
Note that these values can also be used as arguments - it's less common
that use as return values, but there are situations when it makes sense.
Not the case here, though...
See include/linux/err.h; one warning - use of IS_ERR_OR_NULL() is very
often a mistake. Mixing NULL and ERR_PTR() in calling conventions is
a good way to end up with massive confusion down the road; sometimes
it's warranted, but such cases are rare and generally you want to ask
yourself "do I really want to go there?"
> P.S.: While at this patch I'd like to gently ping you about another conversion
> that I sent (and resent) months ago:
>
> [RESEND PATCH] fs/aio: Replace kmap{,_atomic}() with kmap_local_page() at:
> https://lore.kernel.org/lkml/[email protected]/
>
> This patch to fs/aio.c has already received two "Reviewed-by" tags: the first
> from Ira Weiny and the second from Jeff Moyer (you won't see the second there,
> because I forgot to forward it when I sent again :-( ).
>
> The tag from Jeff is in the following message (from another [RESEND PATCH]
> thread):
> https://lore.kernel.org/lkml/[email protected]/
>
> In that same thread, I explained better I meant in the last sentence of the
> commit message, because Jeff commented that it is ambiguous (I'm adding him in
> the Cc list of recipients).
>
> The original patch, submitted on Thu, 7 Jul 2022 01:33:28 +0200 is at:
> https://lore.kernel.org/lkml/[email protected]/
>
> I'd appreciate very much if you can spend some time on that patch too :)
Will check...