From: Eric Biggers <[email protected]>

If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename.  This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.

Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.

Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
Cc: <[email protected]> # v5.2+
Cc: Al Viro <[email protected]>
Cc: Daniel Rosenberg <[email protected]>
Cc: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/dir.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index c654205f648dd..19aef8328bb18 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -675,6 +675,7 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
 	struct qstr qstr = {.name = str, .len = len };
 	const struct dentry *parent = READ_ONCE(dentry->d_parent);
 	const struct inode *inode = READ_ONCE(parent->d_inode);
+	char strbuf[DNAME_INLINE_LEN];

 	if (!inode || !IS_CASEFOLDED(inode) ||
 	    !EXT4_SB(inode->i_sb)->s_encoding) {
@@ -683,6 +684,22 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
 		return memcmp(str, name->name, len);
 	}

+	/*
+	 * If the dentry name is stored in-line, then it may be concurrently
+	 * modified by a rename.  If this happens, the VFS will eventually retry
+	 * the lookup, so it doesn't matter what ->d_compare() returns.
+	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
+	 * string.  Therefore, we have to copy the name into a temporary buffer.
+	 */
+	if (len <= DNAME_INLINE_LEN - 1) {
+		unsigned int i;
+
+		for (i = 0; i < len; i++)
+			strbuf[i] = READ_ONCE(str[i]);
+		strbuf[len] = 0;
+		qstr.name = strbuf;
+	}
+
 	return ext4_ci_compare(inode, name, &qstr, false);
 }
--
2.26.2
Eric Biggers <[email protected]> writes:
> From: Eric Biggers <[email protected]>
>
> If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
> it may be concurrently modified by a rename. This can cause undefined
> behavior (possibly out-of-bounds memory accesses or crashes) in
> utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
> that may be concurrently modified.
>
> Fix this by first copying the filename to a stack buffer if needed.
> This way we get a stable snapshot of the filename.
>
> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
> Cc: <[email protected]> # v5.2+
> Cc: Al Viro <[email protected]>
> Cc: Daniel Rosenberg <[email protected]>
> Cc: Gabriel Krisman Bertazi <[email protected]>
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/ext4/dir.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index c654205f648dd..19aef8328bb18 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -675,6 +675,7 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> struct qstr qstr = {.name = str, .len = len };
> const struct dentry *parent = READ_ONCE(dentry->d_parent);
> const struct inode *inode = READ_ONCE(parent->d_inode);
> + char strbuf[DNAME_INLINE_LEN];
>
> if (!inode || !IS_CASEFOLDED(inode) ||
> !EXT4_SB(inode->i_sb)->s_encoding) {
> @@ -683,6 +684,22 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> return memcmp(str, name->name, len);
> }
>
> + /*
> + * If the dentry name is stored in-line, then it may be concurrently
> + * modified by a rename. If this happens, the VFS will eventually retry
> + * the lookup, so it doesn't matter what ->d_compare() returns.
> + * However, it's unsafe to call utf8_strncasecmp() with an unstable
> + * string. Therefore, we have to copy the name into a temporary buffer.
> + */
> + if (len <= DNAME_INLINE_LEN - 1) {
> + unsigned int i;
> +
> + for (i = 0; i < len; i++)
> + strbuf[i] = READ_ONCE(str[i]);
> + strbuf[len] = 0;
> + qstr.name = strbuf;
> + }
> +

Could we avoid this if the casefolded version were cached in the dentry?
Then we could use utf8_strncasecmp_folded which would be safe.  Would
this be acceptable for vfs?

> return ext4_ci_compare(inode, name, &qstr, false);
> }
--
Gabriel Krisman Bertazi
On Sat, May 30, 2020 at 02:17:02AM -0400, Gabriel Krisman Bertazi wrote:
> > + /*
> > + * If the dentry name is stored in-line, then it may be concurrently
> > + * modified by a rename. If this happens, the VFS will eventually retry
> > + * the lookup, so it doesn't matter what ->d_compare() returns.
> > + * However, it's unsafe to call utf8_strncasecmp() with an unstable
> > + * string. Therefore, we have to copy the name into a temporary buffer.
> > + */
> > + if (len <= DNAME_INLINE_LEN - 1) {
> > + unsigned int i;
> > +
> > + for (i = 0; i < len; i++)
> > + strbuf[i] = READ_ONCE(str[i]);
> > + strbuf[len] = 0;
> > + qstr.name = strbuf;
> > + }
> > +
>
> Could we avoid this if the casefolded version were cached in the dentry?
> Then we could use utf8_strncasecmp_folded which would be safe. Would
> this be acceptable for vfs?

The VFS assumes that each dentry has one name, the one in d_name.  That's what
it passes to ->d_compare(), and that's what it updates in __d_move().

So while ext4 and f2fs could put the casefolded name in ->d_fsdata,
->d_compare() wouldn't actually have access to it (unless we added d_fsdata as a
parameter to ->d_compare()).  Also, the casefolded name would get outdated when
__d_move() changes d_name.

We could instead make d_name always be the casefolded name.  I'm not sure that
would be possible, though.  For one, I don't think ->lookup() is allowed to just
change the dentry name.  It would also make getcwd(), /proc/*/fd/, etc. always
show casefolded names, which could be problematic.  And there are probably other
issues I can't think of off the top of my head.

- Eric
On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> + if (len <= DNAME_INLINE_LEN - 1) {
> + unsigned int i;
> +
> + for (i = 0; i < len; i++)
> + strbuf[i] = READ_ONCE(str[i]);
> + strbuf[len] = 0;

This READ_ONCE is going to force the compiler to use byte accesses.
What's wrong with using a plain memcpy()?

> + qstr.name = strbuf;
> + }
> +
> return ext4_ci_compare(inode, name, &qstr, false);
> }
>
> --
> 2.26.2
>
On Sat, May 30, 2020 at 10:18:14AM -0700, Matthew Wilcox wrote:
> On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> > + if (len <= DNAME_INLINE_LEN - 1) {
> > + unsigned int i;
> > +
> > + for (i = 0; i < len; i++)
> > + strbuf[i] = READ_ONCE(str[i]);
> > + strbuf[len] = 0;
>
> This READ_ONCE is going to force the compiler to use byte accesses.
> What's wrong with using a plain memcpy()?
>
It's undefined behavior when the source can be concurrently modified.

Compilers can assume that it's not, and remove the memcpy() (instead just using
the source data directly) if they can prove that the destination array is never
modified again before it goes out of scope.

Do you have any suggestions that don't involve undefined behavior?

- Eric
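
For illustration, the byte-wise snapshot from the patch can be reproduced
outside the kernel.  This is only a minimal userspace sketch under stated
assumptions: read_once_char() approximates READ_ONCE() with a volatile load,
and SNAP_INLINE_LEN and snapshot_name() are hypothetical names standing in for
DNAME_INLINE_LEN and the open-coded loop in ext4_d_compare().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DNAME_INLINE_LEN; the real value is set by the kernel. */
#define SNAP_INLINE_LEN 32

/* Userspace approximation of READ_ONCE() for one byte: a volatile load. */
static inline char read_once_char(const char *p)
{
	return *(const volatile char *)p;
}

/*
 * Copy a possibly-unstable name into dst one byte at a time and
 * NUL-terminate it.  Returns dst when the name was short enough to be
 * copied, otherwise returns src unchanged (as the patch does for names
 * that do not fit in the inline buffer).
 */
static const char *snapshot_name(char dst[SNAP_INLINE_LEN],
				 const char *src, size_t len)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	if (len > SNAP_INLINE_LEN - 1)
		return src;

	for (i = 0; i < len; i++)
		dst[i] = read_once_char(&src[i]);
	dst[len] = '\0';
	return dst;
}
```

Each byte is loaded exactly once through a volatile access, so the compiler
cannot replace later reads of dst with fresh reads of src.  The cost, as noted
above, is that the copy is done with byte accesses.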
On Sat, May 30, 2020 at 10:35:47AM -0700, Eric Biggers wrote:
> On Sat, May 30, 2020 at 10:18:14AM -0700, Matthew Wilcox wrote:
> > On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> > > + if (len <= DNAME_INLINE_LEN - 1) {
> > > + unsigned int i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + strbuf[i] = READ_ONCE(str[i]);
> > > + strbuf[len] = 0;
> >
> > This READ_ONCE is going to force the compiler to use byte accesses.
> > What's wrong with using a plain memcpy()?
> >
>
> It's undefined behavior when the source can be concurrently modified.
>
> Compilers can assume that it's not, and remove the memcpy() (instead just using
> the source data directly) if they can prove that the destination array is never
> modified again before it goes out of scope.
>
> Do you have any suggestions that don't involve undefined behavior?
Even memcpy(strbuf, (volatile void *)str, len)? It's been a while since I've
looked at these parts of C99...
On Sat, May 30, 2020 at 10:35:47AM -0700, Eric Biggers wrote:
> On Sat, May 30, 2020 at 10:18:14AM -0700, Matthew Wilcox wrote:
> > On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> > > + if (len <= DNAME_INLINE_LEN - 1) {
> > > + unsigned int i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + strbuf[i] = READ_ONCE(str[i]);
> > > + strbuf[len] = 0;
> >
> > This READ_ONCE is going to force the compiler to use byte accesses.
> > What's wrong with using a plain memcpy()?
> >
>
> It's undefined behavior when the source can be concurrently modified.
>
> Compilers can assume that it's not, and remove the memcpy() (instead just using
> the source data directly) if they can prove that the destination array is never
> modified again before it goes out of scope.
>
> Do you have any suggestions that don't involve undefined behavior?

void *memcpy_unsafe(void *dst, volatile void *src, __kernel_size_t);

It can just call memcpy() of course, but the compiler can't reason about
this function because it's not a stdlib function.
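
A minimal userspace sketch of what such a wrapper might look like follows.
It is an assumption, not an existing kernel helper: size_t stands in for
__kernel_size_t, and the noinline attribute (GCC/Clang syntax) is added here
only to keep the call opaque within one translation unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper following the prototype proposed above.
 * 'noinline' is an assumption of this sketch; it is a compiler
 * extension, not a guarantee made by the C standard.
 */
__attribute__((noinline))
void *memcpy_unsafe(void *dst, volatile void *src, size_t n)
{
	assert(n == 0 || (dst != NULL && src != NULL));

	/*
	 * memcpy() takes a non-volatile pointer, so the qualifier has to
	 * be cast away here; the copy itself is an ordinary memcpy().
	 */
	return memcpy(dst, (const void *)src, n);
}
```

Note that the 'volatile' in the prototype does not reach memcpy(), so any
protection in this sketch comes from the call being opaque to the optimizer,
not from the qualifier.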
On Sat, May 30, 2020 at 06:59:07PM +0100, Al Viro wrote:
> On Sat, May 30, 2020 at 10:35:47AM -0700, Eric Biggers wrote:
> > On Sat, May 30, 2020 at 10:18:14AM -0700, Matthew Wilcox wrote:
> > > On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> > > > + if (len <= DNAME_INLINE_LEN - 1) {
> > > > + unsigned int i;
> > > > +
> > > > + for (i = 0; i < len; i++)
> > > > + strbuf[i] = READ_ONCE(str[i]);
> > > > + strbuf[len] = 0;
> > >
> > > This READ_ONCE is going to force the compiler to use byte accesses.
> > > What's wrong with using a plain memcpy()?
> > >
> >
> > It's undefined behavior when the source can be concurrently modified.
> >
> > Compilers can assume that it's not, and remove the memcpy() (instead just using
> > the source data directly) if they can prove that the destination array is never
> > modified again before it goes out of scope.
> >
> > Do you have any suggestions that don't involve undefined behavior?
>
> Even memcpy(strbuf, (volatile void *)str, len)? It's been a while since I've
> looked at these parts of C99...

That doesn't make sense.  memcpy() takes a non-volatile pointer, so the pointer
just gets implicitly converted back to (void *), and you get a compiler warning.

- Eric
On Sat, May 30, 2020 at 01:41:32PM -0700, Matthew Wilcox wrote:
> On Sat, May 30, 2020 at 10:35:47AM -0700, Eric Biggers wrote:
> > On Sat, May 30, 2020 at 10:18:14AM -0700, Matthew Wilcox wrote:
> > > On Fri, May 29, 2020 at 11:02:16PM -0700, Eric Biggers wrote:
> > > > + if (len <= DNAME_INLINE_LEN - 1) {
> > > > + unsigned int i;
> > > > +
> > > > + for (i = 0; i < len; i++)
> > > > + strbuf[i] = READ_ONCE(str[i]);
> > > > + strbuf[len] = 0;
> > >
> > > This READ_ONCE is going to force the compiler to use byte accesses.
> > > What's wrong with using a plain memcpy()?
> > >
> >
> > It's undefined behavior when the source can be concurrently modified.
> >
> > Compilers can assume that it's not, and remove the memcpy() (instead just using
> > the source data directly) if they can prove that the destination array is never
> > modified again before it goes out of scope.
> >
> > Do you have any suggestions that don't involve undefined behavior?
>
> void *memcpy_unsafe(void *dst, volatile void *src, __kernel_size_t);
>
> It can just call memcpy() of course, but the compiler can't reason about
> this function because it's not a stdlib function.

The compiler can still reason about it if it's in the same file, if it's an
inline function, or if link-time-optimization is enabled.  (LTO isn't yet
supported by the mainline kernel, but people have been working on it.)

Also, as I mentioned to Al, it's necessary to cast away 'volatile' to call
memcpy().  So the 'volatile' serves no purpose.

How about using barrier(), which expands to asm("" : : : "memory") to tell the
compiler that memory was clobbered?

	if (len <= DNAME_INLINE_LEN - 1) {
		memcpy(strbuf, str, len);
		strbuf[len] = 0;
		/* prevent compiler from optimizing out the temporary buffer */
		barrier();
	}

I think it's still technically undefined to call memcpy() on concurrently
modified memory at all, but I think the above would be okay in practice...

Using 'noinline' could be another option.

- Eric
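
The barrier() variant can be tried outside the kernel as well.  The following
is a minimal sketch under stated assumptions: the barrier() macro is a
userspace stand-in using GCC/Clang inline-asm syntax, and NAME_INLINE_LEN and
snapshot_with_barrier() are hypothetical names standing in for
DNAME_INLINE_LEN and the code in ext4_d_compare().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for barrier(): empty asm with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for DNAME_INLINE_LEN. */
#define NAME_INLINE_LEN 32

/*
 * Copy str into strbuf with a plain memcpy(), then issue a compiler
 * barrier so the compiler must treat memory as clobbered and cannot
 * assume strbuf and str still hold the same bytes afterwards.
 * Returns strbuf when the name was copied, otherwise str unchanged.
 */
static const char *snapshot_with_barrier(char *strbuf, const char *str,
					 size_t len)
{
	assert(strbuf != NULL && str != NULL);

	if (len > NAME_INLINE_LEN - 1)
		return str;

	memcpy(strbuf, str, len);
	strbuf[len] = '\0';
	/* prevent the compiler from optimizing out the temporary buffer */
	barrier();
	return strbuf;
}
```

This keeps the word-sized copies that memcpy() allows, but it relies on
compiler behavior in practice rather than on anything the C standard
guarantees for concurrently modified memory.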