From: Eric Biggers <[email protected]>
Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
->d_parent and ->d_inode can be concurrently modified, and in
particular, ->d_inode may be changed to NULL. For ext4_d_hash() this
resulted in a reproducible NULL dereference if a lookup is done in a
directory being deleted, e.g. with:
int main()
{
if (fork()) {
for (;;) {
mkdir("subdir", 0700);
rmdir("subdir");
}
} else {
for (;;)
access("subdir/file", 0);
}
}
... or by running the 't_encrypted_d_revalidate' program from xfstests.
Both repros work in any directory on a filesystem with the encoding
feature, even if the directory doesn't actually have the casefold flag.
I couldn't reproduce a crash in ext4_d_compare(), but it appears that a
similar crash is possible there.
Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
falling back to the case sensitive behavior if the inode is NULL.
Reported-by: Al Viro <[email protected]>
Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
Cc: <[email protected]> # v5.2+
Signed-off-by: Eric Biggers <[email protected]>
---
fs/ext4/dir.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 8964778aabefb..0129d14629881 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
const char *str, const struct qstr *name)
{
struct qstr qstr = {.name = str, .len = len };
- struct inode *inode = dentry->d_parent->d_inode;
+ const struct dentry *parent = READ_ONCE(dentry->d_parent);
+ const struct inode *inode = READ_ONCE(parent->d_inode);
- if (!IS_CASEFOLDED(inode) || !EXT4_SB(inode->i_sb)->s_encoding) {
+ if (!inode || !IS_CASEFOLDED(inode) ||
+ !EXT4_SB(inode->i_sb)->s_encoding) {
if (len != name->len)
return -1;
return memcmp(str, name->name, len);
@@ -686,10 +688,11 @@ static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
{
const struct ext4_sb_info *sbi = EXT4_SB(dentry->d_sb);
const struct unicode_map *um = sbi->s_encoding;
+ const struct inode *inode = READ_ONCE(dentry->d_inode);
unsigned char *norm;
int len, ret = 0;
- if (!IS_CASEFOLDED(dentry->d_inode) || !um)
+ if (!inode || !IS_CASEFOLDED(inode) || !um)
return 0;
norm = kmalloc(PATH_MAX, GFP_ATOMIC);
--
2.25.0
Hi Eric,
On Thu, Jan 23, 2020 at 08:12:34PM -0800, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
> ->d_parent and ->d_inode can be concurrently modified, and in
> particular, ->d_inode may be changed to NULL. For ext4_d_hash() this
> resulted in a reproducible NULL dereference if a lookup is done in a
> directory being deleted, e.g. with:
>
> int main()
> {
> if (fork()) {
> for (;;) {
> mkdir("subdir", 0700);
> rmdir("subdir");
> }
> } else {
> for (;;)
> access("subdir/file", 0);
> }
> }
>
> ... or by running the 't_encrypted_d_revalidate' program from xfstests.
> Both repros work in any directory on a filesystem with the encoding
> feature, even if the directory doesn't actually have the casefold flag.
>
> I couldn't reproduce a crash in ext4_d_compare(), but it appears that a
> similar crash is possible there.
>
> Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
> falling back to the case sensitive behavior if the inode is NULL.
>
> Reported-by: Al Viro <[email protected]>
> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
> Cc: <[email protected]> # v5.2+
> Signed-off-by: Eric Biggers <[email protected]>
> ---
> fs/ext4/dir.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 8964778aabefb..0129d14629881 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> const char *str, const struct qstr *name)
> {
> struct qstr qstr = {.name = str, .len = len };
> - struct inode *inode = dentry->d_parent->d_inode;
> + const struct dentry *parent = READ_ONCE(dentry->d_parent);
I'm not sure if we really need READ_ONCE d_parent here (p.s. d_parent
won't be NULL anyway), and d_seq will guard all its validity. If I'm
wrong, correct me kindly...
Otherwise, it looks good to me...
Reviewed-by: Gao Xiang <[email protected]>
Thanks,
Gao Xiang
> + const struct inode *inode = READ_ONCE(parent->d_inode);
>
> - if (!IS_CASEFOLDED(inode) || !EXT4_SB(inode->i_sb)->s_encoding) {
> + if (!inode || !IS_CASEFOLDED(inode) ||
> + !EXT4_SB(inode->i_sb)->s_encoding) {
> if (len != name->len)
> return -1;
> return memcmp(str, name->name, len);
> @@ -686,10 +688,11 @@ static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
> {
> const struct ext4_sb_info *sbi = EXT4_SB(dentry->d_sb);
> const struct unicode_map *um = sbi->s_encoding;
> + const struct inode *inode = READ_ONCE(dentry->d_inode);
> unsigned char *norm;
> int len, ret = 0;
>
> - if (!IS_CASEFOLDED(dentry->d_inode) || !um)
> + if (!inode || !IS_CASEFOLDED(inode) || !um)
> return 0;
>
> norm = kmalloc(PATH_MAX, GFP_ATOMIC);
> --
> 2.25.0
>
On Fri, Jan 24, 2020 at 01:04:25PM +0800, Gao Xiang wrote:
> > diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> > index 8964778aabefb..0129d14629881 100644
> > --- a/fs/ext4/dir.c
> > +++ b/fs/ext4/dir.c
> > @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> > const char *str, const struct qstr *name)
> > {
> > struct qstr qstr = {.name = str, .len = len };
> > - struct inode *inode = dentry->d_parent->d_inode;
> > + const struct dentry *parent = READ_ONCE(dentry->d_parent);
>
> I'm not sure if we really need READ_ONCE d_parent here (p.s. d_parent
> won't be NULL anyway), and d_seq will guard all its validity. If I'm
> wrong, correct me kindly...
>
> Otherwise, it looks good to me...
> Reviewed-by: Gao Xiang <[email protected]>
>
While d_parent can't be set to NULL, it can still be changed concurrently.
So we need READ_ONCE() to ensure that a consistent value is used.
- Eric
On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> On Fri, Jan 24, 2020 at 01:04:25PM +0800, Gao Xiang wrote:
> > > diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> > > index 8964778aabefb..0129d14629881 100644
> > > --- a/fs/ext4/dir.c
> > > +++ b/fs/ext4/dir.c
> > > @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> > > const char *str, const struct qstr *name)
> > > {
> > > struct qstr qstr = {.name = str, .len = len };
> > > - struct inode *inode = dentry->d_parent->d_inode;
> > > + const struct dentry *parent = READ_ONCE(dentry->d_parent);
> >
> > I'm not sure if we really need READ_ONCE d_parent here (p.s. d_parent
> > won't be NULL anyway), and d_seq will guard all its validity. If I'm
> > wrong, correct me kindly...
> >
> > Otherwise, it looks good to me...
> > Reviewed-by: Gao Xiang <[email protected]>
> >
>
> While d_parent can't be set to NULL, it can still be changed concurrently.
> So we need READ_ONCE() to ensure that a consistent value is used.
If I understand correctly, unlazy RCU->ref-walk will be guarded by
seqlock, and for ref-walk we have d_lock (and even parent lock)
in relative paths. So I prematurely think no race of renaming or
unlinking evenually.
I'm curious about that if experts could correct me about this.
Thanks,
Gao Xiang
>
> - Eric
On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
[]
> So we need READ_ONCE() to ensure that a consistent value is used.
By the way, my understanding is all pointer could be accessed
atomicly guaranteed by compiler. In my opinion, we generally
use READ_ONCE() on pointers for other uses (such as, avoid
accessing a variable twice due to compiler optimization and
it will break some logic potentially or need some data
dependency barrier...)
Thanks,
Gao Xiang
On Fri, Jan 24, 2020 at 01:27:50PM +0800, Gao Xiang wrote:
> On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> > On Fri, Jan 24, 2020 at 01:04:25PM +0800, Gao Xiang wrote:
> > > > diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> > > > index 8964778aabefb..0129d14629881 100644
> > > > --- a/fs/ext4/dir.c
> > > > +++ b/fs/ext4/dir.c
> > > > @@ -671,9 +671,11 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
> > > > const char *str, const struct qstr *name)
> > > > {
> > > > struct qstr qstr = {.name = str, .len = len };
> > > > - struct inode *inode = dentry->d_parent->d_inode;
> > > > + const struct dentry *parent = READ_ONCE(dentry->d_parent);
> > >
> > > I'm not sure if we really need READ_ONCE d_parent here (p.s. d_parent
> > > won't be NULL anyway), and d_seq will guard all its validity. If I'm
> > > wrong, correct me kindly...
> > >
> > > Otherwise, it looks good to me...
> > > Reviewed-by: Gao Xiang <[email protected]>
> > >
> >
> > While d_parent can't be set to NULL, it can still be changed concurrently.
> > So we need READ_ONCE() to ensure that a consistent value is used.
>
> If I understand correctly, unlazy RCU->ref-walk will be guarded by
> seqlock, and for ref-walk we have d_lock (and even parent lock)
> in relative paths. So I prematurely think no race of renaming or
> unlinking evenually.
>
> I'm curious about that if experts could correct me about this.
>
Taking a seqlock for read doesn't prevent the protected data from changing.
It just allows the reader to detect that it changed.
So we still need to handle the dentry fields being changed concurrently here,
even if it will be detected by a read_seqcount_retry() later.
- Eric
On Fri, Jan 24, 2020 at 01:34:23PM +0800, Gao Xiang wrote:
> On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
>
> []
>
> > So we need READ_ONCE() to ensure that a consistent value is used.
>
> By the way, my understanding is all pointer could be accessed
> atomicly guaranteed by compiler. In my opinion, we generally
> use READ_ONCE() on pointers for other uses (such as, avoid
> accessing a variable twice due to compiler optimization and
> it will break some logic potentially or need some data
> dependency barrier...)
>
> Thanks,
> Gao Xiang
But that *is* why we need READ_ONCE() here. Without it, there's no guarantee
that the compiler doesn't load the variable twice. Please read:
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
- Eric
On Thu, Jan 23, 2020 at 09:42:56PM -0800, Eric Biggers wrote:
> On Fri, Jan 24, 2020 at 01:34:23PM +0800, Gao Xiang wrote:
> > On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> >
> > []
> >
> > > So we need READ_ONCE() to ensure that a consistent value is used.
> >
> > By the way, my understanding is all pointer could be accessed
> > atomicly guaranteed by compiler. In my opinion, we generally
> > use READ_ONCE() on pointers for other uses (such as, avoid
> > accessing a variable twice due to compiler optimization and
> > it will break some logic potentially or need some data
> > dependency barrier...)
> >
> > Thanks,
> > Gao Xiang
>
> But that *is* why we need READ_ONCE() here. Without it, there's no guarantee
> that the compiler doesn't load the variable twice. Please read:
> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
After scanning the patch, it seems the parent variable (dentry->d_parent)
only referenced once as below:
- struct inode *inode = dentry->d_parent->d_inode;
+ const struct dentry *parent = READ_ONCE(dentry->d_parent);
+ const struct inode *inode = READ_ONCE(parent->d_inode);
So I think it is enough as
const struct inode *inode = READ_ONCE(dentry->d_parent->d_inode);
to access parent inode once to avoid parent inode being accessed
for more time (and all pointers dereference should be in atomic
by compilers) as one reason on
if (!inode || !IS_CASEFOLDED(inode) || ...
or etc.
Thanks for your web reference, I will look into it. I think there
is no worry about dentry->d_parent here because of this only one
dereference on dentry->d_parent.
You could ignore my words anyway, just my little thought though.
Other part of the patch is ok.
Thanks,
Gao Xiang
>
> - Eric
On Fri, Jan 24, 2020 at 02:15:31PM +0800, Gao Xiang wrote:
> On Thu, Jan 23, 2020 at 09:42:56PM -0800, Eric Biggers wrote:
> > On Fri, Jan 24, 2020 at 01:34:23PM +0800, Gao Xiang wrote:
> > > On Thu, Jan 23, 2020 at 09:16:01PM -0800, Eric Biggers wrote:
> > >
> > > []
> > >
> > > > So we need READ_ONCE() to ensure that a consistent value is used.
> > >
> > > By the way, my understanding is all pointer could be accessed
> > > atomicly guaranteed by compiler. In my opinion, we generally
> > > use READ_ONCE() on pointers for other uses (such as, avoid
> > > accessing a variable twice due to compiler optimization and
> > > it will break some logic potentially or need some data
> > > dependency barrier...)
> > >
> > > Thanks,
> > > Gao Xiang
> >
> > But that *is* why we need READ_ONCE() here. Without it, there's no guarantee
> > that the compiler doesn't load the variable twice. Please read:
> > https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
>
> After scanning the patch, it seems the parent variable (dentry->d_parent)
> only referenced once as below:
>
> - struct inode *inode = dentry->d_parent->d_inode;
> + const struct dentry *parent = READ_ONCE(dentry->d_parent);
> + const struct inode *inode = READ_ONCE(parent->d_inode);
>
> So I think it is enough as
>
> const struct inode *inode = READ_ONCE(dentry->d_parent->d_inode);
>
> to access parent inode once to avoid parent inode being accessed
> for more time (and all pointers dereference should be in atomic
> by compilers) as one reason on
>
> if (!inode || !IS_CASEFOLDED(inode) || ...
>
> or etc.
>
> Thanks for your web reference, I will look into it. I think there
> is no worry about dentry->d_parent here because of this only one
> dereference on dentry->d_parent.
>
> You could ignore my words anyway, just my little thought though.
> Other part of the patch is ok.
>
While that does make it really unlikely to cause a real-world problem, it's
still undefined behavior to not properly annotate a data race, it would make the
code harder to understand as there would be no indication that there's a data
race, and it would confuse tools that try to automatically detect data races.
So let's keep the READ_ONCE() on d_parent.
- Eric
On Fri, Jan 24, 2020 at 10:12:54AM -0800, Eric Biggers wrote:
> > Thanks for your web reference, I will look into it. I think there
> > is no worry about dentry->d_parent here because of this only one
> > dereference on dentry->d_parent.
> >
> > You could ignore my words anyway, just my little thought though.
> > Other part of the patch is ok.
> >
>
> While that does make it really unlikely to cause a real-world problem, it's
> still undefined behavior to not properly annotate a data race, it would make the
> code harder to understand as there would be no indication that there's a data
> race, and it would confuse tools that try to automatically detect data races.
> So let's keep the READ_ONCE() on d_parent.
*nod*
Note that on everything except alpha it'll generate the same code, unless
the compiler screws up an generates multiple loads. On alpha it adds
a barrier and I really don't want to sit down and check if its absense
could lead to anything unpleasant.
On Thu, Jan 23, 2020 at 08:12:34PM -0800, Eric Biggers wrote:
> From: Eric Biggers <[email protected]>
>
> Since ->d_compare() and ->d_hash() can be called in RCU-walk mode,
> ->d_parent and ->d_inode can be concurrently modified, and in
> particular, ->d_inode may be changed to NULL. For ext4_d_hash() this
> resulted in a reproducible NULL dereference if a lookup is done in a
> directory being deleted, e.g. with:
>
> int main()
> {
> if (fork()) {
> for (;;) {
> mkdir("subdir", 0700);
> rmdir("subdir");
> }
> } else {
> for (;;)
> access("subdir/file", 0);
> }
> }
>
> ... or by running the 't_encrypted_d_revalidate' program from xfstests.
> Both repros work in any directory on a filesystem with the encoding
> feature, even if the directory doesn't actually have the casefold flag.
>
> I couldn't reproduce a crash in ext4_d_compare(), but it appears that a
> similar crash is possible there.
>
> Fix these bugs by reading ->d_parent and ->d_inode using READ_ONCE() and
> falling back to the case sensitive behavior if the inode is NULL.
>
> Reported-by: Al Viro <[email protected]>
> Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
> Cc: <[email protected]> # v5.2+
> Signed-off-by: Eric Biggers <[email protected]>
Thanks, applied.
- Ted