Hi Ext4 Developers,

I noticed a potential data race on ext_inode_hdr(inode)->eh_depth and
ext_inode_hdr(inode)->eh_max between a creat and an unlink syscall.
The trace follows:

[Setup]
mkdir("foo", 511) = 0;
open("foo", 65536, 511) = 3;
creat("bar", 511) = 4;
symlink("foo", "sym_foo") = 0;
open("sym_foo", 65536, 511) = 5;

[Thread 1]
creat("bar", 438);

__do_sys_creat
ksys_open
do_filp_open
path_openat
do_last
handle_truncate
do_truncate
notify_change
ext4_setattr
ext4_truncate
ext4_ext_truncate
ext4_ext_remove_space
[WRITE, 2 bytes] ext_inode_hdr(inode)->eh_depth = 0;
[WRITE, 2 bytes] ext_inode_hdr(inode)->eh_max
    = cpu_to_le16(ext4_ext_space_root(inode, 0));

[Thread 2]
unlink("sym_foo");

__do_sys_unlink
do_unlinkat
iput
iput_final
evict
ext4_evict_inode
ext4_orphan_del
ext4_mark_iloc_dirty
ext4_do_update_inode
[READ, 4 bytes] raw_inode->i_block[block] = ei->i_data[block];

I observed that the order between the READ and the two WRITEs is not
deterministic. What happens if the READ takes place between the two
WRITEs? Does it cause any damage or violate any invariant?
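
To make the question concrete, the sketch below is a single-threaded,
user-space model (not kernel code) of a 4-byte read landing between the
two 2-byte writes. It assumes the ext4_extent_header field order
(eh_magic, eh_entries, eh_max, eh_depth, eh_generation) overlaid on the
start of i_data[], so eh_max and eh_depth share the slot i_data[1]. The
names extent_header, root, slot_of, pack_slot and read_between_writes
are hypothetical, and the values passed in are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_BLOCKS 15	/* number of 32-bit slots in i_data[] */

/* User-space mirror of the extent header (assumed field order). */
struct extent_header {
	uint16_t eh_magic;
	uint16_t eh_entries;
	uint16_t eh_max;
	uint16_t eh_depth;
	uint32_t eh_generation;
};

/* The header is stored at the start of the i_data[] array. */
union root {
	struct extent_header hdr;
	uint32_t i_data[N_BLOCKS];
};

/* Both fields written on the truncate path sit in bytes 4..7. */
static_assert(offsetof(struct extent_header, eh_max) == 4, "eh_max offset");
static_assert(offsetof(struct extent_header, eh_depth) == 6, "eh_depth offset");

/* Index of the i_data[] slot that holds the byte at byte_offset. */
size_t slot_of(size_t byte_offset)
{
	return byte_offset / sizeof(uint32_t);
}

/* Value of i_data[1] for a given (eh_max, eh_depth) pair. */
uint32_t pack_slot(uint16_t max, uint16_t depth)
{
	union root r;

	memset(&r, 0, sizeof(r));
	r.hdr.eh_max = max;
	r.hdr.eh_depth = depth;
	return r.i_data[1];
}

/* Replay the interleaving in program order and return what the READ sees. */
uint32_t read_between_writes(uint16_t old_max, uint16_t old_depth,
			     uint16_t new_max)
{
	union root r;
	uint32_t seen;

	memset(&r, 0, sizeof(r));
	r.hdr.eh_max = old_max;
	r.hdr.eh_depth = old_depth;

	r.hdr.eh_depth = 0;		/* first WRITE, 2 bytes */
	seen = r.i_data[1];		/* READ, 4 bytes */
	r.hdr.eh_max = new_max;		/* second WRITE, 2 bytes */
	return seen;
}
```

In this model, read_between_writes(3, 2, 4) returns the same value as
pack_slot(3, 0): the new eh_depth combined with the old eh_max, which
matches neither the state before the truncate nor the state after it.
Whether such a mixed value can actually be observed, or matters, in the
kernel depends on the locking around these paths, which the model does
not capture.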
Best Regards,
Meng
On Thu, Nov 28, 2019 at 12:03:04PM -0500, Meng Xu wrote:
> I notice a potential data race on ext_inode_hdr(inode)->eh_depth,
> ext_inode_hdr(inode)->eh_max between a create and unlink syscall.
> Following is the trace:
>
> [Setup]
> mkdir("foo", 511) = 0;
> open("foo", 65536, 511) = 3;
> creat("bar", 511) = 4;
> symlink("foo", "sym_foo") = 0;
> open("sym_foo", 65536, 511) = 5;
>
> [Thread 1]
> creat("bar", 438);
>
> __do_sys_creat
> ksys_open
> do_filp_open
> path_openat
> do_last
> handle_truncate
> do_truncate
> notify_change
> ext4_setattr
> ext4_truncate
> ext4_ext_truncate
> ext4_ext_remove_space
> [WRITE, 2 bytes] ext_inode_hdr(inode)->eh_depth = 0;
> [WRITE, 2 bytes] ext_inode_hdr(inode)->eh_max
> = cpu_to_le16(ext4_ext_space_root(inode, 0));
>
> [Thread 2]
> unlink("sym_foo");
>
> __do_sys_unlink
> do_unlinkat
> iput
> iput_final
> evict
> ext4_evict_inode
> ext4_orphan_del
> ext4_mark_iloc_dirty
> ext4_do_update_inode
> [READ, 4 bytes] raw_inode->i_block[block] = ei->i_data[block];
>
>
> I could observe that the order between the READ and WRITE is not
> deterministic and I was curious what will happen if the READ takes
> place in the middle of the two WRITES? Does it cause any damages or
> violations?

This makes no sense. The inodes corresponding to "sym_foo" and "bar"
are completely different. So why would there be a data race?

How are you concluding that there is, in fact, a data race?

- Ted
Hi Ted,

First, thank you for checking this out.

I hook every memory access in the kernel, so I know that the [READ] and
[WRITE] access the exact same memory address. Also, the accesses cannot
come from two separately allocated inodes that happen to reuse one
address: we replaced kfree with a quarantine scheme like KASAN's, so
the two inodes must have different addresses. This is what confused me
too.

In addition, in case it makes a difference, an fsync is running on
another thread at the same time. The three threads are:

[Setup]
mkdir("foo", 511) = 0;
open("foo", 65536, 511) = 3;
creat("bar", 511) = 4;
symlink("foo", "sym_foo") = 0;
open("sym_foo", 65536, 511) = 5;
dup2(5, 195) = 195;
[Thread 0: fsync(195)]
[Thread 1: creat("bar", 438)]
[Thread 2: unlink("sym_foo")]

Or, in the order observed at runtime:
Enter fsync(195);
Enter unlink("sym_foo");
Enter creat("bar", 438);
Exit unlink("sym_foo");
Exit creat("bar", 438);
Exit fsync(195);
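
For anyone who wants to replay the workload, the following is a minimal
user-space sketch of the setup and the three concurrent calls. The
helper names (run_workload, thread_fsync, thread_creat, thread_unlink),
the workdir argument and the cleanup at the end are assumptions added
for illustration and are not part of the original test. The sketch does
not control the interleaving, so the enter/exit order listed above is
not guaranteed, and workdir must be on an ext4 mount for the run to
exercise the ext4 paths at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

#define FSYNC_FD 195	/* descriptor number used by dup2() in the trace */

/* Thread 0: fsync the duplicated directory descriptor. */
static void *thread_fsync(void *arg)
{
	(void)arg;
	fsync(FSYNC_FD);
	return NULL;
}

/* Thread 1: creat() on the existing file "bar", which truncates it. */
static void *thread_creat(void *arg)
{
	int fd;

	(void)arg;
	fd = creat("bar", 0666);	/* 438 in decimal */
	if (fd >= 0)
		close(fd);
	return NULL;
}

/* Thread 2: remove the symlink. */
static void *thread_unlink(void *arg)
{
	(void)arg;
	unlink("sym_foo");
	return NULL;
}

/* Hypothetical helper: returns 0 if setup and all three threads ran. */
int run_workload(const char *workdir)
{
	void *(*fn[3])(void *) = { thread_fsync, thread_creat, thread_unlink };
	pthread_t tid[3];
	int fd_foo = -1, fd_bar = -1, fd_sym = -1, fd_dup = -1;
	int i, started = 0, ret = -1;

	if (chdir(workdir) != 0 || mkdir("foo", 0777) != 0)
		return -1;

	/* [Setup]: 511 is 0777; 65536 is O_DIRECTORY on x86. */
	fd_foo = open("foo", O_RDONLY | O_DIRECTORY);
	fd_bar = creat("bar", 0777);
	if (symlink("foo", "sym_foo") == 0)
		fd_sym = open("sym_foo", O_RDONLY | O_DIRECTORY);
	if (fd_sym >= 0)
		fd_dup = dup2(fd_sym, FSYNC_FD);

	if (fd_foo >= 0 && fd_bar >= 0 && fd_dup == FSYNC_FD) {
		for (i = 0; i < 3; i++) {
			if (pthread_create(&tid[i], NULL, fn[i], NULL) != 0)
				break;
			started++;
		}
		for (i = 0; i < started; i++)
			pthread_join(tid[i], NULL);
		if (started == 3)
			ret = 0;
	}

	/* Cleanup so the sketch can be rerun in the same directory. */
	if (fd_dup >= 0 && fd_dup != fd_sym)
		close(fd_dup);
	if (fd_sym >= 0)
		close(fd_sym);
	if (fd_bar >= 0)
		close(fd_bar);
	if (fd_foo >= 0)
		close(fd_foo);
	unlink("sym_foo");	/* fails harmlessly if thread 2 removed it */
	unlink("bar");
	rmdir("foo");
	return ret;
}
```

Build with -pthread and call run_workload() in a loop with the path of
a scratch directory to give the scheduler more chances to overlap the
three calls.
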
I can provide more information (e.g., other function calls in the trace
or memory access logs) if that would help in checking this case. And
I am sorry for wasting your time if this case does not make sense.

Best regards,
Meng
On Thu, Nov 28, 2019 at 6:19 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> This makes no sense. The inodes corresponding to "sym_foo" and "bar"
> are completely different. So why would there be a data race?
>
> How are you concluding that there is, in fact, a data race?
>
> - Ted