2019-12-11 04:02:17

by Ming Lei

[permalink] [raw]
Subject: Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On Tue, Dec 10, 2019 at 09:41:37PM -0500, Theodore Y. Ts'o wrote:
> On Tue, Dec 10, 2019 at 04:05:50PM +0800, Ming Lei wrote:
> > > > The path[2] is expected behaviour. Not sure path [1] is correct,
> > > > given
> > > > ext4_release_file() is supposed to be called when this inode is
> > > > released. That means the file is closed 4358 times during 1GB file
> > > > copying to usb storage.
> > > >
> > > > [1] insert requests when returning to user mode from syscall
> > > >
> > > > b'blk_mq_sched_request_inserted'
> > > > b'blk_mq_sched_request_inserted'
> > > > b'dd_insert_requests'
> > > > b'blk_mq_sched_insert_requests'
> > > > b'blk_mq_flush_plug_list'
> > > > b'blk_flush_plug_list'
> > > > b'io_schedule_prepare'
> > > > b'io_schedule'
> > > > b'rq_qos_wait'
> > > > b'wbt_wait'
> > > > b'__rq_qos_throttle'
> > > > b'blk_mq_make_request'
> > > > b'generic_make_request'
> > > > b'submit_bio'
> > > > b'ext4_io_submit'
> > > > b'ext4_writepages'
> > > > b'do_writepages'
> > > > b'__filemap_fdatawrite_range'
> > > > b'ext4_release_file'
> > > > b'__fput'
> > > > b'task_work_run'
> > > > b'exit_to_usermode_loop'
> > > > b'do_syscall_64'
> > > > b'entry_SYSCALL_64_after_hwframe'
> > > > 4358
>
> I'm guessing that your workload is repeatedly truncating a file (or
> calling open with O_TRUNC) and then writing data to it. When you do
> this, then when the file is closed, we assume that since you were
> replacing the previous contents of a file with new contents, that you
> would be unhappy if the file contents was replaced by a zero length
> file after a crash. That's because ten years, ago there were a *huge*
> number of crappy applications that would replace a file by reading it
> into memory, truncating it, and then write out the new contents of the
> file. This could be a high score file for a game, or a KDE or GNOME
> state file, etc.
>
> So if someone does open, truncate, write, close, we still immediately
> writing out the data on the close, assuming that the programmer really
> wanted open, truncate, write, fsync, close, but was too careless to
> actually do the right thing.
>
> Some workaround[1] like this is done by all of the major file systems,
> and was fallout the agreement from the "O_PONIES"[2] controversy.
> This was discussed and agreed to at the 2009 LSF/MM workshop. (See
> the "rename, fsync, and ponies" section.)
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
> [2] https://blahg.josefsipek.net/?p=364
> [3] https://lwn.net/Articles/327601/
>
> So if you're seeing a call to filemap_fdatawrite_range as the result
> of a fput, that's why.
>
> In any case, this behavior has been around for a decade, and it
> appears to be incidental to your performance difficulties with your
> USB thumbdrive and block-mq.

I didn't reproduce the issue in my test environment, and follows
Andrea's test commands[1]:

mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
SECONDS=0
cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
umount /mnt/pendrive 2>&1 |tee -a $logfile

The 'cp' command supposes to open/close the file just once, however
ext4_release_file() & write pages is observed to run for 4358 times
when executing the above 'cp' test.


[1] https://marc.info/?l=linux-kernel&m=157486689806734&w=2


Thanks,
Ming


2019-12-11 16:09:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei wrote:
> I didn't reproduce the issue in my test environment, and follows
> Andrea's test commands[1]:
>
> mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
> SECONDS=0
> cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
> umount /mnt/pendrive 2>&1 |tee -a $logfile
>
> The 'cp' command supposes to open/close the file just once, however
> ext4_release_file() & write pages is observed to run for 4358 times
> when executing the above 'cp' test.

Why are we sure the ext4_release_file() / _fput() is coming from the
cp command, as opposed to something else that might be running on the
system under test? _fput() is called by the kernel when the last
reference to a struct file is released. (Specifically, if you have a
fd which is dup'ed, it's only when the last fd corresponding to the
struct file is closed, and the struct file is about to be released,
does the file system's f_ops->release function get called.)

So the first question I'd ask is whether there is anything else going
on the system, and whether the writes are happening to the USB thumb
drive, or to some other storage device. And if there is something
else which is writing to the pendrive, maybe that's why no one else
has been able to reproduce the OP's complaint....

- Ted

2019-12-11 21:34:20

by Ming Lei

[permalink] [raw]
Subject: Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

On Wed, Dec 11, 2019 at 11:07:45AM -0500, Theodore Y. Ts'o wrote:
> On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei wrote:
> > I didn't reproduce the issue in my test environment, and follows
> > Andrea's test commands[1]:
> >
> > mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
> > SECONDS=0
> > cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
> > umount /mnt/pendrive 2>&1 |tee -a $logfile
> >
> > The 'cp' command supposes to open/close the file just once, however
> > ext4_release_file() & write pages is observed to run for 4358 times
> > when executing the above 'cp' test.
>
> Why are we sure the ext4_release_file() / _fput() is coming from the
> cp command, as opposed to something else that might be running on the
> system under test? _fput() is called by the kernel when the last

Please see the log:

https://lore.kernel.org/linux-scsi/[email protected]/2-log_ming.zip

Which is collected by:

#!/bin/sh
MAJ=$1
MIN=$2
MAJ=$(( $MAJ << 20 ))
DEV=$(( $MAJ | $MIN ))

/usr/share/bcc/tools/trace -t -C \
't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args->rwbs, args->sector, args->nr_sector' \
't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d", args->rwbs, args->sector, args->nr_sector'

$MAJ:$MIN points to the USB storage disk.

From the above IO trace, there are two write paths, one is from cp,
another is from writeback wq.

The stackcount trace[1] is consistent with the IO trace log since it
only shows two IO paths, that is why I concluded that the write done via
ext4_release_file() is from 'cp'.

[1] https://lore.kernel.org/linux-scsi/[email protected]/2-log_ming_20191129_150609.zip

> reference to a struct file is released. (Specifically, if you have a
> fd which is dup'ed, it's only when the last fd corresponding to the
> struct file is closed, and the struct file is about to be released,
> does the file system's f_ops->release function get called.)
>
> So the first question I'd ask is whether there is anything else going
> on the system, and whether the writes are happening to the USB thumb
> drive, or to some other storage device. And if there is something
> else which is writing to the pendrive, maybe that's why no one else
> has been able to reproduce the OP's complaint....

OK, we can ask Andrea to confirm that via the following trace, which
will add pid/comm info in the stack trace:

/usr/share/bcc/tools/stackcount blk_mq_sched_request_inserted

Andrew, could you collect the above log again when running new/bad
kernel for confirming if the write done by ext4_release_file() is from
the 'cp' process?

Thanks,
Ming

2019-12-12 07:34:58

by Andrea Vai

[permalink] [raw]
Subject: Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

Il giorno gio, 12/12/2019 alle 05.33 +0800, Ming Lei ha scritto:
> On Wed, Dec 11, 2019 at 11:07:45AM -0500, Theodore Y. Ts'o wrote:
> > On Wed, Dec 11, 2019 at 12:00:58PM +0800, Ming Lei wrote:
> > > I didn't reproduce the issue in my test environment, and follows
> > > Andrea's test commands[1]:
> > >
> > > mount UUID=$uuid /mnt/pendrive 2>&1 |tee -a $logfile
> > > SECONDS=0
> > > cp $testfile /mnt/pendrive 2>&1 |tee -a $logfile
> > > umount /mnt/pendrive 2>&1 |tee -a $logfile
> > >
> > > The 'cp' command supposes to open/close the file just once,
> however
> > > ext4_release_file() & write pages is observed to run for 4358
> times
> > > when executing the above 'cp' test.
> >
> > Why are we sure the ext4_release_file() / _fput() is coming from
> the
> > cp command, as opposed to something else that might be running on
> the
> > system under test? _fput() is called by the kernel when the last
>
> Please see the log:
>
> https://lore.kernel.org/linux-scsi/[email protected]/2-log_ming.zip
>
> Which is collected by:
>
> #!/bin/sh
> MAJ=$1
> MIN=$2
> MAJ=$(( $MAJ << 20 ))
> DEV=$(( $MAJ | $MIN ))
>
> /usr/share/bcc/tools/trace -t -C \
> 't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args-
> >rwbs, args->sector, args->nr_sector' \
> 't:block:block_rq_insert (args->dev == '$DEV') "%s %d %d", args-
> >rwbs, args->sector, args->nr_sector'
>
> $MAJ:$MIN points to the USB storage disk.
>
> From the above IO trace, there are two write paths, one is from cp,
> another is from writeback wq.
>
> The stackcount trace[1] is consistent with the IO trace log since it
> only shows two IO paths, that is why I concluded that the write done
> via
> ext4_release_file() is from 'cp'.
>
> [1]
> https://lore.kernel.org/linux-scsi/[email protected]/2-log_ming_20191129_150609.zip
>
> > reference to a struct file is released. (Specifically, if you
> have a
> > fd which is dup'ed, it's only when the last fd corresponding to
> the
> > struct file is closed, and the struct file is about to be
> released,
> > does the file system's f_ops->release function get called.)
> >
> > So the first question I'd ask is whether there is anything else
> going
> > on the system, and whether the writes are happening to the USB
> thumb
> > drive, or to some other storage device. And if there is something
> > else which is writing to the pendrive, maybe that's why no one
> else
> > has been able to reproduce the OP's complaint....
>
> OK, we can ask Andrea to confirm that via the following trace, which
> will add pid/comm info in the stack trace:
>
> /usr/share/bcc/tools/stackcount blk_mq_sched_request_inserted
>
> Andrea, could you collect the above log again when running new/bad
> kernel for confirming if the write done by ext4_release_file() is
> from
> the 'cp' process?

Yes, I will try to do it as soon as possible and let you know.
I will also try xfs or btrfs, as you suggested in another message.

Thanks, and bye
Andrea