2019-09-20 17:41:21

by Konstantin Khlebnikov

Subject: [PATCH v2] mm: implement write-behind policy for sequential file writes

Traditional writeback tries to accumulate as much dirty data as possible.
This is a worthwhile strategy for extremely short-lived files and for
batching writes to save battery power. But for workloads where disk latency
is important, this policy generates periodic disk load spikes which increase
latency for concurrent operations.

Also, dirty pages in the file cache cannot be reclaimed and reused
immediately, so massive I/O such as file copying affects memory allocation
latency.

The present writeback engine allows tuning only the dirty data size or the
expiration time. Such tuning cannot eliminate spikes - it merely makes them
lower and more frequent. The other option is switching into sync mode, which
flushes written data right after each write; obviously this has a significant
performance impact. Moreover, such tuning is system-wide and also affects
memory-mapped and randomly written files, which flusher threads handle much
better.

This patch implements a write-behind policy which tracks sequential writes
and starts background writeback once a file has enough dirty pages.

Global switch in sysctl vm.dirty_write_behind:
=0: disabled, default
=1: enabled for strictly sequential writes (append, copying)
=2: enabled for all sequential writes

The only parameter is the window size: the maximum amount of dirty pages
behind the current position, and the maximum amount of pages in background
writeback.

Setup is per-disk via the sysfs file /sys/block/$DISK/bdi/write_behind_kb.
Default: 16MiB; '0' disables write-behind for this disk.
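
With the patch applied, a setup could look like this (the device name sda
is illustrative):

```shell
# enable write-behind for strictly sequential writes (append, copying)
sysctl vm.dirty_write_behind=1

# per-disk window: 64MiB for sda; writing 0 disables write-behind there
echo 65536 > /sys/block/sda/bdi/write_behind_kb
cat /sys/block/sda/bdi/write_behind_kb
```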

When the amount of unwritten pages exceeds the window size, write-behind
starts background writeback for max(excess, max_sectors_kb) pages and then
waits for the same amount of background writeback initiated previously.

|<-wait-this->| |<-send-this->|<---pending-write-behind--->|
|<--async-write-behind--->|<--------previous-data------>|<-new-data->|
current head-^ new head-^ file position-^

Remaining tail pages are flushed when the file is closed, if async
write-behind was started or if this is a new file at least max_sectors_kb
long.

Overall behavior depending on total data size:
< max_sectors_kb - no writes
> max_sectors_kb - write new files in background after close
> write_behind_kb - streaming write, write tail at close

Special cases:

* files with POSIX_FADV_RANDOM, O_DIRECT, O_[D]SYNC are ignored

* the writing cursor for O_APPEND is aligned to cover previous small appends
Appends might happen via multiple files or via a new file each time.

* mode vm.dirty_write_behind=1 ignores non-append writes
This reacts only to completely sequential writes like copying files,
writing logs with O_APPEND or rewriting files after O_TRUNC.

Note: the ext4 feature "auto_da_alloc" also writes out the cache when closing
a file after truncating it to 0 and after renaming one file over another.

Changes since v1 (2017-10-02):
* rework window management:
* change default window 1MiB -> 16MiB
* change default request 256KiB -> max_sectors_kb
* drop always-async behavior for O_NONBLOCK
* drop handling POSIX_FADV_NOREUSE (should be in separate patch)
* ignore writes with O_DIRECT, O_SYNC, O_DSYNC
* align head position for O_APPEND
* add strictly sequential mode
* write tail pages for new files
* make void, keep errors at mapping

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Link: https://lore.kernel.org/patchwork/patch/836149/ (v1)
---
Documentation/ABI/testing/sysfs-class-bdi | 5 +
Documentation/admin-guide/sysctl/vm.rst | 15 +++
fs/file_table.c | 2
include/linux/backing-dev-defs.h | 1
include/linux/fs.h | 8 +-
include/linux/mm.h | 1
kernel/sysctl.c | 9 ++
mm/backing-dev.c | 43 +++++----
mm/filemap.c | 136 +++++++++++++++++++++++++++++
9 files changed, 199 insertions(+), 21 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-bdi b/Documentation/ABI/testing/sysfs-class-bdi
index d773d5697cf5..f16be656cbd5 100644
--- a/Documentation/ABI/testing/sysfs-class-bdi
+++ b/Documentation/ABI/testing/sysfs-class-bdi
@@ -30,6 +30,11 @@ read_ahead_kb (read-write)

Size of the read-ahead window in kilobytes

+write_behind_kb (read-write)
+
+ Size of the write-behind window in kilobytes.
+ 0 -> disable write-behind for this disk.
+
min_ratio (read-write)

Under normal circumstances each device is given a part of the
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..a275fa42579f 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -35,6 +35,7 @@ Currently, these files are in /proc/sys/vm:
- dirty_ratio
- dirtytime_expire_seconds
- dirty_writeback_centisecs
+- dirty_write_behind
- drop_caches
- extfrag_threshold
- hugetlb_shm_group
@@ -210,6 +211,20 @@ out to disk. This tunable expresses the interval between those wakeups, in
Setting this to zero disables periodic writeback altogether.


+dirty_write_behind
+==================
+
+This controls the write-behind writeback policy: automatic background
+writeback for sequentially written data behind the current writing position.
+
+=0: disabled, default
+=1: enabled for strictly sequential writes (append, copying)
+=2: enabled for all sequential writes
+
+The write-behind window size is configured in sysfs for each block device:
+/sys/block/$DEV/bdi/write_behind_kb
+
+
drop_caches
===========

diff --git a/fs/file_table.c b/fs/file_table.c
index b07b53f24ff5..bb40b45f27d3 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -276,6 +276,8 @@ static void __fput(struct file *file)
if (file->f_op->fasync)
file->f_op->fasync(-1, file, 0);
}
+ if ((mode & FMODE_WRITE) && vm_dirty_write_behind)
+ generic_write_behind_close(file);
if (file->f_op->release)
file->f_op->release(inode, file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 4fc87dee005a..4f1abd1d64a7 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -191,6 +191,7 @@ struct backing_dev_info {
struct list_head bdi_list;
unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
unsigned long io_pages; /* max allowed IO size */
+ unsigned long write_behind_pages; /* write-behind window in pages */
congested_fn *congested_fn; /* Function pointer if device is md/dm */
void *congested_data; /* Pointer to aux data for congested func */

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 997a530ff4e9..42cad18aaec7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -942,6 +942,7 @@ struct file {
struct fown_struct f_owner;
const struct cred *f_cred;
struct file_ra_state f_ra;
+ pgoff_t f_write_behind;

u64 f_version;
#ifdef CONFIG_SECURITY
@@ -2788,6 +2789,10 @@ extern int vfs_fsync(struct file *file, int datasync);
extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
unsigned int flags);

+extern int vm_dirty_write_behind;
+extern void generic_write_behind(struct kiocb *iocb, ssize_t count);
+extern void generic_write_behind_close(struct file *file);
+
/*
* Sync the bytes written if this was a synchronous write. Expect ki_pos
* to already be updated for the write, and will return either the amount
@@ -2801,7 +2806,8 @@ static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
if (ret)
return ret;
- }
+ } else if (vm_dirty_write_behind)
+ generic_write_behind(iocb, count);

return count;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..1b47a6e06ef2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2443,6 +2443,7 @@ void task_dirty_inc(struct task_struct *tsk);

/* readahead.c */
#define VM_READAHEAD_PAGES (SZ_128K / PAGE_SIZE)
+#define VM_WRITE_BEHIND_PAGES (SZ_16M / PAGE_SIZE)

int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 078950d9605b..74b6b66ee8da 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1404,6 +1404,15 @@ static struct ctl_table vm_table[] = {
.proc_handler = dirtytime_interval_handler,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "dirty_write_behind",
+ .data = &vm_dirty_write_behind,
+ .maxlen = sizeof(vm_dirty_write_behind),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = &two,
+ },
{
.procname = "swappiness",
.data = &vm_swappiness,
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d9daa3e422d0..7fee95c02862 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -131,25 +131,6 @@ static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
}
#endif

-static ssize_t read_ahead_kb_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct backing_dev_info *bdi = dev_get_drvdata(dev);
- unsigned long read_ahead_kb;
- ssize_t ret;
-
- ret = kstrtoul(buf, 10, &read_ahead_kb);
- if (ret < 0)
- return ret;
-
- bdi->ra_pages = read_ahead_kb >> (PAGE_SHIFT - 10);
-
- return count;
-}
-
-#define K(pages) ((pages) << (PAGE_SHIFT - 10))
-
#define BDI_SHOW(name, expr) \
static ssize_t name##_show(struct device *dev, \
struct device_attribute *attr, char *page) \
@@ -160,7 +141,26 @@ static ssize_t name##_show(struct device *dev, \
} \
static DEVICE_ATTR_RW(name);

-BDI_SHOW(read_ahead_kb, K(bdi->ra_pages))
+#define BDI_ATTR_KB(name, field) \
+static ssize_t name##_store(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ struct backing_dev_info *bdi = dev_get_drvdata(dev); \
+ unsigned long kb; \
+ ssize_t ret; \
+ \
+ ret = kstrtoul(buf, 10, &kb); \
+ if (ret < 0) \
+ return ret; \
+ \
+ bdi->field = kb >> (PAGE_SHIFT - 10); \
+ return count; \
+} \
+BDI_SHOW(name, ((bdi->field) << (PAGE_SHIFT - 10)))
+
+BDI_ATTR_KB(read_ahead_kb, ra_pages)
+BDI_ATTR_KB(write_behind_kb, write_behind_pages)

static ssize_t min_ratio_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
@@ -213,6 +213,7 @@ static DEVICE_ATTR_RO(stable_pages_required);

static struct attribute *bdi_dev_attrs[] = {
&dev_attr_read_ahead_kb.attr,
+ &dev_attr_write_behind_kb.attr,
&dev_attr_min_ratio.attr,
&dev_attr_max_ratio.attr,
&dev_attr_stable_pages_required.attr,
@@ -859,6 +860,8 @@ static int bdi_init(struct backing_dev_info *bdi)
INIT_LIST_HEAD(&bdi->wb_list);
init_waitqueue_head(&bdi->wb_waitq);

+ bdi->write_behind_pages = VM_WRITE_BEHIND_PAGES;
+
ret = cgwb_bdi_init(bdi);

return ret;
diff --git a/mm/filemap.c b/mm/filemap.c
index d0cf700bf201..5398b1bea1bf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3525,3 +3525,139 @@ int try_to_release_page(struct page *page, gfp_t gfp_mask)
}

EXPORT_SYMBOL(try_to_release_page);
+
+int vm_dirty_write_behind __read_mostly;
+EXPORT_SYMBOL(vm_dirty_write_behind);
+
+/**
+ * generic_write_behind() - writeback dirty pages behind current position.
+ *
+ * This function tracks the writing position. If a file has enough
+ * sequentially written data it starts background writeback and then waits
+ * for previous writeback initiated some iterations ago.
+ *
+ * Write-behind maintains a per-file head cursor in file->f_write_behind and
+ * two windows around it: background writeback before and pending data after.
+ *
+ * |<-wait-this->| |<-send-this->|<---pending-write-behind--->|
+ * |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
+ * current head-^ new head-^ file position-^
+ */
+void generic_write_behind(struct kiocb *iocb, ssize_t count)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space *mapping = file->f_mapping;
+ struct inode *inode = mapping->host;
+ struct backing_dev_info *bdi = inode_to_bdi(inode);
+ unsigned long window = READ_ONCE(bdi->write_behind_pages);
+ pgoff_t head = file->f_write_behind;
+ pgoff_t begin = (iocb->ki_pos - count) >> PAGE_SHIFT;
+ pgoff_t end = iocb->ki_pos >> PAGE_SHIFT;
+
+ /* Skip if write is random, direct, sync or disabled for disk */
+ if ((file->f_mode & FMODE_RANDOM) || !window ||
+ (iocb->ki_flags & (IOCB_DIRECT | IOCB_DSYNC)))
+ return;
+
+ /* Skip non-sequential writes in strictly sequential mode. */
+ if (vm_dirty_write_behind < 2 &&
+ iocb->ki_pos != i_size_read(inode) &&
+ !(iocb->ki_flags & IOCB_APPEND))
+ return;
+
+ /* Contigious write and still within window. */
+ if (end - head < window)
+ return;
+
+ spin_lock(&file->f_lock);
+
+ /* Re-read under lock. */
+ head = file->f_write_behind;
+
+ /* Non-contiguous, move head position. */
+ if (head > end || begin - head > window) {
+ /*
+ * Append might happen through multiple files or via a new file
+ * every time. Align head cursor to cover previous appends.
+ */
+ if (iocb->ki_flags & IOCB_APPEND)
+ begin = roundup(begin - min(begin, window - 1),
+ bdi->io_pages);
+
+ file->f_write_behind = head = begin;
+ }
+
+ /* Still not big enough. */
+ if (end - head < window) {
+ spin_unlock(&file->f_lock);
+ return;
+ }
+
+ /* Write excess and try at least max_sectors_kb if possible */
+ end = head + max(end - head - window, min(end - head, bdi->io_pages));
+
+ /* Set head for next iteration, everything behind will be written. */
+ file->f_write_behind = end;
+
+ spin_unlock(&file->f_lock);
+
+ /* Start background writeback. */
+ __filemap_fdatawrite_range(mapping,
+ (loff_t)head << PAGE_SHIFT,
+ ((loff_t)end << PAGE_SHIFT) - 1,
+ WB_SYNC_NONE);
+
+ if (head < window)
+ return;
+
+ /* Wait for pages falling behind writeback window. */
+ head -= window;
+ end -= window;
+ __filemap_fdatawait_range(mapping,
+ (loff_t)head << PAGE_SHIFT,
+ ((loff_t)end << PAGE_SHIFT) - 1);
+}
+EXPORT_SYMBOL(generic_write_behind);
+
+/**
+ * generic_write_behind_close() - write tail pages
+ *
+ * This function finishes the write-behind stream and writes remaining tail
+ * pages in background. It starts writeback if the write-behind stream was
+ * started before (i.e. the total written size is bigger than the write-behind
+ * window) or if this is a new file bigger than max_sectors_kb.
+ */
+void generic_write_behind_close(struct file *file)
+{
+ struct address_space *mapping = file->f_mapping;
+ struct inode *inode = mapping->host;
+ struct backing_dev_info *bdi = inode_to_bdi(inode);
+ unsigned long window = READ_ONCE(bdi->write_behind_pages);
+ pgoff_t head = file->f_write_behind;
+ pgoff_t end = (file->f_pos + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+ if ((file->f_mode & FMODE_RANDOM) ||
+ (file->f_flags & (O_APPEND | O_DSYNC | O_DIRECT)) ||
+ !bdi_cap_writeback_dirty(bdi) || !window)
+ return;
+
+ /* Skip non-sequential writes in strictly sequential mode. */
+ if (vm_dirty_write_behind < 2 &&
+ file->f_pos != i_size_read(inode))
+ return;
+
+ /* Non-contiguous */
+ if (head > end || end - head > window)
+ return;
+
+ /* Start stream only for new files bigger than max_sectors_kb. */
+ if (end - head < (window - min(window, bdi->io_pages)) &&
+ (!(file->f_mode & FMODE_CREATED) || end - head < bdi->io_pages))
+ return;
+
+ /* Write tail pages in background. */
+ __filemap_fdatawrite_range(mapping,
+ (loff_t)head << PAGE_SHIFT,
+ file->f_pos - 1,
+ WB_SYNC_NONE);
+}


2019-09-20 17:44:38

by Konstantin Khlebnikov

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

Script for a trivial demo is in the attachment.

$ bash test_writebehind.sh
SIZE
3,2G dummy
vm.dirty_write_behind = 0
COPY

real 0m3.629s
user 0m0.016s
sys 0m3.613s
Dirty: 3254552 kB
SYNC

real 0m31.953s
user 0m0.002s
sys 0m0.000s
vm.dirty_write_behind = 1
COPY

real 0m32.738s
user 0m0.008s
sys 0m4.047s
Dirty: 2900 kB
SYNC

real 0m0.427s
user 0m0.000s
sys 0m0.004s
vm.dirty_write_behind = 2
COPY

real 0m32.168s
user 0m0.000s
sys 0m4.066s
Dirty: 3088 kB
SYNC

real 0m0.421s
user 0m0.004s
sys 0m0.001s


With vm.dirty_write_behind set to 1 or 2, the files end up written even
faster overall (copy + sync), and during copying the amount of dirty memory
always stays around 16MiB.


On 20/09/2019 10.35, Konstantin Khlebnikov wrote:
> Traditional writeback tries to accumulate as much dirty data as possible.
> [...]
> ---


Attachments:
test_writebehind.sh (428.00 B)

2019-09-22 19:12:43

by Linus Torvalds

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Fri, Sep 20, 2019 at 4:05 PM Linus Torvalds
<[email protected]> wrote:
>
>
> Now, I hear you say "those are so small these days that it doesn't
> matter". And maybe you're right. But particularly for slow media,
> triggering good streaming write behavior has been a problem in the
> past.

Which reminds me: the writebehind trigger should likely be tied to the
estimate of the bdi write speed.

We _do_ have that avg_write_bandwidth thing in the bdi_writeback
structure, it sounds like a potentially good idea to try to use that
to estimate when to do writebehind.

No?

Linus

2019-09-23 13:25:29

by Linus Torvalds

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Fri, Sep 20, 2019 at 12:35 AM Konstantin Khlebnikov
<[email protected]> wrote:
>
> This patch implements write-behind policy which tracks sequential writes
> and starts background writeback when file have enough dirty pages.

Apart from a spelling error ("contigious"), my only reaction is that
I've wanted this for the multi-file writes, not just for single big
files.

Yes, single big files may be a simpler and perhaps the "10% effort for
90% of the gain", and thus the right thing to do, but I do wonder if
you've looked at simply extending it to cover multiple files when
people copy a whole directory (or unpack a tar-file, or similar).

Now, I hear you say "those are so small these days that it doesn't
matter". And maybe you're right. But particularly for slow media,
triggering good streaming write behavior has been a problem in the
past.

So I'm wondering whether the "writebehind" state should perhaps be
considered be a process state, rather than "struct file" state, and
also start triggering for writing smaller files.

Maybe this was already discussed and people decided that the big-file
case was so much easier that it wasn't worth worrying about
writebehind for multiple files.

Linus

2019-09-23 15:28:03

by kernel test robot

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

Hi Konstantin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.3 next-20190919]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Konstantin-Khlebnikov/mm-implement-write-behind-policy-for-sequential-file-writes/20190920-155606
reproduce: make htmldocs
:::::: branch date: 8 hours ago
:::::: commit date: 8 hours ago

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

[...]
>> mm/filemap.c:3551: warning: Function parameter or member 'iocb' not described in 'generic_write_behind'
>> mm/filemap.c:3551: warning: Function parameter or member 'count' not described in 'generic_write_behind'
include/drm/drm_modeset_helper_vtables.h:1053: warning: Function parameter or member 'prepare_writeback_job' not described in 'drm_connector_helper_funcs'
include/drm/drm_modeset_helper_vtables.h:1053: warning: Function parameter or member 'cleanup_writeback_job' not described in 'drm_connector_helper_funcs'
include/drm/drm_atomic_state_helper.h:1: warning: no structured comments found
include/drm/drm_gem_shmem_helper.h:87: warning: Function parameter or member 'madv' not described in 'drm_gem_shmem_object'
include/drm/drm_gem_shmem_helper.h:87: warning: Function parameter or member 'madv_list' not described in 'drm_gem_shmem_object'
drivers/gpu/drm/i915/display/intel_dpll_mgr.h:158: warning: Enum value 'DPLL_ID_TGL_MGPLL5' not described in enum 'intel_dpll_id'
drivers/gpu/drm/i915/display/intel_dpll_mgr.h:158: warning: Enum value 'DPLL_ID_TGL_MGPLL6' not described in enum 'intel_dpll_id'
drivers/gpu/drm/i915/display/intel_dpll_mgr.h:158: warning: Excess enum value 'DPLL_ID_TGL_TCPLL6' description in 'intel_dpll_id'
drivers/gpu/drm/i915/display/intel_dpll_mgr.h:158: warning: Excess enum value 'DPLL_ID_TGL_TCPLL5' description in 'intel_dpll_id'
drivers/gpu/drm/i915/display/intel_dpll_mgr.h:342: warning: Function parameter or member 'wakeref' not described in 'intel_shared_dpll'
Error: Cannot open file drivers/gpu/drm/i915/i915_gem_batch_pool.c
drivers/gpu/drm/i915/i915_drv.h:1129: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source The OA context specific information.
drivers/gpu/drm/i915/i915_drv.h:1143: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source State of the OA buffer.
drivers/gpu/drm/i915/i915_drv.h:1154: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source Locks reads and writes to all head/tail state
drivers/gpu/drm/i915/i915_drv.h:1176: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source One 'aging' tail pointer and one 'aged' tail pointer ready to
drivers/gpu/drm/i915/i915_drv.h:1188: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source Index for the aged tail ready to read() data up to.
drivers/gpu/drm/i915/i915_drv.h:1193: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source A monotonic timestamp for when the current aging tail pointer
drivers/gpu/drm/i915/i915_drv.h:1199: warning: Incorrect use of kernel-doc format: Documentation Makefile include scripts source Although we can always read back the head pointer register,
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'pinned_ctx' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'specific_ctx_id' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'specific_ctx_id_mask' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'poll_check_timer' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'poll_wq' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'pollin' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'periodic' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'period_exponent' not described in 'i915_perf_stream'
drivers/gpu/drm/i915/i915_drv.h:1207: warning: Function parameter or member 'oa_buffer' not described in 'i915_perf_stream'
drivers/gpu/drm/mcde/mcde_drv.c:1: warning: 'ST-Ericsson MCDE DRM Driver' not found
include/net/cfg80211.h:1185: warning: Function parameter or member 'txpwr' not described in 'station_parameters'
include/net/mac80211.h:4056: warning: Function parameter or member 'sta_set_txpwr' not described in 'ieee80211_ops'
include/net/mac80211.h:2018: warning: Function parameter or member 'txpwr' not described in 'ieee80211_sta'
Documentation/admin-guide/perf/imx-ddr.rst:21: WARNING: Unexpected indentation.
Documentation/admin-guide/perf/imx-ddr.rst:34: WARNING: Unexpected indentation.
Documentation/admin-guide/perf/imx-ddr.rst:40: WARNING: Unexpected indentation.
Documentation/admin-guide/perf/imx-ddr.rst:45: WARNING: Unexpected indentation.
Documentation/admin-guide/perf/imx-ddr.rst:52: WARNING: Unexpected indentation.
Documentation/hwmon/inspur-ipsps1.rst:2: WARNING: Title underline too short.
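
The two mm/filemap.c warnings flagged with '>>' are the ones new in this patch: the kernel-doc comment for generic_write_behind() has a summary line but never describes its parameters. A sketch of a header that should silence both warnings, assuming the standard kernel-doc '@param:' syntax (the rest of the existing comment would stay unchanged):

```c
/**
 * generic_write_behind() - writeback dirty pages behind current position.
 * @iocb:  kiocb of the buffered write that just completed
 * @count: number of bytes written by that iocb
 *
 * ... (existing description and window diagram unchanged) ...
 */
```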

# https://github.com/0day-ci/linux/commit/e0e7df8d5b71bf59ad93fe75e662c929b580d805
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout e0e7df8d5b71bf59ad93fe75e662c929b580d805
vim +3551 mm/filemap.c

e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3534
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3535 /**
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3536 * generic_write_behind() - writeback dirty pages behind current position.
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3537 *
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3538 * This function tracks the writing position. If the file has enough
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3539 * sequentially written data, it starts background writeback and then waits
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3540 * for the writeback initiated some iterations ago.
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3541 *
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3542 * Write-behind maintains a per-file head cursor in file->f_write_behind
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3543 * and two windows around it: background writeback before, pending data after.
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3544 *
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3545 * |<-wait-this->| |<-send-this->|<---pending-write-behind--->|
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3546 * |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3547 * current head-^ new head-^ file position-^
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3548 */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3549 void generic_write_behind(struct kiocb *iocb, ssize_t count)
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3550 {
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 @3551 struct file *file = iocb->ki_filp;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3552 struct address_space *mapping = file->f_mapping;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3553 struct inode *inode = mapping->host;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3554 struct backing_dev_info *bdi = inode_to_bdi(inode);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3555 unsigned long window = READ_ONCE(bdi->write_behind_pages);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3556 pgoff_t head = file->f_write_behind;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3557 pgoff_t begin = (iocb->ki_pos - count) >> PAGE_SHIFT;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3558 pgoff_t end = iocb->ki_pos >> PAGE_SHIFT;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3559
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3560 /* Skip if write is random, direct, sync or disabled for disk */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3561 if ((file->f_mode & FMODE_RANDOM) || !window ||
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3562 (iocb->ki_flags & (IOCB_DIRECT | IOCB_DSYNC)))
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3563 return;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3564
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3565 /* Skip non-sequential writes in strictly sequential mode. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3566 if (vm_dirty_write_behind < 2 &&
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3567 iocb->ki_pos != i_size_read(inode) &&
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3568 !(iocb->ki_flags & IOCB_APPEND))
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3569 return;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3570
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3571 /* Contiguous write and still within window. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3572 if (end - head < window)
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3573 return;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3574
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3575 spin_lock(&file->f_lock);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3576
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3577 /* Re-read under lock. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3578 head = file->f_write_behind;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3579
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3580 /* Non-contiguous, move head position. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3581 if (head > end || begin - head > window) {
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3582 /*
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3583 * Append might happen through multiple files or via a new file
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3584 * every time. Align the head cursor to cover previous appends.
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3585 */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3586 if (iocb->ki_flags & IOCB_APPEND)
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3587 begin = roundup(begin - min(begin, window - 1),
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3588 bdi->io_pages);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3589
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3590 file->f_write_behind = head = begin;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3591 }
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3592
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3593 /* Still not big enough. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3594 if (end - head < window) {
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3595 spin_unlock(&file->f_lock);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3596 return;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3597 }
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3598
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3599 /* Write excess and try at least max_sectors_kb if possible */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3600 end = head + max(end - head - window, min(end - head, bdi->io_pages));
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3601
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3602 /* Set head for next iteration, everything behind will be written. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3603 file->f_write_behind = end;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3604
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3605 spin_unlock(&file->f_lock);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3606
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3607 /* Start background writeback. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3608 __filemap_fdatawrite_range(mapping,
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3609 (loff_t)head << PAGE_SHIFT,
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3610 ((loff_t)end << PAGE_SHIFT) - 1,
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3611 WB_SYNC_NONE);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3612
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3613 if (head < window)
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3614 return;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3615
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3616 /* Wait for pages falling behind writeback window. */
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3617 head -= window;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3618 end -= window;
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3619 __filemap_fdatawait_range(mapping,
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3620 (loff_t)head << PAGE_SHIFT,
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3621 ((loff_t)end << PAGE_SHIFT) - 1);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3622 }
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3623 EXPORT_SYMBOL(generic_write_behind);
e0e7df8d5b71bf Konstantin Khlebnikov 2019-09-20 3624

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (27.54 kB)
.config.gz (7.10 kB)

2019-09-24 16:41:29

by Chen, Rong A

[permalink] [raw]
Subject: [mm] e0e7df8d5b: will-it-scale.per_process_ops -7.3% regression

Greeting,

FYI, we noticed a -7.3% regression of will-it-scale.per_process_ops due to commit:


commit: e0e7df8d5b71bf59ad93fe75e662c929b580d805 ("[PATCH v2] mm: implement write-behind policy for sequential file writes")
url: https://github.com/0day-ci/linux/commits/Konstantin-Khlebnikov/mm-implement-write-behind-policy-for-sequential-file-writes/20190920-155606


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 100%
mode: process
test: open1
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-csl-2ap4/open1/will-it-scale

commit:
574cc45397 (" drm main pull for 5.4-rc1")
e0e7df8d5b ("mm: implement write-behind policy for sequential file writes")

574cc4539762561d e0e7df8d5b71bf59ad93fe75e66
---------------- ---------------------------
%stddev %change %stddev
\ | \
370456 -7.3% 343238 will-it-scale.per_process_ops
71127653 -7.3% 65901758 will-it-scale.workload
828565 ± 23% +66.8% 1381984 ± 23% cpuidle.C1.time
1499 +1.1% 1515 turbostat.Avg_MHz
163498 ± 5% +26.4% 206691 ± 4% slabinfo.filp.active_slabs
163498 ± 5% +26.4% 206691 ± 4% slabinfo.filp.num_slabs
39055 ± 2% +17.1% 45720 ± 5% meminfo.Inactive
38615 ± 2% +17.3% 45291 ± 5% meminfo.Inactive(anon)
51382 ± 3% +19.6% 61469 ± 7% meminfo.Mapped
5163010 ± 2% +12.7% 5819765 ± 3% meminfo.Memused
2840181 ± 3% +22.5% 3478003 ± 5% meminfo.SUnreclaim
2941874 ± 3% +21.7% 3579791 ± 5% meminfo.Slab
67755 ± 5% +23.8% 83884 ± 3% meminfo.max_used_kB
79719901 +17.3% 93512842 numa-numastat.node0.local_node
79738690 +17.3% 93533079 numa-numastat.node0.numa_hit
81987497 +16.6% 95625946 numa-numastat.node1.local_node
82018695 +16.6% 95652480 numa-numastat.node1.numa_hit
82693483 +15.8% 95762465 numa-numastat.node2.local_node
82705924 +15.8% 95789007 numa-numastat.node2.numa_hit
80329941 +17.1% 94048289 numa-numastat.node3.local_node
80361116 +17.1% 94068512 numa-numastat.node3.numa_hit
9678 ± 2% +17.1% 11334 ± 5% proc-vmstat.nr_inactive_anon
13001 ± 3% +19.2% 15503 ± 7% proc-vmstat.nr_mapped
738232 ± 4% +18.5% 875062 ± 2% proc-vmstat.nr_slab_unreclaimable
9678 ± 2% +17.1% 11334 ± 5% proc-vmstat.nr_zone_inactive_anon
2391 ± 92% -84.5% 369.50 ± 46% proc-vmstat.numa_hint_faults
3.243e+08 +16.8% 3.789e+08 proc-vmstat.numa_hit
3.242e+08 +16.8% 3.788e+08 proc-vmstat.numa_local
1.296e+09 +16.8% 1.514e+09 proc-vmstat.pgalloc_normal
1.296e+09 +16.8% 1.514e+09 proc-vmstat.pgfree
862.61 ± 5% +37.7% 1188 ± 5% sched_debug.cfs_rq:/.exec_clock.stddev
229663 ± 62% +113.3% 489907 ± 29% sched_debug.cfs_rq:/.load.max
491.04 ± 4% -9.5% 444.29 ± 7% sched_debug.cfs_rq:/.nr_spread_over.min
229429 ± 62% +113.4% 489618 ± 29% sched_debug.cfs_rq:/.runnable_weight.max
-1959962 +36.2% -2669681 sched_debug.cfs_rq:/.spread0.min
1416008 ± 2% -13.3% 1227494 ± 5% sched_debug.cpu.avg_idle.avg
1240763 ± 8% -28.2% 891028 ± 18% sched_debug.cpu.avg_idle.stddev
352361 ± 6% -29.6% 248105 ± 25% sched_debug.cpu.max_idle_balance_cost.stddev
-20.00 +51.0% -30.21 sched_debug.cpu.nr_uninterruptible.min
6618 ± 10% -20.8% 5240 ± 8% sched_debug.cpu.ttwu_count.max
1452719 ± 4% +7.2% 1557262 ± 3% numa-meminfo.node0.MemUsed
797565 ± 2% +20.8% 963538 ± 2% numa-meminfo.node0.SUnreclaim
835343 ± 3% +19.6% 998867 ± 2% numa-meminfo.node0.Slab
831114 ± 2% +20.1% 998248 ± 2% numa-meminfo.node1.SUnreclaim
848052 +19.8% 1016069 ± 2% numa-meminfo.node1.Slab
1441558 ± 6% +15.7% 1668466 ± 3% numa-meminfo.node2.MemUsed
879835 ± 2% +20.4% 1059441 numa-meminfo.node2.SUnreclaim
901359 ± 3% +20.3% 1084727 ± 2% numa-meminfo.node2.Slab
1446041 ± 5% +15.5% 1669477 ± 3% numa-meminfo.node3.MemUsed
899442 ± 5% +23.0% 1106354 numa-meminfo.node3.SUnreclaim
924903 ± 5% +22.1% 1129709 numa-meminfo.node3.Slab
198945 +19.8% 238298 ± 2% numa-vmstat.node0.nr_slab_unreclaimable
40181885 +17.3% 47129598 numa-vmstat.node0.numa_hit
40163521 +17.3% 47110122 numa-vmstat.node0.numa_local
208512 +20.9% 252000 ± 2% numa-vmstat.node1.nr_slab_unreclaimable
41144466 +16.7% 48021716 numa-vmstat.node1.numa_hit
41027051 +16.8% 47908675 numa-vmstat.node1.numa_local
220763 ± 2% +21.9% 269115 ± 2% numa-vmstat.node2.nr_slab_unreclaimable
41437805 +16.2% 48167791 numa-vmstat.node2.numa_hit
41338581 +16.2% 48054485 numa-vmstat.node2.numa_local
225216 ± 2% +24.7% 280851 ± 2% numa-vmstat.node3.nr_slab_unreclaimable
40385721 +16.9% 47195289 numa-vmstat.node3.numa_hit
40268228 +16.9% 47088405 numa-vmstat.node3.numa_local
77.00 ± 29% +494.8% 458.00 ±110% interrupts.CPU10.RES:Rescheduling_interrupts
167.25 ± 65% +347.8% 749.00 ± 85% interrupts.CPU103.RES:Rescheduling_interrupts
136.50 ± 42% +309.2% 558.50 ± 85% interrupts.CPU107.RES:Rescheduling_interrupts
132.50 ± 26% +637.5% 977.25 ± 50% interrupts.CPU109.RES:Rescheduling_interrupts
212.50 ± 51% -65.2% 74.00 ± 9% interrupts.CPU115.RES:Rescheduling_interrupts
270.25 ± 20% -77.2% 61.50 ± 10% interrupts.CPU121.RES:Rescheduling_interrupts
184.00 ± 50% -57.5% 78.25 ± 51% interrupts.CPU128.RES:Rescheduling_interrupts
85.25 ± 38% +911.4% 862.25 ±135% interrupts.CPU137.RES:Rescheduling_interrupts
72.25 ± 6% +114.2% 154.75 ± 25% interrupts.CPU147.RES:Rescheduling_interrupts
415.00 ± 75% -69.8% 125.25 ± 59% interrupts.CPU15.RES:Rescheduling_interrupts
928.25 ± 93% -89.8% 94.50 ± 50% interrupts.CPU182.RES:Rescheduling_interrupts
359.75 ± 76% -58.8% 148.25 ± 85% interrupts.CPU19.RES:Rescheduling_interrupts
95.75 ± 30% +103.9% 195.25 ± 48% interrupts.CPU45.RES:Rescheduling_interrupts
60.25 ± 9% +270.5% 223.25 ± 93% interrupts.CPU83.RES:Rescheduling_interrupts
906.75 ±136% -90.5% 85.75 ± 36% interrupts.CPU85.RES:Rescheduling_interrupts
199.25 ± 25% -52.1% 95.50 ± 43% interrupts.CPU90.RES:Rescheduling_interrupts
5192 ± 34% +41.5% 7347 ± 24% interrupts.CPU95.NMI:Non-maskable_interrupts
5192 ± 34% +41.5% 7347 ± 24% interrupts.CPU95.PMI:Performance_monitoring_interrupts
1.75 +26.1% 2.20 perf-stat.i.MPKI
7.975e+10 -6.8% 7.435e+10 perf-stat.i.branch-instructions
3.782e+08 -5.9% 3.558e+08 perf-stat.i.branch-misses
75.36 +0.9 76.29 perf-stat.i.cache-miss-rate%
5.484e+08 +18.8% 6.515e+08 perf-stat.i.cache-misses
7.276e+08 +17.3% 8.539e+08 perf-stat.i.cache-references
1.37 +8.2% 1.48 perf-stat.i.cpi
5.701e+11 +0.7% 5.744e+11 perf-stat.i.cpu-cycles
1040 -15.2% 882.10 perf-stat.i.cycles-between-cache-misses
1.253e+11 -7.2% 1.163e+11 perf-stat.i.dTLB-loads
7.443e+10 -7.2% 6.904e+10 perf-stat.i.dTLB-stores
3.336e+08 +12.6% 3.755e+08 perf-stat.i.iTLB-load-misses
5004598 ± 7% -60.9% 1954451 ± 6% perf-stat.i.iTLB-loads
4.175e+11 -6.9% 3.887e+11 perf-stat.i.instructions
1251 -17.3% 1035 perf-stat.i.instructions-per-iTLB-miss
0.73 -7.6% 0.68 perf-stat.i.ipc
19.77 -1.5 18.31 perf-stat.i.node-load-miss-rate%
5003202 ± 2% +16.5% 5829006 perf-stat.i.node-load-misses
20521507 +28.1% 26283838 perf-stat.i.node-loads
1.84 +0.4 2.28 perf-stat.i.node-store-miss-rate%
1469703 +29.0% 1895783 perf-stat.i.node-store-misses
78304054 +4.0% 81463725 perf-stat.i.node-stores
1.74 +26.1% 2.20 perf-stat.overall.MPKI
75.37 +0.9 76.30 perf-stat.overall.cache-miss-rate%
1.37 +8.2% 1.48 perf-stat.overall.cpi
1039 -15.2% 881.41 perf-stat.overall.cycles-between-cache-misses
1251 -17.3% 1035 perf-stat.overall.instructions-per-iTLB-miss
0.73 -7.6% 0.68 perf-stat.overall.ipc
19.59 -1.5 18.14 perf-stat.overall.node-load-miss-rate%
1.84 +0.4 2.27 perf-stat.overall.node-store-miss-rate%
7.943e+10 -6.8% 7.404e+10 perf-stat.ps.branch-instructions
3.767e+08 -5.9% 3.543e+08 perf-stat.ps.branch-misses
5.465e+08 +18.8% 6.492e+08 perf-stat.ps.cache-misses
7.25e+08 +17.4% 8.508e+08 perf-stat.ps.cache-references
5.68e+11 +0.7% 5.722e+11 perf-stat.ps.cpu-cycles
1.248e+11 -7.2% 1.158e+11 perf-stat.ps.dTLB-loads
7.413e+10 -7.3% 6.874e+10 perf-stat.ps.dTLB-stores
3.322e+08 +12.5% 3.739e+08 perf-stat.ps.iTLB-load-misses
4986239 ± 7% -61.0% 1946378 ± 6% perf-stat.ps.iTLB-loads
4.158e+11 -6.9% 3.87e+11 perf-stat.ps.instructions
4982520 ± 2% +16.5% 5803884 perf-stat.ps.node-load-misses
20448588 +28.1% 26201547 perf-stat.ps.node-loads
1463675 +29.0% 1887791 perf-stat.ps.node-store-misses
77979119 +4.0% 81107191 perf-stat.ps.node-stores
1.25e+14 -6.8% 1.165e+14 perf-stat.total.instructions
10.11 -1.9 8.21 perf-profile.calltrace.cycles-pp.file_free_rcu.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
17.28 -0.8 16.48 perf-profile.calltrace.cycles-pp.close
9.41 -0.7 8.69 perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_open.do_syscall_64
6.32 -0.7 5.64 perf-profile.calltrace.cycles-pp.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64
5.27 -0.5 4.72 perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.96 -0.5 13.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close
13.58 -0.4 13.14 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
0.92 -0.3 0.64 perf-profile.calltrace.cycles-pp.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
3.10 -0.2 2.86 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
2.44 -0.2 2.21 perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_open
4.02 -0.2 3.80 perf-profile.calltrace.cycles-pp.selinux_inode_permission.security_inode_permission.link_path_walk.path_openat.do_filp_open
1.82 -0.2 1.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.open64
9.26 -0.2 9.04 perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
2.12 ± 2% -0.2 1.90 perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
1.03 ± 10% -0.2 0.82 perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_open
2.55 ± 2% -0.2 2.36 ± 3% perf-profile.calltrace.cycles-pp.security_inode_permission.may_open.path_openat.do_filp_open.do_sys_open
1.37 -0.2 1.18 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.15 -0.2 0.95 perf-profile.calltrace.cycles-pp.ima_file_check.path_openat.do_filp_open.do_sys_open.do_syscall_64
1.79 -0.2 1.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.close
2.41 ± 3% -0.2 2.22 ± 3% perf-profile.calltrace.cycles-pp.selinux_inode_permission.security_inode_permission.may_open.path_openat.do_filp_open
2.88 -0.2 2.71 perf-profile.calltrace.cycles-pp.security_file_open.do_dentry_open.path_openat.do_filp_open.do_sys_open
2.38 -0.2 2.22 perf-profile.calltrace.cycles-pp.security_file_alloc.__alloc_file.alloc_empty_file.path_openat.do_filp_open
4.31 -0.2 4.16 perf-profile.calltrace.cycles-pp.security_inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_open
9.93 -0.1 9.80 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
1.63 -0.1 1.50 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.security_file_alloc.__alloc_file.alloc_empty_file.path_openat
1.38 -0.1 1.26 perf-profile.calltrace.cycles-pp.__alloc_fd.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
5.16 -0.1 5.04 perf-profile.calltrace.cycles-pp.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
1.13 -0.1 1.02 perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_open
2.26 -0.1 2.15 perf-profile.calltrace.cycles-pp.selinux_file_open.security_file_open.do_dentry_open.path_openat.do_filp_open
0.63 -0.1 0.52 ± 2% perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.strncpy_from_user.getname_flags.do_sys_open
1.29 -0.1 1.18 perf-profile.calltrace.cycles-pp.lookup_fast.path_openat.do_filp_open.do_sys_open.do_syscall_64
1.75 -0.1 1.65 perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_open.do_syscall_64
0.67 -0.1 0.58 perf-profile.calltrace.cycles-pp.kmem_cache_free.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
1.22 ± 2% -0.1 1.12 perf-profile.calltrace.cycles-pp.avc_has_perm_noaudit.selinux_inode_permission.security_inode_permission.link_path_walk.path_openat
1.21 -0.1 1.12 perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.74 -0.1 0.66 perf-profile.calltrace.cycles-pp.__inode_security_revalidate.selinux_file_open.security_file_open.do_dentry_open.path_openat
0.89 -0.1 0.81 perf-profile.calltrace.cycles-pp.inode_security_rcu.selinux_inode_permission.security_inode_permission.may_open.path_openat
0.79 ± 4% -0.1 0.72 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.76 -0.1 0.70 perf-profile.calltrace.cycles-pp.__inode_security_revalidate.inode_security_rcu.selinux_inode_permission.security_inode_permission.may_open
0.67 ± 3% -0.1 0.61 perf-profile.calltrace.cycles-pp.__d_lookup_rcu.lookup_fast.path_openat.do_filp_open.do_sys_open
0.66 ± 3% -0.1 0.60 perf-profile.calltrace.cycles-pp.inode_permission.may_open.path_openat.do_filp_open.do_sys_open
1.02 -0.1 0.96 perf-profile.calltrace.cycles-pp.path_init.path_openat.do_filp_open.do_sys_open.do_syscall_64
0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.task_work_add.fput_many.filp_close.__x64_sys_close.do_syscall_64
0.67 -0.0 0.63 perf-profile.calltrace.cycles-pp.rcu_segcblist_enqueue.__call_rcu.task_work_run.exit_to_usermode_loop.do_syscall_64
0.78 -0.0 0.74 perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start
0.55 -0.0 0.53 perf-profile.calltrace.cycles-pp.selinux_file_alloc_security.security_file_alloc.__alloc_file.alloc_empty_file.path_openat
0.71 +0.1 0.82 perf-profile.calltrace.cycles-pp.memset_erms.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat
3.38 +0.1 3.50 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.66 +0.1 1.78 perf-profile.calltrace.cycles-pp.__call_rcu.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.70 +0.1 0.84 perf-profile.calltrace.cycles-pp.__virt_addr_valid.__check_object_size.strncpy_from_user.getname_flags.do_sys_open
1.81 +0.4 2.23 perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.do_sys_open.do_syscall_64
39.47 +0.7 40.17 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
0.00 +0.8 0.75 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.new_slab.___slab_alloc.__slab_alloc
38.69 +0.8 39.45 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.84 perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc
29.90 +0.9 30.79 perf-profile.calltrace.cycles-pp.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
29.90 +0.9 30.79 perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
29.87 +0.9 30.76 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn
29.88 +0.9 30.78 perf-profile.calltrace.cycles-pp.rcu_core.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn.kthread
29.93 +0.9 30.84 perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork
29.94 +0.9 30.85 perf-profile.calltrace.cycles-pp.ret_from_fork
29.94 +0.9 30.85 perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
0.89 ± 29% +0.9 1.81 perf-profile.calltrace.cycles-pp.setup_object_debug.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc
7.25 ± 3% +1.1 8.36 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_one_page.__free_pages_ok.unfreeze_partials
7.75 ± 3% +1.1 8.87 perf-profile.calltrace.cycles-pp.__free_pages_ok.unfreeze_partials.put_cpu_partial.kmem_cache_free.rcu_do_batch
7.72 ± 3% +1.1 8.85 perf-profile.calltrace.cycles-pp.free_one_page.__free_pages_ok.unfreeze_partials.put_cpu_partial.kmem_cache_free
7.29 ± 3% +1.1 8.41 perf-profile.calltrace.cycles-pp._raw_spin_lock.free_one_page.__free_pages_ok.unfreeze_partials.put_cpu_partial
9.12 ± 3% +1.1 10.25 perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
7.96 ± 3% +1.1 9.10 perf-profile.calltrace.cycles-pp.put_cpu_partial.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start
7.92 ± 3% +1.1 9.07 perf-profile.calltrace.cycles-pp.unfreeze_partials.put_cpu_partial.kmem_cache_free.rcu_do_batch.rcu_core
2.38 +1.5 3.83 perf-profile.calltrace.cycles-pp.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc.__alloc_file
10.53 +1.7 12.19 perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
5.47 +2.2 7.64 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat.do_filp_open
3.34 +2.2 5.56 perf-profile.calltrace.cycles-pp.___slab_alloc.__slab_alloc.kmem_cache_alloc.__alloc_file.alloc_empty_file
3.39 +2.3 5.65 perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat
11.39 +2.7 14.08 perf-profile.calltrace.cycles-pp.alloc_empty_file.path_openat.do_filp_open.do_sys_open.do_syscall_64
10.91 +2.7 13.63 perf-profile.calltrace.cycles-pp.__alloc_file.alloc_empty_file.path_openat.do_filp_open.do_sys_open
10.62 -2.1 8.54 perf-profile.children.cycles-pp.file_free_rcu
17.31 -0.8 16.51 perf-profile.children.cycles-pp.close
9.47 -0.7 8.74 perf-profile.children.cycles-pp.link_path_walk
6.37 -0.7 5.68 perf-profile.children.cycles-pp.do_dentry_open
5.48 -0.6 4.90 perf-profile.children.cycles-pp.__fput
6.49 -0.4 6.08 perf-profile.children.cycles-pp.selinux_inode_permission
6.95 -0.3 6.60 perf-profile.children.cycles-pp.security_inode_permission
3.48 -0.3 3.15 perf-profile.children.cycles-pp.lookup_fast
2.38 -0.3 2.09 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.74 ± 5% -0.3 1.46 perf-profile.children.cycles-pp.inode_permission
0.94 -0.3 0.66 perf-profile.children.cycles-pp.__close_fd
3.10 -0.2 2.86 perf-profile.children.cycles-pp.__x64_sys_close
2.27 ± 2% -0.2 2.04 ± 2% perf-profile.children.cycles-pp.dput
2.47 -0.2 2.24 perf-profile.children.cycles-pp.walk_component
2.21 ± 2% -0.2 1.98 perf-profile.children.cycles-pp.___might_sleep
2.24 -0.2 2.02 perf-profile.children.cycles-pp.syscall_return_via_sysret
9.32 -0.2 9.12 perf-profile.children.cycles-pp.task_work_run
1.17 -0.2 0.97 perf-profile.children.cycles-pp.ima_file_check
1.99 -0.2 1.80 perf-profile.children.cycles-pp.__inode_security_revalidate
2.92 -0.2 2.73 perf-profile.children.cycles-pp.security_file_open
0.56 -0.2 0.38 perf-profile.children.cycles-pp.selinux_task_getsecid
0.69 -0.2 0.51 perf-profile.children.cycles-pp.security_task_getsecid
2.40 -0.2 2.24 perf-profile.children.cycles-pp.security_file_alloc
0.20 ± 4% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.try_module_get
1.44 -0.1 1.31 perf-profile.children.cycles-pp.__might_sleep
10.01 -0.1 9.88 perf-profile.children.cycles-pp.exit_to_usermode_loop
1.46 -0.1 1.33 perf-profile.children.cycles-pp.inode_security_rcu
1.00 -0.1 0.87 perf-profile.children.cycles-pp._cond_resched
5.20 -0.1 5.08 perf-profile.children.cycles-pp.getname_flags
1.05 -0.1 0.93 perf-profile.children.cycles-pp.__fsnotify_parent
1.42 -0.1 1.30 perf-profile.children.cycles-pp.fsnotify
1.41 -0.1 1.29 perf-profile.children.cycles-pp.__alloc_fd
2.29 -0.1 2.18 perf-profile.children.cycles-pp.selinux_file_open
0.64 -0.1 0.53 perf-profile.children.cycles-pp.__check_heap_object
1.42 ± 2% -0.1 1.31 ± 2% perf-profile.children.cycles-pp.irq_exit
1.80 -0.1 1.69 perf-profile.children.cycles-pp.terminate_walk
0.33 -0.1 0.23 perf-profile.children.cycles-pp.file_ra_state_init
0.65 ± 3% -0.1 0.56 perf-profile.children.cycles-pp.generic_permission
1.23 -0.1 1.15 perf-profile.children.cycles-pp.fput_many
0.83 ± 3% -0.1 0.74 perf-profile.children.cycles-pp._raw_spin_lock_irq
0.53 -0.1 0.45 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
0.58 ± 5% -0.1 0.51 ± 2% perf-profile.children.cycles-pp.mntput_no_expire
0.75 -0.1 0.69 perf-profile.children.cycles-pp.lockref_put_or_lock
1.03 -0.1 0.97 perf-profile.children.cycles-pp.path_init
0.84 -0.1 0.78 perf-profile.children.cycles-pp.task_work_add
0.14 ± 3% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.ima_file_free
0.26 ± 7% -0.1 0.21 ± 2% perf-profile.children.cycles-pp.path_get
0.83 -0.0 0.78 perf-profile.children.cycles-pp.__slab_free
0.62 -0.0 0.58 perf-profile.children.cycles-pp.percpu_counter_add_batch
0.67 -0.0 0.63 perf-profile.children.cycles-pp.rcu_segcblist_enqueue
0.20 ± 11% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.mntget
0.22 ± 4% -0.0 0.19 ± 3% perf-profile.children.cycles-pp.get_unused_fd_flags
0.10 ± 14% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.close@plt
0.34 ± 2% -0.0 0.31 perf-profile.children.cycles-pp.lockref_get
0.24 -0.0 0.21 ± 2% perf-profile.children.cycles-pp.__x64_sys_open
0.11 ± 8% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.putname
0.18 ± 2% -0.0 0.16 perf-profile.children.cycles-pp.should_failslab
0.55 -0.0 0.53 perf-profile.children.cycles-pp.selinux_file_alloc_security
0.21 ± 3% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.expand_files
0.07 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.module_put
0.12 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.security_file_free
0.17 -0.0 0.15 ± 3% perf-profile.children.cycles-pp.find_next_zero_bit
0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.memset
0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__mutex_init
0.10 -0.0 0.09 perf-profile.children.cycles-pp.mntput
0.12 +0.0 0.13 ± 3% perf-profile.children.cycles-pp.__list_del_entry_valid
0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.discard_slab
0.08 +0.0 0.10 ± 4% perf-profile.children.cycles-pp.kick_process
0.04 ± 57% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.native_irq_return_iret
0.12 ± 4% +0.0 0.15 ± 3% perf-profile.children.cycles-pp.blkcg_maybe_throttle_current
1.31 +0.0 1.34 perf-profile.children.cycles-pp.memset_erms
0.40 +0.0 0.44 perf-profile.children.cycles-pp.lockref_get_not_dead
0.07 ± 6% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.rcu_segcblist_pend_cbs
0.01 ±173% +0.0 0.06 ± 11% perf-profile.children.cycles-pp.native_write_msr
0.27 ± 6% +0.1 0.33 ± 10% perf-profile.children.cycles-pp.ktime_get
0.16 ± 5% +0.1 0.22 ± 4% perf-profile.children.cycles-pp.get_partial_node
0.01 ±173% +0.1 0.07 ± 30% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.____fput
0.05 +0.1 0.15 ± 3% perf-profile.children.cycles-pp.__mod_zone_page_state
0.12 ± 16% +0.1 0.23 ± 10% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.05 ± 8% +0.1 0.16 ± 2% perf-profile.children.cycles-pp.legitimize_links
1.71 +0.1 1.83 perf-profile.children.cycles-pp.__call_rcu
3.40 +0.1 3.52 perf-profile.children.cycles-pp.strncpy_from_user
0.30 ± 2% +0.1 0.43 ± 2% perf-profile.children.cycles-pp.locks_remove_posix
0.72 +0.1 0.86 perf-profile.children.cycles-pp.__virt_addr_valid
0.08 +0.1 0.23 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.84 ± 9% +0.2 1.02 ± 8% perf-profile.children.cycles-pp.hrtimer_interrupt
0.15 ± 3% +0.2 0.38 perf-profile.children.cycles-pp.check_stack_object
0.65 +0.4 1.05 perf-profile.children.cycles-pp.setup_object_debug
0.36 +0.4 0.76 perf-profile.children.cycles-pp.get_page_from_freelist
1.90 +0.4 2.31 perf-profile.children.cycles-pp.__check_object_size
0.39 +0.4 0.84 perf-profile.children.cycles-pp.__alloc_pages_nodemask
39.52 +0.7 40.22 perf-profile.children.cycles-pp.do_filp_open
38.84 +0.8 39.59 perf-profile.children.cycles-pp.path_openat
31.27 +0.8 32.05 perf-profile.children.cycles-pp.rcu_core
31.26 +0.8 32.03 perf-profile.children.cycles-pp.rcu_do_batch
31.31 +0.8 32.09 perf-profile.children.cycles-pp.__softirqentry_text_start
29.90 +0.9 30.79 perf-profile.children.cycles-pp.run_ksoftirqd
29.93 +0.9 30.84 perf-profile.children.cycles-pp.smpboot_thread_fn
29.94 +0.9 30.85 perf-profile.children.cycles-pp.kthread
29.94 +0.9 30.85 perf-profile.children.cycles-pp.ret_from_fork
10.63 ± 2% +1.0 11.61 perf-profile.children.cycles-pp.kmem_cache_free
8.45 ± 3% +1.1 9.57 perf-profile.children.cycles-pp._raw_spin_lock
7.96 ± 3% +1.1 9.11 perf-profile.children.cycles-pp.__free_pages_ok
7.93 ± 3% +1.2 9.08 perf-profile.children.cycles-pp.free_one_page
8.19 ± 3% +1.2 9.36 perf-profile.children.cycles-pp.put_cpu_partial
8.15 ± 3% +1.2 9.32 perf-profile.children.cycles-pp.unfreeze_partials
7.59 ± 3% +1.3 8.89 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
11.12 +1.7 12.83 perf-profile.children.cycles-pp.rcu_cblist_dequeue
2.88 +1.7 4.61 perf-profile.children.cycles-pp.new_slab
8.73 +1.8 10.54 perf-profile.children.cycles-pp.kmem_cache_alloc
3.34 +2.2 5.56 perf-profile.children.cycles-pp.___slab_alloc
3.39 +2.3 5.65 perf-profile.children.cycles-pp.__slab_alloc
11.45 +2.7 14.12 perf-profile.children.cycles-pp.alloc_empty_file
10.98 +2.7 13.70 perf-profile.children.cycles-pp.__alloc_file
10.53 -2.1 8.47 perf-profile.self.cycles-pp.file_free_rcu
2.37 -0.3 2.05 perf-profile.self.cycles-pp.kmem_cache_alloc
1.43 -0.3 1.17 perf-profile.self.cycles-pp.strncpy_from_user
0.50 -0.2 0.26 ± 2% perf-profile.self.cycles-pp.__close_fd
2.22 -0.2 2.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
2.07 ± 2% -0.2 1.86 perf-profile.self.cycles-pp.___might_sleep
1.01 ± 7% -0.2 0.82 perf-profile.self.cycles-pp.inode_permission
1.13 -0.2 0.96 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
3.10 -0.2 2.93 perf-profile.self.cycles-pp.selinux_inode_permission
0.52 -0.2 0.35 perf-profile.self.cycles-pp.selinux_task_getsecid
1.33 -0.1 1.18 ± 2% perf-profile.self.cycles-pp.do_dentry_open
1.55 -0.1 1.42 perf-profile.self.cycles-pp.kmem_cache_free
1.55 -0.1 1.43 perf-profile.self.cycles-pp.link_path_walk
0.17 ± 4% -0.1 0.04 ± 57% perf-profile.self.cycles-pp.try_module_get
1.35 -0.1 1.24 perf-profile.self.cycles-pp.fsnotify
0.96 -0.1 0.85 perf-profile.self.cycles-pp.__fsnotify_parent
1.25 ± 2% -0.1 1.14 perf-profile.self.cycles-pp.__might_sleep
0.87 ± 2% -0.1 0.78 ± 2% perf-profile.self.cycles-pp.lookup_fast
0.79 ± 2% -0.1 0.70 ± 2% perf-profile.self.cycles-pp.do_syscall_64
1.02 -0.1 0.93 perf-profile.self.cycles-pp.do_sys_open
0.30 -0.1 0.22 perf-profile.self.cycles-pp.file_ra_state_init
0.58 -0.1 0.50 perf-profile.self.cycles-pp.__check_heap_object
0.80 ± 3% -0.1 0.72 perf-profile.self.cycles-pp._raw_spin_lock_irq
0.59 ± 2% -0.1 0.52 ± 2% perf-profile.self.cycles-pp.generic_permission
1.17 -0.1 1.09 perf-profile.self.cycles-pp.__fput
0.68 -0.1 0.60 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.84 -0.1 0.77 perf-profile.self.cycles-pp.__inode_security_revalidate
0.73 -0.1 0.66 perf-profile.self.cycles-pp.task_work_add
0.39 -0.1 0.32 ± 2% perf-profile.self.cycles-pp.rcu_all_qs
0.54 ± 5% -0.1 0.47 ± 2% perf-profile.self.cycles-pp.mntput_no_expire
0.50 ± 6% -0.1 0.44 ± 4% perf-profile.self.cycles-pp.dput
0.46 -0.1 0.40 perf-profile.self.cycles-pp._cond_resched
0.93 -0.1 0.88 ± 2% perf-profile.self.cycles-pp.close
0.11 ± 4% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.ima_file_free
0.83 -0.1 0.77 perf-profile.self.cycles-pp.__slab_free
0.61 -0.1 0.56 perf-profile.self.cycles-pp.__alloc_fd
0.87 -0.1 0.82 perf-profile.self.cycles-pp._raw_spin_lock
0.69 -0.0 0.64 perf-profile.self.cycles-pp.lockref_put_or_lock
0.67 -0.0 0.62 perf-profile.self.cycles-pp.rcu_segcblist_enqueue
0.46 -0.0 0.41 perf-profile.self.cycles-pp.do_filp_open
0.56 -0.0 0.51 perf-profile.self.cycles-pp.percpu_counter_add_batch
1.05 ± 2% -0.0 1.01 perf-profile.self.cycles-pp.path_openat
0.28 ± 2% -0.0 0.24 ± 4% perf-profile.self.cycles-pp.security_file_open
0.94 ± 2% -0.0 0.90 perf-profile.self.cycles-pp.open64
0.21 ± 6% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.get_unused_fd_flags
0.17 ± 13% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.mntget
0.39 -0.0 0.35 perf-profile.self.cycles-pp.fput_many
0.37 -0.0 0.34 perf-profile.self.cycles-pp.getname_flags
0.43 -0.0 0.41 perf-profile.self.cycles-pp.path_init
0.33 -0.0 0.30 ± 2% perf-profile.self.cycles-pp.lockref_get
0.20 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.filp_close
0.52 -0.0 0.50 perf-profile.self.cycles-pp.selinux_file_alloc_security
0.22 -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__x64_sys_open
0.28 ± 2% -0.0 0.26 perf-profile.self.cycles-pp.inode_security_rcu
0.19 ± 4% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.expand_files
0.09 ± 9% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.putname
0.10 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.security_file_free
0.15 ± 3% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.find_next_zero_bit
0.12 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.nd_jump_root
0.08 -0.0 0.07 perf-profile.self.cycles-pp.fd_install
0.06 -0.0 0.05 perf-profile.self.cycles-pp.path_get
0.12 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.__list_del_entry_valid
0.07 ± 5% +0.0 0.09 perf-profile.self.cycles-pp.get_partial_node
0.12 ± 3% +0.0 0.14 ± 3% perf-profile.self.cycles-pp.discard_slab
0.11 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.blkcg_maybe_throttle_current
0.06 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.kick_process
0.28 ± 2% +0.0 0.30 perf-profile.self.cycles-pp.__x64_sys_close
0.04 ± 57% +0.0 0.07 ± 7% perf-profile.self.cycles-pp.native_irq_return_iret
0.39 +0.0 0.42 perf-profile.self.cycles-pp.lockref_get_not_dead
0.53 +0.0 0.57 perf-profile.self.cycles-pp.exit_to_usermode_loop
0.06 ± 7% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.rcu_segcblist_pend_cbs
0.01 ±173% +0.0 0.06 ± 11% perf-profile.self.cycles-pp.native_write_msr
0.28 +0.0 0.33 perf-profile.self.cycles-pp.terminate_walk
0.27 ± 5% +0.1 0.32 ± 11% perf-profile.self.cycles-pp.ktime_get
0.00 +0.1 0.05 ± 9% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.43 ± 5% +0.1 0.49 ± 3% perf-profile.self.cycles-pp.security_inode_permission
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.____fput
0.00 +0.1 0.08 perf-profile.self.cycles-pp.__alloc_pages_nodemask
0.05 +0.1 0.15 ± 3% perf-profile.self.cycles-pp.__mod_zone_page_state
0.25 ± 3% +0.1 0.35 ± 2% perf-profile.self.cycles-pp.locks_remove_posix
0.12 ± 17% +0.1 0.23 ± 11% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.14 +0.1 0.25 perf-profile.self.cycles-pp.setup_object_debug
0.93 +0.1 1.04 perf-profile.self.cycles-pp.__call_rcu
0.46 +0.1 0.58 perf-profile.self.cycles-pp.__check_object_size
0.13 ± 3% +0.1 0.26 ± 3% perf-profile.self.cycles-pp.get_page_from_freelist
0.00 +0.1 0.13 ± 3% perf-profile.self.cycles-pp.legitimize_links
0.68 +0.1 0.81 perf-profile.self.cycles-pp.__virt_addr_valid
0.40 ± 11% +0.2 0.58 perf-profile.self.cycles-pp.may_open
0.12 +0.2 0.31 perf-profile.self.cycles-pp.check_stack_object
0.90 +0.3 1.22 perf-profile.self.cycles-pp.task_work_run
0.30 +0.4 0.73 perf-profile.self.cycles-pp.___slab_alloc
2.88 +0.7 3.59 perf-profile.self.cycles-pp.__alloc_file
2.27 +1.1 3.38 perf-profile.self.cycles-pp.new_slab
7.59 ± 3% +1.3 8.89 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
11.04 +1.7 12.73 perf-profile.self.cycles-pp.rcu_cblist_dequeue



will-it-scale.per_process_ops

385000 +-+----------------------------------------------------------------+
| .+ .+. .+ |
380000 +-+.++ : +.++.+ +.++.+ : |
375000 +-+ : + : |
| +.+ .+.++.+ ++. .+.+.+ .+.+.+ .+.|
370000 +-+ +.+ +.+.++ + + |
365000 +-+ |
| |
360000 +-+ |
355000 +-+ |
| |
350000 +-+ |
345000 O-+ OO O O OO O O OO O O O |
| O O O O O OO O O |
340000 +-+----------------------------------------------------------------+


will-it-scale.workload

7.4e+07 +-+---------------------------------------------------------------+
| .+ .+ .+ |
7.3e+07 +-++.+ : ++.+.+ +.+.++ : |
7.2e+07 +-+ : + : |
| ++. .++.+.+ +.+ +.+.+.++.+.+. +.|
7.1e+07 +-+ +.+ +.+.+.+ + |
7e+07 +-+ |
| |
6.9e+07 +-+ |
6.8e+07 +-+ |
| |
6.7e+07 +-+ O O |
6.6e+07 O-OO O O OO O OO O O O O O O O |
| O O O O |
6.5e+07 +-+---------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (45.86 kB)
config-5.3.0-10170-ge0e7df8d5b71b (203.56 kB)
job-script (7.36 kB)
job.yaml (4.98 kB)
reproduce (320.00 B)

2019-09-25 15:13:48

by Konstantin Khlebnikov

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 23/09/2019 17.52, Tejun Heo wrote:
> Hello, Konstantin.
>
> On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
>> With vm.dirty_write_behind=1 or 2, files are written even faster and
>
> Is the faster speed reproducible? I don't quite understand why this
> would be.

Writing to disk simply starts earlier.

>
>> during copying amount of dirty memory always stays around at 16MiB.
>
> The following is the test part of a slightly modified version of your
> test script which should run fine on any modern systems.
>
> for mode in 0 1; do
>     if [ $mode == 0 ]; then
>         prefix=''
>     else
>         prefix='systemd-run --user --scope -p MemoryMax=64M'
>     fi
>
>     echo COPY
>     time $prefix cp -r dummy copy
>
>     grep Dirty /proc/meminfo
>
>     echo SYNC
>     time sync
>
>     rm -fr copy
> done
>
> and the result looks like the following.
>
> $ ./test-writebehind.sh
> SIZE
> 3.3G dummy
> COPY
>
> real 0m2.859s
> user 0m0.015s
> sys 0m2.843s
> Dirty: 3416780 kB
> SYNC
>
> real 0m34.008s
> user 0m0.000s
> sys 0m0.008s
> COPY
> Running scope as unit: run-r69dca5326a9a435d80e036435ff9e1da.scope
>
> real 0m32.267s
> user 0m0.032s
> sys 0m4.186s
> Dirty: 14304 kB
> SYNC
>
> real 0m1.783s
> user 0m0.000s
> sys 0m0.006s
>
> This is how we are solving the massive dirtier problem. It's easy,
> works pretty well and can easily be tailored to the specific
> requirements.
>
> Generic write-behind would definitely have other benefits and also a
> bunch of regression possibilities. I'm not trying to say that
> write-behind isn't a good idea but it'd be useful to consider that a
> good portion of the benefits can already be obtained fairly easily.
>

I'm afraid this could end badly if every simple task like file copying
requires its own systemd job and container with manual tuning.

2019-09-25 15:29:33

by Jens Axboe

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 9/20/19 5:10 PM, Linus Torvalds wrote:
> On Fri, Sep 20, 2019 at 4:05 PM Linus Torvalds
> <[email protected]> wrote:
>>
>>
>> Now, I hear you say "those are so small these days that it doesn't
>> matter". And maybe you're right. But particularly for slow media,
>> triggering good streaming write behavior has been a problem in the
>> past.
>
> Which reminds me: the writebehind trigger should likely be tied to the
> estimate of the bdi write speed.
>
> We _do_ have that avg_write_bandwidth thing in the bdi_writeback
> structure, it sounds like a potentially good idea to try to use that
> to estimate when to do writebehind.
>
> No?

I really like the feature, and agree it should be tied to the bdi write
speed. How about making the tunable the acceptable time of write-behind
dirty data? E.g. if write_behind_msec is 1000, allow 1s of pending dirty
data before starting writeback.

--
Jens Axboe

2019-09-25 15:59:31

by Konstantin Khlebnikov

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 23/09/2019 18.36, Jens Axboe wrote:
> On 9/20/19 5:10 PM, Linus Torvalds wrote:
>> On Fri, Sep 20, 2019 at 4:05 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>>
>>> Now, I hear you say "those are so small these days that it doesn't
>>> matter". And maybe you're right. But particularly for slow media,
>>> triggering good streaming write behavior has been a problem in the
>>> past.
>>
>> Which reminds me: the writebehind trigger should likely be tied to the
>> estimate of the bdi write speed.
>>
>> We _do_ have that avg_write_bandwidth thing in the bdi_writeback
>> structure, it sounds like a potentially good idea to try to use that
>> to estimate when to do writebehind.
>>
>> No?
>
> I really like the feature, and agree it should be tied to the bdi write
> speed. How about making the tunable the acceptable time of write-behind
> dirty data? E.g. if write_behind_msec is 1000, allow 1s of pending dirty
> data before starting writeback.
>

I haven't dug into it yet.

But IIRR writeback speed estimation has some problems:

There is no "slow start" - initial speed is 100MiB/s.
This is especially bad for slow usb disks - right after plugging
we'll accumulate too much dirty cache before starting writeback.

And I've seen problems with cgroup writeback:
each cgroup has its own estimate, which doesn't work well for short-lived cgroups.
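The slow-start problem above can be illustrated with a toy estimator. This is an assumption-laden sketch, not the kernel's algorithm: the conservative 1 MiB/s initial value and the 1/8 EWMA weight are illustrative choices showing how an estimate that starts low converges upward from real samples, instead of starting at an optimistic 100 MiB/s and over-dirtying a slow disk.

```python
class BandwidthEstimator:
    """Toy write-bandwidth estimator with a conservative slow start."""

    def __init__(self, init_bw=1 << 20):
        # Start at 1 MiB/s until real completions arrive, so a freshly
        # plugged slow USB disk never gets a large window by default.
        self.bw = float(init_bw)

    def update(self, bytes_written, elapsed_sec):
        measured = bytes_written / elapsed_sec
        # Exponentially weighted moving average with weight 1/8; the
        # estimate rises toward the measured rate as samples come in.
        self.bw += (measured - self.bw) / 8
        return self.bw

est = BandwidthEstimator()
for _ in range(8):           # eight 1-second samples at 4 MiB/s
    bw = est.update(4 << 20, 1.0)
print(bw)                    # converging toward 4 MiB/s from below
```

A per-bdi estimator seeded this way would also sidestep the short-lived cgroup issue, since the device-level estimate outlives any individual cgroup.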

2019-09-25 19:11:40

by Tejun Heo

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

Hello, Konstantin.

On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> With vm.dirty_write_behind=1 or 2, files are written even faster and

Is the faster speed reproducible? I don't quite understand why this
would be.

> during copying amount of dirty memory always stays around at 16MiB.

The following is the test part of a slightly modified version of your
test script which should run fine on any modern systems.

for mode in 0 1; do
    if [ $mode == 0 ]; then
        prefix=''
    else
        prefix='systemd-run --user --scope -p MemoryMax=64M'
    fi

    echo COPY
    time $prefix cp -r dummy copy

    grep Dirty /proc/meminfo

    echo SYNC
    time sync

    rm -fr copy
done

and the result looks like the following.

$ ./test-writebehind.sh
SIZE
3.3G dummy
COPY

real 0m2.859s
user 0m0.015s
sys 0m2.843s
Dirty: 3416780 kB
SYNC

real 0m34.008s
user 0m0.000s
sys 0m0.008s
COPY
Running scope as unit: run-r69dca5326a9a435d80e036435ff9e1da.scope

real 0m32.267s
user 0m0.032s
sys 0m4.186s
Dirty: 14304 kB
SYNC

real 0m1.783s
user 0m0.000s
sys 0m0.006s

This is how we are solving the massive dirtier problem. It's easy,
works pretty well and can easily be tailored to the specific
requirements.

Generic write-behind would definitely have other benefits and also a
bunch of regression possibilities. I'm not trying to say that
write-behind isn't a good idea but it'd be useful to consider that a
good portion of the benefits can already be obtained fairly easily.

Thanks.

--
tejun

2019-09-25 19:21:24

by Tejun Heo

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

Hello,

On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> On 23/09/2019 17.52, Tejun Heo wrote:
> >Hello, Konstantin.
> >
> >On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> >>With vm.dirty_write_behind=1 or 2, files are written even faster and
> >
> >Is the faster speed reproducible? I don't quite understand why this
> >would be.
>
> Writing to disk simply starts earlier.

I see.

> >Generic write-behind would definitely have other benefits and also a
> >bunch of regression possibilities. I'm not trying to say that
> >write-behind isn't a good idea but it'd be useful to consider that a
> >good portion of the benefits can already be obtained fairly easily.
> >
>
> I'm afraid this could end badly if every simple task like file copying
> requires its own systemd job and container with manual tuning.

At least the write window size part of it is pretty easy - the range
of acceptable values is fairly wide - and setting up a cgroup and
running a command in it isn't that expensive. It's not like these
need full-on containers. That said, yes, there sure are benefits to
the kernel being able to detect and handle these conditions
automagically.

Thanks.

--
tejun

2019-09-25 23:07:27

by Konstantin Khlebnikov

Subject: Re: [mm] e0e7df8d5b: will-it-scale.per_process_ops -7.3% regression

On Mon, Sep 23, 2019 at 3:37 AM kernel test robot <[email protected]> wrote:
>
> Greeting,
>
> FYI, we noticed a -7.3% regression of will-it-scale.per_process_ops due to commit:

Most likely this is caused by the change in struct file layout after adding the new field.

>
>
> commit: e0e7df8d5b71bf59ad93fe75e662c929b580d805 ("[PATCH v2] mm: implement write-behind policy for sequential file writes")
> url: https://github.com/0day-ci/linux/commits/Konstantin-Khlebnikov/mm-implement-write-behind-policy-for-sequential-file-writes/20190920-155606
>
>
> in testcase: will-it-scale
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> nr_task: 100%
> mode: process
> test: open1
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-csl-2ap4/open1/will-it-scale
>
> commit:
> 574cc45397 (" drm main pull for 5.4-rc1")
> e0e7df8d5b ("mm: implement write-behind policy for sequential file writes")
>
> 574cc4539762561d e0e7df8d5b71bf59ad93fe75e66
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 370456 -7.3% 343238 will-it-scale.per_process_ops
> 71127653 -7.3% 65901758 will-it-scale.workload
> 828565 ± 23% +66.8% 1381984 ± 23% cpuidle.C1.time
> 1499 +1.1% 1515 turbostat.Avg_MHz
> 163498 ± 5% +26.4% 206691 ± 4% slabinfo.filp.active_slabs
> 163498 ± 5% +26.4% 206691 ± 4% slabinfo.filp.num_slabs
> 39055 ± 2% +17.1% 45720 ± 5% meminfo.Inactive
> 38615 ± 2% +17.3% 45291 ± 5% meminfo.Inactive(anon)
> 51382 ± 3% +19.6% 61469 ± 7% meminfo.Mapped
> 5163010 ± 2% +12.7% 5819765 ± 3% meminfo.Memused
> 2840181 ± 3% +22.5% 3478003 ± 5% meminfo.SUnreclaim
> 2941874 ± 3% +21.7% 3579791 ± 5% meminfo.Slab
> 67755 ± 5% +23.8% 83884 ± 3% meminfo.max_used_kB
> 79719901 +17.3% 93512842 numa-numastat.node0.local_node
> 79738690 +17.3% 93533079 numa-numastat.node0.numa_hit
> 81987497 +16.6% 95625946 numa-numastat.node1.local_node
> 82018695 +16.6% 95652480 numa-numastat.node1.numa_hit
> 82693483 +15.8% 95762465 numa-numastat.node2.local_node
> 82705924 +15.8% 95789007 numa-numastat.node2.numa_hit
> 80329941 +17.1% 94048289 numa-numastat.node3.local_node
> 80361116 +17.1% 94068512 numa-numastat.node3.numa_hit
> 9678 ± 2% +17.1% 11334 ± 5% proc-vmstat.nr_inactive_anon
> 13001 ± 3% +19.2% 15503 ± 7% proc-vmstat.nr_mapped
> 738232 ± 4% +18.5% 875062 ± 2% proc-vmstat.nr_slab_unreclaimable
> 9678 ± 2% +17.1% 11334 ± 5% proc-vmstat.nr_zone_inactive_anon
> 2391 ± 92% -84.5% 369.50 ± 46% proc-vmstat.numa_hint_faults
> 3.243e+08 +16.8% 3.789e+08 proc-vmstat.numa_hit
> 3.242e+08 +16.8% 3.788e+08 proc-vmstat.numa_local
> 1.296e+09 +16.8% 1.514e+09 proc-vmstat.pgalloc_normal
> 1.296e+09 +16.8% 1.514e+09 proc-vmstat.pgfree
> 862.61 ± 5% +37.7% 1188 ± 5% sched_debug.cfs_rq:/.exec_clock.stddev
> 229663 ± 62% +113.3% 489907 ± 29% sched_debug.cfs_rq:/.load.max
> 491.04 ± 4% -9.5% 444.29 ± 7% sched_debug.cfs_rq:/.nr_spread_over.min
> 229429 ± 62% +113.4% 489618 ± 29% sched_debug.cfs_rq:/.runnable_weight.max
> -1959962 +36.2% -2669681 sched_debug.cfs_rq:/.spread0.min
> 1416008 ± 2% -13.3% 1227494 ± 5% sched_debug.cpu.avg_idle.avg
> 1240763 ± 8% -28.2% 891028 ± 18% sched_debug.cpu.avg_idle.stddev
> 352361 ± 6% -29.6% 248105 ± 25% sched_debug.cpu.max_idle_balance_cost.stddev
> -20.00 +51.0% -30.21 sched_debug.cpu.nr_uninterruptible.min
> 6618 ± 10% -20.8% 5240 ± 8% sched_debug.cpu.ttwu_count.max
> 1452719 ± 4% +7.2% 1557262 ± 3% numa-meminfo.node0.MemUsed
> 797565 ± 2% +20.8% 963538 ± 2% numa-meminfo.node0.SUnreclaim
> 835343 ± 3% +19.6% 998867 ± 2% numa-meminfo.node0.Slab
> 831114 ± 2% +20.1% 998248 ± 2% numa-meminfo.node1.SUnreclaim
> 848052 +19.8% 1016069 ± 2% numa-meminfo.node1.Slab
> 1441558 ± 6% +15.7% 1668466 ± 3% numa-meminfo.node2.MemUsed
> 879835 ± 2% +20.4% 1059441 numa-meminfo.node2.SUnreclaim
> 901359 ± 3% +20.3% 1084727 ± 2% numa-meminfo.node2.Slab
> 1446041 ± 5% +15.5% 1669477 ± 3% numa-meminfo.node3.MemUsed
> 899442 ± 5% +23.0% 1106354 numa-meminfo.node3.SUnreclaim
> 924903 ± 5% +22.1% 1129709 numa-meminfo.node3.Slab
> 198945 +19.8% 238298 ± 2% numa-vmstat.node0.nr_slab_unreclaimable
> 40181885 +17.3% 47129598 numa-vmstat.node0.numa_hit
> 40163521 +17.3% 47110122 numa-vmstat.node0.numa_local
> 208512 +20.9% 252000 ± 2% numa-vmstat.node1.nr_slab_unreclaimable
> 41144466 +16.7% 48021716 numa-vmstat.node1.numa_hit
> 41027051 +16.8% 47908675 numa-vmstat.node1.numa_local
> 220763 ± 2% +21.9% 269115 ± 2% numa-vmstat.node2.nr_slab_unreclaimable
> 41437805 +16.2% 48167791 numa-vmstat.node2.numa_hit
> 41338581 +16.2% 48054485 numa-vmstat.node2.numa_local
> 225216 ± 2% +24.7% 280851 ± 2% numa-vmstat.node3.nr_slab_unreclaimable
> 40385721 +16.9% 47195289 numa-vmstat.node3.numa_hit
> 40268228 +16.9% 47088405 numa-vmstat.node3.numa_local
> 77.00 ± 29% +494.8% 458.00 ±110% interrupts.CPU10.RES:Rescheduling_interrupts
> 167.25 ± 65% +347.8% 749.00 ± 85% interrupts.CPU103.RES:Rescheduling_interrupts
> 136.50 ± 42% +309.2% 558.50 ± 85% interrupts.CPU107.RES:Rescheduling_interrupts
> 132.50 ± 26% +637.5% 977.25 ± 50% interrupts.CPU109.RES:Rescheduling_interrupts
> 212.50 ± 51% -65.2% 74.00 ± 9% interrupts.CPU115.RES:Rescheduling_interrupts
> 270.25 ± 20% -77.2% 61.50 ± 10% interrupts.CPU121.RES:Rescheduling_interrupts
> 184.00 ± 50% -57.5% 78.25 ± 51% interrupts.CPU128.RES:Rescheduling_interrupts
> 85.25 ± 38% +911.4% 862.25 ±135% interrupts.CPU137.RES:Rescheduling_interrupts
> 72.25 ± 6% +114.2% 154.75 ± 25% interrupts.CPU147.RES:Rescheduling_interrupts
> 415.00 ± 75% -69.8% 125.25 ± 59% interrupts.CPU15.RES:Rescheduling_interrupts
> 928.25 ± 93% -89.8% 94.50 ± 50% interrupts.CPU182.RES:Rescheduling_interrupts
> 359.75 ± 76% -58.8% 148.25 ± 85% interrupts.CPU19.RES:Rescheduling_interrupts
> 95.75 ± 30% +103.9% 195.25 ± 48% interrupts.CPU45.RES:Rescheduling_interrupts
> 60.25 ± 9% +270.5% 223.25 ± 93% interrupts.CPU83.RES:Rescheduling_interrupts
> 906.75 ±136% -90.5% 85.75 ± 36% interrupts.CPU85.RES:Rescheduling_interrupts
> 199.25 ± 25% -52.1% 95.50 ± 43% interrupts.CPU90.RES:Rescheduling_interrupts
> 5192 ± 34% +41.5% 7347 ± 24% interrupts.CPU95.NMI:Non-maskable_interrupts
> 5192 ± 34% +41.5% 7347 ± 24% interrupts.CPU95.PMI:Performance_monitoring_interrupts
> 1.75 +26.1% 2.20 perf-stat.i.MPKI
> 7.975e+10 -6.8% 7.435e+10 perf-stat.i.branch-instructions
> 3.782e+08 -5.9% 3.558e+08 perf-stat.i.branch-misses
> 75.36 +0.9 76.29 perf-stat.i.cache-miss-rate%
> 5.484e+08 +18.8% 6.515e+08 perf-stat.i.cache-misses
> 7.276e+08 +17.3% 8.539e+08 perf-stat.i.cache-references
> 1.37 +8.2% 1.48 perf-stat.i.cpi
> 5.701e+11 +0.7% 5.744e+11 perf-stat.i.cpu-cycles
> 1040 -15.2% 882.10 perf-stat.i.cycles-between-cache-misses
> 1.253e+11 -7.2% 1.163e+11 perf-stat.i.dTLB-loads
> 7.443e+10 -7.2% 6.904e+10 perf-stat.i.dTLB-stores
> 3.336e+08 +12.6% 3.755e+08 perf-stat.i.iTLB-load-misses
> 5004598 ± 7% -60.9% 1954451 ± 6% perf-stat.i.iTLB-loads
> 4.175e+11 -6.9% 3.887e+11 perf-stat.i.instructions
> 1251 -17.3% 1035 perf-stat.i.instructions-per-iTLB-miss
> 0.73 -7.6% 0.68 perf-stat.i.ipc
> 19.77 -1.5 18.31 perf-stat.i.node-load-miss-rate%
> 5003202 ± 2% +16.5% 5829006 perf-stat.i.node-load-misses
> 20521507 +28.1% 26283838 perf-stat.i.node-loads
> 1.84 +0.4 2.28 perf-stat.i.node-store-miss-rate%
> 1469703 +29.0% 1895783 perf-stat.i.node-store-misses
> 78304054 +4.0% 81463725 perf-stat.i.node-stores
> 1.74 +26.1% 2.20 perf-stat.overall.MPKI
> 75.37 +0.9 76.30 perf-stat.overall.cache-miss-rate%
> 1.37 +8.2% 1.48 perf-stat.overall.cpi
> 1039 -15.2% 881.41 perf-stat.overall.cycles-between-cache-misses
> 1251 -17.3% 1035 perf-stat.overall.instructions-per-iTLB-miss
> 0.73 -7.6% 0.68 perf-stat.overall.ipc
> 19.59 -1.5 18.14 perf-stat.overall.node-load-miss-rate%
> 1.84 +0.4 2.27 perf-stat.overall.node-store-miss-rate%
> 7.943e+10 -6.8% 7.404e+10 perf-stat.ps.branch-instructions
> 3.767e+08 -5.9% 3.543e+08 perf-stat.ps.branch-misses
> 5.465e+08 +18.8% 6.492e+08 perf-stat.ps.cache-misses
> 7.25e+08 +17.4% 8.508e+08 perf-stat.ps.cache-references
> 5.68e+11 +0.7% 5.722e+11 perf-stat.ps.cpu-cycles
> 1.248e+11 -7.2% 1.158e+11 perf-stat.ps.dTLB-loads
> 7.413e+10 -7.3% 6.874e+10 perf-stat.ps.dTLB-stores
> 3.322e+08 +12.5% 3.739e+08 perf-stat.ps.iTLB-load-misses
> 4986239 ± 7% -61.0% 1946378 ± 6% perf-stat.ps.iTLB-loads
> 4.158e+11 -6.9% 3.87e+11 perf-stat.ps.instructions
> 4982520 ± 2% +16.5% 5803884 perf-stat.ps.node-load-misses
> 20448588 +28.1% 26201547 perf-stat.ps.node-loads
> 1463675 +29.0% 1887791 perf-stat.ps.node-store-misses
> 77979119 +4.0% 81107191 perf-stat.ps.node-stores
> 1.25e+14 -6.8% 1.165e+14 perf-stat.total.instructions
> 10.11 -1.9 8.21 perf-profile.calltrace.cycles-pp.file_free_rcu.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
> 17.28 -0.8 16.48 perf-profile.calltrace.cycles-pp.close
> 9.41 -0.7 8.69 perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 6.32 -0.7 5.64 perf-profile.calltrace.cycles-pp.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 5.27 -0.5 4.72 perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.96 -0.5 13.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close
> 13.58 -0.4 13.14 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
> 0.92 -0.3 0.64 perf-profile.calltrace.cycles-pp.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
> 3.10 -0.2 2.86 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
> 2.44 -0.2 2.21 perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_open
> 4.02 -0.2 3.80 perf-profile.calltrace.cycles-pp.selinux_inode_permission.security_inode_permission.link_path_walk.path_openat.do_filp_open
> 1.82 -0.2 1.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.open64
> 9.26 -0.2 9.04 perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
> 2.12 ± 2% -0.2 1.90 perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
> 1.03 ± 10% -0.2 0.82 perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_open
> 2.55 ± 2% -0.2 2.36 ± 3% perf-profile.calltrace.cycles-pp.security_inode_permission.may_open.path_openat.do_filp_open.do_sys_open
> 1.37 -0.2 1.18 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.15 -0.2 0.95 perf-profile.calltrace.cycles-pp.ima_file_check.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 1.79 -0.2 1.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.close
> 2.41 ± 3% -0.2 2.22 ± 3% perf-profile.calltrace.cycles-pp.selinux_inode_permission.security_inode_permission.may_open.path_openat.do_filp_open
> 2.88 -0.2 2.71 perf-profile.calltrace.cycles-pp.security_file_open.do_dentry_open.path_openat.do_filp_open.do_sys_open
> 2.38 -0.2 2.22 perf-profile.calltrace.cycles-pp.security_file_alloc.__alloc_file.alloc_empty_file.path_openat.do_filp_open
> 4.31 -0.2 4.16 perf-profile.calltrace.cycles-pp.security_inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_open
> 9.93 -0.1 9.80 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close
> 1.63 -0.1 1.50 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.security_file_alloc.__alloc_file.alloc_empty_file.path_openat
> 1.38 -0.1 1.26 perf-profile.calltrace.cycles-pp.__alloc_fd.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 5.16 -0.1 5.04 perf-profile.calltrace.cycles-pp.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 1.13 -0.1 1.02 perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_open
> 2.26 -0.1 2.15 perf-profile.calltrace.cycles-pp.selinux_file_open.security_file_open.do_dentry_open.path_openat.do_filp_open
> 0.63 -0.1 0.52 ± 2% perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.strncpy_from_user.getname_flags.do_sys_open
> 1.29 -0.1 1.18 perf-profile.calltrace.cycles-pp.lookup_fast.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 1.75 -0.1 1.65 perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 0.67 -0.1 0.58 perf-profile.calltrace.cycles-pp.kmem_cache_free.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> 1.22 ± 2% -0.1 1.12 perf-profile.calltrace.cycles-pp.avc_has_perm_noaudit.selinux_inode_permission.security_inode_permission.link_path_walk.path_openat
> 1.21 -0.1 1.12 perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.74 -0.1 0.66 perf-profile.calltrace.cycles-pp.__inode_security_revalidate.selinux_file_open.security_file_open.do_dentry_open.path_openat
> 0.89 -0.1 0.81 perf-profile.calltrace.cycles-pp.inode_security_rcu.selinux_inode_permission.security_inode_permission.may_open.path_openat
> 0.79 ± 4% -0.1 0.72 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.76 -0.1 0.70 perf-profile.calltrace.cycles-pp.__inode_security_revalidate.inode_security_rcu.selinux_inode_permission.security_inode_permission.may_open
> 0.67 ± 3% -0.1 0.61 perf-profile.calltrace.cycles-pp.__d_lookup_rcu.lookup_fast.path_openat.do_filp_open.do_sys_open
> 0.66 ± 3% -0.1 0.60 perf-profile.calltrace.cycles-pp.inode_permission.may_open.path_openat.do_filp_open.do_sys_open
> 1.02 -0.1 0.96 perf-profile.calltrace.cycles-pp.path_init.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 0.81 -0.1 0.75 perf-profile.calltrace.cycles-pp.task_work_add.fput_many.filp_close.__x64_sys_close.do_syscall_64
> 0.67 -0.0 0.63 perf-profile.calltrace.cycles-pp.rcu_segcblist_enqueue.__call_rcu.task_work_run.exit_to_usermode_loop.do_syscall_64
> 0.78 -0.0 0.74 perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start
> 0.55 -0.0 0.53 perf-profile.calltrace.cycles-pp.selinux_file_alloc_security.security_file_alloc.__alloc_file.alloc_empty_file.path_openat
> 0.71 +0.1 0.82 perf-profile.calltrace.cycles-pp.memset_erms.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat
> 3.38 +0.1 3.50 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.66 +0.1 1.78 perf-profile.calltrace.cycles-pp.__call_rcu.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.70 +0.1 0.84 perf-profile.calltrace.cycles-pp.__virt_addr_valid.__check_object_size.strncpy_from_user.getname_flags.do_sys_open
> 1.81 +0.4 2.23 perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.do_sys_open.do_syscall_64
> 39.47 +0.7 40.17 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
> 0.00 +0.8 0.75 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.new_slab.___slab_alloc.__slab_alloc
> 38.69 +0.8 39.45 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +0.8 0.84 perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc
> 29.90 +0.9 30.79 perf-profile.calltrace.cycles-pp.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> 29.90 +0.9 30.79 perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> 29.87 +0.9 30.76 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn
> 29.88 +0.9 30.78 perf-profile.calltrace.cycles-pp.rcu_core.__softirqentry_text_start.run_ksoftirqd.smpboot_thread_fn.kthread
> 29.93 +0.9 30.84 perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork
> 29.94 +0.9 30.85 perf-profile.calltrace.cycles-pp.ret_from_fork
> 29.94 +0.9 30.85 perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
> 0.89 ± 29% +0.9 1.81 perf-profile.calltrace.cycles-pp.setup_object_debug.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc
> 7.25 ± 3% +1.1 8.36 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_one_page.__free_pages_ok.unfreeze_partials
> 7.75 ± 3% +1.1 8.87 perf-profile.calltrace.cycles-pp.__free_pages_ok.unfreeze_partials.put_cpu_partial.kmem_cache_free.rcu_do_batch
> 7.72 ± 3% +1.1 8.85 perf-profile.calltrace.cycles-pp.free_one_page.__free_pages_ok.unfreeze_partials.put_cpu_partial.kmem_cache_free
> 7.29 ± 3% +1.1 8.41 perf-profile.calltrace.cycles-pp._raw_spin_lock.free_one_page.__free_pages_ok.unfreeze_partials.put_cpu_partial
> 9.12 ± 3% +1.1 10.25 perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
> 7.96 ± 3% +1.1 9.10 perf-profile.calltrace.cycles-pp.put_cpu_partial.kmem_cache_free.rcu_do_batch.rcu_core.__softirqentry_text_start
> 7.92 ± 3% +1.1 9.07 perf-profile.calltrace.cycles-pp.unfreeze_partials.put_cpu_partial.kmem_cache_free.rcu_do_batch.rcu_core
> 2.38 +1.5 3.83 perf-profile.calltrace.cycles-pp.new_slab.___slab_alloc.__slab_alloc.kmem_cache_alloc.__alloc_file
> 10.53 +1.7 12.19 perf-profile.calltrace.cycles-pp.rcu_cblist_dequeue.rcu_do_batch.rcu_core.__softirqentry_text_start.run_ksoftirqd
> 5.47 +2.2 7.64 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat.do_filp_open
> 3.34 +2.2 5.56 perf-profile.calltrace.cycles-pp.___slab_alloc.__slab_alloc.kmem_cache_alloc.__alloc_file.alloc_empty_file
> 3.39 +2.3 5.65 perf-profile.calltrace.cycles-pp.__slab_alloc.kmem_cache_alloc.__alloc_file.alloc_empty_file.path_openat
> 11.39 +2.7 14.08 perf-profile.calltrace.cycles-pp.alloc_empty_file.path_openat.do_filp_open.do_sys_open.do_syscall_64
> 10.91 +2.7 13.63 perf-profile.calltrace.cycles-pp.__alloc_file.alloc_empty_file.path_openat.do_filp_open.do_sys_open
> 10.62 -2.1 8.54 perf-profile.children.cycles-pp.file_free_rcu
> 17.31 -0.8 16.51 perf-profile.children.cycles-pp.close
> 9.47 -0.7 8.74 perf-profile.children.cycles-pp.link_path_walk
> 6.37 -0.7 5.68 perf-profile.children.cycles-pp.do_dentry_open
> 5.48 -0.6 4.90 perf-profile.children.cycles-pp.__fput
> 6.49 -0.4 6.08 perf-profile.children.cycles-pp.selinux_inode_permission
> 6.95 -0.3 6.60 perf-profile.children.cycles-pp.security_inode_permission
> 3.48 -0.3 3.15 perf-profile.children.cycles-pp.lookup_fast
> 2.38 -0.3 2.09 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 1.74 ± 5% -0.3 1.46 perf-profile.children.cycles-pp.inode_permission
> 0.94 -0.3 0.66 perf-profile.children.cycles-pp.__close_fd
> 3.10 -0.2 2.86 perf-profile.children.cycles-pp.__x64_sys_close
> 2.27 ± 2% -0.2 2.04 ± 2% perf-profile.children.cycles-pp.dput
> 2.47 -0.2 2.24 perf-profile.children.cycles-pp.walk_component
> 2.21 ± 2% -0.2 1.98 perf-profile.children.cycles-pp.___might_sleep
> 2.24 -0.2 2.02 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 9.32 -0.2 9.12 perf-profile.children.cycles-pp.task_work_run
> 1.17 -0.2 0.97 perf-profile.children.cycles-pp.ima_file_check
> 1.99 -0.2 1.80 perf-profile.children.cycles-pp.__inode_security_revalidate
> 2.92 -0.2 2.73 perf-profile.children.cycles-pp.security_file_open
> 0.56 -0.2 0.38 perf-profile.children.cycles-pp.selinux_task_getsecid
> 0.69 -0.2 0.51 perf-profile.children.cycles-pp.security_task_getsecid
> 2.40 -0.2 2.24 perf-profile.children.cycles-pp.security_file_alloc
> 0.20 ± 4% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.try_module_get
> 1.44 -0.1 1.31 perf-profile.children.cycles-pp.__might_sleep
> 10.01 -0.1 9.88 perf-profile.children.cycles-pp.exit_to_usermode_loop
> 1.46 -0.1 1.33 perf-profile.children.cycles-pp.inode_security_rcu
> 1.00 -0.1 0.87 perf-profile.children.cycles-pp._cond_resched
> 5.20 -0.1 5.08 perf-profile.children.cycles-pp.getname_flags
> 1.05 -0.1 0.93 perf-profile.children.cycles-pp.__fsnotify_parent
> 1.42 -0.1 1.30 perf-profile.children.cycles-pp.fsnotify
> 1.41 -0.1 1.29 perf-profile.children.cycles-pp.__alloc_fd
> 2.29 -0.1 2.18 perf-profile.children.cycles-pp.selinux_file_open
> 0.64 -0.1 0.53 perf-profile.children.cycles-pp.__check_heap_object
> 1.42 ± 2% -0.1 1.31 ± 2% perf-profile.children.cycles-pp.irq_exit
> 1.80 -0.1 1.69 perf-profile.children.cycles-pp.terminate_walk
> 0.33 -0.1 0.23 perf-profile.children.cycles-pp.file_ra_state_init
> 0.65 ± 3% -0.1 0.56 perf-profile.children.cycles-pp.generic_permission
> 1.23 -0.1 1.15 perf-profile.children.cycles-pp.fput_many
> 0.83 ± 3% -0.1 0.74 perf-profile.children.cycles-pp._raw_spin_lock_irq
> 0.53 -0.1 0.45 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
> 0.58 ± 5% -0.1 0.51 ± 2% perf-profile.children.cycles-pp.mntput_no_expire
> 0.75 -0.1 0.69 perf-profile.children.cycles-pp.lockref_put_or_lock
> 1.03 -0.1 0.97 perf-profile.children.cycles-pp.path_init
> 0.84 -0.1 0.78 perf-profile.children.cycles-pp.task_work_add
> 0.14 ± 3% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.ima_file_free
> 0.26 ± 7% -0.1 0.21 ± 2% perf-profile.children.cycles-pp.path_get
> 0.83 -0.0 0.78 perf-profile.children.cycles-pp.__slab_free
> 0.62 -0.0 0.58 perf-profile.children.cycles-pp.percpu_counter_add_batch
> 0.67 -0.0 0.63 perf-profile.children.cycles-pp.rcu_segcblist_enqueue
> 0.20 ± 11% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.mntget
> 0.22 ± 4% -0.0 0.19 ± 3% perf-profile.children.cycles-pp.get_unused_fd_flags
> 0.10 ± 14% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.close@plt
> 0.34 ± 2% -0.0 0.31 perf-profile.children.cycles-pp.lockref_get
> 0.24 -0.0 0.21 ± 2% perf-profile.children.cycles-pp.__x64_sys_open
> 0.11 ± 8% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.putname
> 0.18 ± 2% -0.0 0.16 perf-profile.children.cycles-pp.should_failslab
> 0.55 -0.0 0.53 perf-profile.children.cycles-pp.selinux_file_alloc_security
> 0.21 ± 3% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.expand_files
> 0.07 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.module_put
> 0.12 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.security_file_free
> 0.17 -0.0 0.15 ± 3% perf-profile.children.cycles-pp.find_next_zero_bit
> 0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.memset
> 0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__mutex_init
> 0.10 -0.0 0.09 perf-profile.children.cycles-pp.mntput
> 0.12 +0.0 0.13 ± 3% perf-profile.children.cycles-pp.__list_del_entry_valid
> 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.discard_slab
> 0.08 +0.0 0.10 ± 4% perf-profile.children.cycles-pp.kick_process
> 0.04 ± 57% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.12 ± 4% +0.0 0.15 ± 3% perf-profile.children.cycles-pp.blkcg_maybe_throttle_current
> 1.31 +0.0 1.34 perf-profile.children.cycles-pp.memset_erms
> 0.40 +0.0 0.44 perf-profile.children.cycles-pp.lockref_get_not_dead
> 0.07 ± 6% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.rcu_segcblist_pend_cbs
> 0.01 ±173% +0.0 0.06 ± 11% perf-profile.children.cycles-pp.native_write_msr
> 0.27 ± 6% +0.1 0.33 ± 10% perf-profile.children.cycles-pp.ktime_get
> 0.16 ± 5% +0.1 0.22 ± 4% perf-profile.children.cycles-pp.get_partial_node
> 0.01 ±173% +0.1 0.07 ± 30% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
> 0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.____fput
> 0.05 +0.1 0.15 ± 3% perf-profile.children.cycles-pp.__mod_zone_page_state
> 0.12 ± 16% +0.1 0.23 ± 10% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
> 0.05 ± 8% +0.1 0.16 ± 2% perf-profile.children.cycles-pp.legitimize_links
> 1.71 +0.1 1.83 perf-profile.children.cycles-pp.__call_rcu
> 3.40 +0.1 3.52 perf-profile.children.cycles-pp.strncpy_from_user
> 0.30 ± 2% +0.1 0.43 ± 2% perf-profile.children.cycles-pp.locks_remove_posix
> 0.72 +0.1 0.86 perf-profile.children.cycles-pp.__virt_addr_valid
> 0.08 +0.1 0.23 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 0.84 ± 9% +0.2 1.02 ± 8% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.15 ± 3% +0.2 0.38 perf-profile.children.cycles-pp.check_stack_object
> 0.65 +0.4 1.05 perf-profile.children.cycles-pp.setup_object_debug
> 0.36 +0.4 0.76 perf-profile.children.cycles-pp.get_page_from_freelist
> 1.90 +0.4 2.31 perf-profile.children.cycles-pp.__check_object_size
> 0.39 +0.4 0.84 perf-profile.children.cycles-pp.__alloc_pages_nodemask
> 39.52 +0.7 40.22 perf-profile.children.cycles-pp.do_filp_open
> 38.84 +0.8 39.59 perf-profile.children.cycles-pp.path_openat
> 31.27 +0.8 32.05 perf-profile.children.cycles-pp.rcu_core
> 31.26 +0.8 32.03 perf-profile.children.cycles-pp.rcu_do_batch
> 31.31 +0.8 32.09 perf-profile.children.cycles-pp.__softirqentry_text_start
> 29.90 +0.9 30.79 perf-profile.children.cycles-pp.run_ksoftirqd
> 29.93 +0.9 30.84 perf-profile.children.cycles-pp.smpboot_thread_fn
> 29.94 +0.9 30.85 perf-profile.children.cycles-pp.kthread
> 29.94 +0.9 30.85 perf-profile.children.cycles-pp.ret_from_fork
> 10.63 ± 2% +1.0 11.61 perf-profile.children.cycles-pp.kmem_cache_free
> 8.45 ± 3% +1.1 9.57 perf-profile.children.cycles-pp._raw_spin_lock
> 7.96 ± 3% +1.1 9.11 perf-profile.children.cycles-pp.__free_pages_ok
> 7.93 ± 3% +1.2 9.08 perf-profile.children.cycles-pp.free_one_page
> 8.19 ± 3% +1.2 9.36 perf-profile.children.cycles-pp.put_cpu_partial
> 8.15 ± 3% +1.2 9.32 perf-profile.children.cycles-pp.unfreeze_partials
> 7.59 ± 3% +1.3 8.89 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 11.12 +1.7 12.83 perf-profile.children.cycles-pp.rcu_cblist_dequeue
> 2.88 +1.7 4.61 perf-profile.children.cycles-pp.new_slab
> 8.73 +1.8 10.54 perf-profile.children.cycles-pp.kmem_cache_alloc
> 3.34 +2.2 5.56 perf-profile.children.cycles-pp.___slab_alloc
> 3.39 +2.3 5.65 perf-profile.children.cycles-pp.__slab_alloc
> 11.45 +2.7 14.12 perf-profile.children.cycles-pp.alloc_empty_file
> 10.98 +2.7 13.70 perf-profile.children.cycles-pp.__alloc_file
> 10.53 -2.1 8.47 perf-profile.self.cycles-pp.file_free_rcu
> 2.37 -0.3 2.05 perf-profile.self.cycles-pp.kmem_cache_alloc
> 1.43 -0.3 1.17 perf-profile.self.cycles-pp.strncpy_from_user
> 0.50 -0.2 0.26 ± 2% perf-profile.self.cycles-pp.__close_fd
> 2.22 -0.2 2.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
> 2.07 ± 2% -0.2 1.86 perf-profile.self.cycles-pp.___might_sleep
> 1.01 ± 7% -0.2 0.82 perf-profile.self.cycles-pp.inode_permission
> 1.13 -0.2 0.96 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 3.10 -0.2 2.93 perf-profile.self.cycles-pp.selinux_inode_permission
> 0.52 -0.2 0.35 perf-profile.self.cycles-pp.selinux_task_getsecid
> 1.33 -0.1 1.18 ± 2% perf-profile.self.cycles-pp.do_dentry_open
> 1.55 -0.1 1.42 perf-profile.self.cycles-pp.kmem_cache_free
> 1.55 -0.1 1.43 perf-profile.self.cycles-pp.link_path_walk
> 0.17 ± 4% -0.1 0.04 ± 57% perf-profile.self.cycles-pp.try_module_get
> 1.35 -0.1 1.24 perf-profile.self.cycles-pp.fsnotify
> 0.96 -0.1 0.85 perf-profile.self.cycles-pp.__fsnotify_parent
> 1.25 ± 2% -0.1 1.14 perf-profile.self.cycles-pp.__might_sleep
> 0.87 ± 2% -0.1 0.78 ± 2% perf-profile.self.cycles-pp.lookup_fast
> 0.79 ± 2% -0.1 0.70 ± 2% perf-profile.self.cycles-pp.do_syscall_64
> 1.02 -0.1 0.93 perf-profile.self.cycles-pp.do_sys_open
> 0.30 -0.1 0.22 perf-profile.self.cycles-pp.file_ra_state_init
> 0.58 -0.1 0.50 perf-profile.self.cycles-pp.__check_heap_object
> 0.80 ± 3% -0.1 0.72 perf-profile.self.cycles-pp._raw_spin_lock_irq
> 0.59 ± 2% -0.1 0.52 ± 2% perf-profile.self.cycles-pp.generic_permission
> 1.17 -0.1 1.09 perf-profile.self.cycles-pp.__fput
> 0.68 -0.1 0.60 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.84 -0.1 0.77 perf-profile.self.cycles-pp.__inode_security_revalidate
> 0.73 -0.1 0.66 perf-profile.self.cycles-pp.task_work_add
> 0.39 -0.1 0.32 ± 2% perf-profile.self.cycles-pp.rcu_all_qs
> 0.54 ± 5% -0.1 0.47 ± 2% perf-profile.self.cycles-pp.mntput_no_expire
> 0.50 ± 6% -0.1 0.44 ± 4% perf-profile.self.cycles-pp.dput
> 0.46 -0.1 0.40 perf-profile.self.cycles-pp._cond_resched
> 0.93 -0.1 0.88 ± 2% perf-profile.self.cycles-pp.close
> 0.11 ± 4% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.ima_file_free
> 0.83 -0.1 0.77 perf-profile.self.cycles-pp.__slab_free
> 0.61 -0.1 0.56 perf-profile.self.cycles-pp.__alloc_fd
> 0.87 -0.1 0.82 perf-profile.self.cycles-pp._raw_spin_lock
> 0.69 -0.0 0.64 perf-profile.self.cycles-pp.lockref_put_or_lock
> 0.67 -0.0 0.62 perf-profile.self.cycles-pp.rcu_segcblist_enqueue
> 0.46 -0.0 0.41 perf-profile.self.cycles-pp.do_filp_open
> 0.56 -0.0 0.51 perf-profile.self.cycles-pp.percpu_counter_add_batch
> 1.05 ± 2% -0.0 1.01 perf-profile.self.cycles-pp.path_openat
> 0.28 ± 2% -0.0 0.24 ± 4% perf-profile.self.cycles-pp.security_file_open
> 0.94 ± 2% -0.0 0.90 perf-profile.self.cycles-pp.open64
> 0.21 ± 6% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.get_unused_fd_flags
> 0.17 ± 13% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.mntget
> 0.39 -0.0 0.35 perf-profile.self.cycles-pp.fput_many
> 0.37 -0.0 0.34 perf-profile.self.cycles-pp.getname_flags
> 0.43 -0.0 0.41 perf-profile.self.cycles-pp.path_init
> 0.33 -0.0 0.30 ± 2% perf-profile.self.cycles-pp.lockref_get
> 0.20 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.filp_close
> 0.52 -0.0 0.50 perf-profile.self.cycles-pp.selinux_file_alloc_security
> 0.22 -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__x64_sys_open
> 0.28 ± 2% -0.0 0.26 perf-profile.self.cycles-pp.inode_security_rcu
> 0.19 ± 4% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.expand_files
> 0.09 ± 9% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.putname
> 0.10 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.security_file_free
> 0.15 ± 3% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.find_next_zero_bit
> 0.12 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.nd_jump_root
> 0.08 -0.0 0.07 perf-profile.self.cycles-pp.fd_install
> 0.06 -0.0 0.05 perf-profile.self.cycles-pp.path_get
> 0.12 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.__list_del_entry_valid
> 0.07 ± 5% +0.0 0.09 perf-profile.self.cycles-pp.get_partial_node
> 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.self.cycles-pp.discard_slab
> 0.11 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.blkcg_maybe_throttle_current
> 0.06 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.kick_process
> 0.28 ± 2% +0.0 0.30 perf-profile.self.cycles-pp.__x64_sys_close
> 0.04 ± 57% +0.0 0.07 ± 7% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.39 +0.0 0.42 perf-profile.self.cycles-pp.lockref_get_not_dead
> 0.53 +0.0 0.57 perf-profile.self.cycles-pp.exit_to_usermode_loop
> 0.06 ± 7% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.rcu_segcblist_pend_cbs
> 0.01 ±173% +0.0 0.06 ± 11% perf-profile.self.cycles-pp.native_write_msr
> 0.28 +0.0 0.33 perf-profile.self.cycles-pp.terminate_walk
> 0.27 ± 5% +0.1 0.32 ± 11% perf-profile.self.cycles-pp.ktime_get
> 0.00 +0.1 0.05 ± 9% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.43 ± 5% +0.1 0.49 ± 3% perf-profile.self.cycles-pp.security_inode_permission
> 0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.____fput
> 0.00 +0.1 0.08 perf-profile.self.cycles-pp.__alloc_pages_nodemask
> 0.05 +0.1 0.15 ± 3% perf-profile.self.cycles-pp.__mod_zone_page_state
> 0.25 ± 3% +0.1 0.35 ± 2% perf-profile.self.cycles-pp.locks_remove_posix
> 0.12 ± 17% +0.1 0.23 ± 11% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
> 0.14 +0.1 0.25 perf-profile.self.cycles-pp.setup_object_debug
> 0.93 +0.1 1.04 perf-profile.self.cycles-pp.__call_rcu
> 0.46 +0.1 0.58 perf-profile.self.cycles-pp.__check_object_size
> 0.13 ± 3% +0.1 0.26 ± 3% perf-profile.self.cycles-pp.get_page_from_freelist
> 0.00 +0.1 0.13 ± 3% perf-profile.self.cycles-pp.legitimize_links
> 0.68 +0.1 0.81 perf-profile.self.cycles-pp.__virt_addr_valid
> 0.40 ± 11% +0.2 0.58 perf-profile.self.cycles-pp.may_open
> 0.12 +0.2 0.31 perf-profile.self.cycles-pp.check_stack_object
> 0.90 +0.3 1.22 perf-profile.self.cycles-pp.task_work_run
> 0.30 +0.4 0.73 perf-profile.self.cycles-pp.___slab_alloc
> 2.88 +0.7 3.59 perf-profile.self.cycles-pp.__alloc_file
> 2.27 +1.1 3.38 perf-profile.self.cycles-pp.new_slab
> 7.59 ± 3% +1.3 8.89 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 11.04 +1.7 12.73 perf-profile.self.cycles-pp.rcu_cblist_dequeue
>
>
>
> will-it-scale.per_process_ops
>
> 385000 +-+----------------------------------------------------------------+
> | .+ .+. .+ |
> 380000 +-+.++ : +.++.+ +.++.+ : |
> 375000 +-+ : + : |
> | +.+ .+.++.+ ++. .+.+.+ .+.+.+ .+.|
> 370000 +-+ +.+ +.+.++ + + |
> 365000 +-+ |
> | |
> 360000 +-+ |
> 355000 +-+ |
> | |
> 350000 +-+ |
> 345000 O-+ OO O O OO O O OO O O O |
> | O O O O O OO O O |
> 340000 +-+----------------------------------------------------------------+
>
>
> will-it-scale.workload
>
> 7.4e+07 +-+---------------------------------------------------------------+
> | .+ .+ .+ |
> 7.3e+07 +-++.+ : ++.+.+ +.+.++ : |
> 7.2e+07 +-+ : + : |
> | ++. .++.+.+ +.+ +.+.+.++.+.+. +.|
> 7.1e+07 +-+ +.+ +.+.+.+ + |
> 7e+07 +-+ |
> | |
> 6.9e+07 +-+ |
> 6.8e+07 +-+ |
> | |
> 6.7e+07 +-+ O O |
> 6.6e+07 O-OO O O OO O OO O O O O O O O |
> | O O O O |
> 6.5e+07 +-+---------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen
>

2019-09-26 00:46:35

by Dave Chinner

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> On 23/09/2019 17.52, Tejun Heo wrote:
> > Hello, Konstantin.
> >
> > On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> > > With vm.dirty_write_behind 1 or 2 files are written even faster and
> >
> > Is the faster speed reproducible? I don't quite understand why this
> > would be.
>
> Writing to disk simply starts earlier.

Stupid question: how is this any different to simply winding down
our dirty writeback and throttling thresholds like so:

# echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes

to start background writeback when there's 100MB of dirty pages in
memory, and then:

# echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes

So that writers are directly throttled at 200MB of dirty pages in
memory?

This effectively gives us global writebehind behaviour with a
100-200MB cache write burst for initial writes.
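The two-threshold behaviour described above can be modelled in a few lines. This is a toy sketch for illustration only, not the kernel's actual balance_dirty_pages() algorithm; the constants mirror the sysctl values above and the flusher is reduced to a single rate parameter:

```python
# Toy model of the two global writeback thresholds: background
# writeback kicks in at dirty_background_bytes, and writers are
# throttled (here: simply clamped) at dirty_bytes.

DIRTY_BACKGROUND_BYTES = 100 * 1000 * 1000  # start background flushing
DIRTY_BYTES = 200 * 1000 * 1000             # hard limit, throttle writers

def write(dirty, nbytes, flusher_rate):
    """Dirty nbytes of page cache; return (new_dirty, throttled)."""
    dirty += nbytes
    throttled = False
    if dirty > DIRTY_BACKGROUND_BYTES:
        # Background writeback cleans pages asynchronously.
        dirty = max(DIRTY_BACKGROUND_BYTES, dirty - flusher_rate)
    if dirty > DIRTY_BYTES:
        # Hard limit reached: the writer itself blocks until enough
        # pages have been written back.
        dirty = DIRTY_BYTES
        throttled = True
    return dirty, throttled
```

Below the background threshold writes burst into cache untouched; between the two thresholds the flusher races the writer; at the hard limit the writer pays the latency directly.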

And, really, such strict write-behind behaviour is going to cause all
sorts of unintended problems with filesystems, because there will be
adverse interactions with delayed allocation. We need a substantial
amount of dirty data to be cached for writeback for fragmentation
minimisation algorithms to be able to do their job....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-09-26 00:57:52

by Konstantin Khlebnikov

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 21/09/2019 02.05, Linus Torvalds wrote:
> On Fri, Sep 20, 2019 at 12:35 AM Konstantin Khlebnikov
> <[email protected]> wrote:
>>
>> This patch implements a write-behind policy which tracks sequential writes
>> and starts background writeback when a file has enough dirty pages.
>
> Apart from a spelling error ("contigious"), my only reaction is that
> I've wanted this for the multi-file writes, not just for single big
> files.
>
> Yes, single big files may be a simpler and perhaps the "10% effort for
> 90% of the gain", and thus the right thing to do, but I do wonder if
> you've looked at simply extending it to cover multiple files when
> people copy a whole directory (or unpack a tar-file, or similar).
>
> Now, I hear you say "those are so small these days that it doesn't
> matter". And maybe you're right. But particularly for slow media,
> triggering good streaming write behavior has been a problem in the
> past.
>
> So I'm wondering whether the "writebehind" state should perhaps be
> considered be a process state, rather than "struct file" state, and
> also start triggering for writing smaller files.

It's simple to extend the existing state with a per-task counter of
sequential writes to detect patterns like unpacking a tarball of small
files. After reaching some threshold, write-behind could flush files at
close.

But in this case it's hard to wait for previous writes in order to limit
the amount of requests and pages in writeback for each stream.

Theoretically we could build a chain of inodes for delaying and batching.
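The per-task idea could be sketched roughly like this. It is a user-space model only: the class name, the threshold value, and the flush hook are all hypothetical, nothing here is from the patch itself:

```python
# Model of the suggested per-task heuristic: count how many files in a
# row a task has written strictly sequentially; once the streak passes
# a threshold, kick writeback for each subsequent file at close().

SEQ_STREAK_THRESHOLD = 8  # hypothetical tunable

class TaskWriteState:
    def __init__(self):
        self.seq_streak = 0  # consecutive sequentially-written files

    def note_close(self, file_was_sequential, flush):
        """Called at close(); flush() stands in for starting writeback."""
        if file_was_sequential:
            self.seq_streak += 1
            if self.seq_streak >= SEQ_STREAK_THRESHOLD:
                flush()
        else:
            self.seq_streak = 0  # a random writer resets the pattern
```

A tar-extraction workload would cross the threshold after a handful of files and flush every file from then on; an occasional random write resets the detector.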

>
> Maybe this was already discussed and people decided that the big-file
> case was so much easier that it wasn't worth worrying about
> writebehind for multiple files.
>
> Linus
>

2019-09-26 07:44:56

by Konstantin Khlebnikov

Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 24/09/2019 10.39, Dave Chinner wrote:
> On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
>> On 23/09/2019 17.52, Tejun Heo wrote:
>>> Hello, Konstantin.
>>>
>>> On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
>>>> With vm.dirty_write_behind 1 or 2 files are written even faster and
>>>
>>> Is the faster speed reproducible? I don't quite understand why this
>>> would be.
>>
>> Writing to disk simply starts earlier.
>
> Stupid question: how is this any different to simply winding down
> our dirty writeback and throttling thresholds like so:
>
> # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>
> to start background writeback when there's 100MB of dirty pages in
> memory, and then:
>
> # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>
> So that writers are directly throttled at 200MB of dirty pages in
> memory?
>
> This effectively gives us global writebehind behaviour with a
> 100-200MB cache write burst for initial writes.

Global limits affect all dirty pages, including memory-mapped and
randomly touched ones. Write-behind targets only sequential streams.

>
> ANd, really such strict writebehind behaviour is going to cause all
> sorts of unintended problesm with filesystems because there will be
> adverse interactions with delayed allocation. We need a substantial
> amount of dirty data to be cached for writeback for fragmentation
> minimisation algorithms to be able to do their job....

I think most sequentially written files never change after close.
Apart from not knowing the final size of huge files (>16MiB in my patch),
there should be no difference for delayed allocation.

Probably write-behind could provide a hint about the streaming pattern:
pass something like "MSG_MORE" into the writeback call.

>
> Cheers,
>
> Dave.
>

2019-09-26 08:41:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner <[email protected]> wrote:
>
> Stupid question: how is this any different to simply winding down
> our dirty writeback and throttling thresholds like so:
>
> # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes

Our dirty_background stuff is very questionable, but it exists (and
has those insane defaults) because of various legacy reasons.

But it probably _shouldn't_ exist any more (except perhaps as a
last-ditch hard limit), and I don't think it really ends up being the
primary throttling any more in many cases.

It used to make sense to make it a "percentage of memory" back when we
were talking old machines with 8MB of RAM, and having an appreciable
percentage of memory dirty was "normal".

And we've kept that model and not touched it, because some benchmarks
really want enormous amounts of dirty data (particularly various dirty
shared mappings).

But our default really is fairly crazy and questionable. 10% of memory
being dirty may be OK when you have a small amount of memory, but it's
rather less sane if you have gigs and gigs of RAM.

Of course, SSD's made it work slightly better again, but our
"dirty_background" stuff really is legacy and not very good.

The whole dirty limit when seen as a percentage of memory (which is our
default) is particularly questionable, but even when seen as total
bytes it is bad.

If you have slow filesystems (say, FAT on a USB stick), the limit
should be very different from a fast one (eg XFS on a RAID of proper
SSDs).

So the limit really needs to be per-bdi, not some global ratio or byte count.
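For reference, the patch under discussion already takes that shape per its description: the policy switch is global, but the only sizing knob lives under each disk's bdi directory (the disk name below is just an example):

```shell
# Global policy switch: 0=off (default), 1=strictly sequential writes, 2=all sequential writes
echo 1 > /proc/sys/vm/dirty_write_behind

# Per-disk window size in KiB (default 16384 = 16MiB); 0 disables write-behind for this disk
echo 32768 > /sys/block/sda/bdi/write_behind_kb
```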

As a result we've grown various _other_ heuristics over time, and the
simplistic dirty_background stuff is only a very small part of the
picture these days.

To the point of almost being irrelevant in many situations, I suspect.

> to start background writeback when there's 100MB of dirty pages in
> memory, and then:
>
> # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes

The thing is, that also accounts for dirty shared mmap pages. And it
really will kill some benchmarks that people take very very seriously.

And 200MB is peanuts when you're doing a benchmark on some studly
machine that has a million iops per second, and 200MB of dirty data is
nothing.

Yet it's probably much too big when you're on a workstation that still
has rotational media.

And the whole memcg code obviously makes this even more complicated.

Anyway, the end result of all this is that we have that
balance_dirty_pages() that is pretty darn complex and I suspect very
few people understand everything that goes on in that function.

So I think that the point of any write-behind logic would be to avoid
triggering the global limits as much as humanly possible - not just
getting the simple cases to write things out more quickly, but to
remove the complex global limit questions from (one) common and fairly
simple case.

Now, whether write-behind really _does_ help that, or whether it's
just yet another tweak and complication, I can't actually say. But I
don't think 'dirty_background_bytes' is really an argument against
write-behind, it's just one knob on the very complex dirty handling we
have.

Linus

2019-09-26 09:19:16

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:
> On 24/09/2019 10.39, Dave Chinner wrote:
> > On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> > > On 23/09/2019 17.52, Tejun Heo wrote:
> > > > Hello, Konstantin.
> > > >
> > > > On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> > > > > With vm.dirty_write_behind 1 or 2 files are written even faster and
> > > >
> > > > Is the faster speed reproducible? I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
> >
> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
> >
> > So that writers are directly throttled at 200MB of dirty pages in
> > memory?
> >
> > This effectively gives us global writebehind behaviour with a
> > 100-200MB cache write burst for initial writes.
>
> Global limits affect all dirty pages including memory-mapped and
> randomly touched. Write-behind aims only into sequential streams.

There are apps that do sequential writes via mmap()d files.
They should do writebehind too, yes?

> > ANd, really such strict writebehind behaviour is going to cause all
> > sorts of unintended problesm with filesystems because there will be
> > adverse interactions with delayed allocation. We need a substantial
> > amount of dirty data to be cached for writeback for fragmentation
> > minimisation algorithms to be able to do their job....
>
> I think most sequentially written files never change after close.

There are lots of apps that write zeros to initialise and allocate
space, then go write real data to them. Database WAL files are
commonly initialised like this...

> Except of knowing final size of huge files (>16Mb in my patch)
> there should be no difference for delayed allocation.

There is, because you throttle the writes down such that there is
only 16MB of dirty data in memory. Hence filesystems will only
typically allocate in 16MB chunks as that's all the delalloc range
spans.

I'm not so concerned for XFS here, because our speculative
preallocation will handle this just fine, but for ext4 and btrfs
it's going to interleave the allocations of concurrent streaming writes
and fragment the crap out of the files.

In general, the smaller you make the individual file writeback
window, the worse the fragmentation problem gets....
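The interleaving Dave describes can be seen in a toy allocator model (illustrative numbers only, no real filesystem code): when writeback flushes each stream's small window in turn, a linear allocator hands out alternating chunks, and a file's extent count grows with the number of windows instead of staying at one.

```c
#include <assert.h>

/* Toy model of two concurrent streaming writers sharing one linear
 * block allocator. Writeback flushes each file's small delalloc
 * window in turn, so allocations interleave. */
#define CHUNK  16  /* one writeback window, in arbitrary "MB" units */
#define ROUNDS 4   /* windows flushed per file */

/* Count how many discontiguous extents file 0 ends up with. */
static int extents_for_file0(int interleaved)
{
    int next = 0;                 /* next free block */
    int prev_end = -1, extents = 0;

    for (int r = 0; r < ROUNDS; r++) {
        int start = next;         /* file 0 allocates one window */
        next += CHUNK;
        if (start != prev_end)
            extents++;            /* not adjacent to the last one: new extent */
        prev_end = next;
        if (interleaved)
            next += CHUNK;        /* file 1 allocates in between */
    }
    return extents;
}
```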

> Probably write behind could provide hint about streaming pattern:
> pass something like "MSG_MORE" into writeback call.

How does that help when we've only got dirty data and block
reservations up to EOF which is no more than 16MB away?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-09-26 09:19:26

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Tue, Sep 24, 2019 at 12:08:04PM -0700, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner <[email protected]> wrote:
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>
> Our dirty_background stuff is very questionable, but it exists (and
> has those insane defaults) because of various legacy reasons.

That's not what I was asking about. The context is in the previous
lines you didn't quote:

> > > > Is the faster speed reproducible? I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:

i.e. I'm asking about the reasons for the performance differential
not asking for an explanation of what writebehind is. If the
performance differential really is caused by writeback starting
sooner, then winding down dirty_background_bytes should produce
exactly the same performance because it will start writeback -much
faster-.

If it doesn't, then the assertion that the difference is caused by
earlier writeout is questionable and the code may not actually be
doing what is claimed....

Basically, I'm asking for proof that the explanation is correct.

> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>
> The thing is, that also accounts for dirty shared mmap pages. And it
> really will kill some benchmarks that people take very very seriously.

Yes, I know that. I'm not suggesting that we do this,

[snip]

> Anyway, the end result of all this is that we have that
> balance_dirty_pages() that is pretty darn complex and I suspect very
> few people understand everything that goes on in that function.

I'd agree with you there - most of the ground work for the
balance_dirty_pages IO throttling feedback loop was all based on
concepts I developed to solve dirty page writeback thrashing
problems on Irix back in 2003. The code we have in Linux was
written by Fengguang Wu with help from a lot of people, but the
underlying concepts of delegating IO to dedicated writeback threads
that calculate and track page cleaning rates (BDI writeback rates)
and then throttling incoming page dirtying rate to the page cleaning
rate all came out of my head....

So, much as it may surprise you, I am one of the few people who do
actually understand how that whole complex mass of accounting and
feedback is supposed to work. :)

> Now, whether write-behind really _does_ help that, or whether it's
> just yet another tweak and complication, I can't actually say.

Neither can I at this point - I lack the data and that's why I was
asking if there was a perf difference with the existing limits wound
right down. Knowing whether the performance difference is simply a
result of starting writeback IO sooner tells me an awful lot about
what other behaviour is happening as a result of the changes in this
patch.

> But I
> don't think 'dirty_background_bytes' is really an argument against
> write-behind, it's just one knob on the very complex dirty handling we
> have.

Never said it was - just trying to determine if a one line
explanation is true or not.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2019-09-26 09:20:56

by Konstantin Khlebnikov

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On 25/09/2019 10.18, Dave Chinner wrote:
> On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:
>> On 24/09/2019 10.39, Dave Chinner wrote:
>>> On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
>>>> On 23/09/2019 17.52, Tejun Heo wrote:
>>>>> Hello, Konstantin.
>>>>>
>>>>> On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
>>>>>> With vm.dirty_write_behind 1 or 2 files are written even faster and
>>>>>
>>>>> Is the faster speed reproducible? I don't quite understand why this
>>>>> would be.
>>>>
>>>> Writing to disk simply starts earlier.
>>>
>>> Stupid question: how is this any different to simply winding down
>>> our dirty writeback and throttling thresholds like so:
>>>
>>> # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>>>
>>> to start background writeback when there's 100MB of dirty pages in
>>> memory, and then:
>>>
>>> # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>>>
>>> So that writers are directly throttled at 200MB of dirty pages in
>>> memory?
>>>
>>> This effectively gives us global writebehind behaviour with a
>>> 100-200MB cache write burst for initial writes.
>>
>> Global limits affect all dirty pages including memory-mapped and
>> randomly touched. Write-behind aims only into sequential streams.
>
> There are apps that do sequential writes via mmap()d files.
> They should do writebehind too, yes?

I see no reason for that. This is a different scenario.

Mmap has no clear signal about "end of write", only a page fault at the
beginning. Theoretically we could implement a similar sliding window and
start writeback on subsequent page faults.

But applications that use memory-mapped files probably know better what
to do with their data. I prefer to leave them alone for now.

>
>>> ANd, really such strict writebehind behaviour is going to cause all
>>> sorts of unintended problesm with filesystems because there will be
>>> adverse interactions with delayed allocation. We need a substantial
>>> amount of dirty data to be cached for writeback for fragmentation
>>> minimisation algorithms to be able to do their job....
>>
>> I think most sequentially written files never change after close.
>
> There are lots of apps that write zeros to initialise and allocate
> space, then go write real data to them. Database WAL files are
> commonly initialised like this...

Those zeros are just a bunch of dirty pages which have to be written.
Sync and memory pressure will write them anyway, so why shouldn't write-behind?

>
>> Except of knowing final size of huge files (>16Mb in my patch)
>> there should be no difference for delayed allocation.
>
> There is, because you throttle the writes down such that there is
> only 16MB of dirty data in memory. Hence filesystems will only
> typically allocate in 16MB chunks as that's all the delalloc range
> spans.
>
> I'm not so concerned for XFS here, because our speculative
> preallocation will handle this just fine, but for ext4 and btrfs
> it's going to interleave the allocate of concurrent streaming writes
> and fragment the crap out of the files.
>
> In general, the smaller you make the individual file writeback
> window, the worse the fragmentation problems gets....

AFAIR ext4 already preallocates extents beyond EOF too.

But this must be carefully tested for all modern filesystems, for sure.

>
>> Probably write behind could provide hint about streaming pattern:
>> pass something like "MSG_MORE" into writeback call.
>
> How does that help when we've only got dirty data and block
> reservations up to EOF which is no more than 16MB away?

The block allocator should interpret this flag as "more data is
expected" and preallocate an extent bigger than the data, beyond EOF.

>
> Cheers,
>
> Dave.
>

2019-09-26 09:32:06

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Wed, Sep 25, 2019 at 05:18:54PM +1000, Dave Chinner wrote:
> > > ANd, really such strict writebehind behaviour is going to cause all
> > > sorts of unintended problesm with filesystems because there will be
> > > adverse interactions with delayed allocation. We need a substantial
> > > amount of dirty data to be cached for writeback for fragmentation
> > > minimisation algorithms to be able to do their job....
> >
> > I think most sequentially written files never change after close.
>
> There are lots of apps that write zeros to initialise and allocate
> space, then go write real data to them. Database WAL files are
> commonly initialised like this...

Fortunately, most of the time enterprise database files are
initialized with an fd which is then kept open. And it's only a single
file. So that's a heuristic that's not too bad to handle, so long as
it's only triggered when there are no open file descriptors on said
inode. If something is still keeping the file open, then we do need
to be very careful about write-behind.

That being said, with databases, they are going to be calling
fdatasync(2) and fsync(2) all the time, so it's unlikely write-behind
is going to be that much of an issue, so long as the max write-behind
knob isn't set too insanely low. It's been over ten years since I
last looked at this, and so things may very likely have changed, but
one enterprise database I looked at would fallocate 32MB, and then
write 32MB of zeros to make sure blocks were marked as initialized, so
that further random writes wouldn't cause metadata updates.

Now, there *are* applications which log to files via append, and in
the worst case, they don't actually keep a fd open. Examples of this
would include scripts that call logger(1) very often. But in general,
taking into account whether or not there is still a fd holding the
inode open to influence how aggressively we do writeback does make
sense.

Finally, we should remember that this will impact battery life on
laptops. Perhaps not so much now that most laptops have SSD's instead
of HDD's, but aggressive writebehind does certainly have tradeoffs,
and what makes sense for a NVMe attached SSD is going to be very
different for a $2 USB thumb drive picked up at the checkout aisle of
Staples....

- Ted

2019-09-26 10:06:01

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

On Wed, Sep 25, 2019 at 11:15:30AM +0300, Konstantin Khlebnikov wrote:
> On 25/09/2019 10.18, Dave Chinner wrote:
> > On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:
> > > On 24/09/2019 10.39, Dave Chinner wrote:
> > > > On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> > > > > On 23/09/2019 17.52, Tejun Heo wrote:
> > > > > > Hello, Konstantin.
> > > > > >
> > > > > > On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> > > > > > > With vm.dirty_write_behind 1 or 2 files are written even faster and
> > > > > >
> > > > > > Is the faster speed reproducible? I don't quite understand why this
> > > > > > would be.
> > > > >
> > > > > Writing to disk simply starts earlier.
> > > >
> > > > Stupid question: how is this any different to simply winding down
> > > > our dirty writeback and throttling thresholds like so:
> > > >
> > > > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
> > > >
> > > > to start background writeback when there's 100MB of dirty pages in
> > > > memory, and then:
> > > >
> > > > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
> > > >
> > > > So that writers are directly throttled at 200MB of dirty pages in
> > > > memory?
> > > >
> > > > This effectively gives us global writebehind behaviour with a
> > > > 100-200MB cache write burst for initial writes.
> > >
> > > Global limits affect all dirty pages including memory-mapped and
> > > randomly touched. Write-behind aims only into sequential streams.
> >
> > There are apps that do sequential writes via mmap()d files.
> > They should do writebehind too, yes?
>
> I see no reason for that. This is different scenario.

It is?

> Mmap have no clear signal about "end of write", only page fault at
> beginning. Theoretically we could implement similar sliding window and
> start writeback on consequent page faults.

Sequential IO doing pwrite() in a loop has no clear signal about
"end of write", either. It's exactly the same as doing a memset(0)
on a mmap()d region to zero the file. i.e. the write doesn't stop
until EOF is reached...

> But applications who use memory mapped files probably knows better what
> to do with this data. I prefer to leave them alone for now.

By that argument, we shouldn't have readahead for mmap() access or
even read-around for page faults. We can track read and write faults
exactly for mmap(), so if you are tracking sequential page dirtying
for writebehind we can do that just as easily for mmap (via
->page_mkwrite) as we can for write() IO.

> > > > ANd, really such strict writebehind behaviour is going to cause all
> > > > sorts of unintended problesm with filesystems because there will be
> > > > adverse interactions with delayed allocation. We need a substantial
> > > > amount of dirty data to be cached for writeback for fragmentation
> > > > minimisation algorithms to be able to do their job....
> > >
> > > I think most sequentially written files never change after close.
> >
> > There are lots of apps that write zeros to initialise and allocate
> > space, then go write real data to them. Database WAL files are
> > commonly initialised like this...
>
> Those zeros are just bunch of dirty pages which have to be written.
> Sync and memory pressure will do that, why write-behind don't have to?

Huh? IIUC, the writebehind flag is a global behaviour flag for the
kernel - everything does writebehind or nothing does it, right?

Hence if you turn on writebehind, the writebehind will write the
zeros to disk before real data can be written. We no longer have
zeroing as something that sits in the cache until it's overwritten
with real data - that file now gets written twice and it delays the
application from actually writing real data until the zeros are all
on disk.

strict writebehind without the ability to burst temporary/short-term
data/state into the cache is going to cause a lot of performance
regressions in applications....

> > > Except of knowing final size of huge files (>16Mb in my patch)
> > > there should be no difference for delayed allocation.
> >
> > There is, because you throttle the writes down such that there is
> > only 16MB of dirty data in memory. Hence filesystems will only
> > typically allocate in 16MB chunks as that's all the delalloc range
> > spans.
> >
> > I'm not so concerned for XFS here, because our speculative
> > preallocation will handle this just fine, but for ext4 and btrfs
> > it's going to interleave the allocate of concurrent streaming writes
> > and fragment the crap out of the files.
> >
> > In general, the smaller you make the individual file writeback
> > window, the worse the fragmentation problems gets....
>
> AFAIR ext4 already preallocates extent beyond EOF too.

Only via fallocate(), not for delayed allocation.

> > > Probably write behind could provide hint about streaming pattern:
> > > pass something like "MSG_MORE" into writeback call.
> >
> > How does that help when we've only got dirty data and block
> > reservations up to EOF which is no more than 16MB away?
>
> Block allocator should interpret this flags as "more data are
> expected" and preallocate extent bigger than data and beyond EOF.

Can't do that: delayed allocation is a 2-phase operation that is not
separable from the context that is dirtying the pages. The space
is _accounted as used_ during the write() context, but the _physical
allocation_ of that space is done in the writeback context. We
cannot reserve more space in the writeback context, because we may
already be at ENOSPC by the time writeback comes along. Hence
writeback must already have all the space it needs to write back the
dirty pages in memory already accounted as used space before it
starts running physical allocations.

IOWs, we cannot magically allocate more space than was reserved for
the data being written in because of some special flag from the
writeback code. That way lies angry users because we lost their
data due to ENOSPC issues in writeback.
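Dave's two-phase constraint can be written down as a toy model (hypothetical names, not code from any real filesystem): reservation happens in write() context, conversion happens in writeback context, and conversion can never exceed what was reserved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of two-phase delayed allocation. */
struct fs_counters {
    long free_blocks;  /* blocks not yet promised to anyone */
    long reserved;     /* blocks accounted as used, not yet placed */
};

/* Phase 1, write() context: account the space as used right now.
 * This is where ENOSPC is reported to the writer. */
static bool delalloc_reserve(struct fs_counters *fs, long blocks)
{
    if (fs->free_blocks < blocks)
        return false;
    fs->free_blocks -= blocks;
    fs->reserved += blocks;
    return true;
}

/* Phase 2, writeback context: convert reservation into a physical
 * allocation. It cannot ask for more than phase 1 reserved - extra
 * "MSG_MORE" preallocation here would risk losing data at ENOSPC. */
static bool delalloc_convert(struct fs_counters *fs, long blocks)
{
    if (fs->reserved < blocks)
        return false;  /* would over-allocate: not allowed */
    fs->reserved -= blocks;
    return true;
}
```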

Cheers,

Dave.
--
Dave Chinner
[email protected]