2021-03-13 02:56:41

by Zhang Yi

Subject: [RFC PATCH 0/3] block_dump: remove block dump

Hi,

block_dump is an old debugging interface that can be replaced by
tracepoints, and we also found a deadlock issue related to it[1]. As Jan
suggested, this patch set deletes the whole block_dump feature; we can
use tracepoints to get similar information. If anyone still using this
feature cannot switch to tracepoints, or has any other suggestions,
please let us know.

Thanks,
Yi.

[1]. https://lore.kernel.org/linux-fsdevel/[email protected]/T/#m385b0f84bc381d3089740e33d94f6e1b67dd06d2

zhangyi (F) (3):
block_dump: remove block_dump feature in mark_inode_dirty()
block_dump: remove block_dump feature
block_dump: remove comments in docs

.../admin-guide/laptops/laptop-mode.rst | 11 --------
Documentation/admin-guide/sysctl/vm.rst | 8 ------
block/blk-core.c | 9 -------
fs/fs-writeback.c | 25 -------------------
include/linux/writeback.h | 1 -
kernel/sysctl.c | 8 ------
mm/page-writeback.c | 5 ----
7 files changed, 67 deletions(-)

--
2.25.4


2021-03-13 02:56:41

by Zhang Yi

Subject: [RFC PATCH 3/3] block_dump: remove comments in docs

Now that the block_dump feature is gone, remove all mentions of it from
the docs.

Signed-off-by: zhangyi (F) <[email protected]>
---
Documentation/admin-guide/laptops/laptop-mode.rst | 11 -----------
Documentation/admin-guide/sysctl/vm.rst | 8 --------
2 files changed, 19 deletions(-)

diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
index c984c4262f2e..b61cc601d298 100644
--- a/Documentation/admin-guide/laptops/laptop-mode.rst
+++ b/Documentation/admin-guide/laptops/laptop-mode.rst
@@ -101,17 +101,6 @@ this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.

-If you want to find out which process caused the disk to spin up, you can
-gather information by setting the flag /proc/sys/vm/block_dump. When this flag
-is set, Linux reports all disk read and write operations that take place, and
-all block dirtyings done to files. This makes it possible to debug why a disk
-needs to spin up, and to increase battery life even more. The output of
-block_dump is written to the kernel output, and it can be retrieved using
-"dmesg". When you use block_dump and your kernel logging level also includes
-kernel debugging messages, you probably want to turn off klogd, otherwise
-the output of block_dump will be logged, causing disk activity that is not
-normally there.
-

Configuration
-------------
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 586cd4b86428..3ca6679f16ea 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -25,7 +25,6 @@ files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:

- admin_reserve_kbytes
-- block_dump
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
@@ -106,13 +105,6 @@ On x86_64 this is about 128MB.
Changing this takes effect whenever an application requests memory.


-block_dump
-==========
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
-
-
compact_memory
==============

--
2.25.4
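
For anyone who still holds old kernel logs captured with block_dump enabled, the following is a minimal sketch of a parser for the two message formats. The regular expressions mirror the printk format strings removed in patches 1/3 and 2/3; the function name, the field names, and the optional dmesg timestamp prefix are assumptions made for illustration only.

```python
import re

# Patterns mirror the two printk format strings removed by this series:
#   "%s(%d): dirtied inode %lu (%s) on %s"
#   "%s(%d): %s block %Lu on %s (%u sectors)"
DIRTIED = re.compile(
    r"(?P<comm>.+)\((?P<pid>\d+)\): dirtied inode (?P<ino>\d+) "
    r"\((?P<name>.*)\) on (?P<dev>\S+)$"
)
BLOCK_IO = re.compile(
    r"(?P<comm>.+)\((?P<pid>\d+)\): (?P<op>READ|WRITE) block "
    r"(?P<sector>\d+) on (?P<dev>\S+) \((?P<count>\d+) sectors\)$"
)
# Assumption: lines may carry a dmesg-style "[  12.345678] " prefix.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def parse_block_dump(line):
    """Return a dict of string fields for a block_dump line, or None."""
    line = TIMESTAMP.sub("", line.strip())
    for kind, pattern in (("dirtied", DIRTIED), ("io", BLOCK_IO)):
        match = pattern.match(line)
        if match:
            record = match.groupdict()
            record["kind"] = kind
            return record
    return None  # not a block_dump message
```

All captured values are returned as strings; callers that need numbers can convert the pid, ino, sector and count fields themselves.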

2021-03-13 02:56:50

by Zhang Yi

Subject: [RFC PATCH 1/3] block_dump: remove block_dump feature in mark_inode_dirty()

block_dump is an old debugging interface. One of its functions is to
print information about which process writes which file on disk. If we
enable block_dump through /proc/sys/vm/block_dump and turn on the debug
log level, we can gather the writing process name, the target file name
and the disk from the kernel log. This feature is implemented in
block_dump___mark_inode_dirty(), which prints the above information
directly to the kernel log when an inode is marked dirty, so it is noisy
and can easily trigger a log storm. In addition, taking the dentry
refcount there is not safe; we found that it can lead to a deadlock on
an ext4 file system in data=journal mode.

Since tracepoints were introduced into the kernel, we have had a
tracepoint in __mark_inode_dirty(), which is a better replacement for
block_dump___mark_inode_dirty(). The only downside is that it traces
only the inode number and not the file name, but that probably doesn't
matter because the file name printed by block_dump is not accurate in
some cases, and we can still find the file through the inode number and
device id. So this patch deletes the inode dirtying part of the
block_dump feature.

Signed-off-by: zhangyi (F) <[email protected]>
---
fs/fs-writeback.c | 25 -------------------------
1 file changed, 25 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e91980f49388..7c46d1588a19 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2205,28 +2205,6 @@ int dirtytime_interval_handler(struct ctl_table *table, int write,
return ret;
}

-static noinline void block_dump___mark_inode_dirty(struct inode *inode)
-{
- if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
- struct dentry *dentry;
- const char *name = "?";
-
- dentry = d_find_alias(inode);
- if (dentry) {
- spin_lock(&dentry->d_lock);
- name = (const char *) dentry->d_name.name;
- }
- printk(KERN_DEBUG
- "%s(%d): dirtied inode %lu (%s) on %s\n",
- current->comm, task_pid_nr(current), inode->i_ino,
- name, inode->i_sb->s_id);
- if (dentry) {
- spin_unlock(&dentry->d_lock);
- dput(dentry);
- }
- }
-}
-
/**
* __mark_inode_dirty - internal function to mark an inode dirty
*
@@ -2296,9 +2274,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
(dirtytime && (inode->i_state & I_DIRTY_INODE)))
return;

- if (unlikely(block_dump))
- block_dump___mark_inode_dirty(inode);
-
spin_lock(&inode->i_lock);
if (dirtytime && (inode->i_state & I_DIRTY_INODE))
goto out_unlock_inode;
--
2.25.4
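
The commit message above notes that a file can still be found from the inode number and device id reported by the tracepoint. A minimal user-space sketch of that lookup follows; the function name and parameters are hypothetical, and it assumes the caller already knows a directory (for example the mount point) on the device in question.

```python
import os


def find_paths_by_inode(root, inode, device=None):
    """Return sorted paths under root whose inode number matches.

    If device is given, st_dev must match as well, which avoids false
    hits when the walk crosses into other mounted file systems.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_ino == inode and (device is None or st.st_dev == device):
                matches.append(path)
    return sorted(matches)
```

A file with several hard links yields several paths, and an inode whose last link has been removed yields none, so the result is best treated as a hint rather than an exact answer.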

2021-03-13 02:57:38

by Zhang Yi

Subject: [RFC PATCH 2/3] block_dump: remove block_dump feature

We have already deleted the block_dump feature in mark_inode_dirty()
because it can be replaced by tracepoints; now we also remove the part
in submit_bio() for the same reason. The block_dump code in submit_bio()
dumps the submitting process, the I/O direction, the start sector and
the sector count on the target disk to the kernel log. It can be
replaced by the block_bio_queue tracepoint in submit_bio_checks(), so we
do not need block_dump anymore; remove the whole block_dump feature.

Signed-off-by: zhangyi (F) <[email protected]>
---
block/blk-core.c | 9 ---------
include/linux/writeback.h | 1 -
kernel/sysctl.c | 8 --------
mm/page-writeback.c | 5 -----
4 files changed, 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..9731b0ca8166 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1086,15 +1086,6 @@ blk_qc_t submit_bio(struct bio *bio)
task_io_account_read(bio->bi_iter.bi_size);
count_vm_events(PGPGIN, count);
}
-
- if (unlikely(block_dump)) {
- char b[BDEVNAME_SIZE];
- printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u sectors)\n",
- current->comm, task_pid_nr(current),
- op_is_write(bio_op(bio)) ? "WRITE" : "READ",
- (unsigned long long)bio->bi_iter.bi_sector,
- bio_devname(bio, b), count);
- }
}

/*
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 8e5c5bb16e2d..9ef50176f3a1 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -360,7 +360,6 @@ extern unsigned int dirty_writeback_interval;
extern unsigned int dirty_expire_interval;
extern unsigned int dirtytime_expire_interval;
extern int vm_highmem_is_dirtyable;
-extern int block_dump;
extern int laptop_mode;

int dirty_background_ratio_handler(struct ctl_table *table, int write,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 62fbd09b5dc1..35fce9a8402f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2957,14 +2957,6 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
},
- {
- .procname = "block_dump",
- .data = &block_dump,
- .maxlen = sizeof(block_dump),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- },
{
.procname = "vfs_cache_pressure",
.data = &sysctl_vfs_cache_pressure,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eb34d204d4ee..b72da123f242 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -108,11 +108,6 @@ EXPORT_SYMBOL_GPL(dirty_writeback_interval);
*/
unsigned int dirty_expire_interval = 30 * 100; /* centiseconds */

-/*
- * Flag that makes the machine dump writes/reads and block dirtyings.
- */
-int block_dump;
-
/*
* Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
* a full sync is triggered after this time elapses without any disk activity.
--
2.25.4
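
As a rough illustration of switching from block_dump to tracepoints, the sketch below turns trace events on or off by writing to their enable files under a tracefs mount. block_bio_queue is named in the commit message above; the event name writeback_mark_inode_dirty, the group directory names and the helper itself are assumptions that should be checked against the events directory of the running kernel.

```python
from pathlib import Path

# Assumed event names; confirm them under <tracefs>/events before use.
REPLACEMENT_EVENTS = (
    "writeback/writeback_mark_inode_dirty",
    "block/block_bio_queue",
)


def set_events(tracefs_root, events=REPLACEMENT_EVENTS, enable=True):
    """Write 1 or 0 to each event's enable file; return the files written."""
    written = []
    for event in events:
        enable_file = Path(tracefs_root) / "events" / event / "enable"
        if not enable_file.exists():
            continue  # event not available on this kernel
        enable_file.write_text("1" if enable else "0")
        written.append(enable_file)
    return written
```

On a typical system this would be called as root with the tracefs mount point, commonly /sys/kernel/tracing, and the resulting events read from the trace_pipe file in the same directory.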

2021-03-13 03:45:26

by Jens Axboe

Subject: Re: [RFC PATCH 0/3] block_dump: remove block dump

On 3/12/21 8:01 PM, zhangyi (F) wrote:
> Hi,
>
> block_dump is an old debugging interface that can be replaced by
> tracepoints, and we also found a deadlock issue related to it[1]. As Jan
> suggested, this patch set deletes the whole block_dump feature; we can
> use tracepoints to get similar information. If anyone still using this
> feature cannot switch to tracepoints, or has any other suggestions,
> please let us know.

Totally agree, that's long overdue. The feature is no longer useful
and has been deprecated by much better methods. Unless anyone objects,
I'll queue this up for 5.13.

--
Jens Axboe

2021-03-15 09:49:13

by Jan Kara

Subject: Re: [RFC PATCH 2/3] block_dump: remove block_dump feature

On Sat 13-03-21 11:01:45, zhangyi (F) wrote:
> We have already deleted the block_dump feature in mark_inode_dirty()
> because it can be replaced by tracepoints; now we also remove the part
> in submit_bio() for the same reason. The block_dump code in submit_bio()
> dumps the submitting process, the I/O direction, the start sector and
> the sector count on the target disk to the kernel log. It can be
> replaced by the block_bio_queue tracepoint in submit_bio_checks(), so we
> do not need block_dump anymore; remove the whole block_dump feature.
>
> Signed-off-by: zhangyi (F) <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> block/blk-core.c | 9 ---------
> include/linux/writeback.h | 1 -
> kernel/sysctl.c | 8 --------
> mm/page-writeback.c | 5 -----
> 4 files changed, 23 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fc60ff208497..9731b0ca8166 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1086,15 +1086,6 @@ blk_qc_t submit_bio(struct bio *bio)
> task_io_account_read(bio->bi_iter.bi_size);
> count_vm_events(PGPGIN, count);
> }
> -
> - if (unlikely(block_dump)) {
> - char b[BDEVNAME_SIZE];
> - printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u sectors)\n",
> - current->comm, task_pid_nr(current),
> - op_is_write(bio_op(bio)) ? "WRITE" : "READ",
> - (unsigned long long)bio->bi_iter.bi_sector,
> - bio_devname(bio, b), count);
> - }
> }
>
> /*
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 8e5c5bb16e2d..9ef50176f3a1 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -360,7 +360,6 @@ extern unsigned int dirty_writeback_interval;
> extern unsigned int dirty_expire_interval;
> extern unsigned int dirtytime_expire_interval;
> extern int vm_highmem_is_dirtyable;
> -extern int block_dump;
> extern int laptop_mode;
>
> int dirty_background_ratio_handler(struct ctl_table *table, int write,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 62fbd09b5dc1..35fce9a8402f 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2957,14 +2957,6 @@ static struct ctl_table vm_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec_jiffies,
> },
> - {
> - .procname = "block_dump",
> - .data = &block_dump,
> - .maxlen = sizeof(block_dump),
> - .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> - .extra1 = SYSCTL_ZERO,
> - },
> {
> .procname = "vfs_cache_pressure",
> .data = &sysctl_vfs_cache_pressure,
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index eb34d204d4ee..b72da123f242 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -108,11 +108,6 @@ EXPORT_SYMBOL_GPL(dirty_writeback_interval);
> */
> unsigned int dirty_expire_interval = 30 * 100; /* centiseconds */
>
> -/*
> - * Flag that makes the machine dump writes/reads and block dirtyings.
> - */
> -int block_dump;
> -
> /*
> * Flag that puts the machine in "laptop mode". Doubles as a timeout in jiffies:
> * a full sync is triggered after this time elapses without any disk activity.
> --
> 2.25.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-03-15 09:49:37

by Jan Kara

Subject: Re: [RFC PATCH 1/3] block_dump: remove block_dump feature in mark_inode_dirty()

On Sat 13-03-21 11:01:44, zhangyi (F) wrote:
> block_dump is an old debugging interface. One of its functions is to
> print information about which process writes which file on disk. If we
> enable block_dump through /proc/sys/vm/block_dump and turn on the debug
> log level, we can gather the writing process name, the target file name
> and the disk from the kernel log. This feature is implemented in
> block_dump___mark_inode_dirty(), which prints the above information
> directly to the kernel log when an inode is marked dirty, so it is noisy
> and can easily trigger a log storm. In addition, taking the dentry
> refcount there is not safe; we found that it can lead to a deadlock on
> an ext4 file system in data=journal mode.
>
> Since tracepoints were introduced into the kernel, we have had a
> tracepoint in __mark_inode_dirty(), which is a better replacement for
> block_dump___mark_inode_dirty(). The only downside is that it traces
> only the inode number and not the file name, but that probably doesn't
> matter because the file name printed by block_dump is not accurate in
> some cases, and we can still find the file through the inode number and
> device id. So this patch deletes the inode dirtying part of the
> block_dump feature.
>
> Signed-off-by: zhangyi (F) <[email protected]>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/fs-writeback.c | 25 -------------------------
> 1 file changed, 25 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e91980f49388..7c46d1588a19 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2205,28 +2205,6 @@ int dirtytime_interval_handler(struct ctl_table *table, int write,
> return ret;
> }
>
> -static noinline void block_dump___mark_inode_dirty(struct inode *inode)
> -{
> - if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
> - struct dentry *dentry;
> - const char *name = "?";
> -
> - dentry = d_find_alias(inode);
> - if (dentry) {
> - spin_lock(&dentry->d_lock);
> - name = (const char *) dentry->d_name.name;
> - }
> - printk(KERN_DEBUG
> - "%s(%d): dirtied inode %lu (%s) on %s\n",
> - current->comm, task_pid_nr(current), inode->i_ino,
> - name, inode->i_sb->s_id);
> - if (dentry) {
> - spin_unlock(&dentry->d_lock);
> - dput(dentry);
> - }
> - }
> -}
> -
> /**
> * __mark_inode_dirty - internal function to mark an inode dirty
> *
> @@ -2296,9 +2274,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> (dirtytime && (inode->i_state & I_DIRTY_INODE)))
> return;
>
> - if (unlikely(block_dump))
> - block_dump___mark_inode_dirty(inode);
> -
> spin_lock(&inode->i_lock);
> if (dirtytime && (inode->i_state & I_DIRTY_INODE))
> goto out_unlock_inode;
> --
> 2.25.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-03-15 09:49:41

by Jan Kara

Subject: Re: [RFC PATCH 3/3] block_dump: remove comments in docs

On Sat 13-03-21 11:01:46, zhangyi (F) wrote:
> Now that the block_dump feature is gone, remove all mentions of it from
> the docs.
>
> Signed-off-by: zhangyi (F) <[email protected]>

Nice. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> Documentation/admin-guide/laptops/laptop-mode.rst | 11 -----------
> Documentation/admin-guide/sysctl/vm.rst | 8 --------
> 2 files changed, 19 deletions(-)
>
> diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
> index c984c4262f2e..b61cc601d298 100644
> --- a/Documentation/admin-guide/laptops/laptop-mode.rst
> +++ b/Documentation/admin-guide/laptops/laptop-mode.rst
> @@ -101,17 +101,6 @@ this results in concentration of disk activity in a small time interval which
> occurs only once every 10 minutes, or whenever the disk is forced to spin up by
> a cache miss. The disk can then be spun down in the periods of inactivity.
>
> -If you want to find out which process caused the disk to spin up, you can
> -gather information by setting the flag /proc/sys/vm/block_dump. When this flag
> -is set, Linux reports all disk read and write operations that take place, and
> -all block dirtyings done to files. This makes it possible to debug why a disk
> -needs to spin up, and to increase battery life even more. The output of
> -block_dump is written to the kernel output, and it can be retrieved using
> -"dmesg". When you use block_dump and your kernel logging level also includes
> -kernel debugging messages, you probably want to turn off klogd, otherwise
> -the output of block_dump will be logged, causing disk activity that is not
> -normally there.
> -
>
> Configuration
> -------------
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 586cd4b86428..3ca6679f16ea 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -25,7 +25,6 @@ files can be found in mm/swap.c.
> Currently, these files are in /proc/sys/vm:
>
> - admin_reserve_kbytes
> -- block_dump
> - compact_memory
> - compaction_proactiveness
> - compact_unevictable_allowed
> @@ -106,13 +105,6 @@ On x86_64 this is about 128MB.
> Changing this takes effect whenever an application requests memory.
>
>
> -block_dump
> -==========
> -
> -block_dump enables block I/O debugging when set to a nonzero value. More
> -information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
> -
> -
> compact_memory
> ==============
>
> --
> 2.25.4
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-03-15 09:59:12

by Christoph Hellwig

Subject: Re: [RFC PATCH 0/3] block_dump: remove block dump

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2021-05-10 01:31:32

by Zhang Yi

Subject: Re: [RFC PATCH 0/3] block_dump: remove block dump

On 2021/3/13 11:37, Jens Axboe wrote:
> On 3/12/21 8:01 PM, zhangyi (F) wrote:
>> Hi,
>>
>> block_dump is an old debugging interface that can be replaced by
>> tracepoints, and we also found a deadlock issue related to it[1]. As Jan
>> suggested, this patch set deletes the whole block_dump feature; we can
>> use tracepoints to get similar information. If anyone still using this
>> feature cannot switch to tracepoints, or has any other suggestions,
>> please let us know.
>
> Totally agree, that's long overdue. The feature is no longer useful
> and has been deprecated by much better methods. Unless anyone objects,
> I'll queue this up for 5.13.
>
Hi, Jens. Could you please send these patches upstream?

Thanks,
Yi.

2021-05-10 15:31:07

by Jens Axboe

Subject: Re: [RFC PATCH 0/3] block_dump: remove block dump

On 3/12/21 8:01 PM, zhangyi (F) wrote:
> Hi,
>
> block_dump is an old debugging interface that can be replaced by
> tracepoints, and we also found a deadlock issue related to it[1]. As Jan
> suggested, this patch set deletes the whole block_dump feature; we can
> use tracepoints to get similar information. If anyone still using this
> feature cannot switch to tracepoints, or has any other suggestions,
> please let us know.

Applied for 5.14, thanks.

--
Jens Axboe