2018-07-09 22:03:00

by David Rientjes

Subject: [patch] docs, debugfs: start explicit debugfs documentation

There is no canonical location for debugfs documentation, so start one.

This is primarily motivated to describe the oom_free_timeout_ms interface
but it is extended for all the debugfs files that I am personally
interested in.

Hopefully this can be expanded in the future for better insight into how
the various interfaces can be used.

Suggested-by: Andrew Morton <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
---
Documentation/clearing-warn-once.txt | 7 --
Documentation/debugfs/00-INDEX | 8 ++
Documentation/debugfs/extfrag.txt | 46 +++++++
Documentation/debugfs/provoke-crashes.txt | 8 ++
Documentation/debugfs/root.txt | 137 +++++++++++++++++++++
Documentation/filesystems/debugfs.txt | 46 +++++++
Documentation/power/basic-pm-debugging.txt | 25 +---
Documentation/sysctl/vm.txt | 7 +-
8 files changed, 251 insertions(+), 33 deletions(-)
delete mode 100644 Documentation/clearing-warn-once.txt
create mode 100644 Documentation/debugfs/00-INDEX
create mode 100644 Documentation/debugfs/extfrag.txt
create mode 100644 Documentation/debugfs/provoke-crashes.txt
create mode 100644 Documentation/debugfs/root.txt

diff --git a/Documentation/clearing-warn-once.txt b/Documentation/clearing-warn-once.txt
deleted file mode 100644
index 5b1f5d547be1..000000000000
--- a/Documentation/clearing-warn-once.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-
-WARN_ONCE / WARN_ON_ONCE only print a warning once.
-
-echo 1 > /sys/kernel/debug/clear_warn_once
-
-clears the state and allows the warnings to print once again.
-This can be useful after test suite runs to reproduce problems.
diff --git a/Documentation/debugfs/00-INDEX b/Documentation/debugfs/00-INDEX
new file mode 100644
index 000000000000..5ad3c7e1af51
--- /dev/null
+++ b/Documentation/debugfs/00-INDEX
@@ -0,0 +1,8 @@
+00-INDEX
+ - this file
+extfrag.txt
+ - External fragmentation (compaction)
+provoke-crashes.txt
+ - LKDTM triggers
+root.txt
+ - Documentation for files at the debugfs root
diff --git a/Documentation/debugfs/extfrag.txt b/Documentation/debugfs/extfrag.txt
new file mode 100644
index 000000000000..4a351e34dd98
--- /dev/null
+++ b/Documentation/debugfs/extfrag.txt
@@ -0,0 +1,46 @@
+External fragmentation debugfs files
+
+This subdirectory is only available if memory compaction (CONFIG_COMPACTION) is
+enabled for defragmentation.
+
+
+extfrag_index
+=============
+The fragmentation index is a value between 0 and 1 that indicates how much
+external fragmentation there is for each allocation order, from order-0 to
+MAX_ORDER-1, for each zone. Memory compaction heuristics can use it to
+determine whether migrating memory is likely to allow an allocation at a
+specific order to succeed. A higher value indicates that an allocation at that
+order would fail due to fragmentation. A lower value indicates that it would
+fail due to being low on memory. A value of -1.000 indicates that an
+allocation at that order would immediately succeed.
+
+Example output:
+
+Node 0, zone DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
+Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
+
+This file cannot be written.
+
+This is often used to tune the vm.extfrag_threshold sysctl, see
+Documentation/sysctl/vm.txt, to define memory compaction behavior.
+
+
+unusable_index
+==============
+The unusable free space index is a value between 0 and 1 that indicates how
+much of each zone's free memory cannot be used for an allocation of a given
+order, from order-0 to MAX_ORDER-1. The higher the value, the more free memory
+is unusable for that order, which implies external fragmentation. This can be
+used in conjunction with extfrag_index to understand the external fragmentation
+of a zone.
+
+Example output:
+
+Node 0, zone DMA32 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.004
+Node 0, zone Normal 0.000 0.000 0.001 0.003 0.005 0.007 0.008 0.008 0.008 0.008 0.008
+
+This file cannot be written.
+
+This is often used to tune the vm.extfrag_threshold sysctl, see
+Documentation/sysctl/vm.txt, to define memory compaction behavior.
diff --git a/Documentation/debugfs/provoke-crashes.txt b/Documentation/debugfs/provoke-crashes.txt
new file mode 100644
index 000000000000..69ec3a0a5a86
--- /dev/null
+++ b/Documentation/debugfs/provoke-crashes.txt
@@ -0,0 +1,8 @@
+LKDTM provoke-crashes interface
+
+When the Linux Kernel Dump Test Tool Module (LKDTM) is available, this directory
+exports triggers that are available to induce specific actions, usually
+triggering different dumping mechanisms, at predefined crash points.
+
+See Documentation/fault-injection/provoke-crashes.txt for examples of how to
+induce exceptions, panics, overflows, etc., at predefined crash points.
diff --git a/Documentation/debugfs/root.txt b/Documentation/debugfs/root.txt
new file mode 100644
index 000000000000..3bb2ff395aff
--- /dev/null
+++ b/Documentation/debugfs/root.txt
@@ -0,0 +1,137 @@
+Debugfs root files
+Started by David Rientjes <[email protected]>
+
+This file documents files at the root of debugfs. For information on mounting
+or creating debugfs interfaces, please see
+Documentation/filesystems/debugfs.txt.
+
+Files under subdirectories of debugfs should be documented in a file named
+after the subdirectory in Documentation/debugfs.
+
+
+clear_warn_once
+===============
+Normally, WARN_ONCE() and WARN_ON_ONCE() print a particular warning only a
+single time during a system's uptime.
+
+This file cannot be read.
+
+When written with any value, this clears the state of all such warnings. This
+will cause the warnings to be emitted once again if reached.
+
+This is often useful for test suites to reproduce problems and detect errors
+that would otherwise be suppressed.
+
+
+fault_around_bytes
+==================
+On read fault, the VM attempts to fault pages surrounding the fault address for
+spatial locality.
+
+When read, this specifies the number of bytes that the VM will attempt to map
+around the faulting address.
+
+When written, this defines the number of bytes to fault around. The value
+should be a power-of-2 size no smaller than the native page size of the system;
+other values are rounded down to the nearest power-of-2. The maximum value is
+typically the amount of memory mapped by a pmd.
+
+
+oom_free_timeout_ms
+===================
+When a process is out of memory (oom) killed, a grace period is allowed for the
+process to handle the SIGKILL and free its memory before additional processes
+are oom killed. In such situations, it is possible that the system becomes
+livelocked because the oom victim is waiting on a lock held by an allocator.
+
+When read, this specifies the minimum number of milliseconds that the oom
+killer will wait before killing additional processes because it is assumed the
+original victim cannot make forward progress.
+
+When written, this increases or decreases the number of milliseconds to wait
+before additional processes are oom killed. A lower value will cause the oom
+killer to more aggressively kill additional processes, perhaps unnecessarily
+because the original victim could exit. A higher value allows more time for
+the victim to exit.
+
+Since the oom reaper can usually free at least part of the victim's memory
+before it actually exits, it is recommended to set this to enough time such
+that additional processes are not killed unnecessarily.
+
+
+sleep_time
+==========
+Timekeeping keeps track of how much time is spent in suspend.
+
+When read, this file shows a histogram that describes the number of times that
+timekeeping was suspended for the shown range, in seconds.
+
+This file cannot be written.
+
+
+split_huge_pages
+================
+When transparent hugepages are enabled, hugepages may be transparently split
+without knowledge of the application that maps them.
+
+This file cannot be read.
+
+When '1' is written, this walks all memory and synchronously splits all
+transparent hugepages. The number of hugepages split is shown in the kernel
+log.
+
+This is typically only needed for debugging.
+
+
+suspend_stats
+=============
+Suspend to RAM provides statistics on the number of successes and failures in
+suspend, as well as a breakdown of how many failures are for the various
+possible reasons.
+
+When read, the following is example output:
+ success: 20
+ fail: 5
+ failed_freeze: 0
+ failed_prepare: 0
+ failed_suspend: 5
+ failed_suspend_noirq: 0
+ failed_resume: 0
+ failed_resume_noirq: 0
+ failures:
+ last_failed_dev: alarm
+ adc
+ last_failed_errno: -16
+ -16
+ last_failed_step: suspend
+ suspend
+
+This specifies the last two failed devices, error number, and failed suspend
+step.
+
+This file cannot be written.
+
+
+wakeup_sources
+==============
+For power management sleep, it is helpful to know the source of any wakeups
+that cause the sleep state to be interrupted.
+
+When read, this file specifies the source of wakeups (normally a device or
+timer), the active, event, and wakeup counts, total time, max time, and last
+change.
+
+The following is example output:
+name active_count event_count wakeup_count expire_count active_since total_time max_time last_change prevent_suspend_time
+0000:00:1d.2 0 0 0 0 0 0 0 17416 0
+0000:00:1d.1 0 0 0 0 0 0 0 17415 0
+0000:00:1d.0 0 0 0 0 0 0 0 17414 0
+0000:00:1a.2 0 0 0 0 0 0 0 17413 0
+0000:00:1a.0 0 0 0 0 0 0 0 17412 0
+0000:00:1d.7 0 0 0 0 0 0 0 17406 0
+0000:00:1a.7 0 0 0 0 0 0 0 17395 0
+
+This file cannot be written.
+
+This is often helpful to determine the source of wakeups that may otherwise
+be unknown and for debugging.
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
index 4f45f71149cb..100fbd623b85 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.txt
@@ -21,6 +21,10 @@ options can be used.

Note that the debugfs API is exported GPL-only to modules.

+This document describes how information can be exported to and manipulated by
+user space. For information on individual files present in debugfs, at least
+those that have been documented, see Documentation/debugfs.
+
Code using debugfs should include <linux/debugfs.h>. Then, the first order
of business will be to create at least one directory to hold a set of
debugfs files:
@@ -51,6 +55,48 @@ operations should be provided; others can be included as needed. Again,
the return value will be a dentry pointer to the created file, NULL for
error, or ERR_PTR(-ENODEV) if debugfs support is missing.

+For simplicity, it is possible to use the generic DEFINE_SIMPLE_ATTRIBUTE()
+macro to specify the file operations:
+
+	DEFINE_SIMPLE_ATTRIBUTE(noop_debugfs_fops, noop_debugfs_read,
+				noop_debugfs_write, "%llu\n");
+
+And then define the static callback functions, using the "val" parameter to
+pass the value to be read or written:
+
+	static int noop_debugfs_read(void *data, u64 *val)
+	{
+		u64 *p = data;
+
+		*val = *p;
+		return 0;
+	}
+
+	static int noop_debugfs_write(void *data, u64 val)
+	{
+		u64 *p = data;
+
+		*p = val;
+		return 0;
+	}
+
+The "data" pointer from debugfs_create_file() is passed to these callbacks.
+In the simplest form, NULL can be passed as the "data" argument to
+debugfs_create_file() and the read and write callbacks can then access a
+static variable directly:
+
+	static u64 my_noop_value;
+	static int noop_debugfs_read(void *data, u64 *val)
+	{
+		*val = my_noop_value;
+		return 0;
+	}
+	static int noop_debugfs_write(void *data, u64 val)
+	{
+		my_noop_value = val;
+		return 0;
+	}
+
Create a file with an initial size, the following function can be used
instead:

diff --git a/Documentation/power/basic-pm-debugging.txt b/Documentation/power/basic-pm-debugging.txt
index 708f87f78a75..b1a57763b0e6 100644
--- a/Documentation/power/basic-pm-debugging.txt
+++ b/Documentation/power/basic-pm-debugging.txt
@@ -229,26 +229,5 @@ analogous to the one described in section 1. If you find some failing drivers,
you will have to unload them every time before an STR transition (ie. before
you run s2ram), and please report the problems with them.

-There is a debugfs entry which shows the suspend to RAM statistics. Here is an
-example of its output.
- # mount -t debugfs none /sys/kernel/debug
- # cat /sys/kernel/debug/suspend_stats
- success: 20
- fail: 5
- failed_freeze: 0
- failed_prepare: 0
- failed_suspend: 5
- failed_suspend_noirq: 0
- failed_resume: 0
- failed_resume_noirq: 0
- failures:
- last_failed_dev: alarm
- adc
- last_failed_errno: -16
- -16
- last_failed_step: suspend
- suspend
-Field success means the success number of suspend to RAM, and field fail means
-the failure number. Others are the failure number of different steps of suspend
-to RAM. suspend_stats just lists the last 2 failed devices, error number and
-failed step of suspend.
+See Documentation/debugfs/root.txt for suspend to RAM statistics if debugfs is
+mounted.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 960e82759ffb..8eb3917cbd3d 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -244,9 +244,10 @@ extfrag_threshold
This parameter affects whether the kernel will compact memory or direct
reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
debugfs shows what the fragmentation index for each order is in each zone in
-the system. Values tending towards 0 imply allocations would fail due to lack
-of memory, values towards 1000 imply failures are due to fragmentation and -1
-implies that the allocation will succeed as long as watermarks are met.
+the system. See Documentation/debugfs/extfrag.txt. Values tending towards 0
+imply allocations would fail due to lack of memory, values towards 1000 imply
+failures are due to fragmentation and -1 implies that the allocation will
+succeed as long as watermarks are met.

The kernel will not compact memory in a zone if the
fragmentation index is <= extfrag_threshold. The default value is 500.

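For illustration only, here is a minimal userspace sketch of how one line of
extfrag/extfrag_index output could be interpreted against
vm.extfrag_threshold. The helper names (parse_extfrag_line, classify) and the
MAX_ORDERS bound are hypothetical and not part of the kernel, and the line
format is assumed to match the example output in extfrag.txt. Since debugfs
output is not a stable interface, any such parser should be treated as a
debugging aid rather than something to build tools on.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORDERS 16	/* assumed upper bound on columns, not a kernel constant */

enum frag_hint {
	ALLOC_WOULD_SUCCEED,	/* index is -1.000 */
	LOW_MEMORY_LIKELY,	/* 0 <= index <= threshold: kernel will not compact */
	FRAGMENTATION_LIKELY,	/* index > threshold: compaction may help */
};

/*
 * Parse one line such as "Node 0, zone   Normal -1.000 0.250 0.900".
 * Each index is stored scaled by 1000 (0.250 becomes 250, -1.000 becomes
 * -1000) so it can be compared with vm.extfrag_threshold directly.
 * Returns the number of orders parsed, or -1 if the prefix does not match.
 */
static int parse_extfrag_line(const char *line, char *zone, size_t zone_len,
			      int *scaled, int max)
{
	int node, consumed = 0, n = 0;
	char name[32];
	const char *p;

	if (sscanf(line, "Node %d, zone %31s%n", &node, name, &consumed) != 2)
		return -1;
	snprintf(zone, zone_len, "%s", name);

	p = line + consumed;
	while (n < max) {
		char *end;
		double v = strtod(p, &end);

		if (end == p)	/* no further number on this line */
			break;
		/* round to nearest to avoid floating point truncation */
		scaled[n++] = (int)(v * 1000.0 + (v < 0 ? -0.5 : 0.5));
		p = end;
	}
	return n;
}

/* Classify one scaled index against a threshold in the range 0..1000. */
static enum frag_hint classify(int scaled, int threshold)
{
	if (scaled < 0)
		return ALLOC_WOULD_SUCCEED;
	if (scaled <= threshold)
		return LOW_MEMORY_LIKELY;
	return FRAGMENTATION_LIKELY;
}
```

With the default threshold of 500, an index of 0.250 would be classified as
a low-memory case and 0.900 as a fragmentation case.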

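The rounding described for fault_around_bytes in root.txt can be sketched in
userspace as follows. This models only the documented behavior (minimum of one
page, round down to a power of 2, maximum of the memory mapped by a pmd) and is
not the kernel's implementation. SKETCH_PAGE_SIZE and SKETCH_PMD_SIZE are
assumptions for a system with 4 KiB pages and 2 MiB pmd mappings, and the
function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB native page size */
#define SKETCH_PMD_SIZE  (2ULL << 20)	/* assumption: a pmd maps 2 MiB */

/* Largest power of 2 that is <= v, for v >= 1. */
static uint64_t rounddown_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * Compute the value that would take effect for a requested size.
 * Returns 0 and stores the result in *effective, or -1 if the request
 * exceeds the assumed maximum.
 */
static int fault_around_effective(uint64_t requested, uint64_t *effective)
{
	if (requested > SKETCH_PMD_SIZE)
		return -1;
	if (requested > SKETCH_PAGE_SIZE)
		*effective = rounddown_pow2(requested);
	else
		*effective = SKETCH_PAGE_SIZE;	/* never below one page */
	return 0;
}
```

Under these assumptions, a request of 70000 bytes would take effect as 65536,
and a request of 100 bytes would be raised to one page.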
2018-07-09 22:13:56

by Andrew Morton

Subject: Re: [patch] docs, debugfs: start explicit debugfs documentation

On Mon, 9 Jul 2018 15:00:17 -0700 (PDT) David Rientjes <[email protected]> wrote:

> There is no canonical location for debugfs documentation, so start one.
>
> This is primarily motivated to describe the oom_free_timeout_ms interface
> but it is extended for all the debugfs files that I am personally
> interested in.
>
> Hopefully this can be expanded in the future for better insight into how
> the various interfaces can be used.

(cc Greg)

2018-07-10 06:43:40

by Greg Kroah-Hartman

Subject: Re: [patch] docs, debugfs: start explicit debugfs documentation

On Mon, Jul 09, 2018 at 03:12:48PM -0700, Andrew Morton wrote:
> On Mon, 9 Jul 2018 15:00:17 -0700 (PDT) David Rientjes <[email protected]> wrote:
>
> > There is no canonical location for debugfs documentation, so start one.

What is wrong with Documentation/filesystems/debugfs.txt?

> > This is primarily motivated to describe the oom_free_timeout_ms interface
> > but it is extended for all the debugfs files that I am personally
> > interested in.
> >
> > Hopefully this can be expanded in the future for better insight into how
> > the various interfaces can be used.

Ah, you are talking about the contents of debugfs files, right? Just
use Documentation/ABI/ if you really want to document these things. But
really, it's debugfs, so things can, and will, change. You should never
build tools that rely on the specific information contained in those
files.

thanks,

greg k-h

2018-07-10 06:46:21

by Greg Kroah-Hartman

Subject: Re: [patch] docs, debugfs: start explicit debugfs documentation

On Mon, Jul 09, 2018 at 03:00:17PM -0700, David Rientjes wrote:
> There is no canonical location for debugfs documentation, so start one.
>
> This is primarily motivated to describe the oom_free_timeout_ms interface
> but it is extended for all the debugfs files that I am personally
> interested in.
>
> Hopefully this can be expanded in the future for better insight into how
> the various interfaces can be used.
>
> Suggested-by: Andrew Morton <[email protected]>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> Documentation/clearing-warn-once.txt | 7 --
> Documentation/debugfs/00-INDEX | 8 ++
> Documentation/debugfs/extfrag.txt | 46 +++++++
> Documentation/debugfs/provoke-crashes.txt | 8 ++
> Documentation/debugfs/root.txt | 137 +++++++++++++++++++++
> Documentation/filesystems/debugfs.txt | 46 +++++++
> Documentation/power/basic-pm-debugging.txt | 25 +---
> Documentation/sysctl/vm.txt | 7 +-

This is a mix of a lot of different things all at once.

I'll gladly take the update to the debugfs.txt file for the newer api
calls as a separate file.

For the "this is what a specific debugfs file contains", those should go
into Documentation/ABI/ if you really want to document those types of
things.

But as it is, this single patch does too many different things all at
once.

thanks,

greg k-h

2018-07-10 11:29:54

by Jonathan Corbet

Subject: Re: [patch] docs, debugfs: start explicit debugfs documentation

On Tue, 10 Jul 2018 08:45:06 +0200
Greg KH <[email protected]> wrote:

> For the "this is what a specific debugfs file contains", those should go
> into Documentation/ABI/ if you really want to document those types of
> things.

Do we really want to start populating Documentation/ABI with stuff that's
explicitly *not* ABI? Keeping it separate might make more sense, IMO.
I'd put extfrag_index with the MM docs, for example.

> But as it is, this single patch does too many different things all at
> once.

...and yet not enough :) This stuff is almost in RST already, it would
be nice to go all the way and integrate it into the rest of our docs.

Thanks,

jon

2018-07-10 11:48:11

by Greg Kroah-Hartman

Subject: Re: [patch] docs, debugfs: start explicit debugfs documentation

On Tue, Jul 10, 2018 at 05:28:35AM -0600, Jonathan Corbet wrote:
> On Tue, 10 Jul 2018 08:45:06 +0200
> Greg KH <[email protected]> wrote:
>
> > For the "this is what a specific debugfs file contains", those should go
> > into Documentation/ABI/ if you really want to document those types of
> > things.
>
> Do we really want to start populating Documentation/ABI with stuff that's
> explicitly *not* ABI? Keeping it separate might make more sense, IMO.
> I'd put extfrag_index with the MM docs, for example.

I personally don't think that debugfs files should be documented
anywhere, unless it makes sense from a "help debug the kernel" point of
view. And yes, you are right, we don't want to document things in /ABI/
that are not ABI stuff, as debugfs files can, and do, change at random
times.

Putting the info in the subsystem specific documents makes sense.

thanks,

greg k-h