Date:   Wed, 1 Feb 2023 11:03:54 +0700
From:   Bagas Sanjaya <bagasdotme@gmail.com>
To:     Shuah Khan <skhan@linuxfoundation.org>, corbet@lwn.net
Cc:     sshefali021@gmail.com, kstewart@linuxfoundation.org,
        linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] docs: add workload-tracing document to admin-guide
Message-ID: <Y9nkqhAS6EW2Lu8Z@debian.me>
References: <20230131221105.39216-1-skhan@linuxfoundation.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
        protocol="application/pgp-signature"; boundary="D4aY0NVIhLZOPm37"
Content-Disposition: inline
In-Reply-To: <20230131221105.39216-1-skhan@linuxfoundation.org>
Precedence: bulk


--D4aY0NVIhLZOPm37
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 31, 2023 at 03:11:05PM -0700, Shuah Khan wrote:
> Add a new section to the admin-guide with information of interest to
> application developers and system integrators doing analysis of the
> Linux kernel for safety critical applications.
>=20
> This section will contain documents supporting analysis of kernel
> interactions with applications, and key kernel subsystems expectations.
>=20
> Add a new workload-tracing document to this new section.
>=20
> Signed-off-by: Shefali Sharma <sshefali021@gmail.com>
> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
> ---
> Changes since v2: Addressed review comments on v2

It looks like my review comments on v2 [1] haven't been addressed, so
here are the suggested improvements as a diff on top of this patch:

---- >8 ----

diff --git a/Documentation/admin-guide/workload-tracing.rst b/Documentation=
/admin-guide/workload-tracing.rst
index 5fad64b4ebd66f..ac60ff9dec8f0e 100644
--- a/Documentation/admin-guide/workload-tracing.rst
+++ b/Documentation/admin-guide/workload-tracing.rst
@@ -27,10 +27,10 @@ Methodology
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
 `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a
-diagnostic, instructional, and debugging tool and can be used to discover
-the system resources in use by a workload. Once we discover and understand
-the workload needs, we can focus on them to avoid regressions and use it
-to evaluate safety considerations. We use strace tool to trace workloads.
+diagnostic, instructional, and debugging tool and can be used to
+discover the system resources in use by a workload by tracing it. Once
+we discover and understand the workload's needs, we can focus on them to
+avoid regressions and to evaluate safety considerations.
=20
 This method of tracing using strace tells us the system calls invoked by
 the workload and doesn't include all the system calls that can be invoked
@@ -43,7 +43,7 @@ outlined here will trace and find all possible code paths=
=2E The completeness
 of the system usage information depends on the completeness of coverage of=
 a
 workload.
=20
-The goal is tracing a workload on a system running a default kernel without
+The goal is to trace workloads on a system running a default kernel without
 requiring custom kernel installs.
=20
 How do we gather fine-grained system information?
@@ -63,9 +63,9 @@ insight into the process. "perf annotate" tool generates =
the statistics of
 each instruction of the program. This document goes over the details of how
 to gather fine-grained information on a workload's usage of system resourc=
es.
=20
-We used strace to trace the perf, stress-ng, paxtest workloads to illustra=
te
-our methodology to discover resources used by a workload. This process can
-be applied to trace other workloads.
+In this document, we use strace to trace the perf, stress-ng, and paxtest
+workloads to illustrate our methodology for discovering resources used by
+a workload. This process can be applied to trace other workloads.
=20
 Getting the system ready for tracing
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -73,34 +73,38 @@ Getting the system ready for tracing
 Before we can get started we will show you how to get your system ready.
 We assume that you have a Linux distribution running on a physical system
 or a virtual machine. Most distributions will include strace command. Let=
=E2=80=99s
-install other tools that aren=E2=80=99t usually included to build Linux ke=
rnel.
+install other tools that aren=E2=80=99t usually pre-installed to build Lin=
ux kernel.
 Please note that the following works on Debian based distributions. You
 might have to find equivalent packages on other Linux distributions.
=20
 Install tools to build Linux kernel and tools in kernel repository.
 scripts/ver_linux is a good way to check if your system already has
-the necessary tools: ::
+the necessary tools::
=20
   sudo apt-get build-essentials flex bison yacc
   sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev=
 libperl-dev libdw-dev
=20
-cscope is a good tool to browse kernel sources. Let's install it now: ::
+cscope is a good tool to browse kernel sources. Let's install it now::
=20
   sudo apt-get install cscope
=20
-Install stress-ng and paxtest: ::
+Install stress-ng and paxtest::
=20
   apt-get install stress-ng
   apt-get install paxtest
=20
+You will also need Linus's mainline tree, which can be cloned
+with::
+ =20
+  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.g=
it linux
+
 Workload overview
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
-As mentioned earlier, we used strace to trace perf bench, stress-ng and
-paxtest workloads to show how to analyze a workload and identify Linux
-subsystems used by these workloads. Let's start with an overview of these
-three workloads to get a better understanding of what they do and how to
-use them.
+As mentioned earlier, the workloads to be analyzed here are perf,
+stress-ng, and paxtest. Let's start with an overview of these three
+workloads to get a better understanding of what they do and how to use
+them.
=20
 perf bench (all) workload
 -------------------------
@@ -108,32 +112,34 @@ perf bench (all) workload
 The perf bench command contains multiple multi-threaded microkernel
 benchmarks for executing different subsystems in the Linux kernel and
 system calls. This allows us to easily measure the impact of changes,
-which can help mitigate performance regressions. It also acts as a common
-benchmarking framework, enabling developers to easily create test cases,
-integrate transparently, and use performance-rich tooling subsystems.
+which can help to mitigate performance regressions. It also acts as a
+common benchmarking framework, enabling developers to easily create test
+cases, integrate transparently, and use performance-rich tooling
+subsystems.
=20
 Stress-ng netdev stressor workload
 ----------------------------------
=20
 stress-ng is used for performing stress testing on the kernel. It allows
 you to exercise various physical subsystems of the computer, as well as
-interfaces of the OS kernel, using "stressor-s". They are available for
+interfaces of the OS kernel, using stressors. They are available for
 CPU, CPU cache, devices, I/O, interrupts, file system, memory, network,
-operating system, pipelines, schedulers, and virtual machines. Please refer
-to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to
-find the description of all the available stressor-s. The netdev stressor
-starts specified number (N) of workers that exercise various netdevice
-ioctl commands across all the available network devices.
+operating system, pipelines, schedulers, and virtual machines. Please
+refer to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_
+for the details of all available stressors. The netdev stressor starts
+a specified number (N) of workers that exercise various netdevice ioctl
+commands across all the available network devices.
=20
 paxtest kiddie workload
 -----------------------
=20
 paxtest is a program that tests buffer overflows in the kernel. It tests
-kernel enforcements over memory usage. Generally, execution in some memory
-segments makes buffer overflows possible. It runs a set of programs that
-attempt to subvert memory usage. It is used as a regression test suite for
-PaX, but might be useful to test other memory protection patches for the
-kernel. We used paxtest kiddie mode which looks for simple vulnerabilities.
+kernel enforcements over memory usage. Generally, execution in some
+memory segments makes buffer overflows possible. It runs a set of
+programs that attempt to subvert memory usage. It was originally intended
+as a regression test suite for PaX, but can also be useful to test other
+memory protection patches for the kernel. Here, we use paxtest kiddie
+mode, which looks for simple vulnerabilities.
=20
 What is strace and how do we use it?
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -155,51 +161,40 @@ suppressing the regular output. This attempts to show=
 system time (CPU time
 spent running in the kernel) independent of wall clock time. We plan to use
 these features to get information on workload system usage.
=20
-strace command supports basic, verbose, and stats modes. strace command wh=
en
-run in verbose mode gives more detailed information about the system calls
-invoked by a process.
+The strace command supports basic ("strace <command>"), verbose
+("strace -v <command>"), and stats ("strace -c <command>") modes. In
+verbose mode, strace gives more detail on syscalls invoked by a process.
=20
-Running strace -c generates a report of the percentage of time spent in ea=
ch
-system call, the total time in seconds, the microseconds per call, the tot=
al
-number of calls, the count of each system call that has failed with an err=
or
-and the type of system call made.
-
- * Usage: strace <command we want to trace>
- * Verbose mode usage: strace -v <command>
- * Gather statistics: strace -c <command>
-
-We used the =E2=80=9C-c=E2=80=9D option to gather fine-grained run-time st=
atistics in use
-by three workloads we have chose for this analysis.
-
- * perf
- * stress-ng
- * paxtest
+In stats mode, strace generates a fine-grained run-time statistics
+report consisting of the percentage of time spent in each system call,
+the total time in seconds, the microseconds per call, the total number
+of calls, the count of each system call that has failed with an error,
+and the type of system call made.
=20
 What is cscope and how do we use it?
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
 Now let=E2=80=99s look at `cscope <https://cscope.sourceforge.net/>`_, a c=
ommand
-line tool for browsing C, C++ or Java code-bases. We can use it to find
+line tool for browsing C, C++ or Java code-bases. You can use it to find
 all the references to a symbol, global definitions, functions called by a
 function, functions calling a function, text strings, regular expression
 patterns, files including a file.
=20
-We can use cscope to find which system call belongs to which subsystem.
-This way we can find the kernel subsystems used by a process when it is
-executed.
+In the context of this document, you can use cscope to find which system
+call belongs to which subsystem. This way you can find the kernel
+subsystems used by a process when it is executed.
=20
-Let=E2=80=99s checkout the latest Linux repository and build cscope databa=
se: ::
+To begin using cscope, cd to the kernel sources directory and build the
+database::
=20
-  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.g=
it linux
   cd linux
   cscope -R -p10  # builds cscope.out database before starting browse sess=
ion
   cscope -d -p10  # starts browse session on cscope.out database
=20
-Note: Run "cscope -R -p10" to build the database and c"scope -d -p10" to
-enter into the browsing session. cscope by default cscope.out database.
-To get out of this mode press ctrl+d. -p option is used to specify the
-number of file path components to display. -p10 is optimal for browsing
-kernel sources.
+Here, "cscope -R -p10" builds the database and "cscope -d -p10"
+browses the resulting database, which is by default in cscope.out. To
+quit browsing, type ctrl+d. The -p option sets the number of file path
+components to display; -p10 is sufficient for browsing kernel sources.
=20
 What is perf and how do we use it?
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
@@ -210,21 +205,20 @@ a simple command line interface. Perf is based on the=
 perf_events interface
 exported by the kernel. It is very useful for profiling the system and
 finding performance bottlenecks in an application.
=20
-If you haven't already checked out the Linux mainline repository, you can =
do
-so and then build kernel and perf tool: ::
+Change to the kernel sources directory and build both kernel and perf tool=
::
=20
-  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.g=
it linux
   cd linux
   make -j3 all
   cd tools/perf
   make
=20
-Note: The perf command can be built without building the kernel in the
-repository and can be run on older kernels. However matching the kernel
-and perf revisions gives more accurate information on the subsystem usage.
+.. note::
+   The perf command can be built without building the kernel in the
+   repository and can be run on any kernel. However, matching the kernel
+   and perf revisions gives more accurate information on the subsystem usa=
ge.
=20
-We used "perf stat" and "perf bench" options. For a detailed information on
-the perf tool, run "perf -h".
+Below, we will describe "perf stat" and "perf bench" options. For
+detailed help on the perf tool, run "perf -h".
=20
 perf stat
 ---------
@@ -268,17 +262,17 @@ exercised:
  * SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU
  * SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN
=20
-The following command runs the stressor: ::
+To run the netdev stressor::
=20
   stress-ng --netdev 1 -t 60 --metrics command.
=20
-We can use the perf record command to record the events and information
-associated with a process. This command records the profiling data in the
-perf.data file in the same directory.
+Then you can use the "perf record" command to record the events and
+information associated with a process. This command records the
+profiling data in the perf.data file in the same directory.
=20
-Using the following commands you can record the events associated with the
-netdev stressor, view the generated report perf.data and annotate the to
-view the statistics of each instruction of the program: ::
+For example, to record the stress-ng stressor above, view the generated
+report, and annotate it to gather statistics of each instruction of the
+program::
=20
   perf record stress-ng --netdev 1 -t 60 --metrics command.
   perf report
@@ -288,22 +282,21 @@ What is paxtest and how do we use it?
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
 paxtest is a program that tests buffer overflows in the kernel. It tests
-kernel enforcements over memory usage. Generally, execution in some memory
-segments makes buffer overflows possible. It runs a set of programs that
-attempt to subvert memory usage. It is used as a regression test suite for
-PaX, and will be useful to test other memory protection patches for the
-kernel.
+kernel enforcements over memory usage. Generally, execution in some
+memory segments makes buffer overflows possible. It runs a set of
+programs that attempt to subvert memory usage. It was originally intended
+as a regression test suite for PaX, but it can also be useful to test
+other memory protection patches for the kernel.
=20
-paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs
-in normal mode, whereas the blackhat mode tries to get around the protecti=
on
+paxtest provides kiddie and blackhat modes. The former runs
+in normal mode, whereas the latter tries to get around the protection
 of the kernel testing for vulnerabilities. We focus on the kiddie mode here
-and combine "paxtest kiddie" run with "perf record" to collect CPU stack
-traces for the paxtest kiddie run to see which function is calling other
-functions in the performance profile. Then the "dwarf" (DWARF's Call Frame
-Information) mode can be used to unwind the stack.
+and combine it with "perf record" to collect CPU stack
+traces for the paxtest run to see which function is calling other
+functions in the performance profile. Stack unwinding can then be done
+by passing the "--call-graph dwarf" option to "perf record".
=20
-The following command can be used to view resulting report in call-graph
-format: ::
+Thus, the combined commands are::
=20
   perf record --call-graph dwarf paxtest kiddie
   perf report --stdio
@@ -316,14 +309,17 @@ Now that we understand the workloads, let's start tra=
cing them.
 Tracing perf bench all workload
 -------------------------------
=20
-Run the following command to trace perf bench all workload: ::
+To trace the perf bench all workload::
=20
- strace -c perf bench all
+  strace -c perf bench all
=20
-**System Calls made by the workload**
+The table below lists the invoked syscalls, the number of
+times each is invoked, and the corresponding Linux subsystem.
=20
-The below table shows the system calls invoked by the workload, number of
-times each system call is invoked, and the corresponding Linux subsystem.
+.. note::
+
+   The syscall tables below are generated from example workloads. The actu=
al
+   figures may differ depending on the workload being traced.
=20
 +-------------------+-----------+-----------------+-----------------------=
--+
 | System Call       | # calls   | Linux Subsystem | System Call (API)     =
  |
@@ -426,14 +422,11 @@ times each system call is invoked, and the correspond=
ing Linux subsystem.
 Tracing stress-ng netdev stressor workload
 ------------------------------------------
=20
-Run the following command to trace stress-ng netdev stressor workload: ::
+To trace the stress-ng netdev stressor workload::
=20
   strace -c  stress-ng --netdev 1 -t 60 --metrics
=20
-**System Calls made by the workload**
-
-The below table shows the system calls invoked by the workload, number of
-times each system call is invoked, and the corresponding Linux subsystem.
+The corresponding syscall table is:
=20
 +-------------------+-----------+-----------------+-----------------------=
--+
 | System Call       | # calls   | Linux Subsystem | System Call (API)     =
  |
@@ -520,14 +513,11 @@ times each system call is invoked, and the correspond=
ing Linux subsystem.
 Tracing paxtest kiddie workload
 -------------------------------
=20
-Run the following command to trace paxtest kiddie workload: ::
+To trace the paxtest (kiddie mode) workload::
=20
- strace -c paxtest kiddie
+  strace -c paxtest kiddie
=20
-**System Calls made by the workload**
-
-The below table shows the system calls invoked by the workload, number of
-times each system call is invoked, and the corresponding Linux subsystem.
+The corresponding syscall table is:
=20
 +-------------------+-----------+-----------------+----------------------+
 | System Call       | # calls   | Linux Subsystem | System Call (API)    |
@@ -590,8 +580,10 @@ times each system call is invoked, and the correspondi=
ng Linux subsystem.
 Conclusion
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
=20
-This document is intended to be used as a guide on how to gather fine-grai=
ned
-information on the resources in use by workloads using strace.
+This document is intended to be used as a guide on how to gather
+fine-grained information on the resources in use by workloads using
+strace. Consult the references below if you want to run strace on your
+own workloads.
=20
 References
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

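One more thought, outside of the diff above: the "# calls" column in the
syscall tables could be reproduced from saved "strace -c" output with a
small helper. Below is a minimal sketch, not something from the patch.
summarize_strace is a hypothetical name, and it assumes the usual
"strace -c" column order (% time, seconds, usecs/call, calls, errors,
syscall):

```shell
# Hypothetical helper (not part of the patch): print "syscall calls"
# pairs from a file holding "strace -c" output.
# Assumption: data rows start with a percentage such as 45.00 and end
# with the syscall name; the fourth field is the call count.
summarize_strace() {
    # Skip the header, the dashed rulers, and the final "total" row.
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $NF !~ /^total$/ { print $NF, $4 }' \
        "$1"
}
```

For example, after "strace -c -o perf-bench.txt perf bench all", running
"summarize_strace perf-bench.txt" should print one "syscall calls" pair
per line, ready to be pasted into a table.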

Thanks.

[1]: https://lore.kernel.org/linux-doc/Y9STCwt2FnYf4%2FX4@debian.me/

--=20
An old man doll... just what I always wanted! - Clara

--D4aY0NVIhLZOPm37
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iHUEABYKAB0WIQSSYQ6Cy7oyFNCHrUH2uYlJVVFOowUCY9nkpgAKCRD2uYlJVVFO
o6TdAQDQk9+khkkcYA/f5UlDG8O2WFHmy6IkTe33UD2qUwzJAgD9HmLX6XjNVzhG
qcvSqijRfE9jJhDgKjFiOV/vfL7qbQc=
=RaSQ
-----END PGP SIGNATURE-----

--D4aY0NVIhLZOPm37--