2019-06-25 19:21:00

by Arseny Maslennikov

[permalink] [raw]
Subject: [PATCH v2 0/7] TTY Keyboard Status Request

This patch series introduces TTY keyboard status request, a feature of
the n_tty line discipline that reserves a character in struct termios
(^T by default) and reacts to it by printing a short informational line
to the terminal and sending a Unix signal to the tty's foreground
process group. The processes may, in response to the signal, output a
textual description of what they're doing.

The feature has been present in a similar form at least in
Free/Open/NetBSD; it would be nice to have something like this in Linux
as well. There is an LKML thread[1] where users have previously
expressed the rationale for this.

The current implementation does not break existing kernel API in any
way, since, fortunately, all the architectures supported by the kernel
happen to have at least 1 free byte in the termios control character
array.

The series should cleanly apply to tty-next.

To thoroughly test these, one might need at least a patched stty among
other tools, so I've brought up a simple initrd generator[2] which can
be used to create a lightweight environment to boot up in a VM and to
fiddle with.

[1] https://lore.kernel.org/lkml/1415200663.3247743.187387481.75CE9317@webmail.messagingengine.com/
[2] https://github.com/porrided/tty-kb-status-userspace

v2 <- v1: removed useless debugging bits.

Discussion of v1:
https://lore.kernel.org/lkml/[email protected]/

Arseny Maslennikov (7):
signal.h: Define SIGINFO on all architectures
tty: termios: Reserve space for VSTATUS in .c_cc
n_tty: Send SIGINFO to fg pgrp on status request character
linux/signal.h: Ignore SIGINFO by default in new tasks
tty: Add NOKERNINFO lflag to termios
n_tty: ->ops->write: Cut core logic out to a separate function
n_tty: Provide an informational line on VSTATUS receipt

arch/alpha/include/asm/termios.h | 4 +-
arch/alpha/include/uapi/asm/termbits.h | 2 +
arch/arm/include/uapi/asm/signal.h | 1 +
arch/h8300/include/uapi/asm/signal.h | 1 +
arch/ia64/include/asm/termios.h | 4 +-
arch/ia64/include/uapi/asm/signal.h | 1 +
arch/ia64/include/uapi/asm/termbits.h | 2 +
arch/m68k/include/uapi/asm/signal.h | 1 +
arch/mips/include/asm/termios.h | 4 +-
arch/mips/include/uapi/asm/signal.h | 1 +
arch/mips/include/uapi/asm/termbits.h | 2 +
arch/parisc/include/asm/termios.h | 4 +-
arch/parisc/include/uapi/asm/signal.h | 1 +
arch/parisc/include/uapi/asm/termbits.h | 2 +
arch/powerpc/include/asm/termios.h | 4 +-
arch/powerpc/include/uapi/asm/signal.h | 1 +
arch/powerpc/include/uapi/asm/termbits.h | 2 +
arch/s390/include/asm/termios.h | 4 +-
arch/s390/include/uapi/asm/signal.h | 1 +
arch/sparc/include/asm/termios.h | 4 +-
arch/sparc/include/uapi/asm/signal.h | 2 +
arch/sparc/include/uapi/asm/termbits.h | 2 +
arch/x86/include/uapi/asm/signal.h | 1 +
arch/xtensa/include/uapi/asm/signal.h | 1 +
arch/xtensa/include/uapi/asm/termbits.h | 2 +
drivers/tty/Makefile | 3 +-
drivers/tty/n_tty.c | 70 ++++-
drivers/tty/n_tty_status.c | 337 +++++++++++++++++++++++
include/asm-generic/termios.h | 4 +-
include/linux/sched.h | 7 +
include/linux/signal.h | 5 +-
include/linux/tty.h | 7 +-
include/uapi/asm-generic/signal.h | 1 +
include/uapi/asm-generic/termbits.h | 2 +
34 files changed, 457 insertions(+), 33 deletions(-)
create mode 100644 drivers/tty/n_tty_status.c

--
2.20.1


2019-06-25 19:21:08

by Arseny Maslennikov

[permalink] [raw]
Subject: [PATCH v2 6/7] n_tty: ->ops->write: Cut core logic out to a separate function

To simplify internal re-use of the line discipline's write method,
we isolate the work it does to its own function.

Since in-kernel callers might not refer to the tty through a file,
the struct file* argument might make no sense, so we also stop
tty_io_nonblock() from dereferencing file too early, allowing
to pass NULL as the referring file here.

Signed-off-by: Arseny Maslennikov <[email protected]>
---
drivers/tty/n_tty.c | 33 ++++++++++++++++++++++-----------
include/linux/tty.h | 2 +-
2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 29f33798a6cd..7d7d84adcffe 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -2309,22 +2309,15 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
* lock themselves)
*/

-static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
- const unsigned char *buf, size_t nr)
+static ssize_t do_n_tty_write(struct tty_struct *tty, struct file *file,
+ const unsigned char *buf, size_t nr)
{
const unsigned char *b = buf;
DEFINE_WAIT_FUNC(wait, woken_wake_function);
int c;
ssize_t retval = 0;

- /* Job control check -- must be done at start (POSIX.1 7.1.1.4). */
- if (L_TOSTOP(tty) && file->f_op->write != redirected_tty_write) {
- retval = tty_check_change(tty);
- if (retval)
- return retval;
- }
-
- down_read(&tty->termios_rwsem);
+ lockdep_assert_held_read(&tty->termios_rwsem);

/* Write out any echoed characters that are still pending */
process_echoes(tty);
@@ -2392,10 +2385,28 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
remove_wait_queue(&tty->write_wait, &wait);
if (nr && tty->fasync)
set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
- up_read(&tty->termios_rwsem);
return (b - buf) ? b - buf : retval;
}

+static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
+ const unsigned char *buf, size_t nr)
+{
+ ssize_t retval = 0;
+
+ /* Job control check -- must be done at start (POSIX.1 7.1.1.4). */
+ if (L_TOSTOP(tty) && file->f_op->write != redirected_tty_write) {
+ retval = tty_check_change(tty);
+ if (retval)
+ return retval;
+ }
+
+ down_read(&tty->termios_rwsem);
+ retval = do_n_tty_write(tty, file, buf, nr);
+ up_read(&tty->termios_rwsem);
+
+ return retval;
+}
+
/**
* n_tty_poll - poll method for N_TTY
* @tty: terminal device
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 07189f18f93d..ddb97704956d 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -388,7 +388,7 @@ static inline void tty_set_flow_change(struct tty_struct *tty, int val)

static inline bool tty_io_nonblock(struct tty_struct *tty, struct file *file)
{
- return file->f_flags & O_NONBLOCK ||
+ return (file && file->f_flags & O_NONBLOCK) ||
test_bit(TTY_LDISC_CHANGING, &tty->flags);
}

--
2.20.1

2019-06-25 19:21:14

by Arseny Maslennikov

[permalink] [raw]
Subject: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

If the three termios local flags isig, icanon, iexten are enabled
and the local flag nokerninfo is disabled for a tty governed
by the n_tty line discipline, then on receiving the keyboard status
character n_tty will generate a status message and write it out to
the tty before sending SIGINFO to the tty's foreground process group.

This kerninfo line contains information about the current system load
as well as some properties of "the most interesting" process in the
tty's current foreground process group, namely:
- its PID as seen inside its deepest PID namespace;
* the whole process group ought to be in a single PID namespace,
so this is actually deterministic
- its saved command name truncated to 16 bytes (task_struct::comm);
* at the time of writing TASK_COMM_LEN == 16
- its state and some related bits, procps-style;
- for S and D: its symbolic wait channel, if available; or a short
description for other process states instead;
- its user, system and real rusage time values;
- its resident set size (as well as the high watermark) in kilobytes.

The "most interesting" process is chosen as follows:
- runnables over everything
- uninterruptibles over everything else
- among 2 runnables pick the biggest utime + stime
- any unresolved ties are decided in favour of greatest PID.

While the kerninfo line is not very useful for debugging the kernel
itself, since we have much more powerful debugging tools, it still gives
the user behind the terminal some meaningful feedback to a VSTATUS that
works even if no processes respond.

Signed-off-by: Arseny Maslennikov <[email protected]>
---
drivers/tty/Makefile | 3 +-
drivers/tty/n_tty.c | 22 +++
drivers/tty/n_tty_status.c | 337 +++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 7 +
include/linux/tty.h | 3 +
5 files changed, 371 insertions(+), 1 deletion(-)
create mode 100644 drivers/tty/n_tty_status.c

diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index 020b1cd9294f..8bb5efb40b81 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_TTY) += tty_io.o n_tty.o tty_ioctl.o tty_ldisc.o \
+obj-$(CONFIG_TTY) += tty_io.o tty_ioctl.o tty_ldisc.o \
tty_buffer.o tty_port.o tty_mutex.o \
tty_ldsem.o tty_baudrate.o tty_jobctrl.o \
+ n_tty.o n_tty_status.o \
n_null.o
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 7d7d84adcffe..74e6b4bf86bd 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -34,6 +34,7 @@
#include <linux/signal.h>
#include <linux/fcntl.h>
#include <linux/sched.h>
+#include <linux/kallsyms.h>
#include <linux/interrupt.h>
#include <linux/tty.h>
#include <linux/timer.h>
@@ -79,6 +80,8 @@
#define ECHO_BLOCK 256
#define ECHO_DISCARD_WATERMARK N_TTY_BUF_SIZE - (ECHO_BLOCK + 32)

+#define STATUS_LINE_LEN (2 * KSYM_NAME_LEN)
+
#define SIG_FLUSHING_MASK ( \
rt_sigmask(SIGINT) | rt_sigmask(SIGQUIT) | \
rt_sigmask(SIGTSTP) )
@@ -158,6 +161,8 @@ static inline unsigned char *echo_buf_addr(struct n_tty_data *ldata, size_t i)
return &ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)];
}

+static void n_tty_status_line(struct tty_struct *tty);
+
/* If we are not echoing the data, perhaps this is a secret so erase it */
static void zero_buffer(struct tty_struct *tty, u8 *buffer, int size)
{
@@ -1300,6 +1305,8 @@ n_tty_receive_char_special(struct tty_struct *tty, unsigned char c)
n_tty_receive_signal_char(tty, SIGTSTP, c);
return 0;
} else if (c == STATUS_CHAR(tty)) {
+ if (!L_NOKERNINFO(tty))
+ n_tty_status_line(tty);
n_tty_receive_signal_char(tty, SIGINFO, c);
return 0;
}
@@ -2489,6 +2496,21 @@ static int n_tty_ioctl(struct tty_struct *tty, struct file *file,
}
}

+static void n_tty_status_line(struct tty_struct *tty)
+{
+ struct n_tty_data *ldata = tty->disc_data;
+ char *msg, *buf;
+ msg = buf = kzalloc(STATUS_LINE_LEN, GFP_KERNEL);
+ tty_sprint_status_line(tty, buf + 1, STATUS_LINE_LEN - 1);
+ /* The only current caller of this takes output_lock for us. */
+ if (ldata->column != 0)
+ *msg = '\n';
+ else
+ msg++;
+ do_n_tty_write(tty, NULL, msg, strlen(msg));
+ kfree(buf);
+}
+
static struct tty_ldisc_ops n_tty_ops = {
.magic = TTY_LDISC_MAGIC,
.name = "n_tty",
diff --git a/drivers/tty/n_tty_status.c b/drivers/tty/n_tty_status.c
new file mode 100644
index 000000000000..4836710a8306
--- /dev/null
+++ b/drivers/tty/n_tty_status.c
@@ -0,0 +1,337 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/kallsyms.h>
+#include <linux/pid.h>
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/loadavg.h>
+#include <linux/sched/cputime.h>
+#include <linux/sched/mm.h>
+#include <linux/slab.h>
+#include <linux/tty.h>
+
+#define BCOMPARE(lbool, rbool) ((lbool) << 1 | (rbool))
+#define BCOMPARE_NONE 0
+#define BCOMPARE_RIGHT 1
+#define BCOMPARE_LEFT 2
+#define BCOMPARE_BOTH (BCOMPARE_LEFT | BCOMPARE_RIGHT)
+
+/*
+ * Select the most interesting task of two.
+ *
+ * The implemented approach is simple for now:
+ * - pick runnable
+ * - if no runnables, pick uninterruptible
+ * - if tie between runnables, pick highest utime + stime
+ * - if a tie is not broken by the above, pick highest pid nr.
+ *
+ * For reference, here's the one used in FreeBSD:
+ * - pick runnables over anything
+ * - if both runnables, pick highest cpu utilization
+ * - if no runnables, pick shortest sleep time (the scheduler
+ * actually takes care to maintain this statistic)
+ * - other ties are decided in favour of youngest process.
+ */
+static struct task_struct *__better_proc_R(struct task_struct *a,
+ struct task_struct *b)
+{
+ unsigned long flags;
+ u64 atime, btime, tgutime, tgstime;
+ struct task_struct *ret;
+
+ if (!lock_task_sighand(a, &flags))
+ goto out_a_unlocked;
+ thread_group_cputime_adjusted(a, &tgutime, &tgstime);
+ atime = tgutime + tgstime;
+ unlock_task_sighand(a, &flags);
+
+ if (!lock_task_sighand(b, &flags))
+ goto out_b_unlocked;
+ thread_group_cputime_adjusted(b, &tgutime, &tgstime);
+ btime = tgutime + tgstime;
+ unlock_task_sighand(b, &flags);
+
+ ret = atime > btime ? a : b;
+
+ return ret;
+
+out_b_unlocked:
+out_a_unlocked:
+ return task_pid_vinr(a) > task_pid_vinr(b) ? a : b;
+}
+
+static struct task_struct *__better_proc(struct task_struct *a,
+ struct task_struct *b)
+{
+ if (!pid_alive(a))
+ return b;
+ if (!pid_alive(b))
+ return a;
+
+ switch (BCOMPARE(a->state == TASK_RUNNING,
+ b->state == TASK_RUNNING)) {
+ case BCOMPARE_LEFT:
+ return a;
+ case BCOMPARE_RIGHT:
+ return b;
+ case BCOMPARE_BOTH:
+ return __better_proc_R(a, b);
+ }
+
+ switch (BCOMPARE(a->state == TASK_UNINTERRUPTIBLE,
+ b->state == TASK_UNINTERRUPTIBLE)) {
+ case BCOMPARE_LEFT:
+ return a;
+ case BCOMPARE_RIGHT:
+ return b;
+ case BCOMPARE_BOTH:
+ break;
+ }
+
+ /* TODO: Perhaps we should check something else... */
+ return task_pid_vinr(a) > task_pid_vinr(b) ? a : b;
+}
+
+/* Weed out NULLs. */
+static inline struct task_struct *better_proc(struct task_struct *a,
+ struct task_struct *b) {
+ return a ? (b ? __better_proc(a, b) : a) : b;
+}
+
+static int scnprint_load(char *msgp, size_t size)
+{
+ unsigned long la[3];
+
+ get_avenrun(la, FIXED_1/200, 0);
+ return scnprintf(msgp, size, "load: %lu.%02lu; ",
+ LOAD_INT(la[0]), LOAD_FRAC(la[0]));
+}
+
+static int scnprint_task(char *msgp, size_t size, struct task_struct *task)
+{
+ char commname[TASK_COMM_LEN];
+
+ get_task_comm(commname, task);
+ return scnprintf(msgp, size, "%d/%s:", task_pid_vinr(task), commname);
+}
+
+static int scnprint_rusage(char *msgp, ssize_t size,
+ struct task_struct *task, struct mm_struct *mm)
+{
+ struct rusage ru;
+ struct timeval utime, stime;
+ struct timespec rtime;
+ u64 now;
+ int ret = 0;
+ int psz = 0;
+
+ getrusage(task, RUSAGE_BOTH, &ru);
+ now = ktime_get_ns();
+
+ utime = ru.ru_utime;
+ stime = ru.ru_stime;
+ rtime.tv_nsec = now - task->start_time;
+ rtime.tv_sec = rtime.tv_nsec / 1000000000;
+ rtime.tv_nsec %= 1000000000;
+
+ psz = scnprintf(msgp, size,
+ "%lu.%03lur %lu.%03luu %lu.%03lus",
+ rtime.tv_sec, rtime.tv_nsec / 1000000,
+ utime.tv_sec, utime.tv_usec / 1000,
+ stime.tv_sec, stime.tv_usec / 1000);
+ ret += psz;
+ msgp += psz;
+ size -= psz;
+
+ if (mm) {
+ psz = scnprintf(msgp, size,
+ " %luk/%luk",
+ get_mm_rss(mm) * PAGE_SIZE / 1024,
+ get_mm_hiwater_rss(mm) * PAGE_SIZE / 1024);
+ ret += psz;
+ }
+ return ret;
+}
+
+static int scnprint_state(char *msgp, ssize_t size,
+ struct task_struct *task, struct mm_struct *mm)
+{
+ char stat[8] = {0};
+ const char *state_descr = "";
+ struct task_struct *parent = NULL;
+ struct task_struct *real_parent = NULL;
+ unsigned long wchan = 0;
+ int psz = 0;
+ char symname[KSYM_NAME_LEN];
+
+ stat[psz++] = task_state_to_char(task);
+ if (task_nice(task) < 0)
+ stat[psz++] = '<';
+ else if (task_nice(task) > 0)
+ stat[psz++] = 'N';
+ if (mm && mm->locked_vm)
+ stat[psz++] = 'L';
+ if (get_nr_threads(task) > 1)
+ stat[psz++] = 'l';
+
+ switch (stat[0]) {
+ case 'R':
+ if (task_curr(task))
+ stat[psz++] = '!';
+ break;
+ case 'S':
+ case 'D':
+ wchan = get_wchan(task);
+ if (!wchan)
+ break;
+ if (!lookup_symbol_name(wchan, symname))
+ state_descr = symname;
+ else
+ /* get_wchan() returned something
+ * that looks like no kernel symbol.
+ */
+ state_descr = "*unknown*";
+ break;
+ case 'T':
+ state_descr = "stopped";
+ break;
+ case 't':
+ state_descr = "traced";
+ break;
+ case 'Z':
+ rcu_read_lock();
+ real_parent = rcu_dereference(task->real_parent);
+ parent = rcu_dereference(task->parent);
+ psz = sprintf(symname, "zombie; ppid=%d",
+ task_tgid_nr_ns(real_parent,
+ ns_of_pid(task_pid(task))));
+ if (parent != real_parent)
+ sprintf(symname + psz, " reaper=%d",
+ task_tgid_nr_ns(parent,
+ ns_of_pid(task_pid(task))));
+ rcu_read_unlock();
+ state_descr = symname;
+ break;
+ case 'I':
+ /* Can this even happen? */
+ state_descr = "idle";
+ break;
+ default:
+ state_descr = "unknown";
+ }
+
+ psz = scnprintf(msgp, size, "%s", stat);
+ msgp += psz;
+ size -= psz;
+ if (*state_descr)
+ psz += scnprintf(msgp, size, wchan ? " [%s]" : " (%s)", state_descr);
+
+ return psz;
+}
+
+/**
+ * tty_sprint_status_line - produce kerninfo line
+ * @tty: terminal device
+ * @msg: preallocated memory buffer
+ * @length: maximum line length
+ *
+ * Reports state of foreground process group in a null-terminated string
+ * located at @msg, @length bytes long. If @length is insufficient,
+ * the line gets truncated.
+ */
+void tty_sprint_status_line(struct tty_struct *t, char *msg, size_t length)
+{
+ struct task_struct *tsk = NULL, *mit = NULL;
+ struct mm_struct *mm;
+ struct pid *pgrp = NULL;
+ char *msgp = msg;
+ int psz = 0;
+
+ if (!length)
+ return;
+ length--; /* Make room for trailing '\n' */
+
+ psz = scnprint_load(msgp, length);
+ if (psz > 0) {
+ msgp += psz;
+ length -= psz;
+ }
+ if (!length)
+ goto finalize_message;
+
+ /* Not sure if session pid is protected by ctrl_lock
+ * or tasklist_lock...
+ */
+ pgrp = t->session;
+ if (pgrp == NULL) {
+ psz = scnprintf(msgp, length, "not a controlling tty");
+ if (psz > 0)
+ msgp += psz;
+ goto finalize_message;
+ }
+ pgrp = tty_get_pgrp(t);
+ if (pgrp == NULL) {
+ psz = scnprintf(msgp, length, "no foreground process group");
+ if (psz > 0)
+ msgp += psz;
+ goto finalize_message;
+ }
+
+ read_lock(&tasklist_lock);
+ do_each_pid_task(pgrp, PIDTYPE_PGID, tsk)
+ {
+ /* Select the most interesting task. */
+ if (tsk == better_proc(mit, tsk))
+ mit = tsk;
+ } while_each_pid_task(pgrp, PIDTYPE_PGID, tsk);
+ read_unlock(&tasklist_lock);
+
+ if (!pid_alive(mit))
+ goto finalize_message;
+
+ /* Gather intel on most interesting task. */
+ /* Can the mm of a foreground process turn out to be NULL?
+ * Definitely; for example, if it is a zombie.
+ */
+ mm = get_task_mm(mit);
+
+ psz = scnprint_task(msgp, length, mit);
+ if (psz > 0) {
+ msgp += psz;
+ length -= psz;
+ }
+ if (!length)
+ goto finalize_message;
+ *msgp++ = ' ';
+ length--;
+
+ psz = scnprint_state(msgp, length, mit, mm);
+ if (psz > 0) {
+ msgp += psz;
+ length -= psz;
+ }
+ if (!length)
+ goto finalize_message;
+ *msgp++ = ' ';
+ length--;
+
+ psz = scnprint_rusage(msgp, length, mit, mm);
+ if (psz > 0) {
+ msgp += psz;
+ length -= psz;
+ }
+ if (!length)
+ goto finalize_message;
+ *msgp++ = ' ';
+ length--;
+
+ if (!mm)
+ goto finalize_message;
+
+ mmput(mm);
+
+finalize_message:
+ *msgp++ = '\n';
+ if (pgrp)
+ put_pid(pgrp);
+}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..21bcb7fb0888 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1229,6 +1229,8 @@ static inline struct pid *task_pid(struct task_struct *task)
* task_xid_nr() : global id, i.e. the id seen from the init namespace;
* task_xid_vnr() : virtual id, i.e. the id seen from the pid namespace of
* current.
+ * task_xid_vinr() : virtual inner id, i.e. the id seen from the namespace of
+ * the task itself;
* task_xid_nr_ns() : id seen from the ns specified;
*
* see also pid_nr() etc in include/linux/pid.h
@@ -1250,6 +1252,11 @@ static inline pid_t task_pid_vnr(struct task_struct *tsk)
return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL);
}

+static inline pid_t task_pid_vinr(struct task_struct *tsk)
+{
+ return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns_of_pid(task_pid(tsk)));
+}
+

static inline pid_t task_tgid_nr(struct task_struct *tsk)
{
diff --git a/include/linux/tty.h b/include/linux/tty.h
index ddb97704956d..a8e4df6d64b4 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -725,6 +725,9 @@ extern void __init n_tty_init(void);
static inline void n_tty_init(void) { }
#endif

+/* n_tty_status.c */
+extern void tty_sprint_status_line(struct tty_struct *tty, char *msg, size_t size);
+
/* tty_audit.c */
#ifdef CONFIG_AUDIT
extern void tty_audit_add_data(struct tty_struct *tty, const void *data,
--
2.20.1

2019-06-25 19:21:41

by Arseny Maslennikov

[permalink] [raw]
Subject: [PATCH v2 4/7] linux/signal.h: Ignore SIGINFO by default in new tasks

This matches the behaviour of other Unix-like systems that have SIGINFO
and causes less harm to processes that do not install handlers for this
signal, making the keyboard status character non-fatal for them.

This is implemented with the assumption that SIGINFO is defined
to be equivalent to SIGPWR; still, there is no reason for PWR to
result in termination of the signal recipient anyway — it does not
indicate there is a fatal problem with the recipient's execution
context (like e.g. FPE/ILL do), and we have TERM/KILL for explicit
termination requests.

Signed-off-by: Arseny Maslennikov <[email protected]>
---
include/linux/signal.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index 9702016734b1..c365754ea647 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -360,7 +360,7 @@ extern bool unhandled_signal(struct task_struct *tsk, int sig);
* | SIGSYS/SIGUNUSED | coredump |
* | SIGSTKFLT | terminate |
* | SIGWINCH | ignore |
- * | SIGPWR | terminate |
+ * | SIGPWR | ignore |
* | SIGRTMIN-SIGRTMAX | terminate |
* +--------------------+------------------+
* | non-POSIX signal | default action |
@@ -411,7 +411,8 @@ extern bool unhandled_signal(struct task_struct *tsk, int sig);

#define SIG_KERNEL_IGNORE_MASK (\
rt_sigmask(SIGCONT) | rt_sigmask(SIGCHLD) | \
- rt_sigmask(SIGWINCH) | rt_sigmask(SIGURG) )
+ rt_sigmask(SIGWINCH) | rt_sigmask(SIGURG) | \
+ rt_sigmask(SIGINFO) )

#define SIG_SPECIFIC_SICODES_MASK (\
rt_sigmask(SIGILL) | rt_sigmask(SIGFPE) | \
--
2.20.1

2019-06-25 21:35:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] linux/signal.h: Ignore SIGINFO by default in new tasks

On Tue, Jun 25, 2019 at 07:11:50PM +0300, Arseny Maslennikov wrote:
> This matches the behaviour of other Unix-like systems that have SIGINFO
> and causes less harm to processes that do not install handlers for this
> signal, making the keyboard status character non-fatal for them.
>
> This is implemented with the assumption that SIGINFO is defined
> to be equivalent to SIGPWR; still, there is no reason for PWR to
> result in termination of the signal recipient anyway — it does not
> indicate there is a fatal problem with the recipient's execution
> context (like e.g. FPE/ILL do), and we have TERM/KILL for explicit
> termination requests.

So this is a consequence of trying to overload SIGINFO with SIGPWR.
At least on some legacy Unix systems, the way SIGPWR worked was that
init would broadcast SIGPWR to all userspace process (with system
daemons on an exception list so they could get shutdown last).
Applications which didn't have a SIGPWR handler registered, would just
exit. Those that did, were expected to clean up in preparation with
the impending shutdown. After some period of time, then processes
would get hard killed and then system daemons would get shut down and
the system would cleanly shut itself down.

So SIGPWR acted much like SIGHUP, and that's why the default behavior
was "terminate". Now, as far as I know, we've not actually used in
that way, at least for most common distributions, but there is a sane
reason why things are they way there are, and in there are people who
have been using SIGPWR the way it had originally been intended, there
is risk in overloading SIGINFO with SIGPWR --- in particular, typing
^T might cause some enterprise database to start shutting itself down. :-)

(In particular it might be worth checking Linux ports of Oracle and
DB2.)

- Ted

2019-06-26 13:51:42

by Arseny Maslennikov

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] linux/signal.h: Ignore SIGINFO by default in new tasks

On Tue, Jun 25, 2019 at 05:32:15PM -0400, Theodore Ts'o wrote:
> On Tue, Jun 25, 2019 at 07:11:50PM +0300, Arseny Maslennikov wrote:
> > This matches the behaviour of other Unix-like systems that have SIGINFO
> > and causes less harm to processes that do not install handlers for this
> > signal, making the keyboard status character non-fatal for them.
> >
> > This is implemented with the assumption that SIGINFO is defined
> > to be equivalent to SIGPWR; still, there is no reason for PWR to
> > result in termination of the signal recipient anyway — it does not
> > indicate there is a fatal problem with the recipient's execution
> > context (like e.g. FPE/ILL do), and we have TERM/KILL for explicit
> > termination requests.
>
> So this is a consequence of trying to overload SIGINFO with SIGPWR.

Pretty much.

> At least on some legacy Unix systems, the way SIGPWR worked was that
> init would broadcast SIGPWR to all userspace process (with system
> daemons on an exception list so they could get shutdown last).
> Applications which didn't have a SIGPWR handler registered, would just
> exit. Those that did, were expected to clean up in preparation with
> the impending shutdown. After some period of time, then processes
> would get hard killed and then system daemons would get shut down and
> the system would cleanly shut itself down.
>
> So SIGPWR acted much like SIGHUP, and that's why the default behavior
> was "terminate". Now, as far as I know, we've not actually used in
> that way, at least for most common distributions, but there is a sane
> reason why things are they way there are, and in there are people who
> have been using SIGPWR the way it had originally been intended, there
> is risk in overloading SIGINFO with SIGPWR --- in particular, typing
> ^T might cause some enterprise database to start shutting itself down. :-)

Only if that enterprise database:
1) has a controlling terminal (a strange condition for a daemon, IMHO)
2) that terminal has the status character set in the struct termios.

>
> (In particular it might be worth checking Linux ports of Oracle and
> DB2.)

A quick google got me this document:
https://docs.oracle.com/cd/B28359_01/server.111/b32009.pdf
It only ever mentions CHLD, CONT, INT, PIPE, TERM, URG and IO.

I have no real experience with neither DB2 nor Oracle DB, though, and no
way to get hold of that kind of software, so I don't know where to look
further. If we'd like to be really sure no one's hurt we'll need someone
with actual expertise here.


Attachments:
(No filename) (2.52 kB)
signature.asc (849.00 B)
Download all attachments

2019-07-29 10:57:44

by Arseny Maslennikov

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] linux/signal.h: Ignore SIGINFO by default in new tasks

On Tue, Jun 25, 2019 at 05:32:15PM -0400, Theodore Ts'o wrote:
> <...>
>
> (In particular it might be worth checking Linux ports of Oracle and
> DB2.)

A couple of weeks of asking around got me to have a look at a DB2 9.7
instance.

sn1:~ # uname -a
Linux sn1 2.6.16.60-0.54.5-ppc64 #1 SMP Fri Sep 4 01:28:03 UTC 2009 ppc64 ppc64 ppc64 GNU/Linux

That particular deployment used the following set of processes:

sn1:~ # ps -eo user,ppid,pid,pgid,sid,tty,state,cmd | grep db2 | grep -v java
bgpsysdb 8505 672 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,171fc0,0x110000000,0x110000000,1600000,18002,2,384f00b7
bgpsysdb 8505 687 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,191fc0,0x110000000,0x110000000,1600000,18002,2,384f80b9
bgpsysdb 8505 4309 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,151fc0,0x110000000,0x110000000,1600000,18002,2,6f5080c5
root 1 8505 8504 7146 ? S db2wdog 0
bgpsysdb 8505 8511 8511 7146 ? S db2sysc 0
root 8511 8512 8511 7146 ? S db2ckpwd 0
root 8511 8513 8511 7146 ? S db2ckpwd 0
root 8511 8514 8511 7146 ? S db2ckpwd 0
bgpsysdb 8505 8630 8504 7146 ? S db2acd 0 ,0,0,0,1,0,0,0,1,0,8a6678,14,1e014,2,0,1,11fc0,0x110000000,0x110000000,1600000,18002,2,398072
bgpsysdb 8505 15280 8504 7146 ? S db2fmp ( ,1,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,31fc0,0x110000000,0x110000000,1600000,18002,2,bd80bb
bgpsysdb 8505 16032 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,b1fc0,0x110000000,0x110000000,1600000,18002,2,81980ba
bgpsysdb 8505 16073 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,91fc0,0x110000000,0x110000000,1600000,18002,2,12300c0
bgpsysdb 8505 17683 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,131fc0,0x110000000,0x110000000,1600000,18002,2,248600db
bgpsysdb 8505 30731 8504 7146 ? S db2fmp ( ,0,0,0,0,0,0,0,1,0,8a6678,14,1e014,2,0,1,f1fc0,0x110000000,0x110000000,1600000,18002,2,370080d1

None of them has a controlling tty, so ^T pressed at any tty won't send
them a signal.

To get an idea on how do they handle signals, we can look at
/proc/*/status:

sn1:~ # cat /proc/{8512,8505,687,8511}/status | egrep '^(Pid|Sig|Shd)'
Pid: 687
SigQ: 0/128000
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: fffffffe2bbaf007
SigCgt: 00000001c44004f8
Pid: 8505
SigQ: 0/128000
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffbfeff
SigIgn: fffffffe2fbaf007
SigCgt: 00000001c0410ef8
Pid: 8511
SigQ: 0/128000
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffbfeff
SigIgn: fffffffe23b3c005
SigCgt: 00000001dc483efa
Pid: 8512
SigQ: 0/128000
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: fffffffe2fbbf007
SigCgt: 00000001c0400ef8

It can be seen from the above that SIGPWR in particular is always
ignored and sometimes blocked, which means its default disposition has
no effect.

This leads me to think that DB2 in particular would be unaffected by the
patch set.

p.s.: I do not have shell access to the machine, and never did; the commands
cited in this email were executed by a person who does, and the output
was handed back to me.


Attachments:
(No filename) (3.36 kB)
signature.asc (849.00 B)
Download all attachments

2019-07-29 10:58:20

by Arseny Maslennikov

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] TTY Keyboard Status Request

On Tue, Jun 25, 2019 at 07:11:46PM +0300, Arseny Maslennikov wrote:
> This patch series introduces TTY keyboard status request, a feature of
> the n_tty line discipline that reserves a character in struct termios
> (^T by default) and reacts to it by printing a short informational line
> to the terminal and sending a Unix signal to the tty's foreground
> process group. The processes may, in response to the signal, output a
> textual description of what they're doing.
>
> The feature has been present in a similar form at least in
> Free/Open/NetBSD; it would be nice to have something like this in Linux
> as well. There is an LKML thread[1] where users have previously
> expressed the rationale for this.
>
> The current implementation does not break existing kernel API in any
> way, since, fortunately, all the architectures supported by the kernel
> happen to have at least 1 free byte in the termios control character
> array.
>
> The series should cleanly apply to tty-next.
>
> To thoroughly test these, one might need at least a patched stty among
> other tools, so I've brought up a simple initrd generator[2] which can
> be used to create a lightweight environment to boot up in a VM and to
> fiddle with.
>
> [1] https://lore.kernel.org/lkml/1415200663.3247743.187387481.75CE9317@webmail.messagingengine.com/
> [2] https://github.com/porrided/tty-kb-status-userspace
>
> v2 <- v1: removed useless debugging bits.
>
> Discussion of v1:
> https://lore.kernel.org/lkml/[email protected]/
>
> Arseny Maslennikov (7):
> signal.h: Define SIGINFO on all architectures
> tty: termios: Reserve space for VSTATUS in .c_cc
> n_tty: Send SIGINFO to fg pgrp on status request character
> linux/signal.h: Ignore SIGINFO by default in new tasks
> tty: Add NOKERNINFO lflag to termios
> n_tty: ->ops->write: Cut core logic out to a separate function
> n_tty: Provide an informational line on VSTATUS receipt
>
> arch/alpha/include/asm/termios.h | 4 +-
> arch/alpha/include/uapi/asm/termbits.h | 2 +
> arch/arm/include/uapi/asm/signal.h | 1 +
> arch/h8300/include/uapi/asm/signal.h | 1 +
> arch/ia64/include/asm/termios.h | 4 +-
> arch/ia64/include/uapi/asm/signal.h | 1 +
> arch/ia64/include/uapi/asm/termbits.h | 2 +
> arch/m68k/include/uapi/asm/signal.h | 1 +
> arch/mips/include/asm/termios.h | 4 +-
> arch/mips/include/uapi/asm/signal.h | 1 +
> arch/mips/include/uapi/asm/termbits.h | 2 +
> arch/parisc/include/asm/termios.h | 4 +-
> arch/parisc/include/uapi/asm/signal.h | 1 +
> arch/parisc/include/uapi/asm/termbits.h | 2 +
> arch/powerpc/include/asm/termios.h | 4 +-
> arch/powerpc/include/uapi/asm/signal.h | 1 +
> arch/powerpc/include/uapi/asm/termbits.h | 2 +
> arch/s390/include/asm/termios.h | 4 +-
> arch/s390/include/uapi/asm/signal.h | 1 +
> arch/sparc/include/asm/termios.h | 4 +-
> arch/sparc/include/uapi/asm/signal.h | 2 +
> arch/sparc/include/uapi/asm/termbits.h | 2 +
> arch/x86/include/uapi/asm/signal.h | 1 +
> arch/xtensa/include/uapi/asm/signal.h | 1 +
> arch/xtensa/include/uapi/asm/termbits.h | 2 +
> drivers/tty/Makefile | 3 +-
> drivers/tty/n_tty.c | 70 ++++-
> drivers/tty/n_tty_status.c | 337 +++++++++++++++++++++++
> include/asm-generic/termios.h | 4 +-
> include/linux/sched.h | 7 +
> include/linux/signal.h | 5 +-
> include/linux/tty.h | 7 +-
> include/uapi/asm-generic/signal.h | 1 +
> include/uapi/asm-generic/termbits.h | 2 +
> 34 files changed, 457 insertions(+), 33 deletions(-)
> create mode 100644 drivers/tty/n_tty_status.c
>

> Date: Tue, 25 Jun 2019 19:11:46 +0300
Ping.

The series should cleanly apply to 5.3-rc2 and to tty-next as of writing
this email.


Attachments:
(No filename) (3.99 kB)
signature.asc (849.00 B)
Download all attachments

2019-07-30 18:05:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
> If the three termios local flags isig, icanon, iexten are enabled
> and the local flag nokerninfo is disabled for a tty governed
> by the n_tty line discipline, then on receiving the keyboard status
> character n_tty will generate a status message and write it out to
> the tty before sending SIGINFO to the tty's foreground process group.
>
> This kerninfo line contains information about the current system load
> as well as some properties of "the most interesting" process in the
> tty's current foreground process group, namely:
> - its PID as seen inside its deepest PID namespace;
> * the whole process group ought to be in a single PID namespace,
> so this is actually deterministic
> - its saved command name truncated to 16 bytes (task_struct::comm);
> * at the time of writing TASK_COMM_LEN == 16
> - its state and some related bits, procps-style;
> - for S and D: its symbolic wait channel, if available; or a short
> description for other process states instead;
> - its user, system and real rusage time values;
> - its resident set size (as well as the high watermark) in kilobytes.

Why is this really all needed as we have the SysRq handlers that report
all of this today?

> The "most interesting" process is chosen as follows:
> - runnables over everything
> - uninterruptibles over everything else
> - among 2 runnables pick the biggest utime + stime
> - any unresolved ties are decided in favour of greatest PID.

This does not feel like something that the tty core code should be doing
at all.

> While the kerninfo line is not very useful for debugging the kernel
> itself, since we have much more powerful debugging tools, it still gives
> the user behind the terminal some meaningful feedback to a VSTATUS that
> works even if no processes respond.

That's what SysRq is for. If there's a specific set of values that we
don't currently report in that facility, why not just add the
information there? It's much simpler and "safer" that way.

thanks,

greg k-h

2019-07-31 22:25:43

by Arseny Maslennikov

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

On Tue, Jul 30, 2019 at 06:19:40PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
> > If the three termios local flags isig, icanon, iexten are enabled
> > and the local flag nokerninfo is disabled for a tty governed
> > by the n_tty line discipline, then on receiving the keyboard status
> > character n_tty will generate a status message and write it out to
> > the tty before sending SIGINFO to the tty's foreground process group.
> >
> > This kerninfo line contains information about the current system load
> > as well as some properties of "the most interesting" process in the
> > tty's current foreground process group, namely:
> > - its PID as seen inside its deepest PID namespace;
> > * the whole process group ought to be in a single PID namespace,
> > so this is actually deterministic
> > - its saved command name truncated to 16 bytes (task_struct::comm);
> > * at the time of writing TASK_COMM_LEN == 16
> > - its state and some related bits, procps-style;
> > - for S and D: its symbolic wait channel, if available; or a short
> > description for other process states instead;
> > - its user, system and real rusage time values;
> > - its resident set size (as well as the high watermark) in kilobytes.
>
> Why is this really all needed as we have the SysRq handlers that report
> all of this today?

Different use-cases have different needs; SysRq is targeted at a different
audience; see below.

> > The "most interesting" process is chosen as follows:
> > - runnables over everything
> > - uninterruptibles over everything else
> > - among 2 runnables pick the biggest utime + stime
> > - any unresolved ties are decided in favour of greatest PID.
>
> This does not feel like something that the tty core code should be doing
> at all.

Yes, this selection part is quite clumsy. In defense of it, one could
argue that we already have the whole n_tty implemented in kernel-space.

One way we could get rid of this is to display a summarized statistic
for the whole pgrp: pgid, oldest real time, cumulative utime and stime,
cumulative memory usage. Would this be more acceptable? Are there any
other ideas?

> > While the kerninfo line is not very useful for debugging the kernel
> > itself, since we have much more powerful debugging tools, it still gives
> > the user behind the terminal some meaningful feedback to a VSTATUS that
> > works even if no processes respond.
>
> That's what SysRq is for. If there's a specific set of values that we
> don't currently report in that facility, why not just add the
> information there? It's much simpler and "safer" that way.

SysRq is intended for the person either administrating the system to be used in
emergency (e.g. f for the oom kill, the famous s-u-b combo also comes to
mind) or debugging the kernel, and it indeed does a much better job for
those purposes. In both use-cases mentioned the person has access to
the system console, where the sysrq button handlers produce all their
output, if any, and to either a physical keyboard / serial console or to
/proc/sysrq-trigger, whose mode is 0200 (writable by uid 0 only).

The use-case for this is different: the ^T-line as proposed by this
patch is for the user that interacts with a system through a terminal, who
wants to be informed not about the whole system (sort of what SysRq-t
tells you), but about what they run on that particular tty.
This is much less about "why does my system/kernel seem to hang?" or
exposing low-level internals (registers, hrtimers, locks, ...), and more
about "is my SSH terminal session unresponsive?" and "I ran a command,
it doesn't finish, how's it doing?".
e.g. A user might want to know if their SSH connection is alive without
interrupting anything, while having no access both to SysRq and console,
and no one in fg pgrp actually handles SIGINFO.

SysRq is system-wide, whereas this is per-terminal and only cares about
one tty which the status char is pressed at and its foreground pgrp
(most likely it's the foreground shell job).

I hope this is clear enough.


Attachments:
(No filename) (4.09 kB)
signature.asc (849.00 B)
Download all attachments

2019-08-01 09:55:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

On Thu, Aug 01, 2019 at 01:23:59AM +0300, Arseny Maslennikov wrote:
> On Tue, Jul 30, 2019 at 06:19:40PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
> > > If the three termios local flags isig, icanon, iexten are enabled
> > > and the local flag nokerninfo is disabled for a tty governed
> > > by the n_tty line discipline, then on receiving the keyboard status
> > > character n_tty will generate a status message and write it out to
> > > the tty before sending SIGINFO to the tty's foreground process group.
> > >
> > > This kerninfo line contains information about the current system load
> > > as well as some properties of "the most interesting" process in the
> > > tty's current foreground process group, namely:
> > > - its PID as seen inside its deepest PID namespace;
> > > * the whole process group ought to be in a single PID namespace,
> > > so this is actually deterministic
> > > - its saved command name truncated to 16 bytes (task_struct::comm);
> > > * at the time of writing TASK_COMM_LEN == 16
> > > - its state and some related bits, procps-style;
> > > - for S and D: its symbolic wait channel, if available; or a short
> > > description for other process states instead;
> > > - its user, system and real rusage time values;
> > > - its resident set size (as well as the high watermark) in kilobytes.
> >
> > Why is this really all needed as we have the SysRq handlers that report
> > all of this today?
>
> Different use-cases have different needs; SysRq is targeted at a different
> audience; see below.
>
> > > The "most interesting" process is chosen as follows:
> > > - runnables over everything
> > > - uninterruptibles over everything else
> > > - among 2 runnables pick the biggest utime + stime
> > > - any unresolved ties are decided in favour of greatest PID.
> >
> > This does not feel like something that the tty core code should be doing
> > at all.
>
> Yes, this selection part is quite clumsy. In defense of it, one could
> argue that we already have the whole n_tty implemented in kernel-space.

One could argue that :)

> One way we could get rid of this is to display a summarized statistic
> for the whole pgrp: pgid, oldest real time, cumulative utime and stime,
> cumulative memory usage. Would this be more acceptable? Are there any
> other ideas?

Given that I really think you are just making something up here that no
one really is needing for their workflow, but would just be "cool to
have", I say do whatever you think is right.

And there is nothing wrong with "cool to have" things, I'm not trying to
dismiss this, it's just that all new features come with the "you must
support this until the end of time" requirement that we must balance it
with.

> > > While the kerninfo line is not very useful for debugging the kernel
> > > itself, since we have much more powerful debugging tools, it still gives
> > > the user behind the terminal some meaningful feedback to a VSTATUS that
> > > works even if no processes respond.
> >
> > That's what SysRq is for. If there's a specific set of values that we
> > don't currently report in that facility, why not just add the
> > information there? It's much simpler and "safer" that way.
>
> SysRq is intended for the person either administrating the system to be used in
> emergency (e.g. f for the oom kill, the famous s-u-b combo also comes to
> mind) or debugging the kernel, and it indeed does a much better job for
> those purposes. In both use-cases mentioned the person has access to
> the system console, where the sysrq button handlers produce all their
> output, if any, and to either a physical keyboard / serial console or to
> /proc/sysrq-trigger, whose mode is 0200 (writable by uid 0 only).
>
> The use-case for this is different: the ^T-line as proposed by this
> patch is for the user that interacts with a system through a terminal, who
> wants to be informed not about the whole system (sort of what SysRq-t
> tells you), but about what they run on that particular tty.

Ok, fair enough, although if you just add a new sysrq option for "what
is running on this tty", would that help resolve this?

> This is much less about "why does my system/kernel seem to hang?" or
> exposing low-level internals (registers, hrtimers, locks, ...), and more
> about "is my SSH terminal session unresponsive?" and "I ran a command,
> it doesn't finish, how's it doing?".
> e.g. A user might want to know if their SSH connection is alive without
> interrupting anything, while having no access both to SysRq and console,
> and no one in fg pgrp actually handles SIGINFO.

If you have access to a tty, you should have access to sysrq, right?

> SysRq is system-wide, whereas this is per-terminal and only cares about
> one tty which the status char is pressed at and its foreground pgrp
> (most likely it's the foreground shell job).
>
> I hope this is clear enough.

It is, yes. My big objection is the crazy code I point out above, as
well as the "create a totally new interface when we might be able to use
an existing one" that you need to convince me is really required :)

thanks,

greg k-h

2019-08-01 10:11:18

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

Hi!

> > The use-case for this is different: the ^T-line as proposed by this
> > patch is for the user that interacts with a system through a terminal, who
> > wants to be informed not about the whole system (sort of what SysRq-t
> > tells you), but about what they run on that particular tty.
>
> Ok, fair enough, although if you just add a new sysrq option for "what
> is running on this tty", would that help resolve this?

This is meant for unpriviledged users, unlike sysrq.

> > This is much less about "why does my system/kernel seem to hang?" or
> > exposing low-level internals (registers, hrtimers, locks, ...), and more
> > about "is my SSH terminal session unresponsive?" and "I ran a command,
> > it doesn't finish, how's it doing?".
> > e.g. A user might want to know if their SSH connection is alive without
> > interrupting anything, while having no access both to SysRq and console,
> > and no one in fg pgrp actually handles SIGINFO.
>
> If you have access to a tty, you should have access to sysrq, right?

No. This is supposed to work over ssh. SysRq is not supposed to work
over ssh; that would be a security hole.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.30 kB)
signature.asc (188.00 B)
Digital signature
Download all attachments

2019-08-01 12:34:22

by Rob Landley

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

On 7/30/19 11:19 AM, Greg Kroah-Hartman wrote:
> On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
>> If the three termios local flags isig, icanon, iexten are enabled
>> and the local flag nokerninfo is disabled for a tty governed
>> by the n_tty line discipline, then on receiving the keyboard status
>> character n_tty will generate a status message and write it out to
>> the tty before sending SIGINFO to the tty's foreground process group.
>>
>> This kerninfo line contains information about the current system load
>> as well as some properties of "the most interesting" process in the
>> tty's current foreground process group, namely:
>> - its PID as seen inside its deepest PID namespace;
>> * the whole process group ought to be in a single PID namespace,
>> so this is actually deterministic
>> - its saved command name truncated to 16 bytes (task_struct::comm);
>> * at the time of writing TASK_COMM_LEN == 16
>> - its state and some related bits, procps-style;
>> - for S and D: its symbolic wait channel, if available; or a short
>> description for other process states instead;
>> - its user, system and real rusage time values;
>> - its resident set size (as well as the high watermark) in kilobytes.
>
> Why is this really all needed as we have the SysRq handlers that report
> all of this today?

People were lamenting the lack of siginfo in linux back in May, I offered to try
to implement it, several people jumped in to offer suggestions, and it turns out
you can't really do it without kernel support.

https://twitter.com/landley/status/1131764323196522498

>> The "most interesting" process is chosen as follows:
>> - runnables over everything
>> - uninterruptibles over everything else
>> - among 2 runnables pick the biggest utime + stime
>> - any unresolved ties are decided in favour of greatest PID.
>
> This does not feel like something that the tty core code should be doing
> at all.

I couldn't figure out how to do it without kernel support when I tried.

http://lists.landley.net/pipermail/toybox-landley.net/2019-May/010461.html

Rob

2019-08-01 12:45:05

by Rob Landley

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

On 8/1/19 4:20 AM, Greg Kroah-Hartman wrote:
>> SysRq is system-wide, whereas this is per-terminal and only cares about
>> one tty which the status char is pressed at and its foreground pgrp
>> (most likely it's the foreground shell job).
>>
>> I hope this is clear enough.
>
> It is, yes. My big objection is the crazy code I point out above, as
> well as the "create a totally new interface when we might be able to use
> an existing one" that you need to convince me is really required :)

It's not a new interface, it's a multiple decades old BSD interface our
tcgetattr man page already mentions, which seems to be one of the big things BSD
people miss when using Linux, and which I tried and failed to implement without
kernel support months ago.

I wasn't involved in this kernel patch effort, I got pointed at news coverage
about it by the Android Bionic maintainer:

http://lists.landley.net/pipermail/toybox-landley.net/2019-June/010536.html

Which is how I wound up cc'd on this thread.

I don't think Android specifically cares about SIGINFO, but they're trying to
support building Android on MacOSX, which means trying to support building it on
FreeBSD, which involves outreach to the BSD community, and they brought up the
lack of ctrl-T and siginfo as a thing they really missed when having to deal
with the Linux command line.

(The fact there _was_ news coverage of the patch for somebody to point me at may
also be an indication of interest floating around out there...)

Rob

2019-08-02 21:58:00

by Arseny Maslennikov

[permalink] [raw]
Subject: Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

Sorry for the delay, yesterday was rough.

On Thu, Aug 01, 2019 at 11:20:20AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Aug 01, 2019 at 01:23:59AM +0300, Arseny Maslennikov wrote:
> > On Tue, Jul 30, 2019 at 06:19:40PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
> > > > If the three termios local flags isig, icanon, iexten are enabled
> > > > and the local flag nokerninfo is disabled for a tty governed
> > > > by the n_tty line discipline, then on receiving the keyboard status
> > > > character n_tty will generate a status message and write it out to
> > > > the tty before sending SIGINFO to the tty's foreground process group.
> > > >
> > > > <...>
> > >
> > > Why is this really all needed as we have the SysRq handlers that report
> > > all of this today?
> >
> > Different use-cases have different needs; SysRq is targeted at a different
> > audience; see below.
> >
> > > > The "most interesting" process is chosen as follows:
> > > > - runnables over everything
> > > > - uninterruptibles over everything else
> > > > - among 2 runnables pick the biggest utime + stime
> > > > - any unresolved ties are decided in favour of greatest PID.
> > >
> > > This does not feel like something that the tty core code should be doing
> > > at all.
> >
> > Yes, this selection part is quite clumsy. In defense of it, one could
> > argue that we already have the whole n_tty implemented in kernel-space.
>
> One could argue that :)
>
> > One way we could get rid of this is to display a summarized statistic
> > for the whole pgrp: pgid, oldest real time, cumulative utime and stime,
> > cumulative memory usage. Would this be more acceptable? Are there any
> > other ideas?
>
> Given that I really think you are just making something up here that no
> one really is needing for their workflow, but would just be "cool to
> have", I say do whatever you think is right.
>
> And there is nothing wrong with "cool to have" things, I'm not trying to
> dismiss this, it's just that all new features come with the "you must
> support this until the end of time" requirement that we must balance it
> with.
>

To be fair, the kerninfo line itself is intended for the user's eyes; it
can't even be programmatically read by anyone in a Linux system except
pty masters, so I don't feel it has to be particularly stable, since
changing its contents won't break anyone.
So IMO we could still tweak something about it after we merge this.
Of course this cannot be said about the signal part of the patch series,
which interfaces with processes and is a proper part of the outward
kernel API, thus subject to the kernel's policy.

> > > > While the kerninfo line is not very useful for debugging the kernel
> > > > itself, since we have much more powerful debugging tools, it still gives
> > > > the user behind the terminal some meaningful feedback to a VSTATUS that
> > > > works even if no processes respond.
> > >
> > > That's what SysRq is for. If there's a specific set of values that we
> > > don't currently report in that facility, why not just add the
> > > information there? It's much simpler and "safer" that way.
> >
> > SysRq is intended for the person either administrating the system to be used in
> > emergency (e.g. f for the oom kill, the famous s-u-b combo also comes to
> > mind) or debugging the kernel, and it indeed does a much better job for
> > those purposes. In both use-cases mentioned the person has access to
> > the system console, where the sysrq button handlers produce all their
> > output, if any, and to either a physical keyboard / serial console or to
> > /proc/sysrq-trigger, whose mode is 0200 (writable by uid 0 only).
> >
> > The use-case for this is different: the ^T-line as proposed by this
> > patch is for the user that interacts with a system through a terminal, who
> > wants to be informed not about the whole system (sort of what SysRq-t
> > tells you), but about what they run on that particular tty.
>
> Ok, fair enough, although if you just add a new sysrq option for "what
> is running on this tty", would that help resolve this?
>

I see at least two problems here:
1) sysrq is not accessible by unprivileged users (and here goes the
whole idea of sysrq being designed for different purposes, etc.)
2) Even if we fix it and somehow allow unpriv users to do the equivalent
of `echo $letter > /proc/sysrq-trigger' for a particular letter,
there still are UI issues. (yes, UI issues are better dealt with
outside the kernel, but that's not the point) It still has to dump
info to the particular tty (let's remember, console is unaccessible
and shared to the whole system), that is a novelty for sysrq. There
still is the open question about how to make a VSTATUS press trigger
that sysrq button in addition to signalling fg pgrp without shoving
this bit into every other userspace program.

To sum up, this looks worse to me than integrating with n_tty.

In general, the main appeal of the feature is in it being activated by a
single keypress handled by the line discipline, from any tty.

> > This is much less about "why does my system/kernel seem to hang?" or
> > exposing low-level internals (registers, hrtimers, locks, ...), and more
> > about "is my SSH terminal session unresponsive?" and "I ran a command,
> > it doesn't finish, how's it doing?".
> > e.g. A user might want to know if their SSH connection is alive without
> > interrupting anything, while having no access both to SysRq and console,
> > and no one in fg pgrp actually handles SIGINFO.
>
> If you have access to a tty, you should have access to sysrq, right?
>

No.
SSH gives you access to a pty, which doesn't seem to handle BREAK; the
BREAK SysRq activation method does not work there.
As Pavel correctly noted, if the user is unprivileged, they can't use
/proc/sysrq-trigger as well.

> > SysRq is system-wide, whereas this is per-terminal and only cares about
> > one tty which the status char is pressed at and its foreground pgrp
> > (most likely it's the foreground shell job).
> >
> > I hope this is clear enough.
>
> It is, yes. My big objection is the crazy code I point out above, as
> well as the "create a totally new interface when we might be able to use
> an existing one" that you need to convince me is really required :)
>

The signal part (patches [1-4]/7) is not really new, since we've had
Unix-like signals for ages.
(You've probably meant the [5-7]/7, but there's no such thing as a too
detailed explanation).
The compelling reasons for a new signal are:
- it provides asynchronous process notification, as opposed to e. g. a
POLLIN on a file descriptor, which is not noticed by the recipient
until they poll(2);
- to make use of the mechanism, an existing app does not have to be
rewritten in a certain way — a signal handler just gets called, if the
signal is not blocked — so it's easy to use from applications.
- it aligns well with the current practice of the line discipline
relaying user requests to processes via signals.

As for the line: before I started writing the patch I thought a lot
about how we could move the status line logic away from n_tty and, when
^T is pressed, have it notify someone in userspace, who would look up
all the pieces in /proc and write() the line to the tty. Obviously this
approach should be much more flexible and easier to maintain from the
kernel perspective. One candidate for this could actually be the tty's
session leader (most often that's the interactive shell).

To be convenient, the feature needs to have at least the following
qualities:
- the status line has to come first, since it's the only invariant; this
way it's easy to visually find if needed and just as easy to ignore
otherwise;
- the reaction to VSTATUS has to be as responsive as it can reasonably
be and as the system load allows. (btw, SysRq does well in this
regard, but the drawbacks outweigh this, see above)
The only way we can secure the first point is to signal a process, wait
for it to get scheduled, wait a little more for it to compose and send
output, and only then to signal the pgrp and be done. This is
potentially slow and might lock up the tty's input receive queue for
sizable amounts of time. What if the session leader opts-in to do this
and then ignores this kind of request / never responds altogether? This
gets messy fast. So we can't guarantee the second point.
If we focus on responsiveness, we can't wait for processes at all, so we
ping them and be done with it. This means no output ordering guarantees,
it all gets interleaved and, consequently, less readable and less useful
overall.
Further tweaking the protocol gives the risk of making it too complicated.

So I gave up on the idea and went with this patch, which ticks both boxes
mentioned above. Of course, I'd love to have an ideal solution, and a
pony too. :)

(Or perhaps I'm thinking inside the box and missing a clue and someone
reading this does not. This is a public list, anyway)


Attachments:
(No filename) (9.04 kB)
signature.asc (849.00 B)
Download all attachments