2010-11-03 01:29:56

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 00/12] netoops support

This patchset applies to v2.6.36.

The following series implements support for 'netoops', a simple driver that
will deliver kmsg logs together with machine specifics over the network.

This driver is based on code used in Google's production server environment.
We internally call the driver 'netdump', but are planning on changing the name
to 'netoops' to follow the convention set by both the mtdoops and ramoops
drivers. We use these facilities to gather crash data from our entire fleet of
machines in a light-weight manner. We things this way because it simply isn't
feasible to gather full crash data off of every machine in the wild that
decides it is time to die.

Currently, this driver only supports udp over ipv4. When configured with an
IPv4 address, netoops will initiate an ARP and hold a reference to the
associated neighbor. It uses this information when invoked via kmsg_dump() to
fire off UDP packets via the netpoll interface.

I'm posting these patches in an effort to eventually get this sort of
functionality mainlined. I have tried to clean this code up internally, but
there are still several unresolved issues that would need to be worked out. In
particular:

* I am _NOT_ happy with the userland ABIs presented in this patchset. They
were cobbled together by a variety of engineers over the years, and they
aren't very pretty. I present them none-the-less to express the scope of
the functionality that we would like to maintain.

* I am _NOT_ happy with the data format of the transmitted packets. It is
very specific to our server environment and currently:

* is hard-coded to support both userland provided information (that may
not be applicable to others) and

* only supports i386 and x86_64.

* UDP Port numbers are currently hard-coded.

* I am uncertain if the handling of the neighbor's validity is correct,

* The netdev event handling is probably incomplete to handle reloading of
the neighbor.

I'd like to resolve each of the above issues in subsequent versions of this
patchset. I need help in identifying what the ABI should look like in
particular.


Patchset summary
================

1 - Pass pt_regs on to oops_exit()
2 - Let kmsg_dumpers get access to pt_regs if available
3 - Introduce KMSG_DUMP_SOFT
4 - Add a sysrq trigger that calls kmsg_dump(KMSG_DUMP_SOFT) (USER ABI)
5 - Netoops core driver. (USER ABI)
6 - Addition of x86 specific bits to the network packets (NETWORK ABI)
7 - Addition of userland specific bits to the network packets (USER AND
NETWORK ABI)
8 - Support for "one-shot" dumping (USER ABI)
9 - An userland interface for triggering crashes (USER ABI)
10 - EXPORT_SYMBOL_GPL(kmsg_dump)
11 - EXPORT_SYMBOL_GPL(arp_bind_neighbor)
12 - Allow netoops to be compiled as a module


2010-11-03 01:30:14

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 01/12] Oops: Pass regs to oops_exit()

Later commits in this series will want to see pt_regs if available when a
machine oopses. Pass regs if available on to oops_exit().

Signed-off-by: Mike Waychison <[email protected]>
---
arch/arm/kernel/traps.c | 2 +-
arch/parisc/kernel/traps.c | 2 +-
arch/powerpc/kernel/traps.c | 2 +-
arch/s390/kernel/traps.c | 2 +-
arch/sh/kernel/traps_32.c | 2 +-
arch/x86/kernel/dumpstack.c | 2 +-
include/linux/kernel.h | 2 +-
kernel/panic.c | 2 +-
8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index cda78d5..45cc7c0 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -280,7 +280,7 @@ void die(const char *str, struct pt_regs *regs, int err)
bust_spinlocks(0);
add_taint(TAINT_DIE);
spin_unlock_irq(&die_lock);
- oops_exit();
+ oops_exit(regs);

if (in_interrupt())
panic("Fatal exception in interrupt");
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 8b58bf0..4aa5514 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -294,7 +294,7 @@ void die_if_kernel(char *str, struct pt_regs *regs, long err)
panic("Fatal exception");
}

- oops_exit();
+ oops_exit(regs);
do_exit(SIGSEGV);
}

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a45a63c..918c498 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -170,7 +170,7 @@ int die(const char *str, struct pt_regs *regs, long err)
if (panic_on_oops)
panic("Fatal exception");

- oops_exit();
+ oops_exit(regs);
do_exit(err);

return 0;
diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c
index 5d8f0f3..90c8259 100644
--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -305,7 +305,7 @@ void die(const char * str, struct pt_regs * regs, long err)
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception: panic_on_oops");
- oops_exit();
+ oops_exit(regs);
do_exit(SIGSEGV);
}

diff --git a/arch/sh/kernel/traps_32.c b/arch/sh/kernel/traps_32.c
index c3d86fa..6b3f31e 100644
--- a/arch/sh/kernel/traps_32.c
+++ b/arch/sh/kernel/traps_32.c
@@ -102,7 +102,7 @@ void die(const char * str, struct pt_regs * regs, long err)
bust_spinlocks(0);
add_taint(TAINT_DIE);
spin_unlock_irq(&die_lock);
- oops_exit();
+ oops_exit(regs);

if (kexec_should_crash(current))
crash_kexec(regs);
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6e8752c..89c720c 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -254,7 +254,7 @@ void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
/* Nest count reaches zero, release the lock. */
arch_spin_unlock(&die_lock);
raw_local_irq_restore(flags);
- oops_exit();
+ oops_exit(regs);

if (!signr)
return;
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2b0a35e..995eb70 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -181,7 +181,7 @@ extern long (*panic_blink)(int state);
NORET_TYPE void panic(const char * fmt, ...)
__attribute__ ((NORET_AND format (printf, 1, 2))) __cold;
extern void oops_enter(void);
-extern void oops_exit(void);
+extern void oops_exit(struct pt_regs *);
void print_oops_end_marker(void);
extern int oops_may_print(void);
NORET_TYPE void do_exit(long error_code)
diff --git a/kernel/panic.c b/kernel/panic.c
index 4c13b1a..c3f39cd 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -349,7 +349,7 @@ void print_oops_end_marker(void)
* Called when the architecture exits its oops handler, after printing
* everything.
*/
-void oops_exit(void)
+void oops_exit(struct pt_regs *regs)
{
do_oops_enter_exit();
print_oops_end_marker();

2010-11-03 01:30:18

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 04/12] sys-rq: Add option to soft dump

It is very useful to provide some means to force the kernel logs to make it out
via the kmsg_oops implementations on the console. Add a new option 'Y' to
sysrq to allow dumping of logs to kmsg_dumper drivers.

Signed-off-by: Mike Waychison <[email protected]>
---
drivers/char/sysrq.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c
index ef31bb8..888efc5 100644
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -41,6 +41,7 @@
#include <linux/oom.h>
#include <linux/slab.h>
#include <linux/input.h>
+#include <linux/kmsg_dump.h>

#include <asm/ptrace.h>
#include <asm/irq_regs.h>
@@ -395,6 +396,17 @@ static struct sysrq_key_op sysrq_unrt_op = {
.enable_mask = SYSRQ_ENABLE_RTNICE,
};

+static void sysrq_handle_softdump(int key)
+{
+ kmsg_dump(KMSG_DUMP_SOFT, NULL);
+}
+static struct sysrq_key_op sysrq_softdump_op = {
+ .handler = sysrq_handle_softdump,
+ .help_msg = "soft-dump(Y)",
+ .action_msg = "Trigger a soft dump",
+ .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
/* Key Operations table and lock */
static DEFINE_SPINLOCK(sysrq_key_table_lock);

@@ -451,7 +463,7 @@ static struct sysrq_key_op *sysrq_key_table[36] = {
/* x: May be registered on ppc/powerpc for xmon */
NULL, /* x */
/* y: May be registered on sparc64 for global register dump */
- NULL, /* y */
+ &sysrq_softdump_op, /* y */
&sysrq_ftrace_dump_op, /* z */
};

2010-11-03 01:30:22

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 03/12] kmsg_dumper: Introduce a new 'SOFT' dump reason

It is a useful to be able to exercise kmsg_dumper implementations without
requiring a kernel oops or panic. This commit adds a new reason called
KMSG_DUMP_SOFT, which signifies that the system isn't really going down.

This logic is used in a later commit that introduces the netoops driver.

Signed-off-by: Mike Waychison <[email protected]>
---

It is also possible that we not introduce KMSG_DUMP_SOFT, and simply overload
the existing KMSG_DUMP_OOPS reason, but I figured that this would be cleaner.

TODO: Make sure mtdoops and ramoops do something useful with this flag?
---
include/linux/kmsg_dump.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index a229acc..0abc2d7 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -20,6 +20,7 @@ enum kmsg_dump_reason {
KMSG_DUMP_OOPS,
KMSG_DUMP_PANIC,
KMSG_DUMP_KEXEC,
+ KMSG_DUMP_SOFT,
};

/**

2010-11-03 01:30:35

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 08/12] netoops: Add one-shot mode

Sometimes it is possible to have a kernel crashing that continuously Oopses.
In this case, we do not want the network dumper get called over and over again,
as we may only be interested in the first Oops that a kernel emits (especially
in cases where we panic_on_oops).

In order to support this dumping policy, this commit introduces a file call
/proc/sys/kernel/net_dump_one_shot that contains a boolean as to whether or not
the netoops driver will operate in one-shot mode.

Note that 'soft' dumps do not disable the netoops driver when one-shot mode is
enabled.

Signed-off-by: Mike Waychison <[email protected]>
---
TODO: ABI needs a better home.
---
drivers/net/netoops.c | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
index 13c4d51..01432d7 100644
--- a/drivers/net/netoops.c
+++ b/drivers/net/netoops.c
@@ -86,6 +86,8 @@ static struct netpoll *np;
static struct neighbour *netoops_neighbour;
static DEFINE_SPINLOCK(netoops_lock);
static struct netoops_msg msg;
+static int network_dumper_one_shot = 1;
+static int network_dumper_disabled;

static char netoops_fw_version[80];
static char netoops_board_name[80];
@@ -209,6 +211,7 @@ static void enable_netoops(char *dev_name, u32 dest_ip, u32 src_ip,
np = new_np;
neighbour = netoops_neighbour;
netoops_neighbour = dest_neighbour;
+ network_dumper_disabled = 0;
spin_unlock_irqrestore(&netoops_lock, flags);

/* refcount cannot be decreased with interrupt disabled. */
@@ -415,7 +418,7 @@ static void netoops(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
if (!spin_trylock(&netoops_lock))
return;

- if (np == NULL || np->dev == NULL) {
+ if (np == NULL || np->dev == NULL || network_dumper_disabled) {
spin_unlock(&netoops_lock);
return;
}
@@ -442,6 +445,10 @@ static void netoops(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
netoops_send_segment(packet_count_2, s1, l1);
}

+ /* Should we disable ourselves now? */
+ if (!soft_dump && network_dumper_one_shot)
+ network_dumper_disabled = 1;
+
spin_unlock(&netoops_lock);
}

@@ -577,6 +584,13 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_netoops_device,
},
+ {
+ .procname = "net_dump_one_shot",
+ .data = &network_dumper_one_shot,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
{},
};

2010-11-03 01:30:41

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 09/12] netoops: Add an interface to trigger various types of crashes.

It is very useful to be able to test the crash path, but in order to do so, we
need to expose various ways to crash the kernel in a deterministic fashion.

This commit adds a file, /proc/sys/kernel/net_dump_now that takes various
tokens that will crash the kernel in various ways.

Signed-off-by: Mike Waychison <[email protected]>
---

This trigger is probably better done using an ABI that is much more generic.
It could perhaps be an extension of /proc/sysrq-trigger, though it would be
nice to have the 'types' of crashes enumerated in a programmable manner somehow
to know if they are available or not.
---
drivers/net/netoops.c | 31 +++++++++++++++++++++++++++++++
1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
index 01432d7..87b2122 100644
--- a/drivers/net/netoops.c
+++ b/drivers/net/netoops.c
@@ -578,6 +578,32 @@ static int proc_netoops_device(struct ctl_table *table, int write,
return 0;
}

+static int proc_netoops_now(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ if (write) {
+ char magic[20];
+ /* just crash in kernel mode. */
+ if (copy_from_user(magic, buffer, min(*lenp, sizeof(magic))))
+ return -EFAULT;
+ magic[min(*lenp, sizeof(magic))-1] = 0;
+ if (!strcmp(magic, "elgooG")) {
+ /* Test a simple crash */
+ *(unsigned long *)0 = 0;
+ } else if (!strcmp(magic, "guB")) {
+ /* Test the BUG() handler */
+ BUG();
+ } else if (!strcmp(magic, "cinaP")) {
+ panic("Testing panic");
+ } else if (!strcmp(magic, "pmuD")) {
+ kmsg_dump(KMSG_DUMP_SOFT, NULL);
+ }
+ return 0;
+ }
+ *lenp = 0;
+ return 0;
+}
+
static struct ctl_table kern_table[] = {
{
.procname = "net_dump_device",
@@ -585,6 +611,11 @@ static struct ctl_table kern_table[] = {
.proc_handler = &proc_netoops_device,
},
{
+ .procname = "net_dump_now",
+ .mode = 0600,
+ .proc_handler = &proc_netoops_now,
+ },
+ {
.procname = "net_dump_one_shot",
.data = &network_dumper_one_shot,
.maxlen = sizeof(int),

2010-11-03 01:30:51

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 11/12] ipv4: Export arp_bind_neighbour() symbol to GPL modules

The netoops driver would like to use this function to do do ARP resolution when
being configured with an IPV4 address.

Signed-off-by: Mike Waychison <[email protected]>
---
TODO: Is this really the right interface to get an L2 address?
---
net/ipv4/arp.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 96c1955..97b488c 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -528,6 +528,7 @@ int arp_bind_neighbour(struct dst_entry *dst)
}
return 0;
}
+EXPORT_SYMBOL_GPL(arp_bind_neighbour);

/*
* Check if we can use proxy ARP for this path

2010-11-03 01:30:53

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 10/12] kmsg_dump: Export symbol kmsg_dump() to GPL modules

The netoops driver would like to call this function to initiate software
induced kernel dumps.

Signed-off-by: Mike Waychison <[email protected]>
---

This patch can go away if we have a good way to induce a 'soft' kmsg_dump in
the core kernel.
---
kernel/printk.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index ad0c74e..2aa0a28 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1572,4 +1572,5 @@ void kmsg_dump(enum kmsg_dump_reason reason, struct pt_regs *pt_regs)
dumper->dump(dumper, reason, pt_regs, s1, l1, s2, l2);
spin_unlock_irqrestore(&dump_list_lock, flags);
}
+EXPORT_SYMBOL_GPL(kmsg_dump);
#endif

2010-11-03 01:31:08

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 12/12] netoops: Allow the driver to be built as a module

Now that the netoops driver is completely decoupled from the rest of the
kernel, it can be compiled as a module.

Signed-off-by: Mike Waychison <[email protected]>
---
drivers/net/Kconfig | 2 +-
drivers/net/netoops.c | 2 ++
2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 4dc53d4..c50411b 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3321,7 +3321,7 @@ config NETCONSOLE_DYNAMIC
See <file:Documentation/networking/netconsole.txt> for details.

config NETOOPS
- bool "Network oops support"
+ tristate "Network oops support"
depends on PROC_FS
help
This option enables the ability to have the kernel logs emitted on
diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
index 87b2122..3c6d592 100644
--- a/drivers/net/netoops.c
+++ b/drivers/net/netoops.c
@@ -688,3 +688,5 @@ static void __exit netoops_exit(void)

module_init(netoops_init);
module_exit(netoops_exit);
+
+MODULE_LICENSE("GPL");

2010-11-03 01:30:10

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 02/12] kmsg_dumper: Pass pt_regs along to dumpers.

A subsequent commit adding netoops functionality would like access to the
pt_regs to have them recorded in the dump. Pass the register values along if
available.

Signed-off-by: Mike Waychison <[email protected]>
---
drivers/char/ramoops.c | 4 +++-
drivers/mtd/mtdoops.c | 4 +++-
include/linux/kmsg_dump.h | 8 ++++++--
kernel/kexec.c | 5 +++--
kernel/panic.c | 4 ++--
kernel/printk.c | 4 ++--
6 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/char/ramoops.c b/drivers/char/ramoops.c
index 74f00b5..9acf52c 100644
--- a/drivers/char/ramoops.c
+++ b/drivers/char/ramoops.c
@@ -56,7 +56,9 @@ static struct ramoops_context {
} oops_cxt;

static void ramoops_do_dump(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason, const char *s1, unsigned long l1,
+ enum kmsg_dump_reason reason,
+ struct pt_regs *pt_regs,
+ const char *s1, unsigned long l1,
const char *s2, unsigned long l2)
{
struct ramoops_context *cxt = container_of(dumper,
diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index 1ee72f3..cf07cf5 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -298,7 +298,9 @@ static void find_next_position(struct mtdoops_context *cxt)
}

static void mtdoops_do_dump(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason, const char *s1, unsigned long l1,
+ enum kmsg_dump_reason reason,
+ struct pt_regs *pt_regs,
+ const char *s1, unsigned long l1,
const char *s2, unsigned long l2)
{
struct mtdoops_context *cxt = container_of(dumper,
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 24b4414..a229acc 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -14,6 +14,8 @@

#include <linux/list.h>

+struct pt_regs;
+
enum kmsg_dump_reason {
KMSG_DUMP_OOPS,
KMSG_DUMP_PANIC,
@@ -30,6 +32,7 @@ enum kmsg_dump_reason {
*/
struct kmsg_dumper {
void (*dump)(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
+ struct pt_regs *pt_regs,
const char *s1, unsigned long l1,
const char *s2, unsigned long l2);
struct list_head list;
@@ -37,13 +40,14 @@ struct kmsg_dumper {
};

#ifdef CONFIG_PRINTK
-void kmsg_dump(enum kmsg_dump_reason reason);
+void kmsg_dump(enum kmsg_dump_reason reason, struct pt_regs *pt_regs);

int kmsg_dump_register(struct kmsg_dumper *dumper);

int kmsg_dump_unregister(struct kmsg_dumper *dumper);
#else
-static inline void kmsg_dump(enum kmsg_dump_reason reason)
+static inline void kmsg_dump(enum kmsg_dump_reason reason,
+ struct pt_regs *pt_regs)
{
}

diff --git a/kernel/kexec.c b/kernel/kexec.c
index c0613f7..6e89a52 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1078,10 +1078,11 @@ void crash_kexec(struct pt_regs *regs)
if (kexec_crash_image) {
struct pt_regs fixed_regs;

- kmsg_dump(KMSG_DUMP_KEXEC);
-
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
+
+ kmsg_dump(KMSG_DUMP_KEXEC, &fixed_regs);
+
machine_crash_shutdown(&fixed_regs);
machine_kexec(kexec_crash_image);
}
diff --git a/kernel/panic.c b/kernel/panic.c
index c3f39cd..b3a4440 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -87,7 +87,7 @@ NORET_TYPE void panic(const char * fmt, ...)
*/
crash_kexec(NULL);

- kmsg_dump(KMSG_DUMP_PANIC);
+ kmsg_dump(KMSG_DUMP_PANIC, NULL);

/*
* Note smp_send_stop is the usual smp shutdown function, which
@@ -353,7 +353,7 @@ void oops_exit(struct pt_regs *regs)
{
do_oops_enter_exit();
print_oops_end_marker();
- kmsg_dump(KMSG_DUMP_OOPS);
+ kmsg_dump(KMSG_DUMP_OOPS, regs);
}

#ifdef WANT_WARN_ON_SLOWPATH
diff --git a/kernel/printk.c b/kernel/printk.c
index 8fe465a..ad0c74e 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1532,7 +1532,7 @@ static const char *kmsg_to_str(enum kmsg_dump_reason reason)
* Iterate through each of the dump devices and call the oops/panic
* callbacks with the log buffer.
*/
-void kmsg_dump(enum kmsg_dump_reason reason)
+void kmsg_dump(enum kmsg_dump_reason reason, struct pt_regs *pt_regs)
{
unsigned long end;
unsigned chars;
@@ -1569,7 +1569,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
return;
}
list_for_each_entry(dumper, &dump_list, list)
- dumper->dump(dumper, reason, s1, l1, s2, l2);
+ dumper->dump(dumper, reason, pt_regs, s1, l1, s2, l2);
spin_unlock_irqrestore(&dump_list_lock, flags);
}
#endif

2010-11-03 01:31:33

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 07/12] netoops: Add user programmable fields to the netoops packet.

In our environment, it is important for us to capture motherboard name,
firmware version and boot number for consideration when analyzing netoops
dumps.

We collect this information in userland at system startup (in platform specific
ways) and plug this information into the netoops driver via files available in
sysfs.

Files introduced:
/sys/kernel/netdump/netdump_board_name
/sys/kernel/netdump/netdump_fw_version
/sys/kernel/netdump/netdump_boot_id

If this were to be rewritten, we would probably like to have a more general way
for userland to add to netdump packets.

Signed-off-by: Mike Waychison <[email protected]>
---
TODO: This would probably be better handled by havin the kernel accept some blob
of data. It doesn't need to know what the data itself is, as long as it is
delivered in each packet.
---
drivers/net/netoops.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
index 0865212..13c4d51 100644
--- a/drivers/net/netoops.c
+++ b/drivers/net/netoops.c
@@ -54,7 +54,7 @@ struct netoops_msg {
u16 type;
u32 packet_count;
u32 packet_no;
- u32 __reserved1;
+ u32 blog_boot_id;
u8 x86_family;
u8 x86_model;
u8 x86_stepping;
@@ -63,8 +63,13 @@ struct netoops_msg {
* termination not required.
*/
char kernel_version[64];
- char __reserved2[64];
- char __reserved3[64];
+ /*
+ * bios and board info come from smbios which
+ * has a maximum string length of 64 according
+ * to page 26 of the smbios 2.3 spec.
+ */
+ char bios_version[64];
+ char motherboard_type[64];
/* NOTE: regs is 60 or 168 bytes */
struct pt_regs regs; /* arch specific. */
/*
@@ -82,6 +87,10 @@ static struct neighbour *netoops_neighbour;
static DEFINE_SPINLOCK(netoops_lock);
static struct netoops_msg msg;

+static char netoops_fw_version[80];
+static char netoops_board_name[80];
+static int netoops_boot_number;
+
static void save_and_disable_netoops(struct net_device *dev);
static void restore_netoops(struct net_device *dev);

@@ -320,11 +329,16 @@ static void setup_packet_header(int packet_count, struct pt_regs *regs,
msg.header.dump_id = (jiffies/HZ) & 0xffff;
msg.header.packet_count = packet_count;
msg.header.header_size = sizeof(msg.header);
+ msg.header.blog_boot_id = (u32)netoops_boot_number;
#ifndef CONFIG_UML
msg.header.x86_family = current_cpu_data.x86;
msg.header.x86_model = current_cpu_data.x86_model;
msg.header.x86_stepping = current_cpu_data.x86_mask;
#endif
+ strncpy(msg.header.bios_version, netoops_fw_version,
+ sizeof(msg.header.bios_version));
+ strncpy(msg.header.motherboard_type, netoops_board_name,
+ sizeof(msg.header.motherboard_type));
strncpy(msg.header.kernel_version,
utsname()->release,
min(sizeof(msg.header.kernel_version),
@@ -431,6 +445,60 @@ static void netoops(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
spin_unlock(&netoops_lock);
}

+static ssize_t netoops_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf) {
+
+ if (!strcmp(attr->attr.name, "netdump_fw_version"))
+ strncpy(buf, netoops_fw_version, PAGE_SIZE);
+ else if (!strcmp(attr->attr.name, "netdump_board_name"))
+ strncpy(buf, netoops_board_name, PAGE_SIZE);
+ else if (!strcmp(attr->attr.name, "netdump_boot_number"))
+ snprintf(buf, PAGE_SIZE, "%d\n", netoops_boot_number);
+ buf[PAGE_SIZE - 1] = '\0';
+ return strnlen(buf, PAGE_SIZE);
+}
+
+static ssize_t netoops_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf,
+ size_t count) {
+ if (!count)
+ return count;
+
+ if (!strcmp(attr->attr.name, "netdump_fw_version")) {
+ strncpy(netoops_fw_version, buf, sizeof(netoops_fw_version));
+ netoops_fw_version[sizeof(netoops_fw_version) - 1] = '\0';
+ } else if (!strcmp(attr->attr.name, "netdump_board_name")) {
+ strncpy(netoops_board_name, buf, sizeof(netoops_board_name));
+ netoops_board_name[sizeof(netoops_board_name) - 1] = '\0';
+ } else if (!strcmp(attr->attr.name, "netdump_boot_number")) {
+ ((char *)buf)[count - 1] = '\0';
+ sscanf(buf, "%du", &netoops_boot_number);
+ }
+ return count;
+}
+
+static struct kobj_attribute netoops_fw_version_attribute =
+ __ATTR(netoops_fw_version, 0666, netoops_show, netoops_store);
+static struct kobj_attribute netoops_board_name_attribute =
+ __ATTR(netoops_board_name, 0666, netoops_show, netoops_store);
+static struct kobj_attribute netoops_boot_number_attribute =
+ __ATTR(netoops_boot_number, 0666, netoops_show, netoops_store);
+
+static struct attribute *attrs[] = {
+ &netoops_fw_version_attribute.attr,
+ &netoops_board_name_attribute.attr,
+ &netoops_boot_number_attribute.attr,
+ NULL,
+};
+
+static struct attribute_group attr_group = {
+ .attrs = attrs,
+};
+
+static struct kobject *netoops_kobj;
+
static struct kmsg_dumper netoops_dumper = {
.dump = netoops,
};
@@ -530,9 +598,15 @@ static int __init netoops_init(void)
BUILD_BUG_ON(offsetof(struct netoops_msg, header.version) != 0);
BUILD_BUG_ON(offsetof(struct netoops_msg, header.dump_id) != 6);

+ netoops_kobj = kobject_create_and_add("netdump", kernel_kobj);
+ if (!netoops_kobj)
+ return -ENOMEM;
+ retval = sysfs_create_group(netoops_kobj, &attr_group);
+ if (retval)
+ goto cleanup_kobj;
retval = kmsg_dump_register(&netoops_dumper);
if (retval)
- goto out;
+ goto cleanup_sysfs_group;

/* Register hooks */
retval = register_netdevice_notifier(&netoops_notifier);
@@ -550,7 +624,10 @@ cleanup_netdevice:
unregister_netdevice_notifier(&netoops_notifier);
cleanup_kmsg_dump:
kmsg_dump_unregister(&netoops_dumper);
-out:
+cleanup_sysfs_group:
+ sysfs_remove_group(netoops_kobj, &attr_group);
+cleanup_kobj:
+ kobject_put(netoops_kobj);
return retval;
}

@@ -560,6 +637,8 @@ static void __exit netoops_exit(void)
unregister_netdevice_notifier(&netoops_notifier);
disable_netoops();
kmsg_dump_unregister(&netoops_dumper);
+ sysfs_remove_group(netoops_kobj, &attr_group);
+ kobject_put(netoops_kobj);
}

module_init(netoops_init);

2010-11-03 01:31:45

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 06/12] netoops: Add x86 specific bits to packet headers

We need to be able to gather information about the CPUs that caused the crash.

This commit only handles x86, but it is desirable to come up with some new
packet format that can accommodate any architecture.

Signed-off-by: Mike Waychison <[email protected]>
---
TODO: This should be made more general to other architectures. As is, we are
probably okay exporting some value for the 'arch' field. Different
architectures though will likely want to gather different data.
---
drivers/net/netoops.c | 30 +++++++++++++++++++++++-------
1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
index e9fdda3..0865212 100644
--- a/drivers/net/netoops.c
+++ b/drivers/net/netoops.c
@@ -55,16 +55,24 @@ struct netoops_msg {
u32 packet_count;
u32 packet_no;
u32 __reserved1;
- u8 __reserved2;
- u8 __reserved3;
- u8 __reserved4;
+ u8 x86_family;
+ u8 x86_model;
+ u8 x86_stepping;
/*
* NOTE: fixed length strings for a packet. NULL
* termination not required.
*/
char kernel_version[64];
- char __reserved5[64];
- char __reserved6[64];
+ char __reserved2[64];
+ char __reserved3[64];
+ /* NOTE: regs is 60 or 168 bytes */
+ struct pt_regs regs; /* arch specific. */
+ /*
+ * NOTE: The header is potentially ~385 bytes
+ * already. That doesn't leave much room for
+ * expansion unless we reduce the data size
+ * or truncate above fields.
+ */
} __attribute__ ((packed)) header;
char data[NETOOPS_DATA_BYTES];
} __attribute__ ((packed));
@@ -302,7 +310,8 @@ static void get_netoops_dest(char *buff, int buffsize, u32 *dest, u32 *src)
spin_unlock_irqrestore(&netoops_lock, flags);
}

-static void setup_packet_header(int packet_count, int soft_dump)
+static void setup_packet_header(int packet_count, struct pt_regs *regs,
+ int soft_dump)
{
msg.header.version = NETOOPS_VERSION;
msg.header.arch = NETOOPS_ARCH;
@@ -311,10 +320,17 @@ static void setup_packet_header(int packet_count, int soft_dump)
msg.header.dump_id = (jiffies/HZ) & 0xffff;
msg.header.packet_count = packet_count;
msg.header.header_size = sizeof(msg.header);
+#ifndef CONFIG_UML
+ msg.header.x86_family = current_cpu_data.x86;
+ msg.header.x86_model = current_cpu_data.x86_model;
+ msg.header.x86_stepping = current_cpu_data.x86_mask;
+#endif
strncpy(msg.header.kernel_version,
utsname()->release,
min(sizeof(msg.header.kernel_version),
sizeof(utsname()->release)));
+ if (regs != NULL)
+ memcpy(&msg.header.regs, regs, sizeof(msg.header.regs));
}

static int packet_count_from_length(unsigned long l)
@@ -403,7 +419,7 @@ static void netoops(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,

/* setup the non varying parts of the message */
memset(&msg, 0, sizeof(msg));
- setup_packet_header(packet_count_1 + packet_count_2, soft_dump);
+ setup_packet_header(packet_count_1 + packet_count_2, regs, soft_dump);

/* Transmission loop */
for (i = 0; i < NETOOPS_RETRANSMIT_COUNT; i++) {

2010-11-03 01:32:06

by Mike Waychison

[permalink] [raw]
Subject: [PATCH v1 05/12] netoops: add core functionality

The kernel network dumper provides information about a crashed machine
on the network.

On a crash, the kernel spits out the contents of the kernel message buffer
along with a few other useful tidbits of information via unicast UDP.
Each packet is sent a total of three times to deal with packet loss on the
connection. Furthermore a small amount critical data is present in every
packet, so even if only a single packet gets through, we still witness the
crash. In the same vein, we send packet in reverse order to handle cases where
the kernel fatally crashes before transmission can be completed because often
the most interesting bits of a crash can be found in the tail of the log.

Configuration of the netoops device currently uses a file in proc. It is
programmed by writing in a u32 that represents the destination IP of the target
netoops catcher. This can probably be made more general.

/proc/sys/kernel/net_dump_device
Takes a u32 in ascii representing the destination ipv4 address for dumps.

Signed-off-by: Mike Waychison <[email protected]>
---

The packet format described in this and subsequent patches currently represent
the packet format used by Google. It is _not_ generally applicable though, as
it does contain several fields that are x86 specific. I've included them here
nevertheless to foster discussion as to how best to abstract this sort of
information away.

In this commit, there are several fields that are marked "__reserved*" in the
packet header. These are replaced with actual definitions in later commits.

TODO: Figure out a better interface than plugging in an integer representing
an IPv4 address.
TODO: The netdev event handling is sufficient for our environment, but it
probably could use some work. For instance, it may not make sense to actually
pin the neighbor, but instead just keep a copy of the MAC around.
TODO: UDP port numbers shouldn't be hardcoded like this.
---
drivers/net/Kconfig | 11 +
drivers/net/Makefile | 1
drivers/net/netoops.c | 550 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 561 insertions(+), 1 deletions(-)
create mode 100644 drivers/net/netoops.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5db667c..4dc53d4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3320,8 +3320,17 @@ config NETCONSOLE_DYNAMIC
at runtime through a userspace interface exported using configfs.
See <file:Documentation/networking/netconsole.txt> for details.

+config NETOOPS
+ bool "Network oops support"
+ depends on PROC_FS
+ help
+ This option enables the ability to have the kernel logs emitted on
+ the network when a machine Oopses or Panics. Configuration of this
+ option is done at runtime by configuring a destination IP address.
+ If unsure, say N.
+
config NETPOLL
- def_bool NETCONSOLE
+ def_bool NETCONSOLE || NETOOPS

config NETPOLL_TRAP
bool "Netpoll traffic trapping"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3e8f150..a8b0113 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -284,6 +284,7 @@ obj-$(CONFIG_ETRAX_ETHERNET) += cris/
obj-$(CONFIG_ENP2611_MSF_NET) += ixp2000/

obj-$(CONFIG_NETCONSOLE) += netconsole.o
+obj-$(CONFIG_NETOOPS) += netoops.o

obj-$(CONFIG_FS_ENET) += fs_enet/

diff --git a/drivers/net/netoops.c b/drivers/net/netoops.c
new file mode 100644
index 0000000..e9fdda3
--- /dev/null
+++ b/drivers/net/netoops.c
@@ -0,0 +1,550 @@
+/*
+ * drivers/net/netoops.c
+ * Copyright (C) 2004 and beyond Google Inc.
+ *
+ * Original Author Ross Biro
+ * Revisions Rebecca Schultz
+ * Cleaned up by Mike Waychison <[email protected]>
+ *
+ * This is very simple code to use the polling
+ * mode of the network drivers to send the
+ * contents of the printk buffer via udp w/o
+ * checksum to a unicast address.
+ */
+
+#include <linux/delay.h>
+#include <linux/in.h>
+#include <linux/notifier.h>
+#include <linux/kernel.h>
+#include <linux/netpoll.h>
+#include <linux/nmi.h>
+#include <linux/utsname.h>
+#include <linux/watchdog.h>
+#include <net/arp.h>
+#include <net/flow.h>
+#include <net/neighbour.h>
+#include <net/route.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/kmsg_dump.h>
+
+#define NETOOPS_TYPE_PRINTK_BUFFER 1
+#define NETOOPS_TYPE_PRINTK_BUFFER_SOFT 3
+#define NETOOPS_VERSION 0x0002
+#define NETOOPS_PORT 2004
+#define NETOOPS_RETRANSMIT_COUNT 3
+
+#if defined(__i386__) || defined(__x86_64__)
+#define NETOOPS_ARCH 2
+#else
+#error "unsupported architecture"
+#endif
+
+#define NETOOPS_DATA_BYTES 1024
+
+struct netoops_msg {
+ struct {
+ u16 version; /* MUST be @ offset 0 */
+ /*
+ * Size of this header before data[] starts.
+ */
+ u16 header_size;
+ u16 arch;
+ u16 dump_id; /* MUST be @ offset 6 */
+ u16 type;
+ u32 packet_count;
+ u32 packet_no;
+ u32 __reserved1;
+ u8 __reserved2;
+ u8 __reserved3;
+ u8 __reserved4;
+ /*
+ * NOTE: fixed length strings for a packet. NULL
+ * termination not required.
+ */
+ char kernel_version[64];
+ char __reserved5[64];
+ char __reserved6[64];
+ } __attribute__ ((packed)) header;
+ char data[NETOOPS_DATA_BYTES];
+} __attribute__ ((packed));
+
+static struct netpoll *np;
+static struct neighbour *netoops_neighbour;
+static DEFINE_SPINLOCK(netoops_lock);
+static struct netoops_msg msg;
+
+static void save_and_disable_netoops(struct net_device *dev);
+static void restore_netoops(struct net_device *dev);
+
+static int netoops_netdev_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+ struct net_device *event_dev = ptr;
+ switch (event) {
+ case NETDEV_UNREGISTER:
+ save_and_disable_netoops(event_dev);
+ break;
+ case NETDEV_CHANGE:
+ /* LINK UP */
+ if (netif_carrier_ok(event_dev))
+ restore_netoops(event_dev);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block netoops_notifier = {
+ .notifier_call = netoops_netdev_event,
+};
+
+static void disable_netoops(void);
+static void enable_netoops(char *dev_name, u32 dest_ip, u32 src_ip,
+ struct neighbour *dest_neighbour);
+
+static void set_netoops_dest(u32 dest_ip)
+{
+ struct rtable *rt = NULL;
+ struct flowi fli;
+
+ memset(&fli, 0, sizeof(fli));
+ fli.fl4_dst = htonl(dest_ip);
+ fli.fl4_src = htonl(0);
+ /* get a route to the destination */
+ if (!ip_route_output_key(&init_net, &rt, &fli))
+ /* now get an arp entry for the neighbour */
+ if (!arp_bind_neighbour(&rt->dst))
+ neigh_event_send(rt->dst.neighbour, NULL);
+
+ /* if route to destination was found */
+ if (rt && rt->dst.neighbour) {
+ /* use the device the route was found on,
+ * and take the remote mac from the neighbour entry */
+ enable_netoops(rt->dst.neighbour->dev->name, dest_ip,
+ ntohl(rt->rt_src),
+ neigh_clone(rt->dst.neighbour));
+ } else
+ disable_netoops();
+
+ /* decrement the reference counters on the route and neighbour */
+ if (rt)
+ ip_rt_put(rt);
+}
+
+static void disable_netoops(void)
+{
+ unsigned long flags;
+ struct netpoll *saved_np = NULL;
+ struct neighbour *cleanup_neighbour;
+
+ spin_lock_irqsave(&netoops_lock, flags);
+ cleanup_neighbour = netoops_neighbour;
+ netoops_neighbour = NULL;
+ saved_np = np;
+ np = NULL;
+ spin_unlock_irqrestore(&netoops_lock, flags);
+
+ if (saved_np && saved_np->dev)
+ netpoll_cleanup(saved_np);
+
+ kfree(saved_np);
+
+ /* refcount cannot be decreased with interrupt disabled. */
+ if (cleanup_neighbour)
+ neigh_release(cleanup_neighbour);
+}
+
+static void enable_netoops(char *dev_name, u32 dest_ip, u32 src_ip,
+ struct neighbour *dest_neighbour)
+{
+ unsigned long flags;
+ struct neighbour *neighbour = NULL;
+
+ struct netpoll *new_np = kzalloc(sizeof(struct netpoll), GFP_KERNEL);
+ struct netpoll *saved_np = NULL;
+
+ if (new_np == NULL) {
+ printk(KERN_ERR "netpoll_setup failed, netoops disabled.");
+ disable_netoops();
+ return;
+ }
+ new_np->name = "network_dumper";
+ new_np->rx_hook = NULL;
+ new_np->local_port = NETOOPS_PORT;
+ new_np->remote_port = NETOOPS_PORT;
+
+ strncpy(new_np->dev_name, dev_name, sizeof(np->dev_name));
+ new_np->local_ip = htonl(src_ip);
+ new_np->remote_ip = htonl(dest_ip);
+
+ if (netpoll_setup(new_np)) {
+ printk(KERN_ERR "netpoll_setup failed, netoops disabled.");
+ disable_netoops();
+ kfree(new_np);
+ return;
+ }
+
+ spin_lock_irqsave(&netoops_lock, flags);
+ saved_np = np;
+ np = new_np;
+ neighbour = netoops_neighbour;
+ netoops_neighbour = dest_neighbour;
+ spin_unlock_irqrestore(&netoops_lock, flags);
+
+ /* refcount cannot be decreased with interrupt disabled. */
+ if (neighbour)
+ neigh_release(neighbour);
+
+ if (saved_np) {
+ netpoll_cleanup(saved_np);
+ kfree(saved_np);
+ }
+ return;
+}
+
+static void restore_netoops(struct net_device *dev)
+{
+ unsigned long flags;
+ u32 dest_ip;
+ u32 src_ip;
+ struct netpoll *saved_np;
+
+ /* Disable netoopss while saving off the destination IP address */
+ spin_lock_irqsave(&netoops_lock, flags);
+ if (!np || strcmp(np->dev_name, dev->name)) {
+ spin_unlock_irqrestore(&netoops_lock, flags);
+ return;
+ }
+ src_ip = ntohl(np->local_ip);
+ dest_ip = ntohl(np->remote_ip);
+ saved_np = np;
+ np = NULL;
+ spin_unlock_irqrestore(&netoops_lock, flags);
+
+ /* Restart the netoops configuration using the previous target IP */
+ set_netoops_dest(dest_ip);
+
+ /* Check to see if that worked at all, if not, silently continue trying
+ * to use the existing configuration. */
+ spin_lock_irqsave(&netoops_lock, flags);
+ if (!np) {
+ np = saved_np;
+ spin_unlock_irqrestore(&netoops_lock, flags);
+ return;
+ }
+ spin_unlock_irqrestore(&netoops_lock, flags);
+
+ /* Cleanup the old netpoll config */
+ if (saved_np->dev)
+ netpoll_cleanup(saved_np);
+ kfree(saved_np);
+}
+
+static void save_and_disable_netoops(struct net_device *dev)
+{
+ unsigned long flags;
+ struct neighbour *neighbour = NULL;
+ struct netpoll *saved_np;
+
+ spin_lock_irqsave(&netoops_lock, flags);
+ if (np && np->dev && np->dev == dev) {
+ neighbour = netoops_neighbour;
+ netoops_neighbour = NULL;
+ saved_np = np;
+ np = NULL;
+
+ /* netpoll_cleanup calls __cancel_work_timer, and the later
+ * can sleep
+ */
+ spin_unlock_irqrestore(&netoops_lock, flags);
+ netpoll_cleanup(saved_np);
+ spin_lock_irqsave(&netoops_lock, flags);
+
+ BUG_ON(saved_np->dev != NULL);
+ BUG_ON(!strlen(saved_np->dev_name));
+ if (np) {
+ kfree(saved_np);
+ saved_np = NULL;
+ } else {
+ np = saved_np;
+ }
+ }
+ spin_unlock_irqrestore(&netoops_lock, flags);
+
+ /* refcount cannot be decreased with interrupt disabled. */
+ if (neighbour)
+ neigh_release(neighbour);
+}
+
+static void get_netoops_dest(char *buff, int buffsize, u32 *dest, u32 *src)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&netoops_lock, flags);
+ if (np != NULL) {
+ if (dest != NULL)
+ *dest = ntohl(np->remote_ip);
+ if (src != NULL)
+ *src = ntohl(np->local_ip);
+ memset(buff, 0, buffsize);
+ if (np->dev_name != NULL && buffsize > 1)
+ strncpy(buff, np->dev_name, min(buffsize-1, IFNAMSIZ));
+ } else {
+ if (dest != NULL)
+ *dest = 0;
+ if (src != NULL)
+ *src = 0;
+ memset(buff, 0, buffsize);
+ }
+ spin_unlock_irqrestore(&netoops_lock, flags);
+}
+
+static void setup_packet_header(int packet_count, int soft_dump)
+{
+ msg.header.version = NETOOPS_VERSION;
+ msg.header.arch = NETOOPS_ARCH;
+ msg.header.type = soft_dump ? NETOOPS_TYPE_PRINTK_BUFFER_SOFT :
+ NETOOPS_TYPE_PRINTK_BUFFER;
+ msg.header.dump_id = (jiffies/HZ) & 0xffff;
+ msg.header.packet_count = packet_count;
+ msg.header.header_size = sizeof(msg.header);
+ strncpy(msg.header.kernel_version,
+ utsname()->release,
+ min(sizeof(msg.header.kernel_version),
+ sizeof(utsname()->release)));
+}
+
+static int packet_count_from_length(unsigned long l)
+{
+ return (l + NETOOPS_DATA_BYTES - 1) / NETOOPS_DATA_BYTES;
+}
+
+static void netoops_send_packet(int packet_nr)
+{
+ msg.header.packet_no = packet_nr;
+
+ netpoll_send_udp(np, (char *)&msg, sizeof(msg));
+}
+
+
+/*
+ * Send the passed in segment of kmsg via netpoll. Packets are sent in reverse
+ * order, with the tail packet (the first one transmitted) zero-padded.
+ */
+static void netoops_send_segment(int packet_offset,
+ const char *s, unsigned long l)
+{
+ int packet_count = packet_count_from_length(l);
+ size_t data_length;
+ int i;
+
+ for (i = packet_count - 1; i >= 0; i--) {
+ /* Usually messages completely fill the data field */
+ data_length = NETOOPS_DATA_BYTES;
+ if (i == packet_count - 1) {
+ /* Except the tail packet, which is zero-padded */
+ data_length = l % NETOOPS_DATA_BYTES;
+ memset(msg.data + data_length, 0,
+ NETOOPS_DATA_BYTES - data_length);
+ }
+ BUG_ON(data_length > NETOOPS_DATA_BYTES);
+
+ /* Copy the payload into the packet and send */
+ memcpy(msg.data, s + (i * NETOOPS_DATA_BYTES), data_length);
+ netoops_send_packet((packet_count - i - 1) + packet_offset);
+
+ touch_nmi_watchdog();
+ }
+}
+
+/*
+ * Callback used by the kmsg_dumper.
+ *
+ * Called with interrupts disabled locally.
+ */
+static void netoops(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
+ struct pt_regs *regs,
+ const char *s1, unsigned long l1,
+ const char *s2, unsigned long l2) {
+ int packet_count_1, packet_count_2;
+ int soft_dump = 0;
+ int i;
+
+ /* Only handle fatal problems */
+ if (reason != KMSG_DUMP_OOPS
+ && reason != KMSG_DUMP_PANIC
+ && reason != KMSG_DUMP_SOFT)
+ return;
+
+ if (reason == KMSG_DUMP_SOFT)
+ soft_dump = 1;
+
+ if (!spin_trylock(&netoops_lock))
+ return;
+
+ if (np == NULL || np->dev == NULL) {
+ spin_unlock(&netoops_lock);
+ return;
+ }
+
+ /*
+ * This looks a little suspicious? Where are we sending the packets if
+ * the neighbor isn't valid?
+ */
+ if (netoops_neighbour && (netoops_neighbour->nud_state & NUD_VALID))
+ memcpy(np->remote_mac, netoops_neighbour->ha, IFHWADDRLEN);
+
+ /* compute total length of the message we are going to send */
+ packet_count_1 = packet_count_from_length(l1);
+ packet_count_2 = packet_count_from_length(l2);
+
+ /* setup the non varying parts of the message */
+ memset(&msg, 0, sizeof(msg));
+ setup_packet_header(packet_count_1 + packet_count_2, soft_dump);
+
+ /* Transmission loop */
+ for (i = 0; i < NETOOPS_RETRANSMIT_COUNT; i++) {
+ /* Send the full packets from the second segment */
+ netoops_send_segment(0, s2, l2);
+ netoops_send_segment(packet_count_2, s1, l1);
+ }
+
+ spin_unlock(&netoops_lock);
+}
+
+static struct kmsg_dumper netoops_dumper = {
+ .dump = netoops,
+};
+
+static int proc_netoops_device(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ char buff2[IFNAMSIZ+40];
+ char *s, *e;
+ u32 dest;
+ u32 src;
+
+ if (*ppos) {
+ *lenp = 0;
+ return 0;
+ }
+
+ if (!write) {
+ char devname[IFNAMSIZ+1];
+ if (!*lenp)
+ return 0;
+
+ get_netoops_dest(devname, sizeof(devname), &dest, &src);
+ snprintf(buff2, sizeof(buff2)-1, "%s,0x%08X,0x%08X\n",
+ devname, dest, src);
+ buff2[sizeof(buff2)-1] = 0;
+ if (copy_to_user(buffer, buff2, min(strlen(buff2), *lenp)))
+ return -EFAULT;
+
+ *lenp = min(*lenp, strlen(buff2));
+ *ppos += *lenp;
+ return 0;
+ }
+
+ /* The input is in one of three formats:
+ * 1) "off", disable netoops
+ * 2) a dest address as a 32 bit number in host order
+ *
+ * CASE 1) netoopss are disabled
+ * CASE 2) configure for unicast netoopss to that address.
+ *
+ * Supplying a broadcast or multicast address to case 2 or a
+ * unicast address to case 3 will result in netoopss being disabled.
+ */
+ if (copy_from_user(buff2, buffer, min(sizeof(buff2)-1, *lenp)))
+ return -EFAULT;
+
+ buff2[sizeof(buff2)-1] = 0;
+ e = buff2;
+ s = strsep(&e, ",");
+ if (!strncmp(s, "off", 3)) {
+ /* disable netoopss */
+ disable_netoops();
+ } else {
+ /*
+ * Check if the first thing in the input is a valid net address
+ */
+ if (strict_strtoul(buff2, 0, &dest)) {
+ disable_netoops();
+ return -EIO;
+ }
+ /* assume CASE 2 unicast netoopss */
+ set_netoops_dest(dest);
+ *ppos += *lenp;
+ return 0;
+ }
+
+ *ppos += *lenp;
+
+ return 0;
+}
+
+static struct ctl_table kern_table[] = {
+ {
+ .procname = "net_dump_device",
+ .mode = 0644,
+ .proc_handler = &proc_netoops_device,
+ },
+ {},
+};
+
+static struct ctl_table root_table[] = {
+ {
+ .procname = "kernel",
+ .mode = 0555,
+ .child = kern_table,
+ },
+ {}
+};
+
+static struct ctl_table_header *proc_table_header;
+
+static int __init netoops_init(void)
+{
+ int retval = -EINVAL;
+
+ BUILD_BUG_ON(offsetof(struct netoops_msg, header.version) != 0);
+ BUILD_BUG_ON(offsetof(struct netoops_msg, header.dump_id) != 6);
+
+ retval = kmsg_dump_register(&netoops_dumper);
+ if (retval)
+ goto out;
+
+ /* Register hooks */
+ retval = register_netdevice_notifier(&netoops_notifier);
+ if (retval)
+ goto cleanup_kmsg_dump;
+
+ proc_table_header = register_sysctl_table(root_table);
+ if (proc_table_header == NULL) {
+ retval = -EBUSY;
+ goto cleanup_netdevice;
+ }
+
+ return 0;
+cleanup_netdevice:
+ unregister_netdevice_notifier(&netoops_notifier);
+cleanup_kmsg_dump:
+ kmsg_dump_unregister(&netoops_dumper);
+out:
+ return retval;
+}
+
+static void __exit netoops_exit(void)
+{
+ unregister_sysctl_table(proc_table_header);
+ unregister_netdevice_notifier(&netoops_notifier);
+ disable_netoops();
+ kmsg_dump_unregister(&netoops_dumper);
+}
+
+module_init(netoops_init);
+module_exit(netoops_exit);

2010-11-03 03:00:01

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Tue, Nov 02, 2010 at 06:29:25PM -0700, Mike Waychison wrote:
> This patchset applies to v2.6.36.
>
> The following series implements support for 'netoops', a simple driver that
> will deliver kmsg logs together with machine specifics over the network.

We already have the ability to send oopses over the network today,
through the network consolst stuff. What does this patch set do that is
different from our existing stuff that warrants such a big change?

curious,

greg k-h

2010-11-03 03:38:09

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Tue, Nov 2, 2010 at 7:34 PM, Greg KH <[email protected]> wrote:
> On Tue, Nov 02, 2010 at 06:29:25PM -0700, Mike Waychison wrote:
>> This patchset applies to v2.6.36.
>>
>> The following series implements support for 'netoops', a simple driver that
>> will deliver kmsg logs together with machine specifics over the network.
>
> We already have the ability to send oopses over the network today,
> through the network consolst stuff. What does this patch set do that is
> different from our existing stuff that warrants such a big change?
>

Hi Greg,

I am a little familiar with the netconsole suppport. I should have
added a comparison to the cover email :(

We never adopted netconsole for a couple different reasons. The
reasons have slightly changed over the years, but even today we find
that it isn't a substitute for netoops' semantics.

With the number of machines we have, streaming large amounts of
consoles within the data center can really add up. This gets worse
when you take into account how reliant we are on kernel logging like
OOM conditions (which are very regular and very verbose). Events in
the data center (such as application growth) tend to be temporally
correlated, which causes large bursts of logging when we are OOM. We
aren't so interested in this kernel verbosity from a global collection
standpoint though, and haven't been keen on the amount of extra
un-regulated UDP traffic it would generate. We are however interested
in kernel oopses though (which occur far less often).

In terms of the data received, we've really benefited by having
structured data in the payload. We've been collecting kernel oopses
since sometime in 2006 and have a _vast_ collection of crashes that we
have indexed by just about anything you could ever want (registers,
full dmesg text, backtraces, motherboards, CPU types, kernel versions,
bios versions, etc). This has allowed us to quickly find 'big bugs'
vs 'rare bugs' (similar to kerneloops.org) and allow for automated
labeling of oopses/panics. This sort of structured data is either not
present in the dmesg logs or it is, but is extremely difficult to
parse (especially across kernel versions). Information like firmware
version information is also difficult to associate with crashes with
post-processing due to gaps in global sampling and the churn that
occurs in the lab where versions change quickly.

Another area where the two approaches have differed has been in
handling of network reliability. Historically (though less and less
now), we found that we had to transmit data several times. We also
used to explicitly space out packets with delays to handle switch chip
buffer overruns. Both of these functions I presume could be added to
netconsole without too much of a problem.

Lastly, this patchset also introduces a 'one-shot' mode, which has
saved our bacon several times in the past as well. It's not totally
uncommon for the kernel's crash path to be buggy, in turn causing the
kernel to emit Oopses until the cows come home (or rather, until the
hardware watchdogs trip). One-shot keeps us from emitting too much
garbage on the network when this happens.


I hope the above comparison of semantics outlines the motivations we
have for not using netconsole and favoring an approach like that used
in netoops :)

Mike Waychison

2010-11-03 18:16:21

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Tue, Nov 02, 2010 at 08:37:42PM -0700, Mike Waychison wrote:
> On Tue, Nov 2, 2010 at 7:34 PM, Greg KH <[email protected]> wrote:
> > On Tue, Nov 02, 2010 at 06:29:25PM -0700, Mike Waychison wrote:
> >> This patchset applies to v2.6.36.
> >>
> >> The following series implements support for 'netoops', a simple driver that
> >> will deliver kmsg logs together with machine specifics over the network.
> >
> > We already have the ability to send oopses over the network today,
> > through the network consolst stuff. What does this patch set do that is
> > different from our existing stuff that warrants such a big change?
> >
>
> Hi Greg,
>
> I am a little familiar with the netconsole suppport. I should have
> added a comparison to the cover email :(
>
> We never adopted netconsole for a couple different reasons. The
> reasons have slightly changed over the years, but even today we find
> that it isn't a substitute for netoops' semantics.

Ah, but it sounds like it would be better to fix up netoops to handle
your needs.

> With the number of machines we have, streaming large amounts of
> consoles within the data center can really add up. This gets worse
> when you take into account how reliant we are on kernel logging like
> OOM conditions (which are very regular and very verbose). Events in
> the data center (such as application growth) tend to be temporally
> correlated, which causes large bursts of logging when we are OOM. We
> aren't so interested in this kernel verbosity from a global collection
> standpoint though, and haven't been keen on the amount of extra
> un-regulated UDP traffic it would generate. We are however interested
> in kernel oopses though (which occur far less often).

Understood, I'm sure that a change to allow this to the existing netoops
code would be appreciated by many.

> In terms of the data received, we've really benefited by having
> structured data in the payload.

I bet the whole world would benefit by having the oops messages in a
more "structured" manner. We have done changes in the past to provide
this type of thing in a "more parsable" manner, to help stuff like
kerneloops.org. I'm sure that adding this type of information to the
main oops core/messages would be a good overall goal, instead of only
having it available to only this one option/user, right?

> Another area where the two approaches have differed has been in
> handling of network reliability. Historically (though less and less
> now), we found that we had to transmit data several times. We also
> used to explicitly space out packets with delays to handle switch chip
> buffer overruns. Both of these functions I presume could be added to
> netconsole without too much of a problem.

Yes, I agree netconsole would be good to get this type of change.

> Lastly, this patchset also introduces a 'one-shot' mode, which has
> saved our bacon several times in the past as well. It's not totally
> uncommon for the kernel's crash path to be buggy, in turn causing the
> kernel to emit Oopses until the cows come home (or rather, until the
> hardware watchdogs trip). One-shot keeps us from emitting too much
> garbage on the network when this happens.

I thought we had something like "only show the first oops" somewhere in
the kernel, perhaps I'm just imagining things...

If I am, adding this for all oopses would also be good.

> I hope the above comparison of semantics outlines the motivations we
> have for not using netconsole and favoring an approach like that used
> in netoops :)

I think you have just convinced me that you should add this type of
functionality for all oops messages even more, instead of only doing it
for your one type of oops transport :)

As for the user/kernel interface, perhaps exporting the data in a text
format that is "tagged" would be best? Then the whole world can parse
it easily.

thanks,

greg k-h

2010-11-03 18:51:58

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, 3 Nov 2010 11:16:34 -0700 Greg KH wrote:

> On Tue, Nov 02, 2010 at 08:37:42PM -0700, Mike Waychison wrote:
> > On Tue, Nov 2, 2010 at 7:34 PM, Greg KH <[email protected]> wrote:
> > > On Tue, Nov 02, 2010 at 06:29:25PM -0700, Mike Waychison wrote:
> > >> This patchset applies to v2.6.36.
> > >>
> > >> The following series implements support for 'netoops', a simple driver that
> > >> will deliver kmsg logs together with machine specifics over the network.
> > >
> > > We already have the ability to send oopses over the network today,
> > > through the network consolst stuff. What does this patch set do that is
> > > different from our existing stuff that warrants such a big change?
> > >
> >
> > Hi Greg,
> >
> > I am a little familiar with the netconsole suppport. I should have
> > added a comparison to the cover email :(
> >
> > We never adopted netconsole for a couple different reasons. The
> > reasons have slightly changed over the years, but even today we find
> > that it isn't a substitute for netoops' semantics.
>
> Ah, but it sounds like it would be better to fix up netoops to handle
> your needs.
>
> > With the number of machines we have, streaming large amounts of
> > consoles within the data center can really add up. This gets worse
> > when you take into account how reliant we are on kernel logging like
> > OOM conditions (which are very regular and very verbose). Events in
> > the data center (such as application growth) tend to be temporally
> > correlated, which causes large bursts of logging when we are OOM. We
> > aren't so interested in this kernel verbosity from a global collection
> > standpoint though, and haven't been keen on the amount of extra
> > un-regulated UDP traffic it would generate. We are however interested
> > in kernel oopses though (which occur far less often).
>
> Understood, I'm sure that a change to allow this to the existing netoops
> code would be appreciated by many.
>
> > In terms of the data received, we've really benefited by having
> > structured data in the payload.
>
> I bet the whole world would benefit by having the oops messages in a
> more "structured" manner. We have done changes in the past to provide
> this type of thing in a "more parsable" manner, to help stuff like
> kerneloops.org. I'm sure that adding this type of information to the
> main oops core/messages would be a good overall goal, instead of only
> having it available to only this one option/user, right?
>
> > Another area where the two approaches have differed has been in
> > handling of network reliability. Historically (though less and less
> > now), we found that we had to transmit data several times. We also
> > used to explicitly space out packets with delays to handle switch chip
> > buffer overruns. Both of these functions I presume could be added to
> > netconsole without too much of a problem.
>
> Yes, I agree netconsole would be good to get this type of change.
>
> > Lastly, this patchset also introduces a 'one-shot' mode, which has
> > saved our bacon several times in the past as well. It's not totally
> > uncommon for the kernel's crash path to be buggy, in turn causing the
> > kernel to emit Oopses until the cows come home (or rather, until the
> > hardware watchdogs trip). One-shot keeps us from emitting too much
> > garbage on the network when this happens.
>
> I thought we had something like "only show the first oops" somewhere in
> the kernel, perhaps I'm just imagining things...
>
> If I am, adding this for all oopses would also be good.
>
> > I hope the above comparison of semantics outlines the motivations we
> > have for not using netconsole and favoring an approach like that used
> > in netoops :)
>
> I think you have just convinced me that you should add this type of
> functionality for all oops messages even more, instead of only doing it
> for your one type of oops transport :)
>
> As for the user/kernel interface, perhaps exporting the data in a text
> format that is "tagged" would be best? Then the whole world can parse
> it easily.


I have been (occasionally) looking at critical kernel messages.
IMO we really need an easy way to find them.

They can begin with any of these strings (and others can be added
too easily):

BUG|panic|MCE|NMI|error:|Oops|Bad|Fatal|Unrecoverable|Unhandled|Weird

We need a simple (single?) tagging method to identify any/all of these,
/methinks.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2010-11-03 19:03:38

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, Nov 3, 2010 at 11:16 AM, Greg KH <[email protected]> wrote:
> On Tue, Nov 02, 2010 at 08:37:42PM -0700, Mike Waychison wrote:
>> On Tue, Nov 2, 2010 at 7:34 PM, Greg KH <[email protected]> wrote:
>> > On Tue, Nov 02, 2010 at 06:29:25PM -0700, Mike Waychison wrote:
>> >> This patchset applies to v2.6.36.
>> >>
>> >> The following series implements support for 'netoops', a simple driver that
>> >> will deliver kmsg logs together with machine specifics over the network.
>> >
>> > We already have the ability to send oopses over the network today,
>> > through the network consolst stuff. What does this patch set do that is
>> > different from our existing stuff that warrants such a big change?
>> >
>>
>> Hi Greg,
>>
>> I am a little familiar with the netconsole suppport. ?I should have
>> added a comparison to the cover email :(
>>
>> We never adopted netconsole for a couple different reasons. ?The
>> reasons have slightly changed over the years, but even today we find
>> that it isn't a substitute for netoops' semantics.
>
> Ah, but it sounds like it would be better to fix up netoops to handle
> your needs.

Perhaps, but I don't agree that overly complicating one simple driver
to handle a wildly different set of semantics is justified when it can
be implemented in another simple driver.

>
>> With the number of machines we have, streaming large amounts of
>> consoles within the data center can really add up. ?This gets worse
>> when you take into account how reliant we are on kernel logging like
>> OOM conditions (which are very regular and very verbose). ?Events in
>> the data center (such as application growth) tend to be temporally
>> correlated, which causes large bursts of logging when we are OOM. ?We
>> aren't so interested in this kernel verbosity from a global collection
>> standpoint though, and haven't been keen on the amount of extra
>> un-regulated UDP traffic it would generate. ?We are however interested
>> in kernel oopses though (which occur far less often).
>
> Understood, I'm sure that a change to allow this to the existing netoops
> code would be appreciated by many.

Do you mean netconsole here?

>
>> In terms of the data received, we've really benefited by having
>> structured data in the payload.
>
> I bet the whole world would benefit by having the oops messages in a
> more "structured" manner. ?We have done changes in the past to provide
> this type of thing in a "more parsable" manner, to help stuff like
> kerneloops.org. ?I'm sure that adding this type of information to the
> main oops core/messages would be a good overall goal, instead of only
> having it available to only this one option/user, right?

It'd be great to figure out how to make this sort of data structured
in the logs, but it's not really a battle that I'm willing to fight.
It seems that as a community, we haven't figured out a good way to
provide any sort of committed ABI via printk (perhaps rightly so).

Another aspect that makes me _not_ want to go this route is the
pushback we've received in the past about not having too much data get
added to the oops messages themselves, as folks still relied on
literal screenshots of oopses.

>
>> Another area where the two approaches have differed has been in
>> handling of network reliability. ?Historically (though less and less
>> now), we found that we had to transmit data several times. ?We also
>> used to explicitly space out packets with delays to handle switch chip
>> buffer overruns. ?Both of these functions I presume could be added to
>> netconsole without too much of a problem.
>
> Yes, I agree netconsole would be good to get this type of change.
>
>> Lastly, this patchset also introduces a 'one-shot' mode, which has
>> saved our bacon several times in the past as well. ?It's not totally
>> uncommon for the kernel's crash path to be buggy, in turn causing the
>> kernel to emit Oopses until the cows come home (or rather, until the
>> hardware watchdogs trip). ?One-shot keeps us from emitting too much
>> garbage on the network when this happens.
>
> I thought we had something like "only show the first oops" somewhere in
> the kernel, perhaps I'm just imagining things...
>
> If I am, adding this for all oopses would also be good.

Well, there is always panic_on_oops, which we use, but that alone
doesn't solve the recursive oops problem.

>
>> I hope the above comparison of semantics outlines the motivations we
>> have for not using netconsole and favoring an approach like that used
>> in netoops :)
>
> I think you have just convinced me that you should add this type of
> functionality for all oops messages even more, instead of only doing it
> for your one type of oops transport :)

I think you are heavily discounting the value of structured data :(
It also doesn't help that printk's aren't transactional to each other
when oopsing (which essentially makes concurrently oopsing CPUs shit
all over each other in the logs, leading to unparseable data). You
are welcome to push the rock up hill for structured log data though ;)

> As for the user/kernel interface, perhaps exporting the data in a text
> format that is "tagged" would be best? ?Then the whole world can parse
> it easily.

FWIW, another semantic difference between netconsole and netoops (that
I had missed in the last email) is filtering: we really do want to get
the whole log when a crash happens, debug messages and all.
Netconsole is subject to console filtering (which we _do_ want as
debug messages going out the uart slows the whole world down).

netconsole and netoops _do_ have bits in common, for instance the
handling of NETDEV events and source+target configuration. I'd rather
those bits become common between the two than figure out how to jam
the semantics we need into netconsole.

2010-11-03 19:40:33

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, 2010-11-03 at 11:50 -0700, Randy Dunlap wrote:
> On Wed, 3 Nov 2010 11:16:34 -0700 Greg KH wrote:
> > On Tue, Nov 02, 2010 at 08:37:42PM -0700, Mike Waychison wrote:
> > > On Tue, Nov 2, 2010 at 7:34 PM, Greg KH <[email protected]> wrote:
> > As for the user/kernel interface, perhaps exporting the data in a text
> > format that is "tagged" would be best? Then the whole world can parse
> > it easily.
> I have been (occasionally) looking at critical kernel messages.
> IMO we really need an easy way to find them.
> They can begin with any of these strings (and others can be added
> too easily):
> BUG|panic|MCE|NMI|error:|Oops|Bad|Fatal|Unrecoverable|Unhandled|Weird
> We need a simple (single?) tagging method to identify any/all of these,
> /methinks.

Simply marking these messages KERN_CRIT or higher would seem to work.

2010-11-03 20:29:13

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

Mike Waychison wrote:
> FWIW, another semantic difference between netconsole and netoops (that
> I had missed in the last email) is filtering: we really do want to get
> the whole log when a crash happens, debug messages and all.
> Netconsole is subject to console filtering (which we _do_ want as
> debug messages going out the uart slows the whole world down).
>
> netconsole and netoops _do_ have bits in common, for instance the
> handling of NETDEV events and source+target configuration. I'd rather
> those bits become common between the two than figure out how to jam
> the semantics we need into netconsole.

Hi Matt,

I've been reading through the netconsole driver in response to Greg's
comments on this thread, and it is definitely more robust in terms of
configuration and handling of network device events than the netoops
driver I proposed.

What are your thoughts on extending netconsole with the same sort of
semantics that are in the netoops patchset?

I'd still like to have blit-dmesg-to-the-network-on-oops semantics,
which seems doable by having a per-target flag for streaming of console
messages (enabled by default) and a flag to emit a structured full dmesg
dump (disabled by default).

I'd be happy to change this patchset to modify the netconsole driver if
you agree.

Thanks,

Mike Waychison

2010-11-03 20:54:49

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
> Mike Waychison wrote:
> > FWIW, another semantic difference between netconsole and netoops (that
> > I had missed in the last email) is filtering: we really do want to get
> > the whole log when a crash happens, debug messages and all.
> > Netconsole is subject to console filtering (which we _do_ want as
> > debug messages going out the uart slows the whole world down).
> >
> > netconsole and netoops _do_ have bits in common, for instance the
> > handling of NETDEV events and source+target configuration. I'd rather
> > those bits become common between the two than figure out how to jam
> > the semantics we need into netconsole.
>
> Hi Matt,
>
> I've been reading through the netconsole driver in response to Greg's
> comments on this thread, and it is definitely more robust in terms of
> configuration and handling of network device events than the netoops
> driver I proposed.

I've been following the discussion to see if it went anywhere
interesting..

> What are your thoughts on extending netconsole with the same sort of
> semantics that are in the netoops patchset?

My first thought is that it's a bit unfortunate that some of the the
netconsole configgy bits weren't implemented in a generic way that would
be applicable to other netpoll clients. Some people have never gotten it
into their heads that netconsole isn't the only client.

> I'd still like to have blit-dmesg-to-the-network-on-oops semantics,
> which seems doable by having a per-target flag for streaming of console
> messages (enabled by default) and a flag to emit a structured full dmesg
> dump (disabled by default).

I'd actually like to see you go forward with netoops. It's clear to me
that it's a different beast and complexifying netconsole with a bunch of
weird new options doesn't really sit well. If that means abstracting
some of the sysfs crap from netconsole, great.

That said, I don't think netoops is an ideal name, given how closely
bound oops _events_ are with their textual output. Presumably it covers
events other than oopsen like panics too.

Regarding rolling oopses: lots of machines regularly survive oopses, so
I think you ought to consider rate-limiting them (to a configurable rate
with a very low default) rather than suppressing all but the first.

--
Mathematics is the supreme nostalgia of our time.

2010-11-03 20:59:01

by Bruno Prémont

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, 03 November 2010 Joe Perches wrote:
> On Wed, 2010-11-03 at 11:50 -0700, Randy Dunlap wrote:
> > On Wed, 3 Nov 2010 11:16:34 -0700 Greg KH wrote:
> > > On Tue, Nov 02, 2010 at 08:37:42PM -0700, Mike Waychison wrote:
> > > > On Tue, Nov 2, 2010 at 7:34 PM, Greg KH wrote:
> > > As for the user/kernel interface, perhaps exporting the data in a text
> > > format that is "tagged" would be best? Then the whole world can parse
> > > it easily.
> > I have been (occasionally) looking at critical kernel messages.
> > IMO we really need an easy way to find them.
> > They can begin with any of these strings (and others can be added
> > too easily):
> > BUG|panic|MCE|NMI|error:|Oops|Bad|Fatal|Unrecoverable|Unhandled|Weird
> > We need a simple (single?) tagging method to identify any/all of these,
> > /methinks.
>
> Simply marking these messages KERN_CRIT or higher would seem to work.

That's an idea (probably better higher that can be enabled on a console with
a specific flag), though it also invites for filter levels on a per-console
base (e.g. let uart only show KERN_WARN..KERN_CRIT but having more verbosity
on termial or netconsole [per target]).

I've been playing with this last winter, need to dig out that code again and
clean it up for sharing.

Bruno

2010-11-04 01:18:55

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

Matt Mackall wrote:
> On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>> Mike Waychison wrote:
>>> FWIW, another semantic difference between netconsole and netoops (that
>>> I had missed in the last email) is filtering: we really do want to get
>>> the whole log when a crash happens, debug messages and all.
>>> Netconsole is subject to console filtering (which we _do_ want as
>>> debug messages going out the uart slows the whole world down).
>>>
>>> netconsole and netoops _do_ have bits in common, for instance the
>>> handling of NETDEV events and source+target configuration. I'd rather
>>> those bits become common between the two than figure out how to jam
>>> the semantics we need into netconsole.
>> Hi Matt,
>>
>> I've been reading through the netconsole driver in response to Greg's
>> comments on this thread, and it is definitely more robust in terms of
>> configuration and handling of network device events than the netoops
>> driver I proposed.
>
> I've been following the discussion to see if it went anywhere
> interesting..
>
>> What are your thoughts on extending netconsole with the same sort of
>> semantics that are in the netoops patchset?
>
> My first thought is that it's a bit unfortunate that some of the the
> netconsole configgy bits weren't implemented in a generic way that would
> be applicable to other netpoll clients. Some people have never gotten it
> into their heads that netconsole isn't the only client.
>
>> I'd still like to have blit-dmesg-to-the-network-on-oops semantics,
>> which seems doable by having a per-target flag for streaming of console
>> messages (enabled by default) and a flag to emit a structured full dmesg
>> dump (disabled by default).
>
> I'd actually like to see you go forward with netoops. It's clear to me
> that it's a different beast and complexifying netconsole with a bunch of
> weird new options doesn't really sit well. If that means abstracting
> some of the sysfs crap from netconsole, great.

I'd be happy to take a stab at this. This solves most of the ABI
reservations that I have with this v1 patchset.

Looking at netconsole, it looks to lack some locking for data
consistency, and it appears that we will deadlock if we ever get a
NETDEV_UNREGISTER event (due to recursively grabbing the rtnl in
netpoll_cleanup). I have a couple patches I've been hacking on this
afternoon that should clear those issues up.

I'm thinking of pushing all the target handling options down into
net/core/netpoll.c. I'll probably expose this interface as "struct
netpoll_targets" where ->lock and ->list could be completely exposed to
clients. netconsole would then get a lot smaller as would netoops.

> That said, I don't think netoops is an ideal name, given how closely
> bound oops _events_ are with their textual output. Presumably it covers
> events other than oopsen like panics too.

True. We call this code 'netdump' or 'network_dumper' internally, but I
figured it'd be better to follow current conventions with ramoops and
mtdoops already in the tree. I don't really care what it's called in
the end :)

>
> Regarding rolling oopses: lots of machines regularly survive oopses, so
> I think you ought to consider rate-limiting them (to a configurable rate
> with a very low default) rather than suppressing all but the first.
>

The trouble with Oopses is just that: We don't know whether we can
safely survive them or not and it's a total gamble each time we do Oops.
We can't programmatically know how crapped out the machine is, so
historically we've erred on not allowing bad things to continue
happening once someone notices something wrong.

It's easier for us to just shoot the machine in the head (panic_on_oops)
and move on than corrupt data or dead-lock in weird ways at some later
point in time. This is definitely not the behaviour I would want nor
expect from my desktop or phone, but for the cluster, it's just safer.

2010-11-04 06:11:06

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, Nov 03, 2010 at 03:54:45PM -0500, Matt Mackall wrote:
>On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>> Mike Waychison wrote:
>> > FWIW, another semantic difference between netconsole and netoops (that
>> > I had missed in the last email) is filtering: we really do want to get
>> > the whole log when a crash happens, debug messages and all.
>> > Netconsole is subject to console filtering (which we _do_ want as
>> > debug messages going out the uart slows the whole world down).
>> >
>> > netconsole and netoops _do_ have bits in common, for instance the
>> > handling of NETDEV events and source+target configuration. I'd rather
>> > those bits become common between the two than figure out how to jam
>> > the semantics we need into netconsole.
>>
>> Hi Matt,
>>
>> I've been reading through the netconsole driver in response to Greg's
>> comments on this thread, and it is definitely more robust in terms of
>> configuration and handling of network device events than the netoops
>> driver I proposed.
>
>I've been following the discussion to see if it went anywhere
>interesting..
>
>> What are your thoughts on extending netconsole with the same sort of
>> semantics that are in the netoops patchset?
>
>My first thought is that it's a bit unfortunate that some of the the
>netconsole configgy bits weren't implemented in a generic way that would
>be applicable to other netpoll clients. Some people have never gotten it
>into their heads that netconsole isn't the only client.
>

Really? What are other clients? I remember netdump *was* one client,
but it is not in upstream and is deprecated, so netconsole is the only
client in tree, AFAIK.


>> I'd still like to have blit-dmesg-to-the-network-on-oops semantics,
>> which seems doable by having a per-target flag for streaming of console
>> messages (enabled by default) and a flag to emit a structured full dmesg
>> dump (disabled by default).
>
>I'd actually like to see you go forward with netoops. It's clear to me
>that it's a different beast and complexifying netconsole with a bunch of
>weird new options doesn't really sit well. If that means abstracting
>some of the sysfs crap from netconsole, great.
>

That would be good.


>That said, I don't think netoops is an ideal name, given how closely
>bound oops _events_ are with their textual output. Presumably it covers
>events other than oopsen like panics too.
>
>Regarding rolling oopses: lots of machines regularly survive oopses, so
>I think you ought to consider rate-limiting them (to a configurable rate
>with a very low default) rather than suppressing all but the first.
>

We have WARN_ONCE(), maybe we can make one oops_once()...
At least, that is not hard.

2010-11-04 06:30:31

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

On Wed, Nov 03, 2010 at 06:18:41PM -0700, Mike Waychison wrote:
>Matt Mackall wrote:
>>On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>>>Mike Waychison wrote:
>>>>FWIW, another semantic difference between netconsole and netoops (that
>>>>I had missed in the last email) is filtering: we really do want to get
>>>>the whole log when a crash happens, debug messages and all.
>>>>Netconsole is subject to console filtering (which we _do_ want as
>>>>debug messages going out the uart slows the whole world down).
>>>>
>>>>netconsole and netoops _do_ have bits in common, for instance the
>>>>handling of NETDEV events and source+target configuration. I'd rather
>>>>those bits become common between the two than figure out how to jam
>>>>the semantics we need into netconsole.
>>>Hi Matt,
>>>
>>>I've been reading through the netconsole driver in response to
>>>Greg's comments on this thread, and it is definitely more robust
>>>in terms of configuration and handling of network device events
>>>than the netoops driver I proposed.
>>
>>I've been following the discussion to see if it went anywhere
>>interesting..
>>
>>>What are your thoughts on extending netconsole with the same sort
>>>of semantics that are in the netoops patchset?
>>
>>My first thought is that it's a bit unfortunate that some of the the
>>netconsole configgy bits weren't implemented in a generic way that would
>>be applicable to other netpoll clients. Some people have never gotten it
>>into their heads that netconsole isn't the only client.
>>
>>>I'd still like to have blit-dmesg-to-the-network-on-oops
>>>semantics, which seems doable by having a per-target flag for
>>>streaming of console messages (enabled by default) and a flag to
>>>emit a structured full dmesg dump (disabled by default).
>>
>>I'd actually like to see you go forward with netoops. It's clear to me
>>that it's a different beast and complexifying netconsole with a bunch of
>>weird new options doesn't really sit well. If that means abstracting
>>some of the sysfs crap from netconsole, great.
>
>I'd be happy to take a stab at this. This solves most of the ABI
>reservations that I have with this v1 patchset.
>
>Looking at netconsole, it looks to lack some locking for data
>consistency, and it appears that we will deadlock if we ever get a
>NETDEV_UNREGISTER event (due to recursively grabbing the rtnl in
>netpoll_cleanup). I have a couple patches I've been hacking on this
>afternoon that should clear those issues up.
>


You might want to look at net-next-2.6, it has some fixes
from Neil.


>I'm thinking of pushing all the target handling options down into
>net/core/netpoll.c. I'll probably expose this interface as "struct
>netpoll_targets" where ->lock and ->list could be completely exposed
>to clients. netconsole would then get a lot smaller as would
>netoops.
>
>>That said, I don't think netoops is an ideal name, given how closely
>>bound oops _events_ are with their textual output. Presumably it covers
>>events other than oopsen like panics too.
>
>True. We call this code 'netdump' or 'network_dumper' internally,
>but I figured it'd be better to follow current conventions with
>ramoops and mtdoops already in the tree. I don't really care what
>it's called in the end :)
>


"netdump" was used by a utility that do crash dumping over net.
It is deprecated now, since we have kdump.

>>
>>Regarding rolling oopses: lots of machines regularly survive
>>oopses, so I think you ought to consider rate-limiting them (to a
>>configurable rate
>>with a very low default) rather than suppressing all but the first.
>>
>
>The trouble with Oopses is just that: We don't know whether we can
>safely survive them or not and it's a total gamble each time we do
>Oops. We can't programmatically know how crapped out the machine is,
>so historically we've erred on not allowing bad things to continue
>happening once someone notices something wrong.
>
>It's easier for us to just shoot the machine in the head
>(panic_on_oops) and move on than corrupt data or dead-lock in weird
>ways at some later point in time. This is definitely not the
>behaviour I would want nor expect from my desktop or phone, but for
>the cluster, it's just safer.

We also have pause_on_oops, or we can invent a oops_once.

2010-11-04 17:21:23

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

Am?rico Wang wrote:
> On Wed, Nov 03, 2010 at 03:54:45PM -0500, Matt Mackall wrote:
>> On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>>> Mike Waychison wrote:
>>>> FWIW, another semantic difference between netconsole and netoops (that
>>>> I had missed in the last email) is filtering: we really do want to get
>>>> the whole log when a crash happens, debug messages and all.
>>>> Netconsole is subject to console filtering (which we _do_ want as
>>>> debug messages going out the uart slows the whole world down).
>>>>
>>>> netconsole and netoops _do_ have bits in common, for instance the
>>>> handling of NETDEV events and source+target configuration. I'd rather
>>>> those bits become common between the two than figure out how to jam
>>>> the semantics we need into netconsole.
>>> Hi Matt,
>>>
>>> I've been reading through the netconsole driver in response to Greg's
>>> comments on this thread, and it is definitely more robust in terms of
>>> configuration and handling of network device events than the netoops
>>> driver I proposed.
>> I've been following the discussion to see if it went anywhere
>> interesting..
>>
>>> What are your thoughts on extending netconsole with the same sort of
>>> semantics that are in the netoops patchset?
>> My first thought is that it's a bit unfortunate that some of the the
>> netconsole configgy bits weren't implemented in a generic way that would
>> be applicable to other netpoll clients. Some people have never gotten it
>> into their heads that netconsole isn't the only client.
>>
>
> Really? What are other clients? I remember netdump *was* one client,
> but it is not in upstream and is deprecated, so netconsole is the only
> client in tree, AFAIK.

I see the bonding and bridging drivers using netpoll.

>
>
>>> I'd still like to have blit-dmesg-to-the-network-on-oops semantics,
>>> which seems doable by having a per-target flag for streaming of console
>>> messages (enabled by default) and a flag to emit a structured full dmesg
>>> dump (disabled by default).
>> I'd actually like to see you go forward with netoops. It's clear to me
>> that it's a different beast and complexifying netconsole with a bunch of
>> weird new options doesn't really sit well. If that means abstracting
>> some of the sysfs crap from netconsole, great.
>>
>
> That would be good.
>
>
>> That said, I don't think netoops is an ideal name, given how closely
>> bound oops _events_ are with their textual output. Presumably it covers
>> events other than oopsen like panics too.
>>
>> Regarding rolling oopses: lots of machines regularly survive oopses, so
>> I think you ought to consider rate-limiting them (to a configurable rate
>> with a very low default) rather than suppressing all but the first.
>>
>
> We have WARN_ONCE(), maybe we can make one oops_once()...
> At least, that is not hard.

2010-11-04 17:38:23

by Mike Waychison

[permalink] [raw]
Subject: Re: [PATCH v1 00/12] netoops support

Am?rico Wang wrote:
> On Wed, Nov 03, 2010 at 06:18:41PM -0700, Mike Waychison wrote:
>> Matt Mackall wrote:
>>> On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>>>> Mike Waychison wrote:
>>>>> FWIW, another semantic difference between netconsole and netoops (that
>>>>> I had missed in the last email) is filtering: we really do want to get
>>>>> the whole log when a crash happens, debug messages and all.
>>>>> Netconsole is subject to console filtering (which we _do_ want as
>>>>> debug messages going out the uart slows the whole world down).
>>>>>
>>>>> netconsole and netoops _do_ have bits in common, for instance the
>>>>> handling of NETDEV events and source+target configuration. I'd rather
>>>>> those bits become common between the two than figure out how to jam
>>>>> the semantics we need into netconsole.
>>>> Hi Matt,
>>>>
>>>> I've been reading through the netconsole driver in response to
>>>> Greg's comments on this thread, and it is definitely more robust
>>>> in terms of configuration and handling of network device events
>>>> than the netoops driver I proposed.
>>> I've been following the discussion to see if it went anywhere
>>> interesting..
>>>
>>>> What are your thoughts on extending netconsole with the same sort
>>>> of semantics that are in the netoops patchset?
>>> My first thought is that it's a bit unfortunate that some of the the
>>> netconsole configgy bits weren't implemented in a generic way that would
>>> be applicable to other netpoll clients. Some people have never gotten it
>>> into their heads that netconsole isn't the only client.
>>>
>>>> I'd still like to have blit-dmesg-to-the-network-on-oops
>>>> semantics, which seems doable by having a per-target flag for
>>>> streaming of console messages (enabled by default) and a flag to
>>>> emit a structured full dmesg dump (disabled by default).
>>> I'd actually like to see you go forward with netoops. It's clear to me
>>> that it's a different beast and complexifying netconsole with a bunch of
>>> weird new options doesn't really sit well. If that means abstracting
>>> some of the sysfs crap from netconsole, great.
>> I'd be happy to take a stab at this. This solves most of the ABI
>> reservations that I have with this v1 patchset.
>>
>> Looking at netconsole, it looks to lack some locking for data
>> consistency, and it appears that we will deadlock if we ever get a
>> NETDEV_UNREGISTER event (due to recursively grabbing the rtnl in
>> netpoll_cleanup). I have a couple patches I've been hacking on this
>> afternoon that should clear those issues up.
>>
>
>
> You might want to look at net-next-2.6, it has some fixes
> from Neil.

Excellent, yes, 3b410a31 fixes the recursive rtnl deadlock I was
referring to.

>
>
>> I'm thinking of pushing all the target handling options down into
>> net/core/netpoll.c. I'll probably expose this interface as "struct
>> netpoll_targets" where ->lock and ->list could be completely exposed
>> to clients. netconsole would then get a lot smaller as would
>> netoops.
>>
>>> That said, I don't think netoops is an ideal name, given how closely
>>> bound oops _events_ are with their textual output. Presumably it covers
>>> events other than oopsen like panics too.
>> True. We call this code 'netdump' or 'network_dumper' internally,
>> but I figured it'd be better to follow current conventions with
>> ramoops and mtdoops already in the tree. I don't really care what
>> it's called in the end :)
>>
>
>
> "netdump" was used by a utility that do crash dumping over net.
> It is deprecated now, since we have kdump.

Yup. If you go back far enough, I think this was a gut of that code
long long ago, hence the name.

>
>>> Regarding rolling oopses: lots of machines regularly survive
>>> oopses, so I think you ought to consider rate-limiting them (to a
>>> configurable rate
>>> with a very low default) rather than suppressing all but the first.
>>>
>> The trouble with Oopses is just that: We don't know whether we can
>> safely survive them or not and it's a total gamble each time we do
>> Oops. We can't programmatically know how crapped out the machine is,
>> so historically we've erred on not allowing bad things to continue
>> happening once someone notices something wrong.
>>
>> It's easier for us to just shoot the machine in the head
>> (panic_on_oops) and move on than corrupt data or dead-lock in weird
>> ways at some later point in time. This is definitely not the
>> behaviour I would want nor expect from my desktop or phone, but for
>> the cluster, it's just safer.
>
> We also have pause_on_oops, or we can invent a oops_once.