Subject: Re: [PATCH v3 21/22] netoops: Add user-programmable boot_id
From: Matt Mackall
To: Mike Waychison
Cc: simon.kagstrom@netinsight.net, davem@davemloft.net,
    nhorman@tuxdriver.com, adurbin@google.com,
    linux-kernel@vger.kernel.org, chavey@google.com, Greg KH,
    netdev@vger.kernel.org, Américo Wang, akpm@linux-foundation.org,
    linux-api@vger.kernel.org
References: <20101214212846.17022.64836.stgit@mike.mtv.corp.google.com>
    <20101214213048.17022.58746.stgit@mike.mtv.corp.google.com>
    <1292362957.3446.851.camel@calx>
    <1292364378.3446.854.camel@calx>
Date: Tue, 14 Dec 2010 16:47:44 -0600
Message-ID: <1292366864.3446.875.camel@calx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2010-12-14 at 14:33 -0800, Mike Waychison wrote:
> On Tue, Dec 14, 2010 at 2:06 PM, Matt Mackall wrote:
> > On Tue, 2010-12-14 at 13:59 -0800, Mike Waychison wrote:
> >> On Tue, Dec 14, 2010 at 1:42 PM, Matt Mackall wrote:
> >> > On Tue, 2010-12-14 at 13:30 -0800, Mike Waychison wrote:
> >> >> Add support for letting userland define a 32bit boot id. This is useful
> >> >> for users to be able to correlate netoops reports to specific boot
> >> >> instances offline.
> >> >
> >> > This sounds a lot like the pre-existing /proc/sys/kernel/random/boot_id
> >> > that's used by kerneloops.org.
> >>
> >> Could be. I'm looking at it now... There is no documentation for this
> >> boot_id field?
> >
> > Probably not. It's just a random number generated at boot.
> >
> >> Reusing this guy would work, except that it doesn't appear to allow
> >> arbitrary values to be set. We need to inject our boot sequence
> >> number (which is figured out in userland) in the packet somehow as we
> >> need to correlate it to our other monitoring systems.
> >
> > What happens if you oops before userspace is available?
> >
>
> Either one of two general cases:
>   - The crash is a one-off and the machine comes back. The boot
> number sequence will see a hole in it, which is a clue that something
> bad happened.
>   - The machine is in a crash loop. This has the same failure mode
> for us as if the machine never made it onto the network due to
> whatever reason: bad cables, bad firmware, bad ram, ...
>
> In both cases, we can detect that something is wrong and handle it.
> Note that our firmware is responsible for incrementing the boot
> sequence at bootup, which is why the above works. In general though,
> our machines do make it up to userland -- staying alive once booted is
> the hard part ;)

Interesting. Is this Google-specific firmware magic?

I'd probably accept a hook in random.c to fold a number into the UUID,
which would unify things.

-- 
Mathematics is the supreme nostalgia of our time.