Subject: Re: [KORG] Re: kernel.org lies about latest -mm kernel
From: "J.H." <warthog9@kernel.org>
To: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@osdl.org>, Pavel Machek <pavel@ucw.cz>,
       kernel list <linux-kernel@vger.kernel.org>, hpa@zytor.com,
       webmaster@kernel.org
In-Reply-To: <458434B0.4090506@oracle.com>
References: <20061214223718.GA3816@elf.ucw.cz>
	 <20061216094421.416a271e.randy.dunlap@oracle.com>
	 <20061216095702.3e6f1d1f.akpm@osdl.org>  <458434B0.4090506@oracle.com>
Content-Type: text/plain
Date: Sat, 16 Dec 2006 11:30:34 -0800
Message-Id: <1166297434.26330.34.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3711
Lines: 86

The problem has been hashed over quite a bit recently, and I would be
curious what you would consider the real problem after you see the
situation.

The root cause boils down to this: with git, gitweb and the normal
mirroring on the frontend machines, our basic working set no longer
stays resident in memory, which forces more and more requests to go to
disk and causes a much higher I/O load.  On top of that, one of the
frontend machines is getting hit harder than the other due to several
factors: various DNS servers not doing round robin, people explicitly
hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason, and
probably several other factors we aren't aware of.  This has caused the
load average on that machine to hover around 150-200, and if for
whatever reason we have to take one of the machines down, the load on
the remaining machine skyrockets to 2000+.

Since it's apparent not everyone is aware of what we are doing, I'll
mention briefly some of the bigger points.

- We have contacted HP to see if we can get additional hardware.  Mind
you, this is a long-term solution and will take time, but if our
request is approved it will double the number of machines kernel.org
runs.

- Gitweb is causing us no end of headaches.  There are (known to me
anyway) two different things happening on that front.  I am looking at
Jeff Garzik's suggested caching mechanism as a temporary stop-gap, with
an eye more toward a rather heavy rewrite of gitweb itself to include
semi-intelligent caching.  I've already started on the latter, and I
just about have the caching layer put in.  But it is still at least a
week out before we could even remotely consider deploying it.

- We've cut back on the number of ftp and rsync users allowed on the
machines.  Basically we are cutting back where we can in an attempt to
keep the load from spiraling out of control.  This helped a bit when we
recently had to take one of the machines down: instead of loads spiking
into the 2000+ range, we peaked at about 500-600, I believe.
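
As a rough illustration of what a caching layer in front of gitweb
buys us, here is a minimal sketch in Python.  It is not the actual
gitweb change (gitweb is Perl) and not Jeff Garzik's mechanism; the
names PageCache, render and ttl_seconds, and the 5 minute default, are
all made up for the example.  The idea is simply that a page which was
rendered recently is served from memory instead of being regenerated
from the repository on disk.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Time-based cache for rendered pages, keyed by request path.

    Hypothetical helper for illustration only; not part of gitweb.
    """

    def __init__(
        self,
        render: Callable[[str], str],
        ttl_seconds: float = 300.0,  # assumed lifetime, pick to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (time the page was rendered, rendered page)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh enough: no repository access, no disk I/O.
            return entry[1]
        # Missing or stale: do the expensive render once and remember it.
        page = self._render(path)
        self._entries[path] = (now, page)
        return page


if __name__ == "__main__":
    render_calls = []

    def slow_render(path: str) -> str:
        # Stand-in for the expensive page generation.
        render_calls.append(path)
        return "<html>" + path + "</html>"

    cache = PageCache(slow_render, ttl_seconds=60.0)
    for _ in range(1000):
        cache.get("/summary")
    print(len(render_calls))  # prints 1: one render served 1000 requests
```

A sketch like this trades freshness for load: a page can be up to
ttl_seconds out of date, and nothing here bounds how much memory the
cache uses, so a real deployment would need eviction as well.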

So we know the problem is there, and we are working on it; we are
getting e-mails about it, if not daily then every other day or so.  If
there are suggestions we are willing to hear them, but the general
feeling among the admins is that we are probably hitting the biggest
problems already.

- John 'Warthog9' Hawley
Kernel.org Admin

On Sat, 2006-12-16 at 10:02 -0800, Randy Dunlap wrote:
> Andrew Morton wrote:
> > On Sat, 16 Dec 2006 09:44:21 -0800
> > Randy Dunlap <randy.dunlap@oracle.com> wrote:
> > 
> >> On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote:
> >>
> >>> Hi!
> >>>
> >>> pavel@amd:/data/pavel$ finger @www.kernel.org
> >>> [zeus-pub.kernel.org]
> >>> ...
> >>> The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2
> >>> pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile
> >>> VERSION = 2
> >>> PATCHLEVEL = 6
> >>> SUBLEVEL = 19
> >>> EXTRAVERSION = -mm1
> >>> ...
> >>> pavel@amd:/data/pavel$
> >>>
> >>> AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does
> >>> not understand that.
> >> Still true (not listed) for 2.6.20-rc1-mm1  :(
> >>
> >> Could someone explain what the problem is and what it would
> >> take to correct it?
> > 
> > 2.6.20-rc1-mm1 still hasn't propagated out to the servers (it's been 36
> > hours).  Presumably the front page non-update is a consequence of that.
> 
> Agreed on the latter part.  Can someone address the real problem???
> 
