2002-10-01 10:21:35

by Daniel Phillips

[permalink] [raw]
Subject: Re: 2.4 mm trouble [possible lru race]

The theoretical lru race possibly spotted in the wild...

On Tuesday 01 October 2002 11:22, Richard Zidlicky wrote:
> On Sat, Sep 28, 2002 at 04:38:50PM +0200, Roman Zippel wrote:
> > Hi,
> >
> > On Wed, 25 Sep 2002, Richard Zidlicky wrote:
> >
> > > First I suspected a stale TLB entry so I've added pretty many
> > > extra flushes througout the code. This does very much reduce the
> > > risk of the problem, but the problem still happens if swapping is
> > > increased so it might very well be something else.
> > >
> > > Any ideas?
> >
> > It sounds like a cache problem. I had to fix one early 2.4, so maybe there
> > is another one.
> > It would help a lot if you could reproduce it within gdb to get some more
> > information about the context of the crash (e.g. invalid data or code, the
> > type of mapping (from /proc/<pid>/maps)).
>
> hm, I am now testing the appended patch, which is a backport to 2.4.19
> of this:
>
<<<<<<<<<<<<
>
> From: Daniel Phillips <[email protected]>
> To: Marcelo Tosatti <[email protected]>,
> "Christian Ehrhardt" <[email protected]>
> Subject: [CFT] [PATCH] LRU race fix
> Date: Tue, 17 Sep 2002 19:03:19 +0200
> X-Mailer: KMail [version 1.3.2]
> Cc: <[email protected]>
>
> This patch against 2.4.20-pre7 fixes a theoretical race where a page could
> possibly be freed while still on the lru list. The details have been
> discussed at length earlier, see "[RFC] [PATCH] Include LRU in page count".
>
> The race may not even be that theoretical, it's just so rare that when it
> does happen, it might be dismissed as a driver problem or similar...
>
> [...]
>
>>>>>>>>>>>>

> Somehow this does completely fix my problem, I have taken out all
> the tlb related hacks and the testcase that caused the problem 100%
> now runs without any sign of problems :))
>
> Now I am wondering if that is just coincidence or why m68k hit that
> error so reliably.. is it supposed to have any effect at all on
> UP?

Are you running UP+preempt?

--
Daniel