2002-10-01 14:15:07

by Richard Zidlicky

Subject: Re: 2.4 mm trouble [possible lru race]


>
> The theoretical lru race possibly spotted in the wild...
>
> >
> > Now I am wondering if that is just coincidence or why m68k hit that
> > error so reliably.. is it supposed to have any effect at all on
> > UP?
>
> Are you running UP+preempt?

no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

Richard


2002-10-01 15:10:54

by Daniel Phillips

Subject: Re: 2.4 mm trouble [possible lru race]

On Tuesday 01 October 2002 16:20, [email protected] wrote:
> >
> > The theoretical lru race possibly spotted in the wild...
> >
> > >
> > > Now I am wondering if that is just coincidence or why m68k hit that
> > > error so reliably.. is it supposed to have any effect at all on
> > > UP?
> >
> > Are you running UP+preempt?
>
> no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

I'm having real trouble spotting substantive change in the patch that
would affect a UP kernel. I believe you when you say it fixes your
problem, but we don't know why, and it is worth making some effort to
find out why.

Ah wait, I see one candidate, would you like to try:

 	 * the page as well.
 	 */
 	if (page->buffers) {
 		/* avoid to free a locked page */
-		get_page(page);
 		spin_unlock(&pagemap_lru_lock);
+		get_page(page);

and see if your bug comes back? There are a couple of other changes
that could be considered substantive by stretching one's imagination
enough, but this is the leading candidate.

Oh wait, you could also try this, a little further down:

+		page_cache_release(page);
 		spin_lock(&pagemap_lru_lock);
-		put_page_nofree(page);
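
To make the point of the two experiments concrete, here is a minimal user-space
sketch of why the order of get_page() and spin_unlock() could matter. It is a
toy model only, not kernel code: struct toy_page, toy_get_page(),
toy_put_page(), lru_locked and other_context_drops_ref() are all hypothetical
stand-ins, and the "other context" is simulated by a plain function call in the
unlocked window. Whether such a window can be hit at all on a UP kernel is
exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct toy_page {
    int count;   /* references currently held */
    bool freed;  /* set once the count reaches zero */
};

static bool lru_locked; /* toy stand-in for pagemap_lru_lock */

static void toy_get_page(struct toy_page *p)
{
    p->count++;
}

static void toy_put_page(struct toy_page *p)
{
    assert(p->count > 0);
    if (--p->count == 0)
        p->freed = true;
}

/* Models some other context dropping a reference while the lock is not held. */
static void other_context_drops_ref(struct toy_page *p)
{
    assert(!lru_locked);
    toy_put_page(p);
}

/* Returns true if the page survived the unlocked window. */
static bool scan_one(struct toy_page *p, bool pin_before_unlock)
{
    lru_locked = true;
    if (pin_before_unlock)
        toy_get_page(p);        /* pin while still under the lock */
    lru_locked = false;

    other_context_drops_ref(p); /* the race window */

    if (!pin_before_unlock && !p->freed)
        toy_get_page(p);        /* swapped order: the pin may come too late */
    return !p->freed;
}
```

With the pin taken under the lock, the page keeps one reference through the
window; with the order swapped, the same sequence leaves the page freed before
the pin is taken.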

By the way, the original patch you posted was reversed and your editor
apparently took the liberty of cleaning up some whitespace in the file.
Generally, we try to avoid patch chunks that just, e.g., change bogus
spaces to tabs, and save those for official whitespace patches.

--
Daniel

2002-10-01 15:24:20

by Daniel Phillips

Subject: Re: 2.4 mm trouble [possible lru race]

On Tuesday 01 October 2002 16:20, [email protected] wrote:
> >
> > The theoretical lru race possibly spotted in the wild...
> >
> > >
> > > Now I am wondering if that is just coincidence or why m68k hit that
> > > error so reliably.. is it supposed to have any effect at all on
> > > UP?
> >
> > Are you running UP+preempt?
>
> no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).

Vanilla would be CONFIG_SMP=y, is that what you have? Otherwise please
disregard the post just above (which hasn't appeared on the list yet)
because spin_lock/unlock would be null, and the tests I suggested would
have no effect.

We would then be left with a *very* small number of candidates, which
we will test in accordance with the "what remains must be the truth"
principle.
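
As a rough illustration of why the suggested tests would have no effect without
CONFIG_SMP, here is a small user-space sketch. It assumes a hypothetical
TOY_SMP switch and toy_spin_lock()/toy_spin_unlock() macros that merely mimic
the idea of lock operations compiling away on UP; none of these names come from
the kernel. When the macros expand to nothing, "pin, then unlock" and "unlock,
then pin" observe exactly the same lock state.

```c
#include <assert.h>

/* Hypothetical toy lock, not the kernel's spinlock_t. */
struct toy_lock {
    int held;
};

#ifdef TOY_SMP
/* SMP-like build: the lock really changes state. */
#define toy_spin_lock(l)   do { assert(!(l)->held); (l)->held = 1; } while (0)
#define toy_spin_unlock(l) do { assert((l)->held); (l)->held = 0; } while (0)
#else
/* UP-like build: lock operations expand to nothing. */
#define toy_spin_lock(l)   do { (void)(l); } while (0)
#define toy_spin_unlock(l) do { (void)(l); } while (0)
#endif

/*
 * Reports the lock state seen at the point where the page would be pinned,
 * for either ordering of "pin" and "unlock".
 */
static int held_at_pin(struct toy_lock *l, int pin_first)
{
    int seen;

    toy_spin_lock(l);
    if (pin_first) {
        seen = l->held;     /* pin happens before the unlock */
        toy_spin_unlock(l);
    } else {
        toy_spin_unlock(l);
        seen = l->held;     /* pin happens after the unlock */
    }
    return seen;
}
```

Built without TOY_SMP, both orderings report the same state, so reordering
around the lock calls changes nothing; built with -DTOY_SMP, they differ.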

--
Daniel

2002-10-01 16:53:23

by Rik van Riel

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, 1 Oct 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 16:20, [email protected] wrote:

> > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
>
> Vanilla would be CONFIG_SMP=y, is that what you have?

Somehow I doubt Linux supports m68k SMP machines ;)

Rik
--
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/ http://distro.conectiva.com/

2002-10-01 17:10:39

by Daniel Phillips

Subject: Re: 2.4 mm trouble [possible lru race]

On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 16:20, [email protected] wrote:
>
> > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> >
> > Vanilla would be CONFIG_SMP=y, is that what you have?
>
> Somehow I doubt Linux supports m68k SMP machines ;)

CONFIG_SMP=y works perfectly well on single cpu machines - it forces
the spinlocks to actually exist. It's not supposed to change any
behaviour, but you never know. Behaviour is obviously changing here.

--
Daniel

2002-10-01 17:32:13

by Jens Axboe

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, Oct 01 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 16:20, [email protected] wrote:
> >
> > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > >
> > > Vanilla would be CONFIG_SMP=y, is that what you have?
> >
> > Somehow I doubt Linux supports m68k SMP machines ;)
>
> CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> the spinlocks to actually exist. It's not supposed to change any
> behaviour, but you never know. Behaviour is obviously changing here.

Again, m68k was the target.

--
Jens Axboe

2002-10-01 18:10:22

by Daniel Phillips

Subject: Re: 2.4 mm trouble [possible lru race]

On Tuesday 01 October 2002 20:04, Jens Axboe wrote:
> On Tue, Oct 01 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > > Again, m68k was the target.
> >
> > Sure fine, no good reason to be cryptic about it though.
> >
> > #error "m68k doesn't do SMP yet"
> >
> > So SMP must be off or the compile would abort. Well, the only interesting
>
> There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
> beating of dead horse :)

The horse isn't dead yet, it's still twitching a little. At this
point we still need to speculate about why anyone would want an SMP
Dragonball machine ;-)

--
Daniel

2002-10-01 18:17:11

by Rik van Riel

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, 1 Oct 2002, Daniel Phillips wrote:

> The horse isn't dead yet, it's still twitching a little. At this
> point we still need to speculate about why anyone would want an
> SMP Dragonball machine ;-)

I've seen an SMP 68k box, a DIAB DATA machine. I think
Bull shipped them, too.

What is that coloured spot on the pavement ?
Could it be a horse died there, long ago ?
Now, stop beating the pavement.

cheers,

Rik
--
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/ http://distro.conectiva.com/

2002-10-01 18:03:23

by Jens Axboe

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, Oct 01 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > On Tue, Oct 01 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > > > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > > > On Tuesday 01 October 2002 16:20, [email protected] wrote:
> > > > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > > > >
> > > > > Vanilla would be CONFIG_SMP=y, is that what you have?
> > > >
> > > > Somehow I doubt Linux supports m68k SMP machines ;)
> > >
> > > CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> > > the spinlocks to actually exist. It's not supposed to change any
> > > behaviour, but you never know. Behaviour is obviously changing here.
> >
> > Again, m68k was the target.
>
> Sure fine, no good reason to be cryptic about it though.
>
> #error "m68k doesn't do SMP yet"
>
> So SMP must be off or the compile would abort. Well, the only interesting

There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
beating of dead horse :)

--
Jens Axboe

2002-10-01 17:59:38

by Daniel Phillips

Subject: Re: 2.4 mm trouble [possible lru race]

On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> On Tue, Oct 01 2002, Daniel Phillips wrote:
> > On Tuesday 01 October 2002 18:56, Rik van Riel wrote:
> > > On Tue, 1 Oct 2002, Daniel Phillips wrote:
> > > > On Tuesday 01 October 2002 16:20, [email protected] wrote:
> > > > > no preempt or anything fancy, m68k vanilla 2.4.19 (well almost).
> > > >
> > > > Vanilla would be CONFIG_SMP=y, is that what you have?
> > >
> > > Somehow I doubt Linux supports m68k SMP machines ;)
> >
> > CONFIG_SMP=y works perfectly well on single cpu machines - it forces
> > the spinlocks to actually exist. It's not supposed to change any
> > behaviour, but you never know. Behaviour is obviously changing here.
>
> Again, m68k was the target.

Sure fine, no good reason to be cryptic about it though.

#error "m68k doesn't do SMP yet"

So SMP must be off or the compile would abort. Well, the only interesting
difference remaining is the extra count for the LRU. I actually had that
parameterized at one time so you could turn it on/off easily, but akpm
complained about #ifdef's so I took that out ;-)

Richard, before I go making a test patch for you (it's not completely
straightforward) can you confirm that your bug comes back when you back
the lru race patch out?

--
Daniel

2002-10-02 10:07:34

by Richard Zidlicky

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, Oct 01, 2002 at 08:01:10PM +0200, Daniel Phillips wrote:

> Richard, before I go making a test patch for you (it's not completely
> straightforward) can you confirm that your bug comes back when you back
> the lru race patch out?

bad luck, the disappearance of the bug was rather accidental - I have
switched to a different swap partition in the meantime. So backing out
the changes doesn't make the bug reappear, restoring previous IDE
configuration does.

Very likely interrupt-related trouble, perhaps a missing spin_lock_irqsave
somewhere. The bug manifests itself as pages from the wrong processes getting
swapped in for some process; however, I have also had the luck to crash the
kernel (no Oops), so it is not likely to be one of the TLB/cache problems.

What strikes me is that it is always related to swap, and I've never got any
strange dmesg output so far.
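
For illustration only, the kind of failure a missing interrupt-disabling lock
could cause can be sketched in user space as below. Everything here is a
hypothetical toy: irqs_enabled, toy_irq(), toy_irq_save(), toy_irq_restore()
and the slot_owner/slot_data pair are invented for the sketch and do not
correspond to any real kernel or IDE driver code. The "interrupt" is just a
function call placed in the middle of a two-step update.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled = true;   /* toy interrupt-enable flag */
static int slot_owner, slot_data;  /* must always be updated together */
static bool handler_saw_mismatch;  /* set if the handler sees a torn update */

/* Toy interrupt handler: does nothing while interrupts are disabled. */
static void toy_irq(void)
{
    if (!irqs_enabled)
        return;
    if (slot_owner != slot_data)
        handler_saw_mismatch = true;
}

/* Disable toy interrupts and return the previous state. */
static unsigned long toy_irq_save(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Restore the toy interrupt state saved by toy_irq_save(). */
static void toy_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/* Two-step update with an interrupt arriving between the steps. */
static void update_slot(int v, bool protect)
{
    unsigned long flags = 0;

    if (protect)
        flags = toy_irq_save();
    slot_owner = v;
    toy_irq();          /* interrupt arrives mid-update */
    slot_data = v;
    if (protect)
        toy_irq_restore(flags);
}
```

With protection, the handler never observes the half-finished update; without
it, the handler sees an owner that does not match the data, which is roughly
the shape of "pages from the wrong process" described above.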

Richard

2002-10-02 12:07:50

by Geert Uytterhoeven

Subject: Re: 2.4 mm trouble [possible lru race]

On Tue, 1 Oct 2002, Daniel Phillips wrote:
> On Tuesday 01 October 2002 20:04, Jens Axboe wrote:
> > On Tue, Oct 01 2002, Daniel Phillips wrote:
> > > On Tuesday 01 October 2002 19:31, Jens Axboe wrote:
> > > > Again, m68k was the target.
> > >
> > > Sure fine, no good reason to be cryptic about it though.
> > >
> > > #error "m68k doesn't do SMP yet"
> > >
> > > So SMP must be off or the compile would abort. Well, the only interesting
> >
> > There's no CONFIG_SMP in the m68k arch config.in. Anyways, enough
> > beating of dead horse :)
>
> The horse isn't dead yet, it's still twitching a little. At this
> point we still need to speculate about why anyone would want an SMP
> Dragonball machine ;-)

Dragonballs don't have an MMU, so they would run uClinux/m68k, not Linux/m68k.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds