2004-03-28 12:26:24

by Måns Rullgård

Subject: status of Linux on Alpha?

There was a thread a while ago about some odd problems with 2.6.4 on
Alpha. Were those problems ever resolved? Is anything past 2.6.3
stable on Alpha?

--
Måns Rullgård
[email protected]


2004-03-28 16:17:45

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Sun, Mar 28, 2004 at 02:26:15PM +0200, Måns Rullgård wrote:
> There was a thread a while ago about some odd problems with 2.6.4 on
> Alpha. Were those problems ever resolved?

No idea. I wasn't able to reproduce them. Perhaps it has something
to do with particular drivers (raid, XFS or something else).

> Is anything past 2.6.3 stable on Alpha?

There was nothing special about 2.6.4. Those problems must be present
in 2.6.3 and earlier kernels as well.

Ivan.

2004-03-28 16:19:16

by Måns Rullgård

Subject: Re: status of Linux on Alpha?

Ivan Kokshaysky <[email protected]> writes:

> On Sun, Mar 28, 2004 at 02:26:15PM +0200, Måns Rullgård wrote:
>> There was a thread a while ago about some odd problems with 2.6.4 on
>> Alpha. Were those problems ever resolved?
>
> No idea. I wasn't able to reproduce them. Perhaps it has something
> to do with particular drivers (raid, XFS or something else).

Well, I'm using both raid and xfs...

>> Is anything past 2.6.3 stable on Alpha?
>
> There was nothing special about 2.6.4. Those problems must be present
> in 2.6.3 and earlier kernels as well.

So you're saying that if 2.6.3 is stable, 2.6.4 and later should be
fine too?

--
Måns Rullgård
[email protected]

2004-03-28 16:43:35

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Sun, Mar 28, 2004 at 06:19:10PM +0200, Måns Rullgård wrote:
> Well, I'm using both raid and xfs...

OK, good to know.

> So you're saying that if 2.6.3 is stable, 2.6.4 and later should be
> fine too?

I haven't tried 2.6.5 yet, but with 2.6.4 a couple of my boxes have
16 days of uptime and no problems so far.

Ivan.

2004-03-28 20:18:14

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi Ivan, Hi Måns

I haven't found the time to look deeper into the problem. All I can
say ATM is that it is real and exists!

Ivan, perhaps you can give me some useful advice on how to debug this
successfully? I think we should begin by trying to isolate the problem to a
single part of the kernel.
Is it possible that we have a deadlock in the VFS part? After a
while every process that accesses a file will be blocked (already
described).

Regards

Marc

On Sun, 28 Mar 2004 20:43:08 +0400
Ivan Kokshaysky <[email protected]> wrote:

> On Sun, Mar 28, 2004 at 06:19:10PM +0200, Måns Rullgård wrote:
> > Well, I'm using both raid and xfs...
>
> OK, good to know.
>
> > So you're saying that if 2.6.3 is stable, 2.6.4 and later should be
> > fine too?
>
> I haven't tried 2.6.5 yet, but with 2.6.4 couple of my boxes have
> 16 days uptime and no problems so far.
>
> Ivan.
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2004-03-28 20:39:04

by Måns Rullgård

Subject: Re: status of Linux on Alpha?

Marc Giger <[email protected]> writes:

> Hi Ivan, Hi Måns
>
> I haven't found the time to look deeper into the problem. All I can
> say ATM is that it is real and exists!
>
> Ivan, perhaps you can give me some useful advice on how to debug this
> successfully? I think we should begin by trying to isolate the problem to a
> single part of the kernel.
> Is it possible that we have a deadlock in the VFS part? After a
> while every process that accesses a file will be blocked (already
> described).

We could start by comparing .config files. Mine is attached. I've
been running a 2.6.3 kernel with that configuration since it was
released. I compiled a gentoo installation using that kernel, so I'd
say it's quite stable.

--
Måns Rullgård
[email protected]


Attachments:
config.gz (4.12 kB)

2004-03-29 18:53:38

by Marc Giger

Subject: Re: status of Linux on Alpha?

>
> We could start by comparing .config files. Mine is attached. I've
> been running a 2.6.3 kernel with that configuration since it was
> released. I compiled a gentoo installation using that kernel, so I'd
> say it's quite stable.

Ok, I've attached my config. I will take some time this week to debug
this problem.
First, I will try out 2.6.3 and see what happens. I think that's the
best thing that I can do ATM. If the problem doesn't exist with 2.6.3 on
my alpha then we know where to search.

Regards from Switzerland

Marc


Attachments:
config.tar.gz (5.94 kB)

2004-04-04 10:10:35

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi Ivan, Hi Måns

I've tested 2.6.3 on my alpha. It seems to be working fine. I couldn't
trigger the problems that I had with 2.6.4.

So I will revert some patches which I think could be the reason.

greets

Marc

On Mon, 29 Mar 2004 20:52:33 +0200
Marc Giger <[email protected]> wrote:

> >
> > We could start by comparing .config files. Mine is attached. I've
> > been running a 2.6.3 kernel with that configuration since it was
> > released. I compiled a gentoo installation using that kernel, so
> > I'd say it's quite stable.
>
> Ok, I've attached my config. I will take some time this week to debug
> this problem.
> Firstly, I will try out 2.6.3 and see what happens. I think that's the
> best thing that I can do ATM. If the problem doesn't exist with 2.6.3
> on my alpha then we know where to search for.
>
> Regards from Switzerland
>
> Marc
>

2004-04-09 11:45:55

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hello there,

I have now reached a point where I no longer know what to do :-(
I isolated the problem between 2.6.3-rc1 and 2.6.3-rc2. I
also reverted 1.1608.56.1, 1.1608.51.36 and all XFS-related patches
from rc2 with no luck.
All other changes seem unrelated to me.

I'm really interested in solving the problem but I need your help.

What I noticed is that a make -j10 vmlinux triggers the problem the
fastest.

Thank you

Regards

Marc


On Sun, 4 Apr 2004 12:10:32 +0200
Marc Giger <[email protected]> wrote:

> Hi Ivan, Hi Måns
>
> I've tested 2.6.3 on my alpha. It seems to be working fine. I couldn't
> trigger the problems that I had with 2.6.4.
>
> So I will revert some patches which I think could be the reason.
>
> greets
>
> Marc
>
> On Mon, 29 Mar 2004 20:52:33 +0200
> Marc Giger <[email protected]> wrote:
>
> > >
> > > We could start by comparing .config files. Mine is attached.
> > > I've been running a 2.6.3 kernel with that configuration since it
> > > was released. I compiled a gentoo installation using that kernel,
> > > so I'd say it's quite stable.
> >
> > Ok, I've attached my config. I will take some time this week to
> > debug this problem.
> > Firstly, I will try out 2.6.3 and see what happens. I think that's
> > the best thing that I can do ATM. If the problem doesn't exist with
> > 2.6.3 on my alpha then we know where to search for.
> >
> > Regards from Switzerland
> >
> > Marc
> >

2004-04-09 11:48:36

by Marc Giger

Subject: Re: status of Linux on Alpha?

On Fri, 9 Apr 2004 13:45:34 +0200
Marc Giger <[email protected]> wrote:

> Hello there,
>
> Presently, I reached a stage on which I don't know longer what to
> do:-( I isolated the problem between 2.6.3-rc1 and 2.6.3-rc2. I
^^^^^^^^^^^^^^^^^^^^^^^
read as 2.6.4-rc1 and 2.6.4-rc2

> also reverted 1.1608.56.1 , 1.1608.51.36 and all xfs related patches
> from rc2 with no luck.
> All other changes seems unrelated to me.
>
> I'm really interested to solve the problem but I need your help.
>
> What I noticed is that a make -j10 vmlinux triggers the problem the
> fastest.
>
> Thank you
>
> Regards
>
> Marc
>
>
> On Sun, 4 Apr 2004 12:10:32 +0200
> Marc Giger <[email protected]> wrote:
>
> > Hi Ivan, Hi Måns
> >
> > I've tested 2.6.3 on my alpha. It seems to be working fine. I
> > couldn't trigger the problems that I had with 2.6.4.
> >
> > So I will revert some patches which I think could be the reason.
> >
> > greets
> >
> > Marc
> >
> > On Mon, 29 Mar 2004 20:52:33 +0200
> > Marc Giger <[email protected]> wrote:
> >
> > > >
> > > > We could start by comparing .config files. Mine is attached.
> > > > I've been running a 2.6.3 kernel with that configuration since
> > > > it was released. I compiled a gentoo installation using that
> > > > kernel, so I'd say it's quite stable.
> > >
> > > Ok, I've attached my config. I will take some time this week to
> > > debug this problem.
> > > Firstly, I will try out 2.6.3 and see what happens. I think that's
> > > the best thing that I can do ATM. If the problem doesn't exist
> > > with 2.6.3 on my alpha then we know where to search for.
> > >
> > > Regards from Switzerland
> > >
> > > Marc
> > >

2004-04-09 19:07:14

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Fri, Apr 09, 2004 at 01:48:28PM +0200, Marc Giger wrote:
> > Presently, I reached a stage on which I don't know longer what to
> > do:-( I isolated the problem between 2.6.3-rc1 and 2.6.3-rc2. I
> ^^^^^^^^^^^^^^^^^^^^^^^
> read as 2.6.4-rc1 and 2.6.4-rc2

Thanks for your work.

> > also reverted 1.1608.56.1 , 1.1608.51.36 and all xfs related patches
> > from rc2 with no luck.
> > All other changes seems unrelated to me.

I'd also revert 1.1608.51.22 and all networking changes.

Ivan.

2004-04-13 17:49:23

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi Ivan, All

First, sorry for the cross posting...

After long sessions of patching, recompiling, and testing I finally
found the cause of my problems. XFS people, please read:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108047692409817&w=2
and
http://marc.theaimsgroup.com/?l=linux-kernel&m=107910319729364&w=2

After reverting 1.1608.29.12 all is fine again.
Interestingly, this patch was listed on bkbits between 2.6.3
and 2.6.4-rc1 but was added to the source tree between 2.6.4-rc1 and
2.6.4-rc2 :-( Again something learned for the future...

Ivan, I think your new semaphore code is still ok because it doesn't
matter if it is the new or old code. Both versions have a problem with
the xfs-patch.

For further questions you know how to reach me:-)

greets

Marc


On Fri, 9 Apr 2004 23:06:51 +0400
Ivan Kokshaysky <[email protected]> wrote:

> On Fri, Apr 09, 2004 at 01:48:28PM +0200, Marc Giger wrote:
> > > Presently, I reached a stage on which I don't know longer what to
> > > do:-( I isolated the problem between 2.6.3-rc1 and 2.6.3-rc2. I
> > ^^^^^^^^^^^^^^^^^^^^^^^
> > read as 2.6.4-rc1 and 2.6.4-rc2
>
> Thanks for your work.
>
> > > also reverted 1.1608.56.1 , 1.1608.51.36 and all xfs related
> > > patches from rc2 with no luck.
> > > All other changes seems unrelated to me.
>
> I'd also revert 1.1608.51.22 and all networking changes.
>
> Ivan.
>

2004-04-27 16:51:45

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi,

What's the current status of the problem? Is nobody interested in fixing
it, or am I just impatient? Did I not provide enough information?
I've been running 2.6.5 with the reverted patch for 2 weeks now without any
problems.

Regards

Marc

On Tue, 13 Apr 2004 19:49:07 +0200
Marc Giger <[email protected]> wrote:

> Hi Ivan, All
>
> First, sorry for the cross posting...
>
> After long sessions of patching, recompiling, and testing I finally
> found the cause of my problems. XFS people, please read:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=108047692409817&w=2
> and
> http://marc.theaimsgroup.com/?l=linux-kernel&m=107910319729364&w=2
>
> After reverting 1.1608.29.12 all is fine again.
> Interestingly, this patch was listed on bkbits between 2.6.3
> and 2.6.4-rc1 but was added to the source tree between 2.6.4-rc1 and
> 2.6.4-rc2 :-( Again something learned for the future...
>
> Ivan, I think your new semaphore code is still ok because it doesn't
> matter if it is the new or old code. Both versions have a problem with
> the xfs-patch.
>
> For further questions you know how to reach me:-)
>
> greets
>
> Marc
>
>
> On Fri, 9 Apr 2004 23:06:51 +0400
> Ivan Kokshaysky <[email protected]> wrote:
>
> > On Fri, Apr 09, 2004 at 01:48:28PM +0200, Marc Giger wrote:
> > > > Presently, I reached a stage on which I don't know longer what
> > > > to do:-( I isolated the problem between 2.6.3-rc1 and 2.6.3-rc2.
> > > > I
> > > ^^^^^^^^^^^^^^^^^^^^^^^
> > > read as 2.6.4-rc1 and 2.6.4-rc2
> >
> > Thanks for your work.
> >
> > > > also reverted 1.1608.56.1 , 1.1608.51.36 and all xfs related
> > > > patches from rc2 with no luck.
> > > > All other changes seems unrelated to me.
> >
> > I'd also revert 1.1608.51.22 and all networking changes.
> >
> > Ivan.
> >
>
>

2004-04-27 17:24:13

by Eric Sandeen

Subject: Re: status of Linux on Alpha?

Marc, do you have a patch associated with the changeset you found to be
the culprit?

I don't know how to get from that changeset number to a diff.

Thanks,

-Eric

On Tue, 2004-04-27 at 11:51, Marc Giger wrote:
> Hi,
>
> What's the current status of the problem? Is nobody interested to fix
> it, or am I just impatient? Did I provide not enough information?
> I'm running 2.6.5 with the reverted patch for 2 weeks now without any
> problems.
>
> Regards
>
> Marc
>

--
Eric Sandeen [C]XFS for Linux http://oss.sgi.com/projects/xfs
[email protected] SGI, Inc. 651-683-3102

2004-04-27 17:55:26

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Tue, Apr 27, 2004 at 06:51:24PM +0200, Marc Giger wrote:
> What's the current status of the problem?

Hopefully resolved - thanks to Dru <[email protected]>, who provided
an easy way to reproduce the problem.

What we have in lib/rwsem.c:__rwsem_do_wake():
int woken, loop;
^^^
and several lines below:
loop = woken;
woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
woken -= RWSEM_ACTIVE_BIAS;

However, rw_semaphore->count is 64-bit on Alpha, so
RWSEM_WAITING_BIAS has been defined as -0x0000000100000000L.
Obviously, this blows up in the write contention case.

Ivan.

--- linux.orig/lib/rwsem.c Mon Apr 26 20:11:36 2004
+++ linux/lib/rwsem.c Tue Apr 27 20:04:14 2004
@@ -40,8 +40,7 @@ static inline struct rw_semaphore *__rws
{
struct rwsem_waiter *waiter;
struct list_head *next;
- signed long oldcount;
- int woken, loop;
+ signed long oldcount, woken, loop;

rwsemtrace(sem,"Entering __rwsem_do_wake");
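
The arithmetic behind the truncation can be sketched in portable C. This is
only an illustration under stated assumptions: the waiting bias is the value
quoted above, the active bias is assumed to be 1, and the function names
adjust_narrow and adjust_wide are made up for this sketch (they are not kernel
functions). The 32-bit truncation is written out explicitly instead of relying
on the compiler's int conversion.

```c
#include <stdint.h>

/* Assumed Alpha-style biases: 64-bit count, waiting bias in the upper half. */
#define ACTIVE_BIAS   INT64_C(0x0000000000000001)
#define WAITING_BIAS  (-INT64_C(0x0000000100000000))

/*
 * Count adjustment as it comes out when "woken" is a 32-bit int
 * (the declaration before the patch).
 */
static int64_t adjust_narrow(int32_t readers_woken)
{
    /* The multiplication itself is done in 64 bits... */
    int64_t product = (int64_t)readers_woken * (ACTIVE_BIAS - WAITING_BIAS);
    /* ...but storing it back into an int keeps only the low 32 bits. */
    int32_t woken = (int32_t)(uint32_t)product;

    woken -= (int32_t)ACTIVE_BIAS;
    return woken;
}

/* Same computation with a 64-bit "woken" (the declaration after the patch). */
static int64_t adjust_wide(int32_t readers_woken)
{
    int64_t woken = readers_woken;

    woken *= ACTIVE_BIAS - WAITING_BIAS;
    woken -= ACTIVE_BIAS;
    return woken;
}
```

For two woken readers, adjust_wide(2) gives 0x200000001 while adjust_narrow(2)
gives 1: the upper 32 bits of the adjustment, which carry the waiting-bias
correction, are lost.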

2004-04-27 17:59:31

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi Eric,

On 27 Apr 2004 12:23:22 -0500
Eric Sandeen <[email protected]> wrote:

> Marc, do you have a patch associated with the changeset you found to
> be the culprit?
>
> I don't know how to get from that changeset number to a diff.

Yep, you will find all changesets and the corresponding patches at
http://linux.bkbits.net:8080/linux-2.5

After reverting the following patch, all is fine...

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/02/27 18:17:12+11:00 [email protected]
# [XFS] Implement mrlocks on top of rwsems, instead of using our own mrlock code.
#
# SGI Modid: xfs-linux:xfs-kern:167181a
#
# BitKeeper/deleted/.del-mrlock.c~4fd914a7832bd60d
# 2004/02/27 18:16:18+11:00 [email protected] +0 -0
# Delete: fs/xfs/linux/mrlock.c
#
# fs/xfs/Makefile
# 2004/02/27 18:16:53+11:00 [email protected] +0 -1
# [XFS] Implement mrlocks on top of rwsems, instead of using our own mrlock code.
#
# fs/xfs/linux/mrlock.h
# 2004/02/27 18:16:53+11:00 [email protected] +62 -45
# [XFS] Implement mrlocks on top of rwsems, instead of using our own mrlock code.
#
diff -Nru a/fs/xfs/Makefile b/fs/xfs/Makefile
--- a/fs/xfs/Makefile Mon Apr 12 08:43:27 2004
+++ b/fs/xfs/Makefile Mon Apr 12 08:43:27 2004
@@ -130,7 +130,6 @@

# Objects in linux/
xfs-y += $(addprefix linux/, \
- mrlock.o \
xfs_aops.o \
xfs_buf.o \
xfs_file.o \
diff -Nru a/fs/xfs/linux/mrlock.c b/fs/xfs/linux/mrlock.c
--- a/fs/xfs/linux/mrlock.c Mon Apr 12 08:43:27 2004
+++ /dev/null Wed Dec 31 16:00:00 1969
@@ -1,274 +0,0 @@
-/*
- * Copyright (c) 2000-2003 Silicon Graphics, Inc. All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * Further, this software is distributed without any warranty that it is
- * free of the rightful claim of any third person regarding infringement
- * or the like. Any license provided herein, whether implied or
- * otherwise, applies only to this software file. Patent licenses, if
- * any, provided herein do not apply to combinations of this program with
- * other software, or any other product whatsoever.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston MA 02111-1307, USA.
- *
- * Contact information: Silicon Graphics, Inc., 1600 Amphitheatre Pkwy,
- * Mountain View, CA 94043, or:
- *
- * http://www.sgi.com
- *
- * For further information regarding this notice, see:
- *
- * http://oss.sgi.com/projects/GenInfo/SGIGPLNoticeExplan/
- */
-
-#include <linux/time.h>
-#include <linux/sched.h>
-#include <asm/system.h>
-#include <linux/interrupt.h>
-#include <asm/current.h>
-
-#include "mrlock.h"
-
-
-#if USE_RW_WAIT_QUEUE_SPINLOCK
-# define wq_write_lock write_lock
-#else
-# define wq_write_lock spin_lock
-#endif
-
-/*
- * We don't seem to need lock_type (only one supported), name, or
- * sequence. But, XFS will pass it so let's leave them here for now.
- */
-/* ARGSUSED */
-void
-mrlock_init(mrlock_t *mrp, int lock_type, char *name, long sequence)
-{
- mrp->mr_count = 0;
- mrp->mr_reads_waiting = 0;
- mrp->mr_writes_waiting = 0;
- init_waitqueue_head(&mrp->mr_readerq);
- init_waitqueue_head(&mrp->mr_writerq);
- mrp->mr_lock = SPIN_LOCK_UNLOCKED;
-}
-
-/*
- * Macros to lock/unlock the mrlock_t.
- */
-
-#define MRLOCK(m) spin_lock(&(m)->mr_lock);
-#define MRUNLOCK(m) spin_unlock(&(m)->mr_lock);
-
-
-/*
- * lock_wait should never be called in an interrupt thread.
- *
- * mrlocks can sleep (i.e. call schedule) and so they can't ever
- * be called from an interrupt thread.
- *
- * threads that wake-up should also never be invoked from interrupt threads.
- *
- * But, waitqueue_lock is locked from interrupt threads - and we are
- * called with interrupts disabled, so it is all OK.
- */
-
-/* ARGSUSED */
-void
-lock_wait(wait_queue_head_t *q, spinlock_t *lock, int rw)
-{
- DECLARE_WAITQUEUE( wait, current );
-
- __set_current_state(TASK_UNINTERRUPTIBLE);
-
- spin_lock(&q->lock);
- if (rw) {
- __add_wait_queue_tail(q, &wait);
- } else {
- __add_wait_queue(q, &wait);
- }
-
- spin_unlock(&q->lock);
- spin_unlock(lock);
-
- schedule();
-
- spin_lock(&q->lock);
- __remove_wait_queue(q, &wait);
- spin_unlock(&q->lock);
-
- spin_lock(lock);
-
- /* return with lock held */
-}
-
-/* ARGSUSED */
-void
-mrfree(mrlock_t *mrp)
-{
-}
-
-/* ARGSUSED */
-void
-mrlock(mrlock_t *mrp, int type, int flags)
-{
- if (type == MR_ACCESS)
- mraccess(mrp);
- else
- mrupdate(mrp);
-}
-
-/* ARGSUSED */
-void
-mraccessf(mrlock_t *mrp, int flags)
-{
- MRLOCK(mrp);
- if(mrp->mr_writes_waiting > 0) {
- mrp->mr_reads_waiting++;
- lock_wait(&mrp->mr_readerq, &mrp->mr_lock, 0);
- mrp->mr_reads_waiting--;
- }
- while (mrp->mr_count < 0) {
- mrp->mr_reads_waiting++;
- lock_wait(&mrp->mr_readerq, &mrp->mr_lock, 0);
- mrp->mr_reads_waiting--;
- }
- mrp->mr_count++;
- MRUNLOCK(mrp);
-}
-
-/* ARGSUSED */
-void
-mrupdatef(mrlock_t *mrp, int flags)
-{
- MRLOCK(mrp);
- while(mrp->mr_count) {
- mrp->mr_writes_waiting++;
- lock_wait(&mrp->mr_writerq, &mrp->mr_lock, 1);
- mrp->mr_writes_waiting--;
- }
-
- mrp->mr_count = -1; /* writer on it */
- MRUNLOCK(mrp);
-}
-
-int
-mrtryaccess(mrlock_t *mrp)
-{
- MRLOCK(mrp);
- /*
- * If anyone is waiting for update access or the lock is held for update
- * fail the request.
- */
- if(mrp->mr_writes_waiting > 0 || mrp->mr_count < 0) {
- MRUNLOCK(mrp);
- return 0;
- }
- mrp->mr_count++;
- MRUNLOCK(mrp);
- return 1;
-}
-
-int
-mrtrypromote(mrlock_t *mrp)
-{
- MRLOCK(mrp);
-
- if(mrp->mr_count == 1) { /* We are the only thread with the lock */
- mrp->mr_count = -1; /* writer on it */
- MRUNLOCK(mrp);
- return 1;
- }
-
- MRUNLOCK(mrp);
- return 0;
-}
-
-int
-mrtryupdate(mrlock_t *mrp)
-{
- MRLOCK(mrp);
-
- if(mrp->mr_count) {
- MRUNLOCK(mrp);
- return 0;
- }
-
- mrp->mr_count = -1; /* writer on it */
- MRUNLOCK(mrp);
- return 1;
-}
-
-static __inline__ void mrwake(mrlock_t *mrp)
-{
- /*
- * First, if the count is now 0, we need to wake-up anyone waiting.
- */
- if (!mrp->mr_count) {
- if (mrp->mr_writes_waiting) { /* Wake-up first writer waiting */
- wake_up(&mrp->mr_writerq);
- } else if (mrp->mr_reads_waiting) { /* Wakeup any readers waiting */
- wake_up(&mrp->mr_readerq);
- }
- }
-}
-
-void
-mraccunlock(mrlock_t *mrp)
-{
- MRLOCK(mrp);
- mrp->mr_count--;
- mrwake(mrp);
- MRUNLOCK(mrp);
-}
-
-void
-mrunlock(mrlock_t *mrp)
-{
- MRLOCK(mrp);
- if (mrp->mr_count < 0) {
- mrp->mr_count = 0;
- } else {
- mrp->mr_count--;
- }
- mrwake(mrp);
- MRUNLOCK(mrp);
-}
-
-int
-ismrlocked(mrlock_t *mrp, int type) /* No need to lock since info can change */
-{
- if (type == MR_ACCESS)
- return (mrp->mr_count > 0); /* Read lock */
- else if (type == MR_UPDATE)
- return (mrp->mr_count < 0); /* Write lock */
- else if (type == (MR_UPDATE | MR_ACCESS))
- return (mrp->mr_count); /* Any type of lock held */
- else /* Any waiters */
- return (mrp->mr_reads_waiting | mrp->mr_writes_waiting);
-}
-
-/*
- * Demote from update to access. We better be the only thread with the
- * lock in update mode so it should be easy to set to 1.
- * Wake-up any readers waiting.
- */
-
-void
-mrdemote(mrlock_t *mrp)
-{
- MRLOCK(mrp);
- mrp->mr_count = 1;
- if (mrp->mr_reads_waiting) { /* Wakeup all readers waiting */
- wake_up(&mrp->mr_readerq);
- }
- MRUNLOCK(mrp);
-}
diff -Nru a/fs/xfs/linux/mrlock.h b/fs/xfs/linux/mrlock.h
--- a/fs/xfs/linux/mrlock.h Mon Apr 12 08:43:27 2004
+++ b/fs/xfs/linux/mrlock.h Mon Apr 12 08:43:27 2004
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2000-2003 Silicon Graphics, Inc. All Rights Reserved.
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -32,56 +32,73 @@
#ifndef __XFS_SUPPORT_MRLOCK_H__
#define __XFS_SUPPORT_MRLOCK_H__

-#include <linux/time.h>
-#include <linux/wait.h>
-#include <asm/atomic.h>
-#include <asm/semaphore.h>
+#include <linux/rwsem.h>

-/*
- * Implement mrlocks on Linux that work for XFS.
- *
- * These are sleep locks and not spinlocks. If one wants read/write spinlocks,
- * use read_lock, write_lock, ... see spinlock.h.
- */
+enum { MR_NONE, MR_ACCESS, MR_UPDATE };

-typedef struct mrlock_s {
- int mr_count;
- unsigned short mr_reads_waiting;
- unsigned short mr_writes_waiting;
- wait_queue_head_t mr_readerq;
- wait_queue_head_t mr_writerq;
- spinlock_t mr_lock;
+typedef struct {
+ struct rw_semaphore mr_lock;
+ int mr_writer;
} mrlock_t;

-#define MR_ACCESS 1
-#define MR_UPDATE 2
-
-#define MRLOCK_BARRIER 0x1
-#define MRLOCK_ALLOW_EQUAL_PRI 0x8
+#define mrinit(mrp, name) \
+ ( (mrp)->mr_writer = 0, init_rwsem(&(mrp)->mr_lock) )
+#define mrlock_init(mrp, t,n,s) mrinit(mrp, n)
+#define mrfree(mrp) do { } while (0)
+#define mraccess(mrp) mraccessf(mrp, 0)
+#define mrupdate(mrp) mrupdatef(mrp, 0)
+
+static inline void mraccessf(mrlock_t *mrp, int flags)
+{
+ down_read(&mrp->mr_lock);
+}
+
+static inline void mrupdatef(mrlock_t *mrp, int flags)
+{
+ down_write(&mrp->mr_lock);
+ mrp->mr_writer = 1;
+}
+
+static inline int mrtryaccess(mrlock_t *mrp)
+{
+ return down_read_trylock(&mrp->mr_lock);
+}
+
+static inline int mrtryupdate(mrlock_t *mrp)
+{
+ if (!down_write_trylock(&mrp->mr_lock))
+ return 0;
+ mrp->mr_writer = 1;
+ return 1;
+}
+
+static inline void mrunlock(mrlock_t *mrp)
+{
+ if (mrp->mr_writer) {
+ mrp->mr_writer = 0;
+ up_write(&mrp->mr_lock);
+ } else {
+ up_read(&mrp->mr_lock);
+ }
+}
+
+static inline void mrdemote(mrlock_t *mrp)
+{
+ mrp->mr_writer = 0;
+ downgrade_write(&mrp->mr_lock);
+}

/*
- * mraccessf/mrupdatef take flags to be passed in while sleeping;
- * only PLTWAIT is currently supported.
+ * Debug-only routine, without some platform-specific asm code, we can
+ * now only answer requests regarding whether we hold the lock for write
+ * (reader state is outside our visibility, we only track writer state).
+ * Note: means !ismrlocked would give false positivies, so don't do that.
*/
-
-extern void mraccessf(mrlock_t *, int);
-extern void mrupdatef(mrlock_t *, int);
-extern void mrlock(mrlock_t *, int, int);
-extern void mrunlock(mrlock_t *);
-extern void mraccunlock(mrlock_t *);
-extern int mrtryupdate(mrlock_t *);
-extern int mrtryaccess(mrlock_t *);
-extern int mrtrypromote(mrlock_t *);
-extern void mrdemote(mrlock_t *);
-
-extern int ismrlocked(mrlock_t *, int);
-extern void mrlock_init(mrlock_t *, int type, char *name, long sequence);
-extern void mrfree(mrlock_t *);
-
-#define mrinit(mrp, name) mrlock_init(mrp, MRLOCK_BARRIER, name, -1)
-#define mraccess(mrp) mraccessf(mrp, 0) /* grab for READ/ACCESS */
-#define mrupdate(mrp) mrupdatef(mrp, 0) /* grab for WRITE/UPDATE */
-#define mrislocked_access(mrp) ((mrp)->mr_count > 0)
-#define mrislocked_update(mrp) ((mrp)->mr_count < 0)
+static inline int ismrlocked(mrlock_t *mrp, int type)
+{
+ if (type == MR_UPDATE)
+ return mrp->mr_writer;
+ return 1;
+}

#endif /* __XFS_SUPPORT_MRLOCK_H__ */
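
The wrapper pattern in the new mrlock.h (a rw_semaphore plus an mr_writer flag
so that mrunlock knows which release call to make) can be sketched outside the
kernel. What follows is a toy, single-threaded model, not the kernel code: the
toy_rwsem type and every toy_* function are hypothetical stand-ins that only
mimic the try-lock, unlock and downgrade behaviour, with no sleeping and no
thread safety.

```c
#include <stdbool.h>

/* Toy, single-threaded stand-in for struct rw_semaphore (hypothetical). */
struct toy_rwsem {
    int  readers;   /* number of read holders */
    bool writer;    /* true while a writer holds the lock */
};

static bool toy_down_read_trylock(struct toy_rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static bool toy_down_write_trylock(struct toy_rwsem *s)
{
    if (s->writer || s->readers > 0)
        return false;
    s->writer = true;
    return true;
}

static void toy_up_read(struct toy_rwsem *s)  { s->readers--; }
static void toy_up_write(struct toy_rwsem *s) { s->writer = false; }

/* The writer becomes the sole reader without releasing the lock in between. */
static void toy_downgrade_write(struct toy_rwsem *s)
{
    s->writer = false;
    s->readers = 1;
}

/* Same shape as the new mrlock_t: the lock plus a "held for write" flag. */
typedef struct {
    struct toy_rwsem mr_lock;
    int mr_writer;
} toy_mrlock_t;

static int toy_mrtryaccess(toy_mrlock_t *mrp)
{
    return toy_down_read_trylock(&mrp->mr_lock);
}

static int toy_mrtryupdate(toy_mrlock_t *mrp)
{
    if (!toy_down_write_trylock(&mrp->mr_lock))
        return 0;
    mrp->mr_writer = 1;
    return 1;
}

/* One unlock entry point: the flag selects the write or read release path. */
static void toy_mrunlock(toy_mrlock_t *mrp)
{
    if (mrp->mr_writer) {
        mrp->mr_writer = 0;
        toy_up_write(&mrp->mr_lock);
    } else {
        toy_up_read(&mrp->mr_lock);
    }
}

/* Clear the flag first, then downgrade, mirroring the order in the patch. */
static void toy_mrdemote(toy_mrlock_t *mrp)
{
    mrp->mr_writer = 0;
    toy_downgrade_write(&mrp->mr_lock);
}
```

The flag is needed because callers use a single mrunlock for both modes, while
the rw_semaphore interface has separate up_read and up_write calls.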

2004-04-27 18:09:32

by Marc Giger

Subject: Re: status of Linux on Alpha?

Hi Ivan,

Cool!

I will try your patch after I have finished moving to my new flat :-)

I wonder why it happens only with the XFS code. From what I saw,
rw_sem is used all over the place in the kernel.

Thank you and Dru for the work and hopefully it will fix my problem.

Regards

Marc


On Tue, 27 Apr 2004 21:55:14 +0400
Ivan Kokshaysky <[email protected]> wrote:

> On Tue, Apr 27, 2004 at 06:51:24PM +0200, Marc Giger wrote:
> > What's the current status of the problem?
>
> Hopefully resolved - thanks to Dru <[email protected]>, who provided
> an easy way to reproduce the problem.
>
> What we have in lib/rwsem.c:__rwsem_do_wake():
> int woken, loop;
> ^^^
> and several lines below:
> loop = woken;
> woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
> woken -= RWSEM_ACTIVE_BIAS;
>
> However, rw_semaphore->count is 64-bit on Alpha, so
> RWSEM_WAITING_BIAS has been defined as -0x0000000100000000L.
> Obviously, this blows up in the write contention case.
>
> Ivan.
>
> --- linux.orig/lib/rwsem.c Mon Apr 26 20:11:36 2004
> +++ linux/lib/rwsem.c Tue Apr 27 20:04:14 2004
> @@ -40,8 +40,7 @@ static inline struct rw_semaphore *__rws
> {
> struct rwsem_waiter *waiter;
> struct list_head *next;
> - signed long oldcount;
> - int woken, loop;
> + signed long oldcount, woken, loop;
>
> rwsemtrace(sem,"Entering __rwsem_do_wake");
>
>
>

2004-04-27 20:20:20

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Tue, Apr 27, 2004 at 08:08:30PM +0200, Marc Giger wrote:
> I wonder why it happens only with the XFS code. What I saw
> rw_sem is used all over the place in the kernel.

Dru says it happens with ext3 as well. XFS folks used their own
locking code (which hasn't suffered from that bug) until 2.6.4,
that's why you noticed the difference...
In either case, you need _really_ heavy write IO activity to
trigger the bug.

Ivan.

2004-04-27 20:42:36

by Marc Giger

Subject: Re: status of Linux on Alpha?

On Wed, 28 Apr 2004 00:20:18 +0400
Ivan Kokshaysky <[email protected]> wrote:

> On Tue, Apr 27, 2004 at 08:08:30PM +0200, Marc Giger wrote:
> > I wonder why it happens only with the XFS code. What I saw
> > rw_sem is used all over the place in the kernel.
>
> Dru says it happens with ext3 as well. XFS folks used their own
> locking code (which hasn't suffered from that bug) until 2.6.4,
> that's why you noticed the difference...

Yes, when I saw that the patch uses the semaphore code in "arch" I was
no longer sure whether it is really an XFS-related bug.

> In either case, you need _really_ heavy write IO activity to
> trigger the bug.

I noticed that. The best way to trigger it was "make -j20 vmlinux"
so that pdflush comes strongly into action.

Perhaps Dru can show me his "easy way" to reproduce the problem so that
I can test it more easily.

Thanks again

Marc

2004-04-27 21:27:39

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Tue, Apr 27, 2004 at 10:42:31PM +0200, Marc Giger wrote:
> Perhaps Dru can show me his "easy way" to reproduce the problem so that
> I can test it more easily.

http://marc.theaimsgroup.com/?l=linux-kernel&m=108296356805411&w=2

Ivan.

2004-04-28 06:00:54

by Nathan Scott

Subject: Re: status of Linux on Alpha?

On Tue, Apr 27, 2004 at 08:08:30PM +0200, Marc Giger wrote:
> Hi Ivan,
>
> Cool!
>
> I will try your patch after I finished moving to my new flat:-)
>
> I wonder why it happens only with the XFS code. What I saw
> rw_sem is used all over the place in the kernel.

We do use the downgrade_write interface in XFS, which has
an architecture specific component and a generic component.
It's much less widely used than the rest of the rw_semaphore
code - that'd be a good spot to look if one architecture is
behaving oddly.

cheers.

--
Nathan

2004-04-28 13:14:38

by Dru

Subject: Re: status of Linux on Alpha?

I've tested the patch under high loads and it handles them fine; it's even
still very responsive under those loads (can't say the same for
my Pentium 4). Thanks a lot, guys.

I'll put it through some heavier tests over the next few days
to make certain it's rock solid.

I did notice one other very minor issue: if I set the Alpha
system type to Nautilus instead of generic, it doesn't boot.
It cycles "lost interrupt" when detecting drives and doesn't time out (each lost
interrupt times out, but it keeps looking continually).




Marc Giger wrote:

>Hi Ivan,
>
>Cool!
>
>I will try your patch after I finished moving to my new flat:-)
>
>I wonder why it happens only with the XFS code. What I saw
>rw_sem is used all over the place in the kernel.
>
>Thank you and Dru for the work and hopefully it will fix my problem.
>
>Regards
>
>Marc
>
>
>On Tue, 27 Apr 2004 21:55:14 +0400
>Ivan Kokshaysky <[email protected]> wrote:
>
>
>
>>On Tue, Apr 27, 2004 at 06:51:24PM +0200, Marc Giger wrote:
>>
>>
>>>What's the current status of the problem?
>>>
>>>
>>Hopefully resolved - thanks to Dru <[email protected]>, who provided
>>an easy way to reproduce the problem.
>>
>>What we have in lib/rwsem.c:__rwsem_do_wake():
>> int woken, loop;
>> ^^^
>>and several lines below:
>> loop = woken;
>> woken *= RWSEM_ACTIVE_BIAS-RWSEM_WAITING_BIAS;
>> woken -= RWSEM_ACTIVE_BIAS;
>>
>>However, rw_semaphore->count is 64-bit on Alpha, so
>>RWSEM_WAITING_BIAS has been defined as -0x0000000100000000L.
>>Obviously, this blows up in the write contention case.
>>
>>Ivan.
>>
>>--- linux.orig/lib/rwsem.c Mon Apr 26 20:11:36 2004
>>+++ linux/lib/rwsem.c Tue Apr 27 20:04:14 2004
>>@@ -40,8 +40,7 @@ static inline struct rw_semaphore *__rws
>> {
>> struct rwsem_waiter *waiter;
>> struct list_head *next;
>>- signed long oldcount;
>>- int woken, loop;
>>+ signed long oldcount, woken, loop;
>>
>> rwsemtrace(sem,"Entering __rwsem_do_wake");
>>
>>
>>
>>
>>

2004-04-28 13:30:49

by Måns Rullgård

Subject: Re: status of Linux on Alpha?

Dru <[email protected]> writes:

> I've tested the patch under high loads and it handles them fine; it's even
> still very responsive under those loads (can't say the same for
> my Pentium 4). Thanks a lot, guys.
>
> I'll put it through some heavier tests over the next few days
> to make certain it's rock solid.
>
> I did notice one other very minor issue: if I set the Alpha
> system type to Nautilus instead of generic, it doesn't boot.
> It cycles "lost interrupt" when detecting drives and doesn't time out (each lost
> interrupt times out, but it keeps looking continually).

Is that related to this patch, or is it a different issue? I run
2.6.3 kernels compiled for SX164, Miata, and Avanti with no problems.

--
Måns Rullgård
[email protected]

2004-04-28 14:14:12

by Ivan Kokshaysky

Subject: Re: status of Linux on Alpha?

On Thu, Apr 29, 2004 at 01:13:12AM +1200, Dru wrote:
> I did notice one other very minor issue: if I set the Alpha
> system type to Nautilus instead of generic, it doesn't boot.
> It cycles "lost interrupt" when detecting drives and doesn't time out (each lost
> interrupt times out, but it keeps looking continually).

Strange. My UP1500 boots just fine with nautilus specific kernel.
Can you send me your .config?

Ivan.