2004-06-30 00:17:45

by linas

[permalink] [raw]
Subject: [PATCH] PPC64: lockfix for rtas error log


Paul,

Could you please apply the following path to the ameslab tree, and/or
forward it to the main 2.6 kernel maintainers.

This patch moves the location of a lock in order to protect
the contents of a buffer until it has been copied to its final
destination. Prior to this, a race existed whereby the buffer
could be filled even while it was being emptied.

Signed-off-by: Linas Vepstas <[email protected]>

--linas


Attachments:
(No filename) (420.00 B)
rtas-erbbuf-lockfix.patch (857.00 B)
Download all attachments

2004-06-30 01:30:15

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

On Tue, Jun 29, 2004 at 05:50:07PM -0500, [email protected] wrote:
>
> Paul,
>
> Could you please apply the following path to the ameslab tree, and/or
> forward it to the main 2.6 kernel maintainers.
>
> This patch moves the location of a lock in order to protect
> the contents of a buffer until it has been copied to its final
> destination. Prior to this, a race existed whereby the buffer
> could be filled even while it was being emptied.
>
> Signed-off-by: Linas Vepstas <[email protected]>

Hmm... I note that log_error() does nothing but call ppc_md.log_error,
and I can't see anywhere that that is set to be non-NULL...

> +++ arch/ppc64/kernel/rtas.c 2004-06-29 17:14:05.000000000 -0500
> @@ -134,9 +134,10 @@ log_rtas_error(void)
>
> spin_lock_irqsave(&rtas.lock, s);
> rc = __log_rtas_error();
> - spin_unlock_irqrestore(&rtas.lock, s);
> - if (rc == 0)
> + if (rc == 0) {
> log_error(rtas_err_buf, ERR_TYPE_RTAS_LOG, 0);
> + }
> + spin_unlock_irqrestore(&rtas.lock, s);
> }
>
> int
> @@ -193,12 +194,13 @@ rtas_call(int token, int nargs, int nret
> outputs[i] = rtas_args->rets[i+1];
> ret = (int)((nret > 0) ? rtas_args->rets[0] : 0);
>
> + if (logit) {
> + log_error(rtas_err_buf, ERR_TYPE_RTAS_LOG, 0);
> + }
> +
> /* Gotta do something different here, use global lock for now... */
> spin_unlock_irqrestore(&rtas.lock, s);
>
> - if (logit)
> - log_error(rtas_err_buf, ERR_TYPE_RTAS_LOG, 0);
> -
> return ret;
> }
>


--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson

2004-06-30 01:47:18

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

On Tue, 2004-06-29 at 17:50, [email protected] wrote:
> Paul,
>
> Could you please apply the following path to the ameslab tree, and/or
> forward it to the main 2.6 kernel maintainers.
>
> This patch moves the location of a lock in order to protect
> the contents of a buffer until it has been copied to its final
> destination. Prior to this, a race existed whereby the buffer
> could be filled even while it was being emptied.

Hrm....

That's bad, I moved that out of the lock on purpose to avoid deadlocks,
I think ppc_md.log_error can take the rtas lock again (nvram). We need to
take a separate lock for the err buf if that function can be called
concurrently I suppose.

Ben.



2004-06-30 11:38:06

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

Linas,

> This patch moves the location of a lock in order to protect
> the contents of a buffer until it has been copied to its final
> destination. Prior to this, a race existed whereby the buffer
> could be filled even while it was being emptied.

Given that log_error() seems to be a no-op at the moment AFAICT, and
that Ben H was concerned about possible deadlocks if log_error
actually did do something, I'd like to see a resolution of (a) what
log_error should be doing and (b) whether there is in fact any
possibility of deadlock with this patch once (a) is resolved.

Thanks,
Paul.

2004-06-30 17:40:11

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

On Tue, Jun 29, 2004 at 08:44:26PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2004-06-29 at 17:50, [email protected] wrote:
> > Paul,
> >
> > Could you please apply the following path to the ameslab tree, and/or
> > forward it to the main 2.6 kernel maintainers.
> >
> > This patch moves the location of a lock in order to protect
> > the contents of a buffer until it has been copied to its final
> > destination. Prior to this, a race existed whereby the buffer
> > could be filled even while it was being emptied.
>
> Hrm....
>
> That's bad, I moved that out of the lock on purpose to avoid deadlocks,

I looked for deadlocks, but didn't see anything obvious. I'll take your
word, though.

> I think ppc_md.log_error can take the rtas lock again (nvram). We need to
> take a separate lock for the err buf if that function can be called
> concurrently I suppose.

Well, the problem was that there is no lock that is protecting the
use of the single, global buffer. Adding yet another lock is bad;
it makes hunting for deadlocks that much more tedious and difficult;
already, finding deadlocks is error-prone, and subject to bit-rot as
future hackers update the code. So instead, the problem can be easily
avoided by not using a global buffer. The code below mallocs/frees.
Its not perf-critcal, so I don't mind malloc overhead. Would this
work for you? Patch attached below.

Signed-off-by: Linas Vepstas <[email protected]>

--linas




Attachments:
(No filename) (1.42 kB)
rtas-lock-malloc.patch (1.42 kB)
Download all attachments

2004-06-30 17:51:31

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

On Wed, Jun 30, 2004 at 09:27:09PM +1000, Paul Mackerras wrote:
> Linas,
>
> > This patch moves the location of a lock in order to protect
> > the contents of a buffer until it has been copied to its final
> > destination. Prior to this, a race existed whereby the buffer
> > could be filled even while it was being emptied.
>
> Given that log_error() seems to be a no-op at the moment AFAICT, and

!!
Too many kernel trees. Someone must have whacked the ameslab tree
by accident, or on purpose, because they got sick of seeing rtas
messages. I don't find the RTAS messages to be pleasent, but simply
whacking them is not the right solution. The following diff is
between an older tree and the current tree. If you could re-add
these lines, that would be great.

--linas


Attachments:
(No filename) (781.00 B)
chrp-setup.patch (491.00 B)
Download all attachments

2004-06-30 18:52:29

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log


> Well, the problem was that there is no lock that is protecting the
> use of the single, global buffer. Adding yet another lock is bad;
> it makes hunting for deadlocks that much more tedious and difficult;
> already, finding deadlocks is error-prone, and subject to bit-rot as
> future hackers update the code. So instead, the problem can be easily
> avoided by not using a global buffer. The code below mallocs/frees.
> Its not perf-critcal, so I don't mind malloc overhead. Would this
> work for you? Patch attached below.

I prefer that, but couldn't we move the kmalloc outside of the spinlock
and so use GFP_KERNEL instead ?

Ben.


2004-06-30 19:00:15

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

Benjamin Herrenschmidt wrote:
>> So instead, the problem can be easily
>>avoided by not using a global buffer. The code below mallocs/frees.
>>Its not perf-critcal, so I don't mind malloc overhead. Would this
>>work for you? Patch attached below.
>
>
> I prefer that, but couldn't we move the kmalloc outside of the spinlock
> and so use GFP_KERNEL instead ?

Isn't the global buffer used since it's in BSS, and as such, in low
memory, guaranteed to be below 4GB? RTAS has limitations that restricts
any passed-in buffers to be addressable as 32-bit real mode pointers, right?


-Olof

2004-06-30 19:06:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log


> Isn't the global buffer used since it's in BSS, and as such, in low
> memory, guaranteed to be below 4GB? RTAS has limitations that restricts
> any passed-in buffers to be addressable as 32-bit real mode pointers, right?

The global buffer is still used within the spinlocked region for RTAS itself,
the patch just uses a local buffer and copies into it from the global one
before releasing the lock.

Ben.


2004-06-30 20:50:08

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

On Wed, Jun 30, 2004 at 11:17:08AM +1000, David Gibson wrote:
> On Tue, Jun 29, 2004 at 05:50:07PM -0500, [email protected] wrote:
> >
> > Paul,
> >
> > Could you please apply the following path to the ameslab tree, and/or
> > forward it to the main 2.6 kernel maintainers.
> >
> > This patch moves the location of a lock in order to protect
> > the contents of a buffer until it has been copied to its final
> > destination. Prior to this, a race existed whereby the buffer
> > could be filled even while it was being emptied.
> >
> > Signed-off-by: Linas Vepstas <[email protected]>
>
> Hmm... I note that log_error() does nothing but call ppc_md.log_error,
> and I can't see anywhere that that is set to be non-NULL...

grep log_error *.c
chrp_setup.c: ppc_md.log_error = pSeries_log_error;

--linas

2004-06-30 23:07:18

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] PPC64: lockfix for rtas error log

Linas,

> Too many kernel trees. Someone must have whacked the ameslab tree
> by accident, or on purpose, because they got sick of seeing rtas
> messages. I don't find the RTAS messages to be pleasent, but simply
> whacking them is not the right solution. The following diff is
> between an older tree and the current tree. If you could re-add
> these lines, that would be great.

As far as I can see, the ameslab tree has _never_ contained those
lines. The last change to chrp_setup.c was on 1 May 2004, and neither
that version nor any of the earlier versions that I looked at have
those lines. Are you sure you don't have that as a local change in
your copy? Do a bk sfiles -i and a bk push -n and see if it shows up.

In any case the #ifdef is redundant, because chrp_setup.o is only
linked in if CONFIG_PPC_PSERIES is set.

As for the RTAS messages being printk'd, the only possible
justification would be if there was a userspace tool to analyse them.
I don't know if such a thing exists, and if it does, I certainly don't
have a copy. Is anyone working on that?

Paul.

2004-07-01 00:40:15

by linas

[permalink] [raw]
Subject: [PATCH] 2.6 PPC64: lockfix for rtas error log (third-times-a-charm?)

On Wed, Jun 30, 2004 at 01:47:29PM -0500, Benjamin Herrenschmidt wrote:
>
> > Well, the problem was that there is no lock that is protecting the
> > use of the single, global buffer. Adding yet another lock is bad;
> > it makes hunting for deadlocks that much more tedious and difficult;
> > already, finding deadlocks is error-prone, and subject to bit-rot as
> > future hackers update the code. So instead, the problem can be easily
> > avoided by not using a global buffer. The code below mallocs/frees.
> > Its not perf-critcal, so I don't mind malloc overhead. Would this
> > work for you? Patch attached below.
>
> I prefer that, but couldn't we move the kmalloc outside of the spinlock
> and so use GFP_KERNEL instead ?

OK,

Upon closer analysis of the code, I see that log_rtas_error()
was incorrectly named, and was being used incorrectly. The
solution is to get rid of it entirely; see patch below. So:

-- In one case kmalloc must be GFP_ATOMIC because rtas_call()
can happen in any context, incl. irqs.
-- In the other case, I turned it into GFP_KENREL, at the cost
of doing a needless malloc/free in the vast majority of
cases where there is no error. Small price, as I beleive
that this routine is very rarely called.

Patch below,
Signed-off-by: Linas Vepstas <[email protected]>

--linas