2015-05-18 18:35:14

by Adrian Remonda

[permalink] [raw]
Subject: [PATCH 1/4] Staging: lustre: sparse static warning fix

This patch clears the warning given by sparse that the function should be static.
Done by moving the declaration of the structure 'nrs_conf_fifo' into the header
'ptlrpc_internal.h'

modified: drivers/staging/lustre/lustre/ptlrpc/nrs.c
modified: drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h

Signed-off-by: Adrian Remonda <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/nrs.c | 4 ----
drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/nrs.c b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
index 63a05f4a902d..159db41c53a0 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/nrs.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
@@ -1698,10 +1698,6 @@ out:
return rc;
}

-
-/* ptlrpc/nrs_fifo.c */
-extern struct ptlrpc_nrs_pol_conf nrs_conf_fifo;
-
/**
* Adds all policies that ship with the ptlrpc module, to NRS core's list of
* policies \e nrs_core.nrs_policies.
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
index a66dc3c6da41..e07e2aacc62c 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
@@ -309,4 +309,8 @@ static inline void ptlrpc_reqset_put(struct ptlrpc_request_set *set)
if (atomic_dec_and_test(&set->set_refcount))
OBD_FREE_PTR(set);
}
+
+/* ptlrpc/nrs_fifo.c */
+extern struct ptlrpc_nrs_pol_conf nrs_conf_fifo;
+
#endif /* PTLRPC_INTERNAL_H */
--
2.1.4


2015-05-18 18:35:23

by Adrian Remonda

[permalink] [raw]
Subject: [PATCH 2/4] Staging: lustre: sparse static warning fix

This patch clears the warning given by sparse that the function should be static.
Done by moving the declaration of the structure 'nrs_core' into the header
'ptlrpc_internal.h'


modified: drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
modified: drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h

Signed-off-by: Adrian Remonda <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c | 6 ------
drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h | 7 +++++++
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
index aeceef5152ac..300310e064f5 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c
@@ -417,12 +417,6 @@ ptlrpc_lprocfs_threads_max_seq_write(struct file *file,
LPROC_SEQ_FOPS(ptlrpc_lprocfs_threads_max);

/**
- * \addtogoup nrs
- * @{
- */
-extern struct nrs_core nrs_core;
-
-/**
* Translates \e ptlrpc_nrs_pol_state values to human-readable strings.
*
* \param[in] state The policy state
diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
index e07e2aacc62c..f9bfcff2ac9b 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
+++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h
@@ -313,4 +313,11 @@ static inline void ptlrpc_reqset_put(struct ptlrpc_request_set *set)
/* ptlrpc/nrs_fifo.c */
extern struct ptlrpc_nrs_pol_conf nrs_conf_fifo;

+/**
+ * \addtogoup nrs
+ * @{
+ */
+extern struct nrs_core nrs_core;
+
+
#endif /* PTLRPC_INTERNAL_H */
--
2.1.4

2015-05-18 18:35:58

by Adrian Remonda

[permalink] [raw]
Subject: [PATCH 3/4] Staging: lustre: Fixed typo

In the explanation of the function the name of the function was incorrect

Signed-off-by: Adrian Remonda <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/nrs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/nrs.c b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
index 159db41c53a0..43da95f0bce2 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/nrs.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
@@ -478,7 +478,7 @@ static void nrs_resource_get_safe(struct ptlrpc_nrs *nrs,
*
* \param resp the resource hierarchy that is being released
*
- * \see ptlrpcnrs_req_hp_move()
+ * \see ptlrpc_nrs_req_hp_move()
* \see ptlrpc_nrs_req_finalize()
*/
static void nrs_resource_put_safe(struct ptlrpc_nrs_resource **resp)
--
2.1.4

2015-05-18 18:35:29

by Adrian Remonda

[permalink] [raw]
Subject: [PATCH 4/4] Staging: lustre: sparse lock warning fix

Fixed sparse warning: context imbalance in 'nrs_resource_put_safe' -
'different lock contexts for basic block' by releasing the lock on each
iteration of the for loop.

Signed-off-by: Adrian Remonda <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/nrs.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/nrs.c b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
index 43da95f0bce2..3a1722437ca1 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/nrs.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/nrs.c
@@ -503,13 +503,11 @@ static void nrs_resource_put_safe(struct ptlrpc_nrs_resource **resp)

if (nrs == NULL) {
nrs = pols[i]->pol_nrs;
- spin_lock(&nrs->nrs_lock);
}
+ spin_lock(&nrs->nrs_lock);
nrs_policy_put_locked(pols[i]);
- }
-
- if (nrs != NULL)
spin_unlock(&nrs->nrs_lock);
+ }
}

/**
--
2.1.4

2015-05-18 21:21:40

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On Mon, May 18, 2015 at 08:34:51PM +0200, Adrian Remonda wrote:
> Fixed sparse warning: context imbalance in 'nrs_resource_put_safe' -
> 'different lock contexts for basic block' by releasing the lock on each
> iteration of the for loop.
>

That changelog doesn't sound correct at all. That's not a correct
motivation or explanation.

I reviewed the patch and it's likely going to cause dead locks. The code
is trying to take the spinlock for the first pointer in the array and
release it at the end. Now it takes the first pointer's spinlock a
bunch of times (dead lock) and releases it once (will not happen because
we are already dead).

regards,
dan carpenter

2015-05-18 21:23:57

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 1/4] Staging: lustre: sparse static warning fix

On Mon, May 18, 2015 at 08:34:48PM +0200, Adrian Remonda wrote:
> This patch clears the warning given by sparse that the function should be static.
> Done by moving the declaration of the structure 'nrs_conf_fifo' into the header
> 'ptlrpc_internal.h'
>
> modified: drivers/staging/lustre/lustre/ptlrpc/nrs.c
> modified: drivers/staging/lustre/lustre/ptlrpc/ptlrpc_internal.h

Don't include these lines in the changelog.

regards,
dan carpenter

2015-05-20 16:52:07

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On 2015/05/18, 3:21 PM, "Dan Carpenter" <[email protected]> wrote:

>On Mon, May 18, 2015 at 08:34:51PM +0200, Adrian Remonda wrote:
>> Fixed sparse warning: context imbalance in 'nrs_resource_put_safe' -
>> 'different lock contexts for basic block' by releasing the lock on each
>> iteration of the for loop.
>>
>
>That changelog doesn't sound correct at all. That's not a correct
>motivation or explanation.
>
>I reviewed the patch and it's likely going to cause dead locks. The code
>is trying to take the spinlock for the first pointer in the array and
>release it at the end. Now it takes the first pointer's spinlock a
>bunch of times (dead lock) and releases it once (will not happen because
>we are already dead).

It isn't clear to me what the checkpatch complaint actually means? Is it
that the spin_lock() and spin_unlock() calls have different amounts of
indentation?

Cheers, Andreas
--
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

2015-05-20 19:30:00

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On Wed, May 20, 2015 at 04:51:59PM +0000, Dilger, Andreas wrote:
> On 2015/05/18, 3:21 PM, "Dan Carpenter" <[email protected]> wrote:
>
> >On Mon, May 18, 2015 at 08:34:51PM +0200, Adrian Remonda wrote:
> >> Fixed sparse warning: context imbalance in 'nrs_resource_put_safe' -
> >> 'different lock contexts for basic block' by releasing the lock on each
> >> iteration of the for loop.
> >>
> >
> >That changelog doesn't sound correct at all. That's not a correct
> >motivation or explanation.
> >
> >I reviewed the patch and it's likely going to cause dead locks. The code
> >is trying to take the spinlock for the first pointer in the array and
> >release it at the end. Now it takes the first pointer's spinlock a
> >bunch of times (dead lock) and releases it once (will not happen because
> >we are already dead).
>
> It isn't clear to me what the checkpatch complaint actually means? Is it
> that the spin_lock() and spin_unlock() calls have different amounts of
> indentation?
>

It's not a checkpatch.pl warning, it's a Sparse warning. Sparse is
crappy at reporting locking bugs. It's mostly false positives.

I think it's saying that some paths lock and unlock some don't.

Smatch is also fairly crappy at finding locking bugs, unfortunately.
I need to re-write it using modern features and cross function analysis.

regards,
dan carpenter

2015-05-20 19:43:00

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

In Smatch, it the equivalent warning is turned off by default because
there are too many false positives, but you can enable it with the
--spammy flag.

kchecker --spammy drivers/staging/lustre/lustre/ptlrpc/nrs.c

drivers/staging/lustre/lustre/ptlrpc/nrs.c:512 nrs_resource_put_safe()
warn: 'spin_lock:&nrs->nrs_lock' is sometimes locked here and sometimes unlocked.

regards,
dan carpenter

2015-05-20 22:51:39

by Dilger, Andreas

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On 2015/05/20, 1:42 PM, "Dan Carpenter" <[email protected]> wrote:

>In Smatch, it the equivalent warning is turned off by default because
>there are too many false positives, but you can enable it with the
>--spammy flag.
>
>kchecker --spammy drivers/staging/lustre/lustre/ptlrpc/nrs.c
>
>drivers/staging/lustre/lustre/ptlrpc/nrs.c:512 nrs_resource_put_safe()
>warn: 'spin_lock:&nrs->nrs_lock' is sometimes locked here and sometimes
>unlocked.

Would this be happier with something like:

for (i = 0; i < NRS_RES_MAX; i++) {
if (pols[i] == NULL)
continue;


if (nrs == NULL) {
nrs = pols[i]->pol_nrs;
if (likely(nrs != NULL)) /* make sparse happy */
spin_lock(&nrs->nrs_lock);
}
nrs_policy_put_locked(pols[i]);
}

if (nrs != NULL)
spin_unlock(&nrs->nrs_lock);

so that the "if" conditions are the same? The code definitely doesn't
have a bug, because the lock is only locked once when nrs is first set,
and only unlocked if it is set. Or is there a comment to put there that
will quiet the static checker?


Cheers, Andreas
--
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

2015-05-21 08:16:13

by Adrian Remonda

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On Tue, May 19, 2015 at 12:21:15AM +0300, Dan Carpenter wrote:
> On Mon, May 18, 2015 at 08:34:51PM +0200, Adrian Remonda wrote:
> > Fixed sparse warning: context imbalance in 'nrs_resource_put_safe' -
> > 'different lock contexts for basic block' by releasing the lock on each
> > iteration of the for loop.
> >
>
> That changelog doesn't sound correct at all. That's not a correct
> motivation or explanation.
>
> I reviewed the patch and it's likely going to cause dead locks. The code
> is trying to take the spinlock for the first pointer in the array and
> release it at the end. Now it takes the first pointer's spinlock a
> bunch of times (dead lock) and releases it once (will not happen because
> we are already dead).
>
>

Hello Dan,

thanks for the comments. The code would end up looking as next, I don't
undertand where the deadlock would be.
I know the older code would work, I just changed it to keep the lock
locked the less time as possible.

for (i = 0; i < NRS_RES_MAX; i++) {
if (pols[i] == NULL)
continue;

if (nrs == NULL) {
nrs = pols[i]->pol_nrs;
}
spin_lock(&nrs->nrs_lock);
nrs_policy_put_locked(pols[i]);
spin_unlock(&nrs->nrs_lock);
}

best regards,
Adrian

2015-05-21 15:12:43

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

Oh, sorry, I didn't read your patch very carefully. It won't cause a
deadlock. But I'm going to assume it's still not right until lustre
expert Acks it.

How well do you understand lustre locking?

regards,
dan carpenter

2015-05-21 15:33:46

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On May 21, 2015, at 11:12 AM, Dan Carpenter wrote:

> Oh, sorry, I didn't read your patch very carefully. It won't cause a
> deadlock. But I'm going to assume it's still not right until lustre
> expert Acks it.

I just took a closer look and it appears original code is buggy and the patch just propagates the bugginess.

If we look at the nrs_policy_put_locked, it eventually ends up in nrs_policy_stop0,
it would hold a lock on whatever happened to be the first policy in the array not NULL.
But nrs_policy_stop0 would unlock the lock on the policy it was called on (already likely a deadlock material) and then relock it.

The problems would arise only if there are more than one nrs policy registered which is theoretically possible, but certainly makes no sense a client (besides, none of the advanced NRS policies
made it in anyway and I almost feel like they just add unnecessary complication in client-only code).

The code looks elaborate enough as if the first policy lock is to be always used as the guardian lock, but then stop0 behavior might be a bug then?
Or it's possible we never end up in stop0 due to nrs state machine?
Let's see what Nikitas and Liang remember about any of this (one of them is the original author of this code, but I am not sure who.)

Nikitas, Liang: The code in question is in nrs_resource_put_safe:
for (i = 0; i < NRS_RES_MAX; i++) {
if (pols[i] == NULL)
continue;

if (nrs == NULL) {
nrs = pols[i]->pol_nrs;
spin_lock(&nrs->nrs_lock);
}
nrs_policy_put_locked(pols[i]);
}

if (nrs != NULL)
spin_unlock(&nrs->nrs_lock);

2015-05-22 01:11:32

by Nikitas Angelinas

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

(Apologies for the double-posting to individual recipients; resending
in plain text to get through vger list filters.)

The current code does not have a bug in this path (until we find one,
of course). nrs_resource_put_safe() releases references to the related
NRS resource hierarchies and policies for an RPC; as Oleg pointed out,
the server-side code which is included in the Intel/OpenSFS tree
(http://git.whamcloud.com/fs/lustre-release.git) has more than just
the FIFO policy, and so up to NRS_RES_MAX (currently 2) policies can
be related with an NRS head (struct ptlrpc_nrs).
nrs_resource_put_safe() takes the NRS head spinlock (nrs_lock),
releases references to the NRS head's policies, and releases the same
spinlock. nrs_policy_stop0() can be entered from
nrs_policy_put_locked(), but whichever policy is passed to it, it will
release the same spinlock, as all policies for an RPC share the same
NRS head, and thus the same nrs_lock.

Liang wrote this part; I believe the aim is to avoid a lock dance of
nrs_lock when releasing policy references for the NRS head; the
proposed patch would add the lock dance, so I am not sure it would
help much; not that it would hurt, but I don't see a good reason to
add it, other than to suppress the sparse warning.

Btw, I am guessing that the reason for having a part of both NRS and
the server-side of the PTLRPC code in the kernel is only to satisfy
build dependencies, and this code is unreachable; i.e. as the TODO
file suggests, the client and server sides of PTLRPC will be
decoupled, and so this code will eventually be removed from the
kernel, likely before transitioning from staging.



Cheers,
Nikitas

On Thu, May 21, 2015 at 8:33 AM, Drokin, Oleg <[email protected]> wrote:
> On May 21, 2015, at 11:12 AM, Dan Carpenter wrote:
>
>> Oh, sorry, I didn't read your patch very carefully. It won't cause a
>> deadlock. But I'm going to assume it's still not right until lustre
>> expert Acks it.
>
> I just took a closer look and it appears original code is buggy and the patch just propagates the bugginess.
>
> If we look at the nrs_policy_put_locked, it eventually ends up in nrs_policy_stop0,
> it would hold a lock on whatever happened to be the first policy in the array not NULL.
> But nrs_policy_stop0 would unlock the lock on the policy it was called on (already likely a deadlock material) and then relock it.
>
> The problems would arise only if there are more than one nrs policy registered which is theoretically possible, but certainly makes no sense a client (besides, none of the advanced NRS policies
> made it in anyway and I almost feel like they just add unnecessary complication in client-only code).
>
> The code looks elaborate enough as if the first policy lock is to be always used as the guardian lock, but then stop0 behavior might be a bug then?
> Or it's possible we never end up in stop0 due to nrs state machine?
> Let's see what Nikitas and Liang remember about any of this (one of them is the original author of this code, but I am not sure who.)
>
> Nikitas, Liang: The code in question is in nrs_resource_put_safe:
> for (i = 0; i < NRS_RES_MAX; i++) {
> if (pols[i] == NULL)
> continue;
>
> if (nrs == NULL) {
> nrs = pols[i]->pol_nrs;
> spin_lock(&nrs->nrs_lock);
> }
> nrs_policy_put_locked(pols[i]);
> }
>
> if (nrs != NULL)
> spin_unlock(&nrs->nrs_lock);
>

2015-05-22 13:38:23

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 4/4] Staging: lustre: sparse lock warning fix

On Wed, May 20, 2015 at 10:51:34PM +0000, Dilger, Andreas wrote:
> On 2015/05/20, 1:42 PM, "Dan Carpenter" <[email protected]> wrote:
>
> >In Smatch, it the equivalent warning is turned off by default because
> >there are too many false positives, but you can enable it with the
> >--spammy flag.
> >
> >kchecker --spammy drivers/staging/lustre/lustre/ptlrpc/nrs.c
> >
> >drivers/staging/lustre/lustre/ptlrpc/nrs.c:512 nrs_resource_put_safe()
> >warn: 'spin_lock:&nrs->nrs_lock' is sometimes locked here and sometimes
> >unlocked.
>
> Would this be happier with something like:
>
> for (i = 0; i < NRS_RES_MAX; i++) {
> if (pols[i] == NULL)
> continue;
>
>
> if (nrs == NULL) {
> nrs = pols[i]->pol_nrs;
> if (likely(nrs != NULL)) /* make sparse happy */
> spin_lock(&nrs->nrs_lock);
> }
> nrs_policy_put_locked(pols[i]);
> }
>
> if (nrs != NULL)
> spin_unlock(&nrs->nrs_lock);
>
> so that the "if" conditions are the same? The code definitely doesn't
> have a bug, because the lock is only locked once when nrs is first set,
> and only unlocked if it is set. Or is there a comment to put there that
> will quiet the static checker?

Forget about Sparse, it's good at some things and it's fast but it has
crappy flow analysis compared to Smatch.

Adding that check does actually silence the warning in Smatch but it's
a bad idea. Smatch is supposed to know that "nrs" is non-NULL because
of the dereference on the next line. I think this is a recent
regression. I'll look into it.

Smatch doesn't have a way to silence false positives. It's still
developing, so many of these false positives can be solved by improving
the flow analysis.

regards,
dan carpenter