2023-10-25 16:32:06

by Oleg Nesterov

[permalink] [raw]
Subject: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

Hello,

The usage of writeverf_lock is wrong and misleading no matter what and
I can not understand the intent.

nfsd_copy_write_verifier() uses read_seqbegin_or_lock() incorrectly.
"seq" is always even, so read_seqbegin_or_lock() can never take the
lock for writing. We need to make the counter odd for the 2nd round:

--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -359,11 +359,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
*/
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
{
- int seq = 0;
+ int seq, nextseq = 0;

do {
+ seq = nextseq;
read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
+ /* If lockless access failed, take the lock. */
+ nextseq = 1;
} while (need_seqretry(&nn->writeverf_lock, seq));
done_seqretry(&nn->writeverf_lock, seq);
}

OTOH. This function just copies 8 bytes, this makes me think that it doesn't
need the conditional locking and read_seqbegin_or_lock() at all. So perhaps
the (untested) patch below makes more sense? Please note that it should not
change the current behaviour, it just makes the code look correct (and more
optimal but this is minor).

Another question is why we can't simply turn nn->writeverf into seqcount_t.
I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
serialise with itself, right?

Oleg.
---

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c7af1095f6b5..094b765c5397 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
*/
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
{
- int seq = 0;
+ unsigned seq;

do {
- read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
+ seq = read_seqbegin(&nn->writeverf_lock);
memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
- } while (need_seqretry(&nn->writeverf_lock, seq));
- done_seqretry(&nn->writeverf_lock, seq);
+ } while (read_seqretry(&nn->writeverf_lock, seq));
}

static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)


2023-10-25 17:42:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

Hi Chuck,

Thanks for your reply. But I am already sleeping and I can't understand it.
So let me ask a couple of questions.

1. Do you agree that the current nfsd_copy_write_verifier() code makes no sense?

I mean, the usage of read_seqbegin_or_lock() suggests that if the lockless
pass fails it should take writeverf_lock for writing. But this can't happen,
and thus this code doesn't look right no matter what. None of the
read_seqbegin_or_lock/need_seqretry/done_seqretry helpers make any sense
because "seq" is alway even.

2. If yes, which change do you prefer? I'd prefer the patch at the end.

Oleg.

On 10/25, Chuck Lever wrote:
>
> On Wed, Oct 25, 2023 at 06:30:06PM +0200, Oleg Nesterov wrote:
> > Hello,
> >
> > The usage of writeverf_lock is wrong and misleading no matter what and
> > I can not understand the intent.
>
> The structure of the seqlock was introduced in commit 27c438f53e79
> ("nfsd: Support the server resetting the boot verifier").
>
> The NFS write verifier is an 8-byte cookie that is supposed to
> indicate the boot epoch of the server -- simply put, when the server
> restarts, the epoch (and this verifier) changes.
>
> NFSv3 and later have a two-phase write scheme where the client
> sends data to the server (known as an UNSTABLE WRITE), then later
> asks the server to commit that data (a COMMIT). Before the COMMIT,
> that data is not durable and the client must hold onto it until
> the server's COMMIT Reply indicates it's safe for the client to
> discard that data and move on.
>
> When an UNSTABLE WRITE is done, the server reports its current
> epoch as part of each WRITE Reply. If this verifier cookie changes,
> the client knows that the server might have lost previously
> written written-but-uncommitted data, so it must send the WRITEs
> again in that (rare) case.
>
> NFSD abuses this slightly by changing the write verifier whenever
> there is an underlying local write error that might have occurred in
> the background (ie, there was no WRITE or COMMIT operation at the
> time that the server could use to convey the error back to the
> client). This is supposed to trigger clients to send UNSTABLE WRITEs
> again to ensure that data is properly committed to durable storage.
>
> The point of the seqlock is to ensure that
>
> a) a write verifier update does not tear the verifier
> b) a write verifier read does not see a torn verifier
>
> This is a hot path, so we don't want a full spinlock to achieve
> a) and b).
>
> Way back when, the verifier was updated by two separate 32-bit
> stores; hence the risk of tearing.
>
>
> > nfsd_copy_write_verifier() uses read_seqbegin_or_lock() incorrectly.
> > "seq" is always even, so read_seqbegin_or_lock() can never take the
> > lock for writing. We need to make the counter odd for the 2nd round:
> >
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -359,11 +359,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > */
> > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > {
> > - int seq = 0;
> > + int seq, nextseq = 0;
> >
> > do {
> > + seq = nextseq;
> > read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > + /* If lockless access failed, take the lock. */
> > + nextseq = 1;
> > } while (need_seqretry(&nn->writeverf_lock, seq));
> > done_seqretry(&nn->writeverf_lock, seq);
> > }
> >
> > OTOH. This function just copies 8 bytes, this makes me think that it doesn't
> > need the conditional locking and read_seqbegin_or_lock() at all. So perhaps
> > the (untested) patch below makes more sense? Please note that it should not
> > change the current behaviour, it just makes the code look correct (and more
> > optimal but this is minor).
> >
> > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > serialise with itself, right?
>
> "reset" is supposed to be very rare operation. Using a lock in that
> case is probably quite acceptable, as long as reading the verifier
> is wait-free and guaranteed to be untorn.
>
> But a seqcount_t is only 32 bits.
>
>
> > Oleg.
> > ---
> >
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index c7af1095f6b5..094b765c5397 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > */
> > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > {
> > - int seq = 0;
> > + unsigned seq;
> >
> > do {
> > - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > + seq = read_seqbegin(&nn->writeverf_lock);
> > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > - } while (need_seqretry(&nn->writeverf_lock, seq));
> > - done_seqretry(&nn->writeverf_lock, seq);
> > + } while (read_seqretry(&nn->writeverf_lock, seq));
> > }
> >
> > static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> >
>
> --
> Chuck Lever
>

2023-10-25 17:49:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

sorry for the noise, forgot to mention.

I personally don't care about nfsd_copy_write_verifier(), and this code doesn't
look really buggy. I am trying to audit the users of read_seqbegin_or_lock(),
see https://lore.kernel.org/all/[email protected]/


On 10/25, Oleg Nesterov wrote:
>
> Hi Chuck,
>
> Thanks for your reply. But I am already sleeping and I can't understand it.
> So let me ask a couple of questions.
>
> 1. Do you agree that the current nfsd_copy_write_verifier() code makes no sense?
>
> I mean, the usage of read_seqbegin_or_lock() suggests that if the lockless
> pass fails it should take writeverf_lock for writing. But this can't happen,
> and thus this code doesn't look right no matter what. None of the
> read_seqbegin_or_lock/need_seqretry/done_seqretry helpers make any sense
> because "seq" is alway even.
>
> 2. If yes, which change do you prefer? I'd prefer the patch at the end.
>
> Oleg.
>
> On 10/25, Chuck Lever wrote:
> >
> > On Wed, Oct 25, 2023 at 06:30:06PM +0200, Oleg Nesterov wrote:
> > > Hello,
> > >
> > > The usage of writeverf_lock is wrong and misleading no matter what and
> > > I can not understand the intent.
> >
> > The structure of the seqlock was introduced in commit 27c438f53e79
> > ("nfsd: Support the server resetting the boot verifier").
> >
> > The NFS write verifier is an 8-byte cookie that is supposed to
> > indicate the boot epoch of the server -- simply put, when the server
> > restarts, the epoch (and this verifier) changes.
> >
> > NFSv3 and later have a two-phase write scheme where the client
> > sends data to the server (known as an UNSTABLE WRITE), then later
> > asks the server to commit that data (a COMMIT). Before the COMMIT,
> > that data is not durable and the client must hold onto it until
> > the server's COMMIT Reply indicates it's safe for the client to
> > discard that data and move on.
> >
> > When an UNSTABLE WRITE is done, the server reports its current
> > epoch as part of each WRITE Reply. If this verifier cookie changes,
> > the client knows that the server might have lost previously
> > written written-but-uncommitted data, so it must send the WRITEs
> > again in that (rare) case.
> >
> > NFSD abuses this slightly by changing the write verifier whenever
> > there is an underlying local write error that might have occurred in
> > the background (ie, there was no WRITE or COMMIT operation at the
> > time that the server could use to convey the error back to the
> > client). This is supposed to trigger clients to send UNSTABLE WRITEs
> > again to ensure that data is properly committed to durable storage.
> >
> > The point of the seqlock is to ensure that
> >
> > a) a write verifier update does not tear the verifier
> > b) a write verifier read does not see a torn verifier
> >
> > This is a hot path, so we don't want a full spinlock to achieve
> > a) and b).
> >
> > Way back when, the verifier was updated by two separate 32-bit
> > stores; hence the risk of tearing.
> >
> >
> > > nfsd_copy_write_verifier() uses read_seqbegin_or_lock() incorrectly.
> > > "seq" is always even, so read_seqbegin_or_lock() can never take the
> > > lock for writing. We need to make the counter odd for the 2nd round:
> > >
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -359,11 +359,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > > */
> > > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > > {
> > > - int seq = 0;
> > > + int seq, nextseq = 0;
> > >
> > > do {
> > > + seq = nextseq;
> > > read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > > + /* If lockless access failed, take the lock. */
> > > + nextseq = 1;
> > > } while (need_seqretry(&nn->writeverf_lock, seq));
> > > done_seqretry(&nn->writeverf_lock, seq);
> > > }
> > >
> > > OTOH. This function just copies 8 bytes, this makes me think that it doesn't
> > > need the conditional locking and read_seqbegin_or_lock() at all. So perhaps
> > > the (untested) patch below makes more sense? Please note that it should not
> > > change the current behaviour, it just makes the code look correct (and more
> > > optimal but this is minor).
> > >
> > > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > > serialise with itself, right?
> >
> > "reset" is supposed to be very rare operation. Using a lock in that
> > case is probably quite acceptable, as long as reading the verifier
> > is wait-free and guaranteed to be untorn.
> >
> > But a seqcount_t is only 32 bits.
> >
> >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index c7af1095f6b5..094b765c5397 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > > */
> > > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > > {
> > > - int seq = 0;
> > > + unsigned seq;
> > >
> > > do {
> > > - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > > + seq = read_seqbegin(&nn->writeverf_lock);
> > > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > > - } while (need_seqretry(&nn->writeverf_lock, seq));
> > > - done_seqretry(&nn->writeverf_lock, seq);
> > > + } while (read_seqretry(&nn->writeverf_lock, seq));
> > > }
> > >
> > > static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> > >
> >
> > --
> > Chuck Lever
> >

2023-10-25 17:57:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

On 10/25, Chuck Lever wrote:
>
> > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > serialise with itself, right?
>
> "reset" is supposed to be very rare operation. Using a lock in that
> case is probably quite acceptable, as long as reading the verifier
> is wait-free and guaranteed to be untorn.
>
> But a seqcount_t is only 32 bits.

Again, I don't understand you.

Once again, we can turn writeverf into seqcount_t, see the patch below.

But this way nfsd_reset_write_verifier() can race with itself, no?

Oleg
---

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index ec49b200b797..3e2adf3eb15f 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -110,7 +110,7 @@ struct nfsd_net {
bool nfsd_net_up;
bool lockd_up;

- seqlock_t writeverf_lock;
+ seqcount_t writeverf_lock;
unsigned char writeverf[8];

/*
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7ed02fb88a36..6320491018f8 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1523,7 +1523,7 @@ static __net_init int nfsd_net_init(struct net *net)
nn->nfsd4_minorversions = NULL;
nfsd4_init_leases_net(nn);
get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
- seqlock_init(&nn->writeverf_lock);
+ seqcount_init(&nn->writeverf_lock);

return 0;

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c7af1095f6b5..fc4e31411508 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
*/
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
{
- int seq = 0;
+ int seq;

do {
- read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
+ seq = read_seqcount_begin(&nn->writeverf_lock);
memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
- } while (need_seqretry(&nn->writeverf_lock, seq));
- done_seqretry(&nn->writeverf_lock, seq);
+ } while (read_seqcount_retry(&nn->writeverf_lock, seq));
}

static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
@@ -397,9 +396,9 @@ static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
*/
void nfsd_reset_write_verifier(struct nfsd_net *nn)
{
- write_seqlock(&nn->writeverf_lock);
+ write_seqcount_begin(&nn->writeverf_lock);
nfsd_reset_write_verifier_locked(nn);
- write_sequnlock(&nn->writeverf_lock);
+ write_seqcount_end(&nn->writeverf_lock);
}

/*

2023-10-25 17:58:13

by Chuck Lever III

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

On Wed, Oct 25, 2023 at 07:39:31PM +0200, Oleg Nesterov wrote:
> Hi Chuck,
>
> Thanks for your reply. But I am already sleeping and I can't understand it.

I was responding to "I can not understand the intent." But also I
was hoping that explanation would help you provide a correct
replacement for the existing code.


> So let me ask a couple of questions.
>
> 1. Do you agree that the current nfsd_copy_write_verifier() code makes no sense?

Probably.


> I mean, the usage of read_seqbegin_or_lock() suggests that if the lockless
> pass fails it should take writeverf_lock for writing. But this can't happen,
> and thus this code doesn't look right no matter what. None of the
> read_seqbegin_or_lock/need_seqretry/done_seqretry helpers make any sense
> because "seq" is alway even.

> 2. If yes, which change do you prefer? I'd prefer the patch at the end.

Based on my limited understanding of read_seqbegin(), the patch at
the end seems cleanest and is on-point. Please post an official
version of that to linux-nfs@ with a full patch description, and
I'll see that it gets into v6.8-rc with proper tags, review, and
testing.


> Oleg.
>
> On 10/25, Chuck Lever wrote:
> >
> > On Wed, Oct 25, 2023 at 06:30:06PM +0200, Oleg Nesterov wrote:
> > > Hello,
> > >
> > > The usage of writeverf_lock is wrong and misleading no matter what and
> > > I can not understand the intent.
> >
> > The structure of the seqlock was introduced in commit 27c438f53e79
> > ("nfsd: Support the server resetting the boot verifier").
> >
> > The NFS write verifier is an 8-byte cookie that is supposed to
> > indicate the boot epoch of the server -- simply put, when the server
> > restarts, the epoch (and this verifier) changes.
> >
> > NFSv3 and later have a two-phase write scheme where the client
> > sends data to the server (known as an UNSTABLE WRITE), then later
> > asks the server to commit that data (a COMMIT). Before the COMMIT,
> > that data is not durable and the client must hold onto it until
> > the server's COMMIT Reply indicates it's safe for the client to
> > discard that data and move on.
> >
> > When an UNSTABLE WRITE is done, the server reports its current
> > epoch as part of each WRITE Reply. If this verifier cookie changes,
> > the client knows that the server might have lost previously
> > written written-but-uncommitted data, so it must send the WRITEs
> > again in that (rare) case.
> >
> > NFSD abuses this slightly by changing the write verifier whenever
> > there is an underlying local write error that might have occurred in
> > the background (ie, there was no WRITE or COMMIT operation at the
> > time that the server could use to convey the error back to the
> > client). This is supposed to trigger clients to send UNSTABLE WRITEs
> > again to ensure that data is properly committed to durable storage.
> >
> > The point of the seqlock is to ensure that
> >
> > a) a write verifier update does not tear the verifier
> > b) a write verifier read does not see a torn verifier
> >
> > This is a hot path, so we don't want a full spinlock to achieve
> > a) and b).
> >
> > Way back when, the verifier was updated by two separate 32-bit
> > stores; hence the risk of tearing.
> >
> >
> > > nfsd_copy_write_verifier() uses read_seqbegin_or_lock() incorrectly.
> > > "seq" is always even, so read_seqbegin_or_lock() can never take the
> > > lock for writing. We need to make the counter odd for the 2nd round:
> > >
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -359,11 +359,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > > */
> > > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > > {
> > > - int seq = 0;
> > > + int seq, nextseq = 0;
> > >
> > > do {
> > > + seq = nextseq;
> > > read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > > + /* If lockless access failed, take the lock. */
> > > + nextseq = 1;
> > > } while (need_seqretry(&nn->writeverf_lock, seq));
> > > done_seqretry(&nn->writeverf_lock, seq);
> > > }
> > >
> > > OTOH. This function just copies 8 bytes, this makes me think that it doesn't
> > > need the conditional locking and read_seqbegin_or_lock() at all. So perhaps
> > > the (untested) patch below makes more sense? Please note that it should not
> > > change the current behaviour, it just makes the code look correct (and more
> > > optimal but this is minor).
> > >
> > > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > > serialise with itself, right?
> >
> > "reset" is supposed to be very rare operation. Using a lock in that
> > case is probably quite acceptable, as long as reading the verifier
> > is wait-free and guaranteed to be untorn.
> >
> > But a seqcount_t is only 32 bits.
> >
> >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index c7af1095f6b5..094b765c5397 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> > > */
> > > void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> > > {
> > > - int seq = 0;
> > > + unsigned seq;
> > >
> > > do {
> > > - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > > + seq = read_seqbegin(&nn->writeverf_lock);
> > > memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > > - } while (need_seqretry(&nn->writeverf_lock, seq));
> > > - done_seqretry(&nn->writeverf_lock, seq);
> > > + } while (read_seqretry(&nn->writeverf_lock, seq));
> > > }
> > >
> > > static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> > >
> >
> > --
> > Chuck Lever
> >
>

--
Chuck Lever

2023-10-25 18:09:20

by Chuck Lever III

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

On Wed, Oct 25, 2023 at 07:54:36PM +0200, Oleg Nesterov wrote:
> On 10/25, Chuck Lever wrote:
> >
> > > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > > serialise with itself, right?
> >
> > "reset" is supposed to be very rare operation. Using a lock in that
> > case is probably quite acceptable, as long as reading the verifier
> > is wait-free and guaranteed to be untorn.
> >
> > But a seqcount_t is only 32 bits.
>
> Again, I don't understand you.
>
> Once again, we can turn writeverf into seqcount_t, see the patch below.

The patch below does not turn "writeverf" into a seqcount_t, it
turns "writeverf_lock" into a seqcount_t. "writeverf" is an 8-byte
field, that's why I said "seqcount_t is only 32 bits" -- that is
not an adequate replacement for the 8-byte "writeverf" field.

Your original proposal made no sense. But I see now what you
would like to change.

I'm not familiar enough with these primitives to have a strong
opinion. What do you think would be the benefit?


> But this way nfsd_reset_write_verifier() can race with itself, no?

> Oleg
> ---
>
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index ec49b200b797..3e2adf3eb15f 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -110,7 +110,7 @@ struct nfsd_net {
> bool nfsd_net_up;
> bool lockd_up;
>
> - seqlock_t writeverf_lock;
> + seqcount_t writeverf_lock;
> unsigned char writeverf[8];
>
> /*
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 7ed02fb88a36..6320491018f8 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1523,7 +1523,7 @@ static __net_init int nfsd_net_init(struct net *net)
> nn->nfsd4_minorversions = NULL;
> nfsd4_init_leases_net(nn);
> get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
> - seqlock_init(&nn->writeverf_lock);
> + seqcount_init(&nn->writeverf_lock);
>
> return 0;
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index c7af1095f6b5..fc4e31411508 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> */
> void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> {
> - int seq = 0;
> + int seq;
>
> do {
> - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> + seq = read_seqcount_begin(&nn->writeverf_lock);
> memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> - } while (need_seqretry(&nn->writeverf_lock, seq));
> - done_seqretry(&nn->writeverf_lock, seq);
> + } while (read_seqcount_retry(&nn->writeverf_lock, seq));
> }
>
> static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> @@ -397,9 +396,9 @@ static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> */
> void nfsd_reset_write_verifier(struct nfsd_net *nn)
> {
> - write_seqlock(&nn->writeverf_lock);
> + write_seqcount_begin(&nn->writeverf_lock);
> nfsd_reset_write_verifier_locked(nn);
> - write_sequnlock(&nn->writeverf_lock);
> + write_seqcount_end(&nn->writeverf_lock);
> }
>
> /*
>

--
Chuck Lever

2023-10-25 18:12:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()

On 10/25, Chuck Lever wrote:
>
> On Wed, Oct 25, 2023 at 07:39:31PM +0200, Oleg Nesterov wrote:
> > Hi Chuck,
> >
> > Thanks for your reply. But I am already sleeping and I can't understand it.
>
> I was responding to "I can not understand the intent." But also I
> was hoping that explanation would help you provide a correct
> replacement for the existing code.

In case I was not clear, I have already provided the replacement for the
existing code which looks "correct" for me ;) Nevermind, please forget.

> > 1. Do you agree that the current nfsd_copy_write_verifier() code makes no sense?
>
> Probably.
>
>
> > I mean, the usage of read_seqbegin_or_lock() suggests that if the lockless
> > pass fails it should take writeverf_lock for writing. But this can't happen,
> > and thus this code doesn't look right no matter what. None of the
> > read_seqbegin_or_lock/need_seqretry/done_seqretry helpers make any sense
> > because "seq" is alway even.
>
> > 2. If yes, which change do you prefer? I'd prefer the patch at the end.
>
> Based on my limited understanding of read_seqbegin(), the patch at
> the end seems cleanest and is on-point. Please post an official
> version of that to linux-nfs@ with a full patch description, and
> I'll see that it gets into v6.8-rc with proper tags, review, and
> testing.

Ok, will do tomorrow.

Thanks,

Oleg.

2023-10-26 14:52:29

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] nfsd_copy_write_verifier: use read_seqbegin() rather than read_seqbegin_or_lock()

The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect,
this code can never take ->writeverf_lock for writing.

I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
and nfsd_reset_write_verifier() is supposed to be very rare operation
so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
without changing the behaviour.

Signed-off-by: Oleg Nesterov <[email protected]>
---
fs/nfsd/nfssvc.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c7af1095f6b5..094b765c5397 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
*/
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
{
- int seq = 0;
+ unsigned seq;

do {
- read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
+ seq = read_seqbegin(&nn->writeverf_lock);
memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
- } while (need_seqretry(&nn->writeverf_lock, seq));
- done_seqretry(&nn->writeverf_lock, seq);
+ } while (read_seqretry(&nn->writeverf_lock, seq));
}

static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
--
2.25.1.362.g51ebf55


2023-10-27 15:53:57

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] nfsd_copy_write_verifier: use read_seqbegin() rather than read_seqbegin_or_lock()

On Thu, Oct 26, 2023 at 04:50:18PM +0200, Oleg Nesterov wrote:
> The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
> is wrong. "seq" is always even and thus "or_lock" has no effect,
> this code can never take ->writeverf_lock for writing.
>
> I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
> and nfsd_reset_write_verifier() is supposed to be very rare operation
> so we do not need the adaptive locking in this case.
>
> Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
> without changing the behaviour.

I was debating whether to add Fixes/Cc-stable, but if the behavior
doesn't change, this doesn't need a backport.

Just waiting now on R-b.


> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> fs/nfsd/nfssvc.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index c7af1095f6b5..094b765c5397 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> */
> void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> {
> - int seq = 0;
> + unsigned seq;
>
> do {
> - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> + seq = read_seqbegin(&nn->writeverf_lock);
> memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> - } while (need_seqretry(&nn->writeverf_lock, seq));
> - done_seqretry(&nn->writeverf_lock, seq);
> + } while (read_seqretry(&nn->writeverf_lock, seq));
> }
>
> static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> --
> 2.25.1.362.g51ebf55
>
>

--
Chuck Lever

2023-10-27 19:36:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] nfsd_copy_write_verifier: use read_seqbegin() rather than read_seqbegin_or_lock()

On 10/27, Chuck Lever wrote:
>
> On Thu, Oct 26, 2023 at 04:50:18PM +0200, Oleg Nesterov wrote:
> > The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
> > is wrong. "seq" is always even and thus "or_lock" has no effect,
> > this code can never take ->writeverf_lock for writing.
> >
> > I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
> > and nfsd_reset_write_verifier() is supposed to be very rare operation
> > so we do not need the adaptive locking in this case.
> >
> > Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
> > without changing the behaviour.
>
> I was debating whether to add Fixes/Cc-stable, but if the behavior
> doesn't change, this doesn't need a backport.

Yes, yes, sorry for confusion. This code is not buggy. Just a) it looks
confusing because read_seqbegin_or_lock() doesn't do what it is supposed
to do, and b) I am going to change the semantics of done_seqretry() to
enforce the locking on the 2nd pass.

Chuck, I can reword the changelog to make it more clear and send V2 if
you think this makes sense.

Thanks,

Oleg.

2023-10-27 19:41:14

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] nfsd_copy_write_verifier: use read_seqbegin() rather than read_seqbegin_or_lock()



> On Oct 27, 2023, at 12:34 PM, Oleg Nesterov <[email protected]> wrote:
>
> On 10/27, Chuck Lever wrote:
>>
>> On Thu, Oct 26, 2023 at 04:50:18PM +0200, Oleg Nesterov wrote:
>>> The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
>>> is wrong. "seq" is always even and thus "or_lock" has no effect,
>>> this code can never take ->writeverf_lock for writing.
>>>
>>> I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
>>> and nfsd_reset_write_verifier() is supposed to be very rare operation
>>> so we do not need the adaptive locking in this case.
>>>
>>> Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
>>> without changing the behaviour.
>>
>> I was debating whether to add Fixes/Cc-stable, but if the behavior
>> doesn't change, this doesn't need a backport.
>
> Yes, yes, sorry for confusion. This code is not buggy. Just a) it looks
> confusing because read_seqbegin_or_lock() doesn't do what it is supposed
> to do, and b) I am going to change the semantics of done_seqretry() to
> enforce the locking on the 2nd pass.
>
> Chuck, I can reword the changelog to make it more clear and send V2 if
> you think this makes sense.

No confusion, the changelog is clear to me. I'm simply stating
my intention for other reviewers and the lore archive that I
will leave off Fixes/Cc-stable when I commit your patch.

So far there has been no review comment that suggests we need a v2.


--
Chuck Lever


2023-10-27 20:28:42

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd_copy_write_verifier: use read_seqbegin() rather than read_seqbegin_or_lock()

On Thu, 2023-10-26 at 16:50 +0200, Oleg Nesterov wrote:
> The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
> is wrong. "seq" is always even and thus "or_lock" has no effect,
> this code can never take ->writeverf_lock for writing.
>
> I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
> and nfsd_reset_write_verifier() is supposed to be very rare operation
> so we do not need the adaptive locking in this case.
>
> Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
> without changing the behaviour.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> fs/nfsd/nfssvc.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index c7af1095f6b5..094b765c5397 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> */
> void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> {
> - int seq = 0;
> + unsigned seq;
>
> do {
> - read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> + seq = read_seqbegin(&nn->writeverf_lock);
> memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> - } while (need_seqretry(&nn->writeverf_lock, seq));
> - done_seqretry(&nn->writeverf_lock, seq);
> + } while (read_seqretry(&nn->writeverf_lock, seq));
> }
>
> static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)

Reviewed-by: Jeff Layton <[email protected]>