Hi Al,
On Fri, 26 Feb 2016, Al Viro wrote:
> You have, modulo printks and BUG_ON(),
> {
> 	struct dentry *realdn;
> 	/* dn must be unhashed */
> 	if (!d_unhashed(dn))
> 		d_drop(dn);
> 	realdn = d_splice_alias(in, dn);
> 	if (IS_ERR(realdn)) {
> 		if (prehash)
> 			*prehash = false; /* don't rehash on error */
> 		dn = realdn; /* note realdn contains the error */
> 		goto out;
> 	} else if (realdn) {
> 		dput(dn);
> 		dn = realdn;
> 	}
> 	if ((!prehash || *prehash) && d_unhashed(dn))
> 		d_rehash(dn);
>
> When d_splice_alias() returns NULL it has hashed the dentry you'd given it;
> when it returns a different dentry, that dentry is also returned hashed.
> IOW, d_rehash(dn) in there should never be called.
>
> If you have a case when it _is_ called, you've found a bug somewhere and
> I'd like to see details. AFAICS, the whole prehash thing appears to be
> pointless - even the place where we modify *prehash, since in that case
> we return ERR_PTR() and the only caller passing non-NULL prehash (&have_lease)
> buggers off on such return value past all code that would look at have_lease
> value.
Right.
> One possible reading is that you want to prevent hashing in !have_lease
> case of
> dn = splice_dentry(dn, in, &have_lease);
> If that's the case, you might have a problem, since it will be hashed no
> matter what...
In this case it doesn't actually matter if it is hashed or not, since
we will look at the lease state on the dentry before trusting it...
This code dates back to when Ceph was originally upstreamed, so the
history is murky, but I expect at that point I wanted to avoid hashing in
the no-lease case. But I don't think it matters. We should just remove
the prehash argument from splice_dentry entirely.
Zheng, does that sound right?
Thanks!
sage
> On Mar 1, 2016, at 22:50, Sage Weil <[email protected]> wrote:
>
> Hi Al,
>
> On Fri, 26 Feb 2016, Al Viro wrote:
>> You have, modulo printks and BUG_ON(),
>> {
>> 	struct dentry *realdn;
>> 	/* dn must be unhashed */
>> 	if (!d_unhashed(dn))
>> 		d_drop(dn);
>> 	realdn = d_splice_alias(in, dn);
>> 	if (IS_ERR(realdn)) {
>> 		if (prehash)
>> 			*prehash = false; /* don't rehash on error */
>> 		dn = realdn; /* note realdn contains the error */
>> 		goto out;
>> 	} else if (realdn) {
>> 		dput(dn);
>> 		dn = realdn;
>> 	}
>> 	if ((!prehash || *prehash) && d_unhashed(dn))
>> 		d_rehash(dn);
>>
>> When d_splice_alias() returns NULL it has hashed the dentry you'd given it;
>> when it returns a different dentry, that dentry is also returned hashed.
>> IOW, d_rehash(dn) in there should never be called.
>>
>> If you have a case when it _is_ called, you've found a bug somewhere and
>> I'd like to see details. AFAICS, the whole prehash thing appears to be
>> pointless - even the place where we modify *prehash, since in that case
>> we return ERR_PTR() and the only caller passing non-NULL prehash (&have_lease)
>> buggers off on such return value past all code that would look at have_lease
>> value.
>
> Right.
>
>> One possible reading is that you want to prevent hashing in !have_lease
>> case of
>> dn = splice_dentry(dn, in, &have_lease);
>> If that's the case, you might have a problem, since it will be hashed no
>> matter what...
>
> In this case it doesn't actually matter if it is hashed or not, since
> we will look at the lease state on the dentry before trusting it...
>
> This code dates back to when Ceph was originally upstreamed, so the
> history is murky, but I expect at that point I wanted to avoid hashing in
> the no-lease case. But I don't think it matters. We should just remove
> the prehash argument from splice_dentry entirely.
>
> Zheng, does that sound right?
Yes. I think we can remove the d_rehash(dn) call and rehash parameter.
Regards
Yan, Zheng
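
For illustration only, here is a userspace model of what splice_dentry()
could look like with the prehash argument and the d_rehash() call removed.
This is a sketch, not the Ceph code: struct dentry, struct inode, the "fail"
and "alias" fields, and every helper below are simplified stand-ins that only
model the contract described above (d_splice_alias() returns NULL after
hashing the dentry it was given, or returns a different, already hashed
dentry).

```c
/* Userspace model only, NOT kernel code. All types and helpers are
 * simplified stand-ins (assumptions) for the real dcache API. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

struct dentry;
struct inode  { struct dentry *alias; bool fail; }; /* hypothetical fields */
struct dentry { struct inode *inode; bool hashed; int refcount; };

static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static bool d_unhashed(const struct dentry *dn) { return !dn->hashed; }
static void d_drop(struct dentry *dn) { dn->hashed = false; }
static void dput(struct dentry *dn) { dn->refcount--; }

/* Model of the contract: NULL means dn itself was attached and hashed;
 * a non-NULL, non-error result is an existing alias, returned hashed
 * with an extra reference. */
static struct dentry *d_splice_alias(struct inode *in, struct dentry *dn)
{
	if (in && in->fail)
		return ERR_PTR(-ELOOP);
	if (in && in->alias && in->alias != dn) {
		in->alias->hashed = true;
		in->alias->refcount++;
		return in->alias;
	}
	dn->inode = in;
	if (in)
		in->alias = dn;
	dn->hashed = true;
	return NULL;
}

/* Simplified splice_dentry(): no prehash argument, no d_rehash(). */
static struct dentry *splice_dentry(struct dentry *dn, struct inode *in)
{
	struct dentry *realdn;

	/* dn must be unhashed */
	if (!d_unhashed(dn))
		d_drop(dn);
	realdn = d_splice_alias(in, dn);
	if (IS_ERR(realdn))
		return realdn;	/* carries the error */
	if (realdn) {
		dput(dn);
		dn = realdn;
	}
	return dn;		/* hashed in both success cases */
}
```

In this model every successful return is already hashed, which is why a
trailing d_unhashed()/d_rehash() check would never fire.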
On Wed, Mar 02, 2016 at 11:00:01AM +0800, Yan, Zheng wrote:
> > This code dates back to when Ceph was originally upstreamed, so the
> > history is murky, but I expect at that point I wanted to avoid hashing in
> > the no-lease case. But I don't think it matters. We should just remove
> > the prehash argument from splice_dentry entirely.
> >
> > Zheng, does that sound right?
>
> Yes. I think we can remove the d_rehash(dn) call and rehash parameter.
Another question in the same general area:
	/* null dentry? */
	if (!rinfo->head->is_target) {
		dout("fill_trace null dentry\n");
		if (d_really_is_positive(dn)) {
			ceph_dir_clear_ordered(dir);
			dout("d_delete %p\n", dn);
			d_delete(dn);
		} else {
			dout("d_instantiate %p NULL\n", dn);
			d_instantiate(dn, NULL);
			if (have_lease && d_unhashed(dn))
				d_rehash(dn);
			update_dentry_lease(dn, rinfo->dlease,
					    session,
					    req->r_request_started);
		}
		goto done;
	}
What's that d_instantiate() about? We have just checked that it's
negative; what's the point of setting ->d_inode to NULL again? Would it
be OK if we just do
		} else {
			if (have_lease && d_unhashed(dn))
				d_add(dn, NULL);
			update_dentry_lease(dn, rinfo->dlease,
					    session,
					    req->r_request_started);
		}
in there? As an aside, tracking back to the originating fs method is
painful as hell ;-/ I _think_ that rehash can be hit during ->lookup()
returning a negative, but I wouldn't bet a dime on it not happening from
other methods... AFAICS, the change should be OK regardless of what
it's been called from, but... _ouch_. Is it documented anywhere public?
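
As a sanity check of the proposed change, here is a small userspace model
comparing the two variants of the negative-dentry branch. It is only a
sketch under stated assumptions: the struct layouts are invented, d_add() is
modelled as d_instantiate() followed by d_rehash(), and
update_dentry_lease() is left out because both variants call it identically.

```c
/* Userspace model only, NOT kernel code. Helpers are simplified
 * stand-ins (assumptions) for the dcache API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct inode  { int unused; };
struct dentry { struct inode *d_inode; bool hashed; };

static bool d_unhashed(const struct dentry *dn) { return !dn->hashed; }
static void d_instantiate(struct dentry *dn, struct inode *in)
{
	dn->d_inode = in;
}
static void d_rehash(struct dentry *dn) { dn->hashed = true; }

/* Assumption: d_add() is d_instantiate() followed by d_rehash(). */
static void d_add(struct dentry *dn, struct inode *in)
{
	d_instantiate(dn, in);
	d_rehash(dn);
}

/* Existing branch: redundant d_instantiate(dn, NULL), then rehash. */
static void null_dentry_old(struct dentry *dn, bool have_lease)
{
	d_instantiate(dn, NULL);
	if (have_lease && d_unhashed(dn))
		d_rehash(dn);
}

/* Proposed branch: only touch the dentry when it needs hashing. */
static void null_dentry_new(struct dentry *dn, bool have_lease)
{
	if (have_lease && d_unhashed(dn))
		d_add(dn, NULL);
}

/* Run both variants on identical negative dentries and compare. */
static bool same_result(bool have_lease, bool hashed)
{
	struct dentry a = { NULL, hashed };
	struct dentry b = { NULL, hashed };

	null_dentry_old(&a, have_lease);
	null_dentry_new(&b, have_lease);
	return a.d_inode == b.d_inode && a.hashed == b.hashed;
}
```

In this model both variants leave a negative dentry in the same state for
every combination of have_lease and hashed/unhashed; it says nothing about
locking or RCU-walk invalidation in the real kernel.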
On Mon, 7 Mar 2016, Al Viro wrote:
> On Wed, Mar 02, 2016 at 11:00:01AM +0800, Yan, Zheng wrote:
>
> > > This code dates back to when Ceph was originally upstreamed, so the
> > > history is murky, but I expect at that point I wanted to avoid hashing in
> > > the no-lease case. But I don't think it matters. We should just remove
> > > the prehash argument from splice_dentry entirely.
> > >
> > > Zheng, does that sound right?
> >
> > Yes. I think we can remove the d_rehash(dn) call and rehash parameter.
>
> Another question in the same general area:
> 	/* null dentry? */
> 	if (!rinfo->head->is_target) {
> 		dout("fill_trace null dentry\n");
> 		if (d_really_is_positive(dn)) {
> 			ceph_dir_clear_ordered(dir);
> 			dout("d_delete %p\n", dn);
> 			d_delete(dn);
> 		} else {
> 			dout("d_instantiate %p NULL\n", dn);
> 			d_instantiate(dn, NULL);
> 			if (have_lease && d_unhashed(dn))
> 				d_rehash(dn);
> 			update_dentry_lease(dn, rinfo->dlease,
> 					    session,
> 					    req->r_request_started);
> 		}
> 		goto done;
> 	}
> What's that d_instantiate() about? We have just checked that it's
> negative; what's the point of setting ->d_inode to NULL again? Would it
> be OK if we just do
> 		} else {
> 			if (have_lease && d_unhashed(dn))
> 				d_add(dn, NULL);
> 			update_dentry_lease(dn, rinfo->dlease,
> 					    session,
> 					    req->r_request_started);
> 		}
> in there?
That looks okay, but changing d_rehash to d_add still means you're doing
the d_instantiate(dn, NULL) in the d_unhashed case; is there a reason you
changed that line? Is the dentry_rcuwalk_invalidate in __d_instantiate
important before rehashing?
> As an aside, tracking back to the originating fs method is
> painful as hell ;-/ I _think_ that rehash can be hit during ->lookup()
> returning a negative, but I wouldn't bet a dime on it not happening from
> other methods... AFAICS, the change should be OK regardless of what
> it's been called from, but... _ouch_. Is it documented anywhere public?
It is a pain to follow, yes. FWIW this whole block is predicated on
req->r_locked_dir being non-NULL (i.e., VFS holds dir->i_mutex), which is
only true for lookup, create operations (mkdir/mknod/symlink/etc.),
atomic_open, and the .get_name export op. There's not much documentation
beyond a description of the meaning of fields (e.g. r_locked_dir) in
fs/ceph/mds_client.h ...
sage
On Mon, Mar 07, 2016 at 06:25:13AM -0500, Sage Weil wrote:
> That looks okay, but changing d_rehash to d_add still means you're doing
> the d_instantiate(dn, NULL) in the d_unhashed case; is there a reason you
> changed that line? Is the dentry_rcuwalk_invalidate in __d_instantiate
> important before rehashing?
d_add() gets uninlined in my queue, along with some locking massage to avoid
extra bouncing of ->d_lock we do for no good reason. So that call is also
going away.
> > As an aside, tracking back to the originating fs method is
> > painful as hell ;-/ I _think_ that rehash can be hit during ->lookup()
> > returning a negative, but I wouldn't bet a dime on it not happening from
> > other methods... AFAICS, the change should be OK regardless of what
> > it's been called from, but... _ouch_. Is it documented anywhere public?
>
> It is a pain to follow, yes. FWIW this whole block is predicated on
> req->r_locked_dir being non-NULL (i.e., VFS holds dir->i_mutex), which is
> only true for lookup, create operations (mkdir/mknod/symlink/etc.),
> atomic_open, and the .get_name export op. There's not much documentation
> beyond a description of the meaning of fields (e.g. r_locked_dir) in
> fs/ceph/mds_client.h ...
Yes. Now consider the plans to make ->i_mutex an rwsem and weaken it for
->lookup() (i.e. held only shared when it's held for lookup alone). IOW,
the aforementioned queue... I would rather avoid d_rehash() in ->lookup()
instances; the tricky part is that we want to avoid parallel lookups on
the *same* name, so we need an "in parallel lookup" state for dentries and
a mechanism for waiting until it is finished. d_add()/d_splice_alias()
on such a dentry would take it out of that state and I would rather
avoid dealing with it in d_rehash()...
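
To make the idea concrete, here is a rough userspace sketch of such an
"in parallel lookup" state. It is not how the kernel does or will do it:
the pthread mutex and condition variable, the struct fields, and the names
dentry_init(), lookup_start_or_wait() and model_d_add() are all hypothetical.
The only point is that the first lookup on a name claims the dentry, later
lookups on the same name wait, and the d_add()-style call ends the state.

```c
/* Userspace sketch only, NOT kernel code. All names here are
 * hypothetical and pthread primitives stand in for dcache locking. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct dentry {
	pthread_mutex_t lock;
	pthread_cond_t  lookup_done;
	bool in_lookup;		/* a lookup on this name is in progress */
	bool hashed;		/* result has been published */
};

static void dentry_init(struct dentry *dn)
{
	pthread_mutex_init(&dn->lock, NULL);
	pthread_cond_init(&dn->lookup_done, NULL);
	dn->in_lookup = false;
	dn->hashed = false;
}

/* Returns true if the caller now owns the lookup for this name.
 * Returns false if a result already exists or another lookup was in
 * progress; in the latter case we wait here until it has finished. */
static bool lookup_start_or_wait(struct dentry *dn)
{
	bool owner = false;

	pthread_mutex_lock(&dn->lock);
	if (!dn->in_lookup && !dn->hashed) {
		dn->in_lookup = true;
		owner = true;
	} else {
		while (dn->in_lookup)
			pthread_cond_wait(&dn->lookup_done, &dn->lock);
	}
	pthread_mutex_unlock(&dn->lock);
	return owner;
}

/* Stand-in for d_add()/d_splice_alias(): publish the result, take the
 * dentry out of the in-lookup state and wake any waiters. */
static void model_d_add(struct dentry *dn)
{
	pthread_mutex_lock(&dn->lock);
	dn->hashed = true;
	dn->in_lookup = false;
	pthread_cond_broadcast(&dn->lookup_done);
	pthread_mutex_unlock(&dn->lock);
}
```

In this sketch a bare "set hashed" helper, the analogue of d_rehash(),
would have to duplicate the state clearing and wakeup as well, which is the
kind of extra case the paragraph above wants to avoid.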
I was going to post the series last week, got sidetracked on various fun
things I'd found (ecryptfs and lustre). Hopefully will get all of that
sorted out and the whole series posted for review tomorrow or on Wednesday...