Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp5000881imm; Mon, 14 May 2018 17:38:00 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqf62sc1/9fahtaRIae/ZD5qcF4KZ/ArpVoJcI0sohH+MAvq/8+95qw+H2FK+e6e8g/ySj2 X-Received: by 2002:a17:902:7283:: with SMTP id d3-v6mr12188584pll.192.1526344680183; Mon, 14 May 2018 17:38:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526344680; cv=none; d=google.com; s=arc-20160816; b=z+0x1Zr9UPi6kyZRuKG0KktpsOtdneFPuPPzqXIJiLSwAeaflSYQKJpzuEerP7jx3h UfF1KdhzGWrSHm2/jB3zKRTpx37zinw8j/J4umKh/ywO+prs3+1PRM9ZL6/v6qHPV2Mt 9n70gYVnqdSoPCHzeN3nbEVr3QXaXlGiyG/bOvtuZKLusZatm9+4J21qgw/nBDOScDqg LRIViTt4dDCrSli4Gki370lx2bXdK3uBT2hL2UqedsA8O+c5R5+RN9f2MTL2og6dq9W9 Ns8IQd/PW1bWz6xF8gSVWAl7IdAUQgrsyiN0+oiSorwCLFPUmZPqs46u4qtjViuI55q1 mb8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature :arc-authentication-results; bh=lURrf6k91yvZs8grGnc6Z1n3Ov72d0IidAzJXc22abY=; b=nYfgxr1QcH4/UqBc9Beiraqtcc//X8AAkkXLiRZV83bKzU7reyY5DoJzG4s5gricRq 2cLU/40wbpIrBuUNGHbHF4ZqdnDd5qGpSeawxrb+S7NBe1Oe7Od5LjA3TQNzzdX4jVEZ mHP0HzSiwI51ckXqnr8UpRQEXqNSDvVpVwUARtiHT+askBLy6JeAcgcnXKepwnAiX9GU wIPpUAofMxwvTv9x57Ki3JILBSGIiUpQ3fWJ1bdTYlVOacZCxRS3Yh6UmCTEuT14ixDr KRqVUbpE/4HD97aJiuGVmtdS8rzxD6i5llc9kuXOp/oemyZBPEV5VNwzdx1R+32xxnvC Wdbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=casper.20170209 header.b=PoFxmlXV; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g8-v6si1641456pgr.474.2018.05.14.17.37.45; Mon, 14 May 2018 17:38:00 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=casper.20170209 header.b=PoFxmlXV; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752414AbeEOAhd (ORCPT + 99 others); Mon, 14 May 2018 20:37:33 -0400 Received: from casper.infradead.org ([85.118.1.10]:33838 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752078AbeEOAhc (ORCPT ); Mon, 14 May 2018 20:37:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Message-ID:In-Reply-To:Subject:cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=lURrf6k91yvZs8grGnc6Z1n3Ov72d0IidAzJXc22abY=; b=PoFxmlXVQ6j16KlAQt3R/0LXD 0jxJrIu67O5XKt/7uOSW41kxHiaPk6qza/PwXkU4BvypUFrFJMnljUHuITiEYjxV32wjIoc2nghJn QLLKqPoxb3SiJ6NEaC6bPR/TYHgzadV0sQy3aYyXOgsVsHIUuGny+GnT9Upnk6OQ8VSlYtOBhzJNA YgXTkalMAd0tV08g71G6+8aPpbueAxhRn4Fg4uzIA17bq26VYuRWy0drpG8g96cu/xewRqQv09Txr 1QL6/Lv1IDKecw/coZJMau6ChYExPh1NAFqILdRKWBzHFR4U3hfq7pqepPO1q2LzV6yag+9coGhPD KovDS4lnQ==; Received: from jsimmons (helo=localhost) by casper.infradead.org with local-esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1fINxl-0002jO-DI; Tue, 15 May 2018 00:37:12 +0000 Date: Tue, 15 May 2018 01:37:07 +0100 (BST) From: James Simmons To: NeilBrown cc: Greg Kroah-Hartman , devel@driverdev.osuosl.org, Andreas Dilger , Oleg Drokin , Lai Siyao , Jinshan Xiong , Linux Kernel Mailing List , Lustre Development List Subject: Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode In-Reply-To: <876044fcgg.fsf@notabene.neil.brown.name> Message-ID: References: <1525285308-15347-1-git-send-email-jsimmons@infradead.org> <1525285308-15347-5-git-send-email-jsimmons@infradead.org> <876044fcgg.fsf@notabene.neil.brown.name> User-Agent: Alpine 2.21 (LFD 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180515_013709_741458_6AB502B1 X-CRM114-Status: GOOD ( 35.39 ) X-Spam-Score: -0.0 (/) X-Spam-Report: SpamAssassin version 3.4.1 on casper.infradead.org summary: Content analysis details: (-0.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 NO_RELAYS Informational: message was not relayed via SMTP Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Wed, May 02 2018, James Simmons wrote: > > > From: Lai Siyao > > > > Currently we set LU_OBJECT_HEARD_BANSHEE on object when we want > > to remove object from cache, but this may lead to deadlock, because > > when other process lookup such object, it needs to wait for this > > object until release (done at last refcount put), while that process > > maybe already hold an LDLM lock. > > > > Now that current code can handle dying object correctly, we can just > > return such object in lookup, thus the above deadlock can be avoided. > > I think one of the reasons that I didn't apply this to mainline myself > is that "Now that" comment. When is the "now" that it is referring to? > Are were sure that all code in mainline "can handle dying objects > correctly"?? So I talked to Lai and he posted the LU-9049 ticket what patches need to land before this one. Only one patch is of concern and its for LU-9203 which doesn't apply to the staging tree since we don't have the LNet SMP updates in our tree. I saved notes about making sure LU-9203 lands together with the future LNet SMP changes. As it stands it is safe to land to staging. > > Signed-off-by: Lai Siyao > > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9049 > > Reviewed-on: https://review.whamcloud.com/26965 > > Reviewed-by: Alex Zhuravlev > > Tested-by: Cliff White > > Reviewed-by: Fan Yong > > Reviewed-by: Oleg Drokin > > Signed-off-by: James Simmons > > --- > > drivers/staging/lustre/lustre/include/lu_object.h | 2 +- > > drivers/staging/lustre/lustre/obdclass/lu_object.c | 82 +++++++++------------- > > 2 files changed, 36 insertions(+), 48 deletions(-) > > > > diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h > > index f29bbca..232063a 100644 > > --- a/drivers/staging/lustre/lustre/include/lu_object.h > > +++ b/drivers/staging/lustre/lustre/include/lu_object.h > > @@ -673,7 +673,7 @@ static inline void lu_object_get(struct lu_object *o) > > } > > > > /** > > - * Return true of object will not be cached after last reference to it is > > + * Return true if object will not be cached after last reference to it is > > * released. > > */ > > static inline int lu_object_is_dying(const struct lu_object_header *h) > > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c > > index 8b507f1..9311703 100644 > > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c > > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c > > @@ -589,19 +589,13 @@ static struct lu_object *htable_lookup(struct lu_site *s, > > const struct lu_fid *f, > > __u64 *version) > > { > > - struct cfs_hash *hs = s->ls_obj_hash; > > struct lu_site_bkt_data *bkt; > > struct lu_object_header *h; > > struct hlist_node *hnode; > > - __u64 ver; > > - wait_queue_entry_t waiter; > > + u64 ver = cfs_hash_bd_version_get(bd); > > > > -retry: > > - ver = cfs_hash_bd_version_get(bd); > > - > > - if (*version == ver) { > > + if (*version == ver) > > return ERR_PTR(-ENOENT); > > - } > > > > *version = ver; > > bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd); > > @@ -615,31 +609,13 @@ static struct lu_object *htable_lookup(struct lu_site *s, > > } > > > > h = container_of(hnode, struct lu_object_header, loh_hash); > > - if (likely(!lu_object_is_dying(h))) { > > - cfs_hash_get(s->ls_obj_hash, hnode); > > - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT); > > - if (!list_empty(&h->loh_lru)) { > > - list_del_init(&h->loh_lru); > > - percpu_counter_dec(&s->ls_lru_len_counter); > > - } > > - return lu_object_top(h); > > + cfs_hash_get(s->ls_obj_hash, hnode); > > + lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT); > > + if (!list_empty(&h->loh_lru)) { > > + list_del_init(&h->loh_lru); > > + percpu_counter_dec(&s->ls_lru_len_counter); > > } > > - > > - /* > > - * Lookup found an object being destroyed this object cannot be > > - * returned (to assure that references to dying objects are eventually > > - * drained), and moreover, lookup has to wait until object is freed. > > - */ > > - > > - init_waitqueue_entry(&waiter, current); > > - add_wait_queue(&bkt->lsb_marche_funebre, &waiter); > > - set_current_state(TASK_UNINTERRUPTIBLE); > > - lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE); > > - cfs_hash_bd_unlock(hs, bd, 1); > > - schedule(); > > - remove_wait_queue(&bkt->lsb_marche_funebre, &waiter); > > - cfs_hash_bd_lock(hs, bd, 1); > > - goto retry; > > + return lu_object_top(h); > > } > > > > /** > > @@ -680,6 +656,8 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev) > > } > > > > /** > > + * Core logic of lu_object_find*() functions. > > + * > > * Much like lu_object_find(), but top level device of object is specifically > > * \a dev rather than top level device of the site. This interface allows > > * objects of different "stacking" to be created within the same site. > > @@ -713,36 +691,46 @@ struct lu_object *lu_object_find_at(const struct lu_env *env, > > * It is unnecessary to perform lookup-alloc-lookup-insert, instead, > > * just alloc and insert directly. > > * > > + * If dying object is found during index search, add @waiter to the > > + * site wait-queue and return ERR_PTR(-EAGAIN). > > It seems odd to add this comment here, when it seems to describe code > that is being removed. > I can see that this comment is added by the upstream patch > Commit: fa14bdf6b648 ("LU-9049 obdclass: change object lookup to no wait mode") > but I cannot see what it refers to. > > Otherwise that patch looks good. > > Thanks, > NeilBrown >