Received: by 10.192.165.148 with SMTP id m20csp4674312imm; Tue, 1 May 2018 01:23:43 -0700 (PDT) X-Google-Smtp-Source: AB8JxZoaJaUpF8j/WcW5wCU3MXKppu545XmwYuM4pPV1tezu6KNe/SajAETMvsR5U82zEZpu3jKS X-Received: by 2002:a63:7a43:: with SMTP id j3-v6mr12241526pgn.172.1525163023165; Tue, 01 May 2018 01:23:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525163023; cv=none; d=google.com; s=arc-20160816; b=wDd6loy9+BsGbXYcszwNhQnyAPoyyLk8iRQ/MlAt2onWQxnb0BJ+0y5ti0OtoaPgYD +qUcnKz2rwUrtPpr1JJonLbm1rsKechpe+8sGWrTmfz4691P4TxY7vV9MgzJr2BXvL1F 7H4xKd1XWKo4/Yy83egpKNNpbGzKV/DHWJvf5ig9gpBuZOLxf5+/nBtwdPfrLbPUCVKN xLf0dBaR1+lAbFW3jpA+A5ZV5ArE0171AyfLmceqATHjr1fvUQvQm73RUC2inaINu/JL JE+7evI5qa3qjxoX54skCSImDUOC2c9U1mk5lcH6kcPxlGaImIc5nYNejKHmcvJZD+/2 iNtQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-id:content-language:accept-language:in-reply-to:references :message-id:date:thread-index:thread-topic:subject:cc:to:from :arc-authentication-results; bh=0gztr8P9fSgj6H4NOJNGU6KfMNZAWWONJSS1nnJQ8Tc=; b=YtstQhsl3/hj7KOUdQIrNiHZPTpNalWdr6maI4/GXb9k46mwuDYtxZA0iAeNtkq/9n r/stZJz9E/CL6fGDVKW1YA33ToeGhNebEKYQn+Z5vZnMk0iWhb7+gWOm9RkLmCC1Xi26 a3mnbDvokAcjlIb0H7qVzqUoWcAzmKF8ZHFonBpa7L6kv9lJAx2MkG3eVKR13VPdYAov ZeKpoAI0bKfM66fLfZV8SYN0kPvCx6Lm2fq0eT1c1zE/uGLMxAWqKz/GxbN5LfZorU13 YdeG8pw+skU6JGvqlKBVavzCSFgs9NyaUnq/a372d2RtU/xPWRqZCRh+hZuEIEXrSqeQ IIFA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 6si8983234pft.35.2018.05.01.01.23.29; Tue, 01 May 2018 01:23:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753848AbeEAIW4 convert rfc822-to-8bit (ORCPT + 99 others); Tue, 1 May 2018 04:22:56 -0400 Received: from mga18.intel.com ([134.134.136.126]:26870 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752673AbeEAIWw (ORCPT ); Tue, 1 May 2018 04:22:52 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 01:22:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,350,1520924400"; d="scan'208";a="36533882" Received: from fmsmsx103.amr.corp.intel.com ([10.18.124.201]) by fmsmga008.fm.intel.com with ESMTP; 01 May 2018 01:22:51 -0700 Received: from fmsmsx154.amr.corp.intel.com (10.18.116.70) by FMSMSX103.amr.corp.intel.com (10.18.124.201) with Microsoft SMTP Server (TLS) id 14.3.319.2; Tue, 1 May 2018 01:22:51 -0700 Received: from FMSMSX109.amr.corp.intel.com ([169.254.15.12]) by FMSMSX154.amr.corp.intel.com ([169.254.6.14]) with mapi id 14.03.0319.002; Tue, 1 May 2018 01:22:50 -0700 From: "Dilger, Andreas" To: NeilBrown CC: "Drokin, Oleg" , Greg Kroah-Hartman , James Simmons , "Linux Kernel Mailing List" , Lustre Development List Subject: Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup Thread-Topic: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup Thread-Index: AQHT4P/85u70Xh3XOk6Z467GqtGvSKQa/oeA Date: Tue, 1 May 2018 08:22:50 +0000 Message-ID: References: <152514658325.17843.11455067361317157487.stgit@noble> <152514675897.17843.15112214060540196720.stgit@noble> In-Reply-To: <152514675897.17843.15112214060540196720.stgit@noble> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.252.137.157] Content-Type: text/plain; charset="us-ascii" Content-ID: <0C93B490BA3AF447803E59C5EAC1D61B@intel.com> Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Apr 30, 2018, at 21:52, NeilBrown wrote: > > The current retry logic, to wait when a 'dying' object is found, > spans multiple functions. The process is attached to a waitqueue > and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status > is passed back through lu_object_find_try() to lu_object_find_at() > where schedule() is called and the process is removed from the queue. > > This can be simplified by moving all the logic (including > hashtable locking) inside htable_lookup(), which now never returns > EAGAIN. > > Note that htable_lookup() is called with the hash bucket lock > held, and will drop and retake it if it needs to schedule. > > I made this a 'goto' loop rather than a 'while(1)' loop as the > diff is easier to read. > > Signed-off-by: NeilBrown > --- > drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 +++++++------------- > 1 file changed, 27 insertions(+), 46 deletions(-) > > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c > index 2bf089817157..93daa52e2535 100644 > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c > @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print); > static struct lu_object *htable_lookup(struct lu_site *s, It's probably a good idea to add a comment for this function that it may drop and re-acquire the hash bucket lock internally. > struct cfs_hash_bd *bd, > const struct lu_fid *f, > - wait_queue_entry_t *waiter, > __u64 *version) > { > + struct cfs_hash *hs = s->ls_obj_hash; > struct lu_site_bkt_data *bkt; > struct lu_object_header *h; > struct hlist_node *hnode; > - __u64 ver = cfs_hash_bd_version_get(bd); > + __u64 ver; > + wait_queue_entry_t waiter; > > - if (*version == ver) > +retry: > + ver = cfs_hash_bd_version_get(bd); > + > + if (*version == ver) { > return ERR_PTR(-ENOENT); > + } (style) we don't need the {} around a single-line if statement > *version = ver; > bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd); > @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s, > * drained), and moreover, lookup has to wait until object is freed. > */ > > - init_waitqueue_entry(waiter, current); > - add_wait_queue(&bkt->lsb_marche_funebre, waiter); > + init_waitqueue_entry(&waiter, current); > + add_wait_queue(&bkt->lsb_marche_funebre, &waiter); > set_current_state(TASK_UNINTERRUPTIBLE); > lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE); > - return ERR_PTR(-EAGAIN); > + cfs_hash_bd_unlock(hs, bd, 1); This looks like it isn't unlocking and locking the hash bucket in the same manner that it was done in the caller. Here excl = 1, but in the caller you changed it to excl = 0? > + schedule(); > + remove_wait_queue(&bkt->lsb_marche_funebre, &waiter); Is it worthwhile to use your new helper function here to get the wq from "s"? > + cfs_hash_bd_lock(hs, bd, 1); > + goto retry; > } > > /** > @@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env, > } > > /** > - * Core logic of lu_object_find*() functions. > + * Much like lu_object_find(), but top level device of object is specifically > + * \a dev rather than top level device of the site. This interface allows > + * objects of different "stacking" to be created within the same site. > */ > -static struct lu_object *lu_object_find_try(const struct lu_env *env, > - struct lu_device *dev, > - const struct lu_fid *f, > - const struct lu_object_conf *conf, > - wait_queue_entry_t *waiter) > +struct lu_object *lu_object_find_at(const struct lu_env *env, > + struct lu_device *dev, > + const struct lu_fid *f, > + const struct lu_object_conf *conf) > { > struct lu_object *o; > struct lu_object *shadow; > @@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env, > * It is unnecessary to perform lookup-alloc-lookup-insert, instead, > * just alloc and insert directly. > * > - * If dying object is found during index search, add @waiter to the > - * site wait-queue and return ERR_PTR(-EAGAIN). > */ > if (conf && conf->loc_flags & LOC_F_NEW) > return lu_object_new(env, dev, f, conf); > > s = dev->ld_site; > hs = s->ls_obj_hash; > - cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1); > - o = htable_lookup(s, &bd, f, waiter, &version); > - cfs_hash_bd_unlock(hs, &bd, 1); > + cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0); > + o = htable_lookup(s, &bd, f, &version); > + cfs_hash_bd_unlock(hs, &bd, 0); Here you changed the locking to a non-exclusive (read) lock instead of an exclusive (write) lock? Why. > + > if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT) > return o; > > @@ -751,7 +760,7 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env, > > cfs_hash_bd_lock(hs, &bd, 1); > > - shadow = htable_lookup(s, &bd, f, waiter, &version); > + shadow = htable_lookup(s, &bd, f, &version); > if (likely(PTR_ERR(shadow) == -ENOENT)) { > cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash); > cfs_hash_bd_unlock(hs, &bd, 1); > @@ -766,34 +775,6 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env, > lu_object_free(env, o); > return shadow; > } > - > -/** > - * Much like lu_object_find(), but top level device of object is specifically > - * \a dev rather than top level device of the site. This interface allows > - * objects of different "stacking" to be created within the same site. > - */ > -struct lu_object *lu_object_find_at(const struct lu_env *env, > - struct lu_device *dev, > - const struct lu_fid *f, > - const struct lu_object_conf *conf) > -{ > - wait_queue_head_t *wq; > - struct lu_object *obj; > - wait_queue_entry_t wait; > - > - while (1) { > - obj = lu_object_find_try(env, dev, f, conf, &wait); > - if (obj != ERR_PTR(-EAGAIN)) > - return obj; > - /* > - * lu_object_find_try() already added waiter into the > - * wait queue. > - */ > - schedule(); > - wq = lu_site_wq_from_fid(dev->ld_site, (void *)f); > - remove_wait_queue(wq, &wait); > - } > -} > EXPORT_SYMBOL(lu_object_find_at); > > /** > > > _______________________________________________ > lustre-devel mailing list > lustre-devel@lists.lustre.org > http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org Cheers, Andreas -- Andreas Dilger Lustre Principal Architect Intel Corporation