2008-02-26 18:27:23

by Pete Wyckoff

Subject: [PATCH 0/2] (was Re: [ofa-general] fmr pool free_list empty)

[email protected] wrote on Mon, 25 Feb 2008 15:02 -0800:
> Ugh.
[pw wrote:]
> > Looking at the FMR dirty list unmapping code in
> > ib_fmr_batch_release(), there is a section that pulls all the dirty
> > entries onto a list that it will later unmap and put back on the
> > free list.
>
> > But it also plans to unmap all the free entries that have ever been
> > remapped:
>
> Yes, this came from a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should
> flush all dirty FMRs"). That solved a real problem for Olaf, because
> otherwise dirty FMRs not at the max map count might never get
> invalidated. It's not exactly an optimization but rather a
> correctness issue, because RDS relies on killing mappings eventually.
>
> On the other hand, this behavior clearly does lead to the possibility
> of leaving the free list temporarily empty for stupid reasons.
>
> I don't see a really good way to fix this at the moment, need to
> meditate a little.

Adding CCs in case some iser users are not on the openfabrics list.
Original message is here:
http://lists.openfabrics.org/pipermail/general/2008-February/047111.html

The quoted commit causes a regression for iSER. Not sure if it causes
problems for the other FMR user, SRP. It went in after v2.6.24.
Following this mail are two patches. One to revert the change, and
one to attempt to do Olaf's patch in such a way that it does not
cause problems for other FMR users.

I haven't tested the patches with RDS. It apparently isn't in the
tree yet. In fact, there are no users of ib_flush_fmr_pool() in the
tree, which is the only function affected by the second patch. But
iSER is working again in my scenario.

As a side note, I don't remember seeing this patch on the
openfabrics mailing list. Perhaps I missed it. Sometimes these
sorts of interactions can be spotted if proposed changes get wider
attention.

-- Pete


2008-02-26 18:27:43

by Pete Wyckoff

Subject: [PATCH 1/2] Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"

This reverts commit a3cd7d9070be417a21905c997ee32d756d999b38.

The original commit breaks iSER reliably, making it complain:

iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty
entries build up. The original commit causes clean but used FMR
entries to be purged as well. During that process, another thread
can see that there are no free FMRs and fail, even though
there should always have been enough available.

Signed-off-by: Pete Wyckoff <[email protected]>
---
drivers/infiniband/core/fmr_pool.c | 21 ++++++---------------
1 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
index 7f00347..4044fdf 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -139,7 +139,7 @@ static inline struct ib_pool_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool,
static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
{
int ret;
- struct ib_pool_fmr *fmr, *next;
+ struct ib_pool_fmr *fmr;
LIST_HEAD(unmap_list);
LIST_HEAD(fmr_list);

@@ -158,20 +158,6 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
#endif
}

- /*
- * The free_list may hold FMRs that have been put there
- * because they haven't reached the max_remap count.
- * Invalidate their mapping as well.
- */
- list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
- if (fmr->remap_count == 0)
- continue;
- hlist_del_init(&fmr->cache_node);
- fmr->remap_count = 0;
- list_add_tail(&fmr->fmr->list, &fmr_list);
- list_move(&fmr->list, &unmap_list);
- }
-
list_splice(&pool->dirty_list, &unmap_list);
INIT_LIST_HEAD(&pool->dirty_list);
pool->dirty_len = 0;
@@ -384,6 +370,11 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)

i = 0;
list_for_each_entry_safe(fmr, tmp, &pool->free_list, list) {
+ if (fmr->remap_count) {
+ INIT_LIST_HEAD(&fmr_list);
+ list_add_tail(&fmr->fmr->list, &fmr_list);
+ ib_unmap_fmr(&fmr_list);
+ }
ib_dealloc_fmr(fmr->fmr);
list_del(&fmr->list);
kfree(fmr);
--
1.5.4.1
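
The failure mode described in this patch can be reproduced in a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the `model_*` names, the state enum standing in for the free/dirty lists, and the single-threaded interleaving are all hypothetical, and locking is left out. It shows a map request arriving between the two halves of a batch release, with and without the purge of used entries from the free list.

```c
#include <assert.h>
#include <errno.h>

#define POOL_SIZE 4

/* Hypothetical stand-in for struct ib_pool_fmr; a state tag replaces the lists. */
enum fmr_state { FMR_FREE, FMR_IN_USE, FMR_DIRTY, FMR_UNMAPPING };

struct model_fmr {
	enum fmr_state state;
	int remap_count;
};

struct model_pool {
	struct model_fmr fmr[POOL_SIZE];
	int max_remaps;
};

static void model_pool_init(struct model_pool *p, int max_remaps)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		p->fmr[i].state = FMR_FREE;
		p->fmr[i].remap_count = 0;
	}
	p->max_remaps = max_remaps;
}

/* Claim a free FMR; return its index, or -EAGAIN if none is free. */
static int model_map(struct model_pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (p->fmr[i].state == FMR_FREE) {
			p->fmr[i].state = FMR_IN_USE;
			p->fmr[i].remap_count++;
			return i;
		}
	}
	return -EAGAIN;
}

/* Release an FMR: free again below the remap limit, dirty once it is reached. */
static void model_unmap(struct model_pool *p, int idx)
{
	assert(idx >= 0 && idx < POOL_SIZE);
	assert(p->fmr[idx].state == FMR_IN_USE);
	p->fmr[idx].state = p->fmr[idx].remap_count < p->max_remaps ?
			    FMR_FREE : FMR_DIRTY;
}

/* First half of a batch release: pull entries out of the pool for unmapping. */
static void model_release_begin(struct model_pool *p, int purge_used_free)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		struct model_fmr *f = &p->fmr[i];

		if (f->state == FMR_DIRTY ||
		    (purge_used_free && f->state == FMR_FREE && f->remap_count > 0))
			f->state = FMR_UNMAPPING;
	}
}

/* Second half: unmapped entries return to the free state with a clean count. */
static void model_release_end(struct model_pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (p->fmr[i].state == FMR_UNMAPPING) {
			p->fmr[i].state = FMR_FREE;
			p->fmr[i].remap_count = 0;
		}
	}
}

/* Result of a map attempt that lands in the middle of a batch release. */
static int model_map_during_release(int purge_used_free)
{
	struct model_pool pool;
	int idx[POOL_SIZE];
	int ret;

	model_pool_init(&pool, 2);
	for (int i = 0; i < POOL_SIZE; i++)	/* use every FMR once */
		idx[i] = model_map(&pool);
	for (int i = 0; i < POOL_SIZE; i++)
		model_unmap(&pool, idx[i]);
	model_unmap(&pool, model_map(&pool));	/* second use makes one dirty */

	model_release_begin(&pool, purge_used_free);
	ret = model_map(&pool);			/* the "other thread" */
	model_release_end(&pool);
	return ret;
}
```

In this model, `model_map_during_release(1)` returns `-EAGAIN` because every entry has been pulled off the free list, while `model_map_during_release(0)` still finds a free entry even though the same dirty entry is being unmapped.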

2008-02-26 18:28:13

by Pete Wyckoff

Subject: [PATCH 2/2] ib fmr pool: flush used clean entries

Commit a3cd7d9070be417a21905c997ee32d756d999b38 (IB/fmr_pool:
ib_fmr_pool_flush() should flush all dirty FMRs) caused a
regression for iSER and was reverted in
e5507736c6449b3253347eed6f8ea77a28cf688e.

This change attempts to redo the original patch so that all used
FMR entries are flushed when ib_flush_fmr_pool() is called,
and other FMR users are not affected. Simply move used entries
from the free list onto the dirty list before letting the
cleanup thread do its job.

Signed-off-by: Pete Wyckoff <[email protected]>
---
drivers/infiniband/core/fmr_pool.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
index 4044fdf..06d502c 100644
--- a/drivers/infiniband/core/fmr_pool.c
+++ b/drivers/infiniband/core/fmr_pool.c
@@ -398,8 +398,23 @@ EXPORT_SYMBOL(ib_destroy_fmr_pool);
*/
int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
{
- int serial = atomic_inc_return(&pool->req_ser);
+ int serial;
+ struct ib_pool_fmr *fmr, *next;
+
+ /*
+ * The free_list holds FMRs that may have been used
+ * but have not been remapped enough times to be dirty.
+ * Put them on the dirty list now so that the cleanup
+ * thread will reap them too.
+ */
+ spin_lock_irq(&pool->pool_lock);
+ list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
+ if (fmr->remap_count > 0)
+ list_move(&fmr->list, &pool->dirty_list);
+ }
+ spin_unlock_irq(&pool->pool_lock);

+ serial = atomic_inc_return(&pool->req_ser);
wake_up_process(pool->thread);

if (wait_event_interruptible(pool->force_wait,
--
1.5.4.1
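
The two-step idea in this patch (mark used free entries dirty, then let the normal cleanup path reap them) can be sketched in userspace as follows. This is a simplified illustration, not the kernel implementation: the `model_*` names and the array-plus-state representation are hypothetical, and the `pool_lock` that the real patch holds while moving entries is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_pool_fmr; a state tag replaces the lists. */
enum fmr_state { FMR_FREE, FMR_DIRTY };

struct model_fmr {
	enum fmr_state state;
	int remap_count;
};

/*
 * Step 1 of the patched flush: entries that are free but have been
 * mapped at least once are moved to the dirty state.
 * Returns the number of entries moved.
 */
static size_t model_mark_used_dirty(struct model_fmr *fmr, size_t n)
{
	size_t moved = 0;

	assert(fmr != NULL);
	for (size_t i = 0; i < n; i++) {
		if (fmr[i].state == FMR_FREE && fmr[i].remap_count > 0) {
			fmr[i].state = FMR_DIRTY;
			moved++;
		}
	}
	return moved;
}

/*
 * Stand-in for the cleanup thread: every dirty entry is "unmapped",
 * its remap count reset, and it goes back to the free state.
 * Returns the number of entries reaped.
 */
static size_t model_reap_dirty(struct model_fmr *fmr, size_t n)
{
	size_t reaped = 0;

	assert(fmr != NULL);
	for (size_t i = 0; i < n; i++) {
		if (fmr[i].state == FMR_DIRTY) {
			fmr[i].state = FMR_FREE;
			fmr[i].remap_count = 0;
			reaped++;
		}
	}
	return reaped;
}

/* Model of an explicit flush: mark first, then reap. */
static size_t model_flush(struct model_fmr *fmr, size_t n)
{
	model_mark_used_dirty(fmr, n);
	return model_reap_dirty(fmr, n);
}

/* Count entries that still carry a mapping (remap_count > 0). */
static size_t model_count_mapped(const struct model_fmr *fmr, size_t n)
{
	size_t mapped = 0;

	for (size_t i = 0; i < n; i++)
		if (fmr[i].remap_count > 0)
			mapped++;
	return mapped;
}
```

The point of the split is that `model_reap_dirty()` on its own, which models the cleanup thread's periodic run, never touches free entries, so only callers of the explicit flush pay for invalidating used-but-free mappings.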

2008-02-26 19:23:21

by Benny Halevy

Subject: Re: [PATCH 1/2] Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"

Pete, the subject says "PATCH 1/2" but I didn't see any follow-up message
for PATCH 2/2. Just wondering :)

Benny

On Feb. 26, 2008, 10:27 -0800, Pete Wyckoff <[email protected]> wrote:
> This reverts commit a3cd7d9070be417a21905c997ee32d756d999b38.
>
> The original commit breaks iSER reliably, making it complain:
>
> iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11
>
> The FMR cleanup thread runs ib_fmr_batch_release() as dirty
> entries build up. This commit causes clean but used FMR
> entries also to be purged. During that process, another thread
> can see that there are no free FMRs and fail, even though
> there should always have been enough available.
>
> Signed-off-by: Pete Wyckoff <[email protected]>
> ---
> drivers/infiniband/core/fmr_pool.c | 21 ++++++---------------
> 1 files changed, 6 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c
> index 7f00347..4044fdf 100644
> --- a/drivers/infiniband/core/fmr_pool.c
> +++ b/drivers/infiniband/core/fmr_pool.c
> @@ -139,7 +139,7 @@ static inline struct ib_pool_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool,
> static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
> {
> int ret;
> - struct ib_pool_fmr *fmr, *next;
> + struct ib_pool_fmr *fmr;
> LIST_HEAD(unmap_list);
> LIST_HEAD(fmr_list);
>
> @@ -158,20 +158,6 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
> #endif
> }
>
> - /*
> - * The free_list may hold FMRs that have been put there
> - * because they haven't reached the max_remap count.
> - * Invalidate their mapping as well.
> - */
> - list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
> - if (fmr->remap_count == 0)
> - continue;
> - hlist_del_init(&fmr->cache_node);
> - fmr->remap_count = 0;
> - list_add_tail(&fmr->fmr->list, &fmr_list);
> - list_move(&fmr->list, &unmap_list);
> - }
> -
> list_splice(&pool->dirty_list, &unmap_list);
> INIT_LIST_HEAD(&pool->dirty_list);
> pool->dirty_len = 0;
> @@ -384,6 +370,11 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
>
> i = 0;
> list_for_each_entry_safe(fmr, tmp, &pool->free_list, list) {
> + if (fmr->remap_count) {
> + INIT_LIST_HEAD(&fmr_list);
> + list_add_tail(&fmr->fmr->list, &fmr_list);
> + ib_unmap_fmr(&fmr_list);
> + }
> ib_dealloc_fmr(fmr->fmr);
> list_del(&fmr->list);
> kfree(fmr);

2008-02-26 19:39:19

by Matthew Wilcox

Subject: Re: [PATCH 1/2] Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"

On Tue, Feb 26, 2008 at 11:23:01AM -0800, Benny Halevy wrote:
> Pete, the subject says "PATCH 1/2" but I didn't see any follow-up message
> for PATCH 2/2. Just wondering :)

I think the problem's on your end ... I got it and so did marc:
http://marc.info/?l=linux-scsi&m=120405067313933&w=2

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-02-26 19:48:55

by Benny Halevy

Subject: Re: [PATCH 1/2] Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"

Diabolical ;-)
Thanks for the pointer!

Benny

On Feb. 26, 2008, 11:39 -0800, Matthew Wilcox <[email protected]> wrote:
> On Tue, Feb 26, 2008 at 11:23:01AM -0800, Benny Halevy wrote:
>> Pete, the subject says "PATCH 1/2" but I didn't see any follow-up message
>> for PATCH 2/2. Just wondering :)
>
> I think the problem's on your end ... I got it and so did marc:
> http://marc.info/?l=linux-scsi&m=120405067313933&w=2
>

2008-02-26 20:10:17

by Roland Dreier

Subject: Re: [ofa-general] [PATCH 2/2] ib fmr pool: flush used clean entries

This looks like a really nice approach to me. Olaf?

- R.

2008-02-26 21:59:44

by Olaf Kirch

Subject: Re: [ofa-general] [PATCH 2/2] ib fmr pool: flush used clean entries

On Tuesday 26 February 2008 21:09, Roland Dreier wrote:
> This looks like a really nice approach to me. Olaf?

Yes, this looks good. I haven't had a chance to test it,
but it looks like the right approach.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax