2018-03-20 02:51:29

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Create a new wrapper function with relaxed write operator. Use the new
wrapper when a write is following a wmb().

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <[email protected]>
---
drivers/infiniband/hw/i40iw/i40iw_ctrl.c | 6 ++++--
drivers/infiniband/hw/i40iw/i40iw_osdep.h | 1 +
drivers/infiniband/hw/i40iw/i40iw_uk.c | 2 +-
drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index c74fd33..47f473e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -706,9 +706,11 @@ static void i40iw_sc_ccq_arm(struct i40iw_sc_cq *ccq)
wmb(); /* make sure shadow area is updated before arming */

if (ccq->dev->is_pf)
- i40iw_wr32(ccq->dev->hw, I40E_PFPE_CQARM, ccq->cq_uk.cq_id);
+ i40iw_wr32_relaxed(ccq->dev->hw, I40E_PFPE_CQARM,
+ ccq->cq_uk.cq_id);
else
- i40iw_wr32(ccq->dev->hw, I40E_VFPE_CQARM1, ccq->cq_uk.cq_id);
+ i40iw_wr32_relaxed(ccq->dev->hw, I40E_VFPE_CQARM1,
+ ccq->cq_uk.cq_id);
}

/**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e..e06f4b9 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -213,5 +213,6 @@ void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
#define i40iw_mmiowb() mmiowb()
void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
+void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value);
u32 i40iw_rd32(struct i40iw_hw *hw, u32 reg);
#endif /* _I40IW_OSDEP_H_ */
diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
index 8afa5a6..7f0ebed 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_uk.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,

wmb(); /* make sure WQE is populated before valid bit is set */

- writel(cq->cq_id, cq->cqe_alloc_reg);
+ writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
}

/**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index ddc1056..99aa6f8 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -125,6 +125,17 @@ inline void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value)
}

/**
+ * i40iw_wr32_relaxed - write 32 bits to hw register without ordering
+ * @hw: hardware information including registers
+ * @reg: register offset
+ * @value: vvalue to write to register
+ */
+inline void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value)
+{
+ writel_relaxed(value, hw->hw_addr + reg);
+}
+
+/**
* i40iw_rd32 - read a 32 bit hw register
* @hw: hardware information including registers
* @reg: register offset
--
2.7.4



2018-03-20 14:57:52

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <[email protected]>
> drivers/infiniband/hw/i40iw/i40iw_ctrl.c | 6 ++++--
> drivers/infiniband/hw/i40iw/i40iw_osdep.h | 1 +
> drivers/infiniband/hw/i40iw/i40iw_uk.c | 2 +-
> drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
> 4 files changed, 17 insertions(+), 3 deletions(-)

The one looks fine

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2018-03-21 13:39:18

by Saleem, Shiraz

[permalink] [raw]
Subject: Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

On Mon, Mar 19, 2018 at 08:47:45PM -0600, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
>
> This ends up CPU observing two barriers back to back before executing the
> register write.
>
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
>
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <[email protected]>

Acked-by: Shiraz Saleem <[email protected]>

2018-03-21 20:04:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> index 8afa5a6..7f0ebed 100644
> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>
> wmb(); /* make sure WQE is populated before valid bit is set */
>
> - writel(cq->cq_id, cq->cqe_alloc_reg);
> + writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
> }

Ah, this one is probably not OK, i40iw_cq_request_notification is
called here:

spin_lock_irqsave(&iwcq->lock, flags);
ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
spin_unlock_irqrestore(&iwcq->lock, flags);

So this needs to add mmmiomb(); to keep the same semantics.

Generally I think you need to be very careful to ensure that any
conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
is present.

Maybe even do a first series with this obviously correct pattern:

wmb();
writel() -> writel_relaxed()
writel() -> writel_relaxed()
[..]
mmiowmb();

Jason

2018-03-21 21:04:44

by Sinan Kaya

[permalink] [raw]
Subject: Re: [PATCH v4 3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

On 3/21/2018 3:02 PM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
>> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> index 8afa5a6..7f0ebed 100644
>> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>>
>> wmb(); /* make sure WQE is populated before valid bit is set */
>>
>> - writel(cq->cq_id, cq->cqe_alloc_reg);
>> + writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>> }
>
> Ah, this one is probably not OK, i40iw_cq_request_notification is
> called here:
>
> spin_lock_irqsave(&iwcq->lock, flags);
> ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
> spin_unlock_irqrestore(&iwcq->lock, flags);
>
> So this needs to add mmmiomb(); to keep the same semantics.
>
> Generally I think you need to be very careful to ensure that any
> conversion to _relaxed isn't contained by a spinlock, or the mmiomb()
> is present.
>
> Maybe even do a first series with this obviously correct pattern:
>
> wmb();
> writel() -> writel_relaxed()
> writel() -> writel_relaxed()
> [..]
> mmiowmb();

Good catch. I changed it as follows:

+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,8 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,

wmb(); /* make sure WQE is populated before valid bit is set */

- writel(cq->cq_id, cq->cqe_alloc_reg);
+ writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
+ mmiowb();
}


>
> Jason
>


--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.