2008-06-13 14:56:24

by Stefan Roscher

Subject: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

During corner-case testing, we noticed that some versions of ehca
do not properly transition to the interrupt-done state under certain
load conditions. This can be resolved by periodically triggering an
EOI through the H_EOI hcall if EQEs are pending.

Signed-off-by: Stefan Roscher <[email protected]>
---
As the firmware team suggested, I moved the EOI hcall into the
handler function; this ensures that we call EOI only when we find
a valid EQE on the event queue.
Additionally, I changed the calculation of the xirr value as Roland
suggested.

drivers/infiniband/hw/ehca/ehca_irq.c | 9 +++++++--
drivers/infiniband/hw/ehca/hcp_if.c | 10 ++++++++++
drivers/infiniband/hw/ehca/hcp_if.h | 1 +
3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index ce1ab05..0792d93 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -531,7 +531,7 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
{
struct ehca_eq *eq = &shca->eq;
struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache;
- u64 eqe_value;
+ u64 eqe_value, ret;
unsigned long flags;
int eqe_cnt, i;
int eq_empty = 0;
@@ -583,8 +583,13 @@ void ehca_process_eq(struct ehca_shca *shca, int is_irq)
ehca_dbg(&shca->ib_device,
"No eqe found for irq event");
goto unlock_irq_spinlock;
- } else if (!is_irq)
+ } else if (!is_irq) {
+ ret = hipz_h_eoi(eq->ist);
+ if (ret != H_SUCCESS)
+ ehca_err(&shca->ib_device,
+ "bad return code EOI -rc = %ld\n", ret);
ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt);
+ }
if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE))
ehca_dbg(&shca->ib_device, "too many eqes for one irq event");
/* enable irq for new packets */
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 5245e13..415d3a4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -933,3 +933,13 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
r_cb,
0, 0, 0, 0);
}
+
+u64 hipz_h_eoi(int irq)
+{
+ unsigned long xirr;
+
+ iosync();
+ xirr = (0xffULL << 24) | irq;
+
+ return plpar_hcall_norets(H_EOI, xirr);
+}
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 60ce02b..2c3c6e0 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -260,5 +260,6 @@ u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
const u64 ressource_handle,
void *rblock,
unsigned long *byte_count);
+u64 hipz_h_eoi(int irq);

#endif /* __HCP_IF_H__ */
--
1.5.5


2008-06-19 22:35:53

by Roland Dreier

Subject: Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

> During corner case testing, we noticed that some versions of ehca
> do not properly transition to interrupt done in special load situations.
> This can be resolved by periodically triggering EOI through H_EOI,
> if eqes are pending.
>
> Signed-off-by: Stefan Roscher <[email protected]>
> ---
> As firmware team suggested I moved the call of the EOI h_call into
> the handler function, this ensures that we will call EOI only when we
> find a valid eqe on the event queue.
> Additionally I changed the calculation of the xirr value as Roland suggested.

paulus / benh -- does this version still get your ack? Seems that fw
team is OK with it according to Stefan...

If so I will add this to my tree for 2.6.27.


2008-06-22 00:32:27

by Benjamin Herrenschmidt

Subject: Re: [PATCH REPOST #2] IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts

On Fri, 2008-06-13 at 16:55 +0200, Stefan Roscher wrote:
> During corner case testing, we noticed that some versions of ehca
> do not properly transition to interrupt done in special load situations.
> This can be resolved by periodically triggering EOI through H_EOI,
> if eqes are pending.
>
> Signed-off-by: Stefan Roscher <[email protected]>

Acked-by: Benjamin Herrenschmidt <[email protected]>

2008-06-23 20:23:43

by Roland Dreier
