2018-04-23 11:02:48

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 0/5] vfio: ccw: error handling fixes and improvements

Dear Reviewers,

Here is a new version for this patch series.

We didn't get agreement on patch #5 (#4 in v1) in the former cycle though,
I made it based on my understanding. We can continue discussing on it.

Changelog:
v1->v2:
- #1. Reworded commit message and comment, plus some typo fixes.
- #2. New patch.
- #3. Added the missing suggested-by Pierre.
Fixed typos.
Added sanity check on pa->pa_iova_pfn and updated comments accordingly.
- #4. Removed unused idaw_nr.
- #5. Replaced leading white spaces with TABs.
Traced the function in anycase.

Dong Jia Shi (3):
vfio: ccw: shorten kernel doc description for pfn_array_pin()
vfio: ccw: refactor and improve pfn_array_alloc_pin()
vfio: ccw: set ccw->cda to NULL defensively

Halil Pasic (2):
vfio: ccw: fix cleanup if cp_prefetch fails
vfio: ccw: add traceponits for interesting error paths

drivers/s390/cio/Makefile | 1 +
drivers/s390/cio/vfio_ccw_cp.c | 134 ++++++++++++++++++++------------------
drivers/s390/cio/vfio_ccw_fsm.c | 16 ++++-
drivers/s390/cio/vfio_ccw_trace.h | 77 ++++++++++++++++++++++
4 files changed, 164 insertions(+), 64 deletions(-)
create mode 100644 drivers/s390/cio/vfio_ccw_trace.h

--
2.13.5



2018-04-23 11:03:06

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths

From: Halil Pasic <[email protected]>

Add some tracepoints so we can inspect what is not working as is should.

Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/Makefile | 1 +
drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
3 files changed, 93 insertions(+), 1 deletion(-)
create mode 100644 drivers/s390/cio/vfio_ccw_trace.h

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index a070ef0efe65..f230516abb96 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -5,6 +5,7 @@

# The following is required for define_trace.h to find ./trace.h
CFLAGS_trace.o := -I$(src)
+CFLAGS_vfio_ccw_fsm.o := -I$(src)

obj-y += airq.o blacklist.o chsc.o cio.o css.o chp.o idset.o isc.o \
fcx.o itcw.o crw.o ccwreq.o trace.o ioasm.o
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index ff6963ad6e39..90cab4e1436e 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -13,6 +13,9 @@
#include "ioasm.h"
#include "vfio_ccw_private.h"

+#define CREATE_TRACE_POINTS
+#include "vfio_ccw_trace.h"
+
static int fsm_io_helper(struct vfio_ccw_private *private)
{
struct subchannel *sch;
@@ -105,6 +108,10 @@ static void fsm_disabled_irq(struct vfio_ccw_private *private,
*/
cio_disable_subchannel(sch);
}
+inline struct subchannel_id get_schid(struct vfio_ccw_private *p)
+{
+ return p->sch->schid;
+}

/*
* Deal with the ccw command request from the userspace.
@@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
goto err_out;

io_region->ret_code = cp_prefetch(&private->cp);
+ trace_vfio_ccw_cp_prefetch(get_schid(private),
+ io_region->ret_code);
if (io_region->ret_code) {
cp_free(&private->cp);
goto err_out;
@@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,

/* Start channel program and wait for I/O interrupt. */
io_region->ret_code = fsm_io_helper(private);
+ trace_vfio_ccw_fsm_io_helper(get_schid(private),
+ io_region->ret_code);
if (io_region->ret_code) {
cp_free(&private->cp);
goto err_out;
}
- return;
+ goto out;
} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
/* XXX: Handle halt. */
io_region->ret_code = -EOPNOTSUPP;
@@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,

err_out:
private->state = VFIO_CCW_STATE_IDLE;
+out:
+ trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
+ io_region->ret_code);
}

/*
diff --git a/drivers/s390/cio/vfio_ccw_trace.h b/drivers/s390/cio/vfio_ccw_trace.h
new file mode 100644
index 000000000000..25128331c285
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_trace.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Tracepoints for vfio_ccw driver
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Dong Jia Shi <[email protected]>
+ * Halil Pasic <[email protected]>
+ */
+
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vfio_ccw
+
+#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ)
+#define _VFIO_CCW_TRACE_
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(vfio_ccw_schid_errno,
+ TP_PROTO(struct subchannel_id schid, int errno),
+ TP_ARGS(schid, errno),
+
+ TP_STRUCT__entry(
+ __field_struct(struct subchannel_id, schid)
+ __field(int, errno)
+ ),
+
+ TP_fast_assign(
+ __entry->schid = schid;
+ __entry->errno = errno;
+ ),
+
+ TP_printk("schid=%x.%x.%04x errno=%d", __entry->schid.cssid,
+ __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
+);
+
+DEFINE_EVENT(vfio_ccw_schid_errno, vfio_ccw_cp_prefetch,
+ TP_PROTO(struct subchannel_id schid, int errno),
+ TP_ARGS(schid, errno)
+);
+
+DEFINE_EVENT(vfio_ccw_schid_errno, vfio_ccw_fsm_io_helper,
+ TP_PROTO(struct subchannel_id schid, int errno),
+ TP_ARGS(schid, errno)
+);
+
+TRACE_EVENT(vfio_ccw_io_fctl,
+ TP_PROTO(int fctl, struct subchannel_id schid, int errno),
+ TP_ARGS(fctl, schid, errno),
+
+ TP_STRUCT__entry(
+ __field(int, fctl)
+ __field_struct(struct subchannel_id, schid)
+ __field(int, errno)
+ ),
+
+ TP_fast_assign(
+ __entry->fctl = fctl;
+ __entry->schid = schid;
+ __entry->errno = errno;
+ ),
+
+ TP_printk("schid=%x.%x.%04x fctl=%x errno=%d", __entry->schid.cssid,
+ __entry->schid.ssid, __entry->schid.sch_no,
+ __entry->fctl, __entry->errno)
+);
+
+#endif /* _VFIO_CCW_TRACE_ */
+
+/* This part must be outside protection */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE vfio_ccw_trace
+
+#include <trace/define_trace.h>
--
2.13.5


2018-04-23 11:04:17

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 2/5] vfio: ccw: shorten kernel doc description for pfn_array_pin()

The kernel doc description for usage of the struct pfn_array in
pfn_array_pin() is unnecessary long. Let's shorten it by describing
the contents of the struct pfn_array fields at the struct's definition
instead.

Suggested-by: Cornelia Huck <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 62d66e195304..e8fe7450702e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -23,9 +23,13 @@
#define CCWCHAIN_LEN_MAX 256

struct pfn_array {
+ /* Starting guest physical I/O address. */
unsigned long pa_iova;
+ /* Array that stores PFNs of the pages need to pin. */
unsigned long *pa_iova_pfn;
+ /* Array that receives PFNs of the pages pinned. */
unsigned long *pa_pfn;
+ /* Number of pages to pin/pinned from @pa_iova. */
int pa_nr;
};

@@ -53,14 +57,8 @@ struct ccwchain {
* Attempt to pin user pages in memory.
*
* Usage of pfn_array:
- * @pa->pa_iova starting guest physical I/O address. Assigned by caller.
- * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
- * by caller.
- * @pa->pa_pfn array that receives PFNs of the pages pinned. Allocated by
- * caller.
- * @pa->pa_nr number of pages from @pa->pa_iova to pin. Assigned by
- * caller.
- * number of pages pinned. Assigned by callee.
+ * Any field in this structure should be initialized by caller.
+ * We expect @pa->pa_nr > 0, and its value will be assigned by callee.
*
* Returns:
* Number of pages pinned on success.
--
2.13.5


2018-04-23 11:04:40

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 4/5] vfio: ccw: set ccw->cda to NULL defensively

Let's avoid free on ccw->cda that points to a guest address
or an already freed memory area by setting it to NULL if memory
allocation didn't happen or failed.

Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 340804b45f6a..1eef102c685e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -491,7 +491,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
struct ccw1 *ccw;
struct pfn_array_table *pat;
unsigned long *idaws;
- int idaw_nr;
+ int ret;

ccw = chain->ch_ccw + idx;

@@ -511,18 +511,19 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
* needed when translating a direct ccw to a idal ccw.
*/
pat = chain->ch_pat + idx;
- if (pfn_array_table_init(pat, 1))
- return -ENOMEM;
- idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
- ccw->cda, ccw->count);
- if (idaw_nr < 0)
- return idaw_nr;
+ ret = pfn_array_table_init(pat, 1);
+ if (ret)
+ goto out_init;
+
+ ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev, ccw->cda, ccw->count);
+ if (ret < 0)
+ goto out_init;

/* Translate this direct ccw to a idal ccw. */
- idaws = kcalloc(idaw_nr, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
+ idaws = kcalloc(ret, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
if (!idaws) {
- pfn_array_table_unpin_free(pat, cp->mdev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto out_unpin;
}
ccw->cda = (__u32) virt_to_phys(idaws);
ccw->flags |= CCW_FLAG_IDA;
@@ -530,6 +531,12 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
pfn_array_table_idal_create_words(pat, idaws);

return 0;
+
+out_unpin:
+ pfn_array_table_unpin_free(pat, cp->mdev);
+out_init:
+ ccw->cda = 0;
+ return ret;
}

static int ccwchain_fetch_idal(struct ccwchain *chain,
@@ -559,7 +566,7 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
pat = chain->ch_pat + idx;
ret = pfn_array_table_init(pat, idaw_nr);
if (ret)
- return ret;
+ goto out_init;

/* Translate idal ccw to use new allocated idaws. */
idaws = kzalloc(idaw_len, GFP_DMA | GFP_KERNEL);
@@ -591,6 +598,8 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
kfree(idaws);
out_unpin:
pfn_array_table_unpin_free(pat, cp->mdev);
+out_init:
+ ccw->cda = 0;
return ret;
}

--
2.13.5


2018-04-23 11:04:42

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 3/5] vfio: ccw: refactor and improve pfn_array_alloc_pin()

This refactors pfn_array_alloc_pin() and also improves it by adding
defensive code in error handling so that calling pfn_array_unpin_free()
after error return won't lead to problem. This mainly does:
1. Merge pfn_array_pin() into pfn_array_alloc_pin(), since there is no
other user of pfn_array_pin(). As a result, also remove kernel-doc
for pfn_array_pin() and add/update kernel-doc for pfn_array_alloc_pin()
and struct pfn_array.
2. For a vfio_pin_pages() failure, set pa->pa_nr to zero to indicate
zero pages were pinned.
3. Set pa->pa_iova_pfn to NULL right after it was freed.

Suggested-by: Pierre Morel <[email protected]>,
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 82 +++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 46 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index e8fe7450702e..340804b45f6a 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -29,7 +29,7 @@ struct pfn_array {
unsigned long *pa_iova_pfn;
/* Array that receives PFNs of the pages pinned. */
unsigned long *pa_pfn;
- /* Number of pages to pin/pinned from @pa_iova. */
+ /* Number of pages pinned from @pa_iova. */
int pa_nr;
};

@@ -50,64 +50,33 @@ struct ccwchain {
};

/*
- * pfn_array_pin() - pin user pages in memory
+ * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
* @pa: pfn_array on which to perform the operation
* @mdev: the mediated device to perform pin/unpin operations
+ * @iova: target guest physical address
+ * @len: number of bytes that should be pinned from @iova
*
- * Attempt to pin user pages in memory.
+ * Attempt to allocate memory for PFNs, and pin user pages in memory.
*
* Usage of pfn_array:
- * Any field in this structure should be initialized by caller.
- * We expect @pa->pa_nr > 0, and its value will be assigned by callee.
+ * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
+ * this structure will be filled in by this function.
*
* Returns:
* Number of pages pinned on success.
- * If @pa->pa_nr is 0 or negative, returns 0.
+ * If @pa->pa_nr is not 0, or @pa->pa_iova_pfn is not NULL initially,
+ * returns -EINVAL.
* If no pages were pinned, returns -errno.
*/
-static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
-{
- int i, ret;
-
- if (pa->pa_nr <= 0) {
- pa->pa_nr = 0;
- return 0;
- }
-
- pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
- for (i = 1; i < pa->pa_nr; i++)
- pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
-
- ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
- IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
-
- if (ret > 0 && ret != pa->pa_nr) {
- vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
- pa->pa_nr = 0;
- return 0;
- }
-
- return ret;
-}
-
-/* Unpin the pages before releasing the memory. */
-static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
-{
- vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
- pa->pa_nr = 0;
- kfree(pa->pa_iova_pfn);
-}
-
-/* Alloc memory for PFNs, then pin pages with them. */
static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
u64 iova, unsigned int len)
{
- int ret = 0;
+ int i, ret = 0;

if (!len)
return 0;

- if (pa->pa_nr)
+ if (pa->pa_nr || pa->pa_iova_pfn)
return -EINVAL;

pa->pa_iova = iova;
@@ -124,18 +93,39 @@ static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
return -ENOMEM;
pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;

- ret = pfn_array_pin(pa, mdev);
+ pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+ for (i = 1; i < pa->pa_nr; i++)
+ pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+
+ ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
+ IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);

- if (ret > 0)
- return ret;
- else if (!ret)
+ if (ret < 0) {
+ goto err_out;
+ } else if (ret > 0 && ret != pa->pa_nr) {
+ vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
ret = -EINVAL;
+ goto err_out;
+ }
+
+ return ret;

+err_out:
+ pa->pa_nr = 0;
kfree(pa->pa_iova_pfn);
+ pa->pa_iova_pfn = NULL;

return ret;
}

+/* Unpin the pages before releasing the memory. */
+static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
+{
+ vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
+ pa->pa_nr = 0;
+ kfree(pa->pa_iova_pfn);
+}
+
static int pfn_array_table_init(struct pfn_array_table *pat, int nr)
{
pat->pat_pa = kcalloc(nr, sizeof(*pat->pat_pa), GFP_KERNEL);
--
2.13.5


2018-04-23 11:05:43

by Dong Jia Shi

[permalink] [raw]
Subject: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails

From: Halil Pasic <[email protected]>

If the translation of a channel program fails, we may end up attempting
to clean up (free, unpin) stuff that never got translated (and allocated,
pinned) in the first place.

By adjusting the lengths of the chains accordingly (so the element that
failed, and all subsequent elements are excluded) cleanup activities
based on false assumptions can be avoided.

Let's make sure cp_free works properly after cp_prefetch returns with an
error by setting ch_len of a ccw chain to the number of the translated
CCWs on that chain.

Acked-by: Pierre Morel <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 2c7550797ec2..62d66e195304 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -715,6 +715,10 @@ void cp_free(struct channel_program *cp)
* and stores the result to ccwchain list. @cp must have been
* initialized by a previous call with cp_init(). Otherwise, undefined
* behavior occurs.
+ * For each chain composing the channel program:
+ * - On entry ch_len holds the count of CCW to be translated.
+ * - On exit ch_len is adjusted to the count of successfully translated CCW.
+ * This allows cp_free to find in ch_len the count of CCW to free in a chain.
*
* The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
* as helpers to do ccw chain translation inside the kernel. Basically
@@ -749,11 +753,18 @@ int cp_prefetch(struct channel_program *cp)
for (idx = 0; idx < len; idx++) {
ret = ccwchain_fetch_one(chain, idx, cp);
if (ret)
- return ret;
+ goto out_err;
}
}

return 0;
+out_err:
+ /* Only cleanup the chain elements that were actually translated. */
+ chain->ch_len = idx;
+ list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
+ chain->ch_len = 0;
+ }
+ return ret;
}

/**
--
2.13.5


2018-04-23 11:34:26

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] vfio: ccw: error handling fixes and improvements

On Mon, 23 Apr 2018 13:01:08 +0200
Dong Jia Shi <[email protected]> wrote:

> Dear Reviewers,
>
> Here is a new version for this patch series.
>
> We didn't get agreement on patch #5 (#4 in v1) in the former cycle though,
> I made it based on my understanding. We can continue discussing on it.
>
> Changelog:
> v1->v2:
> - #1. Reworded commit message and comment, plus some typo fixes.
> - #2. New patch.
> - #3. Added the missing suggested-by Pierre.
> Fixed typos.
> Added sanity check on pa->pa_iova_pfn and updated comments accordingly.
> - #4. Removed unused idaw_nr.
> - #5. Replaced leading white spaces with TABs.
> Traced the function in anycase.
>
> Dong Jia Shi (3):
> vfio: ccw: shorten kernel doc description for pfn_array_pin()
> vfio: ccw: refactor and improve pfn_array_alloc_pin()
> vfio: ccw: set ccw->cda to NULL defensively
>
> Halil Pasic (2):
> vfio: ccw: fix cleanup if cp_prefetch fails
> vfio: ccw: add traceponits for interesting error paths
>
> drivers/s390/cio/Makefile | 1 +
> drivers/s390/cio/vfio_ccw_cp.c | 134 ++++++++++++++++++++------------------
> drivers/s390/cio/vfio_ccw_fsm.c | 16 ++++-
> drivers/s390/cio/vfio_ccw_trace.h | 77 ++++++++++++++++++++++
> 4 files changed, 164 insertions(+), 64 deletions(-)
> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
>

Out of this series, patch 1 is a fix, while the rest are improvements,
correct? So patch 1 would be material for 4.17 (and maybe stable?), the
rest for 4.18?

2018-04-23 11:40:38

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails



On 04/23/2018 01:01 PM, Dong Jia Shi wrote:
> From: Halil Pasic <[email protected]>
>
> If the translation of a channel program fails, we may end up attempting
> to clean up (free, unpin) stuff that never got translated (and allocated,
> pinned) in the first place.
>
> By adjusting the lengths of the chains accordingly (so the element that
> failed, and all subsequent elements are excluded) cleanup activities
> based on false assumptions can be avoided.
>
> Let's make sure cp_free works properly after cp_prefetch returns with an
> error by setting ch_len of a ccw chain to the number of the translated
> CCWs on that chain.
>
> Acked-by: Pierre Morel <[email protected]>
> Reviewed-by: Dong Jia Shi <[email protected]>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>

AFAIR we came to the conclusion that this one is stable
material. [https://www.spinics.net/lists/kvm/msg166629.html]

> ---
> drivers/s390/cio/vfio_ccw_cp.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 2c7550797ec2..62d66e195304 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -715,6 +715,10 @@ void cp_free(struct channel_program *cp)
> * and stores the result to ccwchain list. @cp must have been
> * initialized by a previous call with cp_init(). Otherwise, undefined
> * behavior occurs.
> + * For each chain composing the channel program:
> + * - On entry ch_len holds the count of CCW to be translated.
> + * - On exit ch_len is adjusted to the count of successfully translated CCW.
> + * This allows cp_free to find in ch_len the count of CCW to free in a chain.
> *
> * The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
> * as helpers to do ccw chain translation inside the kernel. Basically
> @@ -749,11 +753,18 @@ int cp_prefetch(struct channel_program *cp)
> for (idx = 0; idx < len; idx++) {
> ret = ccwchain_fetch_one(chain, idx, cp);
> if (ret)
> - return ret;
> + goto out_err;
> }
> }
>
> return 0;
> +out_err:
> + /* Only cleanup the chain elements that were actually translated. */
> + chain->ch_len = idx;
> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
> + chain->ch_len = 0;
> + }
> + return ret;
> }
>
> /**
>


2018-04-23 11:42:18

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails

On Mon, 23 Apr 2018 13:01:09 +0200
Dong Jia Shi <[email protected]> wrote:

> From: Halil Pasic <[email protected]>
>
> If the translation of a channel program fails, we may end up attempting
> to clean up (free, unpin) stuff that never got translated (and allocated,
> pinned) in the first place.
>
> By adjusting the lengths of the chains accordingly (so the element that
> failed, and all subsequent elements are excluded) cleanup activities
> based on false assumptions can be avoided.
>
> Let's make sure cp_free works properly after cp_prefetch returns with an
> error by setting ch_len of a ccw chain to the number of the translated
> CCWs on that chain.

Should that be cc:stable? This problem has been there probably since we
introduced vfio-ccw, no?

>
> Acked-by: Pierre Morel <[email protected]>
> Reviewed-by: Dong Jia Shi <[email protected]>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 2c7550797ec2..62d66e195304 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -715,6 +715,10 @@ void cp_free(struct channel_program *cp)
> * and stores the result to ccwchain list. @cp must have been
> * initialized by a previous call with cp_init(). Otherwise, undefined
> * behavior occurs.
> + * For each chain composing the channel program:
> + * - On entry ch_len holds the count of CCW to be translated.
> + * - On exit ch_len is adjusted to the count of successfully translated CCW.
> + * This allows cp_free to find in ch_len the count of CCW to free in a chain.

s/CCW/CCWs/ (x3)?

Can change on applying.

> *
> * The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
> * as helpers to do ccw chain translation inside the kernel. Basically
> @@ -749,11 +753,18 @@ int cp_prefetch(struct channel_program *cp)
> for (idx = 0; idx < len; idx++) {
> ret = ccwchain_fetch_one(chain, idx, cp);
> if (ret)
> - return ret;
> + goto out_err;
> }
> }
>
> return 0;
> +out_err:
> + /* Only cleanup the chain elements that were actually translated. */
> + chain->ch_len = idx;
> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
> + chain->ch_len = 0;
> + }
> + return ret;
> }
>
> /**

Else, looks good.

2018-04-23 11:46:18

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] vfio: ccw: shorten kernel doc description for pfn_array_pin()

On Mon, 23 Apr 2018 13:01:10 +0200
Dong Jia Shi <[email protected]> wrote:

> The kernel doc description for usage of the struct pfn_array in
> pfn_array_pin() is unnecessary long. Let's shorten it by describing
> the contents of the struct pfn_array fields at the struct's definition
> instead.
>
> Suggested-by: Cornelia Huck <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 62d66e195304..e8fe7450702e 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -23,9 +23,13 @@
> #define CCWCHAIN_LEN_MAX 256
>
> struct pfn_array {
> + /* Starting guest physical I/O address. */
> unsigned long pa_iova;
> + /* Array that stores PFNs of the pages need to pin. */

s/need to pin/needing to be pinned/ ? Or "we need to pin"? Or even
simply "to pin"?

(Pre-exiting, but we can as well fix it up. Can also do while applying.)

> unsigned long *pa_iova_pfn;
> + /* Array that receives PFNs of the pages pinned. */
> unsigned long *pa_pfn;
> + /* Number of pages to pin/pinned from @pa_iova. */
> int pa_nr;
> };
>
> @@ -53,14 +57,8 @@ struct ccwchain {
> * Attempt to pin user pages in memory.
> *
> * Usage of pfn_array:
> - * @pa->pa_iova starting guest physical I/O address. Assigned by caller.
> - * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
> - * by caller.
> - * @pa->pa_pfn array that receives PFNs of the pages pinned. Allocated by
> - * caller.
> - * @pa->pa_nr number of pages from @pa->pa_iova to pin. Assigned by
> - * caller.
> - * number of pages pinned. Assigned by callee.
> + * Any field in this structure should be initialized by caller.
> + * We expect @pa->pa_nr > 0, and its value will be assigned by callee.
> *
> * Returns:
> * Number of pages pinned on success.

Otherwise, looks good.

2018-04-23 12:03:04

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails



On 04/23/2018 01:40 PM, Cornelia Huck wrote:
> On Mon, 23 Apr 2018 13:01:09 +0200
> Dong Jia Shi <[email protected]> wrote:
>
>> From: Halil Pasic <[email protected]>
>>
>> If the translation of a channel program fails, we may end up attempting
>> to clean up (free, unpin) stuff that never got translated (and allocated,
>> pinned) in the first place.
>>
>> By adjusting the lengths of the chains accordingly (so the element that
>> failed, and all subsequent elements are excluded) cleanup activities
>> based on false assumptions can be avoided.
>>
>> Let's make sure cp_free works properly after cp_prefetch returns with an
>> error by setting ch_len of a ccw chain to the number of the translated
>> CCWs on that chain.
>
> Should that be cc:stable? This problem has been there probably since we
> introduced vfio-ccw, no?
>

Seems emails crossed in flight. Yes I think it should
be cc stable and this was broken since the very beginning.

>>
>> Acked-by: Pierre Morel <[email protected]>
>> Reviewed-by: Dong Jia Shi <[email protected]>
>> Signed-off-by: Halil Pasic <[email protected]>
>> Signed-off-by: Dong Jia Shi <[email protected]>
>> ---
>> drivers/s390/cio/vfio_ccw_cp.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>> index 2c7550797ec2..62d66e195304 100644
>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>> @@ -715,6 +715,10 @@ void cp_free(struct channel_program *cp)
>> * and stores the result to ccwchain list. @cp must have been
>> * initialized by a previous call with cp_init(). Otherwise, undefined
>> * behavior occurs.
>> + * For each chain composing the channel program:
>> + * - On entry ch_len holds the count of CCW to be translated.
>> + * - On exit ch_len is adjusted to the count of successfully translated CCW.
>> + * This allows cp_free to find in ch_len the count of CCW to free in a chain.
>
> s/CCW/CCWs/ (x3)?
>

Nod.

> Can change on applying.
>
>> *
>> * The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
>> * as helpers to do ccw chain translation inside the kernel. Basically
>> @@ -749,11 +753,18 @@ int cp_prefetch(struct channel_program *cp)
>> for (idx = 0; idx < len; idx++) {
>> ret = ccwchain_fetch_one(chain, idx, cp);
>> if (ret)
>> - return ret;
>> + goto out_err;
>> }
>> }
>>
>> return 0;
>> +out_err:
>> + /* Only cleanup the chain elements that were actually translated. */
>> + chain->ch_len = idx;
>> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
>> + chain->ch_len = 0;
>> + }
>> + return ret;
>> }
>>
>> /**
>
> Else, looks good.
>


2018-04-24 11:24:37

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails

On Mon, 23 Apr 2018 13:01:09 +0200
Dong Jia Shi <[email protected]> wrote:

> From: Halil Pasic <[email protected]>
>
> If the translation of a channel program fails, we may end up attempting
> to clean up (free, unpin) stuff that never got translated (and allocated,
> pinned) in the first place.
>
> By adjusting the lengths of the chains accordingly (so the element that
> failed, and all subsequent elements are excluded) cleanup activities
> based on false assumptions can be avoided.
>
> Let's make sure cp_free works properly after cp_prefetch returns with an
> error by setting ch_len of a ccw chain to the number of the translated
> CCWs on that chain.
>
> Acked-by: Pierre Morel <[email protected]>
> Reviewed-by: Dong Jia Shi <[email protected]>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)

Thanks, applied. I'll probably send a pull req after lunch.

[Some procedural notes: I've created a new vfio-ccw-fixes branch based
on the s390 fixes branch for easier handling. Things targeted for the
next release will go on the vfio-ccw branch on top of the s390 features
branch, as before. Does that work for everybody? (And I am the only
vfio-ccw maintainer with a kernel.org account, right?)]

2018-04-25 11:21:18

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails



On 04/25/2018 04:43 AM, Dong Jia Shi wrote:
>> [Some procedural notes: I've created a new vfio-ccw-fixes branch based
>> on the s390 fixes branch for easier handling. Things targeted for the
>> next release will go on the vfio-ccw branch on top of the s390 features
>> branch, as before. Does that work for everybody?
> I don't see a reason that why it doesn't work for me.
>

Works for me.

>> (And I am the only vfio-ccw maintainer with a kernel.org account,
>> right?)]
>>
> I don't have such an account, and don't know if I need to apply for one.

Neither do I have a kernel.org account -- at least for now.

Regards,
Halil


2018-04-27 10:15:35

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths

On Mon, 23 Apr 2018 13:01:13 +0200
Dong Jia Shi <[email protected]> wrote:

typo in subject: s/traceponits/tracepoints/

> From: Halil Pasic <[email protected]>
>
> Add some tracepoints so we can inspect what is not working as is should.
>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/Makefile | 1 +
> drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
> drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 93 insertions(+), 1 deletion(-)
> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h


> @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> goto err_out;
>
> io_region->ret_code = cp_prefetch(&private->cp);
> + trace_vfio_ccw_cp_prefetch(get_schid(private),
> + io_region->ret_code);
> if (io_region->ret_code) {
> cp_free(&private->cp);
> goto err_out;
> @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>
> /* Start channel program and wait for I/O interrupt. */
> io_region->ret_code = fsm_io_helper(private);
> + trace_vfio_ccw_fsm_io_helper(get_schid(private),
> + io_region->ret_code);
> if (io_region->ret_code) {
> cp_free(&private->cp);
> goto err_out;
> }
> - return;
> + goto out;
> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> /* XXX: Handle halt. */
> io_region->ret_code = -EOPNOTSUPP;
> @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>
> err_out:
> private->state = VFIO_CCW_STATE_IDLE;
> +out:
> + trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> + io_region->ret_code);
> }
>
> /*

I really don't want to bikeshed, especially as some tracepoints are
better than no tracepoints, but...

We now trace fctl/schid/ret_code unconditionally (good).

We trace the outcome of cp_prefetch() and fsm_io_helper()
unconditionally. We don't, however, trace all things that may go wrong.
We have the tracepoint at the end, but it cannot tell us where the
error came from. Should we have tracepoints in every place (in this
function) that may generate an error? Only if there is an actual error?
Are the two enough for common debug scenarios?

Opinions? We can just go ahead with this and improve things later on, I
guess.

2018-04-30 11:53:13

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths

On Sat, 28 Apr 2018 13:50:23 +0800
Dong Jia Shi <[email protected]> wrote:

> * Cornelia Huck <[email protected]> [2018-04-27 12:13:53 +0200]:
>
> > On Mon, 23 Apr 2018 13:01:13 +0200
> > Dong Jia Shi <[email protected]> wrote:
> >
> > typo in subject: s/traceponits/tracepoints/
> >
> > > From: Halil Pasic <[email protected]>
> > >
> > > Add some tracepoints so we can inspect what is not working as is should.
> > >
> > > Signed-off-by: Halil Pasic <[email protected]>
> > > Signed-off-by: Dong Jia Shi <[email protected]>
> > > ---
> > > drivers/s390/cio/Makefile | 1 +
> > > drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
> > > drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 93 insertions(+), 1 deletion(-)
> > > create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
> >
> >
> > > @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > > goto err_out;
> > >
> > > io_region->ret_code = cp_prefetch(&private->cp);
> > > + trace_vfio_ccw_cp_prefetch(get_schid(private),
> > > + io_region->ret_code);
> > > if (io_region->ret_code) {
> > > cp_free(&private->cp);
> > > goto err_out;
> > > @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > >
> > > /* Start channel program and wait for I/O interrupt. */
> > > io_region->ret_code = fsm_io_helper(private);
> > > + trace_vfio_ccw_fsm_io_helper(get_schid(private),
> > > + io_region->ret_code);
> > > if (io_region->ret_code) {
> > > cp_free(&private->cp);
> > > goto err_out;
> > > }
> > > - return;
> > > + goto out;
> > > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> > > /* XXX: Handle halt. */
> > > io_region->ret_code = -EOPNOTSUPP;
> > > @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > >
> > > err_out:
> > > private->state = VFIO_CCW_STATE_IDLE;
> > > +out:
> > > + trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> > > + io_region->ret_code);
> > > }
> > >
> > > /*
> >
> > I really don't want to bikeshed, especially as some tracepoints are
> > better than no tracepoints, but...
> >
> > We now trace fctl/schid/ret_code unconditionally (good).
> >
> > We trace the outcome of cp_prefetch() and fsm_io_helper()
> > unconditionally. We don't, however, trace all things that may go wrong.
> > We have the tracepoint at the end, but it cannot tell us where the
> > error came from. Should we have tracepoints in every place (in this
> > function) that may generate an error? Only if there is an actual error?
> > Are the two enough for common debug scenarios?
> Trace actual error sounds like a better idea than trace unconditionally
> of these two functions.
> These two are not enough for common debug scenarios. For example, we
> cann't tell if a -EOPNOTSUPP is a orb->tm.b problem, or error code
> returned by cp_init().
>
> Idea to improve:
> 1. Trace actual error.
> 2. Define a trace event and add error trace for cp_init().

Hm. Going from what I have done in the past when doing printk debugging:

- stick in a message that is always hit, with some information about
parameters, if it makes sense
- stick in a message "foo happened!" in the error branches
- or, alternatively, trace the called functions

So tracing on failure only might be more useful? Have all failure paths
under a common knob to turn on/off?

> > Opinions? We can just go ahead with this and improve things later
> > on, I guess.
> >
> I think it's also fine to do this - better something than nothing. We
> could at least have a code base to be improved to make everybody
> happier in future.

Maybe keep the patch as it is now, except trace the errors only
(keeping the fctl trace point)?

Halil, as you wrote the patch (and I presume you found it helpful):
What is your opinion?

2018-04-30 14:15:30

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths



On 04/30/2018 01:51 PM, Cornelia Huck wrote:
> On Sat, 28 Apr 2018 13:50:23 +0800
> Dong Jia Shi <[email protected]> wrote:
>
>> * Cornelia Huck <[email protected]> [2018-04-27 12:13:53 +0200]:
>>
>>> On Mon, 23 Apr 2018 13:01:13 +0200
>>> Dong Jia Shi <[email protected]> wrote:
>>>
>>> typo in subject: s/traceponits/tracepoints/
>>>
>>>> From: Halil Pasic <[email protected]>
>>>>
>>>> Add some tracepoints so we can inspect what is not working as is should.
>>>>
>>>> Signed-off-by: Halil Pasic <[email protected]>
>>>> Signed-off-by: Dong Jia Shi <[email protected]>
>>>> ---
>>>> drivers/s390/cio/Makefile | 1 +
>>>> drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
>>>> drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 93 insertions(+), 1 deletion(-)
>>>> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
>>>
>>>
>>>> @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>> goto err_out;
>>>>
>>>> io_region->ret_code = cp_prefetch(&private->cp);
>>>> + trace_vfio_ccw_cp_prefetch(get_schid(private),
>>>> + io_region->ret_code);
>>>> if (io_region->ret_code) {
>>>> cp_free(&private->cp);
>>>> goto err_out;
>>>> @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>>
>>>> /* Start channel program and wait for I/O interrupt. */
>>>> io_region->ret_code = fsm_io_helper(private);
>>>> + trace_vfio_ccw_fsm_io_helper(get_schid(private),
>>>> + io_region->ret_code);
>>>> if (io_region->ret_code) {
>>>> cp_free(&private->cp);
>>>> goto err_out;
>>>> }
>>>> - return;
>>>> + goto out;
>>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
>>>> /* XXX: Handle halt. */
>>>> io_region->ret_code = -EOPNOTSUPP;
>>>> @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>>
>>>> err_out:
>>>> private->state = VFIO_CCW_STATE_IDLE;
>>>> +out:
>>>> + trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
>>>> + io_region->ret_code);
>>>> }
>>>>
>>>> /*
>>>
>>> I really don't want to bikeshed, especially as some tracepoints are
>>> better than no tracepoints, but...
>>>
>>> We now trace fctl/schid/ret_code unconditionally (good).
>>>
>>> We trace the outcome of cp_prefetch() and fsm_io_helper()
>>> unconditionally. We don't, however, trace all things that may go wrong.
>>> We have the tracepoint at the end, but it cannot tell us where the
>>> error came from. Should we have tracepoints in every place (in this
>>> function) that may generate an error? Only if there is an actual error?
>>> Are the two enough for common debug scenarios?
>> Trace actual error sounds like a better idea than trace unconditionally
>> of these two functions.
>> These two are not enough for common debug scenarios. For example, we
>> cann't tell if a -EOPNOTSUPP is a orb->tm.b problem, or error code
>> returned by cp_init().
>>
>> Idea to improve:
>> 1. Trace actual error.
>> 2. Define a trace event and add error trace for cp_init().
>
> Hm. Going from what I have done in the past when doing printk debugging:
>
> - stick in a message that is always hit, with some information about
> parameters, if it makes sense
> - stick in a message "foo happened!" in the error branches
> - or, alternatively, trace the called functions
>
> So tracing on failure only might be more useful? Have all failure paths
> under a common knob to turn on/off?
>
>>> Opinions? We can just go ahead with this and improve things later
>>> on, I guess.
>>>
>> I think it's also fine to do this - better something than nothing. We
>> could at least have a code base to be improved to make everybody
>> happier in future.
>
> Maybe keep the patch as it is now, except trace the errors only
> (keeping the fctl trace point)?

What do you mean by this sentence. Get rid of vfio_ccw_io_fctl or get
rid of vfio_ccw_cp_prefetch and vfio_ccw_fsm_io_helper, or get don't
get rid of any, but make some conditional (!errno)?

>
> Halil, as you wrote the patch (and I presume you found it helpful):
> What is your opinion?
>

I'm in favor of this patch (as previously stated here
https://patchwork.kernel.org/patch/10298305/). And regarding the
questions under discussion I'm mostly fine either way.

I think the naming of this fctl thing is a bit cryptic,
but if we don't see this as ABI I'm fine with it -- can be improved.
What would be a better name? I was thinking along the lines accept_request.
(Bad error code would mean that the request did not get accepted. Good
code does not mean the requested function was performed successfully.)

Also I think vfio_ccw_io_fctl with no zero error code would make sense
as dev_warn. If I were an admin looking into a problem I would very much
appreciate seeing something in the messages log (and not having to enable
tracing first). This point seems to be a good one for high level 'request gone
bad' kind of report. Opinions?

Regards,
Halil




2018-04-30 15:04:27

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths

On Mon, 30 Apr 2018 16:14:21 +0200
Halil Pasic <[email protected]> wrote:

> On 04/30/2018 01:51 PM, Cornelia Huck wrote:
> > On Sat, 28 Apr 2018 13:50:23 +0800
> > Dong Jia Shi <[email protected]> wrote:
> >
> >> * Cornelia Huck <[email protected]> [2018-04-27 12:13:53 +0200]:
> >>
> >>> On Mon, 23 Apr 2018 13:01:13 +0200
> >>> Dong Jia Shi <[email protected]> wrote:
> >>>
> >>> typo in subject: s/traceponits/tracepoints/
> >>>
> >>>> From: Halil Pasic <[email protected]>
> >>>>
> >>>> Add some tracepoints so we can inspect what is not working as is should.
> >>>>
> >>>> Signed-off-by: Halil Pasic <[email protected]>
> >>>> Signed-off-by: Dong Jia Shi <[email protected]>
> >>>> ---
> >>>> drivers/s390/cio/Makefile | 1 +
> >>>> drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
> >>>> drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
> >>>> 3 files changed, 93 insertions(+), 1 deletion(-)
> >>>> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
> >>>
> >>>
> >>>> @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> goto err_out;
> >>>>
> >>>> io_region->ret_code = cp_prefetch(&private->cp);
> >>>> + trace_vfio_ccw_cp_prefetch(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> if (io_region->ret_code) {
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> /* Start channel program and wait for I/O interrupt. */
> >>>> io_region->ret_code = fsm_io_helper(private);
> >>>> + trace_vfio_ccw_fsm_io_helper(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> if (io_region->ret_code) {
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> - return;
> >>>> + goto out;
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>> /* XXX: Handle halt. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> err_out:
> >>>> private->state = VFIO_CCW_STATE_IDLE;
> >>>> +out:
> >>>> + trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> >>>> + io_region->ret_code);
> >>>> }
> >>>>
> >>>> /*
> >>>
> >>> I really don't want to bikeshed, especially as some tracepoints are
> >>> better than no tracepoints, but...
> >>>
> >>> We now trace fctl/schid/ret_code unconditionally (good).
> >>>
> >>> We trace the outcome of cp_prefetch() and fsm_io_helper()
> >>> unconditionally. We don't, however, trace all things that may go wrong.
> >>> We have the tracepoint at the end, but it cannot tell us where the
> >>> error came from. Should we have tracepoints in every place (in this
> >>> function) that may generate an error? Only if there is an actual error?
> >>> Are the two enough for common debug scenarios?
> >> Trace actual error sounds like a better idea than trace unconditionally
> >> of these two functions.
> >> These two are not enough for common debug scenarios. For example, we
> >> cann't tell if a -EOPNOTSUPP is a orb->tm.b problem, or error code
> >> returned by cp_init().
> >>
> >> Idea to improve:
> >> 1. Trace actual error.
> >> 2. Define a trace event and add error trace for cp_init().
> >
> > Hm. Going from what I have done in the past when doing printk debugging:
> >
> > - stick in a message that is always hit, with some information about
> > parameters, if it makes sense
> > - stick in a message "foo happened!" in the error branches
> > - or, alternatively, trace the called functions
> >
> > So tracing on failure only might be more useful? Have all failure paths
> > under a common knob to turn on/off?
> >
> >>> Opinions? We can just go ahead with this and improve things later
> >>> on, I guess.
> >>>
> >> I think it's also fine to do this - better something than nothing. We
> >> could at least have a code base to be improved to make everybody
> >> happier in future.
> >
> > Maybe keep the patch as it is now, except trace the errors only
> > (keeping the fctl trace point)?
>
> What do you mean by this sentence. Get rid of vfio_ccw_io_fctl or get
> rid of vfio_ccw_cp_prefetch and vfio_ccw_fsm_io_helper, or get don't
> get rid of any, but make some conditional (!errno)?

The third option.

>
> >
> > Halil, as you wrote the patch (and I presume you found it helpful):
> > What is your opinion?
> >
>
> I'm in favor of this patch (as previously stated here
> https://patchwork.kernel.org/patch/10298305/). And regarding the
> questions under discussion I'm mostly fine either way.

OK.

>
> I think the naming of this fctl thing is a bit cryptic,
> but if we don't see this as ABI I'm fine with it -- can be improved.
> What would be a better name? I was thinking along the lines accept_request.
> (Bad error code would mean that the request did not get accepted. Good
> code does not mean the requested function was performed successfully.)

I think fctl is fine (if you don't understand what 'fctl' is, you're
unlikely to understand it even if it were named differently.)

>
> Also I think vfio_ccw_io_fctl with no zero error code would make sense
> as dev_warn. If I were an admin looking into a problem I would very much
> appreciate seeing something in the messages log (and not having to enable
> tracing first). This point seems to be a good one for high level 'request gone
> bad' kind of report. Opinions?

I'd also exclude -EOPNOTSUPP (as this also might happen with e.g. a halt/clear enabled user space, which probes availability of halt/clear support by giving it a try once (yes, I really want to post my patches this week.))

2018-04-30 16:52:31

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths



On 04/30/2018 05:03 PM, Cornelia Huck wrote:
>> I think the naming of this fctl thing is a bit cryptic,
>> but if we don't see this as ABI I'm fine with it -- can be improved.
>> What would be a better name? I was thinking along the lines accept_request.
>> (Bad error code would mean that the request did not get accepted. Good
>> code does not mean the requested function was performed successfully.)
> I think fctl is fine (if you don't understand what 'fctl' is, you're
> unlikely to understand it even if it were named differently.)
>

AFAIU this fctl is a bit more complicated than the normal fctl. But
better let sleeping dogs lie.

>> Also I think vfio_ccw_io_fctl with no zero error code would make sense
>> as dev_warn. If I were an admin looking into a problem I would very much
>> appreciate seeing something in the messages log (and not having to enable
>> tracing first). This point seems to be a good one for high level 'request gone
>> bad' kind of report. Opinions?
> I'd also exclude -EOPNOTSUPP (as this also might happen with e.g. a halt/clear enabled user space, which probes availability of halt/clear support by giving it a try once (yes, I really want to post my patches this week.))
>

I'm looking forward to the clear/halt. It hope it will help me understand
the big vfio-ccw picture better. There are still dark spots, but I don't
feel like doing something against this, as there is quite some activity
going on here -- and I don't want to hamper the efforts by binding resources.

Regards,
Halil


2018-05-22 12:57:27

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths

On Wed, 16 May 2018 14:53:55 +0800
Dong Jia Shi <[email protected]> wrote:

> Politely ping. Want a new version?

Just walking through my backlog now, sorry about the delay. Yes, a new
version would be easiest for me.