Hi Conny,
Halil reported a host crash when using vfio-ccw. The root cause of the problem
is that vfio_pin_pages fails with EINVAL for reasons unknown. He has
experienced such failures after online-ing a dasd in the guest (the dasd has 3
partitions, hat may or may not have any significance). The problem isn't
experienced on every attempt to online the dasd, and breaking at css_do_ssch
seems to make things work.
One thing is sure: the host kernel should not crash under the described
circumstances.
To fix the problem, the first patch of this series fixes the cleanup when
cp_prefetch fails in the higher level. The 2nd and the 3rd patches provide
correctness and denfensive actions for the interfaces in the lower level.
The 4th patch is trying to add tracepoints for vfio-ccw, so that we can debug
such issue easier in future.
For details see the commit message portions of the inividual patches.
Thanks.
Dong Jia Shi (2):
vfio: ccw: refactor and improve pfn_array_alloc_pin()
vfio: ccw: set ccw->cda to NULL defensively
Halil Pasic (2):
vfio: ccw: fix cleanup if cp_prefetch fails
vfio: ccw: add traceponits for interesting error paths
drivers/s390/cio/Makefile | 1 +
drivers/s390/cio/vfio_ccw_cp.c | 121 ++++++++++++++++++++------------------
drivers/s390/cio/vfio_ccw_fsm.c | 13 ++++
drivers/s390/cio/vfio_ccw_trace.h | 86 +++++++++++++++++++++++++++
4 files changed, 163 insertions(+), 58 deletions(-)
create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
--
2.13.5
This refactors pfn_array_alloc_pin() and also improves it by adding
defensive code in error handling so that calling pfn_array_unpin_free()
after error return won't lead to problem. This mains does:
1. Merge pfn_array_pin() into pfn_array_alloc_pin(), since there is no
other user of pfn_array_pin(). As a result, also remove kernel-doc
for pfn_array_pin() and add kernel-doc for pfn_array_alloc_pin().
2. For a vfio_pin_pages() failure, set pa->pa_nr to zero to indicate
zero pages were pinned.
3. Set pa->pa_iova_pfn to NULL right after it was freed.
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 84 ++++++++++++++++++------------------------
1 file changed, 36 insertions(+), 48 deletions(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 2be114db02f9..3abc9770910a 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -46,65 +46,32 @@ struct ccwchain {
};
/*
- * pfn_array_pin() - pin user pages in memory
+ * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
* @pa: pfn_array on which to perform the operation
* @mdev: the mediated device to perform pin/unpin operations
+ * @iova: target guest physical address
+ * @len: number of bytes that should be pinned from @iova
*
- * Attempt to pin user pages in memory.
+ * Attempt to allocate memory for PFNs, and pin user pages in memory.
*
* Usage of pfn_array:
- * @pa->pa_iova starting guest physical I/O address. Assigned by caller.
+ * @pa->pa_iova starting guest physical I/O address. Assigned by callee.
* @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
- * by caller.
+ * by callee.
* @pa->pa_pfn array that receives PFNs of the pages pinned. Allocated by
- * caller.
- * @pa->pa_nr number of pages from @pa->pa_iova to pin. Assigned by
- * caller.
- * number of pages pinned. Assigned by callee.
+ * callee.
+ * @pa->pa_nr initiated as 0 by caller.
+ * number of pages pinned from @pa->pa_iova. Assigned by callee.
*
* Returns:
* Number of pages pinned on success.
- * If @pa->pa_nr is 0 or negative, returns 0.
+ * If @pa->pa_nr is not 0 initially, returns -EINVAL.
* If no pages were pinned, returns -errno.
*/
-static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
-{
- int i, ret;
-
- if (pa->pa_nr <= 0) {
- pa->pa_nr = 0;
- return 0;
- }
-
- pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
- for (i = 1; i < pa->pa_nr; i++)
- pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
-
- ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
- IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
-
- if (ret > 0 && ret != pa->pa_nr) {
- vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
- pa->pa_nr = 0;
- return 0;
- }
-
- return ret;
-}
-
-/* Unpin the pages before releasing the memory. */
-static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
-{
- vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
- pa->pa_nr = 0;
- kfree(pa->pa_iova_pfn);
-}
-
-/* Alloc memory for PFNs, then pin pages with them. */
static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
u64 iova, unsigned int len)
{
- int ret = 0;
+ int i, ret = 0;
if (!len)
return 0;
@@ -126,18 +93,39 @@ static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
return -ENOMEM;
pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
- ret = pfn_array_pin(pa, mdev);
+ pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+ for (i = 1; i < pa->pa_nr; i++)
+ pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+
+ ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
+ IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
- if (ret > 0)
- return ret;
- else if (!ret)
+ if (ret < 0) {
+ goto err_out;
+ } else if (ret > 0 && ret != pa->pa_nr) {
+ vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
ret = -EINVAL;
+ goto err_out;
+ }
+
+ return ret;
+err_out:
+ pa->pa_nr = 0;
kfree(pa->pa_iova_pfn);
+ pa->pa_iova_pfn = NULL;
return ret;
}
+/* Unpin the pages before releasing the memory. */
+static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
+{
+ vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
+ pa->pa_nr = 0;
+ kfree(pa->pa_iova_pfn);
+}
+
static int pfn_array_table_init(struct pfn_array_table *pat, int nr)
{
pat->pat_pa = kcalloc(nr, sizeof(*pat->pat_pa), GFP_KERNEL);
--
2.13.5
From: Halil Pasic <[email protected]>
Add some tracepoints so we can inspect what is not working as is should.
Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/Makefile | 1 +
drivers/s390/cio/vfio_ccw_fsm.c | 13 ++++++
drivers/s390/cio/vfio_ccw_trace.h | 86 +++++++++++++++++++++++++++++++++++++++
3 files changed, 100 insertions(+)
create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index a070ef0efe65..f230516abb96 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -5,6 +5,7 @@
# The following is required for define_trace.h to find ./trace.h
CFLAGS_trace.o := -I$(src)
+CFLAGS_vfio_ccw_fsm.o := -I$(src)
obj-y += airq.o blacklist.o chsc.o cio.o css.o chp.o idset.o isc.o \
fcx.o itcw.o crw.o ccwreq.o trace.o ioasm.o
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index c30420c517b1..7ed27f90d741 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -13,6 +13,9 @@
#include "ioasm.h"
#include "vfio_ccw_private.h"
+#define CREATE_TRACE_POINTS
+#include "vfio_ccw_trace.h"
+
static int fsm_io_helper(struct vfio_ccw_private *private)
{
struct subchannel *sch;
@@ -105,6 +108,10 @@ static void fsm_disabled_irq(struct vfio_ccw_private *private,
*/
cio_disable_subchannel(sch);
}
+inline struct subchannel_id get_schid(struct vfio_ccw_private *p)
+{
+ return p->sch->schid;
+}
/*
* Deal with the ccw command request from the userspace.
@@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
io_region->ret_code = cp_prefetch(&private->cp);
if (io_region->ret_code) {
+ trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
+ io_region->ret_code);
cp_free(&private->cp);
goto err_out;
}
@@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
/* Start channel program and wait for I/O interrupt. */
io_region->ret_code = fsm_io_helper(private);
if (io_region->ret_code) {
+ trace_vfio_ccw_ssch_failed(get_schid(private),
+ io_region->ret_code);
cp_free(&private->cp);
goto err_out;
}
@@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
/* XXX: Handle halt. */
io_region->ret_code = -EOPNOTSUPP;
+ trace_vfio_ccw_halt(get_schid(private));
goto err_out;
} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
/* XXX: Handle clear. */
io_region->ret_code = -EOPNOTSUPP;
+ trace_vfio_ccw_clear(get_schid(private));
goto err_out;
}
diff --git a/drivers/s390/cio/vfio_ccw_trace.h b/drivers/s390/cio/vfio_ccw_trace.h
new file mode 100644
index 000000000000..edd3321cd919
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_trace.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Tracepoints for vfio_ccw driver
+ *
+ * Copyright IBM Corp. 2018
+ *
+ * Author(s): Dong Jia Shi <[email protected]>
+ * Halil Pasic <[email protected]>
+ */
+
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vfio_ccw
+
+#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ)
+#define _VFIO_CCW_TRACE_
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(vfio_ccw_cp_prefetch_failed,
+ TP_PROTO(struct subchannel_id schid, int errno),
+ TP_ARGS(schid, errno),
+
+ TP_STRUCT__entry(
+ __field_struct(struct subchannel_id, schid)
+ __field(int, errno)
+ ),
+
+ TP_fast_assign(
+ __entry->schid = schid;
+ __entry->errno = errno;
+ ),
+
+ TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)",
+ __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
+);
+
+TRACE_EVENT(vfio_ccw_ssch_failed,
+ TP_PROTO(struct subchannel_id schid, int errno),
+ TP_ARGS(schid, errno),
+
+ TP_STRUCT__entry(
+ __field_struct(struct subchannel_id, schid)
+ __field(int, errno)
+ ),
+
+ TP_fast_assign(
+ __entry->schid = schid;
+ __entry->errno = errno;
+ ),
+
+ TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)",
+ __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
+);
+
+DECLARE_EVENT_CLASS(vfio_ccw_notsupp,
+ TP_PROTO(struct subchannel_id schid),
+ TP_ARGS(schid),
+
+ TP_STRUCT__entry(
+ __field_struct(struct subchannel_id, schid)
+ ),
+
+ TP_fast_assign(
+ __entry->schid = schid;
+ ),
+
+ TP_printk("(schid 0.%x.%04X) request not supported",
+ __entry->schid.ssid, __entry->schid.sch_no)
+);
+
+DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_clear,
+ TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
+
+DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_halt,
+ TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
+
+#endif /* _VFIO_CCW_TRACE_ */
+
+/* This part must be outside protection */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE vfio_ccw_trace
+
+#include <trace/define_trace.h>
--
2.13.5
Let's avoid free on ccw->cda that points to a guest address
or a already freed memory area by setting it to NULL if memory
allocation didn't happen or failed.
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3abc9770910a..5958c35ab343 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -491,7 +491,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
struct ccw1 *ccw;
struct pfn_array_table *pat;
unsigned long *idaws;
- int idaw_nr;
+ int idaw_nr, ret;
ccw = chain->ch_ccw + idx;
@@ -511,18 +511,20 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
* needed when translating a direct ccw to a idal ccw.
*/
pat = chain->ch_pat + idx;
- if (pfn_array_table_init(pat, 1))
- return -ENOMEM;
- idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
+ ret = pfn_array_table_init(pat, 1);
+ if (ret)
+ goto out_init;
+
+ idaw_nr = ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
ccw->cda, ccw->count);
- if (idaw_nr < 0)
- return idaw_nr;
+ if (ret < 0)
+ goto out_init;
/* Translate this direct ccw to a idal ccw. */
idaws = kcalloc(idaw_nr, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
if (!idaws) {
- pfn_array_table_unpin_free(pat, cp->mdev);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto out_unpin;
}
ccw->cda = (__u32) virt_to_phys(idaws);
ccw->flags |= CCW_FLAG_IDA;
@@ -530,6 +532,12 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
pfn_array_table_idal_create_words(pat, idaws);
return 0;
+
+out_unpin:
+ pfn_array_table_unpin_free(pat, cp->mdev);
+out_init:
+ ccw->cda = 0;
+ return ret;
}
static int ccwchain_fetch_idal(struct ccwchain *chain,
@@ -559,7 +567,7 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
pat = chain->ch_pat + idx;
ret = pfn_array_table_init(pat, idaw_nr);
if (ret)
- return ret;
+ goto out_init;
/* Translate idal ccw to use new allocated idaws. */
idaws = kzalloc(idaw_len, GFP_DMA | GFP_KERNEL);
@@ -591,6 +599,8 @@ static int ccwchain_fetch_idal(struct ccwchain *chain,
kfree(idaws);
out_unpin:
pfn_array_table_unpin_free(pat, cp->mdev);
+out_init:
+ ccw->cda = 0;
return ret;
}
--
2.13.5
From: Halil Pasic <[email protected]>
If the translation of a channel program fails, we may end up attempting
to clean up (free, unpin) stuff that never got translated (and allocated,
pinned) in the first place.
By adjusting the lengths of the chains accordingly (so the element that
failed, and all subsequent elements are excluded) cleanup activities
based on false assumptions can be avoided.
Let's make sure cp_free works properly after cp_prefetch returns with an
error by setting ch_len to 0 for the ccw chains those are not prefetched.
Acked-by: Pierre Morel <[email protected]>
Reviewed-by: Dong Jia Shi <[email protected]>
Signed-off-by: Halil Pasic <[email protected]>
Signed-off-by: Dong Jia Shi <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index d9a2fffd034b..2be114db02f9 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
for (idx = 0; idx < len; idx++) {
ret = ccwchain_fetch_one(chain, idx, cp);
if (ret)
- return ret;
+ goto out_err;
}
}
return 0;
+out_err:
+ /* Only cleanup the chain elements that where actually translated. */
+ chain->ch_len = idx;
+ list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
+ chain->ch_len = 0;
+ }
+ return ret;
}
/**
--
2.13.5
On 03/21/2018 03:08 AM, Dong Jia Shi wrote:
> From: Halil Pasic <[email protected]>
>
> If the translation of a channel program fails, we may end up attempting
> to clean up (free, unpin) stuff that never got translated (and allocated,
> pinned) in the first place.
>
> By adjusting the lengths of the chains accordingly (so the element that
> failed, and all subsequent elements are excluded) cleanup activities
> based on false assumptions can be avoided.
>
> Let's make sure cp_free works properly after cp_prefetch returns with an
> error by setting ch_len to 0 for the ccw chains those are not prefetched.
This sentence used to be:
Let's make sure cp_free works properly after cp_prefetch returns with an
error.
@Dong Jia
I find the 'by setting ch_len to 0 for the ccw chains those are not prefetched'
you added for clarification (I guess) somewhat problematic.
The chain in which the translation failure occurred
+ chain->ch_len = idx;
is shortened so that only the translated elements (ccws) are going to
get cleaned up (on a per element basis) by cp_free. This may or may
not be the first ccw. Subsequent chains are shortened to 0 as there
no translation took place.
So as a result of this change only properly translated ccws are going
to get (re)visited by cp_free as only those may have resources bound
to them which need to be released.
I'm not against improving the commit message. But this ain't
an improvement to me.
>
> Acked-by: Pierre Morel <[email protected]>
> Reviewed-by: Dong Jia Shi <[email protected]>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index d9a2fffd034b..2be114db02f9 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
> for (idx = 0; idx < len; idx++) {
> ret = ccwchain_fetch_one(chain, idx, cp);
> if (ret)
> - return ret;
> + goto out_err;
> }
> }
>
> return 0;
> +out_err:
> + /* Only cleanup the chain elements that where actually translated. */
> + chain->ch_len = idx;
> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
> + chain->ch_len = 0;
> + }
> + return ret;
> }
>
> /**
>
On 22/03/2018 03:22, Dong Jia Shi wrote:
> * Halil Pasic <[email protected]> [2018-03-21 13:49:54 +0100]:
>
>>
>> On 03/21/2018 03:08 AM, Dong Jia Shi wrote:
>>> From: Halil Pasic <[email protected]>
>>>
>>> If the translation of a channel program fails, we may end up attempting
>>> to clean up (free, unpin) stuff that never got translated (and allocated,
>>> pinned) in the first place.
>>>
>>> By adjusting the lengths of the chains accordingly (so the element that
>>> failed, and all subsequent elements are excluded) cleanup activities
>>> based on false assumptions can be avoided.
>>>
>>> Let's make sure cp_free works properly after cp_prefetch returns with an
>>> error by setting ch_len to 0 for the ccw chains those are not prefetched.
>> This sentence used to be:
>>
>> Let's make sure cp_free works properly after cp_prefetch returns with an
>> error.
>>
>> @Dong Jia
>> I find the 'by setting ch_len to 0 for the ccw chains those are not prefetched'
>> you added for clarification (I guess) somewhat problematic.
>> The chain in which the translation failure occurred
>> + chain->ch_len = idx;
> I made a mistake. When rewording the message, I missed this part...
> Sorry for the problem!
>
>> is shortened so that only the translated elements (ccws) are going to
>> get cleaned up (on a per element basis) by cp_free. This may or may
>> not be the first ccw. Subsequent chains are shortened to 0 as there
>> no translation took place.
>>
>> So as a result of this change only properly translated ccws are going
>> to get (re)visited by cp_free as only those may have resources bound
>> to them which need to be released.
>>
>> I'm not against improving the commit message. But this ain't
>> an improvement to me.
> You are right. How about:
> Let's make sure cp_free works properly after cp_prefetch returns with an
> error by setting ch_len of a ccw chain to the number of the translated
> ccws on that chain.
By the way, since you will propose a new version,
you have a long description of the cp_prefetch function in the code.
I think you should modify it according to the changes and describe how and
why the ch_len field of each chain is used and changed by this function.
Something like:
"
For each chain composing the channel program:
On entry ch_len hold the count of CCW to be translated.
On exit ch_len is adjusted to the count of successfully translated CCW.
This allows cp_free to find in ch_len the count of CCW to free in a chain.
"
Could also be inside the commit message.
Pierre
>
>>> Acked-by: Pierre Morel <[email protected]>
>>> Reviewed-by: Dong Jia Shi <[email protected]>
>>> Signed-off-by: Halil Pasic <[email protected]>
>>> Signed-off-by: Dong Jia Shi <[email protected]>
>>> ---
>>> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>>> index d9a2fffd034b..2be114db02f9 100644
>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>> @@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
>>> for (idx = 0; idx < len; idx++) {
>>> ret = ccwchain_fetch_one(chain, idx, cp);
>>> if (ret)
>>> - return ret;
>>> + goto out_err;
>>> }
>>> }
>>>
>>> return 0;
>>> +out_err:
>>> + /* Only cleanup the chain elements that where actually translated. */
>>> + chain->ch_len = idx;
>>> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
>>> + chain->ch_len = 0;
>>> + }
>>> + return ret;
>>> }
>>>
>>> /**
>>>
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
On 03/22/2018 10:37 AM, Pierre Morel wrote:
> On 22/03/2018 03:22, Dong Jia Shi wrote:
>> * Halil Pasic <[email protected]> [2018-03-21 13:49:54 +0100]:
>>
>>>
>>> On 03/21/2018 03:08 AM, Dong Jia Shi wrote:
>>>> From: Halil Pasic <[email protected]>
>>>>
>>>> If the translation of a channel program fails, we may end up attempting
>>>> to clean up (free, unpin) stuff that never got translated (and allocated,
>>>> pinned) in the first place.
>>>>
>>>> By adjusting the lengths of the chains accordingly (so the element that
>>>> failed, and all subsequent elements are excluded) cleanup activities
>>>> based on false assumptions can be avoided.
>>>>
>>>> Let's make sure cp_free works properly after cp_prefetch returns with an
>>>> error by setting ch_len to 0 for the ccw chains those are not prefetched.
>>> This sentence used to be:
>>>
>>> Let's make sure cp_free works properly after cp_prefetch returns with an
>>> error.
>>>
>>> @Dong Jia
>>> I find the 'by setting ch_len to 0 for the ccw chains those are not prefetched'
>>> you added for clarification (I guess) somewhat problematic.
>>> The chain in which the translation failure occurred
>>> + chain->ch_len = idx;
>> I made a mistake. When rewording the message, I missed this part...
>> Sorry for the problem!
np
>>
>>> is shortened so that only the translated elements (ccws) are going to
>>> get cleaned up (on a per element basis) by cp_free. This may or may
>>> not be the first ccw. Subsequent chains are shortened to 0 as there
>>> no translation took place.
>>>
>>> So as a result of this change only properly translated ccws are going
>>> to get (re)visited by cp_free as only those may have resources bound
>>> to them which need to be released.
>>>
>>> I'm not against improving the commit message. But this ain't
>>> an improvement to me.
>> You are right. How about:
>> Let's make sure cp_free works properly after cp_prefetch returns with an
>> error by setting ch_len of a ccw chain to the number of the translated
>> ccws on that chain.
Works with me.
>
> By the way, since you will propose a new version,
> you have a long description of the cp_prefetch function in the code.
> I think you should modify it according to the changes and describe how and
> why the ch_len field of each chain is used and changed by this function.
>
> Something like:
>
> "
> For each chain composing the channel program:
> On entry ch_len hold the count of CCW to be translated.
> On exit ch_len is adjusted to the count of successfully translated CCW.
>
> This allows cp_free to find in ch_len the count of CCW to free in a chain.
> "
Sounds good to me.
Halil
On Wed, 21 Mar 2018 03:08:18 +0100
Dong Jia Shi <[email protected]> wrote:
> Hi Conny,
>
> Halil reported a host crash when using vfio-ccw. The root cause of the problem
> is that vfio_pin_pages fails with EINVAL for reasons unknown. He has
> experienced such failures after online-ing a dasd in the guest (the dasd has 3
> partitions, hat may or may not have any significance). The problem isn't
> experienced on every attempt to online the dasd, and breaking at css_do_ssch
> seems to make things work.
>
> One thing is sure: the host kernel should not crash under the described
> circumstances.
>
> To fix the problem, the first patch of this series fixes the cleanup when
> cp_prefetch fails in the higher level. The 2nd and the 3rd patches provide
> correctness and denfensive actions for the interfaces in the lower level.
So, is the first patch stable material?
>
> The 4th patch is trying to add tracepoints for vfio-ccw, so that we can debug
> such issue easier in future.
Tracepoints are nice :)
>
> For details see the commit message portions of the inividual patches.
Still digging through the post-vacation mail pile, will do more looking
later.
> Thanks.
>
> Dong Jia Shi (2):
> vfio: ccw: refactor and improve pfn_array_alloc_pin()
> vfio: ccw: set ccw->cda to NULL defensively
>
> Halil Pasic (2):
> vfio: ccw: fix cleanup if cp_prefetch fails
> vfio: ccw: add traceponits for interesting error paths
>
> drivers/s390/cio/Makefile | 1 +
> drivers/s390/cio/vfio_ccw_cp.c | 121 ++++++++++++++++++++------------------
> drivers/s390/cio/vfio_ccw_fsm.c | 13 ++++
> drivers/s390/cio/vfio_ccw_trace.h | 86 +++++++++++++++++++++++++++
> 4 files changed, 163 insertions(+), 58 deletions(-)
> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
>
On 03/26/2018 11:02 AM, Cornelia Huck wrote:
> On Wed, 21 Mar 2018 03:08:18 +0100
> Dong Jia Shi <[email protected]> wrote:
>
>> Hi Conny,
>>
>> Halil reported a host crash when using vfio-ccw. The root cause of the problem
>> is that vfio_pin_pages fails with EINVAL for reasons unknown. He has
>> experienced such failures after online-ing a dasd in the guest (the dasd has 3
>> partitions, hat may or may not have any significance). The problem isn't
>> experienced on every attempt to online the dasd, and breaking at css_do_ssch
>> seems to make things work.
>>
>> One thing is sure: the host kernel should not crash under the described
>> circumstances.
>>
>> To fix the problem, the first patch of this series fixes the cleanup when
>> cp_prefetch fails in the higher level. The 2nd and the 3rd patches provide
>> correctness and denfensive actions for the interfaces in the lower level.
>
> So, is the first patch stable material?
I think it should be.
>
>>
>> The 4th patch is trying to add tracepoints for vfio-ccw, so that we can debug
>> such issue easier in future.
>
> Tracepoints are nice :)
>
>>
>> For details see the commit message portions of the inividual patches.
>
> Still digging through the post-vacation mail pile, will do more looking
> later.
>
>
>> Thanks.
>>
>> Dong Jia Shi (2):
>> vfio: ccw: refactor and improve pfn_array_alloc_pin()
>> vfio: ccw: set ccw->cda to NULL defensively
>>
>> Halil Pasic (2):
>> vfio: ccw: fix cleanup if cp_prefetch fails
>> vfio: ccw: add traceponits for interesting error paths
>>
>> drivers/s390/cio/Makefile | 1 +
>> drivers/s390/cio/vfio_ccw_cp.c | 121 ++++++++++++++++++++------------------
>> drivers/s390/cio/vfio_ccw_fsm.c | 13 ++++
>> drivers/s390/cio/vfio_ccw_trace.h | 86 +++++++++++++++++++++++++++
>> 4 files changed, 163 insertions(+), 58 deletions(-)
>> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
>>
>
On Thu, 22 Mar 2018 10:37:36 +0100
Pierre Morel <[email protected]> wrote:
> On 22/03/2018 03:22, Dong Jia Shi wrote:
> > * Halil Pasic <[email protected]> [2018-03-21 13:49:54 +0100]:
> >
> >>
> >> On 03/21/2018 03:08 AM, Dong Jia Shi wrote:
> >>> From: Halil Pasic <[email protected]>
> >>>
> >>> If the translation of a channel program fails, we may end up attempting
> >>> to clean up (free, unpin) stuff that never got translated (and allocated,
> >>> pinned) in the first place.
> >>>
> >>> By adjusting the lengths of the chains accordingly (so the element that
> >>> failed, and all subsequent elements are excluded) cleanup activities
> >>> based on false assumptions can be avoided.
> >>>
> >>> Let's make sure cp_free works properly after cp_prefetch returns with an
> >>> error by setting ch_len to 0 for the ccw chains those are not prefetched.
> >> This sentence used to be:
> >>
> >> Let's make sure cp_free works properly after cp_prefetch returns with an
> >> error.
> >>
> >> @Dong Jia
> >> I find the 'by setting ch_len to 0 for the ccw chains those are not prefetched'
> >> you added for clarification (I guess) somewhat problematic.
> >> The chain in which the translation failure occurred
> >> + chain->ch_len = idx;
> > I made a mistake. When rewording the message, I missed this part...
> > Sorry for the problem!
> >
> >> is shortened so that only the translated elements (ccws) are going to
> >> get cleaned up (on a per element basis) by cp_free. This may or may
> >> not be the first ccw. Subsequent chains are shortened to 0 as there
> >> no translation took place.
> >>
> >> So as a result of this change only properly translated ccws are going
> >> to get (re)visited by cp_free as only those may have resources bound
> >> to them which need to be released.
> >>
> >> I'm not against improving the commit message. But this ain't
> >> an improvement to me.
> > You are right. How about:
> > Let's make sure cp_free works properly after cp_prefetch returns with an
> > error by setting ch_len of a ccw chain to the number of the translated
> > ccws on that chain.
>
> By the way, since you will propose a new version,
> you have a long description of the cp_prefetch function in the code.
> I think you should modify it according to the changes and describe how and
> why the ch_len field of each chain is used and changed by this function.
>
> Something like:
>
> "
> For each chain composing the channel program:
> On entry ch_len hold the count of CCW to be translated.
s/hold/holds/ ?
> On exit ch_len is adjusted to the count of successfully translated CCW.
>
> This allows cp_free to find in ch_len the count of CCW to free in a chain.
> "
>
> Could also be inside the commit message.
>
> Pierre
>
>
> >
> >>> Acked-by: Pierre Morel <[email protected]>
> >>> Reviewed-by: Dong Jia Shi <[email protected]>
> >>> Signed-off-by: Halil Pasic <[email protected]>
> >>> Signed-off-by: Dong Jia Shi <[email protected]>
> >>> ---
> >>> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
> >>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> >>> index d9a2fffd034b..2be114db02f9 100644
> >>> --- a/drivers/s390/cio/vfio_ccw_cp.c
> >>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >>> @@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
> >>> for (idx = 0; idx < len; idx++) {
> >>> ret = ccwchain_fetch_one(chain, idx, cp);
> >>> if (ret)
> >>> - return ret;
> >>> + goto out_err;
> >>> }
> >>> }
> >>>
> >>> return 0;
> >>> +out_err:
> >>> + /* Only cleanup the chain elements that where actually translated. */
s/where/were/
> >>> + chain->ch_len = idx;
> >>> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
> >>> + chain->ch_len = 0;
> >>> + }
> >>> + return ret;
> >>> }
> >>>
> >>> /**
> >>>
>
On Wed, 21 Mar 2018 03:08:20 +0100
Dong Jia Shi <[email protected]> wrote:
> This refactors pfn_array_alloc_pin() and also improves it by adding
> defensive code in error handling so that calling pfn_array_unpin_free()
> after error return won't lead to problem. This mains does:
> 1. Merge pfn_array_pin() into pfn_array_alloc_pin(), since there is no
> other user of pfn_array_pin(). As a result, also remove kernel-doc
> for pfn_array_pin() and add kernel-doc for pfn_array_alloc_pin().
> 2. For a vfio_pin_pages() failure, set pa->pa_nr to zero to indicate
> zero pages were pinned.
> 3. Set pa->pa_iova_pfn to NULL right after it was freed.
>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 84 ++++++++++++++++++------------------------
> 1 file changed, 36 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 2be114db02f9..3abc9770910a 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -46,65 +46,32 @@ struct ccwchain {
> };
>
> /*
> - * pfn_array_pin() - pin user pages in memory
> + * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
> * @pa: pfn_array on which to perform the operation
> * @mdev: the mediated device to perform pin/unpin operations
> + * @iova: target guest physical address
> + * @len: number of bytes that should be pinned from @iova
> *
> - * Attempt to pin user pages in memory.
> + * Attempt to allocate memory for PFNs, and pin user pages in memory.
> *
> * Usage of pfn_array:
> - * @pa->pa_iova starting guest physical I/O address. Assigned by caller.
> + * @pa->pa_iova starting guest physical I/O address. Assigned by callee.
> * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
> - * by caller.
> + * by callee.
> * @pa->pa_pfn array that receives PFNs of the pages pinned. Allocated by
> - * caller.
> - * @pa->pa_nr number of pages from @pa->pa_iova to pin. Assigned by
> - * caller.
> - * number of pages pinned. Assigned by callee.
> + * callee.
> + * @pa->pa_nr initiated as 0 by caller.
s/initiated/initialized/
but see below
> + * number of pages pinned from @pa->pa_iova. Assigned by callee.
So, basically everything is filled by pfn_array_alloc_pin()? Should we
expect a clean struct pfn_array handed in by the caller, then (not just
pa_nr == 0)?
Would it make sense to describe the contents of the struct pfn_array
fields at the struct's definition instead? You could then shorten the
description here to "we expect pa_nr == 0, any field in this structure
will be filled in by this function".
> *
> * Returns:
> * Number of pages pinned on success.
> - * If @pa->pa_nr is 0 or negative, returns 0.
> + * If @pa->pa_nr is not 0 initially, returns -EINVAL.
> * If no pages were pinned, returns -errno.
> */
The change itself looks good to me.
On Wed, 21 Mar 2018 03:08:21 +0100
Dong Jia Shi <[email protected]> wrote:
> Let's avoid free on ccw->cda that points to a guest address
> or a already freed memory area by setting it to NULL if memory
> allocation didn't happen or failed.
Does not hurt, I guess.
>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 28 +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 3abc9770910a..5958c35ab343 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -491,7 +491,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
> struct ccw1 *ccw;
> struct pfn_array_table *pat;
> unsigned long *idaws;
> - int idaw_nr;
> + int idaw_nr, ret;
>
> ccw = chain->ch_ccw + idx;
>
> @@ -511,18 +511,20 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
> * needed when translating a direct ccw to a idal ccw.
> */
> pat = chain->ch_pat + idx;
> - if (pfn_array_table_init(pat, 1))
> - return -ENOMEM;
> - idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> + ret = pfn_array_table_init(pat, 1);
> + if (ret)
> + goto out_init;
> +
> + idaw_nr = ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
Ugh, I don't like setting both at the same time... only set idaw_nr for
ret > 0? (personal preference)
> ccw->cda, ccw->count);
> - if (idaw_nr < 0)
> - return idaw_nr;
> + if (ret < 0)
> + goto out_init;
On Wed, 21 Mar 2018 03:08:22 +0100
Dong Jia Shi <[email protected]> wrote:
> From: Halil Pasic <[email protected]>
>
> Add some tracepoints so we can inspect what is not working as is should.
Tracepoints are certainly helpful (we can add more later).
>
> Signed-off-by: Halil Pasic <[email protected]>
> Signed-off-by: Dong Jia Shi <[email protected]>
> ---
> drivers/s390/cio/Makefile | 1 +
> drivers/s390/cio/vfio_ccw_fsm.c | 13 ++++++
> drivers/s390/cio/vfio_ccw_trace.h | 86 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 100 insertions(+)
> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
>
> diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
> index a070ef0efe65..f230516abb96 100644
> --- a/drivers/s390/cio/Makefile
> +++ b/drivers/s390/cio/Makefile
> @@ -5,6 +5,7 @@
>
> # The following is required for define_trace.h to find ./trace.h
> CFLAGS_trace.o := -I$(src)
> +CFLAGS_vfio_ccw_fsm.o := -I$(src)
>
> obj-y += airq.o blacklist.o chsc.o cio.o css.o chp.o idset.o isc.o \
> fcx.o itcw.o crw.o ccwreq.o trace.o ioasm.o
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index c30420c517b1..7ed27f90d741 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -13,6 +13,9 @@
> #include "ioasm.h"
> #include "vfio_ccw_private.h"
>
> +#define CREATE_TRACE_POINTS
> +#include "vfio_ccw_trace.h"
> +
> static int fsm_io_helper(struct vfio_ccw_private *private)
> {
> struct subchannel *sch;
> @@ -105,6 +108,10 @@ static void fsm_disabled_irq(struct vfio_ccw_private *private,
> */
> cio_disable_subchannel(sch);
> }
> +inline struct subchannel_id get_schid(struct vfio_ccw_private *p)
> +{
> + return p->sch->schid;
> +}
>
> /*
> * Deal with the ccw command request from the userspace.
> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>
> io_region->ret_code = cp_prefetch(&private->cp);
> if (io_region->ret_code) {
> + trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
> + io_region->ret_code);
> cp_free(&private->cp);
> goto err_out;
> }
> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> /* Start channel program and wait for I/O interrupt. */
> io_region->ret_code = fsm_io_helper(private);
> if (io_region->ret_code) {
> + trace_vfio_ccw_ssch_failed(get_schid(private),
> + io_region->ret_code);
> cp_free(&private->cp);
> goto err_out;
> }
> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> /* XXX: Handle halt. */
> io_region->ret_code = -EOPNOTSUPP;
> + trace_vfio_ccw_halt(get_schid(private));
> goto err_out;
> } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> /* XXX: Handle clear. */
> io_region->ret_code = -EOPNOTSUPP;
> + trace_vfio_ccw_clear(get_schid(private));
> goto err_out;
Hmmm.... perhaps better to just trace the function (start/halt/clear)
in any case?
> }
>
> diff --git a/drivers/s390/cio/vfio_ccw_trace.h b/drivers/s390/cio/vfio_ccw_trace.h
> new file mode 100644
> index 000000000000..edd3321cd919
> --- /dev/null
> +++ b/drivers/s390/cio/vfio_ccw_trace.h
> @@ -0,0 +1,86 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * Tracepoints for vfio_ccw driver
> + *
> + * Copyright IBM Corp. 2018
> + *
> + * Author(s): Dong Jia Shi <[email protected]>
> + * Halil Pasic <[email protected]>
> + */
> +
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM vfio_ccw
> +
> +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ)
> +#define _VFIO_CCW_TRACE_
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(vfio_ccw_cp_prefetch_failed,
> + TP_PROTO(struct subchannel_id schid, int errno),
> + TP_ARGS(schid, errno),
> +
> + TP_STRUCT__entry(
> + __field_struct(struct subchannel_id, schid)
> + __field(int, errno)
> + ),
> +
> + TP_fast_assign(
> + __entry->schid = schid;
> + __entry->errno = errno;
> + ),
> +
> + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)",
> + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
> +);
> +
> +TRACE_EVENT(vfio_ccw_ssch_failed,
> + TP_PROTO(struct subchannel_id schid, int errno),
> + TP_ARGS(schid, errno),
> +
> + TP_STRUCT__entry(
> + __field_struct(struct subchannel_id, schid)
> + __field(int, errno)
> + ),
> +
> + TP_fast_assign(
> + __entry->schid = schid;
> + __entry->errno = errno;
> + ),
> +
> + TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)",
> + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
> +);
> +
> +DECLARE_EVENT_CLASS(vfio_ccw_notsupp,
> + TP_PROTO(struct subchannel_id schid),
> + TP_ARGS(schid),
> +
> + TP_STRUCT__entry(
> + __field_struct(struct subchannel_id, schid)
> + ),
> +
> + TP_fast_assign(
> + __entry->schid = schid;
> + ),
> +
> + TP_printk("(schid 0.%x.%04X) request not supported",
> + __entry->schid.ssid, __entry->schid.sch_no)
> +);
Especially as I don't plan to leave this unsupported for too long :)
Just tracing the function is useful now and will stay useful in the
future.
Another idea: Trace the fsm state transitions. Probably something for
an additional patch.
> +
> +DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_clear,
> + TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
> +
> +DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_halt,
> + TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
> +
> +#endif /* _VFIO_CCW_TRACE_ */
> +
> +/* This part must be outside protection */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH .
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE vfio_ccw_trace
> +
> +#include <trace/define_trace.h>
On Tue, 27 Mar 2018 11:00:26 +0800
Dong Jia Shi <[email protected]> wrote:
> * Cornelia Huck <[email protected]> [2018-03-26 15:28:46 +0200]:
>
> > On Wed, 21 Mar 2018 03:08:20 +0100
> > Dong Jia Shi <[email protected]> wrote:
> >
> > > This refactors pfn_array_alloc_pin() and also improves it by adding
> > > defensive code in error handling so that calling pfn_array_unpin_free()
> > > after error return won't lead to problem. This mains does:
> > > 1. Merge pfn_array_pin() into pfn_array_alloc_pin(), since there is no
> > > other user of pfn_array_pin(). As a result, also remove kernel-doc
> > > for pfn_array_pin() and add kernel-doc for pfn_array_alloc_pin().
> > > 2. For a vfio_pin_pages() failure, set pa->pa_nr to zero to indicate
> > > zero pages were pinned.
> > > 3. Set pa->pa_iova_pfn to NULL right after it was freed.
> > >
> > > Signed-off-by: Dong Jia Shi <[email protected]>
> > > ---
> > > drivers/s390/cio/vfio_ccw_cp.c | 84 ++++++++++++++++++------------------------
> > > 1 file changed, 36 insertions(+), 48 deletions(-)
> > >
> > > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > > index 2be114db02f9..3abc9770910a 100644
> > > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > > @@ -46,65 +46,32 @@ struct ccwchain {
> > > };
> > >
> > > /*
> > > - * pfn_array_pin() - pin user pages in memory
> > > + * pfn_array_alloc_pin() - alloc memory for PFNs, then pin user pages in memory
> > > * @pa: pfn_array on which to perform the operation
> > > * @mdev: the mediated device to perform pin/unpin operations
> > > + * @iova: target guest physical address
> > > + * @len: number of bytes that should be pinned from @iova
> > > *
> > > - * Attempt to pin user pages in memory.
> > > + * Attempt to allocate memory for PFNs, and pin user pages in memory.
> > > *
> > > * Usage of pfn_array:
> > > - * @pa->pa_iova starting guest physical I/O address. Assigned by caller.
> > > + * @pa->pa_iova starting guest physical I/O address. Assigned by callee.
> > > * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
> > > - * by caller.
> > > + * by callee.
> > > * @pa->pa_pfn array that receives PFNs of the pages pinned. Allocated by
> > > - * caller.
> > > - * @pa->pa_nr number of pages from @pa->pa_iova to pin. Assigned by
> > > - * caller.
> > > - * number of pages pinned. Assigned by callee.
> > > + * callee.
> > > + * @pa->pa_nr initiated as 0 by caller.
> >
> > s/initiated/initialized/
> Ok.
>
> >
> > but see below
> >
> > > + * number of pages pinned from @pa->pa_iova. Assigned by callee.
> >
> > So, basically everything is filled by pfn_array_alloc_pin()?
> Yes.
>
> > Should we expect a clean struct pfn_array handed in by the caller,
> > then (not just pa_nr == 0)?
> The current idea is:
> - It is a clean struct that pfn_array_alloc_pin() expects from its
> caller.
> - pfn_array_alloc_pin() and pfn_array_unpin_free() should be used in
> pair. They are the only functions those change the values of the
> elements of a pfn_array struct.
> - Caller of pfn_array_alloc_pin() should either hand in a new allocated
> pfn_array (zeroed out), or a freed-after-used one.
> - So using pa_nr == 0, is enough to identify all the good cases.
> [We set pa_nr to 0 in pfn_array_unpin_free().]
>
> Validating all of the elements only helps when there is case that a
> caller breaks the usage rule of these interfaces - the caller itself
> assigns values for pfn_pa elements directly... I don't think we allow
> this to happen.
>
> So I think the current logic is fine.
Yes, I think it is fine -- I was mainly wondering whether we wanted
more sanity checks.
>
> >
> > Would it make sense to describe the contents of the struct pfn_array
> > fields at the struct's definition instead? You could then shorten the
> > description here to "we expect pa_nr == 0, any field in this structure
> > will be filled in by this function".
> Sounds good!
> Do you want a separated patch for this, or I do this change on this
> patch? Either will be ok with me.
Perhaps as an additional patch in front of this one?
On Tue, 27 Mar 2018 11:08:09 +0800
Dong Jia Shi <[email protected]> wrote:
> * Cornelia Huck <[email protected]> [2018-03-26 15:47:53 +0200]:
>
> > On Wed, 21 Mar 2018 03:08:21 +0100
> > Dong Jia Shi <[email protected]> wrote:
> >
> > > Let's avoid free on ccw->cda that points to a guest address
> > > or a already freed memory area by setting it to NULL if memory
> > > allocation didn't happen or failed.
> >
> > Does not hurt, I guess.
> >
> > >
> > > Signed-off-by: Dong Jia Shi <[email protected]>
> > > ---
> > > drivers/s390/cio/vfio_ccw_cp.c | 28 +++++++++++++++++++---------
> > > 1 file changed, 19 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > > index 3abc9770910a..5958c35ab343 100644
> > > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > > @@ -491,7 +491,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
> > > struct ccw1 *ccw;
> > > struct pfn_array_table *pat;
> > > unsigned long *idaws;
> > > - int idaw_nr;
> > > + int idaw_nr, ret;
> > >
> > > ccw = chain->ch_ccw + idx;
> > >
> > > @@ -511,18 +511,20 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
> > > * needed when translating a direct ccw to a idal ccw.
> > > */
> > > pat = chain->ch_pat + idx;
> > > - if (pfn_array_table_init(pat, 1))
> > > - return -ENOMEM;
> > > - idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> > > + ret = pfn_array_table_init(pat, 1);
> > > + if (ret)
> > > + goto out_init;
> > > +
> > > + idaw_nr = ret = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
> >
> > Ugh, I don't like setting both at the same time... only set idaw_nr for
> > ret > 0? (personal preference)
> >
> Ah, we don't need idaw_nr anymore. I think I should just replace it with
> ret.
Let's do that.
>
> > > ccw->cda, ccw->count);
> > > - if (idaw_nr < 0)
> > > - return idaw_nr;
> > > + if (ret < 0)
> > > + goto out_init;
> >
>
On Tue, 27 Mar 2018 15:51:14 +0800
Dong Jia Shi <[email protected]> wrote:
> * Cornelia Huck <[email protected]> [2018-03-26 15:59:02 +0200]:
>
> [...]
>
> > > @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > >
> > > io_region->ret_code = cp_prefetch(&private->cp);
> > > if (io_region->ret_code) {
> > > + trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
> > > + io_region->ret_code);
> > > cp_free(&private->cp);
> > > goto err_out;
> > > }
> > > @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > > /* Start channel program and wait for I/O interrupt. */
> > > io_region->ret_code = fsm_io_helper(private);
> > > if (io_region->ret_code) {
> > > + trace_vfio_ccw_ssch_failed(get_schid(private),
> > > + io_region->ret_code);
> > > cp_free(&private->cp);
> > > goto err_out;
> > > }
> > > @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> > > } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> > > /* XXX: Handle halt. */
> > > io_region->ret_code = -EOPNOTSUPP;
> > > + trace_vfio_ccw_halt(get_schid(private));
> > > goto err_out;
> > > } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> > > /* XXX: Handle clear. */
> > > io_region->ret_code = -EOPNOTSUPP;
> > > + trace_vfio_ccw_clear(get_schid(private));
> > > goto err_out;
> >
> > Hmmm.... perhaps better to just trace the function (start/halt/clear)
> > in any case?
> >
> I agree trace the function in any case is good. @Halil, opinion?
>
> But the traces for cp_prefetch() and fsm_io_helper() should also be
> kept, since they are helpful to debug problem. So I tend to trace the
> following in any case:
> - cp_prefetch()
> - fsm_io_helper()
> - start
> - halt
> - clear
OK, I was unclear :) I'd argue to keep the others, just replace the
halt/clear tracing with tracing the function.
>
> > > }
> > >
> > > diff --git a/drivers/s390/cio/vfio_ccw_trace.h b/drivers/s390/cio/vfio_ccw_trace.h
> > > new file mode 100644
> > > index 000000000000..edd3321cd919
> > > --- /dev/null
> > > +++ b/drivers/s390/cio/vfio_ccw_trace.h
> > > @@ -0,0 +1,86 @@
> > > +/* SPDX-License-Identifier: GPL-2.0
> > > + * Tracepoints for vfio_ccw driver
> > > + *
> > > + * Copyright IBM Corp. 2018
> > > + *
> > > + * Author(s): Dong Jia Shi <[email protected]>
> > > + * Halil Pasic <[email protected]>
> > > + */
> > > +
> > > +
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM vfio_ccw
> > > +
> > > +#if !defined(_VFIO_CCW_TRACE_) || defined(TRACE_HEADER_MULTI_READ)
> > > +#define _VFIO_CCW_TRACE_
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +TRACE_EVENT(vfio_ccw_cp_prefetch_failed,
> > > + TP_PROTO(struct subchannel_id schid, int errno),
> > > + TP_ARGS(schid, errno),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field_struct(struct subchannel_id, schid)
> > > + __field(int, errno)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->schid = schid;
> > > + __entry->errno = errno;
> > > + ),
> > > +
> > > + TP_printk("(schid 0.%x.%04X) translation failed (errno: %d)",
> > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
> > > +);
> > > +
> > > +TRACE_EVENT(vfio_ccw_ssch_failed,
> > > + TP_PROTO(struct subchannel_id schid, int errno),
> > > + TP_ARGS(schid, errno),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field_struct(struct subchannel_id, schid)
> > > + __field(int, errno)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->schid = schid;
> > > + __entry->errno = errno;
> > > + ),
> > > +
> > > + TP_printk("(schid 0.%x.%04X) ssch failed (errno: %d)",
> > > + __entry->schid.ssid, __entry->schid.sch_no, __entry->errno)
> > > +);
> > > +
> > > +DECLARE_EVENT_CLASS(vfio_ccw_notsupp,
> > > + TP_PROTO(struct subchannel_id schid),
> > > + TP_ARGS(schid),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field_struct(struct subchannel_id, schid)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->schid = schid;
> > > + ),
> > > +
> > > + TP_printk("(schid 0.%x.%04X) request not supported",
> > > + __entry->schid.ssid, __entry->schid.sch_no)
> > > +);
> >
> > Especially as I don't plan to leave this unsupported for too long :)
> >
> > Just tracing the function is useful now and will stay useful in the
> > future.
> If we agree with ideas given above, we could:
> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno
> 2. DEFINE_EVENT:
> vfio_ccw_fam_io_helper
> vfio_ccw_cp_prefetch
> vfio_ccw_io_start
> vfio_ccw_io_clear
> vfio_ccw_io_halt
Use a vfio_ccw_io_fctl tracepoint instead?
> 3. add trace points in coresponding places
>
> >
> > Another idea: Trace the fsm state transitions. Probably something for
> > an additional patch.
> Considering Pierre is refactoring the fsm, we can add trace points in
> that series (or as following on patch).
Yes, while poking around I also wondered whether we should tweak the
fsm in places. So adding tracepoints there looks like a good idea.
>
> >
> >
> > > +
> > > +DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_clear,
> > > + TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
> > > +
> > > +DEFINE_EVENT(vfio_ccw_notsupp, vfio_ccw_halt,
> > > + TP_PROTO(struct subchannel_id schid), TP_ARGS(schid));
> > > +
> > > +#endif /* _VFIO_CCW_TRACE_ */
> > > +
> > > +/* This part must be outside protection */
> > > +
> > > +#undef TRACE_INCLUDE_PATH
> > > +#define TRACE_INCLUDE_PATH .
> > > +#undef TRACE_INCLUDE_FILE
> > > +#define TRACE_INCLUDE_FILE vfio_ccw_trace
> > > +
> > > +#include <trace/define_trace.h>
> >
>
On 03/27/2018 12:07 PM, Cornelia Huck wrote:
> On Tue, 27 Mar 2018 15:51:14 +0800
> Dong Jia Shi <[email protected]> wrote:
>
>> * Cornelia Huck <[email protected]> [2018-03-26 15:59:02 +0200]:
>>
>> [...]
>>
>>>> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>>
>>>> io_region->ret_code = cp_prefetch(&private->cp);
>>>> if (io_region->ret_code) {
>>>> + trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
>>>> + io_region->ret_code);
>>>> cp_free(&private->cp);
>>>> goto err_out;
>>>> }
>>>> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>> /* Start channel program and wait for I/O interrupt. */
>>>> io_region->ret_code = fsm_io_helper(private);
>>>> if (io_region->ret_code) {
>>>> + trace_vfio_ccw_ssch_failed(get_schid(private),
>>>> + io_region->ret_code);
>>>> cp_free(&private->cp);
>>>> goto err_out;
>>>> }
>>>> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
>>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
>>>> /* XXX: Handle halt. */
>>>> io_region->ret_code = -EOPNOTSUPP;
>>>> + trace_vfio_ccw_halt(get_schid(private));
>>>> goto err_out;
>>>> } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
>>>> /* XXX: Handle clear. */
>>>> io_region->ret_code = -EOPNOTSUPP;
>>>> + trace_vfio_ccw_clear(get_schid(private));
>>>> goto err_out;
>>>
>>> Hmmm.... perhaps better to just trace the function (start/halt/clear)
>>> in any case?
>>>
>> I agree trace the function in any case is good. @Halil, opinion?
See below. I don't really understand the question. Trace the function
means, trace when it was requested on a subch, or trace the outcome
of the request? Seems the question got amended though.
>>
>> But the traces for cp_prefetch() and fsm_io_helper() should also be
>> kept, since they are helpful to debug problem. So I tend to trace the
>> following in any case:
>> - cp_prefetch()
>> - fsm_io_helper()
>> - start
>> - halt
>> - clear
>
> OK, I was unclear :) I'd argue to keep the others, just replace the
> halt/clear tracing with tracing the function.
I'm a bit confused.
My idea was the following: Prior to this patch we had a kind of OK
possibility to trace what we consider the expected and good scenario
using the function tracer and the normal cio stuff. But what I wanted is
to verify that my fix works (problem occurs but is handled more
appropriately) and I've found it difficult to trace this. So the idea was
to introduce trace points which tell us what went wrong. The idea is to
benefit diagnostic of unrecoverable failures and get an idea how often
are we doing extra work recovering recoverable failures.
In this sense halt and clear is something that does not work currently.
When we get proper halt and clear, these trace points were supposed to
become obsolete and get removed. I guess the implementation will
eventually issue csch() and hsch() for the underlying subchannel and and
we should be able to trace those (see drivers/s390/cio/ioasm.c)
Now this is the tricky part. I've read some used to see trace points as
part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU
this is not a concern any more -- but my confidence on this is pretty
low.
So IMHO we have two questions to answer:
* Do we want static trace points (events) for undesirable or concerning
stuff happened (e.g. translation failed, a function we hope we can live
without was supposed to be executed)?
* Do we want static trace points (events) coverage for the normal path
(that is beyond what cio and the function tracer already give us)? What
benefit do we expect if we do want these? (E.g. performance evaluation,
better debugging especially when multiple virtualized subchannels used.)
>
>>
>>>> }
>>>>
[..]
>>>> +DECLARE_EVENT_CLASS(vfio_ccw_notsupp,
>>>> + TP_PROTO(struct subchannel_id schid),
>>>> + TP_ARGS(schid),
>>>> +
>>>> + TP_STRUCT__entry(
>>>> + __field_struct(struct subchannel_id, schid)
>>>> + ),
>>>> +
>>>> + TP_fast_assign(
>>>> + __entry->schid = schid;
>>>> + ),
>>>> +
>>>> + TP_printk("(schid 0.%x.%04X) request not supported",
>>>> + __entry->schid.ssid, __entry->schid.sch_no)
>>>> +);
>>>
>>> Especially as I don't plan to leave this unsupported for too long :)
Sounds great! I don't insist. Especially not if our linux guest tells
us what went wrong. @Dong Jia: What would happen should the guest for
some reason try to do a clear or a halt (e.g. we make it fail here,
guest retries a couple of times and then panicks or gives up on the
device)?
>>>
>>> Just tracing the function is useful now and will stay useful in the
>>> future.
>> If we agree with ideas given above, we could:
>> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno
>> 2. DEFINE_EVENT:
>> vfio_ccw_fam_io_helper
>> vfio_ccw_cp_prefetch
>> vfio_ccw_io_start
>> vfio_ccw_io_clear
>> vfio_ccw_io_halt
>
> Use a vfio_ccw_io_fctl tracepoint instead?
>
That would trace what? A request to perform a basic I/O function
on the virtualized subchannel by issuing the corresponding I/O
instruction to the underlying subchannel?
If that's the case, I think using the trace events from
drivers/s390/cio/ioasm.c (tracing the instructions) may suffice for
the 'good case'.
>> 3. add trace points in coresponding places
>>
>>>
>>> Another idea: Trace the fsm state transitions. Probably something for
>>> an additional patch.
>> Considering Pierre is refactoring the fsm, we can add trace points in
>> that series (or as following on patch).
>
> Yes, while poking around I also wondered whether we should tweak the
> fsm in places. So adding tracepoints there looks like a good idea.
>
How does this relate to normal cio? I don't think cio has such trace
points capturing the state machine transitions. I wonder why? Spontaneously
I would say sounds like something useful.
I'm still pondering what are we trying to achieve.
Regards,
Halil
On Wed, 28 Mar 2018 10:36:38 +0800
Dong Jia Shi <[email protected]> wrote:
> * Cornelia Huck <[email protected]> [2018-03-27 12:01:27 +0200]:
>
> [...]
>
> > > >
> > > > So, basically everything is filled by pfn_array_alloc_pin()?
> > > Yes.
> > >
> > > > Should we expect a clean struct pfn_array handed in by the caller,
> > > > then (not just pa_nr == 0)?
> > > The current idea is:
> > > - It is a clean struct that pfn_array_alloc_pin() expects from its
> > > caller.
> > > - pfn_array_alloc_pin() and pfn_array_unpin_free() should be used in
> > > pair. They are the only functions those change the values of the
> > > elements of a pfn_array struct.
> > > - Caller of pfn_array_alloc_pin() should either hand in a new allocated
> > > pfn_array (zeroed out), or a freed-after-used one.
> > > - So using pa_nr == 0, is enough to identify all the good cases.
> > > [We set pa_nr to 0 in pfn_array_unpin_free().]
> > >
> > > Validating all of the elements only helps when there is case that a
> > > caller breaks the usage rule of these interfaces - the caller itself
> > > assigns values for pfn_pa elements directly... I don't think we allow
> > > this to happen.
> > >
> > > So I think the current logic is fine.
> >
> > Yes, I think it is fine -- I was mainly wondering whether we wanted
> > more sanity checks.
> >
> Ok.
> Check on (pa->pa_iova_pfn != NULL) could be added. It's easy to do so.
> Check on pa->pa_iova doesn't make sense, since its value will be
> re-assigned anyway.
> Check on pa->pa_pfn doesn't make sense, since we treat it as a pointer
> that points to part of the memory area that was pointed by
> pa->pa_iova_pfn. And we will re-assign it with new pa->pa_iova_pfn
> value.
Yeah, so additional checks are probably not very useful.
>
> > >
> > > >
> > > > Would it make sense to describe the contents of the struct pfn_array
> > > > fields at the struct's definition instead? You could then shorten the
> > > > description here to "we expect pa_nr == 0, any field in this structure
> > > > will be filled in by this function".
> > > Sounds good!
> > > Do you want a separated patch for this, or I do this change on this
> > > patch? Either will be ok with me.
> >
> > Perhaps as an additional patch in front of this one?
> >
> It's doable. I will do that.
>
Cool, thx!
On Tue, 27 Mar 2018 17:27:54 +0200
Halil Pasic <[email protected]> wrote:
> On 03/27/2018 12:07 PM, Cornelia Huck wrote:
> > On Tue, 27 Mar 2018 15:51:14 +0800
> > Dong Jia Shi <[email protected]> wrote:
> >
> >> * Cornelia Huck <[email protected]> [2018-03-26 15:59:02 +0200]:
> >>
> >> [...]
> >>
> >>>> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> io_region->ret_code = cp_prefetch(&private->cp);
> >>>> if (io_region->ret_code) {
> >>>> + trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> /* Start channel program and wait for I/O interrupt. */
> >>>> io_region->ret_code = fsm_io_helper(private);
> >>>> if (io_region->ret_code) {
> >>>> + trace_vfio_ccw_ssch_failed(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>> /* XXX: Handle halt. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> + trace_vfio_ccw_halt(get_schid(private));
> >>>> goto err_out;
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> >>>> /* XXX: Handle clear. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> + trace_vfio_ccw_clear(get_schid(private));
> >>>> goto err_out;
> >>>
> >>> Hmmm.... perhaps better to just trace the function (start/halt/clear)
> >>> in any case?
> >>>
> >> I agree trace the function in any case is good. @Halil, opinion?
>
> See below. I don't really understand the question. Trace the function
> means, trace when it was requested on a subch, or trace the outcome
> of the request? Seems the question got amended though.
Both tracing the functions per se and tracing their outcome has its
benefits IME.
>
> >>
> >> But the traces for cp_prefetch() and fsm_io_helper() should also be
> >> kept, since they are helpful to debug problem. So I tend to trace the
> >> following in any case:
> >> - cp_prefetch()
> >> - fsm_io_helper()
> >> - start
> >> - halt
> >> - clear
> >
> > OK, I was unclear :) I'd argue to keep the others, just replace the
> > halt/clear tracing with tracing the function.
>
> I'm a bit confused.
>
> My idea was the following: Prior to this patch we had a kind of OK
> possibility to trace what we consider the expected and good scenario
> using the function tracer and the normal cio stuff. But what I wanted is
> to verify that my fix works (problem occurs but is handled more
> appropriately) and I've found it difficult to trace this. So the idea was
> to introduce trace points which tell us what went wrong. The idea is to
> benefit diagnostic of unrecoverable failures and get an idea how often
> are we doing extra work recovering recoverable failures.
>
> In this sense halt and clear is something that does not work currently.
> When we get proper halt and clear, these trace points were supposed to
> become obsolete and get removed. I guess the implementation will
> eventually issue csch() and hsch() for the underlying subchannel and and
> we should be able to trace those (see drivers/s390/cio/ioasm.c)
Tracing what userspace expects us to do has its benefits as well (i.e.
do we get mainly ssch? unexpected amounts of csch? etc.). I find it
useful to be able to get this information from the vfio-ccw layer
already.
>
> Now this is the tricky part. I've read some used to see trace points as
> part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU
> this is not a concern any more -- but my confidence on this is pretty
> low.
I'd not worry about that.
FWIW, for kvm/s390 I tried to do the following:
- one set of tracepoints for things that are mandated by the
architecture and therefore expected to be stable
- and another set of tracepoints for implementation details that have a
good change of changing
>
> So IMHO we have two questions to answer:
>
> * Do we want static trace points (events) for undesirable or concerning
> stuff happened (e.g. translation failed, a function we hope we can live
> without was supposed to be executed)?
>
> * Do we want static trace points (events) coverage for the normal path
> (that is beyond what cio and the function tracer already give us)? What
> benefit do we expect if we do want these? (E.g. performance evaluation,
> better debugging especially when multiple virtualized subchannels used.)
Whatever you think is useful. Of course, we can go there step by step
(and I'd really advise to do so; getting some useful info right now and
not holding things up until we have a more complete understand what
would be useful for e.g. ffdc).
You can always have userspace enable tracing of some things by default
and leave the rest off until wanted.
> >>> Just tracing the function is useful now and will stay useful in the
> >>> future.
> >> If we agree with ideas given above, we could:
> >> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno
> >> 2. DEFINE_EVENT:
> >> vfio_ccw_fam_io_helper
> >> vfio_ccw_cp_prefetch
> >> vfio_ccw_io_start
> >> vfio_ccw_io_clear
> >> vfio_ccw_io_halt
> >
> > Use a vfio_ccw_io_fctl tracepoint instead?
> >
>
> That would trace what? A request to perform a basic I/O function
> on the virtualized subchannel by issuing the corresponding I/O
> instruction to the underlying subchannel?
Basically, what userspace requests us to do.
>
> If that's the case, I think using the trace events from
> drivers/s390/cio/ioasm.c (tracing the instructions) may suffice for
> the 'good case'.
>
> >> 3. add trace points in coresponding places
> >>
> >>>
> >>> Another idea: Trace the fsm state transitions. Probably something for
> >>> an additional patch.
> >> Considering Pierre is refactoring the fsm, we can add trace points in
> >> that series (or as following on patch).
> >
> > Yes, while poking around I also wondered whether we should tweak the
> > fsm in places. So adding tracepoints there looks like a good idea.
> >
>
>
> How does this relate to normal cio? I don't think cio has such trace
> points capturing the state machine transitions. I wonder why? Spontaneously
> I would say sounds like something useful.
The cio fsm simply dates back to the 2.5 era.
On Tue, 10 Apr 2018 10:16:39 +0800
Dong Jia Shi <[email protected]> wrote:
> Does the following effect make sense?
>
> # tracer: nop
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> qemu-system-s39-4252 [006] .... 231.457214: vfio_ccw_cp_prefetch: schid=0.0.013f errno=0
> qemu-system-s39-4252 [006] .... 231.457222: vfio_ccw_fsm_io_helper: schid=0.0.013f errno=0
> qemu-system-s39-4252 [006] .... 231.457223: vfio_ccw_io_fctl: schid=0.0.013f fctl=4 errno=0
> ... ...
I would likely find this useful for following a code path and making
sure the right things are called.
We certainly want error conditions traced as well (although the code
has been working too well for me to trigger that easily :)
On 04/10/2018 10:55 AM, Cornelia Huck wrote:
> On Tue, 10 Apr 2018 10:16:39 +0800
> Dong Jia Shi <[email protected]> wrote:
>
>> Does the following effect make sense?
>>
>> # tracer: nop
>> #
>> # _-----=> irqs-off
>> # / _----=> need-resched
>> # | / _---=> hardirq/softirq
>> # || / _--=> preempt-depth
>> # ||| / delay
>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>> # | | | |||| | |
>> qemu-system-s39-4252 [006] .... 231.457214: vfio_ccw_cp_prefetch: schid=0.0.013f errno=0
>> qemu-system-s39-4252 [006] .... 231.457222: vfio_ccw_fsm_io_helper: schid=0.0.013f errno=0
>> qemu-system-s39-4252 [006] .... 231.457223: vfio_ccw_io_fctl: schid=0.0.013f fctl=4 errno=0
>> ... ...
>
> I would likely find this useful for following a code path and making
> sure the right things are called.
>
> We certainly want error conditions traced as well (although the code
> has been working too well for me to trigger that easily :)
>
Looks interesting. The approach is to trace (all) exits from selected
functions, or?
It is an interesting approach. Function entry could probably be traced
with the function tracer (if we should ever need that, although relating
the two unambiguously may not be possible -- I don't know).
I'm still not completely in clear how do we want to do error reporting.
Using traces as means of error reporting smells like abuse to me. @Dong
Jia: could you help me get an overview? I'm thinking of something like
a matrix of type:
error | handler | action (propagate as / report / try recover / discard silently)
I'm mostly interested in what gets reported and if there is stuff that
should be reported.
Other than that I'm in favor. And having traces for tracking error
condition is clearly better than having nothing.
Regards,
Halil
ping
I think get this fixed. Better sooner than later.
On 03/27/2018 03:42 AM, Dong Jia Shi wrote:
> * Cornelia Huck <[email protected]> [2018-03-26 14:28:39 +0200]:
>
> [...]
>
>>> By the way, since you will propose a new version,
>>> you have a long description of the cp_prefetch function in the code.
>>> I think you should modify it according to the changes and describe how and
>>> why the ch_len field of each chain is used and changed by this function.
>>>
>>> Something like:
>>>
>>> "
>>> For each chain composing the channel program:
>>> On entry ch_len hold the count of CCW to be translated.
>>
>> s/hold/holds/ ?
>>
> Sure.
>
>>> On exit ch_len is adjusted to the count of successfully translated CCW.
>>>
>>> This allows cp_free to find in ch_len the count of CCW to free in a chain.
>>> "
>>>
>>> Could also be inside the commit message.
>>>
>>> Pierre
>>>
>>>
>>>>
>>>>>> Acked-by: Pierre Morel <[email protected]>
>>>>>> Reviewed-by: Dong Jia Shi <[email protected]>
>>>>>> Signed-off-by: Halil Pasic <[email protected]>
>>>>>> Signed-off-by: Dong Jia Shi <[email protected]>
>>>>>> ---
>>>>>> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
>>>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> index d9a2fffd034b..2be114db02f9 100644
>>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
>>>>>> @@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
>>>>>> for (idx = 0; idx < len; idx++) {
>>>>>> ret = ccwchain_fetch_one(chain, idx, cp);
>>>>>> if (ret)
>>>>>> - return ret;
>>>>>> + goto out_err;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> return 0;
>>>>>> +out_err:
>>>>>> + /* Only cleanup the chain elements that where actually translated. */
>>
>> s/where/were/
> Ok.
>
>>
>>>>>> + chain->ch_len = idx;
>>>>>> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
>>>>>> + chain->ch_len = 0;
>>>>>> + }
>>>>>> + return ret;
>>>>>> }
>>>>>>
>>>>>> /**
>>>>>>
>>>
>>
>
On Fri, 20 Apr 2018 12:54:26 +0200
Halil Pasic <[email protected]> wrote:
> ping
>
> I think get this fixed. Better sooner than later.
>
> On 03/27/2018 03:42 AM, Dong Jia Shi wrote:
> > * Cornelia Huck <[email protected]> [2018-03-26 14:28:39 +0200]:
> >
> > [...]
> >
> >>> By the way, since you will propose a new version,
> >>> you have a long description of the cp_prefetch function in the code.
> >>> I think you should modify it according to the changes and describe how and
> >>> why the ch_len field of each chain is used and changed by this function.
> >>>
> >>> Something like:
> >>>
> >>> "
> >>> For each chain composing the channel program:
> >>> On entry ch_len hold the count of CCW to be translated.
> >>
> >> s/hold/holds/ ?
> >>
> > Sure.
> >
> >>> On exit ch_len is adjusted to the count of successfully translated CCW.
> >>>
> >>> This allows cp_free to find in ch_len the count of CCW to free in a chain.
> >>> "
> >>>
> >>> Could also be inside the commit message.
> >>>
> >>> Pierre
> >>>
> >>>
> >>>>
> >>>>>> Acked-by: Pierre Morel <[email protected]>
> >>>>>> Reviewed-by: Dong Jia Shi <[email protected]>
> >>>>>> Signed-off-by: Halil Pasic <[email protected]>
> >>>>>> Signed-off-by: Dong Jia Shi <[email protected]>
> >>>>>> ---
> >>>>>> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++++-
> >>>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> >>>>>> index d9a2fffd034b..2be114db02f9 100644
> >>>>>> --- a/drivers/s390/cio/vfio_ccw_cp.c
> >>>>>> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> >>>>>> @@ -749,11 +749,18 @@ int cp_prefetch(struct channel_program *cp)
> >>>>>> for (idx = 0; idx < len; idx++) {
> >>>>>> ret = ccwchain_fetch_one(chain, idx, cp);
> >>>>>> if (ret)
> >>>>>> - return ret;
> >>>>>> + goto out_err;
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> return 0;
> >>>>>> +out_err:
> >>>>>> + /* Only cleanup the chain elements that where actually translated. */
> >>
> >> s/where/were/
> > Ok.
> >
> >>
> >>>>>> + chain->ch_len = idx;
> >>>>>> + list_for_each_entry_continue(chain, &cp->ccwchain_list, next) {
> >>>>>> + chain->ch_len = 0;
> >>>>>> + }
> >>>>>> + return ret;
> >>>>>> }
> >>>>>>
> >>>>>> /**
> >>>>>>
> >>>
> >>
> >
>
Hm, it seems that drowned in other patches... can I get a re-send,
please?
On 04/20/2018 01:36 PM, Cornelia Huck wrote:
> On Fri, 20 Apr 2018 12:54:26 +0200
> Halil Pasic <[email protected]> wrote:
>
>> ping
>>
>> I think get this fixed. Better sooner than later.
[..]
>
> Hm, it seems that drowned in other patches... can I get a re-send,
> please?
I guess Dong Jia will handle it. If not I will on Monday.
Regards,
Halil