Received: by 10.213.65.68 with SMTP id h4csp771826imn; Tue, 27 Mar 2018 08:29:19 -0700 (PDT) X-Google-Smtp-Source: AG47ELvoYxXQLnMZea9hNEsuJ0oNmD/+zbC/ywNwjuXbfg5t/Yg18hFdBjFyk+8EYNkL+pdZKQrD X-Received: by 10.99.95.144 with SMTP id t138mr30935471pgb.94.1522164559751; Tue, 27 Mar 2018 08:29:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522164559; cv=none; d=google.com; s=arc-20160816; b=a8aL50MW49HdWBU88N8ExymHL/WhryPtB1SUICWoz3JE8fu4ER+e/+HBVwcF6QtHtg PagiM7pVOKDDyH+cSbZfHl/nULNRK9XDu+CjCrs5RJO4fpeb0LGIcfhUe6cGc7qmMrDP 8cC9VKjGujJueZQ9YyAYwMoBcx7pltUWW72rLo2biQDasmhuZrrNnq5MdF+ire3AooKn QUR/UGvZfJYf+BFFGwSONEVbQV7F1TzVls/6dqD/AKBTQarvOz5JWifIQUPZQWKFi/H0 JjU6+2qh0NPb0BWNxGB5ieoiMt3ilkB2YQ4noauzHijulwxmvX2fJca6OuXT31VSDfOl JCzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date:openpgp :from:references:cc:to:subject:arc-authentication-results; bh=4eB7vO5HxQmxloaqh7eyj9/FYN10gM3iXgiVqkwnn/0=; b=zHuvXaDNnKkpzvzlP3XuhaaFI7JgZbk3E2WEpdcQ4ipQn5Kl/0aC5NCgA30tTQ4vZj zqhF0slS9FACGdh/ovBmvw/Egn0u1XmlIDocpZhb+yNAc0cI+Y9TtA6y03nP7qO5La7l cX6MhUHnYDyj/N6foIdSLDBtOTvxrb5BQ4m5I5sIngqG5gfdtZzMyDcLHdATIDXdms7/ GgJLXRD68e3q03fQVBqCIYzd0x08hUqm8dxmRKofTU7od+h2FyyjYrK69TqjlfM150R0 GnSBIdZd5k5nP1/KUzRtDLL7Jby0nzXCsxk/562GZfQ3khEs0jcEp5/aI9a1HYxyXLoG Uz6g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g3-v6si1420823plp.662.2018.03.27.08.29.02; Tue, 27 Mar 2018 08:29:19 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752249AbeC0P2D (ORCPT + 99 others); Tue, 27 Mar 2018 11:28:03 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:54712 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751765AbeC0P2B (ORCPT ); Tue, 27 Mar 2018 11:28:01 -0400 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w2RFPxop001793 for ; Tue, 27 Mar 2018 11:28:01 -0400 Received: from e06smtp15.uk.ibm.com (e06smtp15.uk.ibm.com [195.75.94.111]) by mx0a-001b2d01.pphosted.com with ESMTP id 2gypef888f-1 (version=TLSv1.2 cipher=AES256-SHA256 bits=256 verify=NOT) for ; Tue, 27 Mar 2018 11:28:00 -0400 Received: from localhost by e06smtp15.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 27 Mar 2018 16:27:58 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp15.uk.ibm.com (192.168.101.145) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 27 Mar 2018 16:27:55 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w2RFRtQW21299334; Tue, 27 Mar 2018 15:27:55 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AE0E3A4059; Tue, 27 Mar 2018 16:20:31 +0100 (BST) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 78379A404D; Tue, 27 Mar 2018 16:20:31 +0100 (BST) Received: from oc3836556865.ibm.com (unknown [9.152.224.160]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 27 Mar 2018 16:20:31 +0100 (BST) Subject: Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths To: Cornelia Huck , Dong Jia Shi Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, kvm@vger.kernel.org, borntraeger@de.ibm.com, pmorel@linux.vnet.ibm.com References: <20180321020822.86255-1-bjsdjshi@linux.vnet.ibm.com> <20180321020822.86255-5-bjsdjshi@linux.vnet.ibm.com> <20180326155902.12bed785.cohuck@redhat.com> <20180327075114.GK12194@bjsdjshi@linux.vnet.ibm.com> <20180327120723.192f7577.cohuck@redhat.com> From: Halil Pasic Openpgp: preference=signencrypt Date: Tue, 27 Mar 2018 17:27:54 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180327120723.192f7577.cohuck@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 18032715-0020-0000-0000-0000040A6341 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18032715-0021-0000-0000-0000429E66CC Message-Id: <493b19b7-fa86-0220-7427-be519f1b40ad@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-03-27_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1803270154 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/27/2018 12:07 PM, Cornelia Huck wrote: > On Tue, 27 Mar 2018 15:51:14 +0800 > Dong Jia Shi wrote: > >> * Cornelia Huck [2018-03-26 15:59:02 +0200]: >> >> [...] >> >>>> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, >>>> >>>> io_region->ret_code = cp_prefetch(&private->cp); >>>> if (io_region->ret_code) { >>>> + trace_vfio_ccw_cp_prefetch_failed(get_schid(private), >>>> + io_region->ret_code); >>>> cp_free(&private->cp); >>>> goto err_out; >>>> } >>>> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private, >>>> /* Start channel program and wait for I/O interrupt. */ >>>> io_region->ret_code = fsm_io_helper(private); >>>> if (io_region->ret_code) { >>>> + trace_vfio_ccw_ssch_failed(get_schid(private), >>>> + io_region->ret_code); >>>> cp_free(&private->cp); >>>> goto err_out; >>>> } >>>> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private, >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) { >>>> /* XXX: Handle halt. */ >>>> io_region->ret_code = -EOPNOTSUPP; >>>> + trace_vfio_ccw_halt(get_schid(private)); >>>> goto err_out; >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) { >>>> /* XXX: Handle clear. */ >>>> io_region->ret_code = -EOPNOTSUPP; >>>> + trace_vfio_ccw_clear(get_schid(private)); >>>> goto err_out; >>> >>> Hmmm.... perhaps better to just trace the function (start/halt/clear) >>> in any case? >>> >> I agree trace the function in any case is good. @Halil, opinion? See below. I don't really understand the question. Trace the function means, trace when it was requested on a subch, or trace the outcome of the request? Seems the question got amended though. >> >> But the traces for cp_prefetch() and fsm_io_helper() should also be >> kept, since they are helpful to debug problem. So I tend to trace the >> following in any case: >> - cp_prefetch() >> - fsm_io_helper() >> - start >> - halt >> - clear > > OK, I was unclear :) I'd argue to keep the others, just replace the > halt/clear tracing with tracing the function. I'm a bit confused. My idea was the following: Prior to this patch we had a kind of OK possibility to trace what we consider the expected and good scenario using the function tracer and the normal cio stuff. But what I wanted is to verify that my fix works (problem occurs but is handled more appropriately) and I've found it difficult to trace this. So the idea was to introduce trace points which tell us what went wrong. The idea is to benefit diagnostic of unrecoverable failures and get an idea how often are we doing extra work recovering recoverable failures. In this sense halt and clear is something that does not work currently. When we get proper halt and clear, these trace points were supposed to become obsolete and get removed. I guess the implementation will eventually issue csch() and hsch() for the underlying subchannel and and we should be able to trace those (see drivers/s390/cio/ioasm.c) Now this is the tricky part. I've read some used to see trace points as part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU this is not a concern any more -- but my confidence on this is pretty low. So IMHO we have two questions to answer: * Do we want static trace points (events) for undesirable or concerning stuff happened (e.g. translation failed, a function we hope we can live without was supposed to be executed)? * Do we want static trace points (events) coverage for the normal path (that is beyond what cio and the function tracer already give us)? What benefit do we expect if we do want these? (E.g. performance evaluation, better debugging especially when multiple virtualized subchannels used.) > >> >>>> } >>>> [..] >>>> +DECLARE_EVENT_CLASS(vfio_ccw_notsupp, >>>> + TP_PROTO(struct subchannel_id schid), >>>> + TP_ARGS(schid), >>>> + >>>> + TP_STRUCT__entry( >>>> + __field_struct(struct subchannel_id, schid) >>>> + ), >>>> + >>>> + TP_fast_assign( >>>> + __entry->schid = schid; >>>> + ), >>>> + >>>> + TP_printk("(schid 0.%x.%04X) request not supported", >>>> + __entry->schid.ssid, __entry->schid.sch_no) >>>> +); >>> >>> Especially as I don't plan to leave this unsupported for too long :) Sounds great! I don't insist. Especially not if our linux guest tells us what went wrong. @Dong Jia: What would happen should the guest for some reason try to do a clear or a halt (e.g. we make it fail here, guest retries a couple of times and then panicks or gives up on the device)? >>> >>> Just tracing the function is useful now and will stay useful in the >>> future. >> If we agree with ideas given above, we could: >> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno >> 2. DEFINE_EVENT: >> vfio_ccw_fam_io_helper >> vfio_ccw_cp_prefetch >> vfio_ccw_io_start >> vfio_ccw_io_clear >> vfio_ccw_io_halt > > Use a vfio_ccw_io_fctl tracepoint instead? > That would trace what? A request to perform a basic I/O function on the virtualized subchannel by issuing the corresponding I/O instruction to the underlying subchannel? If that's the case, I think using the trace events from drivers/s390/cio/ioasm.c (tracing the instructions) may suffice for the 'good case'. >> 3. add trace points in coresponding places >> >>> >>> Another idea: Trace the fsm state transitions. Probably something for >>> an additional patch. >> Considering Pierre is refactoring the fsm, we can add trace points in >> that series (or as following on patch). > > Yes, while poking around I also wondered whether we should tweak the > fsm in places. So adding tracepoints there looks like a good idea. > How does this relate to normal cio? I don't think cio has such trace points capturing the state machine transitions. I wonder why? Spontaneously I would say sounds like something useful. I'm still pondering what are we trying to achieve. Regards, Halil