Date: Mon, 30 Apr 2018 17:03:58 +0200
From: Cornelia Huck
To: Halil Pasic
Cc: Dong Jia Shi, Halil Pasic, linux-kernel@vger.kernel.org,
 linux-s390@vger.kernel.org, kvm@vger.kernel.org, borntraeger@de.ibm.com,
 bjsdjshi@linux.ibm.com, pmorel@linux.ibm.com
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths
Message-ID: <20180430170358.0ee6fe6a.cohuck@redhat.com>
References: <20180423110113.59385-1-bjsdjshi@linux.vnet.ibm.com>
 <20180423110113.59385-6-bjsdjshi@linux.vnet.ibm.com>
 <20180427121353.4453bdc2.cohuck@redhat.com>
 <20180428055023.GS5428@bjsdjshi@linux.vnet.ibm.com>
 <20180430135153.1d108675.cohuck@redhat.com>
Organization: Red Hat GmbH
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 30 Apr 2018 16:14:21 +0200
Halil Pasic wrote:

> On 04/30/2018 01:51 PM, Cornelia Huck wrote:
> > On Sat, 28 Apr 2018 13:50:23 +0800
> > Dong Jia Shi wrote:
> >
> >> * Cornelia Huck [2018-04-27 12:13:53 +0200]:
> >>
> >>> On Mon, 23 Apr 2018 13:01:13 +0200
> >>> Dong Jia Shi wrote:
> >>>
> >>> typo in subject: s/traceponits/tracepoints/
> >>>
> >>>> From: Halil Pasic
> >>>>
> >>>> Add some tracepoints so we can inspect what is not working as it should.
> >>>>
> >>>> Signed-off-by: Halil Pasic
> >>>> Signed-off-by: Dong Jia Shi
> >>>> ---
> >>>>  drivers/s390/cio/Makefile         |  1 +
> >>>>  drivers/s390/cio/vfio_ccw_fsm.c   | 16 +++++++-
> >>>>  drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
> >>>>  3 files changed, 93 insertions(+), 1 deletion(-)
> >>>>  create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
> >>>
> >>>
> >>>> @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>  			goto err_out;
> >>>>
> >>>>  		io_region->ret_code = cp_prefetch(&private->cp);
> >>>> +		trace_vfio_ccw_cp_prefetch(get_schid(private),
> >>>> +					   io_region->ret_code);
> >>>>  		if (io_region->ret_code) {
> >>>>  			cp_free(&private->cp);
> >>>>  			goto err_out;
> >>>> @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>>  		/* Start channel program and wait for I/O interrupt. */
> >>>>  		io_region->ret_code = fsm_io_helper(private);
> >>>> +		trace_vfio_ccw_fsm_io_helper(get_schid(private),
> >>>> +					     io_region->ret_code);
> >>>>  		if (io_region->ret_code) {
> >>>>  			cp_free(&private->cp);
> >>>>  			goto err_out;
> >>>>  		}
> >>>> -		return;
> >>>> +		goto out;
> >>>>  	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>>  		/* XXX: Handle halt. */
> >>>>  		io_region->ret_code = -EOPNOTSUPP;
> >>>> @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>>  err_out:
> >>>>  	private->state = VFIO_CCW_STATE_IDLE;
> >>>> +out:
> >>>> +	trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> >>>> +			       io_region->ret_code);
> >>>>  }
> >>>>
> >>>>  /*
> >>>
> >>> I really don't want to bikeshed, especially as some tracepoints are
> >>> better than no tracepoints, but...
> >>>
> >>> We now trace fctl/schid/ret_code unconditionally (good).
> >>>
> >>> We trace the outcome of cp_prefetch() and fsm_io_helper()
> >>> unconditionally. We don't, however, trace all things that may go wrong.
> >>> We have the tracepoint at the end, but it cannot tell us where the
> >>> error came from. Should we have tracepoints in every place (in this
> >>> function) that may generate an error? Only if there is an actual error?
> >>> Are the two enough for common debug scenarios?
> >> Tracing actual errors sounds like a better idea than tracing these two
> >> functions unconditionally.
> >> These two are not enough for common debug scenarios. For example, we
> >> can't tell if a -EOPNOTSUPP is an orb->tm.b problem, or an error code
> >> returned by cp_init().
> >>
> >> Idea to improve:
> >> 1. Trace actual error.
> >> 2. Define a trace event and add error trace for cp_init().
> >
> > Hm. Going from what I have done in the past when doing printk debugging:
> >
> > - stick in a message that is always hit, with some information about
> >   parameters, if it makes sense
> > - stick in a message "foo happened!" in the error branches
> > - or, alternatively, trace the called functions
> >
> > So tracing on failure only might be more useful? Have all failure paths
> > under a common knob to turn on/off?
> >
> >>> Opinions? We can just go ahead with this and improve things later
> >>> on, I guess.
> >>>
> >> I think it's also fine to do this - better something than nothing. We
> >> could at least have a code base to be improved to make everybody
> >> happier in future.
> >
> > Maybe keep the patch as it is now, except trace the errors only
> > (keeping the fctl trace point)?
>
> What do you mean by this sentence? Get rid of vfio_ccw_io_fctl, or get
> rid of vfio_ccw_cp_prefetch and vfio_ccw_fsm_io_helper, or don't
> get rid of any, but make some conditional (!errno)?

The third option.

>
> >
> > Halil, as you wrote the patch (and I presume you found it helpful):
> > What is your opinion?
> >
>
> I'm in favor of this patch (as previously stated here
> https://patchwork.kernel.org/patch/10298305/). And regarding the
> questions under discussion I'm mostly fine either way.

OK.
>
> I think the naming of this fctl thing is a bit cryptic,
> but if we don't see this as ABI I'm fine with it -- can be improved.
> What would be a better name? I was thinking along the lines of accept_request.
> (Bad error code would mean that the request did not get accepted. Good
> code does not mean the requested function was performed successfully.)

I think fctl is fine (if you don't understand what 'fctl' is, you're
unlikely to understand it even if it were named differently.)

>
> Also I think vfio_ccw_io_fctl with non-zero error code would make sense
> as dev_warn. If I were an admin looking into a problem I would very much
> appreciate seeing something in the messages log (and not having to enable
> tracing first). This point seems to be a good one for high level 'request gone
> bad' kind of report. Opinions?

I'd also exclude -EOPNOTSUPP (as this also might happen with e.g. a
halt/clear enabled user space, which probes availability of halt/clear
support by giving it a try once (yes, I really want to post my patches
this week.))
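
For illustration only, the reporting policy discussed in this thread (warn
on a non-zero return code, but stay quiet for -EOPNOTSUPP) could be sketched
in user-space C roughly as below. This is not part of the patch:
should_warn() and report_io_result() are made-up names, and fprintf() merely
stands in for dev_warn().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: decide whether a result deserves a log warning. */
static bool should_warn(int ret_code)
{
	/* Kernel-style results: 0 on success, negative errno on failure. */
	assert(ret_code <= 0);

	/* Success is never reported as a warning. */
	if (ret_code == 0)
		return false;

	/*
	 * User space may probe for halt/clear support by simply trying it
	 * once, so -EOPNOTSUPP is expected and should not spam the log.
	 */
	if (ret_code == -EOPNOTSUPP)
		return false;

	return true;
}

/*
 * Hypothetical helper: emit the "request gone bad" report.
 * Returns true if a warning was written.
 */
static bool report_io_result(unsigned int schid, unsigned int fctl,
			     int ret_code)
{
	if (!should_warn(ret_code))
		return false;

	/* Stand-in for dev_warn() in the real driver. */
	fprintf(stderr, "schid %#x: fctl %#x request not accepted: %d\n",
		schid, fctl, ret_code);
	return true;
}
```

With this sketch, a call such as report_io_result(schid, fctl, -EIO) would
print a warning, while 0 and -EOPNOTSUPP would be passed over silently and
left to the tracepoint.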