Received: by 10.223.176.5 with SMTP id f5csp3863603wra; Mon, 29 Jan 2018 21:28:10 -0800 (PST) X-Google-Smtp-Source: AH8x2256Jh3fdPhOOIQWfeCEj9/ObLCBy/ONAjQf7kTbTeSGg4cF3koBgcu0vr3mQX4+UnUa2p53 X-Received: by 2002:a17:902:f81:: with SMTP id 1-v6mr299818plz.383.1517290090614; Mon, 29 Jan 2018 21:28:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517290090; cv=none; d=google.com; s=arc-20160816; b=wXmTth98az1QsntgNGOwUqLLQzJSJzznIigU0usiSb9v2YEZbbfxqRl6etrteqBY3b WV7XQd58VFY+vu4WHg5LLprpQ0Njva4ysylusgPQIjbzIUhWTy1jBaqpeKSe7D2p7wAi TE+vyEoakMbv3+sfgY4NxFC8Ba325MFkPkg+1HEZV5dU7Sb9e9olxbYzQ1R3Lrvke02O tPHLznmMMdx+r7xdoLEGzqwbKmASoFWUoNI/nt3R4L+ipxO4A/EAcnaP6Dj8jD4b1482 IgzZcf9qBTV/NHeXPiOiObWDj/wAE8gmWMJJ4MFpqP1gZHVQV6ZZhb00htr06X23GlxA tkEg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:mime-version:date:references :in-reply-to:subject:cc:to:from:arc-authentication-results; bh=8Wy/4aQJyMXTwzscCIy0EaJPQHZDhM4feGkOTMTMb4U=; b=PLCn2kAfBSNHTUrGJ0N8afl4GeIzmWUBupnpYrsfyw2bSyY1SzXsz6nnHT0uUeWRRN vVlTbJV5li6jvvAFJx/JNpmlG01rCoaIAlrjlQFYEE4C/Ee1mFS2dtoGSYo71eLqJRim pqb6fWS7kV6VNvscl06ibia0Xtrw4KorBBobor+aEx6ZC6MZvtjaaSYke4tvVPe0KVA6 /hBTEUyNDWfsV50Hdo58nSouPoO728srEAhzU+QhL8W4wdL5qEIosDp5aNP2uX+oW8FK 0k/pwxuiNWwxIceCtFBpsLorxHYR+l43Q41u/54s8GJ87E8VHC2eA4DJlU+uvcwBABIo kNRQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y2-v6si6405931pln.156.2018.01.29.21.27.56; Mon, 29 Jan 2018 21:28:10 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752786AbeA3F1T (ORCPT + 99 others); Tue, 30 Jan 2018 00:27:19 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:54262 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752251AbeA3F1R (ORCPT ); Tue, 30 Jan 2018 00:27:17 -0500 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w0U5PRPD065047 for ; Tue, 30 Jan 2018 00:27:17 -0500 Received: from e18.ny.us.ibm.com (e18.ny.us.ibm.com [129.33.205.208]) by mx0a-001b2d01.pphosted.com with ESMTP id 2ftdskhg5s-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 30 Jan 2018 00:27:17 -0500 Received: from localhost by e18.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 30 Jan 2018 00:27:15 -0500 Received: from b01cxnp23033.gho.pok.ibm.com (9.57.198.28) by e18.ny.us.ibm.com (146.89.104.205) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 30 Jan 2018 00:27:12 -0500 Received: from b01ledav002.gho.pok.ibm.com (b01ledav002.gho.pok.ibm.com [9.57.199.107]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w0U5RBO660293138; Tue, 30 Jan 2018 05:27:11 GMT Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 452B012403D; Tue, 30 Jan 2018 00:24:00 -0500 (EST) Received: from bjsdjshi.ibm.com (unknown [9.115.112.58]) by b01ledav002.gho.pok.ibm.com (Postfix) with ESMTPS id DC176124035; Tue, 30 Jan 2018 00:23:57 -0500 (EST) From: Dong Jia Shi To: Halil Pasic Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-s390x@nongnu.org, cohuck@redhat.com, borntraeger@de.ibm.com, pmorel@linux.vnet.ibm.com Subject: Re: [Qemu-devel] [RFC PATCH 0/3] vfio: ccw: basic channel path event handling In-Reply-To: <879459e4-e6b1-2030-e3d4-0b991b532f51@linux.vnet.ibm.com> References: <20180111030421.31418-1-bjsdjshi@linux.vnet.ibm.com> <20180115085945.GC12499@bjsdjshi@linux.vnet.ibm.com> <20180123062356.GC32354@bjsdjshi@linux.vnet.ibm.com> <879459e4-e6b1-2030-e3d4-0b991b532f51@linux.vnet.ibm.com> Date: Tue, 30 Jan 2018 13:27:06 +0800 MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 x-cbid: 18013005-0044-0000-0000-000003D7EED5 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008451; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000248; SDB=6.00982371; UDB=6.00498122; IPR=6.00761653; BA=6.00005800; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00019281; XFM=3.00000015; UTC=2018-01-30 05:27:14 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18013005-0045-0000-0000-00000807584C Message-Id: <878tcfex1x.fsf@bjsdjshi.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-01-30_02:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1801300070 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Halil Pasic writes: --text follows this line-- Hi Halil, --text follows this line-- AS you may noticed, Conny replied to this thread on my mail. Some of her comments there could answer your questions. If that applies, I will just say "See Conny's mail" in the following, and you can reply to that mail, so we can consolidate further discussion. >>> * When a channel path malfunctions a CRW is generated and a machine >>> check channel report pending is generated. Same goes for >>> channel paths popping up (on HW level). Should not these get >>> propagated? >> AFAIR, channel path related CRW is not that reliable. I mean it seems >> that it's not mandatory to generate CRWs for channel path malfunctions. >> AFAIU, it depneds on machine models. For example, we won't get >> CRW_ERC_INIT for a "path has come" event on many models I believe. And >> that's why nobody noticed the misuse of CRW_ERC_IPARM and CRW_ERC_INIT >> in the CRW handling logic for channel path CRWs. >> Ref: >> 2daace78a8c9 s390: chp: handle CRW_ERC_INIT for channel-path status change >> > > Honestly, I forgot about this discussion. I've refreshed my memories by > now, but I could not find why is it supposed to be architecturally OK > to loose CRWs when for instance a chp goes away. > >> So my understanding for this is: it really a design decision for us to >> propagate all of the channel path CRWs, or we just use other ways (like >> using PNO and PNOM as this series shows). > > From what I read, CRWs did not seem optional. > I wonder what should I read in order to change my mind. I'm not > talking about the hardware in the wild -- but that could be buggy > hardware. > Ah, can you point out the chapter that says CRWs is mandatory? AFAIU, PoP doesn't say, for example, chp gone will lead to a CRW, but it only says when we got a chp gone CRW it means a chp has been gone. [See Conny's mail pls, and we can discuss there.] >> >> Of course, CRW propagation is good to have. But as a discussion result >> of a former thread, that is something we can consider only after we have >> a good channel path re-modelling. That is the problem. We can try to >> trigger CRW event in some work of machine checks handling enhancement >> later. >> >>> * Kind of instead of the CRW you introduce a per device interrupt >>> for channel path events on the qemu/kvm boundary. AFAIU for a chp >>> event you do an STSCH for each (affected?) subchannel >> Until here, yes and right. >> >>> and generate an irq. Right? Why is this a good idea. >> This is not right. This series does not generate an irq for the guest. > > You misunderstood this. The word 'irq' this sentence is a short version > of 'per device interrupt for channel path events on the qemu/kvm boundary' > form the previous sentence. It's not an irq injected into the guest but > a notification (which you call irq in '[RFC PATCH 2/3] vfio: ccw: > introduce channel path irq') for QEMU. > I see now. I misunderstood. >> In QEMU, when it gets the notification of a channel path status change >> event, it read the newest SCHIB from the schib region, and update the >> SCHIB (patch related parts) for the virtual subchannel. The guest will >> get the new SCHIB whenever it calls STSCH then. > > We are in agreement. I just wanted to say, if let's say 42 vfio-ccw devices > are using the same chp and it goes away, you will generate 42 notifications. See reply below #LabelA. > >> >> I think this is a good idea, because: >> 1. This is complies the Linux implementation. Path status change could >> be noticed by Linux. >> 2. This provides enhancement with a small work. On the contrary, channel >> path re-modelling needs a lot of work. > > I thing your answer is based on the misunderstanding explained above. I see now. > > My idea was to be lazy in a different way. Instead of being lazy about > reading the subsection, I was wondering why not do an STSCH in the host > as a response to one being done in the guest. > > That means: if there is no activity on the passed trough devices, nothing > needs to be done. (Except maybe the CRWs). > #LabelA: I see your point. This is an interesting idea. Pros: - code simplification - no chp event notification and handling Cons: - have to read schib region (or issue STSCH on host) for each STSCH request from a guest I think code siplification is very acctractive, and performance is not the most important issue. Any comments? @Conny > The things aren't observable by the guest if it does not do STSCH anyway. Nod. This is the idea that this series is based on. > > >> >>> * SCHIB.PMCW provides path information relevant for a given device. >>> This information is retrieved using store subchannel. Is your series >>> sufficient for making store subchannel work properly (correct and >>> reasonably up to date)? >> Introducing a SCHIB region is the basis of further STSCH hanlding work. >> This series depends on it, and only focuses on the update of the channel >> path related parts of it. And for these parts, this work I think is >> basically correct (more review comments are surely to be welcomed). >> > > Please elaborate on that basically. Or are those actually just correct > in your opinion?! > As listed in a former mail, the following values will be updated for the guest: PMCW: - PIM/PAM/POM - PNOM - CHPIDs SCSW - PNOM bit What else do we need? >> For the rest parts, this does not change anything than what have. As >> replied to Conny in other mail of this thread: I think, if the other >> fields are handled by QEMU well, then we don't need to trigger update >> events for them. Currently I don't find things that need extra trigger. >> > > Fair. > >>> Regarding PMCW it's supposed to get updated when io functions are >>> preformed by the css, AFAIR. Is that right? >> I think so. And also for some other cases, for example, when we >> configure on/off a channel path. >> > > Sorry, I ended up confusing opm with pom. That would have been illegal > (as Connie has pointed out recently) because it can only change > only if the path is tried for IO. What is opm? Sorry, I lost in translation... See Conny's mail, and let's discuss there? > > >>> If yes what are the implications? Are the manipulations you do on >>> some PMCW fields architecturally correct? >> If "architecturally correct" means what guest sees/senses is all obey to >> the PoP, then I think so. We don't have to emulate the internal >> behaviors (which could not be seen by OS) of CSS exactly when we >> emulate, if the emulation does not impact on what Linux guest will see, >> right? >> >> I mean, the time point we update the PMCW does not has to be in the time >> slot of io function performance. The guest would still have to get the >> updated information through the calls of STSCH. We only need to provide >> the updaetd result to the guest by handling STSCH well. And that's why >> we say we do that lazily. >> >> Nothing different (added value) will be seen by the guest, comparing >> with the current lazy implementation I think.> >>> * The one thing I would very much appreciate as an user of vfio, >>> and should in my understanding be the user story of this series >>> (and the qemu counterpart of course) is the following. If my device >>> (that is IO operation) failed because no path could be found on >>> which the device is accessible, I would like to be able to identify >>> that. Preferably the same way I would do for an LPAR guest. Is this >>> series accomplishing that? >> With this the guest could lazily identify that. But since we have no >> machine check propagation yet, for those cases which generate CRWs, it >> might not be the same way for an LPAR guest I think. >> > > This was a yes/no question. I can't interpret your answer neither as > yes or as no. Maybe a little clarification: I'm talking about a > recent linux guest. See Conny's mail. > >>> * Why not just do proper STSCH for vfio-ccw? >> I only did the interested parts (path related). For the other parts, the >> current handling is enough, no? Anyway, after we have the schib region, >> we can do further STSCH handling any time later we want then. >> > > I mean, your approach works on the premise, that the host will notify > QEMU each time the content of the SCHIB (area) changes (modulo stuff) so > it can do an update in QEMU, and give the guest an up to date virtualized > SCHIB when it asks for it executing the STSCH instruction. > > I was asking myself. Why not instead while interpreting STSCH do a > STSCH on the host device (that is passed through), and virtualize the > result as necessary. > > It occurred to me if nothing simpler. I think I got your point now, and replied in the above #LabelA section. > >>> * Shouldn't we virtualize CHPIDs? >> Nobody would say no for this. :) The problem is that this is something >> big, and it's not a short-term goal in my current working project... I >> really want to deliver a minimal function set in the near future >> firstly. If the current working does not hurt, can we defer channel path >> re-modelling and CHPIDs virtualization? > > The problem is, I'm not sure it does not hurt. For instance because of > the question below. I would also prefer having a fair idea of what > we need for the envisioned (complete) solution before introducing > kernel interfaces (to avoid: pity in the end we need something > different, and resulting cluttered interface). > >> >>> What if we have a clash? Lets say my dasd is only accessible via chp >>> 0.00 in the host. Currently we have a problem there, or (as the only >>> path would end up being ignored)? >> You mean the clash that sharing path 0.00 between a physical dasd and >> the virtio ccw devices? The problem I saw is in the checking of the >> chpid.is_virtual in css_do_rchp(). For a virtual chp, we should generate >> INIT CRWs; while for a non-virtual one, we shouldn't. If the chp is >> shared with virtual and non-virtual device, I don't know what to do >> then... >> >> I don't think we can fix this before we re-modelled the channel path. >> But this is neither introduced nor aggravated by this series, right? > > I'm not sure about the later. What prevents the guest from believing > the (to use your terminology shared) chp 0.00 has become unavailable > if 0.00 becomes unavailable in the host? > > Would that adversely affect the virtual devices? It would at > least look strange. AFAI read the code, this series does not hurt the vritual devices, which always have fixed path information in SCHIB. The strange thing is, the chp 0.00 is both virtual and non-virtual with mix use of virtual and non-virtual devices. And any existing problem is not caused or enhanced by this work. If we can not fix the problem without MCSS support or chp re-modelling in the near future, we should document the problem and the effect. I will do some test on this. > >> >> We'd have to document this either I think. >> >>> Note: this is another unpleasant effect of not having MCSSE in the >>> guest. The original design was to have a separate css for vfio, and >>> then we would not have this kind of a problem. (Virtualization of >>> chps would still be good for migration.) >> Nod. BTW, I don't say that I do not want channel path virtualization in >> the long run. It's just I want a minimal function set more currently >> from what I'm focusing perspective. >> >> THANKS for these insightful questions! >> -- Dong Jia Shi