Date: Wed, 27 Jan 2021 18:45:50 +0100
From: Cornelia Huck
To: Alex Williamson
Cc: Matthew Rosato, schnelle@linux.ibm.com, pmorel@linux.ibm.com,
 borntraeger@de.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
 gerald.schaefer@linux.ibm.com, linux-s390@vger.kernel.org,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] vfio-pci/zdev: Introduce the zPCI I/O vfio region
Message-ID: <20210127184550.01c3fcdd.cohuck@redhat.com>
In-Reply-To: <20210127085305.153e01e4@omen.home.shazbot.org>
References: <1611086550-32765-1-git-send-email-mjrosato@linux.ibm.com>
 <1611086550-32765-5-git-send-email-mjrosato@linux.ibm.com>
 <20210122164843.269f806c@omen.home.shazbot.org>
 <9c363ff5-b76c-d697-98e2-cf091a404d15@linux.ibm.com>
 <20210126161817.683485e0@omen.home.shazbot.org>
 <20210127085305.153e01e4@omen.home.shazbot.org>
Organization: Red Hat GmbH
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 27 Jan 2021 08:53:05 -0700
Alex Williamson wrote:

> On Wed, 27 Jan 2021 09:23:04 -0500
> Matthew Rosato wrote:
>
> > On 1/26/21 6:18 PM, Alex Williamson wrote:
> > > On Mon, 25 Jan 2021 09:40:38 -0500
> > > Matthew Rosato wrote:
> > >
> > >> On 1/22/21 6:48 PM, Alex Williamson wrote:
> > >>> On Tue, 19 Jan 2021 15:02:30 -0500
> > >>> Matthew Rosato wrote:
> > >>>
> > >>>> Some s390 PCI devices (e.g. ISM) perform I/O operations that have very
> > >>>> specific requirements in terms of alignment as well as the patterns in
> > >>>> which the data is read/written.
> > >>>> Allowing these to proceed through the
> > >>>> typical vfio_pci_bar_rw path will cause them to be broken up in such a
> > >>>> way that these requirements can't be guaranteed. In addition, ISM devices
> > >>>> do not support the MIO codepaths that might be triggered on vfio I/O coming
> > >>>> from userspace; we must be able to ensure that these devices use the
> > >>>> non-MIO instructions. To facilitate this, provide a new vfio region by
> > >>>> which non-MIO instructions can be passed directly to the host kernel s390
> > >>>> PCI layer, to be reliably issued as non-MIO instructions.
> > >>>>
> > >>>> This patch introduces the new vfio VFIO_REGION_SUBTYPE_IBM_ZPCI_IO region
> > >>>> and implements the ability to pass PCISTB and PCILG instructions over it,
> > >>>> as these are what is required for ISM devices.
> > >>>
> > >>> There have been various discussions about splitting vfio-pci to allow
> > >>> more device specific drivers rather than adding duct tape and baling wire
> > >>> for various device specific features to extend vfio-pci. The latest
> > >>> iteration is here[1]. Is it possible that such a solution could simply
> > >>> provide the standard BAR region indexes, but with an implementation that
> > >>> works on s390, rather than creating new device specific regions to
> > >>> perform the same task? Thanks,
> > >>>
> > >>> Alex
> > >>>
> > >>> [1]https://lore.kernel.org/lkml/20210117181534.65724-1-mgurtovoy@nvidia.com/
> > >>>
> > >>
> > >> Thanks for the pointer, I'll have to keep an eye on this. An approach
> > >> like this could solve some issues, but I think a main issue that still
> > >> remains with relying on the standard BAR region indexes (whether using
> > >> the current vfio-pci driver or a device-specific driver) is that QEMU
> > >> writes to said BAR memory region are happening in, at most, 8B chunks
> > >> (which then, in the current general-purpose vfio-pci code get further
> > >> split up into 4B iowrite operations).
> > >> The alternate approach I'm
> > >> proposing here is allowing for the whole payload (4K) in a single
> > >> operation, which is significantly faster. So, I suspect even with a
> > >> device specific driver we'd want this sort of a region anyhow..
> > >
> > > Why is this device specific behavior? It would be a fair argument that
> > > acceptable device access widths for MMIO are always device specific, so
> > > we should never break them down. Looking at the PCI spec, a TLP
> > > requires a dword (4-byte) aligned address with a 10-bit length field
> > > indicating the number of dwords, so up to 4K data as you suggest is the
> >
> > Well, as I mentioned in a different thread, it's not really device
>
> Sorry, I tried to follow the thread, not sure it's possible w/o lots of
> preexisting s390 knowledge.
>
> > specific behavior but rather architecture/s390x specific behavior;
> > PCISTB is an interface available to all PCI devices on s390x, it just so
> > happens that ISM is the first device type that is running into the
> > nuance. The architecture is designed in such a way that other devices
> > can use the same interface in the same manner.
>
> As a platform access mechanism, this leans towards a platform specific
> implementation of the PCI BAR regions.
>
> > To drill down a bit, the PCISTB payload can either be 'strict' or
> > 'relaxed', which via the s390x architecture 'relaxed' means it's not
> > dword-aligned but rather byte-aligned up to 4K. We find out at init
> > time which interface a device supports -- So, for a device that
> > supports 'relaxed' PCISTB like ISM, an example would be a PCISTB of 11
> > bytes coming from a non-dword-aligned address is permissible, which
> > doesn't match the TLP from the spec as you described... I believe this
> > 'relaxed' operation that steps outside of the spec is unique to s390x.
> > (Conversely, devices that use 'strict' PCISTB conform to the typical TLP
> > definition)
> >
> > This deviation from spec is, to my mind, another argument to treat
> > these particular PCISTB separately.
>
> I think that's just an accessor abstraction, we're not asking users to
> generate TLPs. If we expose a byte granularity interface, some
> platforms might pass that directly to the PCISTB command, otherwise
> might align the address, perform a dword access, and return the
> requested byte. AFAICT, both conventional and express PCI use dword
> alignment on the bus, so this should be valid and at best questions
> whether ISM is really PCI or not.

The vibe I'm getting from ISM is that it is mostly a construct using
(one set of) the s390 pci instructions, which ends up being something
not entirely unlike a pci device... the question is how much things like
the 'relaxed' payload may also be supported by 'real' pci devices
plugged into an s390.

>
> > > whole payload. It's quite possible that the reason we don't have more
> > > access width problems is that MMIO is typically mmap'd on other
> > > platforms. We get away with using the x-no-mmap=on flag for debugging,
> > > but it's not unheard of that the device also doesn't work quite
> > > correctly with that flag, which could be due to access width or timing
> > > difference.
> > >
> > > So really, I don't see why we wouldn't want to maintain the guest
> > > access width through QEMU and the kernel interface for all devices. It
> > > seems like that should be our default vfio-pci implementation. I think
> > > we chose the current width based on the QEMU implementation that was
> > > already splitting accesses, and it (mostly) worked. Thanks,
> > >
> > >
> >
> > But unless you think that allowing more flexibility than the PCI spec
> > dictates (byte-aligned up to 4K rather than dword-aligned up to 4K, see
> > above) this still wouldn't allow me to solve the issue I'm trying to
> > with this patch set.
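
To make the fallback Alex describes above a bit more concrete (align the
address, do dword accesses, hand back only the requested bytes), here is
a minimal sketch in Python. It is an illustration only, not anything
from vfio-pci or the s390 PCI layer: read_bytes_via_dwords,
toy_read_dword and the bar buffer are all hypothetical names, and the
backing store is plain memory with no side effects.

```python
def read_bytes_via_dwords(read_dword, addr, length):
    """Emulate a byte-granular read using only dword-aligned accesses.

    read_dword is a hypothetical callback: it takes a 4-byte-aligned
    address and returns the 4 bytes stored there.
    """
    if length <= 0:
        return b""
    start = addr & ~0x3                # align the start down to a dword
    end = (addr + length + 3) & ~0x3   # round the end up to a dword
    buf = bytearray()
    for dword_addr in range(start, end, 4):
        buf += read_dword(dword_addr)  # one aligned 4-byte access
    offset = addr - start
    return bytes(buf[offset:offset + length])


# Toy backing store standing in for a BAR (purely illustrative).
bar = bytes(range(64))


def toy_read_dword(dword_addr):
    # Reject anything that is not dword-aligned, as the bus would require.
    assert dword_addr % 4 == 0, "bus access must be dword-aligned"
    return bar[dword_addr:dword_addr + 4]


# An 11-byte access from a non-dword-aligned address, as in the
# 'relaxed' PCISTB example quoted above.
data = read_bytes_via_dwords(toy_read_dword, 5, 11)
```

Here the 11 bytes at offset 5 are served by three aligned dword reads at
offsets 4, 8 and 12. The sketch only covers reads from plain memory; it
says nothing about writes, or about devices like ISM that care about the
access pattern itself.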
>
> As above, it still seems like an improvement to honor user access width
> to the ability of the platform or bus/device interface. If ISM is
> really that different from PCI in this respect, it only strengthens the
> argument to make a separate bus driver derived from vfio-pci(-core) imo.
>
> > If you DO think allowing byte-aligned access up to 4K is OK, then I'm
> > still left with a further issue (@Niklas): I'm also using this
> > region-based approach to ensure that the host uses a matching interface
> > when talking to the host device (basically, s390x has two different
> > versions of access to PCI devices, and for ISM at least we need to
> > ensure that the same operation intercepted from the guest is being used
> > on the host vs attempting to 'upgrade', which always happens via the
> > standard s390 kernel PCI interfaces).
>
> In the proposed vfio-pci-core library model, devices would be attached
> to the most appropriate vfio bus driver, an ISM device might be bound
> to a vfio-zpci-ism (heh, "-ism") driver on the host, standard device

That would be a nice name, at least :)

> might simply be attached to vfio-pci.

I'm wondering what a good split would be there. ISTM that the devices
the vfio-pci-core split was written with in mind are still normal pci
devices, just with some extra needs that can be fulfilled via an aux
driver. ISM seems to need special treatment for normal accesses, but
does not hook into a separate framework. (If other zpci devices need
special accesses, would that then be a zpci framework?)
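
For reference, the access counts implied by the sizes quoted earlier in
the thread (4K payload, 8B QEMU chunks, 4B iowrites) can be worked out
with a small sketch. This is plain arithmetic in Python, assuming
fixed-width accesses with no merging; num_accesses is a hypothetical
helper, not an existing interface.

```python
PAYLOAD = 4096  # bytes, the 4K payload mentioned in the thread


def num_accesses(payload, width):
    """Number of fixed-width accesses needed to move `payload` bytes."""
    return -(-payload // width)  # ceiling division


qemu_chunks = num_accesses(PAYLOAD, 8)      # at most 8B per QEMU write
iowrites = num_accesses(PAYLOAD, 4)         # further split into 4B iowrites
single_op = num_accesses(PAYLOAD, PAYLOAD)  # whole payload in one operation

# A 10-bit TLP length field counts dwords; the value 0 encodes 1024,
# so the largest payload is 1024 dwords.
max_tlp_bytes = (1 << 10) * 4

print(qemu_chunks, iowrites, single_op, max_tlp_bytes)
```

So a single 4K store turns into 512 chunks of 8B, or 1024 separate 4B
iowrites, on the generic path, versus one operation through the proposed
region; 4K is also the maximum a single TLP can carry.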