Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp7009795ybi; Mon, 22 Jul 2019 05:37:44 -0700 (PDT) X-Google-Smtp-Source: APXvYqyq9Ve7Ytb6AGu/pkafQRxZ1HVw/XLVBg/TzKCv6FvhUFsBvEeb4Swe6eMGFcYotBfOtXmS X-Received: by 2002:a17:90a:28e4:: with SMTP id f91mr74441106pjd.99.1563799064003; Mon, 22 Jul 2019 05:37:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563799063; cv=none; d=google.com; s=arc-20160816; b=AJeNYcjkjP8kqqPicVu0HJpmdn9HIyBXBWrsORNXSpxpJEkIXccQcjKKyoN/FO/0Hf oReQPtSI8bChpXJV+CXbSeH7eJe42VvOFRd5/zbQ+MpE9LDdrvXRlliLATWvd2+558Bw sznUF6AodrOAAkghsWTIl/ywqjG6A9WF5jWs3SqvXg+E6JiUjLtdR0X0gD2KbHHPcao1 SjpTLjPSPVY+TLlaNYz+NakJemsDL3z8rfZyJGoHhNRLDGYUM5NnRdI1zFyobxxRN+Tr CJ2MWNfEqx8Pihqd+O/w5U2QYHaZAy8NKr/+s0jN8DZMxhezVMM7WTzR5sPumHhrW92b Ajfg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date:autocrypt :openpgp:from:references:cc:to:subject; bh=WPkK1eCVXDgPkEW7MC3G4nqqe9knYb7Wk3QZKXi39S8=; b=IXw8z/ZLYaQwrJ/7HLWm2r2v2pDLmwj72/v6sLAZTUzNkDSwylaDHdTD4X8fXXTdBI /6DOydm8urNDDZMospqiGFxEAYeTSppuITuPJDn4Odw3wlID+yTx/NsPPfSv3N8/bMxK AwR6S7MioNJtaffDKBce4rGsmQ9U4Zbi2/5gpkCfZe8S+Hbl1AkkkgRr1U5kwrMdIEpm kftDV5iP8obgvgBaHegZEOkvMeg8adWWHftAwETd5j7BmoXCTT3duHwnOees9ZlEq1sH GwVPgSV8ddqqjngtRPqAtWGsGjjGkKl095VeypMxMC9387C3BQPLtsrDU5IyBKQ1eHJY Ngsw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 23si333377pgh.298.2019.07.22.05.37.28; Mon, 22 Jul 2019 05:37:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729759AbfGVKxm (ORCPT + 99 others); Mon, 22 Jul 2019 06:53:42 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:33264 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727795AbfGVKxl (ORCPT ); Mon, 22 Jul 2019 06:53:41 -0400 Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6MArUaa093093 for ; Mon, 22 Jul 2019 06:53:39 -0400 Received: from e06smtp02.uk.ibm.com (e06smtp02.uk.ibm.com [195.75.94.98]) by mx0b-001b2d01.pphosted.com with ESMTP id 2tw9u15gkf-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 22 Jul 2019 06:53:38 -0400 Received: from localhost by e06smtp02.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 22 Jul 2019 11:51:58 +0100 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp02.uk.ibm.com (192.168.101.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Mon, 22 Jul 2019 11:51:54 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6MApq5l9633978 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 22 Jul 2019 10:51:52 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4949811C052; Mon, 22 Jul 2019 10:51:52 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D000C11C050; Mon, 22 Jul 2019 10:51:49 +0000 (GMT) Received: from oc7455500831.ibm.com (unknown [9.152.224.116]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 22 Jul 2019 10:51:49 +0000 (GMT) Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device To: Dan Williams , Vivek Goyal Cc: Halil Pasic , Collin Walling , Cornelia Huck , Sebastian Ott , KVM list , Miklos Szeredi , linux-nvdimm , Linux Kernel Mailing List , "Dr. David Alan Gilbert" , Stefan Hajnoczi , linux-fsdevel , Steven Whitehouse , Heiko Carstens , David Hildenbrand References: <20190515192715.18000-1-vgoyal@redhat.com> <20190515192715.18000-19-vgoyal@redhat.com> <20190717192725.25c3d146.pasic@linux.ibm.com> <20190718131532.GA13883@redhat.com> From: Christian Borntraeger Openpgp: preference=signencrypt Autocrypt: addr=borntraeger@de.ibm.com; prefer-encrypt=mutual; keydata= mQINBE6cPPgBEAC2VpALY0UJjGmgAmavkL/iAdqul2/F9ONz42K6NrwmT+SI9CylKHIX+fdf J34pLNJDmDVEdeb+brtpwC9JEZOLVE0nb+SR83CsAINJYKG3V1b3Kfs0hydseYKsBYqJTN2j CmUXDYq9J7uOyQQ7TNVoQejmpp5ifR4EzwIFfmYDekxRVZDJygD0wL/EzUr8Je3/j548NLyL 4Uhv6CIPf3TY3/aLVKXdxz/ntbLgMcfZsDoHgDk3lY3r1iwbWwEM2+eYRdSZaR4VD+JRD7p8 0FBadNwWnBce1fmQp3EklodGi5y7TNZ/CKdJ+jRPAAnw7SINhSd7PhJMruDAJaUlbYaIm23A +82g+IGe4z9tRGQ9TAflezVMhT5J3ccu6cpIjjvwDlbxucSmtVi5VtPAMTLmfjYp7VY2Tgr+ T92v7+V96jAfE3Zy2nq52e8RDdUo/F6faxcumdl+aLhhKLXgrozpoe2nL0Nyc2uqFjkjwXXI OBQiaqGeWtxeKJP+O8MIpjyGuHUGzvjNx5S/592TQO3phpT5IFWfMgbu4OreZ9yekDhf7Cvn /fkYsiLDz9W6Clihd/xlpm79+jlhm4E3xBPiQOPCZowmHjx57mXVAypOP2Eu+i2nyQrkapaY IdisDQfWPdNeHNOiPnPS3+GhVlPcqSJAIWnuO7Ofw1ZVOyg/jwARAQABtDRDaHJpc3RpYW4g Qm9ybnRyYWVnZXIgKElCTSkgPGJvcm50cmFlZ2VyQGRlLmlibS5jb20+iQI4BBMBAgAiBQJO nDz4AhsDBgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRARe7yAtaYcfOYVD/9sqc6ZdYKD bmDIvc2/1LL0g7OgiA8pHJlYN2WHvIhUoZUIqy8Sw2EFny/nlpPVWfG290JizNS2LZ0mCeGZ 80yt0EpQNR8tLVzLSSr0GgoY0lwsKhAnx3p3AOrA8WXsPL6prLAu3yJI5D0ym4MJ6KlYVIjU ppi4NLWz7ncA2nDwiIqk8PBGxsjdc/W767zOOv7117rwhaGHgrJ2tLxoGWj0uoH3ZVhITP1z gqHXYaehPEELDV36WrSKidTarfThCWW0T3y4bH/mjvqi4ji9emp1/pOWs5/fmd4HpKW+44tD Yt4rSJRSa8lsXnZaEPaeY3nkbWPcy3vX6qafIey5d8dc8Uyaan39WslnJFNEx8cCqJrC77kI vcnl65HaW3y48DezrMDH34t3FsNrSVv5fRQ0mbEed8hbn4jguFAjPt4az1xawSp0YvhzwATJ YmZWRMa3LPx/fAxoolq9cNa0UB3D3jmikWktm+Jnp6aPeQ2Db3C0cDyxcOQY/GASYHY3KNra z8iwS7vULyq1lVhOXg1EeSm+lXQ1Ciz3ub3AhzE4c0ASqRrIHloVHBmh4favY4DEFN19Xw1p 76vBu6QjlsJGjvROW3GRKpLGogQTLslbjCdIYyp3AJq2KkoKxqdeQYm0LZXjtAwtRDbDo71C FxS7i/qfvWJv8ie7bE9A6Wsjn7kCDQROnDz4ARAAmPI1e8xB0k23TsEg8O1sBCTXkV8HSEq7 JlWz7SWyM8oFkJqYAB7E1GTXV5UZcr9iurCMKGSTrSu3ermLja4+k0w71pLxws859V+3z1jr nhB3dGzVZEUhCr3EuN0t8eHSLSMyrlPL5qJ11JelnuhToT6535cLOzeTlECc51bp5Xf6/XSx SMQaIU1nDM31R13o98oRPQnvSqOeljc25aflKnVkSfqWSrZmb4b0bcWUFFUKVPfQ5Z6JEcJg Hp7qPXHW7+tJTgmI1iM/BIkDwQ8qe3Wz8R6rfupde+T70NiId1M9w5rdo0JJsjKAPePKOSDo RX1kseJsTZH88wyJ30WuqEqH9zBxif0WtPQUTjz/YgFbmZ8OkB1i+lrBCVHPdcmvathknAxS bXL7j37VmYNyVoXez11zPYm+7LA2rvzP9WxR8bPhJvHLhKGk2kZESiNFzP/E4r4Wo24GT4eh YrDo7GBHN82V4O9JxWZtjpxBBl8bH9PvGWBmOXky7/bP6h96jFu9ZYzVgIkBP3UYW+Pb1a+b w4A83/5ImPwtBrN324bNUxPPqUWNW0ftiR5b81ms/rOcDC/k/VoN1B+IHkXrcBf742VOLID4 YP+CB9GXrwuF5KyQ5zEPCAjlOqZoq1fX/xGSsumfM7d6/OR8lvUPmqHfAzW3s9n4lZOW5Jfx bbkAEQEAAYkCHwQYAQIACQUCTpw8+AIbDAAKCRARe7yAtaYcfPzbD/9WNGVf60oXezNzSVCL hfS36l/zy4iy9H9rUZFmmmlBufWOATjiGAXnn0rr/Jh6Zy9NHuvpe3tyNYZLjB9pHT6mRZX7 Z1vDxeLgMjTv983TQ2hUSlhRSc6e6kGDJyG1WnGQaqymUllCmeC/p9q5m3IRxQrd0skfdN1V AMttRwvipmnMduy5SdNayY2YbhWLQ2wS3XHJ39a7D7SQz+gUQfXgE3pf3FlwbwZhRtVR3z5u aKjxqjybS3Ojimx4NkWjidwOaUVZTqEecBV+QCzi2oDr9+XtEs0m5YGI4v+Y/kHocNBP0myd pF3OoXvcWdTb5atk+OKcc8t4TviKy1WCNujC+yBSq3OM8gbmk6NwCwqhHQzXCibMlVF9hq5a FiJb8p4QKSVyLhM8EM3HtiFqFJSV7F+h+2W0kDyzBGyE0D8z3T+L3MOj3JJJkfCwbEbTpk4f n8zMboekuNruDw1OADRMPlhoWb+g6exBWx/YN4AY9LbE2KuaScONqph5/HvJDsUldcRN3a5V RGIN40QWFVlZvkKIEkzlzqpAyGaRLhXJPv/6tpoQaCQQoSAc5Z9kM/wEd9e2zMeojcWjUXgg oWj8A/wY4UXExGBu+UCzzP/6sQRpBiPFgmqPTytrDo/gsUGqjOudLiHQcMU+uunULYQxVghC syiRa+UVlsKmx1hsEg== Date: Mon, 22 Jul 2019 12:51:49 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 19072210-0008-0000-0000-000002FFBFD8 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19072210-0009-0000-0000-0000226D481B Message-Id: X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-07-22_09:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1907220130 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 18.07.19 16:30, Dan Williams wrote: > On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal wrote: >> >> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote: >>> On Wed, 15 May 2019 15:27:03 -0400 >>> Vivek Goyal wrote: >>> >>>> From: Stefan Hajnoczi >>>> >>>> Setup a dax device. >>>> >>>> Use the shm capability to find the cache entry and map it. >>>> >>>> The DAX window is accessed by the fs/dax.c infrastructure and must have >>>> struct pages (at least on x86). Use devm_memremap_pages() to map the >>>> DAX window PCI BAR and allocate struct page. >>>> >>> >>> Sorry for being this late. I don't see any more recent version so I will >>> comment here. >>> >>> I'm trying to figure out how is this supposed to work on s390. My concern >>> is, that on s390 PCI memory needs to be accessed by special >>> instructions. This is taken care of by the stuff defined in >>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses >>> the appropriate s390 instruction. However if the code does not use the >>> linux abstractions for accessing PCI memory, but assumes it can be >>> accessed like RAM, we have a problem. >>> >>> Looking at this patch, it seems to me, that we might end up with exactly >>> the case described. For example AFAICT copy_to_iter() (3) resolves to >>> the function in lib/iov_iter.c which does not seem to cater for s390 >>> oddities. >>> >>> I didn't have the time to investigate this properly, and since virtio-fs >>> is virtual, we may be able to get around what is otherwise a >>> limitation on s390. My understanding of these areas is admittedly >>> shallow, and since I'm not sure I'll have much more time to >>> invest in the near future I decided to raise concern. >>> >>> Any opinions? >> >> Hi Halil, >> >> I don't understand s390 and how PCI works there as well. Is there any >> other transport we can use there to map IO memory directly and access >> using DAX? >> >> BTW, is DAX supported for s390. >> >> I am also hoping somebody who knows better can chip in. Till that time, >> we could still use virtio-fs on s390 without DAX. > > s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED. > In practice that means that support for PTE_DEVMAP is missing which > means no get_user_pages() support for dax mappings. Effectively it's > only useful for execute-in-place as operations like fork() and ptrace > of dax mappings will fail. This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c and arch/s390/mm/extmem.c). For what its worth, the dcssblk looks to Linux like normal memory (just above the previously detected memory) that can be used like normal memory. In previous time we even had struct pages for this memory - this was removed long ago (when it was still xip) to reduce the memory footprint for large dcss blocks and small memory guests. Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory? Now some observations: - dcssblk is z/VM only (not KVM) - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on wether a device driver is compiled in or not seems not flexible enough in case if you have device driver that does have struct pages and another one that doesn't - I do not see a reason why we should not be able to map anything from QEMU into the guest real memory via an additional KVM memory slot. We would need to handle that in the guest somehow (and not as a PCI bar), register this with struct pages etc. - we must then look how we can create the link between the guest memory and the virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or whatever. Maybe we could also piggy-back on some memory hotplug work from David Hildenbrand (add cc). Regarding limitations on the platform: - while we do have PCI, the virtio devices are usually plugged via the ccw bus. That implies no PCI bars. I assume you use those PCI bars only to implicitely have the location of the shared memory Correct? - no real memory mapped I/O. Instead there are instructions that work on the mmio. As I understand things, this is of no concern regarding virtio-fs as you do not need mmio in the sense that a memory access of the guest to such an address triggers an exit. You just need the shared memory as a mean to have the data inside the guest. Any notification is done via normal virtqueue mechanisms Correct? Adding Heiko, maybe he remembers some details of the dcssblk/xip history.