Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp1551601ybh; Fri, 13 Mar 2020 03:19:09 -0700 (PDT) X-Google-Smtp-Source: ADFU+vviETpAeltNn+Fl97+5tVZpKvEw1HUkVWlDY3Yj2q/S5eyYLlsGmRc1pB9+i8aukUk2AFZg X-Received: by 2002:a54:418a:: with SMTP id 10mr4503436oiy.105.1584094749242; Fri, 13 Mar 2020 03:19:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584094749; cv=none; d=google.com; s=arc-20160816; b=XJoz472lH4paZmacxIzsacSWuk3NI+VLQJX1K8snFuslXnTeAOa8W18RSvKvtO3AWA zFiwq6XzY5HUMBdbbY7dmhB3KHhKBj62WH5MLBmVIrKy7BBWM2yS5DV4KUixUGX3qWG+ TTQ7eTaHpgeXOhyk4j56pCSNYBJ60pohMgblEgz0esrDF2mJMp1ucKrkiKV6mjIy3uG+ 5lZnX+6RNpA48Hu3wYcD8YaViv7RxadqmvqpVhJOyih4tQGBVaDDE876hy6CniL2BThH x0JPnADmVviIi3icd9T/KLcoVy0E5yT+4sre/srHPXctaPwQv6k+BV5cGmVBPxsesA8N vtng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=wyFbd+XTU4jmr46S/cr+5oxuJ4yyBlKA+WZPNk944HQ=; b=Cif1pZuS8g/PQo1kaU84HGA6itw9eec278Gi7bD0stdDrqUdBLGboQ64HSyZIjL4mu frqDgBTiGc4+yREQMh3GzNAMu1BICTOIrQfB33MVP2x9sH05CL+vLOezk8Bx7auhEQ1C 6SGC/sCXLciWkQCDGWGEG5/ZSP66lKlGARPfc27ciyKTIXKhAm6nj3TXuAtx/ytaDnyM 50usKpzA9tj4aejq+1qGKi/DR3PyPyLjZr1on+IRravRomPZLUPPyMIKHwQebogwBFbC HalkwC9LSwyJuTH4ckjxN1sdo00HFT4/kkYnIwk9oCFhmlGy24gpDH+Wkqe3whX7FX7K NpEA== ARC-Authentication-Results: i=1; mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b="jbLOV/wK"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 96si4578068oty.198.2020.03.13.03.18.56; Fri, 13 Mar 2020 03:19:09 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b="jbLOV/wK"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726446AbgCMKS2 (ORCPT + 99 others); Fri, 13 Mar 2020 06:18:28 -0400 Received: from mail-il1-f196.google.com ([209.85.166.196]:42685 "EHLO mail-il1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726216AbgCMKS2 (ORCPT ); Fri, 13 Mar 2020 06:18:28 -0400 Received: by mail-il1-f196.google.com with SMTP id p2so537951ile.9 for ; Fri, 13 Mar 2020 03:18:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=wyFbd+XTU4jmr46S/cr+5oxuJ4yyBlKA+WZPNk944HQ=; b=jbLOV/wKEBCP+egAIPuAmGI01EGi9iRz1D+M4W1UTJ1zDP7EGuZ06aIyWN1ZP2PqaS jt1Y6IaJZOgv5eR99mE1xIvQrt2lJqt2PMidGpfnF4LDdvdkrrGqK/CQ12baoYeXoNdR bMkkgu83JFDRZECx6628ZmgAPYa4D+C7IpQ4M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=wyFbd+XTU4jmr46S/cr+5oxuJ4yyBlKA+WZPNk944HQ=; b=ADJMvO8AGf+lDazat+xb6J2Dwz3AMEgfQJeauQLqvhuhfhNugwUKz0Y9T7vYuGjiY8 AzjIW3HldfIiItW6vf2GRx/o1GQEol7viEWyAqwl9oDObp+pxpPyhMh3yVWIjnQb+zes n1gejuAxX0+Vq8QhDKH+sbOG5iQVLQtEbMT/ZMpgUFrPsYtNT2WHft4S1R+LMMICeclz XA74cmcWQfSz8mtoNS4SDKvd+oRvwd/QU0SmxsAX+SNJwakLy/bVSEVDTMedDP06jQC8 Xf7k6filwZocjiPmO8jJVx/s7OwFBq7LeIXJ60ElAdrMsdb8ei+mCzWkcBdYt4aN12Ue MnOQ== X-Gm-Message-State: ANhLgQ1XydcvhxOQEZq2dcGJWjEiJ/FNHOKXDCr3OQ+V8dHT4YZY/eFm G3Op2di1/jkedSjrnGGshlCPDOcFnywkPIp5VvASMA== X-Received: by 2002:a92:d745:: with SMTP id e5mr12394886ilq.285.1584094706462; Fri, 13 Mar 2020 03:18:26 -0700 (PDT) MIME-Version: 1.0 References: <20200304165845.3081-1-vgoyal@redhat.com> <20200304165845.3081-14-vgoyal@redhat.com> <20200312160208.GB114720@redhat.com> In-Reply-To: <20200312160208.GB114720@redhat.com> From: Miklos Szeredi Date: Fri, 13 Mar 2020 11:18:15 +0100 Message-ID: Subject: Re: [PATCH 13/20] fuse, dax: Implement dax read/write operations To: Vivek Goyal Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm , virtio-fs@redhat.com, Stefan Hajnoczi , "Dr. David Alan Gilbert" , "Michael S. Tsirkin" , Miklos Szeredi , Liu Bo , Peng Tao Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 12, 2020 at 5:02 PM Vivek Goyal wrote: > > On Thu, Mar 12, 2020 at 10:43:10AM +0100, Miklos Szeredi wrote: > > On Wed, Mar 4, 2020 at 5:59 PM Vivek Goyal wrote: > > > > > > This patch implements basic DAX support. mmap() is not implemented > > > yet and will come in later patches. This patch looks into implemeting > > > read/write. > > > > > > We make use of interval tree to keep track of per inode dax mappings. > > > > > > Do not use dax for file extending writes, instead just send WRITE message > > > to daemon (like we do for direct I/O path). This will keep write and > > > i_size change atomic w.r.t crash. > > > > > > Signed-off-by: Stefan Hajnoczi > > > Signed-off-by: Dr. David Alan Gilbert > > > Signed-off-by: Vivek Goyal > > > Signed-off-by: Miklos Szeredi > > > Signed-off-by: Liu Bo > > > Signed-off-by: Peng Tao > > > --- > > > fs/fuse/file.c | 597 +++++++++++++++++++++++++++++++++++++- > > > fs/fuse/fuse_i.h | 23 ++ > > > fs/fuse/inode.c | 6 + > > > include/uapi/linux/fuse.h | 1 + > > > 4 files changed, 621 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > > > index 9d67b830fb7a..9effdd3dc6d6 100644 > > > --- a/fs/fuse/file.c > > > +++ b/fs/fuse/file.c > > > @@ -18,6 +18,12 @@ > > > #include > > > #include > > > #include > > > +#include > > > +#include > > > +#include > > > + > > > +INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, rb, __u64, __subtree_last, > > > + START, LAST, static inline, fuse_dax_interval_tree); > > > > Are you using this because of byte ranges (u64)? Does it not make > > more sense to use page offsets, which are unsigned long and so fit > > nicely into the generic interval tree? > > I think I should be able to use generic interval tree. I will switch > to that. > > [..] > > > +/* offset passed in should be aligned to FUSE_DAX_MEM_RANGE_SZ */ > > > +static int fuse_setup_one_mapping(struct inode *inode, loff_t offset, > > > + struct fuse_dax_mapping *dmap, bool writable, > > > + bool upgrade) > > > +{ > > > + struct fuse_conn *fc = get_fuse_conn(inode); > > > + struct fuse_inode *fi = get_fuse_inode(inode); > > > + struct fuse_setupmapping_in inarg; > > > + FUSE_ARGS(args); > > > + ssize_t err; > > > + > > > + WARN_ON(offset % FUSE_DAX_MEM_RANGE_SZ); > > > + WARN_ON(fc->nr_free_ranges < 0); > > > + > > > + /* Ask fuse daemon to setup mapping */ > > > + memset(&inarg, 0, sizeof(inarg)); > > > + inarg.foffset = offset; > > > + inarg.fh = -1; > > > + inarg.moffset = dmap->window_offset; > > > + inarg.len = FUSE_DAX_MEM_RANGE_SZ; > > > + inarg.flags |= FUSE_SETUPMAPPING_FLAG_READ; > > > + if (writable) > > > + inarg.flags |= FUSE_SETUPMAPPING_FLAG_WRITE; > > > + args.opcode = FUSE_SETUPMAPPING; > > > + args.nodeid = fi->nodeid; > > > + args.in_numargs = 1; > > > + args.in_args[0].size = sizeof(inarg); > > > + args.in_args[0].value = &inarg; > > > > args.force = true? > > I can do that but I am not sure what exactly does args.force do and > why do we need it in this case. Hm, it prevents interrupts. Looking closely, however it will only prevent SIGKILL from immediately interrupting the request, otherwise it will send an INTERRUPT request and the filesystem can ignore that. Might make sense to have a args.nonint flag to prevent the sending of INTERRUPT... > First thing it does is that request is allocated with flag __GFP_NOFAIL. > Second thing it does is that caller is forced to wait for request > completion and its not an interruptible sleep. > > I am wondering what makes FUSE_SETUPMAPING/FUSE_REMOVEMAPPING requests > special that we need to set force flag. Maybe not for SETUPMAPPING (I was confused by the error log). However if REMOVEMAPPING fails for some reason, than that dax mapping will be leaked for the lifetime of the filesystem. Or am I misunderstanding it? > > > + ret = fuse_setup_one_mapping(inode, > > > + ALIGN_DOWN(pos, FUSE_DAX_MEM_RANGE_SZ), > > > + dmap, true, true); > > > + if (ret < 0) { > > > + printk("fuse_setup_one_mapping() failed. err=%d pos=0x%llx\n", > > > + ret, pos); > > > > Again. > > Will remove. How about converting some of them to pr_debug() instead? It > can help with debugging if something is not working. Okay, and please move it to fuse_setup_one_mapping() where there's already a pr_debug() for the success case. > > + > > > + /* Do not use dax for file extending writes as its an mmap and > > > + * trying to write beyong end of existing page will generate > > > + * SIGBUS. > > > > Ah, here it is. So what happens in case of a race? Does that > > currently crash KVM? > > In case of race, yes, KVM hangs. So no shared directory operation yet > till we have designed proper error handling in kvm path. I think before this is merged we have to fix the KVM crash; that's not acceptable even if we explicitly say that shared directory is not supported for the time being. Thanks, Miklos