From: Ilya Dryomov
Date: Wed, 10 Oct 2018 12:43:40 +0200
Subject: Re: [PATCH] ceph: only allow punch hole mode in fallocate
To: "Yan, Zheng"
Cc: Luis Henriques, "Yan, Zheng", Sage Weil, Ceph Development, linux-kernel@vger.kernel.org
References: <20181009175428.18543-1-lhenriques@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 10, 2018 at 6:21 AM Yan, Zheng wrote:
>
> On Wed, Oct 10, 2018 at 1:54 AM Luis Henriques wrote:
> >
> > Current implementation of cephfs fallocate isn't correct as it doesn't
> > really reserve the space in the cluster, which means that a subsequent
> > call to a write may actually fail due to lack of space.  In fact, it is
> > currently possible to fallocate an amount space that is larger than the
> > free space in the cluster.
> >
> > Since there's no easy solution to fix this at the moment, this patch
> > simply removes support for all fallocate operations but
> > FALLOC_FL_PUNCH_HOLE (which implies FALLOC_FL_KEEP_SIZE).
> >
> > Link: https://tracker.ceph.com/issues/36317
> > Cc: stable@vger.kernel.org
> > Fixes: ad7a60de882a ("ceph: punch hole support")
> > Signed-off-by: Luis Henriques
> > ---
> >  fs/ceph/file.c | 45 +++++++++------------------------------------
> >  1 file changed, 9 insertions(+), 36 deletions(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 92ab20433682..91a7ad259bcf 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1735,7 +1735,6 @@ static long ceph_fallocate(struct file *file, int mode,
> >  	struct ceph_file_info *fi = file->private_data;
> >  	struct inode *inode = file_inode(file);
> >  	struct ceph_inode_info *ci = ceph_inode(inode);
> > -	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >  	struct ceph_cap_flush *prealloc_cf;
> >  	int want, got = 0;
> >  	int dirty;
> > @@ -1743,10 +1742,7 @@ static long ceph_fallocate(struct file *file, int mode,
> >  	loff_t endoff = 0;
> >  	loff_t size;
> >
> > -	if ((offset + length) > max(i_size_read(inode), fsc->max_file_size))
> > -		return -EFBIG;
> > -
> > -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> > +	if (mode != (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> >  		return -EOPNOTSUPP;
> >
> >  	if (!S_ISREG(inode->i_mode))
> > @@ -1763,18 +1759,6 @@ static long ceph_fallocate(struct file *file, int mode,
> >  		goto unlock;
> >  	}
> >
> > -	if (!(mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)) &&
> > -	    ceph_quota_is_max_bytes_exceeded(inode, offset + length)) {
> > -		ret = -EDQUOT;
> > -		goto unlock;
> > -	}
> > -
> > -	if (ceph_osdmap_flag(&fsc->client->osdc, CEPH_OSDMAP_FULL) &&
> > -	    !(mode & FALLOC_FL_PUNCH_HOLE)) {
> > -		ret = -ENOSPC;
> > -		goto unlock;
> > -	}
> > -
> >  	if (ci->i_inline_version != CEPH_INLINE_NONE) {
> >  		ret = ceph_uninline_data(file, NULL);
> >  		if (ret < 0)
> > @@ -1782,12 +1766,12 @@ static long ceph_fallocate(struct file *file, int mode,
> >  	}
> >
> >  	size = i_size_read(inode);
> > -	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
> > -		endoff = offset + length;
> > -		ret = inode_newsize_ok(inode, endoff);
> > -		if (ret)
> > -			goto unlock;
> > -	}
> > +
> > +	/* Are we punching a hole beyond EOF? */
> > +	if (offset >= size)
> > +		goto unlock;
> > +	if ((offset + length) > size)
> > +		length = size - offset;
> >
> >  	if (fi->fmode & CEPH_FILE_MODE_LAZY)
> >  		want = CEPH_CAP_FILE_BUFFER | CEPH_CAP_FILE_LAZYIO;
> > @@ -1798,16 +1782,8 @@ static long ceph_fallocate(struct file *file, int mode,
> >  	if (ret < 0)
> >  		goto unlock;
> >
> > -	if (mode & FALLOC_FL_PUNCH_HOLE) {
> > -		if (offset < size)
> > -			ceph_zero_pagecache_range(inode, offset, length);
> > -		ret = ceph_zero_objects(inode, offset, length);
> > -	} else if (endoff > size) {
> > -		truncate_pagecache_range(inode, size, -1);
> > -		if (ceph_inode_set_size(inode, endoff))
> > -			ceph_check_caps(ceph_inode(inode),
> > -				CHECK_CAPS_AUTHONLY, NULL);
> > -	}
> > +	ceph_zero_pagecache_range(inode, offset, length);
> > +	ret = ceph_zero_objects(inode, offset, length);
> >
> >  	if (!ret) {
> >  		spin_lock(&ci->i_ceph_lock);
> > @@ -1817,9 +1793,6 @@ static long ceph_fallocate(struct file *file, int mode,
> >  		spin_unlock(&ci->i_ceph_lock);
> >  		if (dirty)
> >  			__mark_inode_dirty(inode, dirty);
> > -		if ((endoff > size) &&
> > -		    ceph_quota_is_max_bytes_approaching(inode, endoff))
> > -			ceph_check_caps(ci, CHECK_CAPS_NODELAY, NULL);
> >  	}
> >
> >  	ceph_put_cap_refs(ci, got);
>
> Applied, thanks

I don't think it should go to stable kernels.  Strictly speaking it's a
behaviour change -- it's been this way for many years and, unless you
are close to ENOSPC, it sort of appears to work.  I'll take off the
stable tag unless I hear objections.

Thanks,

                Ilya
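
For anyone skimming the thread, here is a rough userspace sketch of the two checks the quoted patch leaves in ceph_fallocate(): the exact-match mode test and the trimming of the range at EOF. It is only an illustration under stated assumptions, not the kernel code. The sketch_* names and SKETCH_* macros are hypothetical stand-ins (the two values are the ones FALLOC_FL_KEEP_SIZE and FALLOC_FL_PUNCH_HOLE have in <linux/falloc.h>), and the range test is written in an overflow-safe form rather than copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins so the sketch builds anywhere; the values match
 * FALLOC_FL_KEEP_SIZE and FALLOC_FL_PUNCH_HOLE in <linux/falloc.h>.
 */
#define SKETCH_KEEP_SIZE  0x01
#define SKETCH_PUNCH_HOLE 0x02

/*
 * Mirrors the new mode test: only the exact combination
 * PUNCH_HOLE | KEEP_SIZE is accepted.  Plain preallocation (mode 0) and
 * KEEP_SIZE on its own, which the old mask test let through, are refused.
 */
static bool sketch_mode_supported(int mode)
{
	return mode == (SKETCH_KEEP_SIZE | SKETCH_PUNCH_HOLE);
}

/*
 * Mirrors the EOF handling.  Returns false when the hole starts at or
 * beyond EOF, in which case there is nothing to zero.  Otherwise trims
 * *length so the range ends at EOF at the latest and returns true.
 * Assumes offset >= 0, *length > 0 and size >= 0.
 */
static bool sketch_clamp_to_eof(int64_t offset, int64_t *length, int64_t size)
{
	if (offset >= size)
		return false;

	/* Equivalent to (offset + length) > size, but cannot overflow. */
	if (*length > size - offset)
		*length = size - offset;

	return true;
}
```

With a 120-byte file, a request for offset 50 and length 100 is trimmed to 70 bytes, and a request starting at offset 200 is a no-op. A mode of 0 is rejected by sketch_mode_supported(), which corresponds to the -EOPNOTSUPP return in the patch.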