From: Kirill Smelkov
Subject: [RESEND4, PATCH 2/2] fuse: require /dev/fuse reads to have enough buffer capacity as negotiated
Date: Wed, 27 Mar 2019 10:15:15 +0000
To: Miklos Szeredi
Cc: Han-Wen Nienhuys, Jakob Unterwurzacher, Kirill Tkhai, Andrew Morton, Kirill Smelkov
X-Mailing-List: linux-kernel@vger.kernel.org

A FUSE filesystem server queues /dev/fuse sys_read calls to get
filesystem requests to handle. It does not know in advance what the next
request will be, as it can be anything the client issues - LOOKUP, READ,
WRITE, ... Many requests are short and retrieve data from the
filesystem. However, WRITE and NOTIFY_REPLY write data into the
filesystem.

Before getting into the operation phase, the FUSE filesystem server and
the kernel client negotiate the maximum write size the client will ever
issue.
After negotiation, the contract between server and client is that the
filesystem server should queue /dev/fuse sys_read calls with enough
buffer capacity to receive any client request - WRITE in particular -
while the FUSE client should not send WRITE requests whose payload
exceeds the negotiated max_write. The FUSE client in the kernel and
libfuse historically reserve 4K for the request header. So the contract
is that the filesystem server should queue sys_reads with a
4K+max_write buffer.

If the filesystem server does not follow this contract,
fuse_dev_do_read can see that the request size is > the buffer size. It
will then return EIO to the client that issued the request, but will
not indicate in any way to the filesystem server that there is a
problem. This can be hard to diagnose because for some requests, e.g.
for NOTIFY_REPLY which mimics WRITE, there is no client thread waiting
for request completion, so that EIO goes nowhere, while on the
filesystem server side it looks as if the kernel is not replying after
a successful NOTIFY_RETRIEVE request made by the server.

-> We can make the problem easy to diagnose if we return an error to the
filesystem server when it violates the contract. In practice this should
not cause problems: if a filesystem server is using a shorter buffer,
writes to it were already very likely to cause EIO, and a read-only
filesystem should already be following the 8K minimum buffer size
(which is either FUSE_MIN_READ_BUFFER, see 1d3d752b47, or
4K + min(max_write), where min(max_write) = 4K is ensured by
process_init_reply).

Please see [1] for context: where the stuck-filesystem problem was hit
for real (because the kernel client was incorrectly sending more than
max_write data with NOTIFY_REPLY; see also the previous patch), how the
situation was traced, and a more involved patch that did not make it
into the tree.

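As an illustration of the 4K+max_write contract, here is a minimal userspace sketch of the buffer-size rule a filesystem server could apply before queueing a read. It is not kernel or libfuse code: the sketch_* names are hypothetical, and the 8192 constant is assumed to mirror FUSE_MIN_READ_BUFFER. Unlike the kernel expression, the sum is widened to size_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's FUSE_MIN_READ_BUFFER (8K). */
#define SKETCH_FUSE_MIN_READ_BUFFER 8192u
/* 4K historically reserved for the fixed part of a request header. */
#define SKETCH_FUSE_HEADER_ROOM 4096u

/*
 * Smallest read buffer accepted under the contract:
 * before INIT completes, the fixed 8K minimum;
 * afterwards, 4K of header room plus the negotiated max_write.
 */
static size_t sketch_min_read_buffer(bool conn_init, unsigned max_write)
{
	if (!conn_init)
		return SKETCH_FUSE_MIN_READ_BUFFER;
	return (size_t)SKETCH_FUSE_HEADER_ROOM + max_write;
}

/* True if a read queued with nbytes of buffer satisfies the contract. */
static bool sketch_read_buffer_ok(size_t nbytes, bool conn_init,
				  unsigned max_write)
{
	return nbytes >= sketch_min_read_buffer(conn_init, max_write);
}
```

For example, with max_write negotiated to 128K, a server that allocates exactly 128K for its read buffer falls short by the 4K of header room, while a 132K buffer satisfies the contract.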
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2

Signed-off-by: Kirill Smelkov
Cc: Han-Wen Nienhuys
Cc: Jakob Unterwurzacher
---
 fs/fuse/dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 38e94bc43053..8fdfbafed037 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1317,6 +1317,16 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	unsigned reqsize;
 	unsigned int hash;
 
+	/*
+	 * Require sane minimum read buffer - that has capacity for fixed part
+	 * of any request header + negotiated max_write room for data. If the
+	 * requirement is not satisfied return EINVAL to the filesystem server
+	 * to indicate that it is not following FUSE server/client contract.
+	 * Don't dequeue / abort any request.
+	 */
+	if (nbytes < (fc->conn_init ? 4096 + fc->max_write : FUSE_MIN_READ_BUFFER))
+		return -EINVAL;
+
  restart:
 	spin_lock(&fiq->waitq.lock);
 	err = -EAGAIN;
-- 
2.21.0.392.gf8f6787159