Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp95292pxk; Tue, 22 Sep 2020 20:06:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJybk0sRxM+sl8tFOCrpJpSIAllbvPjMs61k9TACfV7PVQx4i+RXhDl0qvy1KGBLwZEV/KSw X-Received: by 2002:a50:ed02:: with SMTP id j2mr7634922eds.137.1600830419511; Tue, 22 Sep 2020 20:06:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600830419; cv=none; d=google.com; s=arc-20160816; b=NZk4Y8hJtk0fpF9F0z5w8t6a8NthdnZ9JDctBo1eG0c3LmdfUjfAcD0iEE1TXhyhcX N5fbeFsAfeVyMw7OpH4QmToI/wjFWHO7VDrny+NRoul/L6x6W/0yvj71PZhqvQxQVmzz nVy4SVfLtsYRC/8NgbBmHeMsqVo6Z896m8PP5SGprHcuvUh/lIai/IuPHb+h2SsBTpmX xStE6YL5EXSH9nioMh3J8LYtlbeiku5SWUdYo8AUcApgk6Ea8BtcLWMAhsd5xU3GVBgn AaFA+7NSdbe8ULtFg74lOhkXxLXIea5oOII2pnf5+eOn5GQxUO/PEI2a8muKO7Cu1LMe Q+3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=+tm3L9Qdu9Qa27/TXUn0VZXyFl0wWgXIWdk1uWUmYXo=; b=ew1ARa1xvcm8VDRw/PBQgNMPep0aJVtdcHs+5tZpRqTRBQlDNReSKEhn7ajIwIPAkH +w3NNOwz10ZEsNorEZFp3hbD080DfypUsCnoj/hp4dBcaH+yMBa8bzXLWEWhAxakY1rA CnWn+YXT9KQsIo7TGNqK3OhmUcKbNOHkmVUWzNWkdinL2h+/1Isnh7bK9BCfWIJDdh1c VN5wK2CKcj0Vhbjcfm+3mx+pxJaXmWQIqoWROFMiDjIhTFoJzGMp4h6kvOAayfzOJcV1 XG7aP7bEFim02VkixLitexhho85hc8w9tTVxf8Yii4jCKfiPgJJQwvVLW0+srrTBaCch AOjw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=bro9h2a+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id rh24si11748632ejb.283.2020.09.22.20.06.35; Tue, 22 Sep 2020 20:06:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=bro9h2a+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727330AbgIWCtE (ORCPT + 99 others); Tue, 22 Sep 2020 22:49:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726873AbgIWCtD (ORCPT ); Tue, 22 Sep 2020 22:49:03 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8117DC061755; Tue, 22 Sep 2020 19:49:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=+tm3L9Qdu9Qa27/TXUn0VZXyFl0wWgXIWdk1uWUmYXo=; b=bro9h2a+1VJMD+WRmif02SxOCd V6iYwofbx8Scioi6syigtzG1nhgPfDNbKD+0AWCloYcoCtuGqizghQ+7957ITN7duW0ad3Y5qa8VB ihRdtXr+JyqzpvML6OtwnnNFi7N//U0P0vnT9u30mG5Rim8975d7CazXnklVXf3CEerJaHJoGFIz9 4bAUYyK+H0hxgkSUl+ojUHYKyXOM0gcJpmUlOTvCob9VH08ZAXjhQFntUbdjqyBJJLNWrbVYtRIHC OodUmoJ9hZSATDquWRrgE+cTS0iPMrQgoPY77FgQMh9bNrJbF4LdjWkjTbbI5hNKEdsJTwkzAeM1R P9U1wjcA==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kKuq4-0000Po-1g; Wed, 23 Sep 2020 02:49:00 +0000 Date: Wed, 23 Sep 2020 03:48:59 +0100 From: Matthew Wilcox To: Qian Cai Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, "Darrick J . Wong" , Christoph Hellwig , linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, Dave Kleikamp , jfs-discussion@lists.sourceforge.net, Dave Chinner , Stephen Rothwell , linux-next@vger.kernel.org Subject: Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page Message-ID: <20200923024859.GM32101@casper.infradead.org> References: <20200910234707.5504-1-willy@infradead.org> <20200910234707.5504-6-willy@infradead.org> <163f852ba12fd9de5dec7c4a2d6b6c7cdb379ebc.camel@redhat.com> <20200922170526.GK32101@casper.infradead.org> <95bd1230f2fcf01f690770eb77696862b8fb607b.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <95bd1230f2fcf01f690770eb77696862b8fb607b.camel@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote: > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote: > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote: > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote: > > > > Size the uptodate array dynamically to support larger pages in the > > > > page cache. With a 64kB page, we're only saving 8 bytes per page today, > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB > > > > per page. Add a few debugging assertions. > > > > > > > > Signed-off-by: Matthew Wilcox (Oracle) > > > > Reviewed-by: Dave Chinner > > > > > > Some syscall fuzzing will trigger this on powerpc: > > > > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config > > > > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered- > > > io.c:78 iomap_page_release+0x250/0x270 > > > > Well, I'm glad it triggered. That warning is: > > WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) != > > PageUptodate(page)); > > so there was definitely a problem of some kind. > > > > truncate_cleanup_page() calls > > do_invalidatepage() calls > > iomap_invalidatepage() calls > > iomap_page_release() > > > > Is this the first warning? I'm wondering if maybe there was an I/O error > > earlier which caused PageUptodate to get cleared again. If it's easy to > > reproduce, perhaps you could try something like this? > > > > +void dump_iomap_page(struct page *page, const char *reason) > > +{ > > + struct iomap_page *iop = to_iomap_page(page); > > + unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page); > > + > > + dump_page(page, reason); > > + if (iop) > > + printk("iop:reads %d writes %d uptodate %*pb\n", > > + atomic_read(&iop->read_bytes_pending), > > + atomic_read(&iop->write_bytes_pending), > > + nr_blocks, iop->uptodate); > > + else > > + printk("iop:none\n"); > > +} > > > > and then do something like: > > > > if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page)) > > dump_iomap_page(page, NULL); > > This: > > [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2" > [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru) > [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8 > [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000 > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000 > [ 1683.158506][T164965] iop:none Oh, I'm a fool. This is after the call to detach_page_private() so page->private is NULL and we don't get the iop dumped. Nevertheless, this is interesting. Somehow, the page is marked Uptodate, but the bitmap is deemed not full. There are three places where we set an iomap page Uptodate: 1. if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page))) SetPageUptodate(page); 2. if (page_has_private(page)) iomap_iop_set_range_uptodate(page, off, len); else SetPageUptodate(page); 3. BUG_ON(page->index); ... SetPageUptodate(page); It can't be #2 because the page has an iop. It can't be #3 because the page->index is not 0. So at some point in the past, the bitmap was full. I don't think it's possible for inode->i_blksize to change, and you aren't running with THPs, so it's definitely not possible for thp_size() to change. So i_blocks_per_page() isn't going to change. We seem to have allocated enough memory for ->iop because that's also based on i_blocks_per_page(). I'm out of ideas. Maybe I'll wake up with a better idea in the morning. I've been trying to reproduce this on x86 with a 1kB block size filesystem, and haven't been able to yet. Maybe I'll try to setup a powerpc cross-compilation environment tomorrow.