2023-12-21 08:59:36

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules

From: Yu Kuai <[email protected]>

Changes in v3:
- remove bdev_associated_mapping() and patch 12 from v1;
- add kerneldoc comments for new bdev apis;
- rename __bdev_get_folio() to bdev_get_folio;
- fix a problem in erofs that erofs_init_metabuf() is not always
called.
- add reviewed-by tag for patch 15-17;
Changes in v2:
- remove some bdev apis that is not necessary;
- pass in offset for bdev_read_folio() and __bdev_get_folio();
- remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
prevent access bd_indoe() directly from mapping_gfp_constraint() in
ext4.(patch 15, 16);
- remove block_device_ejected() from ext4.


Patch 1 add some bdev apis, then follow up patches will use these apis
to avoid access bd_inode directly, and hopefully the field bd_inode can
be removed eventually(after figure out a way for fs/buffer.c).

Yu Kuai (17):
block: add some bdev apis
xen/blkback: use bdev api in xen_update_blkif_status()
bcache: use bdev api in read_super()
mtd: block2mtd: use bdev apis
s390/dasd: use bdev api in dasd_format()
scsicam: use bdev api in scsi_bios_ptable()
bcachefs: remove dead function bdev_sectors()
bio: export bio_add_folio_nofail()
btrfs: use bdev apis
cramfs: use bdev apis in cramfs_blkdev_read()
erofs: use bdev api
nilfs2: use bdev api in nilfs_attach_log_writer()
jbd2: use bdev apis
buffer: add a new helper to read sb block
ext4: use new helper to read sb block
ext4: remove block_device_ejected()
ext4: use bdev apis

block/bdev.c | 148 +++++++++++++++++++++++++++++
block/bio.c | 1 +
block/blk.h | 2 -
drivers/block/xen-blkback/xenbus.c | 3 +-
drivers/md/bcache/super.c | 11 +--
drivers/mtd/devices/block2mtd.c | 81 +++++++---------
drivers/s390/block/dasd_ioctl.c | 5 +-
drivers/scsi/scsicam.c | 4 +-
fs/bcachefs/util.h | 5 -
fs/btrfs/disk-io.c | 71 +++++++-------
fs/btrfs/volumes.c | 17 ++--
fs/btrfs/zoned.c | 15 +--
fs/buffer.c | 68 +++++++++----
fs/cramfs/inode.c | 36 +++----
fs/erofs/data.c | 18 ++--
fs/erofs/internal.h | 2 +
fs/ext4/dir.c | 6 +-
fs/ext4/ext4.h | 13 ---
fs/ext4/ext4_jbd2.c | 6 +-
fs/ext4/inode.c | 8 +-
fs/ext4/super.c | 66 +++----------
fs/ext4/symlink.c | 2 +-
fs/jbd2/journal.c | 3 +-
fs/jbd2/recovery.c | 6 +-
fs/nilfs2/segment.c | 2 +-
include/linux/blkdev.h | 17 ++++
include/linux/buffer_head.h | 18 +++-
27 files changed, 377 insertions(+), 257 deletions(-)

--
2.39.2



2023-12-21 08:59:57

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis

From: Yu Kuai <[email protected]>

Those apis will be used for other modules, so that bd_inode won't be
accessed directly from other modules.

Signed-off-by: Yu Kuai <[email protected]>
---
block/bdev.c | 148 +++++++++++++++++++++++++++++++++++++++++
block/blk.h | 2 -
include/linux/blkdev.h | 17 +++++
3 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 750aec178b6a..6204621c6db6 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -89,6 +89,25 @@ void invalidate_bdev(struct block_device *bdev)
}
EXPORT_SYMBOL(invalidate_bdev);

+/**
+ * invalidate_bdev_pages - Invalidate clean unused buffers and pagecache.
+ * @bdev: the block device which holds the cache to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function removes pages that are clean, unmapped and unlocked,
+ * as well as shadow entries. It will not block on IO activity.
+ *
+ * If you want to remove all the pages of one block device, regardless of
+ * their use and writeback state, use truncate_bdev_range().
+ */
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+ pgoff_t end)
+{
+ invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
+}
+EXPORT_SYMBOL_GPL(invalidate_bdev_range);
+
/*
* Drop all buffers & page cache for given bdev range. This function bails
* with error if bdev has other exclusive owner (such as filesystem).
@@ -121,6 +140,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
lstart >> PAGE_SHIFT,
lend >> PAGE_SHIFT);
}
+EXPORT_SYMBOL_GPL(truncate_bdev_range);

static void set_init_blocksize(struct block_device *bdev)
{
@@ -1102,3 +1122,131 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)

blkdev_put_no_open(bdev);
}
+
+/**
+ * bdev_read_folio - Read into block device page cache.
+ * @bdev: the block device which holds the cache to read.
+ * @pos: the offset that allocated folio will contain.
+ *
+ * Read one page into the block device page cache. If it succeeds, the folio
+ * returned will contain @pos;
+ *
+ * Return: Uptodate folio on success, ERR_PTR() on failure.
+ */
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
+{
+ return mapping_read_folio_gfp(bdev->bd_inode->i_mapping,
+ pos >> PAGE_SHIFT, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio);
+
+/**
+ * bdev_get_folio - Find and get a reference to a folio.
+ * @bdev: the block device which holds the address_space to search.
+ * @pos: the offset the returned folio will contain.
+ * @fgp_flags: %FGP flags modify how the folio is returned.
+ * @gfp: Memory allocation flags to use if %FGP_CREAT is specified.
+ *
+ * Looks up the page cache entry at @bdev->bd_inode->i_mapping from @pos. If
+ * this function returns a folio, it is returned with an increased refcount.
+ *
+ * Return: The found folio or an ERR_PTR() otherwise.
+ */
+struct folio *bdev_get_folio(struct block_device *bdev, loff_t pos,
+ fgf_t fgp_flags, gfp_t gfp)
+{
+ return __filemap_get_folio(bdev->bd_inode->i_mapping, pos >> PAGE_SHIFT,
+ fgp_flags, gfp);
+}
+EXPORT_SYMBOL_GPL(bdev_get_folio);
+
+/**
+ * bdev_wb_err_check - Has block device writeback error occurred?
+ * @bdev: the block device to check.
+ * @since: Previously-sampled @bdev->bd_inode->i_mapping->wb_err.
+ *
+ * Grab @bdev->bd_inode->i_mapping->wb_err, and see if it has changed @since
+ * the given value was sampled.
+ *
+ * Return: The latest error or 0 if it hasn't changed.
+ */
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since)
+{
+ return errseq_check(&bdev->bd_inode->i_mapping->wb_err, since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check);
+
+/**
+ * bdev_wb_err_check_and_advance() - Check block device writeback error and
+ * advance to current value.
+ * @bdev: the block device to check;
+ * @since: Pointer to previously-sampled @bdev->bd_inode->i_mapping->wb_err to
+ * check against and advance.
+ *
+ * Grab @bdev->bd_inode->i_mapping->wb_err, and see whether it matches the
+ * value that @since points to. If it does, then just return 0; If it doesn't,
+ * then the value has changed. Set the "seen" flag, and try to swap it into
+ * place as the new eseq value. Then, set that value as the new @since value,
+ * and return whatever the error portion is set to.
+ *
+ * Return: Negative errno if one has been stored, or 0 if no new error has
+ * occurred.
+ */
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since)
+{
+ return errseq_check_and_advance(&bdev->bd_inode->i_mapping->wb_err,
+ since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check_and_advance);
+
+/**
+ * bdev_balance_dirty_pages_ratelimited - balance dirty memory state.
+ * @bdev: the block device which was dirtied.
+ *
+ * Check the system's dirty state and will initiate writeback if needed.
+ */
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
+{
+ return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
+}
+EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);
+
+/**
+ * bdev_async_readahead - readahead for marked block device pages
+ * @bdev: the block device to read.
+ * @ra: file_ra_state which holds the readahead state.
+ * @file: Used by the filesystem for authentication.
+ * @index: Index of first page to be read.
+ * @req_count: Total number of pages being read by the caller.
+ *
+ * Read multiple pages into the block device page cache. The readahead logic may
+ * decide to piggyback more pages onto the read request if access patterns
+ * suggest it will improve performance.
+ */
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+ struct file *file, pgoff_t index,
+ unsigned long req_count)
+{
+ struct file_ra_state tmp_ra = {};
+
+ if (!ra) {
+ ra = &tmp_ra;
+ file_ra_state_init(ra, bdev->bd_inode->i_mapping);
+ }
+ page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index,
+ req_count);
+}
+EXPORT_SYMBOL_GPL(bdev_sync_readahead);
+
+/**
+ * bdev_attach_wb - associate an block device with its wb
+ * @bdev: block device of interest
+ *
+ * If @bdev->bd_inode doesn't have its wb, associate it with the wb matching the
+ * %current.
+ */
+void bdev_attach_wb(struct block_device *bdev)
+{
+ inode_attach_wb(bdev->bd_inode, NULL);
+}
+EXPORT_SYMBOL_GPL(bdev_attach_wb);
diff --git a/block/blk.h b/block/blk.h
index 1ef920f72e0f..ce5fcd927eaa 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -465,8 +465,6 @@ extern struct device_attribute dev_attr_events_poll_msecs;
extern struct attribute_group blk_trace_attr_group;

blk_mode_t file_to_blk_mode(struct file *file);
-int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
- loff_t lstart, loff_t lend);
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bc236e77d85e..ccac7d32bb86 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
#include <linux/sbitmap.h>
#include <linux/uuid.h>
#include <linux/xarray.h>
+#include <linux/pagemap.h>

struct module;
struct request_queue;
@@ -1475,6 +1476,22 @@ struct block_device *blkdev_get_no_open(dev_t dev);
void blkdev_put_no_open(struct block_device *bdev);

struct block_device *I_BDEV(struct inode *inode);
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+ pgoff_t end);
+int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
+ loff_t lstart, loff_t lend);
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos);
+struct folio *bdev_get_folio(struct block_device *bdev, loff_t pos,
+ fgf_t fgp_flags, gfp_t gfp);
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since);
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since);
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev);
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+ struct file *file, pgoff_t index,
+ unsigned long req_count);
+void bdev_attach_wb(struct block_device *bdev);
+void bdev_associated_mapping(struct block_device *bdev,
+ struct address_space *mapping);

#ifdef CONFIG_BLOCK
void invalidate_bdev(struct block_device *bdev);
--
2.39.2


2023-12-21 09:00:46

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/block/xen-blkback/xenbus.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e34219ea2b05..e645afa4af57 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
xenbus_dev_error(blkif->be->dev, err, "block flush");
return;
}
- invalidate_inode_pages2(
- blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
+ invalidate_bdev(blkif->vbd.bdev_handle->bdev);

for (i = 0; i < blkif->nr_rings; i++) {
ring = &blkif->rings[i];
--
2.39.2


2023-12-21 09:01:08

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 03/17] bcache: use bdev api in read_super()

From: Yu Kuai <[email protected]>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/md/bcache/super.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index bfe1685dbae5..23892b32c582 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -168,14 +168,13 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
{
const char *err;
struct cache_sb_disk *s;
- struct page *page;
+ struct folio *folio;
unsigned int i;

- page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
- SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
- if (IS_ERR(page))
+ folio = bdev_read_folio(bdev, SB_OFFSET);
+ if (IS_ERR(folio))
return "IO error";
- s = page_address(page) + offset_in_page(SB_OFFSET);
+ s = folio_address(folio) + offset_in_folio(folio, SB_OFFSET);

sb->offset = le64_to_cpu(s->offset);
sb->version = le64_to_cpu(s->version);
@@ -272,7 +271,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
*res = s;
return NULL;
err:
- put_page(page);
+ folio_put(folio);
return err;
}

--
2.39.2


2023-12-21 09:01:54

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis

From: Yu Kuai <[email protected]>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/mtd/devices/block2mtd.c | 81 +++++++++++++++------------------
1 file changed, 36 insertions(+), 45 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index aa44a23ec045..cf201bf73184 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -46,40 +46,34 @@ struct block2mtd_dev {
/* Static info about the MTD, used in cleanup_module */
static LIST_HEAD(blkmtd_device_list);

-
-static struct page *page_read(struct address_space *mapping, pgoff_t index)
-{
- return read_mapping_page(mapping, index, NULL);
-}
-
/* erase a specified part of the device */
static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
{
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
- struct page *page;
+ struct block_device *bdev = dev->bdev_handle->bdev;
+ struct folio *folio;
pgoff_t index = to >> PAGE_SHIFT; // page index
int pages = len >> PAGE_SHIFT;
u_long *p;
u_long *max;

while (pages) {
- page = page_read(mapping, index);
- if (IS_ERR(page))
- return PTR_ERR(page);
+ folio = bdev_read_folio(bdev, index << PAGE_SHIFT);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);

- max = page_address(page) + PAGE_SIZE;
- for (p=page_address(page); p<max; p++)
+ max = folio_address(folio) + folio_size(folio);
+ for (p = folio_address(folio); p < max; p++)
if (*p != -1UL) {
- lock_page(page);
- memset(page_address(page), 0xff, PAGE_SIZE);
- set_page_dirty(page);
- unlock_page(page);
- balance_dirty_pages_ratelimited(mapping);
+ folio_lock(folio);
+ memset(folio_address(folio), 0xff,
+ folio_size(folio));
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ bdev_balance_dirty_pages_ratelimited(bdev);
break;
}

- put_page(page);
+ folio_put(folio);
pages--;
index++;
}
@@ -106,9 +100,7 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
size_t *retlen, u_char *buf)
{
struct block2mtd_dev *dev = mtd->priv;
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
- struct page *page;
+ struct folio *folio;
pgoff_t index = from >> PAGE_SHIFT;
int offset = from & (PAGE_SIZE-1);
int cpylen;
@@ -120,12 +112,13 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
cpylen = len; // this page
len = len - cpylen;

- page = page_read(mapping, index);
- if (IS_ERR(page))
- return PTR_ERR(page);
+ folio = bdev_read_folio(dev->bdev_handle->bdev,
+ index << PAGE_SHIFT);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);

- memcpy(buf, page_address(page) + offset, cpylen);
- put_page(page);
+ memcpy(buf, folio_address(folio) + offset, cpylen);
+ folio_put(folio);

if (retlen)
*retlen += cpylen;
@@ -141,9 +134,8 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
loff_t to, size_t len, size_t *retlen)
{
- struct page *page;
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
+ struct block_device *bdev = dev->bdev_handle->bdev;
+ struct folio *folio;
pgoff_t index = to >> PAGE_SHIFT; // page index
int offset = to & ~PAGE_MASK; // page offset
int cpylen;
@@ -155,18 +147,18 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
cpylen = len; // this page
len = len - cpylen;

- page = page_read(mapping, index);
- if (IS_ERR(page))
- return PTR_ERR(page);
+ folio = bdev_read_folio(bdev, index << PAGE_SHIFT);
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);

- if (memcmp(page_address(page)+offset, buf, cpylen)) {
- lock_page(page);
- memcpy(page_address(page) + offset, buf, cpylen);
- set_page_dirty(page);
- unlock_page(page);
- balance_dirty_pages_ratelimited(mapping);
+ if (memcmp(folio_address(folio) + offset, buf, cpylen)) {
+ folio_lock(folio);
+ memcpy(folio_address(folio) + offset, buf, cpylen);
+ folio_mark_dirty(folio);
+ folio_unlock(folio);
+ bdev_balance_dirty_pages_ratelimited(bdev);
}
- put_page(page);
+ folio_put(folio);

if (retlen)
*retlen += cpylen;
@@ -211,8 +203,7 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
kfree(dev->mtd.name);

if (dev->bdev_handle) {
- invalidate_mapping_pages(
- dev->bdev_handle->bdev->bd_inode->i_mapping, 0, -1);
+ invalidate_bdev(dev->bdev_handle->bdev);
bdev_release(dev->bdev_handle);
}

@@ -295,7 +286,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
goto err_free_block2mtd;
}

- if ((long)bdev->bd_inode->i_size % erase_size) {
+ if (bdev_nr_bytes(bdev) % erase_size) {
pr_err("erasesize must be a divisor of device size\n");
goto err_free_block2mtd;
}
@@ -313,7 +304,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,

dev->mtd.name = name;

- dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+ dev->mtd.size = bdev_nr_bytes(bdev) & PAGE_MASK;
dev->mtd.erasesize = erase_size;
dev->mtd.writesize = 1;
dev->mtd.writebufsize = PAGE_SIZE;
--
2.39.2


2023-12-21 09:02:26

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 05/17] s390/dasd: use bdev api in dasd_format()

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/s390/block/dasd_ioctl.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 61b9675e2a67..bbfb958237e6 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -221,8 +221,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
* enabling the device later.
*/
if (fdata->start_unit == 0) {
- block->gdp->part0->bd_inode->i_blkbits =
- blksize_bits(fdata->blksize);
+ rc = set_blocksize(block->gdp->part0, fdata->blksize);
+ if (rc)
+ return rc;
}

rc = base->discipline->format_device(base, fdata, 1);
--
2.39.2


2023-12-21 09:02:39

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 06/17] scsicam: use bdev api in scsi_bios_ptable()

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <[email protected]>
---
drivers/scsi/scsicam.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..9617d70c0ed1 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,11 +32,9 @@
*/
unsigned char *scsi_bios_ptable(struct block_device *dev)
{
- struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
unsigned char *res = NULL;
- struct folio *folio;
+ struct folio *folio = bdev_read_folio(bdev_whole(dev), 0);

- folio = read_mapping_folio(mapping, 0, NULL);
if (IS_ERR(folio))
return NULL;

--
2.39.2


2023-12-21 09:03:58

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 08/17] bio: export bio_add_folio_nofail()

From: Yu Kuai <[email protected]>

Currently btrfs is using __bio_add_page() in write_dev_supers(). In order
to convert to use folio for bdev in btrfs, export bio_add_folio_nofail()
so that it can replace __bio_add_page().

Signed-off-by: Yu Kuai <[email protected]>
---
block/bio.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/bio.c b/block/bio.c
index b9642a41f286..c7459839ca40 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1122,6 +1122,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
WARN_ON_ONCE(off > UINT_MAX);
__bio_add_page(bio, &folio->page, len, off);
}
+EXPORT_SYMBOL_GPL(bio_add_folio_nofail);

/**
* bio_add_folio - Attempt to add part of a folio to a bio.
--
2.39.2


2023-12-21 09:05:57

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <[email protected]>
---
fs/erofs/data.c | 18 ++++++++++++------
fs/erofs/internal.h | 2 ++
2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index c98aeda8abb2..bbe2fe199bf3 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
enum erofs_kmap_type type)
{
- struct inode *inode = buf->inode;
- erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
+ u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
+ erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
pgoff_t index = offset >> PAGE_SHIFT;
struct page *page = buf->page;
struct folio *folio;
@@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
erofs_put_metabuf(buf);

nofs_flag = memalloc_nofs_save();
- folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
+ folio = buf->inode ?
+ read_mapping_folio(buf->inode->i_mapping, index, NULL) :
+ bdev_read_folio(buf->bdev, offset);
memalloc_nofs_restore(nofs_flag);
if (IS_ERR(folio))
return folio;
@@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,

void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
{
- if (erofs_is_fscache_mode(sb))
+ if (erofs_is_fscache_mode(sb)) {
buf->inode = EROFS_SB(sb)->s_fscache->inode;
- else
- buf->inode = sb->s_bdev->bd_inode;
+ buf->bdev = NULL;
+ } else {
+ buf->inode = NULL;
+ buf->bdev = sb->s_bdev;
+ buf->blkszbits = EROFS_SB(sb)->blkszbits;
+ }
}

void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b0409badb017..c9206351b485 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -224,8 +224,10 @@ enum erofs_kmap_type {

struct erofs_buf {
struct inode *inode;
+ struct block_device *bdev;
struct page *page;
void *base;
+ u8 blkszbits;
enum erofs_kmap_type kmap_type;
};
#define __EROFS_BUF_INITIALIZER ((struct erofs_buf){ .page = NULL })
--
2.39.2


2023-12-21 09:06:54

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <[email protected]>
---
fs/jbd2/journal.c | 3 +--
fs/jbd2/recovery.c | 6 ++----
2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ed53188472f9..f1b5ffeaf02a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2003,8 +2003,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
byte_count = (block_stop - block_start + 1) *
journal->j_blocksize;

- truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
- byte_start, byte_stop);
+ truncate_bdev_range(journal->j_dev, 0, byte_start, byte_stop);

if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
err = blkdev_issue_discard(journal->j_dev,
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 01f744cb97a4..6b6a2c4585fa 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -290,7 +290,6 @@ int jbd2_journal_recover(journal_t *journal)

struct recovery_info info;
errseq_t wb_err;
- struct address_space *mapping;

memset(&info, 0, sizeof(info));
sb = journal->j_superblock;
@@ -309,8 +308,7 @@ int jbd2_journal_recover(journal_t *journal)
}

wb_err = 0;
- mapping = journal->j_fs_dev->bd_inode->i_mapping;
- errseq_check_and_advance(&mapping->wb_err, &wb_err);
+ bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
err = do_one_pass(journal, &info, PASS_SCAN);
if (!err)
err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -334,7 +332,7 @@ int jbd2_journal_recover(journal_t *journal)
err2 = sync_blockdev(journal->j_fs_dev);
if (!err)
err = err2;
- err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
+ err2 = bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
if (!err)
err = err2;
/* Make sure all replayed data is on permanent storage */
--
2.39.2


2023-12-21 09:07:44

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block

From: Yu Kuai <[email protected]>

Unlike __bread_gfp(), ext4 has special handing while reading sb block:

1) __GFP_NOFAIL is not set, and memory allocation can fail;
2) If buffer write failed before, set buffer uptodate and don't read
block from disk;
3) REQ_META is set for all IO, and REQ_PRIO is set for reading xattr;
4) If failed, return error ptr instead of NULL;

This patch add a new helper __bread_gfp2() that will match above 2 and 3(
1 will be used, and 4 will still be encapsulated by ext4), and prepare to
prevent calling mapping_gfp_constraint() directly on bd_inode->i_mapping
in ext4.

Signed-off-by: Yu Kuai <[email protected]>
---
fs/buffer.c | 68 ++++++++++++++++++++++++++-----------
include/linux/buffer_head.h | 18 +++++++++-
2 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 967f34b70aa8..188bd36c9fea 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1255,16 +1255,19 @@ void __bforget(struct buffer_head *bh)
}
EXPORT_SYMBOL(__bforget);

-static struct buffer_head *__bread_slow(struct buffer_head *bh)
+static struct buffer_head *__bread_slow(struct buffer_head *bh,
+ blk_opf_t op_flags,
+ bool check_write_error)
{
lock_buffer(bh);
- if (buffer_uptodate(bh)) {
+ if (buffer_uptodate(bh) ||
+ (check_write_error && buffer_uptodate_or_error(bh))) {
unlock_buffer(bh);
return bh;
} else {
get_bh(bh);
bh->b_end_io = end_buffer_read_sync;
- submit_bh(REQ_OP_READ, bh);
+ submit_bh(REQ_OP_READ | op_flags, bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
@@ -1445,6 +1448,31 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
}
EXPORT_SYMBOL(__breadahead);

+static struct buffer_head *
+bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
+ blk_opf_t op_flags, gfp_t gfp, bool check_write_error)
+{
+ struct buffer_head *bh;
+
+ gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+
+ /*
+ * Prefer looping in the allocator rather than here, at least that
+ * code knows what it's doing.
+ */
+ gfp |= __GFP_NOFAIL;
+
+ bh = bdev_getblk(bdev, block, size, gfp);
+ if (unlikely(!bh))
+ return NULL;
+
+ if (buffer_uptodate(bh) ||
+ (check_write_error && buffer_uptodate_or_error(bh)))
+ return bh;
+
+ return __bread_slow(bh, op_flags, check_write_error);
+}
+
/**
* __bread_gfp() - reads a specified block and returns the bh
* @bdev: the block_device to read from
@@ -1458,27 +1486,27 @@ EXPORT_SYMBOL(__breadahead);
* It returns NULL if the block was unreadable.
*/
struct buffer_head *
-__bread_gfp(struct block_device *bdev, sector_t block,
- unsigned size, gfp_t gfp)
+__bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
+ gfp_t gfp)
{
- struct buffer_head *bh;
-
- gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
-
- /*
- * Prefer looping in the allocator rather than here, at least that
- * code knows what it's doing.
- */
- gfp |= __GFP_NOFAIL;
-
- bh = bdev_getblk(bdev, block, size, gfp);
-
- if (likely(bh) && !buffer_uptodate(bh))
- bh = __bread_slow(bh);
- return bh;
+ return bread_gfp(bdev, block, size, 0, gfp, false);
}
EXPORT_SYMBOL(__bread_gfp);

+/*
+ * This works like __bread_gfp() except:
+ * 1) If buffer write failed before, set buffer uptodate and don't read
+ * block from disk;
+ * 2) Caller can pass in additional op_flags like REQ_META;
+ */
+struct buffer_head *
+__bread_gfp2(struct block_device *bdev, sector_t block, unsigned int size,
+ blk_opf_t op_flags, gfp_t gfp)
+{
+ return bread_gfp(bdev, block, size, op_flags, gfp, true);
+}
+EXPORT_SYMBOL(__bread_gfp2);
+
static void __invalidate_bh_lrus(struct bh_lru *b)
{
int i;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 5f23ee599889..751b2744b4ae 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -171,6 +171,18 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh)
return test_bit_acquire(BH_Uptodate, &bh->b_state);
}

+static __always_inline int buffer_uptodate_or_error(struct buffer_head *bh)
+{
+ /*
+ * If the buffer has the write error flag, data was failed to write
+ * out in the block. In this case, set buffer uptodate to prevent
+ * reading old data.
+ */
+ if (buffer_write_io_error(bh))
+ set_buffer_uptodate(bh);
+ return buffer_uptodate(bh);
+}
+
static inline unsigned long bh_offset(const struct buffer_head *bh)
{
return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
@@ -231,7 +243,11 @@ void __brelse(struct buffer_head *);
void __bforget(struct buffer_head *);
void __breadahead(struct block_device *, sector_t block, unsigned int size);
struct buffer_head *__bread_gfp(struct block_device *,
- sector_t block, unsigned size, gfp_t gfp);
+ sector_t block, unsigned int size, gfp_t gfp);
+struct buffer_head *__bread_gfp2(struct block_device *bdev, sector_t block,
+ unsigned int size, blk_opf_t op_flags,
+ gfp_t gfp);
+
struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
void free_buffer_head(struct buffer_head * bh);
void unlock_buffer(struct buffer_head *bh);
--
2.39.2


2023-12-21 09:08:28

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 15/17] ext4: use new helper to read sb block

From: Yu Kuai <[email protected]>

Remove __ext4_sb_bread_gfp() and ext4_buffer_uptodate() that is defined
by ext4, and convert to use common helper __bread_gfp2() and
buffer_uptodate_or_error().

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 13 -------------
fs/ext4/inode.c | 8 ++++----
fs/ext4/super.c | 45 ++++++++++-----------------------------------
fs/ext4/symlink.c | 2 +-
4 files changed, 15 insertions(+), 53 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a5d784872303..8377f6c5264f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3824,19 +3824,6 @@ extern const struct iomap_ops ext4_iomap_ops;
extern const struct iomap_ops ext4_iomap_overwrite_ops;
extern const struct iomap_ops ext4_iomap_report_ops;

-static inline int ext4_buffer_uptodate(struct buffer_head *bh)
-{
- /*
- * If the buffer has the write error flag, we have failed
- * to write out data in the block. In this case, we don't
- * have to read the block because we may read the old data
- * successfully.
- */
- if (buffer_write_io_error(bh))
- set_buffer_uptodate(bh);
- return buffer_uptodate(bh);
-}
-
#endif /* __KERNEL__ */

#define EFSBADCRC EBADMSG /* Bad CRC detected */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 61277f7f8722..efb0af6f02f7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -887,7 +887,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
bh = ext4_getblk(handle, inode, block, map_flags);
if (IS_ERR(bh))
return bh;
- if (!bh || ext4_buffer_uptodate(bh))
+ if (!bh || buffer_uptodate_or_error(bh))
return bh;

ret = ext4_read_bh_lock(bh, REQ_META | REQ_PRIO, true);
@@ -915,7 +915,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,

for (i = 0; i < bh_count; i++)
/* Note that NULL bhs[i] is valid because of holes. */
- if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
+ if (bhs[i] && !buffer_uptodate_or_error(bhs[i]))
ext4_read_bh_lock(bhs[i], REQ_META | REQ_PRIO, false);

if (!wait)
@@ -4392,11 +4392,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
bh = sb_getblk(sb, block);
if (unlikely(!bh))
return -ENOMEM;
- if (ext4_buffer_uptodate(bh))
+ if (buffer_uptodate_or_error(bh))
goto has_buffer;

lock_buffer(bh);
- if (ext4_buffer_uptodate(bh)) {
+ if (buffer_uptodate_or_error(bh)) {
/* Someone brought it uptodate while we waited */
unlock_buffer(bh);
goto has_buffer;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c5fcf377ab1f..3f07eaa33332 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -180,7 +180,7 @@ void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
{
BUG_ON(!buffer_locked(bh));

- if (ext4_buffer_uptodate(bh)) {
+ if (buffer_uptodate_or_error(bh)) {
unlock_buffer(bh);
return;
}
@@ -191,7 +191,7 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags, bh_end_io_t *end_io
{
BUG_ON(!buffer_locked(bh));

- if (ext4_buffer_uptodate(bh)) {
+ if (buffer_uptodate_or_error(bh)) {
unlock_buffer(bh);
return 0;
}
@@ -214,49 +214,24 @@ int ext4_read_bh_lock(struct buffer_head *bh, blk_opf_t op_flags, bool wait)
return ext4_read_bh(bh, op_flags, NULL);
}

-/*
- * This works like __bread_gfp() except it uses ERR_PTR for error
- * returns. Currently with sb_bread it's impossible to distinguish
- * between ENOMEM and EIO situations (since both result in a NULL
- * return.
- */
-static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
- sector_t block,
- blk_opf_t op_flags, gfp_t gfp)
-{
- struct buffer_head *bh;
- int ret;
-
- bh = sb_getblk_gfp(sb, block, gfp);
- if (bh == NULL)
- return ERR_PTR(-ENOMEM);
- if (ext4_buffer_uptodate(bh))
- return bh;
-
- ret = ext4_read_bh_lock(bh, REQ_META | op_flags, true);
- if (ret) {
- put_bh(bh);
- return ERR_PTR(ret);
- }
- return bh;
-}
-
struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
blk_opf_t op_flags)
{
- gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
- ~__GFP_FS) | __GFP_MOVABLE;
+ struct buffer_head *bh = __bread_gfp2(sb->s_bdev, block,
+ sb->s_blocksize,
+ REQ_META | op_flags,
+ __GFP_MOVABLE);

- return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
+ return bh ? bh : ERR_PTR(-EIO);
}

struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
sector_t block)
{
- gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
- ~__GFP_FS);
+ struct buffer_head *bh = __bread_gfp2(sb->s_bdev, block,
+ sb->s_blocksize, REQ_META, 0);

- return __ext4_sb_bread_gfp(sb, block, 0, gfp);
+ return bh ? bh : ERR_PTR(-EIO);
}

void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 75bf1f88843c..49e918221aac 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -94,7 +94,7 @@ static const char *ext4_get_link(struct dentry *dentry, struct inode *inode,
bh = ext4_getblk(NULL, inode, 0, EXT4_GET_BLOCKS_CACHED_NOWAIT);
if (IS_ERR(bh))
return ERR_CAST(bh);
- if (!bh || !ext4_buffer_uptodate(bh))
+ if (!bh || !buffer_uptodate_or_error(bh))
return ERR_PTR(-ECHILD);
} else {
bh = ext4_bread(NULL, inode, 0, 0);
--
2.39.2


2023-12-21 09:09:23

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 16/17] ext4: remove block_device_ejected()

From: Yu Kuai <[email protected]>

block_device_ejected() is added by commit bdfe0cbd746a ("Revert
"ext4: remove block_device_ejected"") in 2015. At that time 'bdi->wb'
is destroyed synchronized from del_gendisk(), hence if ext4 is still
mounted, and then mark_buffer_dirty() will reference destroyed 'wb'.
However, such problem doesn't exist anymore:

- commit d03f6cdc1fc4 ("block: Dynamically allocate and refcount
backing_dev_info") switch bdi to use refcounting;
- commit 13eec2363ef0 ("fs: Get proper reference for s_bdi"), will grab
additional reference of bdi while mounting, so that 'bdi->wb' will not
be destroyed until generic_shutdown_super().

Hence remove this dead function block_device_ejected().

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/ext4/super.c | 18 ------------------
1 file changed, 18 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3f07eaa33332..a7935edbd7b1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -467,22 +467,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
}

-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else. Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
- struct inode *bd_inode = sb->s_bdev->bd_inode;
- struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
- return bdi->dev == NULL;
-}
-
static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
{
struct super_block *sb = journal->j_private;
@@ -6162,8 +6146,6 @@ static int ext4_commit_super(struct super_block *sb)

if (!sbh)
return -EINVAL;
- if (block_device_ejected(sb))
- return -ENODEV;

ext4_update_super(sb);

--
2.39.2


2023-12-21 09:09:25

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer()

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <[email protected]>
---
fs/nilfs2/segment.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 55e31cc903d1..a1130e384937 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2823,7 +2823,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (!nilfs->ns_writer)
return -ENOMEM;

- inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
+ bdev_attach_wb(nilfs->ns_bdev);

err = nilfs_segctor_start_thread(nilfs->ns_writer);
if (unlikely(err))
--
2.39.2


2023-12-21 09:09:58

by Yu Kuai

[permalink] [raw]
Subject: [PATCH RFC v3 for-6.8/block 17/17] ext4: use bdev apis

From: Yu Kuai <[email protected]>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/dir.c | 6 ++----
fs/ext4/ext4_jbd2.c | 6 +++---
fs/ext4/super.c | 3 +--
3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..64e35eb6a324 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -191,10 +191,8 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
pgoff_t index = map.m_pblk >>
(PAGE_SHIFT - inode->i_blkbits);
if (!ra_has_index(&file->f_ra, index))
- page_cache_sync_readahead(
- sb->s_bdev->bd_inode->i_mapping,
- &file->f_ra, file,
- index, 1);
+ bdev_sync_readahead(sb->s_bdev, &file->f_ra,
+ file, index, 1);
file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
bh = ext4_bread(NULL, inode, map.m_lblk, 0);
if (IS_ERR(bh)) {
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index d1a2e6624401..c1bf3a00fad9 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,6 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,

static void ext4_check_bdev_write_error(struct super_block *sb)
{
- struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
struct ext4_sb_info *sbi = EXT4_SB(sb);
int err;

@@ -216,9 +215,10 @@ static void ext4_check_bdev_write_error(struct super_block *sb)
* we could read old data from disk and write it out again, which
* may lead to on-disk filesystem inconsistency.
*/
- if (errseq_check(&mapping->wb_err, READ_ONCE(sbi->s_bdev_wb_err))) {
+ if (bdev_wb_err_check(sb->s_bdev, READ_ONCE(sbi->s_bdev_wb_err))) {
spin_lock(&sbi->s_bdev_wb_lock);
- err = errseq_check_and_advance(&mapping->wb_err, &sbi->s_bdev_wb_err);
+ err = bdev_wb_err_check_and_advance(sb->s_bdev,
+ &sbi->s_bdev_wb_err);
spin_unlock(&sbi->s_bdev_wb_lock);
if (err)
ext4_error_err(sb, -err,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a7935edbd7b1..25c3d2ac8559 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5544,8 +5544,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* used to detect the metadata async write error.
*/
spin_lock_init(&sbi->s_bdev_wb_lock);
- errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
- &sbi->s_bdev_wb_err);
+ bdev_wb_err_check_and_advance(sb->s_bdev, &sbi->s_bdev_wb_err);
EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
ext4_orphan_cleanup(sb, es);
EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
--
2.39.2


2024-01-04 11:06:53

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()

On Thu 21-12-23 16:56:57, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_devcie.
>
> Signed-off-by: Yu Kuai <[email protected]>
> ---
> drivers/block/xen-blkback/xenbus.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index e34219ea2b05..e645afa4af57 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
> xenbus_dev_error(blkif->be->dev, err, "block flush");
> return;
> }
> - invalidate_inode_pages2(
> - blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> + invalidate_bdev(blkif->vbd.bdev_handle->bdev);

This function uses invalidate_inode_pages2() while invalidate_bdev() ends
up using mapping_try_invalidate() and there are subtle behavioral
differences between these two (for example invalidate_inode_pages2() tries
to clean dirty pages using the ->launder_folio method). So I think you'll
need helper like invalidate_bdev2() for this.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-04 11:29:23

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis

On Thu 21-12-23 16:56:59, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> On the one hand covert to use folio while reading bdev inode, on the
> other hand prevent to access bd_inode directly.
>
> Signed-off-by: Yu Kuai <[email protected]>
...
> + for (p = folio_address(folio); p < max; p++)
> if (*p != -1UL) {
> - lock_page(page);
> - memset(page_address(page), 0xff, PAGE_SIZE);
> - set_page_dirty(page);
> - unlock_page(page);
> - balance_dirty_pages_ratelimited(mapping);
> + folio_lock(folio);
> + memset(folio_address(folio), 0xff,
> + folio_size(folio));
> + folio_mark_dirty(folio);
> + folio_unlock(folio);
> + bdev_balance_dirty_pages_ratelimited(bdev);

Rather then creating this bdev_balance_dirty_pages_ratelimited() just for
MTD perhaps we can have here (and in other functions):

...
mapping = folio_mapping(folio);
folio_unlock(folio);
if (mapping)
balance_dirty_pages_ratelimited(mapping);

What do you think? Because when we are working with the folios it is rather
natural to use their mapping for dirty balancing?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-04 12:02:32

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api

On Thu 21-12-23 16:58:26, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_device.
>
> Signed-off-by: Yu Kuai <[email protected]>

I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
right thing in erofs_bread() which is the only place that seems to care
about the erofs_is_fscache_mode() distinction... Also blkszbits is then
trivially sb->s_blocksize_bits so it would all seem much more
straightforward.

Honza

> ---
> fs/erofs/data.c | 18 ++++++++++++------
> fs/erofs/internal.h | 2 ++
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index c98aeda8abb2..bbe2fe199bf3 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
> void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
> enum erofs_kmap_type type)
> {
> - struct inode *inode = buf->inode;
> - erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
> + u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
> + erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
> pgoff_t index = offset >> PAGE_SHIFT;
> struct page *page = buf->page;
> struct folio *folio;
> @@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
> erofs_put_metabuf(buf);
>
> nofs_flag = memalloc_nofs_save();
> - folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
> + folio = buf->inode ?
> + read_mapping_folio(buf->inode->i_mapping, index, NULL) :
> + bdev_read_folio(buf->bdev, offset);
> memalloc_nofs_restore(nofs_flag);
> if (IS_ERR(folio))
> return folio;
> @@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>
> void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
> {
> - if (erofs_is_fscache_mode(sb))
> + if (erofs_is_fscache_mode(sb)) {
> buf->inode = EROFS_SB(sb)->s_fscache->inode;
> - else
> - buf->inode = sb->s_bdev->bd_inode;
> + buf->bdev = NULL;
> + } else {
> + buf->inode = NULL;
> + buf->bdev = sb->s_bdev;
> + buf->blkszbits = EROFS_SB(sb)->blkszbits;
> + }
> }
>
> void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index b0409badb017..c9206351b485 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -224,8 +224,10 @@ enum erofs_kmap_type {
>
> struct erofs_buf {
> struct inode *inode;
> + struct block_device *bdev;
> struct page *page;
> void *base;
> + u8 blkszbits;
> enum erofs_kmap_type kmap_type;
> };
> #define __EROFS_BUF_INITIALIZER ((struct erofs_buf){ .page = NULL })
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-04 12:12:24

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis

On Thu 21-12-23 16:58:46, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_device.
>
> Signed-off-by: Yu Kuai <[email protected]>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

But note there are changes pending to this code for the coming merge window
so you'll have to rebase...

Honza

> ---
> fs/jbd2/journal.c | 3 +--
> fs/jbd2/recovery.c | 6 ++----
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index ed53188472f9..f1b5ffeaf02a 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2003,8 +2003,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
> byte_count = (block_stop - block_start + 1) *
> journal->j_blocksize;
>
> - truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> - byte_start, byte_stop);
> + truncate_bdev_range(journal->j_dev, 0, byte_start, byte_stop);
>
> if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
> err = blkdev_issue_discard(journal->j_dev,
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 01f744cb97a4..6b6a2c4585fa 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -290,7 +290,6 @@ int jbd2_journal_recover(journal_t *journal)
>
> struct recovery_info info;
> errseq_t wb_err;
> - struct address_space *mapping;
>
> memset(&info, 0, sizeof(info));
> sb = journal->j_superblock;
> @@ -309,8 +308,7 @@ int jbd2_journal_recover(journal_t *journal)
> }
>
> wb_err = 0;
> - mapping = journal->j_fs_dev->bd_inode->i_mapping;
> - errseq_check_and_advance(&mapping->wb_err, &wb_err);
> + bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
> err = do_one_pass(journal, &info, PASS_SCAN);
> if (!err)
> err = do_one_pass(journal, &info, PASS_REVOKE);
> @@ -334,7 +332,7 @@ int jbd2_journal_recover(journal_t *journal)
> err2 = sync_blockdev(journal->j_fs_dev);
> if (!err)
> err = err2;
> - err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
> + err2 = bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
> if (!err)
> err = err2;
> /* Make sure all replayed data is on permanent storage */
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-04 12:19:39

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()

Hi, Jan!

?? 2024/01/04 19:06, Jan Kara д??:
> On Thu 21-12-23 16:56:57, Yu Kuai wrote:
>> From: Yu Kuai <[email protected]>
>>
>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>> block_devcie.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>> ---
>> drivers/block/xen-blkback/xenbus.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index e34219ea2b05..e645afa4af57 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>> xenbus_dev_error(blkif->be->dev, err, "block flush");
>> return;
>> }
>> - invalidate_inode_pages2(
>> - blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
>> + invalidate_bdev(blkif->vbd.bdev_handle->bdev);
>
> This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> up using mapping_try_invalidate() and there are subtle behavioral
> differences between these two (for example invalidate_inode_pages2() tries
> to clean dirty pages using the ->launder_folio method). So I think you'll
> need helper like invalidate_bdev2() for this.

Thanks for reviewing this patch, I know the differenct between then,
what I don't understand is that why using invalidate_inode_pages2()
here. sync_blockdev() is just called and 0 is returned, I think in this
case it's safe to call invalidate_bdev() directly, or am I missing
other things?

Thanks,
Kuai

>
> Honza
>


2024-01-04 12:22:36

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block

On Thu 21-12-23 16:58:53, Yu Kuai wrote:
> From: Yu Kuai <[email protected]>
>
> Unlike __bread_gfp(), ext4 has special handing while reading sb block:
>
> 1) __GFP_NOFAIL is not set, and memory allocation can fail;
> 2) If buffer write failed before, set buffer uptodate and don't read
> block from disk;
> 3) REQ_META is set for all IO, and REQ_PRIO is set for reading xattr;
> 4) If failed, return error ptr instead of NULL;
>
> This patch add a new helper __bread_gfp2() that will match above 2 and 3(
> 1 will be used, and 4 will still be encapsulated by ext4), and prepare to
> prevent calling mapping_gfp_constraint() directly on bd_inode->i_mapping
> in ext4.
>
> Signed-off-by: Yu Kuai <[email protected]>

I'm not enthusiastic about this but I guess it is as good as it gets
without larger cleanups in this area. So feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/buffer.c | 68 ++++++++++++++++++++++++++-----------
> include/linux/buffer_head.h | 18 +++++++++-
> 2 files changed, 65 insertions(+), 21 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 967f34b70aa8..188bd36c9fea 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1255,16 +1255,19 @@ void __bforget(struct buffer_head *bh)
> }
> EXPORT_SYMBOL(__bforget);
>
> -static struct buffer_head *__bread_slow(struct buffer_head *bh)
> +static struct buffer_head *__bread_slow(struct buffer_head *bh,
> + blk_opf_t op_flags,
> + bool check_write_error)
> {
> lock_buffer(bh);
> - if (buffer_uptodate(bh)) {
> + if (buffer_uptodate(bh) ||
> + (check_write_error && buffer_uptodate_or_error(bh))) {
> unlock_buffer(bh);
> return bh;
> } else {
> get_bh(bh);
> bh->b_end_io = end_buffer_read_sync;
> - submit_bh(REQ_OP_READ, bh);
> + submit_bh(REQ_OP_READ | op_flags, bh);
> wait_on_buffer(bh);
> if (buffer_uptodate(bh))
> return bh;
> @@ -1445,6 +1448,31 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
> }
> EXPORT_SYMBOL(__breadahead);
>
> +static struct buffer_head *
> +bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
> + blk_opf_t op_flags, gfp_t gfp, bool check_write_error)
> +{
> + struct buffer_head *bh;
> +
> + gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
> +
> + /*
> + * Prefer looping in the allocator rather than here, at least that
> + * code knows what it's doing.
> + */
> + gfp |= __GFP_NOFAIL;
> +
> + bh = bdev_getblk(bdev, block, size, gfp);
> + if (unlikely(!bh))
> + return NULL;
> +
> + if (buffer_uptodate(bh) ||
> + (check_write_error && buffer_uptodate_or_error(bh)))
> + return bh;
> +
> + return __bread_slow(bh, op_flags, check_write_error);
> +}
> +
> /**
> * __bread_gfp() - reads a specified block and returns the bh
> * @bdev: the block_device to read from
> @@ -1458,27 +1486,27 @@ EXPORT_SYMBOL(__breadahead);
> * It returns NULL if the block was unreadable.
> */
> struct buffer_head *
> -__bread_gfp(struct block_device *bdev, sector_t block,
> - unsigned size, gfp_t gfp)
> +__bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
> + gfp_t gfp)
> {
> - struct buffer_head *bh;
> -
> - gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
> -
> - /*
> - * Prefer looping in the allocator rather than here, at least that
> - * code knows what it's doing.
> - */
> - gfp |= __GFP_NOFAIL;
> -
> - bh = bdev_getblk(bdev, block, size, gfp);
> -
> - if (likely(bh) && !buffer_uptodate(bh))
> - bh = __bread_slow(bh);
> - return bh;
> + return bread_gfp(bdev, block, size, 0, gfp, false);
> }
> EXPORT_SYMBOL(__bread_gfp);
>
> +/*
> + * This works like __bread_gfp() except:
> + * 1) If buffer write failed before, set buffer uptodate and don't read
> + * block from disk;
> + * 2) Caller can pass in additional op_flags like REQ_META;
> + */
> +struct buffer_head *
> +__bread_gfp2(struct block_device *bdev, sector_t block, unsigned int size,
> + blk_opf_t op_flags, gfp_t gfp)
> +{
> + return bread_gfp(bdev, block, size, op_flags, gfp, true);
> +}
> +EXPORT_SYMBOL(__bread_gfp2);
> +
> static void __invalidate_bh_lrus(struct bh_lru *b)
> {
> int i;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 5f23ee599889..751b2744b4ae 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -171,6 +171,18 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh)
> return test_bit_acquire(BH_Uptodate, &bh->b_state);
> }
>
> +static __always_inline int buffer_uptodate_or_error(struct buffer_head *bh)
> +{
> + /*
> + * If the buffer has the write error flag, data was failed to write
> + * out in the block. In this case, set buffer uptodate to prevent
> + * reading old data.
> + */
> + if (buffer_write_io_error(bh))
> + set_buffer_uptodate(bh);
> + return buffer_uptodate(bh);
> +}
> +
> static inline unsigned long bh_offset(const struct buffer_head *bh)
> {
> return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
> @@ -231,7 +243,11 @@ void __brelse(struct buffer_head *);
> void __bforget(struct buffer_head *);
> void __breadahead(struct block_device *, sector_t block, unsigned int size);
> struct buffer_head *__bread_gfp(struct block_device *,
> - sector_t block, unsigned size, gfp_t gfp);
> + sector_t block, unsigned int size, gfp_t gfp);
> +struct buffer_head *__bread_gfp2(struct block_device *bdev, sector_t block,
> + unsigned int size, blk_opf_t op_flags,
> + gfp_t gfp);
> +
> struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
> void free_buffer_head(struct buffer_head * bh);
> void unlock_buffer(struct buffer_head *bh);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-04 12:23:38

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis

Hi,

?? 2024/01/04 19:28, Jan Kara д??:
> On Thu 21-12-23 16:56:59, Yu Kuai wrote:
>> From: Yu Kuai <[email protected]>
>>
>> On the one hand covert to use folio while reading bdev inode, on the
>> other hand prevent to access bd_inode directly.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
> ...
>> + for (p = folio_address(folio); p < max; p++)
>> if (*p != -1UL) {
>> - lock_page(page);
>> - memset(page_address(page), 0xff, PAGE_SIZE);
>> - set_page_dirty(page);
>> - unlock_page(page);
>> - balance_dirty_pages_ratelimited(mapping);
>> + folio_lock(folio);
>> + memset(folio_address(folio), 0xff,
>> + folio_size(folio));
>> + folio_mark_dirty(folio);
>> + folio_unlock(folio);
>> + bdev_balance_dirty_pages_ratelimited(bdev);
>
> Rather then creating this bdev_balance_dirty_pages_ratelimited() just for
> MTD perhaps we can have here (and in other functions):
>
> ...
> mapping = folio_mapping(folio);
> folio_unlock(folio);
> if (mapping)
> balance_dirty_pages_ratelimited(mapping);
>
> What do you think? Because when we are working with the folios it is rather
> natural to use their mapping for dirty balancing?

I think this is a great idea! And bdev_balance_dirty_pages_ratelimited()
can be removed as well.

Thanks,
Kuai

>
> Honza
>


2024-01-04 12:33:10

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api

Hi, Jan!

?? 2024/01/04 20:02, Jan Kara д??:
> On Thu 21-12-23 16:58:26, Yu Kuai wrote:
>> From: Yu Kuai <[email protected]>
>>
>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>> block_device.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>
> I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
> unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
> right thing in erofs_bread() which is the only place that seems to care
> about the erofs_is_fscache_mode() distinction... Also blkszbits is then
> trivially sb->s_blocksize_bits so it would all seem much more
> straightforward.

Thanks for your suggestion, I'll follow this unless Gao Xiang has other
suggestions.

Kuai
>
> Honza
>
>> ---
>> fs/erofs/data.c | 18 ++++++++++++------
>> fs/erofs/internal.h | 2 ++
>> 2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>> index c98aeda8abb2..bbe2fe199bf3 100644
>> --- a/fs/erofs/data.c
>> +++ b/fs/erofs/data.c
>> @@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
>> void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>> enum erofs_kmap_type type)
>> {
>> - struct inode *inode = buf->inode;
>> - erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
>> + u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
>> + erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
>> pgoff_t index = offset >> PAGE_SHIFT;
>> struct page *page = buf->page;
>> struct folio *folio;
>> @@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>> erofs_put_metabuf(buf);
>>
>> nofs_flag = memalloc_nofs_save();
>> - folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
>> + folio = buf->inode ?
>> + read_mapping_folio(buf->inode->i_mapping, index, NULL) :
>> + bdev_read_folio(buf->bdev, offset);
>> memalloc_nofs_restore(nofs_flag);
>> if (IS_ERR(folio))
>> return folio;
>> @@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>>
>> void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
>> {
>> - if (erofs_is_fscache_mode(sb))
>> + if (erofs_is_fscache_mode(sb)) {
>> buf->inode = EROFS_SB(sb)->s_fscache->inode;
>> - else
>> - buf->inode = sb->s_bdev->bd_inode;
>> + buf->bdev = NULL;
>> + } else {
>> + buf->inode = NULL;
>> + buf->bdev = sb->s_bdev;
>> + buf->blkszbits = EROFS_SB(sb)->blkszbits;
>> + }
>> }
>>
>> void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index b0409badb017..c9206351b485 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -224,8 +224,10 @@ enum erofs_kmap_type {
>>
>> struct erofs_buf {
>> struct inode *inode;
>> + struct block_device *bdev;
>> struct page *page;
>> void *base;
>> + u8 blkszbits;
>> enum erofs_kmap_type kmap_type;
>> };
>> #define __EROFS_BUF_INITIALIZER ((struct erofs_buf){ .page = NULL })
>> --
>> 2.39.2
>>


2024-01-04 15:17:19

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()

Hi Kuai!

On Thu 04-01-24 20:19:05, Yu Kuai wrote:
> 在 2024/01/04 19:06, Jan Kara 写道:
> > On Thu 21-12-23 16:56:57, Yu Kuai wrote:
> > > From: Yu Kuai <[email protected]>
> > >
> > > Avoid to access bd_inode directly, prepare to remove bd_inode from
> > > block_devcie.
> > >
> > > Signed-off-by: Yu Kuai <[email protected]>
> > > ---
> > > drivers/block/xen-blkback/xenbus.c | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > index e34219ea2b05..e645afa4af57 100644
> > > --- a/drivers/block/xen-blkback/xenbus.c
> > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
> > > xenbus_dev_error(blkif->be->dev, err, "block flush");
> > > return;
> > > }
> > > - invalidate_inode_pages2(
> > > - blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> > > + invalidate_bdev(blkif->vbd.bdev_handle->bdev);
> >
> > This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> > up using mapping_try_invalidate() and there are subtle behavioral
> > differences between these two (for example invalidate_inode_pages2() tries
> > to clean dirty pages using the ->launder_folio method). So I think you'll
> > need helper like invalidate_bdev2() for this.
>
> Thanks for reviewing this patch, I know the differenct between then,
> what I don't understand is that why using invalidate_inode_pages2()
> here.

Well, then the change in behavior should be at least noted in the
changelog.

> sync_blockdev() is just called and 0 is returned, I think in this
> case it's safe to call invalidate_bdev() directly, or am I missing
> other things?

I still think there's a difference. invalidate_inode_pages2() also unmaps
memory mappings which mapping_try_invalidate() does not do. That being said
in xen_update_blkif_status() we seem to be bringing up a virtual block
device so before this function is called, anybody would have hard time
using anything in it. But this definitely needs a confirmation from Xen
maintainers and a good documentation of the behavioral change in the
changelog.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-01-05 04:44:28

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api



On 2024/1/4 20:32, Yu Kuai wrote:
> Hi, Jan!
>
> 在 2024/01/04 20:02, Jan Kara 写道:
>> On Thu 21-12-23 16:58:26, Yu Kuai wrote:
>>> From: Yu Kuai <[email protected]>
>>>
>>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>>> block_device.
>>>
>>> Signed-off-by: Yu Kuai <[email protected]>
>>
>> I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
>> unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
>> right thing in erofs_bread() which is the only place that seems to care
>> about the erofs_is_fscache_mode() distinction... Also blkszbits is then
>> trivially sb->s_blocksize_bits so it would all seem much more
>> straightforward.
>
> Thanks for your suggestion, I'll follow this unless Gao Xiang has other
> suggestions.

Yes, that would be better, I'm fine with that. Yet in the future we
may support a seperate large dirblocksize more than block size, but
we could revisit later.

Thanks,
Gao Xiang

>
> Kuai
>>
>>                                 Honza
>>

2024-01-05 06:09:45

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()

On Thu, Jan 04, 2024 at 12:06:31PM +0100, Jan Kara wrote:
> This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> up using mapping_try_invalidate() and there are subtle behavioral
> differences between these two (for example invalidate_inode_pages2() tries
> to clean dirty pages using the ->launder_folio method). So I think you'll
> need helper like invalidate_bdev2() for this.

That assues that the existing code actually does this intentionally,
which seems doubtful. But the change in behavior does not to be
documented and explained.


2024-01-05 06:11:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis

On Thu, Jan 04, 2024 at 12:28:55PM +0100, Jan Kara wrote:
> What do you think? Because when we are working with the folios it is rather
> natural to use their mapping for dirty balancing?

The real problem is that block2mtd pokes way to deep into block
internals.

I think the saviour here is Christians series to replace the bdev handle
with a struct file, which will allow to use the normal file write path
here and get rid of the entire layering volation.


2024-01-05 10:32:21

by Yu Kuai

[permalink] [raw]
Subject: Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis

Hi,

?? 2024/01/05 14:10, Christoph Hellwig д??:
> On Thu, Jan 04, 2024 at 12:28:55PM +0100, Jan Kara wrote:
>> What do you think? Because when we are working with the folios it is rather
>> natural to use their mapping for dirty balancing?
>
> The real problem is that block2mtd pokes way to deep into block
> internals.
>
> I think the saviour here is Christians series to replace the bdev handle
> with a struct file, which will allow to use the normal file write path
> here and get rid of the entire layering volation.

Yes, looks like lots of patches from this set is not needed anymore.
I'll stop sending v4 and just send some patches that is not related to
'bd_inode' separately.

Thanks,
Kuai

>
> .
>