2019-12-06 08:56:48

by Liang Chen

[permalink] [raw]
Subject: [PATCH 1/2] [PATCH] bcache: cached_dev_free needs to put the sb page

Same as cache device, the buffer page needs to be put while
freeing cached_dev. Otherwise a page would be leaked every
time a cached_dev is stopped.

Signed-off-by: Liang Chen <[email protected]>
---
drivers/md/bcache/super.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 77e9869345e7..a573ce1d85aa 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl)

mutex_unlock(&bch_register_lock);

+ if (dc->sb_bio.bi_inline_vecs[0].bv_page)
+ put_page(bio_first_page_all(&dc->sb_bio));
+
if (!IS_ERR_OR_NULL(dc->bdev))
blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);

--
2.17.0


2019-12-06 08:57:53

by Liang Chen

[permalink] [raw]
Subject: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

__write_super assumes super block data starts at offset 0 of the page
read in with __bread from read_super, which is not true when page size
is not 4k. We encountered the issue on system with 64K page size - commonly
seen on aarch64 architecture.

Instead of making any assumption on the offset of the data within the page,
this patch calls __bread again to locate the data. That should not introduce
an extra io since the page has been held when it's read in from read_super,
and __write_super is not on performance critical code path.

Signed-off-by: Liang Chen <[email protected]>
---
drivers/md/bcache/super.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a573ce1d85aa..a39450c9bc34 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -207,15 +207,27 @@ static void write_bdev_super_endio(struct bio *bio)
closure_put(&dc->sb_write);
}

-static void __write_super(struct cache_sb *sb, struct bio *bio)
+static int __write_super(struct cache_sb *sb, struct bio *bio,
+ struct block_device *bdev)
{
- struct cache_sb *out = page_address(bio_first_page_all(bio));
+ struct cache_sb *out;
unsigned int i;
+ struct buffer_head *bh;
+
+ /*
+ * The page is held since read_super, this __bread * should not
+ * cause an extra io read.
+ */
+ bh = __bread(bdev, 1, SB_SIZE);
+ if (!bh)
+ goto out_bh;
+
+ out = (struct cache_sb *) bh->b_data;

bio->bi_iter.bi_sector = SB_SECTOR;
bio->bi_iter.bi_size = SB_SIZE;
bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC|REQ_META);
- bch_bio_map(bio, NULL);
+ bch_bio_map(bio, bh->b_data);

out->offset = cpu_to_le64(sb->offset);
out->version = cpu_to_le64(sb->version);
@@ -239,7 +251,14 @@ static void __write_super(struct cache_sb *sb, struct bio *bio)
pr_debug("ver %llu, flags %llu, seq %llu",
sb->version, sb->flags, sb->seq);

+ /* The page will still be held without this bh.*/
+ put_bh(bh);
submit_bio(bio);
+ return 0;
+
+out_bh:
+ pr_err("Couldn't read super block, __write_super failed");
+ return -1;
}

static void bch_write_bdev_super_unlock(struct closure *cl)
@@ -264,7 +283,8 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent)

closure_get(cl);
/* I/O request sent to backing device */
- __write_super(&dc->sb, bio);
+ if(__write_super(&dc->sb, bio, dc->bdev))
+ closure_put(cl);

closure_return_with_destructor(cl, bch_write_bdev_super_unlock);
}
@@ -312,7 +332,9 @@ void bcache_write_super(struct cache_set *c)
bio->bi_private = ca;

closure_get(cl);
- __write_super(&ca->sb, bio);
+ if(__write_super(&ca->sb, bio, ca->bdev))
+ closure_put(cl);
+
}

closure_return_with_destructor(cl, bcache_write_super_unlock);
--
2.17.0

2019-12-06 09:24:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

On Fri, Dec 06, 2019 at 04:55:43PM +0800, Liang Chen wrote:
> __write_super assumes super block data starts at offset 0 of the page
> read in with __bread from read_super, which is not true when page size
> is not 4k. We encountered the issue on system with 64K page size - commonly
> seen on aarch64 architecture.
>
> Instead of making any assumption on the offset of the data within the page,
> this patch calls __bread again to locate the data. That should not introduce
> an extra io since the page has been held when it's read in from read_super,
> and __write_super is not on performance critical code path.

No need to use buffer heads here, you can just use offset_in_page
to calculate the offset. Similarly I think the read side shouldn't
use buffer heads either (it is the only use of buffer heads in bcache!),
a siple read_cache_page should be all that is needed.

2019-12-06 09:29:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/2] [PATCH] bcache: cached_dev_free needs to put the sb page

On Fri, Dec 06, 2019 at 04:55:42PM +0800, Liang Chen wrote:
> Same as cache device, the buffer page needs to be put while
> freeing cached_dev. Otherwise a page would be leaked every
> time a cached_dev is stopped.
>
> Signed-off-by: Liang Chen <[email protected]>
> ---
> drivers/md/bcache/super.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 77e9869345e7..a573ce1d85aa 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl)
>
> mutex_unlock(&bch_register_lock);
>
> + if (dc->sb_bio.bi_inline_vecs[0].bv_page)
> + put_page(bio_first_page_all(&dc->sb_bio));

Using bio_first_page_all in the put_page call, but open coding it
for the check looks rather strange.

The cleanest thing here would be to just add a page pointer to the
cached device structure and use that instead of all the indirections.

2019-12-06 09:43:40

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH 1/2] [PATCH] bcache: cached_dev_free needs to put the sb page

On 2019/12/6 4:55 下午, Liang Chen wrote:
> Same as cache device, the buffer page needs to be put while
> freeing cached_dev. Otherwise a page would be leaked every
> time a cached_dev is stopped.
>
> Signed-off-by: Liang Chen <[email protected]>

I have this in my for-test directory, thanks.

> ---
> drivers/md/bcache/super.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 77e9869345e7..a573ce1d85aa 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl)
>
> mutex_unlock(&bch_register_lock);
>
> + if (dc->sb_bio.bi_inline_vecs[0].bv_page)
> + put_page(bio_first_page_all(&dc->sb_bio));
> +
> if (!IS_ERR_OR_NULL(dc->bdev))
> blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
>
>


--

Coly Li

2019-12-06 09:45:50

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

On 2019/12/6 4:55 下午, Liang Chen wrote:
> __write_super assumes super block data starts at offset 0 of the page
> read in with __bread from read_super, which is not true when page size
> is not 4k. We encountered the issue on system with 64K page size - commonly
> seen on aarch64 architecture.
>
> Instead of making any assumption on the offset of the data within the page,
> this patch calls __bread again to locate the data. That should not introduce
> an extra io since the page has been held when it's read in from read_super,
> and __write_super is not on performance critical code path.
>
> Signed-off-by: Liang Chen <[email protected]>

In general the patch is good for me. Just two minor requests I add them
in line the email.

Thanks.

> ---
> drivers/md/bcache/super.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index a573ce1d85aa..a39450c9bc34 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -207,15 +207,27 @@ static void write_bdev_super_endio(struct bio *bio)
> closure_put(&dc->sb_write);
> }
>
> -static void __write_super(struct cache_sb *sb, struct bio *bio)
> +static int __write_super(struct cache_sb *sb, struct bio *bio,
> + struct block_device *bdev)
> {
> - struct cache_sb *out = page_address(bio_first_page_all(bio));
> + struct cache_sb *out;
> unsigned int i;
> + struct buffer_head *bh;
> +
> + /*
> + * The page is held since read_super, this __bread * should not
> + * cause an extra io read.
> + */
> + bh = __bread(bdev, 1, SB_SIZE);
> + if (!bh)
> + goto out_bh;
> +
> + out = (struct cache_sb *) bh->b_data;

This is quite tricky here. Could you please to move this code piece into
an inline function and add code comments to explain why a read is
necessary for a write.


>
> bio->bi_iter.bi_sector = SB_SECTOR;
> bio->bi_iter.bi_size = SB_SIZE;
> bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC|REQ_META);
> - bch_bio_map(bio, NULL);
> + bch_bio_map(bio, bh->b_data);
>
> out->offset = cpu_to_le64(sb->offset);
> out->version = cpu_to_le64(sb->version);
> @@ -239,7 +251,14 @@ static void __write_super(struct cache_sb *sb, struct bio *bio)
> pr_debug("ver %llu, flags %llu, seq %llu",
> sb->version, sb->flags, sb->seq);
>
> + /* The page will still be held without this bh.*/
> + put_bh(bh);
> submit_bio(bio);
> + return 0;
> +
> +out_bh:
> + pr_err("Couldn't read super block, __write_super failed");
> + return -1;
> }
>
> static void bch_write_bdev_super_unlock(struct closure *cl)
> @@ -264,7 +283,8 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent)
>
> closure_get(cl);
> /* I/O request sent to backing device */
> - __write_super(&dc->sb, bio);
> + if(__write_super(&dc->sb, bio, dc->bdev))
> + closure_put(cl);
>
> closure_return_with_destructor(cl, bch_write_bdev_super_unlock);
> }
> @@ -312,7 +332,9 @@ void bcache_write_super(struct cache_set *c)
> bio->bi_private = ca;
>
> closure_get(cl);
> - __write_super(&ca->sb, bio);
> + if(__write_super(&ca->sb, bio, ca->bdev))

And here, please add code comments for why closure_put() is necessary here.

> + closure_put(cl);
> +
> }
>
> closure_return_with_destructor(cl, bcache_write_super_unlock);
>


--

Coly Li

2019-12-06 11:23:48

by Liang Chen

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

Sure. will do in a follow up patch.

On Fri, Dec 6, 2019 at 5:44 PM Coly Li <[email protected]> wrote:
>
> On 2019/12/6 4:55 下午, Liang Chen wrote:
> > __write_super assumes super block data starts at offset 0 of the page
> > read in with __bread from read_super, which is not true when page size
> > is not 4k. We encountered the issue on system with 64K page size - commonly
> > seen on aarch64 architecture.
> >
> > Instead of making any assumption on the offset of the data within the page,
> > this patch calls __bread again to locate the data. That should not introduce
> > an extra io since the page has been held when it's read in from read_super,
> > and __write_super is not on performance critical code path.
> >
> > Signed-off-by: Liang Chen <[email protected]>
>
> In general the patch is good for me. Just two minor requests I add them
> in line the email.
>
> Thanks.
>
> > ---
> > drivers/md/bcache/super.c | 32 +++++++++++++++++++++++++++-----
> > 1 file changed, 27 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index a573ce1d85aa..a39450c9bc34 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -207,15 +207,27 @@ static void write_bdev_super_endio(struct bio *bio)
> > closure_put(&dc->sb_write);
> > }
> >
> > -static void __write_super(struct cache_sb *sb, struct bio *bio)
> > +static int __write_super(struct cache_sb *sb, struct bio *bio,
> > + struct block_device *bdev)
> > {
> > - struct cache_sb *out = page_address(bio_first_page_all(bio));
> > + struct cache_sb *out;
> > unsigned int i;
> > + struct buffer_head *bh;
> > +
> > + /*
> > + * The page is held since read_super, this __bread * should not
> > + * cause an extra io read.
> > + */
> > + bh = __bread(bdev, 1, SB_SIZE);
> > + if (!bh)
> > + goto out_bh;
> > +
> > + out = (struct cache_sb *) bh->b_data;
>
> This is quite tricky here. Could you please to move this code piece into
> an inline function and add code comments to explain why a read is
> necessary for a write.
>
>
> >
> > bio->bi_iter.bi_sector = SB_SECTOR;
> > bio->bi_iter.bi_size = SB_SIZE;
> > bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC|REQ_META);
> > - bch_bio_map(bio, NULL);
> > + bch_bio_map(bio, bh->b_data);
> >
> > out->offset = cpu_to_le64(sb->offset);
> > out->version = cpu_to_le64(sb->version);
> > @@ -239,7 +251,14 @@ static void __write_super(struct cache_sb *sb, struct bio *bio)
> > pr_debug("ver %llu, flags %llu, seq %llu",
> > sb->version, sb->flags, sb->seq);
> >
> > + /* The page will still be held without this bh.*/
> > + put_bh(bh);
> > submit_bio(bio);
> > + return 0;
> > +
> > +out_bh:
> > + pr_err("Couldn't read super block, __write_super failed");
> > + return -1;
> > }
> >
> > static void bch_write_bdev_super_unlock(struct closure *cl)
> > @@ -264,7 +283,8 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent)
> >
> > closure_get(cl);
> > /* I/O request sent to backing device */
> > - __write_super(&dc->sb, bio);
> > + if(__write_super(&dc->sb, bio, dc->bdev))
> > + closure_put(cl);
> >
> > closure_return_with_destructor(cl, bch_write_bdev_super_unlock);
> > }
> > @@ -312,7 +332,9 @@ void bcache_write_super(struct cache_set *c)
> > bio->bi_private = ca;
> >
> > closure_get(cl);
> > - __write_super(&ca->sb, bio);
> > + if(__write_super(&ca->sb, bio, ca->bdev))
>
> And here, please add code comments for why closure_put() is necessary here.
>
> > + closure_put(cl);
> > +
> > }
> >
> > closure_return_with_destructor(cl, bcache_write_super_unlock);
> >
>
>
> --
>
> Coly Li

2019-12-06 11:24:50

by Liang Chen

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

Thanks for the advise.
Yeah, calculating the offset based on the buffer size is possible. I
just wanted to avoid making a dependency on some buffer head
internal logic here, like the way it dividesthe page into equal sized
buffers, and at the same time keep the patch less intrusive.

On Fri, Dec 6, 2019 at 5:23 PM Christoph Hellwig <[email protected]> wrote:
>
> On Fri, Dec 06, 2019 at 04:55:43PM +0800, Liang Chen wrote:
> > __write_super assumes super block data starts at offset 0 of the page
> > read in with __bread from read_super, which is not true when page size
> > is not 4k. We encountered the issue on system with 64K page size - commonly
> > seen on aarch64 architecture.
> >
> > Instead of making any assumption on the offset of the data within the page,
> > this patch calls __bread again to locate the data. That should not introduce
> > an extra io since the page has been held when it's read in from read_super,
> > and __write_super is not on performance critical code path.
>
> No need to use buffer heads here, you can just use offset_in_page
> to calculate the offset. Similarly I think the read side shouldn't
> use buffer heads either (it is the only use of buffer heads in bcache!),
> a siple read_cache_page should be all that is needed.

2019-12-06 11:28:53

by Liang Chen

[permalink] [raw]
Subject: Re: [PATCH 1/2] [PATCH] bcache: cached_dev_free needs to put the sb page

Sure. I will make a patch to clean up all the occurrences of this
usages later. Thanks.

On Fri, Dec 6, 2019 at 5:27 PM Christoph Hellwig <[email protected]> wrote:
>
> On Fri, Dec 06, 2019 at 04:55:42PM +0800, Liang Chen wrote:
> > Same as cache device, the buffer page needs to be put while
> > freeing cached_dev. Otherwise a page would be leaked every
> > time a cached_dev is stopped.
> >
> > Signed-off-by: Liang Chen <[email protected]>
> > ---
> > drivers/md/bcache/super.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index 77e9869345e7..a573ce1d85aa 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl)
> >
> > mutex_unlock(&bch_register_lock);
> >
> > + if (dc->sb_bio.bi_inline_vecs[0].bv_page)
> > + put_page(bio_first_page_all(&dc->sb_bio));
>
> Using bio_first_page_all in the put_page call, but open coding it
> for the check looks rather strange.
>
> The cleanest thing here would be to just add a page pointer to the
> cached device structure and use that instead of all the indirections.

2019-12-06 23:10:03

by Eric Wheeler

[permalink] [raw]
Subject: Re: [PATCH 1/2] [PATCH] bcache: cached_dev_free needs to put the sb page

On Fri, 6 Dec 2019, Liang Chen wrote:

> Same as cache device, the buffer page needs to be put while
> freeing cached_dev. Otherwise a page would be leaked every
> time a cached_dev is stopped.
>
> Signed-off-by: Liang Chen <[email protected]>


+cc stable?

--
Eric Wheeler


> ---
> drivers/md/bcache/super.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 77e9869345e7..a573ce1d85aa 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1275,6 +1275,9 @@ static void cached_dev_free(struct closure *cl)
>
> mutex_unlock(&bch_register_lock);
>
> + if (dc->sb_bio.bi_inline_vecs[0].bv_page)
> + put_page(bio_first_page_all(&dc->sb_bio));
> +
> if (!IS_ERR_OR_NULL(dc->bdev))
> blkdev_put(dc->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
>
> --
> 2.17.0
>
>

2019-12-09 07:40:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

On Fri, Dec 06, 2019 at 05:44:38PM +0800, Coly Li wrote:
> > {
> > - struct cache_sb *out = page_address(bio_first_page_all(bio));
> > + struct cache_sb *out;
> > unsigned int i;
> > + struct buffer_head *bh;
> > +
> > + /*
> > + * The page is held since read_super, this __bread * should not
> > + * cause an extra io read.
> > + */
> > + bh = __bread(bdev, 1, SB_SIZE);
> > + if (!bh)
> > + goto out_bh;
> > +
> > + out = (struct cache_sb *) bh->b_data;
>
> This is quite tricky here. Could you please to move this code piece into
> an inline function and add code comments to explain why a read is
> necessary for a write.

A read is not nessecary. He only added it because he was too fearful
of calculating the data offset directly. But calculating it directly
is almost trivial and should just be done here. Alternatively if that
is still to hard just keep a pointer to the cache_sb around, which is
how most file systems do it.

2019-12-09 09:53:55

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

On 2019/12/9 3:37 下午, Christoph Hellwig wrote:
> On Fri, Dec 06, 2019 at 05:44:38PM +0800, Coly Li wrote:
>>> {
>>> - struct cache_sb *out = page_address(bio_first_page_all(bio));
>>> + struct cache_sb *out;
>>> unsigned int i;
>>> + struct buffer_head *bh;
>>> +
>>> + /*
>>> + * The page is held since read_super, this __bread * should not
>>> + * cause an extra io read.
>>> + */
>>> + bh = __bread(bdev, 1, SB_SIZE);
>>> + if (!bh)
>>> + goto out_bh;
>>> +
>>> + out = (struct cache_sb *) bh->b_data;
>>
>> This is quite tricky here. Could you please to move this code piece into
>> an inline function and add code comments to explain why a read is
>> necessary for a write.
>
> A read is not nessecary. He only added it because he was too fearful
> of calculating the data offset directly. But calculating it directly
> is almost trivial and should just be done here. Alternatively if that
> is still to hard just keep a pointer to the cache_sb around, which is
> how most file systems do it.
>
Copied, if Liang does not have time to handle this as your suggestion, I
will handle it.

Thanks for the hint.

--

Coly Li

2019-12-09 10:47:17

by Liang Chen

[permalink] [raw]
Subject: Re: [PATCH 2/2] [PATCH] bcache: __write_super to handle page sizes other than 4k

No problem. I will change the patch to remove this extra read. Thanks.

On Mon, Dec 9, 2019 at 5:52 PM Coly Li <[email protected]> wrote:
>
> On 2019/12/9 3:37 下午, Christoph Hellwig wrote:
> > On Fri, Dec 06, 2019 at 05:44:38PM +0800, Coly Li wrote:
> >>> {
> >>> - struct cache_sb *out = page_address(bio_first_page_all(bio));
> >>> + struct cache_sb *out;
> >>> unsigned int i;
> >>> + struct buffer_head *bh;
> >>> +
> >>> + /*
> >>> + * The page is held since read_super, this __bread * should not
> >>> + * cause an extra io read.
> >>> + */
> >>> + bh = __bread(bdev, 1, SB_SIZE);
> >>> + if (!bh)
> >>> + goto out_bh;
> >>> +
> >>> + out = (struct cache_sb *) bh->b_data;
> >>
> >> This is quite tricky here. Could you please to move this code piece into
> >> an inline function and add code comments to explain why a read is
> >> necessary for a write.
> >
> > A read is not nessecary. He only added it because he was too fearful
> > of calculating the data offset directly. But calculating it directly
> > is almost trivial and should just be done here. Alternatively if that
> > is still to hard just keep a pointer to the cache_sb around, which is
> > how most file systems do it.
> >
> Copied, if Liang does not have time to handle this as your suggestion, I
> will handle it.
>
> Thanks for the hint.
>
> --
>
> Coly Li