2015-04-23 23:04:52

by Ming Lin

[permalink] [raw]
Subject: [PATCH v3 0/4] preparation for block layer simplification

I'd like to continue on Dongsu's effort to push Kent's block-stuff upstream.

This is a preparation series for simplifying block layer based on immutable
biovecs, a spin off of the v2 of simplifying patchset. [1] The original
goal of simplifying block layer was actually making generic_make_request()
accept arbitrarily sized bios, and pushing the splitting down to the
underlying drivers.

This patchset aims at cleaning up several parts that are independent of
core changes in the block layer. Doing that, it could be possible to
change block layer relatively easily without keeping up with many patches.

This patchset should be first applied prior to upcoming patchsets such as
"simplify block layer based on immutable biovecs." This patchset itself
should not bring any regression to end-users.

Changes in v3:
- rebase on top of Jen's bio-cnt branch
- drop patches 1 and 2 that were already upstream
- lib/iovec.c was gone, so move iov_count_pages to block/bio.c

Changes in v2:
- split up preparation patches from v1 into this separate series.
- In the patch 02, pass iov_iter by value to __bio_copy_iov(), and split
into read/write variants, as suggested by Christoph Hellwig.
- minor changes like writing more commit messages etc.

[1] https://lkml.org/lkml/2014/12/22/128

Kent Overstreet (4):
block: refactor iov_count_pages() from bio_{copy,map}_user_iov()
md/raid10: make sync_request_write() call bio_copy_data()
fs: make _submit_bh consistent with generic bio chaining
PM: submit bio in a sane way in cases without bio_chain


block/bio.c | 70 +++++++++++++++++++++++++++++++++-------------------------------------
drivers/md/raid10.c | 20 +++++---------------
fs/buffer.c | 4 ++--
kernel/power/block_io.c | 23 ++++++++++++++++++-----
4 files changed, 58 insertions(+), 59 deletions(-)


2015-04-23 23:05:15

by Ming Lin

[permalink] [raw]
Subject: [PATCH v3 1/4] block: refactor iov_count_pages() from bio_{copy,map}_user_iov()

From: Kent Overstreet <[email protected]>

Refactor the common part in bio_copy_user_iov() and
__bio_map_user_iov() to separate out iov_count_pages() into the general
iov_iter API, instead of open coding iov iterations as done previously.

This commit should contain only literal replacements, without
functional changes.

Cc: Christoph Hellwig <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Hans J. Koch" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
[dpark: add more description in commit message]
[mlin: move iov_count_pages to block/bio.c]
Signed-off-by: Dongsu Park <[email protected]>
Signed-off-by: Ming Lin <[email protected]>
---
block/bio.c | 70 +++++++++++++++++++++++++++++--------------------------------
1 file changed, 33 insertions(+), 37 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 05c2864..eb4471a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1139,6 +1139,33 @@ int bio_uncopy_user(struct bio *bio)
}
EXPORT_SYMBOL(bio_uncopy_user);

+static int iov_count_pages(const struct iov_iter *iter, unsigned align)
+{
+ struct iov_iter i = *iter;
+ int nr_pages = 0;
+
+ while (iov_iter_count(&i)) {
+ unsigned long uaddr = (unsigned long) i.iov->iov_base +
+ i.iov_offset;
+ unsigned long len = i.iov->iov_len - i.iov_offset;
+
+ if ((uaddr & align) || (len & align))
+ return -EINVAL;
+
+ /*
+ * Overflow, abort
+ */
+ if (uaddr + len < uaddr)
+ return -EINVAL;
+
+ nr_pages += DIV_ROUND_UP(len + offset_in_page(uaddr),
+ PAGE_SIZE);
+ iov_iter_advance(&i, len);
+ }
+
+ return nr_pages;
+}
+
/**
* bio_copy_user_iov - copy user data to bio
* @q: destination block queue
@@ -1163,24 +1190,9 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
unsigned int len = iter->count;
unsigned int offset = map_data ? map_data->offset & ~PAGE_MASK : 0;

- for (i = 0; i < iter->nr_segs; i++) {
- unsigned long uaddr;
- unsigned long end;
- unsigned long start;
-
- uaddr = (unsigned long) iter->iov[i].iov_base;
- end = (uaddr + iter->iov[i].iov_len + PAGE_SIZE - 1)
- >> PAGE_SHIFT;
- start = uaddr >> PAGE_SHIFT;
-
- /*
- * Overflow, abort
- */
- if (end < start)
- return ERR_PTR(-EINVAL);
-
- nr_pages += end - start;
- }
+ nr_pages = iov_count_pages(iter, 0);
+ if (nr_pages < 0)
+ return ERR_PTR(nr_pages);

if (offset)
nr_pages++;
@@ -1292,25 +1304,9 @@ struct bio *bio_map_user_iov(struct request_queue *q,
struct iov_iter i;
struct iovec iov;

- iov_for_each(iov, i, *iter) {
- unsigned long uaddr = (unsigned long) iov.iov_base;
- unsigned long len = iov.iov_len;
- unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
-
- /*
- * Overflow, abort
- */
- if (end < start)
- return ERR_PTR(-EINVAL);
-
- nr_pages += end - start;
- /*
- * buffer must be aligned to at least hardsector size for now
- */
- if (uaddr & queue_dma_alignment(q))
- return ERR_PTR(-EINVAL);
- }
+ nr_pages = iov_count_pages(iter, queue_dma_alignment(q));
+ if (nr_pages < 0)
+ return ERR_PTR(nr_pages);

if (!nr_pages)
return ERR_PTR(-EINVAL);
--
1.9.1

2015-04-23 23:06:11

by Ming Lin

[permalink] [raw]
Subject: [PATCH v3 2/4] md/raid10: make sync_request_write() call bio_copy_data()

From: Kent Overstreet <[email protected]>

Refactor sync_request_write() of md/raid10 to use bio_copy_data()
instead of open coding bio_vec iterations.

Cc: Christoph Hellwig <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <[email protected]>
Signed-off-by: Ming Lin <[email protected]>
---
drivers/md/raid10.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a7196c4..02e33f1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2097,18 +2097,11 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
tbio->bi_vcnt = vcnt;
tbio->bi_iter.bi_size = r10_bio->sectors << 9;
tbio->bi_rw = WRITE;
- tbio->bi_private = r10_bio;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
-
- for (j=0; j < vcnt ; j++) {
- tbio->bi_io_vec[j].bv_offset = 0;
- tbio->bi_io_vec[j].bv_len = PAGE_SIZE;
-
- memcpy(page_address(tbio->bi_io_vec[j].bv_page),
- page_address(fbio->bi_io_vec[j].bv_page),
- PAGE_SIZE);
- }
tbio->bi_end_io = end_sync_write;
+ tbio->bi_private = r10_bio;
+
+ bio_copy_data(tbio, fbio);

d = r10_bio->devs[i].devnum;
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
@@ -2124,17 +2117,14 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
* that are active
*/
for (i = 0; i < conf->copies; i++) {
- int j, d;
+ int d;

tbio = r10_bio->devs[i].repl_bio;
if (!tbio || !tbio->bi_end_io)
continue;
if (r10_bio->devs[i].bio->bi_end_io != end_sync_write
&& r10_bio->devs[i].bio != fbio)
- for (j = 0; j < vcnt; j++)
- memcpy(page_address(tbio->bi_io_vec[j].bv_page),
- page_address(fbio->bi_io_vec[j].bv_page),
- PAGE_SIZE);
+ bio_copy_data(tbio, fbio);
d = r10_bio->devs[i].devnum;
atomic_inc(&r10_bio->remaining);
md_sync_acct(conf->mirrors[d].replacement->bdev,
--
1.9.1

2015-04-23 23:05:22

by Ming Lin

[permalink] [raw]
Subject: [PATCH v3 3/4] fs: make _submit_bh consistent with generic bio chaining

From: Kent Overstreet <[email protected]>

Make _submit_bh() handle refcounting by increasing bio->bi_remaining,
followed by bio_endio(). Since bio chaining was introduced with
196d38bccfcf ("block: Generic bio chaining"), refcounting should be
done on bi_remaining instead of ancient bio_cnt. Doing that, calling
convention can be consistent with the immutable biovecs API.

Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
[dpark: add more description in commit message]
[mlin: rebase as Jens is changing the bi_remaining rules]
Signed-off-by: Dongsu Park <[email protected]>
Signed-off-by: Ming Lin <[email protected]>
---
fs/buffer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index c7a5602..c1c0e0d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3041,13 +3041,13 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
if (buffer_prio(bh))
rw |= REQ_PRIO;

- bio_get(bio);
+ bio_inc_remaining(bio);
submit_bio(rw, bio);

if (bio_flagged(bio, BIO_EOPNOTSUPP))
ret = -EOPNOTSUPP;

- bio_put(bio);
+ bio_endio(bio, 0);
return ret;
}
EXPORT_SYMBOL_GPL(_submit_bh);
--
1.9.1

2015-04-23 23:05:36

by Ming Lin

[permalink] [raw]
Subject: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

From: Kent Overstreet <[email protected]>

Make bio submission in kernel/power/block_io.c to properly submit
bios also when bio_chain is not available. In that case, it's not
necessary to handle refcount with bio_get(), but it's saner to simply
call a predefined helper submit_bio_wait(). So call bio_get() only
when bio_chain is given.

Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <[email protected]>
Signed-off-by: Ming Lin <[email protected]>
---
kernel/power/block_io.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 9a58bc2..7206408 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -34,7 +34,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
- bio->bi_end_io = end_swap_bio_read;

if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
@@ -44,15 +43,29 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
}

lock_page(page);
- bio_get(bio);

if (bio_chain == NULL) {
- submit_bio(bio_rw, bio);
- wait_on_page_locked(page);
+ int err = submit_bio_wait(bio_rw, bio);
+
+ if (err) {
+ SetPageError(page);
+ ClearPageUptodate(page);
+ pr_alert("Read-error on swap-device (%u:%u:%llu)\n",
+ imajor(bio->bi_bdev->bd_inode),
+ iminor(bio->bi_bdev->bd_inode),
+ (unsigned long long)bio->bi_iter.bi_sector);
+ } else {
+ SetPageUptodate(page);
+ }
+
if (rw == READ)
- bio_set_pages_dirty(bio);
+ set_page_dirty_lock(page);
+ unlock_page(page);
bio_put(bio);
} else {
+ bio->bi_end_io = end_swap_bio_read;
+ bio_get(bio);
+
if (rw == READ)
get_page(page); /* These pages are freed later */
bio->bi_private = *bio_chain;
--
1.9.1

2015-04-24 07:06:24

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] block: refactor iov_count_pages() from bio_{copy,map}_user_iov()

On Thu, Apr 23, 2015 at 04:04:32PM -0700, Ming Lin wrote:
> From: Kent Overstreet <[email protected]>
>
> Refactor the common part in bio_copy_user_iov() and
> __bio_map_user_iov() to separate out iov_count_pages() into the general
> iov_iter API, instead of open coding iov iterations as done previously.
>
> This commit should contain only literal replacements, without
> functional changes.


Weve already got a iov_iter_npages and iov_iter_alignment,
the SG_IO code should probably be switched to those instead.

2015-04-24 07:09:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] md/raid10: make sync_request_write() call bio_copy_data()

On Thu, Apr 23, 2015 at 04:04:33PM -0700, Ming Lin wrote:
> From: Kent Overstreet <[email protected]>
>
> Refactor sync_request_write() of md/raid10 to use bio_copy_data()
> instead of open coding bio_vec iterations.

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2015-04-24 07:25:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] fs: make _submit_bh consistent with generic bio chaining

On Thu, Apr 23, 2015 at 04:04:34PM -0700, Ming Lin wrote:
> From: Kent Overstreet <[email protected]>
>
> Make _submit_bh() handle refcounting by increasing bio->bi_remaining,
> followed by bio_endio(). Since bio chaining was introduced with
> 196d38bccfcf ("block: Generic bio chaining"), refcounting should be
> done on bi_remaining instead of ancient bio_cnt. Doing that, calling
> convention can be consistent with the immutable biovecs API.

I think this and the copy & paste version in nilfs2 can just go away.
Since the big barrier to flush rewrite we will never fail FUA
and FLUSH requests with BIO_EOPNOTSUPP. That error is just left
for DISCARD adn WRITE_SAME. So something like the untested patch
below should take care of this:

diff --git a/block/bounce.c b/block/bounce.c
index ab21ba2..4bac725 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -128,9 +128,6 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool, int err)
struct bio_vec *bvec, *org_vec;
int i;

- if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
- set_bit(BIO_EOPNOTSUPP, &bio_orig->bi_flags);
-
/*
* free up bounce indirect pages used
*/
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 639f266..e8d7670 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3225,11 +3225,8 @@ static int write_dev_supers(struct btrfs_device *device,
*/
static void btrfs_end_empty_barrier(struct bio *bio, int err)
{
- if (err) {
- if (err == -EOPNOTSUPP)
- set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
+ if (err)
clear_bit(BIO_UPTODATE, &bio->bi_flags);
- }
if (bio->bi_private)
complete(bio->bi_private);
bio_put(bio);
@@ -3257,11 +3254,7 @@ static int write_dev_flush(struct btrfs_device *device, int wait)

wait_for_completion(&device->flush_wait);

- if (bio_flagged(bio, BIO_EOPNOTSUPP)) {
- printk_in_rcu("BTRFS: disabling barriers on dev %s\n",
- rcu_str_deref(device->name));
- device->nobarriers = 1;
- } else if (!bio_flagged(bio, BIO_UPTODATE)) {
+ if (!bio_flagged(bio, BIO_UPTODATE)) {
ret = -EIO;
btrfs_dev_stat_inc_and_print(device,
BTRFS_DEV_STAT_FLUSH_ERRS);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d688cfe..6a44bea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2767,8 +2767,6 @@ static int __must_check submit_one_bio(int rw, struct bio *bio,
else
btrfsic_submit_bio(rw, bio);

- if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
bio_put(bio);
return ret;
}
diff --git a/fs/buffer.c b/fs/buffer.c
index c7a5602..efd85e0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2938,10 +2938,6 @@ static void end_bio_bh_io_sync(struct bio *bio, int err)
{
struct buffer_head *bh = bio->bi_private;

- if (err == -EOPNOTSUPP) {
- set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
- }
-
if (unlikely (test_bit(BIO_QUIET,&bio->bi_flags)))
set_bit(BH_Quiet, &bh->b_state);

@@ -3041,13 +3037,7 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
if (buffer_prio(bh))
rw |= REQ_PRIO;

- bio_get(bio);
submit_bio(rw, bio);
-
- if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
-
- bio_put(bio);
return ret;
}
EXPORT_SYMBOL_GPL(_submit_bh);
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 5765f88..c5d81e8 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -359,7 +359,6 @@ void ext4_io_submit(struct ext4_io_submit *io)
if (bio) {
bio_get(io->io_bio);
submit_bio(io->io_op, io->io_bio);
- BUG_ON(bio_flagged(io->io_bio, BIO_EOPNOTSUPP));
bio_put(io->io_bio);
}
io->io_bio = NULL;
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index dc3a9efd..42468e5 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -343,11 +343,6 @@ static void nilfs_end_bio_write(struct bio *bio, int err)
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct nilfs_segment_buffer *segbuf = bio->bi_private;

- if (err == -EOPNOTSUPP) {
- set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
- /* to be detected by nilfs_segbuf_submit_bio() */
- }
-
if (!uptodate)
atomic_inc(&segbuf->sb_err);

@@ -374,15 +369,8 @@ static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,

bio->bi_end_io = nilfs_end_bio_write;
bio->bi_private = segbuf;
- bio_get(bio);
submit_bio(mode, bio);
segbuf->sb_nbio++;
- if (bio_flagged(bio, BIO_EOPNOTSUPP)) {
- bio_put(bio);
- err = -EOPNOTSUPP;
- goto failed;
- }
- bio_put(bio);

wi->bio = NULL;
wi->rest_blocks -= wi->end - wi->start;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a1b25e3..809feb3 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -118,7 +118,6 @@ struct bio {
#define BIO_CLONED 4 /* doesn't own data */
#define BIO_BOUNCED 5 /* bio is a bounce bio */
#define BIO_USER_MAPPED 6 /* contains user pages */
-#define BIO_EOPNOTSUPP 7 /* not supported */
#define BIO_NULL_MAPPED 8 /* contains invalid user pages */
#define BIO_QUIET 9 /* Make BIO Quiet */
#define BIO_SNAP_STABLE 10 /* bio data must be snapshotted during write */

2015-04-24 16:17:30

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

On Thu, Apr 23, 2015 at 04:04:35PM -0700, Ming Lin wrote:
> From: Kent Overstreet <[email protected]>
>
> Make bio submission in kernel/power/block_io.c to properly submit
> bios also when bio_chain is not available. In that case, it's not
> necessary to handle refcount with bio_get(), but it's saner to simply
> call a predefined helper submit_bio_wait(). So call bio_get() only
> when bio_chain is given.

The patch looks correct, buth that whole code is a f****ing mess.

For one it really shouldn't mess with pages states nor abuse
end_swap_bio_read.

Something like the untested patch below should do it, but it really
needs some solid testing from people that know how to even exercise
it.

---
>From af56dde07c34b203f5c40c8864bfd55697b0aad0 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Fri, 24 Apr 2015 11:26:00 +0200
Subject: suspend: sane block I/O handling

stop abusing struct page functionality and the swap end_io handler, and
instead add a modified version of the blk-lib.c bio_batch helpers.

Also move the block I/O code into swap.c as they are directly tied into
each other.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/swap.h | 1 -
kernel/power/Makefile | 3 +-
kernel/power/block_io.c | 103 -------------------------------
kernel/power/power.h | 9 ---
kernel/power/swap.c | 159 ++++++++++++++++++++++++++++++++++++------------
mm/page_io.c | 2 +-
6 files changed, 122 insertions(+), 155 deletions(-)
delete mode 100644 kernel/power/block_io.c

diff --git a/include/linux/swap.h b/include/linux/swap.h
index cee108c..3887472 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -377,7 +377,6 @@ extern void end_swap_bio_write(struct bio *bio, int err);
extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
void (*end_write_func)(struct bio *, int));
extern int swap_set_page_dirty(struct page *page);
-extern void end_swap_bio_read(struct bio *bio, int err);

int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
unsigned long nr_pages, sector_t start_block);
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 29472bf..cb880a1 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -7,8 +7,7 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += console.o
obj-$(CONFIG_FREEZER) += process.o
obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
-obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
- block_io.o
+obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
deleted file mode 100644
index 9a58bc2..0000000
--- a/kernel/power/block_io.c
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * This file provides functions for block I/O operations on swap/file.
- *
- * Copyright (C) 1998,2001-2005 Pavel Machek <[email protected]>
- * Copyright (C) 2006 Rafael J. Wysocki <[email protected]>
- *
- * This file is released under the GPLv2.
- */
-
-#include <linux/bio.h>
-#include <linux/kernel.h>
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-
-#include "power.h"
-
-/**
- * submit - submit BIO request.
- * @rw: READ or WRITE.
- * @off physical offset of page.
- * @page: page we're reading or writing.
- * @bio_chain: list of pending biod (for async reading)
- *
- * Straight from the textbook - allocate and initialize the bio.
- * If we're reading, make sure the page is marked as dirty.
- * Then submit it and, if @bio_chain == NULL, wait.
- */
-static int submit(int rw, struct block_device *bdev, sector_t sector,
- struct page *page, struct bio **bio_chain)
-{
- const int bio_rw = rw | REQ_SYNC;
- struct bio *bio;
-
- bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
- bio->bi_iter.bi_sector = sector;
- bio->bi_bdev = bdev;
- bio->bi_end_io = end_swap_bio_read;
-
- if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
- printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
- (unsigned long long)sector);
- bio_put(bio);
- return -EFAULT;
- }
-
- lock_page(page);
- bio_get(bio);
-
- if (bio_chain == NULL) {
- submit_bio(bio_rw, bio);
- wait_on_page_locked(page);
- if (rw == READ)
- bio_set_pages_dirty(bio);
- bio_put(bio);
- } else {
- if (rw == READ)
- get_page(page); /* These pages are freed later */
- bio->bi_private = *bio_chain;
- *bio_chain = bio;
- submit_bio(bio_rw, bio);
- }
- return 0;
-}
-
-int hib_bio_read_page(pgoff_t page_off, void *addr, struct bio **bio_chain)
-{
- return submit(READ, hib_resume_bdev, page_off * (PAGE_SIZE >> 9),
- virt_to_page(addr), bio_chain);
-}
-
-int hib_bio_write_page(pgoff_t page_off, void *addr, struct bio **bio_chain)
-{
- return submit(WRITE, hib_resume_bdev, page_off * (PAGE_SIZE >> 9),
- virt_to_page(addr), bio_chain);
-}
-
-int hib_wait_on_bio_chain(struct bio **bio_chain)
-{
- struct bio *bio;
- struct bio *next_bio;
- int ret = 0;
-
- if (bio_chain == NULL)
- return 0;
-
- bio = *bio_chain;
- if (bio == NULL)
- return 0;
- while (bio) {
- struct page *page;
-
- next_bio = bio->bi_private;
- page = bio->bi_io_vec[0].bv_page;
- wait_on_page_locked(page);
- if (!PageUptodate(page) || PageError(page))
- ret = -EIO;
- put_page(page);
- bio_put(bio);
- bio = next_bio;
- }
- *bio_chain = NULL;
- return ret;
-}
diff --git a/kernel/power/power.h b/kernel/power/power.h
index ce9b832..caadb56 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -163,15 +163,6 @@ extern void swsusp_close(fmode_t);
extern int swsusp_unmark(void);
#endif

-/* kernel/power/block_io.c */
-extern struct block_device *hib_resume_bdev;
-
-extern int hib_bio_read_page(pgoff_t page_off, void *addr,
- struct bio **bio_chain);
-extern int hib_bio_write_page(pgoff_t page_off, void *addr,
- struct bio **bio_chain);
-extern int hib_wait_on_bio_chain(struct bio **bio_chain);
-
struct timeval;
/* kernel/power/swsusp.c */
extern void swsusp_show_speed(ktime_t, ktime_t, unsigned int, char *);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 570aff8..8a0b64d 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -212,7 +212,84 @@ int swsusp_swap_in_use(void)
*/

static unsigned short root_swap = 0xffff;
-struct block_device *hib_resume_bdev;
+static struct block_device *hib_resume_bdev;
+
+struct hib_bio_batch {
+ atomic_t count;
+ wait_queue_head_t wait;
+ int error;
+};
+
+static void hib_init_batch(struct hib_bio_batch *hb)
+{
+ atomic_set(&hb->count, 0);
+ init_waitqueue_head(&hb->wait);
+ hb->error = 0;
+}
+
+static void hib_end_io(struct bio *bio, int error)
+{
+ struct hib_bio_batch *hb = bio->bi_private;
+ const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ struct page *page = bio->bi_io_vec[0].bv_page;
+
+ if (!uptodate || error) {
+ printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
+ imajor(bio->bi_bdev->bd_inode),
+ iminor(bio->bi_bdev->bd_inode),
+ (unsigned long long)bio->bi_iter.bi_sector);
+
+ if (!error)
+ error = -EIO;
+ }
+
+ if (bio_data_dir(bio) == WRITE)
+ put_page(page);
+
+ if (error && !hb->error)
+ hb->error = error;
+ if (atomic_dec_and_test(&hb->count))
+ wake_up(&hb->wait);
+
+ bio_put(bio);
+}
+
+static int hib_submit_io(int rw, pgoff_t page_off, void *addr,
+ struct hib_bio_batch *hb)
+{
+ struct page *page = virt_to_page(addr);
+ struct bio *bio;
+ int error = 0;
+
+ bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
+ bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
+ bio->bi_bdev = hib_resume_bdev;
+
+ if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+ printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
+ (unsigned long long)bio->bi_iter.bi_sector);
+ bio_put(bio);
+ return -EFAULT;
+ }
+
+ if (hb) {
+ bio->bi_end_io = hib_end_io;
+ bio->bi_private = hb;
+ atomic_inc(&hb->count);
+ submit_bio(rw, bio);
+ } else {
+ error = submit_bio_wait(rw, bio);
+ bio_put(bio);
+ }
+
+ return error;
+}
+
+static int hib_wait_io(struct hib_bio_batch *hb)
+{
+ wait_event(hb->wait, atomic_read(&hb->count) == 0);
+ return hb->error;
+}

/*
* Saving part
@@ -222,7 +299,7 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
{
int error;

- hib_bio_read_page(swsusp_resume_block, swsusp_header, NULL);
+ hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
!memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
@@ -231,7 +308,7 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
swsusp_header->flags = flags;
if (flags & SF_CRC32_MODE)
swsusp_header->crc32 = handle->crc32;
- error = hib_bio_write_page(swsusp_resume_block,
+ error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
swsusp_header, NULL);
} else {
printk(KERN_ERR "PM: Swap header not found!\n");
@@ -271,10 +348,10 @@ static int swsusp_swap_check(void)
* write_page - Write one page to given swap location.
* @buf: Address we're writing.
* @offset: Offset of the swap page we're writing to.
- * @bio_chain: Link the next write BIO here
+ * @hb: bio completion batch
*/

-static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
+static int write_page(void *buf, sector_t offset, struct hib_bio_batch *hb)
{
void *src;
int ret;
@@ -282,13 +359,13 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
if (!offset)
return -ENOSPC;

- if (bio_chain) {
+ if (hb) {
src = (void *)__get_free_page(__GFP_WAIT | __GFP_NOWARN |
__GFP_NORETRY);
if (src) {
copy_page(src, buf);
} else {
- ret = hib_wait_on_bio_chain(bio_chain); /* Free pages */
+ ret = hib_wait_io(hb); /* Free pages */
if (ret)
return ret;
src = (void *)__get_free_page(__GFP_WAIT |
@@ -298,14 +375,14 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
copy_page(src, buf);
} else {
WARN_ON_ONCE(1);
- bio_chain = NULL; /* Go synchronous */
+ hb = NULL; /* Go synchronous */
src = buf;
}
}
} else {
src = buf;
}
- return hib_bio_write_page(offset, src, bio_chain);
+ return hib_submit_io(WRITE_SYNC, offset, src, hb);
}

static void release_swap_writer(struct swap_map_handle *handle)
@@ -348,7 +425,7 @@ err_close:
}

static int swap_write_page(struct swap_map_handle *handle, void *buf,
- struct bio **bio_chain)
+ struct hib_bio_batch *hb)
{
int error = 0;
sector_t offset;
@@ -356,7 +433,7 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
if (!handle->cur)
return -EINVAL;
offset = alloc_swapdev_block(root_swap);
- error = write_page(buf, offset, bio_chain);
+ error = write_page(buf, offset, hb);
if (error)
return error;
handle->cur->entries[handle->k++] = offset;
@@ -365,15 +442,15 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
if (!offset)
return -ENOSPC;
handle->cur->next_swap = offset;
- error = write_page(handle->cur, handle->cur_swap, bio_chain);
+ error = write_page(handle->cur, handle->cur_swap, hb);
if (error)
goto out;
clear_page(handle->cur);
handle->cur_swap = offset;
handle->k = 0;

- if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
- error = hib_wait_on_bio_chain(bio_chain);
+ if (hb && low_free_pages() <= handle->reqd_free_pages) {
+ error = hib_wait_io(hb);
if (error)
goto out;
/*
@@ -445,23 +522,24 @@ static int save_image(struct swap_map_handle *handle,
int ret;
int nr_pages;
int err2;
- struct bio *bio;
+ struct hib_bio_batch hb;
ktime_t start;
ktime_t stop;

+ hib_init_batch(&hb);
+
printk(KERN_INFO "PM: Saving image data pages (%u pages)...\n",
nr_to_write);
m = nr_to_write / 10;
if (!m)
m = 1;
nr_pages = 0;
- bio = NULL;
start = ktime_get();
while (1) {
ret = snapshot_read_next(snapshot);
if (ret <= 0)
break;
- ret = swap_write_page(handle, data_of(*snapshot), &bio);
+ ret = swap_write_page(handle, data_of(*snapshot), &hb);
if (ret)
break;
if (!(nr_pages % m))
@@ -469,7 +547,7 @@ static int save_image(struct swap_map_handle *handle,
nr_pages / m * 10);
nr_pages++;
}
- err2 = hib_wait_on_bio_chain(&bio);
+ err2 = hib_wait_io(&hb);
stop = ktime_get();
if (!ret)
ret = err2;
@@ -580,7 +658,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
int ret = 0;
int nr_pages;
int err2;
- struct bio *bio;
+ struct hib_bio_batch hb;
ktime_t start;
ktime_t stop;
size_t off;
@@ -589,6 +667,8 @@ static int save_image_lzo(struct swap_map_handle *handle,
struct cmp_data *data = NULL;
struct crc_data *crc = NULL;

+ hib_init_batch(&hb);
+
/*
* We'll limit the number of threads for compression to limit memory
* footprint.
@@ -674,7 +754,6 @@ static int save_image_lzo(struct swap_map_handle *handle,
if (!m)
m = 1;
nr_pages = 0;
- bio = NULL;
start = ktime_get();
for (;;) {
for (thr = 0; thr < nr_threads; thr++) {
@@ -748,7 +827,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
off += PAGE_SIZE) {
memcpy(page, data[thr].cmp + off, PAGE_SIZE);

- ret = swap_write_page(handle, page, &bio);
+ ret = swap_write_page(handle, page, &hb);
if (ret)
goto out_finish;
}
@@ -759,7 +838,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
}

out_finish:
- err2 = hib_wait_on_bio_chain(&bio);
+ err2 = hib_wait_io(&hb);
stop = ktime_get();
if (!ret)
ret = err2;
@@ -906,7 +985,7 @@ static int get_swap_reader(struct swap_map_handle *handle,
return -ENOMEM;
}

- error = hib_bio_read_page(offset, tmp->map, NULL);
+ error = hib_submit_io(READ_SYNC, offset, tmp->map, NULL);
if (error) {
release_swap_reader(handle);
return error;
@@ -919,7 +998,7 @@ static int get_swap_reader(struct swap_map_handle *handle,
}

static int swap_read_page(struct swap_map_handle *handle, void *buf,
- struct bio **bio_chain)
+ struct hib_bio_batch *hb)
{
sector_t offset;
int error;
@@ -930,7 +1009,7 @@ static int swap_read_page(struct swap_map_handle *handle, void *buf,
offset = handle->cur->entries[handle->k];
if (!offset)
return -EFAULT;
- error = hib_bio_read_page(offset, buf, bio_chain);
+ error = hib_submit_io(READ_SYNC, offset, buf, hb);
if (error)
return error;
if (++handle->k >= MAP_PAGE_ENTRIES) {
@@ -968,27 +1047,28 @@ static int load_image(struct swap_map_handle *handle,
int ret = 0;
ktime_t start;
ktime_t stop;
- struct bio *bio;
+ struct hib_bio_batch hb;
int err2;
unsigned nr_pages;

+ hib_init_batch(&hb);
+
printk(KERN_INFO "PM: Loading image data pages (%u pages)...\n",
nr_to_read);
m = nr_to_read / 10;
if (!m)
m = 1;
nr_pages = 0;
- bio = NULL;
start = ktime_get();
for ( ; ; ) {
ret = snapshot_write_next(snapshot);
if (ret <= 0)
break;
- ret = swap_read_page(handle, data_of(*snapshot), &bio);
+ ret = swap_read_page(handle, data_of(*snapshot), &hb);
if (ret)
break;
if (snapshot->sync_read)
- ret = hib_wait_on_bio_chain(&bio);
+ ret = hib_wait_io(&hb);
if (ret)
break;
if (!(nr_pages % m))
@@ -996,7 +1076,7 @@ static int load_image(struct swap_map_handle *handle,
nr_pages / m * 10);
nr_pages++;
}
- err2 = hib_wait_on_bio_chain(&bio);
+ err2 = hib_wait_io(&hb);
stop = ktime_get();
if (!ret)
ret = err2;
@@ -1067,7 +1147,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
unsigned int m;
int ret = 0;
int eof = 0;
- struct bio *bio;
+ struct hib_bio_batch hb;
ktime_t start;
ktime_t stop;
unsigned nr_pages;
@@ -1080,6 +1160,8 @@ static int load_image_lzo(struct swap_map_handle *handle,
struct dec_data *data = NULL;
struct crc_data *crc = NULL;

+ hib_init_batch(&hb);
+
/*
* We'll limit the number of threads for decompression to limit memory
* footprint.
@@ -1190,7 +1272,6 @@ static int load_image_lzo(struct swap_map_handle *handle,
if (!m)
m = 1;
nr_pages = 0;
- bio = NULL;
start = ktime_get();

ret = snapshot_write_next(snapshot);
@@ -1199,7 +1280,7 @@ static int load_image_lzo(struct swap_map_handle *handle,

for(;;) {
for (i = 0; !eof && i < want; i++) {
- ret = swap_read_page(handle, page[ring], &bio);
+ ret = swap_read_page(handle, page[ring], &hb);
if (ret) {
/*
* On real read error, finish. On end of data,
@@ -1226,7 +1307,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
if (!asked)
break;

- ret = hib_wait_on_bio_chain(&bio);
+ ret = hib_wait_io(&hb);
if (ret)
goto out_finish;
have += asked;
@@ -1281,7 +1362,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
* Wait for more data while we are decompressing.
*/
if (have < LZO_CMP_PAGES && asked) {
- ret = hib_wait_on_bio_chain(&bio);
+ ret = hib_wait_io(&hb);
if (ret)
goto out_finish;
have += asked;
@@ -1430,7 +1511,7 @@ int swsusp_check(void)
if (!IS_ERR(hib_resume_bdev)) {
set_blocksize(hib_resume_bdev, PAGE_SIZE);
clear_page(swsusp_header);
- error = hib_bio_read_page(swsusp_resume_block,
+ error = hib_submit_io(READ_SYNC, swsusp_resume_block,
swsusp_header, NULL);
if (error)
goto put;
@@ -1438,7 +1519,7 @@ int swsusp_check(void)
if (!memcmp(HIBERNATE_SIG, swsusp_header->sig, 10)) {
memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
/* Reset swap signature now */
- error = hib_bio_write_page(swsusp_resume_block,
+ error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
swsusp_header, NULL);
} else {
error = -EINVAL;
@@ -1482,10 +1563,10 @@ int swsusp_unmark(void)
{
int error;

- hib_bio_read_page(swsusp_resume_block, swsusp_header, NULL);
+ hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
if (!memcmp(HIBERNATE_SIG,swsusp_header->sig, 10)) {
memcpy(swsusp_header->sig,swsusp_header->orig_sig, 10);
- error = hib_bio_write_page(swsusp_resume_block,
+ error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
swsusp_header, NULL);
} else {
printk(KERN_ERR "PM: Cannot find swsusp signature!\n");
diff --git a/mm/page_io.c b/mm/page_io.c
index 6424869..520baa4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -69,7 +69,7 @@ void end_swap_bio_write(struct bio *bio, int err)
bio_put(bio);
}

-void end_swap_bio_read(struct bio *bio, int err)
+static void end_swap_bio_read(struct bio *bio, int err)
{
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct page *page = bio->bi_io_vec[0].bv_page;
--
1.9.1

2015-04-25 06:23:57

by Ming Lin

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

On Fri, Apr 24, 2015 at 9:17 AM, Christoph Hellwig <[email protected]> wrote:
> On Thu, Apr 23, 2015 at 04:04:35PM -0700, Ming Lin wrote:
>> From: Kent Overstreet <[email protected]>
>>
>> Make bio submission in kernel/power/block_io.c to properly submit
>> bios also when bio_chain is not available. In that case, it's not
>> necessary to handle refcount with bio_get(), but it's saner to simply
>> call a predefined helper submit_bio_wait(). So call bio_get() only
>> when bio_chain is given.
>
> The patch looks correct, buth that whole code is a f****ing mess.
>
> For one it really shouldn't mess with pages states nor abuse
> end_swap_bio_read.
>
> Something like the untested patch below should do it, but it really
> needs some solid testing from people that know how to even exercise
> it.

Test suspend-to-disk in qemu-kvm, it works well.

Lv,
Do you have some kind of suspend-to-disk auto tests?

Thanks,
Ming

>
> ---
> From af56dde07c34b203f5c40c8864bfd55697b0aad0 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <[email protected]>
> Date: Fri, 24 Apr 2015 11:26:00 +0200
> Subject: suspend: sane block I/O handling
>
> stop abusing struct page functionality and the swap end_io handler, and
> instead add a modified version of the blk-lib.c bio_batch helpers.
>
> Also move the block I/O code into swap.c as they are directly tied into
> each other.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> include/linux/swap.h | 1 -
> kernel/power/Makefile | 3 +-
> kernel/power/block_io.c | 103 -------------------------------
> kernel/power/power.h | 9 ---
> kernel/power/swap.c | 159 ++++++++++++++++++++++++++++++++++++------------
> mm/page_io.c | 2 +-
> 6 files changed, 122 insertions(+), 155 deletions(-)
> delete mode 100644 kernel/power/block_io.c
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cee108c..3887472 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -377,7 +377,6 @@ extern void end_swap_bio_write(struct bio *bio, int err);
> extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
> void (*end_write_func)(struct bio *, int));
> extern int swap_set_page_dirty(struct page *page);
> -extern void end_swap_bio_read(struct bio *bio, int err);
>
> int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
> unsigned long nr_pages, sector_t start_block);
> diff --git a/kernel/power/Makefile b/kernel/power/Makefile
> index 29472bf..cb880a1 100644
> --- a/kernel/power/Makefile
> +++ b/kernel/power/Makefile
> @@ -7,8 +7,7 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += console.o
> obj-$(CONFIG_FREEZER) += process.o
> obj-$(CONFIG_SUSPEND) += suspend.o
> obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
> -obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o \
> - block_io.o
> +obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o
> obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
> obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o
>
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> deleted file mode 100644
> index 9a58bc2..0000000
> --- a/kernel/power/block_io.c
> +++ /dev/null
> @@ -1,103 +0,0 @@
> -/*
> - * This file provides functions for block I/O operations on swap/file.
> - *
> - * Copyright (C) 1998,2001-2005 Pavel Machek <[email protected]>
> - * Copyright (C) 2006 Rafael J. Wysocki <[email protected]>
> - *
> - * This file is released under the GPLv2.
> - */
> -
> -#include <linux/bio.h>
> -#include <linux/kernel.h>
> -#include <linux/pagemap.h>
> -#include <linux/swap.h>
> -
> -#include "power.h"
> -
> -/**
> - * submit - submit BIO request.
> - * @rw: READ or WRITE.
> - * @off physical offset of page.
> - * @page: page we're reading or writing.
> - * @bio_chain: list of pending biod (for async reading)
> - *
> - * Straight from the textbook - allocate and initialize the bio.
> - * If we're reading, make sure the page is marked as dirty.
> - * Then submit it and, if @bio_chain == NULL, wait.
> - */
> -static int submit(int rw, struct block_device *bdev, sector_t sector,
> - struct page *page, struct bio **bio_chain)
> -{
> - const int bio_rw = rw | REQ_SYNC;
> - struct bio *bio;
> -
> - bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
> - bio->bi_iter.bi_sector = sector;
> - bio->bi_bdev = bdev;
> - bio->bi_end_io = end_swap_bio_read;
> -
> - if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> - printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
> - (unsigned long long)sector);
> - bio_put(bio);
> - return -EFAULT;
> - }
> -
> - lock_page(page);
> - bio_get(bio);
> -
> - if (bio_chain == NULL) {
> - submit_bio(bio_rw, bio);
> - wait_on_page_locked(page);
> - if (rw == READ)
> - bio_set_pages_dirty(bio);
> - bio_put(bio);
> - } else {
> - if (rw == READ)
> - get_page(page); /* These pages are freed later */
> - bio->bi_private = *bio_chain;
> - *bio_chain = bio;
> - submit_bio(bio_rw, bio);
> - }
> - return 0;
> -}
> -
> -int hib_bio_read_page(pgoff_t page_off, void *addr, struct bio **bio_chain)
> -{
> - return submit(READ, hib_resume_bdev, page_off * (PAGE_SIZE >> 9),
> - virt_to_page(addr), bio_chain);
> -}
> -
> -int hib_bio_write_page(pgoff_t page_off, void *addr, struct bio **bio_chain)
> -{
> - return submit(WRITE, hib_resume_bdev, page_off * (PAGE_SIZE >> 9),
> - virt_to_page(addr), bio_chain);
> -}
> -
> -int hib_wait_on_bio_chain(struct bio **bio_chain)
> -{
> - struct bio *bio;
> - struct bio *next_bio;
> - int ret = 0;
> -
> - if (bio_chain == NULL)
> - return 0;
> -
> - bio = *bio_chain;
> - if (bio == NULL)
> - return 0;
> - while (bio) {
> - struct page *page;
> -
> - next_bio = bio->bi_private;
> - page = bio->bi_io_vec[0].bv_page;
> - wait_on_page_locked(page);
> - if (!PageUptodate(page) || PageError(page))
> - ret = -EIO;
> - put_page(page);
> - bio_put(bio);
> - bio = next_bio;
> - }
> - *bio_chain = NULL;
> - return ret;
> -}
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index ce9b832..caadb56 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -163,15 +163,6 @@ extern void swsusp_close(fmode_t);
> extern int swsusp_unmark(void);
> #endif
>
> -/* kernel/power/block_io.c */
> -extern struct block_device *hib_resume_bdev;
> -
> -extern int hib_bio_read_page(pgoff_t page_off, void *addr,
> - struct bio **bio_chain);
> -extern int hib_bio_write_page(pgoff_t page_off, void *addr,
> - struct bio **bio_chain);
> -extern int hib_wait_on_bio_chain(struct bio **bio_chain);
> -
> struct timeval;
> /* kernel/power/swsusp.c */
> extern void swsusp_show_speed(ktime_t, ktime_t, unsigned int, char *);
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index 570aff8..8a0b64d 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -212,7 +212,84 @@ int swsusp_swap_in_use(void)
> */
>
> static unsigned short root_swap = 0xffff;
> -struct block_device *hib_resume_bdev;
> +static struct block_device *hib_resume_bdev;
> +
> +struct hib_bio_batch {
> + atomic_t count;
> + wait_queue_head_t wait;
> + int error;
> +};
> +
> +static void hib_init_batch(struct hib_bio_batch *hb)
> +{
> + atomic_set(&hb->count, 0);
> + init_waitqueue_head(&hb->wait);
> + hb->error = 0;
> +}
> +
> +static void hib_end_io(struct bio *bio, int error)
> +{
> + struct hib_bio_batch *hb = bio->bi_private;
> + const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> + struct page *page = bio->bi_io_vec[0].bv_page;
> +
> + if (!uptodate || error) {
> + printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
> + imajor(bio->bi_bdev->bd_inode),
> + iminor(bio->bi_bdev->bd_inode),
> + (unsigned long long)bio->bi_iter.bi_sector);
> +
> + if (!error)
> + error = -EIO;
> + }
> +
> + if (bio_data_dir(bio) == WRITE)
> + put_page(page);
> +
> + if (error && !hb->error)
> + hb->error = error;
> + if (atomic_dec_and_test(&hb->count))
> + wake_up(&hb->wait);
> +
> + bio_put(bio);
> +}
> +
> +static int hib_submit_io(int rw, pgoff_t page_off, void *addr,
> + struct hib_bio_batch *hb)
> +{
> + struct page *page = virt_to_page(addr);
> + struct bio *bio;
> + int error = 0;
> +
> + bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
> + bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
> + bio->bi_bdev = hib_resume_bdev;
> +
> + if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> + printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
> + (unsigned long long)bio->bi_iter.bi_sector);
> + bio_put(bio);
> + return -EFAULT;
> + }
> +
> + if (hb) {
> + bio->bi_end_io = hib_end_io;
> + bio->bi_private = hb;
> + atomic_inc(&hb->count);
> + submit_bio(rw, bio);
> + } else {
> + error = submit_bio_wait(rw, bio);
> + bio_put(bio);
> + }
> +
> + return error;
> +}
> +
> +static int hib_wait_io(struct hib_bio_batch *hb)
> +{
> + wait_event(hb->wait, atomic_read(&hb->count) == 0);
> + return hb->error;
> +}
>
> /*
> * Saving part
> @@ -222,7 +299,7 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
> {
> int error;
>
> - hib_bio_read_page(swsusp_resume_block, swsusp_header, NULL);
> + hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
> if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
> !memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
> memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
> @@ -231,7 +308,7 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
> swsusp_header->flags = flags;
> if (flags & SF_CRC32_MODE)
> swsusp_header->crc32 = handle->crc32;
> - error = hib_bio_write_page(swsusp_resume_block,
> + error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
> swsusp_header, NULL);
> } else {
> printk(KERN_ERR "PM: Swap header not found!\n");
> @@ -271,10 +348,10 @@ static int swsusp_swap_check(void)
> * write_page - Write one page to given swap location.
> * @buf: Address we're writing.
> * @offset: Offset of the swap page we're writing to.
> - * @bio_chain: Link the next write BIO here
> + * @hb: bio completion batch
> */
>
> -static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
> +static int write_page(void *buf, sector_t offset, struct hib_bio_batch *hb)
> {
> void *src;
> int ret;
> @@ -282,13 +359,13 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
> if (!offset)
> return -ENOSPC;
>
> - if (bio_chain) {
> + if (hb) {
> src = (void *)__get_free_page(__GFP_WAIT | __GFP_NOWARN |
> __GFP_NORETRY);
> if (src) {
> copy_page(src, buf);
> } else {
> - ret = hib_wait_on_bio_chain(bio_chain); /* Free pages */
> + ret = hib_wait_io(hb); /* Free pages */
> if (ret)
> return ret;
> src = (void *)__get_free_page(__GFP_WAIT |
> @@ -298,14 +375,14 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
> copy_page(src, buf);
> } else {
> WARN_ON_ONCE(1);
> - bio_chain = NULL; /* Go synchronous */
> + hb = NULL; /* Go synchronous */
> src = buf;
> }
> }
> } else {
> src = buf;
> }
> - return hib_bio_write_page(offset, src, bio_chain);
> + return hib_submit_io(WRITE_SYNC, offset, src, hb);
> }
>
> static void release_swap_writer(struct swap_map_handle *handle)
> @@ -348,7 +425,7 @@ err_close:
> }
>
> static int swap_write_page(struct swap_map_handle *handle, void *buf,
> - struct bio **bio_chain)
> + struct hib_bio_batch *hb)
> {
> int error = 0;
> sector_t offset;
> @@ -356,7 +433,7 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
> if (!handle->cur)
> return -EINVAL;
> offset = alloc_swapdev_block(root_swap);
> - error = write_page(buf, offset, bio_chain);
> + error = write_page(buf, offset, hb);
> if (error)
> return error;
> handle->cur->entries[handle->k++] = offset;
> @@ -365,15 +442,15 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
> if (!offset)
> return -ENOSPC;
> handle->cur->next_swap = offset;
> - error = write_page(handle->cur, handle->cur_swap, bio_chain);
> + error = write_page(handle->cur, handle->cur_swap, hb);
> if (error)
> goto out;
> clear_page(handle->cur);
> handle->cur_swap = offset;
> handle->k = 0;
>
> - if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
> - error = hib_wait_on_bio_chain(bio_chain);
> + if (hb && low_free_pages() <= handle->reqd_free_pages) {
> + error = hib_wait_io(hb);
> if (error)
> goto out;
> /*
> @@ -445,23 +522,24 @@ static int save_image(struct swap_map_handle *handle,
> int ret;
> int nr_pages;
> int err2;
> - struct bio *bio;
> + struct hib_bio_batch hb;
> ktime_t start;
> ktime_t stop;
>
> + hib_init_batch(&hb);
> +
> printk(KERN_INFO "PM: Saving image data pages (%u pages)...\n",
> nr_to_write);
> m = nr_to_write / 10;
> if (!m)
> m = 1;
> nr_pages = 0;
> - bio = NULL;
> start = ktime_get();
> while (1) {
> ret = snapshot_read_next(snapshot);
> if (ret <= 0)
> break;
> - ret = swap_write_page(handle, data_of(*snapshot), &bio);
> + ret = swap_write_page(handle, data_of(*snapshot), &hb);
> if (ret)
> break;
> if (!(nr_pages % m))
> @@ -469,7 +547,7 @@ static int save_image(struct swap_map_handle *handle,
> nr_pages / m * 10);
> nr_pages++;
> }
> - err2 = hib_wait_on_bio_chain(&bio);
> + err2 = hib_wait_io(&hb);
> stop = ktime_get();
> if (!ret)
> ret = err2;
> @@ -580,7 +658,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
> int ret = 0;
> int nr_pages;
> int err2;
> - struct bio *bio;
> + struct hib_bio_batch hb;
> ktime_t start;
> ktime_t stop;
> size_t off;
> @@ -589,6 +667,8 @@ static int save_image_lzo(struct swap_map_handle *handle,
> struct cmp_data *data = NULL;
> struct crc_data *crc = NULL;
>
> + hib_init_batch(&hb);
> +
> /*
> * We'll limit the number of threads for compression to limit memory
> * footprint.
> @@ -674,7 +754,6 @@ static int save_image_lzo(struct swap_map_handle *handle,
> if (!m)
> m = 1;
> nr_pages = 0;
> - bio = NULL;
> start = ktime_get();
> for (;;) {
> for (thr = 0; thr < nr_threads; thr++) {
> @@ -748,7 +827,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
> off += PAGE_SIZE) {
> memcpy(page, data[thr].cmp + off, PAGE_SIZE);
>
> - ret = swap_write_page(handle, page, &bio);
> + ret = swap_write_page(handle, page, &hb);
> if (ret)
> goto out_finish;
> }
> @@ -759,7 +838,7 @@ static int save_image_lzo(struct swap_map_handle *handle,
> }
>
> out_finish:
> - err2 = hib_wait_on_bio_chain(&bio);
> + err2 = hib_wait_io(&hb);
> stop = ktime_get();
> if (!ret)
> ret = err2;
> @@ -906,7 +985,7 @@ static int get_swap_reader(struct swap_map_handle *handle,
> return -ENOMEM;
> }
>
> - error = hib_bio_read_page(offset, tmp->map, NULL);
> + error = hib_submit_io(READ_SYNC, offset, tmp->map, NULL);
> if (error) {
> release_swap_reader(handle);
> return error;
> @@ -919,7 +998,7 @@ static int get_swap_reader(struct swap_map_handle *handle,
> }
>
> static int swap_read_page(struct swap_map_handle *handle, void *buf,
> - struct bio **bio_chain)
> + struct hib_bio_batch *hb)
> {
> sector_t offset;
> int error;
> @@ -930,7 +1009,7 @@ static int swap_read_page(struct swap_map_handle *handle, void *buf,
> offset = handle->cur->entries[handle->k];
> if (!offset)
> return -EFAULT;
> - error = hib_bio_read_page(offset, buf, bio_chain);
> + error = hib_submit_io(READ_SYNC, offset, buf, hb);
> if (error)
> return error;
> if (++handle->k >= MAP_PAGE_ENTRIES) {
> @@ -968,27 +1047,28 @@ static int load_image(struct swap_map_handle *handle,
> int ret = 0;
> ktime_t start;
> ktime_t stop;
> - struct bio *bio;
> + struct hib_bio_batch hb;
> int err2;
> unsigned nr_pages;
>
> + hib_init_batch(&hb);
> +
> printk(KERN_INFO "PM: Loading image data pages (%u pages)...\n",
> nr_to_read);
> m = nr_to_read / 10;
> if (!m)
> m = 1;
> nr_pages = 0;
> - bio = NULL;
> start = ktime_get();
> for ( ; ; ) {
> ret = snapshot_write_next(snapshot);
> if (ret <= 0)
> break;
> - ret = swap_read_page(handle, data_of(*snapshot), &bio);
> + ret = swap_read_page(handle, data_of(*snapshot), &hb);
> if (ret)
> break;
> if (snapshot->sync_read)
> - ret = hib_wait_on_bio_chain(&bio);
> + ret = hib_wait_io(&hb);
> if (ret)
> break;
> if (!(nr_pages % m))
> @@ -996,7 +1076,7 @@ static int load_image(struct swap_map_handle *handle,
> nr_pages / m * 10);
> nr_pages++;
> }
> - err2 = hib_wait_on_bio_chain(&bio);
> + err2 = hib_wait_io(&hb);
> stop = ktime_get();
> if (!ret)
> ret = err2;
> @@ -1067,7 +1147,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
> unsigned int m;
> int ret = 0;
> int eof = 0;
> - struct bio *bio;
> + struct hib_bio_batch hb;
> ktime_t start;
> ktime_t stop;
> unsigned nr_pages;
> @@ -1080,6 +1160,8 @@ static int load_image_lzo(struct swap_map_handle *handle,
> struct dec_data *data = NULL;
> struct crc_data *crc = NULL;
>
> + hib_init_batch(&hb);
> +
> /*
> * We'll limit the number of threads for decompression to limit memory
> * footprint.
> @@ -1190,7 +1272,6 @@ static int load_image_lzo(struct swap_map_handle *handle,
> if (!m)
> m = 1;
> nr_pages = 0;
> - bio = NULL;
> start = ktime_get();
>
> ret = snapshot_write_next(snapshot);
> @@ -1199,7 +1280,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
>
> for(;;) {
> for (i = 0; !eof && i < want; i++) {
> - ret = swap_read_page(handle, page[ring], &bio);
> + ret = swap_read_page(handle, page[ring], &hb);
> if (ret) {
> /*
> * On real read error, finish. On end of data,
> @@ -1226,7 +1307,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
> if (!asked)
> break;
>
> - ret = hib_wait_on_bio_chain(&bio);
> + ret = hib_wait_io(&hb);
> if (ret)
> goto out_finish;
> have += asked;
> @@ -1281,7 +1362,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
> * Wait for more data while we are decompressing.
> */
> if (have < LZO_CMP_PAGES && asked) {
> - ret = hib_wait_on_bio_chain(&bio);
> + ret = hib_wait_io(&hb);
> if (ret)
> goto out_finish;
> have += asked;
> @@ -1430,7 +1511,7 @@ int swsusp_check(void)
> if (!IS_ERR(hib_resume_bdev)) {
> set_blocksize(hib_resume_bdev, PAGE_SIZE);
> clear_page(swsusp_header);
> - error = hib_bio_read_page(swsusp_resume_block,
> + error = hib_submit_io(READ_SYNC, swsusp_resume_block,
> swsusp_header, NULL);
> if (error)
> goto put;
> @@ -1438,7 +1519,7 @@ int swsusp_check(void)
> if (!memcmp(HIBERNATE_SIG, swsusp_header->sig, 10)) {
> memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
> /* Reset swap signature now */
> - error = hib_bio_write_page(swsusp_resume_block,
> + error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
> swsusp_header, NULL);
> } else {
> error = -EINVAL;
> @@ -1482,10 +1563,10 @@ int swsusp_unmark(void)
> {
> int error;
>
> - hib_bio_read_page(swsusp_resume_block, swsusp_header, NULL);
> + hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
> if (!memcmp(HIBERNATE_SIG,swsusp_header->sig, 10)) {
> memcpy(swsusp_header->sig,swsusp_header->orig_sig, 10);
> - error = hib_bio_write_page(swsusp_resume_block,
> + error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
> swsusp_header, NULL);
> } else {
> printk(KERN_ERR "PM: Cannot find swsusp signature!\n");
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 6424869..520baa4 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -69,7 +69,7 @@ void end_swap_bio_write(struct bio *bio, int err)
> bio_put(bio);
> }
>
> -void end_swap_bio_read(struct bio *bio, int err)
> +static void end_swap_bio_read(struct bio *bio, int err)
> {
> const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
> struct page *page = bio->bi_io_vec[0].bv_page;
> --
> 1.9.1
>

2015-04-30 17:34:52

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

Hi!

> >From af56dde07c34b203f5c40c8864bfd55697b0aad0 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <[email protected]>
> Date: Fri, 24 Apr 2015 11:26:00 +0200
> Subject: suspend: sane block I/O handling
>
> stop abusing struct page functionality and the swap end_io handler, and
> instead add a modified version of the blk-lib.c bio_batch helpers.
>
> Also move the block I/O code into swap.c as they are directly tied into
> each other.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

Tested-by: Pavel Machek <[email protected]>
Acked-by: Pavel Machek <[email protected]>

(thinkpad x60 in shutdown mode, platform mode has some problems, but
they are probably not related).

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-05-01 17:22:01

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

On Thu, Apr 30, 2015 at 07:34:41PM +0200, Pavel Machek wrote:
> > Signed-off-by: Christoph Hellwig <[email protected]>
>
> Tested-by: Pavel Machek <[email protected]>
> Acked-by: Pavel Machek <[email protected]>
>
> (thinkpad x60 in shutdown mode, platform mode has some problems, but
> they are probably not related).

Thanks! Rafeal, do you want to take this through the pm tree, or can we
include it with the block patches?

2015-05-04 13:49:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

On Friday, May 01, 2015 10:21:56 AM Christoph Hellwig wrote:
> On Thu, Apr 30, 2015 at 07:34:41PM +0200, Pavel Machek wrote:
> > > Signed-off-by: Christoph Hellwig <[email protected]>
> >
> > Tested-by: Pavel Machek <[email protected]>
> > Acked-by: Pavel Machek <[email protected]>
> >
> > (thinkpad x60 in shutdown mode, platform mode has some problems, but
> > they are probably not related).
>
> Thanks! Rafeal, do you want to take this through the pm tree, or can we
> include it with the block patches?


Both work for me and I'm not aware of any dependencies on the PM side.

So, I guess, please go ahead and take it with the block patches. Feel free
to add my ACK to it too.

Rafael

2015-05-05 09:23:49

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] PM: submit bio in a sane way in cases without bio_chain

On Thu 2015-04-30 18:47:17, Pavel Machek wrote:
> Hi!
>
> > >From af56dde07c34b203f5c40c8864bfd55697b0aad0 Mon Sep 17 00:00:00 2001
> > From: Christoph Hellwig <[email protected]>
> > Date: Fri, 24 Apr 2015 11:26:00 +0200
> > Subject: suspend: sane block I/O handling
> >
> > stop abusing struct page functionality and the swap end_io handler, and
> > instead add a modified version of the blk-lib.c bio_batch helpers.
> >
> > Also move the block I/O code into swap.c as they are directly tied into
> > each other.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
>
> Tested-by: Pavel Machek <[email protected]>
> Acked-by: Pavel Machek <[email protected]>
>
> (thinkpad x60 in shutdown mode, platform mode has some problems, but
> they are probably not related).

There seems to be extra trailing whitespace in this function:

+static void hib_end_io(struct bio *bio, int error)
+{
+ struct hib_bio_batch *hb = bio->bi_private;
+ const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ struct page *page = bio->bi_io_vec[0].bv_page;
+
+ if (!uptodate || error) {
+ printk(KERN_ALERT "Read-error on swap-device
(%u:%u:%Lu)\n",
+ imajor(bio->bi_bdev->bd_inode),
+ iminor(bio->bi_bdev->bd_inode),
+ (unsigned long
long)bio->bi_iter.bi_sector);
+
+ if (!error)
+ error = -EIO;
+ }
+
+ if (bio_data_dir(bio) == WRITE)
+ put_page(page);


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html