2007-10-26 07:16:53

by Sebastian Siewior

Subject: [BUG] panic after umount (biscted)

After umount of
/dev/md/1 on /mnt/sec type xfs (rw,nosuid,nodev,noatime,nodiratime)

I end up with [1] on v2.6.24-rc1. It worked fine with v2.6.23. Bisecting
came to the conclusion that

|fd5d806266935179deda1502101624832eacd01f is first bad commit
|commit fd5d806266935179deda1502101624832eacd01f
|Author: Jens Axboe <[email protected]>
|Date: Tue Oct 16 11:05:02 2007 +0200
|
| block: convert blkdev_issue_flush() to use empty barriers
|
| Then we can get rid of ->issue_flush_fn() and all the driver private
| implementations of that.
|
| Signed-off-by: Jens Axboe <[email protected]>
|
|:040000 040000 75f5c511fd785fa4c75b955f1818c382593a22ec adf460e288740ca9c0b5ab823dfc2ca04980f3d2 M block
|:040000 040000 a972fe9a2433a00c79fd5ead584c482cd08b4b2c 0f4534094b63fe21c5cf8a6170736abe5c518c00 M drivers
|:040000 040000 ba2a9f681826db89ba5192a7db7139baecde786a fc68a77a41d25c283f39f712216566e9b36d00fc M include

is the bad boy here.
The last panic (with the first bad commit) is [2] (there is almost no
difference). See also my .config [3], lspci output [4], and bisect log [5].
The md1 array is a degraded raid1:

|Personalities : [raid1]
|md1 : active raid1 sda2[1]
| 174843328 blocks [2/1] [_U]
|

I hope this was useful. Now I'm going to rebuild my RAID....

[1] http://download.breakpoint.cc/bug/bug_rc1.jpeg 166 KiB
[2] http://download.breakpoint.cc/bug/bug_last_commit.jpeg 217 KiB
[3] http://download.breakpoint.cc/bug/config.txt 27 KiB
[4] http://download.breakpoint.cc/bug/lspci.txt 2.3 KiB
[5] http://download.breakpoint.cc/bug/git-bisect-log.txt 2.3K

Sebastian


2007-10-26 09:29:27

by Jens Axboe

Subject: Re: [BUG] panic after umount (biscted)

On Fri, Oct 26 2007, Sebastian Siewior wrote:
> After umount of
> /dev/md/1 on /mnt/sec type xfs (rw,nosuid,nodev,noatime,nodiratime)
>
> I end up with [1] on v2.6.24-rc1. It worked fine with v2.6.23. Bisecting
> came to the conclusion that
>
> |fd5d806266935179deda1502101624832eacd01f is first bad commit
> |commit fd5d806266935179deda1502101624832eacd01f
> |Author: Jens Axboe <[email protected]>
> |Date: Tue Oct 16 11:05:02 2007 +0200
> |
> | block: convert blkdev_issue_flush() to use empty barriers
> |
> | Then we can get rid of ->issue_flush_fn() and all the driver private
> | implementations of that.
> |
> | Signed-off-by: Jens Axboe <[email protected]>
> |
> |:040000 040000 75f5c511fd785fa4c75b955f1818c382593a22ec adf460e288740ca9c0b5ab823dfc2ca04980f3d2 M block
> |:040000 040000 a972fe9a2433a00c79fd5ead584c482cd08b4b2c 0f4534094b63fe21c5cf8a6170736abe5c518c00 M drivers
> |:040000 040000 ba2a9f681826db89ba5192a7db7139baecde786a fc68a77a41d25c283f39f712216566e9b36d00fc M include
>
> is the bad boy here.
> The last panic (with the first bad commit) is [2] (there is almost no
> difference). See also my .config [3], lspci output [4], and bisect log [5].
> The md1 array is a degraded raid1:
>
> |Personalities : [raid1]
> |md1 : active raid1 sda2[1]
> | 174843328 blocks [2/1] [_U]
> |
>
> I hope this was useful. Now I'm going to rebuild my RAID....
>
> [1] http://download.breakpoint.cc/bug/bug_rc1.jpeg 166 KiB
> [2] http://download.breakpoint.cc/bug/bug_last_commit.jpeg 217 KiB
> [3] http://download.breakpoint.cc/bug/config.txt 27 KiB
> [4] http://download.breakpoint.cc/bug/lspci.txt 2.3 KiB
> [5] http://download.breakpoint.cc/bug/git-bisect-log.txt 2.3K

Thanks a lot for the full report on this issue. Will get this fixed up ASAP.

--
Jens Axboe

2007-10-26 09:34:44

by Jens Axboe

Subject: Re: [BUG] panic after umount (biscted)

On Fri, Oct 26 2007, Jens Axboe wrote:
> On Fri, Oct 26 2007, Sebastian Siewior wrote:
> > After umount of
> > /dev/md/1 on /mnt/sec type xfs (rw,nosuid,nodev,noatime,nodiratime)
> >
> > I end up with [1] on v2.6.24-rc1. It worked fine with v2.6.23. Bisecting
> > came to the conclusion that
> >
> > |fd5d806266935179deda1502101624832eacd01f is first bad commit
> > |commit fd5d806266935179deda1502101624832eacd01f
> > |Author: Jens Axboe <[email protected]>
> > |Date: Tue Oct 16 11:05:02 2007 +0200
> > |
> > | block: convert blkdev_issue_flush() to use empty barriers
> > |
> > | Then we can get rid of ->issue_flush_fn() and all the driver private
> > | implementations of that.
> > |
> > | Signed-off-by: Jens Axboe <[email protected]>
> > |
> > |:040000 040000 75f5c511fd785fa4c75b955f1818c382593a22ec adf460e288740ca9c0b5ab823dfc2ca04980f3d2 M block
> > |:040000 040000 a972fe9a2433a00c79fd5ead584c482cd08b4b2c 0f4534094b63fe21c5cf8a6170736abe5c518c00 M drivers
> > |:040000 040000 ba2a9f681826db89ba5192a7db7139baecde786a fc68a77a41d25c283f39f712216566e9b36d00fc M include
> >
> > is the bad boy here.
> > The last panic (with the first bad commit) is [2] (there is almost no
> > difference). See also my .config [3], lspci output [4], and bisect log [5].
> > The md1 array is a degraded raid1:
> >
> > |Personalities : [raid1]
> > |md1 : active raid1 sda2[1]
> > | 174843328 blocks [2/1] [_U]
> > |
> >
> > I hope this was useful. Now I'm going to rebuild my RAID....
> >
> > [1] http://download.breakpoint.cc/bug/bug_rc1.jpeg 166 KiB
> > [2] http://download.breakpoint.cc/bug/bug_last_commit.jpeg 217 KiB
> > [3] http://download.breakpoint.cc/bug/config.txt 27 KiB
> > [4] http://download.breakpoint.cc/bug/lspci.txt 2.3 KiB
> > [5] http://download.breakpoint.cc/bug/git-bisect-log.txt 2.3K
>
> Thanks a lot for the full report on this issue. Will get this fixed up ASAP.

Does this work?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..cf47fcb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	 * kmapping pages)
 	 */
 	cmd->use_sg = req->nr_phys_segments;
+	if (!cmd->use_sg)
+		return 0;

 	/*
 	 * If sg table allocation fails, requeue request later.
@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 	if (req->bio) {
 		int ret;

-		BUG_ON(!req->nr_phys_segments);
+		BUG_ON(!req->nr_phys_segments && req->bio->bi_size);

 		ret = scsi_init_io(cmd);
 		if (unlikely(ret))

--
Jens Axboe
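
For readers following along, the two checks touched by the patch above can be modelled in plain userspace C. This is only a sketch: the `model_*` structs and functions are hypothetical stand-ins that keep just the fields the patch inspects, and they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields the
 * patch looks at are modelled here. */
struct model_bio {
	unsigned int bi_size;			/* bytes of data carried by the bio */
};

struct model_request {
	struct model_bio *bio;
	unsigned short nr_phys_segments;	/* scatter-gather segments to map */
};

/* Mirrors the patched BUG_ON() condition in scsi_setup_blk_pc_cmnd():
 * a request without segments only counts as a bug if its bio claims to
 * carry data. */
static bool model_pc_request_is_bug(const struct model_request *req)
{
	return !req->nr_phys_segments && req->bio->bi_size;
}

/* Mirrors the early return added to scsi_init_io(): with zero segments
 * there is nothing to map, so no scatter-gather table is needed. */
static bool model_needs_sg_table(const struct model_request *req)
{
	return req->nr_phys_segments != 0;
}
```

With this model, a zero-segment, zero-size request (the shape of an empty barrier) is neither flagged as a bug nor sent down the scatter-gather path, while a zero-segment request whose bio still reports a size is flagged.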

2007-10-26 11:35:33

by Sebastian Siewior

Subject: Re: [BUG] panic after umount (biscted)

* Jens Axboe | 2007-10-26 11:32:42 [+0200]:

>On Fri, Oct 26 2007, Jens Axboe wrote:
>> >
>> > I hope this was useful. Now I'm going to rebuild my RAID....
>> >
>> Thanks a lot for the full report on this issue. Will get this fixed up ASAP.
No problem, thanks for working on that :)

>
>Does this work?
>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..cf47fcb 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
> * kmapping pages)
> */
> cmd->use_sg = req->nr_phys_segments;
>+ if (!cmd->use_sg)
>+ return 0;
>
> /*
> * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> if (req->bio) {
> int ret;
>
>- BUG_ON(!req->nr_phys_segments);
>+ BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
> ret = scsi_init_io(cmd);
> if (unlikely(ret))
>

Nope. I get [1] on a manual umount and [2] on system reboot. This is
v2.6.24-rc1 with this patch on top.

[1] http://download.breakpoint.cc/bug/bug_rc1_patch_manual.jpeg 163 KiB
[2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB
>--
>Jens Axboe

Sebastian

2007-10-26 11:44:35

by Jens Axboe

Subject: Re: [BUG] panic after umount (biscted)

On Fri, Oct 26 2007, Sebastian Siewior wrote:
> * Jens Axboe | 2007-10-26 11:32:42 [+0200]:
>
> >On Fri, Oct 26 2007, Jens Axboe wrote:
> >> >
> >> > I hope this was useful. Now I'm going to rebuild my RAID....
> >> >
> >> Thanks a lot for the full report on this issue. Will get this fixed up ASAP.
> No problem, thanks for working on that :)
>
> >
> >Does this work?
> >
> >diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >index 61fdaf0..cf47fcb 100644
> >--- a/drivers/scsi/scsi_lib.c
> >+++ b/drivers/scsi/scsi_lib.c
> >@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
> > * kmapping pages)
> > */
> > cmd->use_sg = req->nr_phys_segments;
> >+ if (!cmd->use_sg)
> >+ return 0;
> >
> > /*
> > * If sg table allocation fails, requeue request later.
> >@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> > if (req->bio) {
> > int ret;
> >
> >- BUG_ON(!req->nr_phys_segments);
> >+ BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
> >
> > ret = scsi_init_io(cmd);
> > if (unlikely(ret))
> >
>
> Nope. I get [1] on a manual umount and [2] on system reboot. This is
> v2.6.24-rc1 with this patch on top.
>
> [1] http://download.breakpoint.cc/bug/bug_rc1_patch_manual.jpeg 163 KiB
> [2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB

Ah, a second BUG() for the same issue. Can you try this one?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..57fde7b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	 * kmapping pages)
 	 */
 	cmd->use_sg = req->nr_phys_segments;
+	if (!cmd->use_sg)
+		return 0;

 	/*
 	 * If sg table allocation fails, requeue request later.
@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 	if (req->bio) {
 		int ret;

-		BUG_ON(!req->nr_phys_segments);
+		BUG_ON(!req->nr_phys_segments && req->bio->bi_size);

 		ret = scsi_init_io(cmd);
 		if (unlikely(ret))
@@ -1236,9 +1238,10 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
 	if (ret != BLKPREP_OK)
 		return ret;
 	/*
-	 * Filesystem requests must transfer data.
+	 * Filesystem requests must transfer data, unless it's an empty
+	 * barrier.
 	 */
-	BUG_ON(!req->nr_phys_segments);
+	BUG_ON(!req->nr_phys_segments && !bio_empty_barrier(req->bio));

 	cmd = scsi_get_cmd_from_req(sdev, req);
 	if (unlikely(!cmd))

--
Jens Axboe
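
The relaxed check in scsi_setup_fs_cmnd() above can be sketched the same way in userspace C. The sketch assumes that bio_empty_barrier() means "barrier flag set and zero payload size"; the `MODEL_RW_BARRIER` bit value and every `model_*` name are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real kernel bit position is not modelled. */
#define MODEL_RW_BARRIER	(1UL << 2)

/* Hypothetical stand-ins keeping only the fields the check inspects. */
struct model_bio {
	unsigned long bi_rw;			/* read/write and barrier flags */
	unsigned int bi_size;			/* bytes of data carried by the bio */
};

struct model_request {
	struct model_bio *bio;
	unsigned short nr_phys_segments;	/* scatter-gather segments to map */
};

/* Assumed meaning of bio_empty_barrier(): a barrier with no payload. */
static bool model_bio_empty_barrier(const struct model_bio *bio)
{
	return (bio->bi_rw & MODEL_RW_BARRIER) && bio->bi_size == 0;
}

/* Mirrors the patched BUG_ON() condition in scsi_setup_fs_cmnd():
 * a filesystem request without segments is tolerated only if it is an
 * empty barrier. */
static bool model_fs_request_is_bug(const struct model_request *req)
{
	return !req->nr_phys_segments && !model_bio_empty_barrier(req->bio);
}
```

Under these assumptions an empty barrier passes the check, while a zero-segment request that either lacks the barrier flag or still reports a payload size would trip it.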

2007-10-27 10:44:47

by Sebastian Siewior

Subject: Re: [BUG] panic after umount (biscted)

* Jens Axboe | 2007-10-26 13:42:30 [+0200]:

>> [2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB
>
>Ah, a second BUG() for the same issue. Can you try this one?
>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..57fde7b 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
> * kmapping pages)
> */
> cmd->use_sg = req->nr_phys_segments;
>+ if (!cmd->use_sg)
>+ return 0;
>
> /*
> * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> if (req->bio) {
> int ret;
>
>- BUG_ON(!req->nr_phys_segments);
>+ BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
> ret = scsi_init_io(cmd);
> if (unlikely(ret))
>@@ -1236,9 +1238,10 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
> if (ret != BLKPREP_OK)
> return ret;
> /*
>- * Filesystem requests must transfer data.
>+ * Filesystem requests must transfer data, unless it's an empty
>+ * barrier.
> */
>- BUG_ON(!req->nr_phys_segments);
>+ BUG_ON(!req->nr_phys_segments && !bio_empty_barrier(req->bio));
>
> cmd = scsi_get_cmd_from_req(sdev, req);
> if (unlikely(!cmd))
>

I'm afraid you did not make it to the next level. I hope you have
another man :). [1] shows the result. I double checked it, it seems to
be the same bug() with the second patch.

[1] http://download.breakpoint.cc/bug/bug_patch2.jpeg 134 KiB

>Jens Axboe

Sebastian

2007-10-27 11:39:27

by Jens Axboe

Subject: Re: [BUG] panic after umount (biscted)

On Sat, Oct 27 2007, Sebastian Siewior wrote:
> * Jens Axboe | 2007-10-26 13:42:30 [+0200]:
>
> >> [2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB
> >
> >Ah, a second BUG() for the same issue. Can you try this one?
> >
> >diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >index 61fdaf0..57fde7b 100644
> >--- a/drivers/scsi/scsi_lib.c
> >+++ b/drivers/scsi/scsi_lib.c
> >@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
> > * kmapping pages)
> > */
> > cmd->use_sg = req->nr_phys_segments;
> >+ if (!cmd->use_sg)
> >+ return 0;
> >
> > /*
> > * If sg table allocation fails, requeue request later.
> >@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> > if (req->bio) {
> > int ret;
> >
> >- BUG_ON(!req->nr_phys_segments);
> >+ BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
> >
> > ret = scsi_init_io(cmd);
> > if (unlikely(ret))
> >@@ -1236,9 +1238,10 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
> > if (ret != BLKPREP_OK)
> > return ret;
> > /*
> >- * Filesystem requests must transfer data.
> >+ * Filesystem requests must transfer data, unless it's an empty
> >+ * barrier.
> > */
> >- BUG_ON(!req->nr_phys_segments);
> >+ BUG_ON(!req->nr_phys_segments && !bio_empty_barrier(req->bio));
> >
> > cmd = scsi_get_cmd_from_req(sdev, req);
> > if (unlikely(!cmd))
> >
>
> I'm afraid you did not make it to the next level. I hope you have
> another man :). [1] shows the result. I double checked it, it seems to
> be the same bug() with the second patch.
>
> [1] http://download.breakpoint.cc/bug/bug_patch2.jpeg 134 KiB

OK, can you see what this produces?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..4042269 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 	 * kmapping pages)
 	 */
 	cmd->use_sg = req->nr_phys_segments;
+	if (!cmd->use_sg)
+		return 0;

 	/*
 	 * If sg table allocation fails, requeue request later.
@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
 	if (req->bio) {
 		int ret;

-		BUG_ON(!req->nr_phys_segments);
+		BUG_ON(!req->nr_phys_segments && req->bio->bi_size);

 		ret = scsi_init_io(cmd);
 		if (unlikely(ret))
@@ -1236,9 +1238,11 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
 	if (ret != BLKPREP_OK)
 		return ret;
 	/*
-	 * Filesystem requests must transfer data.
+	 * Filesystem requests must transfer data, unless it's an empty
+	 * barrier.
 	 */
-	BUG_ON(!req->nr_phys_segments);
+	if (!req->nr_phys_segments && !bio_empty_barrier(req->bio))
+		blk_dump_rq_flags(req, "scsi");

 	cmd = scsi_get_cmd_from_req(sdev, req);
 	if (unlikely(!cmd))

--
Jens Axboe
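
The debugging step in the patch above, replacing a fatal BUG_ON() with a dump of the offending request so the system keeps running, can be illustrated with a small userspace sketch. The `model_request` fields and the output format of `model_dump_rq()` are hypothetical; they do not reproduce what blk_dump_rq_flags() actually prints.

```c
#include <stdio.h>

/* Hypothetical stand-in for the request fields worth logging. */
struct model_request {
	unsigned int cmd_flags;			/* request flag bits */
	unsigned long long sector;		/* starting sector */
	unsigned int nr_sectors;		/* sectors to transfer */
	unsigned short nr_phys_segments;	/* scatter-gather segments */
};

/* Format a one-line description of the request into buf instead of
 * halting; returns the number of characters snprintf() would write. */
static int model_dump_rq(const struct model_request *rq, const char *tag,
			 char *buf, size_t len)
{
	return snprintf(buf, len,
			"%s: flags=%x sector=%llu nr_sectors=%u segments=%u",
			tag, rq->cmd_flags, rq->sector, rq->nr_sectors,
			(unsigned int)rq->nr_phys_segments);
}
```

The point of the design is that the unexpected request gets reported with its flags and geometry while execution continues, which is what makes it possible to capture the output after umount rather than only a panic screen.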

2007-10-27 17:56:54

by Sebastian Siewior

Subject: Re: [BUG] panic after umount (biscted)

* Jens Axboe | 2007-10-27 13:39:16 [+0200]:

>> [1] http://download.breakpoint.cc/bug/bug_patch2.jpeg 134 KiB
>
>OK, can you see what this produces?
>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..4042269 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
> * kmapping pages)
> */
> cmd->use_sg = req->nr_phys_segments;
>+ if (!cmd->use_sg)
>+ return 0;
>
> /*
> * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> if (req->bio) {
> int ret;
>
>- BUG_ON(!req->nr_phys_segments);
>+ BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
> ret = scsi_init_io(cmd);
> if (unlikely(ret))
>@@ -1236,9 +1238,11 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
> if (ret != BLKPREP_OK)
> return ret;
> /*
>- * Filesystem requests must transfer data.
>+ * Filesystem requests must transfer data, unless it's an empty
>+ * barrier.
> */
>- BUG_ON(!req->nr_phys_segments);
>+ if (!req->nr_phys_segments && !bio_empty_barrier(req->bio))
>+ blk_dump_rq_flags(req, "scsi");
>
> cmd = scsi_get_cmd_from_req(sdev, req);
> if (unlikely(!cmd))

Good, [1] has the dmesg output after umount.

[1] http://download.breakpoint.cc/bug/bug_patch3.jpeg 36 KiB
>--
>Jens Axboe

Sebastian