2014-04-12 11:34:40

by Sander Eikelenboom

Subject: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70

Hi,

I just ran into the oops below after some uptime.

--
Sander

[175753.946560] IP: [<ffffffff814c6bc1>] kobject_put+0x11/0x70
[175753.964484] PGD 0
[175753.982157] Oops: 0000 [#1] SMP
[175753.999575] Modules linked in:
[175754.016705] CPU: 4 PID: 23869 Comm: kworker/u12:3 Not tainted 3.14.0-mw-20140409a+ #1
[175754.033879] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[175754.050839] Workqueue: writeback bdi_writeback_workfn (flush-8:16)
[175754.067560] task: ffff88000f1d91e0 ti: ffff8800046c6000 task.ti: ffff8800046c6000
[175754.084258] RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70
[175754.100764] RSP: e02b:ffff8800046c76f8 EFLAGS: 00010002
[175754.117096] RAX: 0000000000000001 RBX: 00000000000001d0 RCX: 00000000010f5319
[175754.133166] RDX: 00000000010f5318 RSI: ffff88005f717e20 RDI: 00000000000001d0
[175754.149101] RBP: ffff8800046c7708 R08: 0000000000017e20 R09: ffff880057588718
[175754.164706] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[175754.180080] R13: 0000000000000020 R14: 0000000000080000 R15: ffff88002ad154c8
[175754.195262] FS: 00007f1ac8154700(0000) GS:ffff88005f700000(0000) knlGS:0000000000000000
[175754.210318] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[175754.225097] CR2: 000000000000020c CR3: 0000000002210000 CR4: 0000000000000660
[175754.239767] Stack:
[175754.254060] ffff88000835cc40 ffff88000835cc40 ffff8800046c7718 ffffffff816e04b7
[175754.268330] ffff8800046c7768 ffffffff8171002c ffff8800591589c0 ffff8800571da000
[175754.282367] ffff8800046c7768 ffff880059158800 ffff88002ad154c8 0000000000000000
[175754.296180] Call Trace:
[175754.309706] [<ffffffff816e04b7>] put_device+0x17/0x20
[175754.323129] [<ffffffff8171002c>] scsi_init_io+0x10c/0x170
[175754.336275] [<ffffffff81710266>] scsi_setup_fs_cmnd+0x66/0xa0
[175754.349247] [<ffffffff8171bf44>] sd_prep_fn+0x2a4/0xd30
[175754.362023] [<ffffffff817000bd>] ? frontend_changed+0x2dd/0x3e0
[175754.374461] [<ffffffff814a34ee>] blk_peek_request+0xbe/0x220
[175754.386674] [<ffffffff8170fcbd>] ? scsi_request_fn+0x2dd/0x470
[175754.398681] [<ffffffff8170fa21>] scsi_request_fn+0x41/0x470
[175754.410403] [<ffffffff81116e85>] ? lock_acquire+0xe5/0x150
[175754.421913] [<ffffffff8149f8e7>] __blk_run_queue+0x37/0x50
[175754.433239] [<ffffffff814a0a89>] queue_unplugged+0x39/0xb0
[175754.444271] [<ffffffff814a3aab>] blk_flush_plug_list+0x1fb/0x280
[175754.455069] [<ffffffff814a3b48>] blk_finish_plug+0x18/0x50
[175754.465622] [<ffffffff812a1526>] ext4_writepages+0x446/0xd20
[175754.476014] [<ffffffff81114af6>] ? __lock_acquire+0x516/0x2210
[175754.486096] [<ffffffff8119dcd1>] do_writepages+0x21/0x50
[175754.495922] [<ffffffff8121c3e0>] __writeback_single_inode+0x40/0x220
[175754.505550] [<ffffffff8121d911>] writeback_sb_inodes+0x291/0x440
[175754.515007] [<ffffffff8121db5f>] __writeback_inodes_wb+0x9f/0xd0
[175754.524158] [<ffffffff8121ddd3>] wb_writeback+0x243/0x2c0
[175754.533058] [<ffffffff8121e0d8>] bdi_writeback_workfn+0x118/0x480
[175754.541760] [<ffffffff810e607b>] ? process_one_work+0x15b/0x490
[175754.550297] [<ffffffff810e60e5>] process_one_work+0x1c5/0x490
[175754.558544] [<ffffffff810e607b>] ? process_one_work+0x15b/0x490
[175754.566551] [<ffffffff810e73bb>] worker_thread+0x11b/0x370
[175754.574323] [<ffffffff81112dcd>] ? trace_hardirqs_on+0xd/0x10
[175754.581861] [<ffffffff810e72a0>] ? manage_workers.isra.21+0x2b0/0x2b0
[175754.589261] [<ffffffff810ee1a4>] kthread+0xe4/0x100
[175754.596332] [<ffffffff810ee0c0>] ? __init_kthread_worker+0x70/0x70
[175754.603198] [<ffffffff81b97e7c>] ret_from_fork+0x7c/0xb0
[175754.609811] [<ffffffff810ee0c0>] ? __init_kthread_worker+0x70/0x70
[175754.616232] Code: 89 f7 e8 03 19 00 00 0f b6 43 04 eb a2 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 0d <f6> 47 3c 01 74 29 f0 83 6b 38 01 74 0a 48 83 c4 08 5b 5d c3 0f
[175754.629397] RIP [<ffffffff814c6bc1>] kobject_put+0x11/0x70
[175754.635534] RSP <ffff8800046c76f8>
[175754.641515] CR2: 000000000000020c
[175754.647383] ---[ end trace 8a401ccf86be679c ]---


2014-04-14 11:30:18

by Christoph Hellwig

Subject: Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70

On Sat, Apr 12, 2014 at 01:34:31PM +0200, Sander Eikelenboom wrote:
> Hi,
>
> I just ran into the oops below after some uptime.

Classic use after free introduced by my recent changes, sorry.

This should fix it:

---
From: Christoph Hellwig <[email protected]>
Subject: scsi: don't reference freed command in scsi_init_sgtable

When scsi_init_io fails we have to release our device reference, but
we do this trying to reference the just freed command. Add a local
scsi_device pointer to fix this.

Reported-by: Sander Eikelenboom <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65a123d..54eff6a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1044,6 +1044,7 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
  */
 int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 {
+	struct scsi_device *sdev = cmd->device;
 	struct request *rq = cmd->request;

 	int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
@@ -1091,7 +1092,7 @@ err_exit:
 	scsi_release_buffers(cmd);
 	cmd->request->special = NULL;
 	scsi_put_command(cmd);
-	put_device(&cmd->device->sdev_gendev);
+	put_device(&sdev->sdev_gendev);
 	return error;
 }
 EXPORT_SYMBOL(scsi_init_io);

2014-04-14 13:06:20

by Sander Eikelenboom

Subject: Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70


Monday, April 14, 2014, 1:30:15 PM, you wrote:

> On Sat, Apr 12, 2014 at 01:34:31PM +0200, Sander Eikelenboom wrote:
>> Hi,
>>
>> I just ran into the oops below after some uptime.

> Classic use after free introduced by my recent changes, sorry.

> This should fix it:

Thanks!

> ---
> From: Christoph Hellwig <[email protected]>
> Subject: scsi: don't reference freed command in scsi_init_sgtable

> When scsi_init_io fails we have to release our device reference, but
> we do this trying to reference the just freed command. Add a local
> scsi_device pointer to fix this.

> Reported-by: Sander Eikelenboom <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65a123d..54eff6a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1044,6 +1044,7 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
> */
> int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
> {
> + struct scsi_device *sdev = cmd->device;
> struct request *rq = cmd->request;
>
> int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
> @@ -1091,7 +1092,7 @@ err_exit:
> scsi_release_buffers(cmd);
> cmd->request->special = NULL;
> scsi_put_command(cmd);
> - put_device(&cmd->device->sdev_gendev);
> + put_device(&sdev->sdev_gendev);
> return error;
> }
> EXPORT_SYMBOL(scsi_init_io);

2014-04-14 19:00:28

by Christoph Hellwig

Subject: Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70

On Mon, Apr 14, 2014 at 02:57:00PM -0400, Joe Lawrence wrote:
> I hit a similar crash last week on a franken-kernel here (3.14 + scsi
> misc + qlogic patches + out of tree drivers + terriblethingsIknow). I
> think there is one other similar use-after-free that's been in place
> for a while now:

Yes, that looks the same. I'll fix it and prepare a patch that
consolidates this duplicate code while at it.

2014-04-14 19:07:26

by Joe Lawrence

Subject: Re: 3.15-mw: Oops Workqueue: writeback bdi_writeback_workfn (flush-8:16) RIP: e030:[<ffffffff814c6bc1>] [<ffffffff814c6bc1>] kobject_put+0x11/0x70

On Mon, 14 Apr 2014 04:30:15 -0700
Christoph Hellwig <[email protected]> wrote:

> On Sat, Apr 12, 2014 at 01:34:31PM +0200, Sander Eikelenboom wrote:
> > Hi,
> >
> > I just ran into the oops below after some uptime.
>
> Classic use after free introduced by my recent changes, sorry.
>
> This should fix it:
>
> ---
> From: Christoph Hellwig <[email protected]>
> Subject: scsi: don't reference freed command in scsi_init_sgtable
>
> When scsi_init_io fails we have to release our device reference, but
> we do this trying to reference the just freed command. Add a local
> scsi_device pointer to fix this.
>
> Reported-by: Sander Eikelenboom <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 65a123d..54eff6a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1044,6 +1044,7 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
> */
> int scsi_init_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
> {
> + struct scsi_device *sdev = cmd->device;
> struct request *rq = cmd->request;
>
> int error = scsi_init_sgtable(rq, &cmd->sdb, gfp_mask);
> @@ -1091,7 +1092,7 @@ err_exit:
> scsi_release_buffers(cmd);
> cmd->request->special = NULL;
> scsi_put_command(cmd);
> - put_device(&cmd->device->sdev_gendev);
> + put_device(&sdev->sdev_gendev);
> return error;
> }
> EXPORT_SYMBOL(scsi_init_io);

Hi Christoph,

I hit a similar crash last week on a franken-kernel here (3.14 + scsi
misc + qlogic patches + out of tree drivers + terriblethingsIknow). I
think there is one other similar use-after-free that's been in place
for a while now:

int scsi_prep_return(struct request_queue *q, struct request *req, int ret)
{
	struct scsi_device *sdev = q->queuedata;

	switch (ret) {
	case BLKPREP_KILL:
		req->errors = DID_NO_CONNECT << 16;
		/* release the command and kill it */
		if (req->special) {
			struct scsi_cmnd *cmd = req->special;
			scsi_release_buffers(cmd);
			scsi_put_command(cmd);				<<
			put_device(&cmd->device->sdev_gendev);		<<
			req->special = NULL;
		}
		break;
	...

and the backtrace looked like:

general protection fault: 0000 [#1] SMP
Modules linked in: ccmod(POF) ftmod(OF) ipmi_devintf ipmi_msghandler bonding sg x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ixgbe(OF) aesni_intel igb(OF) ptp lrw pps_core gf128mul glue_helper i2c_algo_bit mdio ablk_helper cryptd pcspkr dca ntb i2c_core matroxfb(OF) videosw(OF) nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c dm_service_time sd_mod(OF) crc_t10dif crct10dif_common qla2xxx(OF) scsi_transport_fc mpt2sas(OF) raid_class scsi_tgt scsi_transport_sas dm_mirror dm_region_hash dm_log dm_multipath dm_mod
CPU: 8 PID: 23976 Comm: systemd-udevd Tainted: PF W O 3.14.0+ #2
Hardware name: Stratus ftServer 6400/G7LAY, BIOS BIOS Version 6.3:57 12/25/2013
task: ffff880420b3d7c0 ti: ffff880729138000 task.ti: ffff880729138000
RIP: 0010:[<ffffffff812e2a2d>] [<ffffffff812e2a2d>] kobject_put+0xd/0x60
RSP: 0018:ffff880729139a80 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6ce3 RCX: 00000001002e0003
RDX: 000000000000002e RSI: ffffea0021370400 RDI: 6b6b6b6b6b6b6ce3
RBP: ffff880729139a88 R08: ffff88084dc16300 R09: 00000001002e0002
R10: ffff88104f603a80 R11: ffffea0021370400 R12: ffff88084dc16300
R13: 0000000000000001 R14: ffff881026935388 R15: ffff880ff56b3a18
FS: 00007f2eed940880(0000) GS:ffff88085fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2eed0e2c20 CR3: 000000084d6c8000 CR4: 00000000000407e0
Stack:
ffff88076248d0a8 ffff880729139a98 ffffffff813e57d7 ffff880729139ac0
ffffffff8141bf3e 000000018141be66 ffff88076248d0a8 ffff88104def6840
ffff880729139b20 ffffffffa00a46e0 ffff88104def6840 ffff88076248d0a8
Call Trace:
[<ffffffff813e57d7>] put_device+0x17/0x20
[<ffffffff8141bf3e>] scsi_prep_return+0x9e/0xc0
[<ffffffffa00a46e0>] sd_prep_fn+0x70/0xcd0 [sd_mod]
[<ffffffff812bb49f>] blk_peek_request+0x12f/0x250
[<ffffffff8141cdf8>] scsi_request_fn+0x48/0x570
[<ffffffff812b66f3>] __blk_run_queue+0x33/0x40
[<ffffffff812b67aa>] queue_unplugged+0x2a/0xa0
[<ffffffff812bb928>] blk_flush_plug_list+0x1d8/0x230
[<ffffffff812bbd14>] blk_finish_plug+0x14/0x40
[<ffffffff8116b239>] __do_page_cache_readahead+0x209/0x290
[<ffffffff8116b52d>] force_page_cache_readahead+0x6d/0xa0
[<ffffffff8116b843>] page_cache_sync_readahead+0x43/0x50
[<ffffffff81161b35>] generic_file_aio_read+0x4f5/0x720
[<ffffffff8120dc2b>] blkdev_aio_read+0x4b/0x70
[<ffffffff811d27e7>] do_sync_read+0x67/0xa0
[<ffffffff811d2efb>] vfs_read+0x9b/0x160
[<ffffffff811d3a05>] SyS_read+0x55/0xd0
[<ffffffff81637aa9>] system_call_fastpath+0x16/0x1b

Regards,

-- Joe