2014-07-22 14:51:42

by Mike Qiu

[permalink] [raw]
Subject: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

In ata_sas_port_alloc(), it haven't initialized scsi_host field in
ata_port, although scsi_host is in parameters list and unused in this
function.

With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
ata_qc_new() try to use scsi_host, while it
is a NULL pointer for ipr IOA and error message shows below:

Unable to handle kernel paging request for data at address 0x00000114
Faulting instruction address: 0xc0000000005c2580
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c0000000005c2580] .ata_qc_new_init+0x30/0x1f0
LR [c0000000005c9384] .ata_scsi_translate+0x44/0x230
Call Trace:
0xc0000003ad332280 (unreliable)
.ata_scsi_translate+0x44/0x230
.ipr_queuecommand+0x2e0/0x780 [ipr]
.scsi_dispatch_cmd+0xec/0x400
.scsi_request_fn+0x52c/0x670
.__blk_run_queue+0x5c/0x80
.blk_execute_rq_nowait+0xf8/0x1c0
.blk_execute_rq+0x88/0x150
.scsi_execute+0xf0/0x1f0
.scsi_execute_req_flags+0xc4/0x170
.scsi_probe_and_add_lun+0x2d4/0xe00
.__scsi_scan_target+0x1a4/0x790
.scsi_scan_channel.part.3+0x80/0xc0
.scsi_scan_host_selected+0x1a0/0x240
.do_scan_async+0x30/0x210
.async_run_entry_fn+0x78/0x1c0
.process_one_work+0x1c4/0x4a0
.worker_thread+0x184/0x600
.kthread+0x10c/0x130
.ret_from_kernel_thread+0x58/0x7c

While scsi_host is unused in ata_sas_port_alloc(), better to set it
in ata_sas_port_alloc() instead of in driver.

Signed-off-by: Mike Qiu <[email protected]>
---
drivers/ata/libata-scsi.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0586f66..a472b6f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
ap->flags |= port_info->flags;
ap->ops = port_info->port_ops;
ap->cbl = ATA_CBL_SATA;
+ ap->scsi_host = shost;

return ap;
}
--
1.8.1.4


2014-07-22 14:58:59

by Mike Qiu

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

[+cc Wendy, Brian King, Stephen]


On 07/22/2014 10:51 PM, Mike Qiu wrote:
> In ata_sas_port_alloc(), it haven't initialized scsi_host field in
> ata_port, although scsi_host is in parameters list and unused in this
> function.
>
> With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
> ata_qc_new() try to use scsi_host, while it
> is a NULL pointer for ipr IOA and error message shows below:
>
> Unable to handle kernel paging request for data at address 0x00000114
> Faulting instruction address: 0xc0000000005c2580
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> NIP [c0000000005c2580] .ata_qc_new_init+0x30/0x1f0
> LR [c0000000005c9384] .ata_scsi_translate+0x44/0x230
> Call Trace:
> 0xc0000003ad332280 (unreliable)
> .ata_scsi_translate+0x44/0x230
> .ipr_queuecommand+0x2e0/0x780 [ipr]
> .scsi_dispatch_cmd+0xec/0x400
> .scsi_request_fn+0x52c/0x670
> .__blk_run_queue+0x5c/0x80
> .blk_execute_rq_nowait+0xf8/0x1c0
> .blk_execute_rq+0x88/0x150
> .scsi_execute+0xf0/0x1f0
> .scsi_execute_req_flags+0xc4/0x170
> .scsi_probe_and_add_lun+0x2d4/0xe00
> .__scsi_scan_target+0x1a4/0x790
> .scsi_scan_channel.part.3+0x80/0xc0
> .scsi_scan_host_selected+0x1a0/0x240
> .do_scan_async+0x30/0x210
> .async_run_entry_fn+0x78/0x1c0
> .process_one_work+0x1c4/0x4a0
> .worker_thread+0x184/0x600
> .kthread+0x10c/0x130
> .ret_from_kernel_thread+0x58/0x7c
>
> While scsi_host is unused in ata_sas_port_alloc(), better to set it
> in ata_sas_port_alloc() instead of in driver.
>
> Signed-off-by: Mike Qiu <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
> ap->flags |= port_info->flags;
> ap->ops = port_info->port_ops;
> ap->cbl = ATA_CBL_SATA;
> + ap->scsi_host = shost;
>
> return ap;
> }

2014-07-22 19:47:22

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

[ +cc Jesse Brandeburg - similar LKML report (but on x86_64) ]

On 07/22/2014 10:51 AM, Mike Qiu wrote:
> In ata_sas_port_alloc(), it haven't initialized scsi_host field in
> ata_port, although scsi_host is in parameters list and unused in this
> function.
>
> With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
> ata_qc_new() try to use scsi_host, while it
> is a NULL pointer for ipr IOA and error message shows below:
>
> Unable to handle kernel paging request for data at address 0x00000114
> Faulting instruction address: 0xc0000000005c2580
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> NIP [c0000000005c2580] .ata_qc_new_init+0x30/0x1f0
> LR [c0000000005c9384] .ata_scsi_translate+0x44/0x230
> Call Trace:
> 0xc0000003ad332280 (unreliable)
> .ata_scsi_translate+0x44/0x230
> .ipr_queuecommand+0x2e0/0x780 [ipr]
> .scsi_dispatch_cmd+0xec/0x400
> .scsi_request_fn+0x52c/0x670
> .__blk_run_queue+0x5c/0x80
> .blk_execute_rq_nowait+0xf8/0x1c0
> .blk_execute_rq+0x88/0x150
> .scsi_execute+0xf0/0x1f0
> .scsi_execute_req_flags+0xc4/0x170
> .scsi_probe_and_add_lun+0x2d4/0xe00
> .__scsi_scan_target+0x1a4/0x790
> .scsi_scan_channel.part.3+0x80/0xc0
> .scsi_scan_host_selected+0x1a0/0x240
> .do_scan_async+0x30/0x210
> .async_run_entry_fn+0x78/0x1c0
> .process_one_work+0x1c4/0x4a0
> .worker_thread+0x184/0x600
> .kthread+0x10c/0x130
> .ret_from_kernel_thread+0x58/0x7c
>
> While scsi_host is unused in ata_sas_port_alloc(), better to set it
> in ata_sas_port_alloc() instead of in driver.
>
> Signed-off-by: Mike Qiu <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
> ap->flags |= port_info->flags;
> ap->ops = port_info->port_ops;
> ap->cbl = ATA_CBL_SATA;
> + ap->scsi_host = shost;
>
> return ap;
> }
>

2014-07-22 20:11:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

Hello,

Can you please test the following patch?

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d19c37a7..773f4e6 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
{
struct ata_queued_cmd *qc = NULL;
- unsigned int i, tag, max_queue;
-
- max_queue = ap->scsi_host->can_queue;
+ unsigned int max_queue = ap->host->n_tags;
+ unsigned int i, tag;

/* no command while frozen */
if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
@@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
{
spin_lock_init(&host->lock);
mutex_init(&host->eh_mutex);
+ host->n_tags = ATA_MAX_QUEUE;
host->dev = dev;
host->ops = ops;
}
@@ -6179,11 +6179,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
* The max queue supported by hardware must not be greater than
* ATA_MAX_QUEUE.
*/
- if (sht->can_queue > ATA_MAX_QUEUE) {
- dev_err(host->dev, "BUG: the hardware max queue is too large\n");
- WARN_ON(1);
- return -EINVAL;
- }
+ host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE);

/* host must have been started */
if (!(host->flags & ATA_HOST_STARTED)) {
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5ab4e3a..92abb49 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -593,6 +593,7 @@ struct ata_host {
struct device *dev;
void __iomem * const *iomap;
unsigned int n_ports;
+ unsigned int n_tags; /* nr of NCQ tags */
void *private_data;
struct ata_port_operations *ops;
unsigned long flags;

2014-07-23 02:29:48

by Mike Qiu

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

I have tested with the ipr IOA, passed.

Reviewed-and Tested-by: Mike Qiu <[email protected]>

On 07/23/2014 04:11 AM, Tejun Heo wrote:
> Hello,
>
> Can you please test the following patch?
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d19c37a7..773f4e6 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
> static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
> {
> struct ata_queued_cmd *qc = NULL;
> - unsigned int i, tag, max_queue;
> -
> - max_queue = ap->scsi_host->can_queue;
> + unsigned int max_queue = ap->host->n_tags;
> + unsigned int i, tag;
>
> /* no command while frozen */
> if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
> @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
> {
> spin_lock_init(&host->lock);
> mutex_init(&host->eh_mutex);
> + host->n_tags = ATA_MAX_QUEUE;
> host->dev = dev;
> host->ops = ops;
> }
> @@ -6179,11 +6179,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
> * The max queue supported by hardware must not be greater than
> * ATA_MAX_QUEUE.
> */
> - if (sht->can_queue > ATA_MAX_QUEUE) {
> - dev_err(host->dev, "BUG: the hardware max queue is too large\n");
> - WARN_ON(1);
> - return -EINVAL;
> - }
> + host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE);
>
> /* host must have been started */
> if (!(host->flags & ATA_HOST_STARTED)) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 5ab4e3a..92abb49 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -593,6 +593,7 @@ struct ata_host {
> struct device *dev;
> void __iomem * const *iomap;
> unsigned int n_ports;
> + unsigned int n_tags; /* nr of NCQ tags */
> void *private_data;
> struct ata_port_operations *ops;
> unsigned long flags;
>

2014-07-23 02:37:47

by Mike Qiu

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

On 07/22/2014 10:51 PM, Mike Qiu wrote:
> In ata_sas_port_alloc(), it haven't initialized scsi_host field in
> ata_port, although scsi_host is in parameters list and unused in this
> function.
>
> With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
> ata_qc_new() try to use scsi_host, while it
> is a NULL pointer for ipr IOA and error message shows below:
...
>
> While scsi_host is unused in ata_sas_port_alloc(), better to set it
> in ata_sas_port_alloc() instead of in driver.
>
> Signed-off-by: Mike Qiu <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
> ap->flags |= port_info->flags;
> ap->ops = port_info->port_ops;
> ap->cbl = ATA_CBL_SATA;
> + ap->scsi_host = shost;

What about my patch itself, ata_sas_port_alloc() has "shot" in
parameters list, but unused.

Maybe better to set ap->scsi_host here, it is very convenient, and
drivers, like ipr, may forget to set this field, otherwise "shot" need
to be removed from parameters list I think.

Thanks,
Mike
> return ap;
> }

2014-07-23 09:04:05

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

On 07/23/2014 06:11 AM, Tejun Heo wrote:
> Hello,
>
> Can you please test the following patch?

Tested-by: Alexey Kardashevskiy <[email protected]>

on POWER8 + IBM IPR SCSI.




>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d19c37a7..773f4e6 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
> static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
> {
> struct ata_queued_cmd *qc = NULL;
> - unsigned int i, tag, max_queue;
> -
> - max_queue = ap->scsi_host->can_queue;
> + unsigned int max_queue = ap->host->n_tags;
> + unsigned int i, tag;
>
> /* no command while frozen */
> if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
> @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
> {
> spin_lock_init(&host->lock);
> mutex_init(&host->eh_mutex);
> + host->n_tags = ATA_MAX_QUEUE;
> host->dev = dev;
> host->ops = ops;
> }
> @@ -6179,11 +6179,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
> * The max queue supported by hardware must not be greater than
> * ATA_MAX_QUEUE.
> */
> - if (sht->can_queue > ATA_MAX_QUEUE) {
> - dev_err(host->dev, "BUG: the hardware max queue is too large\n");
> - WARN_ON(1);
> - return -EINVAL;
> - }
> + host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE);
>
> /* host must have been started */
> if (!(host->flags & ATA_HOST_STARTED)) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 5ab4e3a..92abb49 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -593,6 +593,7 @@ struct ata_host {
> struct device *dev;
> void __iomem * const *iomap;
> unsigned int n_ports;
> + unsigned int n_tags; /* nr of NCQ tags */
> void *private_data;
> struct ata_port_operations *ops;
> unsigned long flags;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Alexey

2014-07-23 14:46:39

by Tejun Heo

[permalink] [raw]
Subject: [PATCH libata/for-3.16-fixes] libata: introduce ata_host->n_tags to avoid oops on SAS controllers

Hello, guys.

Sorry about the trouble. Will soon push the below patch to Linus.

Thanks.

------ 8< ------
>From 1a112d10f03e83fb3a2fdc4c9165865dec8a3ca6 Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Wed, 23 Jul 2014 09:05:27 -0400

1871ee134b73 ("libata: support the ata host which implements a queue
depth less than 32") directly used ata_port->scsi_host->can_queue from
ata_qc_new() to determine the number of tags supported by the host;
unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
leading to the following oops.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
RIP: 0010:[<ffffffff814e0618>] [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012
RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
Stack:
ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
Call Trace:
[<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
[<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
[<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
[<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
[<ffffffff81317613>] __blk_run_queue+0x33/0x40
[<ffffffff8131781a>] queue_unplugged+0x2a/0x90
[<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
[<ffffffff8131d274>] blk_finish_plug+0x14/0x50
[<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
[<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
[<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
[<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
[<ffffffff81219897>] blkdev_read_iter+0x37/0x40
[<ffffffff811e307e>] new_sync_read+0x7e/0xb0
[<ffffffff811e3734>] vfs_read+0x94/0x170
[<ffffffff811e43c6>] SyS_read+0x46/0xb0
[<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
[<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00

Fix it by introducing ata_host->n_tags which is initialized to
ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
scsi_host_template->can_queue in ata_host_register() for !SAS ones.
As SAS hosts are never registered, this will give them the same
ATA_MAX_QUEUE - 1 as before. Note that we can't use
scsi_host->can_queue directly for SAS hosts anyway as they can go
higher than the libata maximum.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Mike Qiu <[email protected]>
Reported-by: Jesse Brandeburg <[email protected]>
Reported-by: Peter Hurley <[email protected]>
Reported-by: Peter Zijlstra <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
Cc: Kevin Hao <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
---
drivers/ata/libata-core.c | 16 ++++------------
include/linux/libata.h | 1 +
2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d19c37a7..677c0c1 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
{
struct ata_queued_cmd *qc = NULL;
- unsigned int i, tag, max_queue;
-
- max_queue = ap->scsi_host->can_queue;
+ unsigned int max_queue = ap->host->n_tags;
+ unsigned int i, tag;

/* no command while frozen */
if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
@@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
{
spin_lock_init(&host->lock);
mutex_init(&host->eh_mutex);
+ host->n_tags = ATA_MAX_QUEUE - 1;
host->dev = dev;
host->ops = ops;
}
@@ -6175,15 +6175,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
{
int i, rc;

- /*
- * The max queue supported by hardware must not be greater than
- * ATA_MAX_QUEUE.
- */
- if (sht->can_queue > ATA_MAX_QUEUE) {
- dev_err(host->dev, "BUG: the hardware max queue is too large\n");
- WARN_ON(1);
- return -EINVAL;
- }
+ host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE - 1);

/* host must have been started */
if (!(host->flags & ATA_HOST_STARTED)) {
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 5ab4e3a..92abb49 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -593,6 +593,7 @@ struct ata_host {
struct device *dev;
void __iomem * const *iomap;
unsigned int n_ports;
+ unsigned int n_tags; /* nr of NCQ tags */
void *private_data;
struct ata_port_operations *ops;
unsigned long flags;
--
1.9.3

Subject: Re: [PATCH libata/for-3.16-fixes] libata: introduce ata_host->n_tags to avoid oops on SAS controllers


Hi,

On Wednesday, July 23, 2014 10:46:31 AM Tejun Heo wrote:
> Hello, guys.
>
> Sorry about the trouble. Will soon push the below patch to Linus.
>
> Thanks.
>
> ------ 8< ------
> From 1a112d10f03e83fb3a2fdc4c9165865dec8a3ca6 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 23 Jul 2014 09:05:27 -0400
>
> 1871ee134b73 ("libata: support the ata host which implements a queue
> depth less than 32") directly used ata_port->scsi_host->can_queue from
> ata_qc_new() to determine the number of tags supported by the host;
> unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
> leading to the following oops.
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
> PGD 0
> Oops: 0002 [#1] SMP
> Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
> CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
> Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
> RIP: 0010:[<ffffffff814e0618>] [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
> RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012
> RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
> RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
> RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
> R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
> R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
> FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
> Stack:
> ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
> ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
> ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
> Call Trace:
> [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
> [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
> [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
> [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
> [<ffffffff81317613>] __blk_run_queue+0x33/0x40
> [<ffffffff8131781a>] queue_unplugged+0x2a/0x90
> [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
> [<ffffffff8131d274>] blk_finish_plug+0x14/0x50
> [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
> [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
> [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
> [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
> [<ffffffff81219897>] blkdev_read_iter+0x37/0x40
> [<ffffffff811e307e>] new_sync_read+0x7e/0xb0
> [<ffffffff811e3734>] vfs_read+0x94/0x170
> [<ffffffff811e43c6>] SyS_read+0x46/0xb0
> [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
> [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
> Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
>
> Fix it by introducing ata_host->n_tags which is initialized to
> ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
> scsi_host_template->can_queue in ata_host_register() for !SAS ones.
> As SAS hosts are never registered, this will give them the same
> ATA_MAX_QUEUE - 1 as before. Note that we can't use

Hmmm, wasn't ATA_MAX_QUEUE used before not ATA_MAX_QUEUE - 1?

It seems that after your patch the loop in the ata_qc_new() will use
only 30 tags and not 31 ones?

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> scsi_host->can_queue directly for SAS hosts anyway as they can go
> higher than the libata maximum.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Mike Qiu <[email protected]>
> Reported-by: Jesse Brandeburg <[email protected]>
> Reported-by: Peter Hurley <[email protected]>
> Reported-by: Peter Zijlstra <[email protected]>
> Tested-by: Alexey Kardashevskiy <[email protected]>
> Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
> Cc: Kevin Hao <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: [email protected]
> ---
> drivers/ata/libata-core.c | 16 ++++------------
> include/linux/libata.h | 1 +
> 2 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d19c37a7..677c0c1 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
> static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
> {
> struct ata_queued_cmd *qc = NULL;
> - unsigned int i, tag, max_queue;
> -
> - max_queue = ap->scsi_host->can_queue;
> + unsigned int max_queue = ap->host->n_tags;
> + unsigned int i, tag;
>
> /* no command while frozen */
> if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
> @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
> {
> spin_lock_init(&host->lock);
> mutex_init(&host->eh_mutex);
> + host->n_tags = ATA_MAX_QUEUE - 1;
> host->dev = dev;
> host->ops = ops;
> }
> @@ -6175,15 +6175,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
> {
> int i, rc;
>
> - /*
> - * The max queue supported by hardware must not be greater than
> - * ATA_MAX_QUEUE.
> - */
> - if (sht->can_queue > ATA_MAX_QUEUE) {
> - dev_err(host->dev, "BUG: the hardware max queue is too large\n");
> - WARN_ON(1);
> - return -EINVAL;
> - }
> + host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE - 1);
>
> /* host must have been started */
> if (!(host->flags & ATA_HOST_STARTED)) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 5ab4e3a..92abb49 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -593,6 +593,7 @@ struct ata_host {
> struct device *dev;
> void __iomem * const *iomap;
> unsigned int n_ports;
> + unsigned int n_tags; /* nr of NCQ tags */
> void *private_data;
> struct ata_port_operations *ops;
> unsigned long flags;

2014-07-23 16:36:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH libata/for-3.16-fixes] libata: introduce ata_host->n_tags to avoid oops on SAS controllers

Hello,

On Wed, Jul 23, 2014 at 06:31:58PM +0200, Bartlomiej Zolnierkiewicz wrote:
> Hmmm, wasn't ATA_MAX_QUEUE used before not ATA_MAX_QUEUE - 1?
>
> It seems that after your patch the loop in the ata_qc_new() will use
> only 30 tags and not 31 ones?

It was always 31 (ATA_MAX_QUEUE - 1) with the tag 31 reserved for EH
commands. The previous patch just used limit value which is one too
high. I'm planning to change that but this is the way it has always
been.

Thanks.

--
tejun

Subject: Re: [PATCH libata/for-3.16-fixes] libata: introduce ata_host->n_tags to avoid oops on SAS controllers

On Wednesday, July 23, 2014 12:36:01 PM Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 23, 2014 at 06:31:58PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > Hmmm, wasn't ATA_MAX_QUEUE used before not ATA_MAX_QUEUE - 1?
> >
> > It seems that after your patch the loop in the ata_qc_new() will use
> > only 30 tags and not 31 ones?
>
> It was always 31 (ATA_MAX_QUEUE - 1) with the tag 31 reserved for EH
> commands. The previous patch just used limit value which is one too
> high. I'm planning to change that but this is the way it has always
> been.

I see, thanks for explaining this.

BTW:
/* the last tag is reserved for internal command. */
if (tag == ATA_TAG_INTERNAL)
continue;

in ata_qc_new()'s loop now becomes a dead code (loop will be done
maximum ATA_MAX_QUEUE - 2 times and ATA_TAG_INTERNAL is defined
as ATA_MAX_QUEUE - 1). Compiler can probably optimize it away so
as a bonus of your patch we may get one condition check less in
a hot-path. :)

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

2014-07-23 17:20:21

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH libata/for-3.16-fixes] libata: introduce ata_host->n_tags to avoid oops on SAS controllers

Hello,

On Wed, Jul 23, 2014 at 06:46:05PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday, July 23, 2014 12:36:01 PM Tejun Heo wrote:
> BTW:
> /* the last tag is reserved for internal command. */
> if (tag == ATA_TAG_INTERNAL)
> continue;
>
> in ata_qc_new()'s loop now becomes a dead code (loop will be done
> maximum ATA_MAX_QUEUE - 2 times and ATA_TAG_INTERNAL is defined
> as ATA_MAX_QUEUE - 1). Compiler can probably optimize it away so
> as a bonus of your patch we may get one condition check less in
> a hot-path. :)

ATA_TAG_INTERNAL will soon be replaced by a qc flag so this should all
go away pretty soon. ata_qc_new() is quite badly implemented tho. It
has no reason to do atomic test_and_set_bit() and there are far more
efficient ways to implement what it does. If anybody is interested in
improving it, please go ahead.

Thanks.

--
tejun

2014-07-24 22:29:06

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port

On Wed, Jul 23, 2014 at 2:03 AM, Alexey Kardashevskiy <[email protected]> wrote:
> On 07/23/2014 06:11 AM, Tejun Heo wrote:
>> Hello,
>>
>> Can you please test the following patch?
>
> Tested-by: Alexey Kardashevskiy <[email protected]>
>
> on POWER8 + IBM IPR SCSI.

FWIW, I tested Linus's current git tree and the issue is fixed. Thanks!