Some controllers support only a limited number of I/O queues. When the
driver requests more than that, the controller returns an Invalid Field
error, and the driver then removes the NVMe device.

Find the maximum number of I/O queues that the controller supports.
If the controller still returns an error, set at least 1 I/O queue to
bring the NVMe device online.
Signed-off-by: Aaron Ma <[email protected]>
---
drivers/nvme/host/core.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2c43e12b70af..fb7f05c310c8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1134,14 +1134,24 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
 
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 {
-	u32 q_count = (*count - 1) | ((*count - 1) << 16);
+	u32 q_count;
 	u32 result;
-	int status, nr_io_queues;
-
-	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
-			&result);
-	if (status < 0)
-		return status;
+	int status = -1;
+	int nr_io_queues;
+	int try_count;
+
+	for (try_count = *count; try_count > 0; try_count--) {
+		q_count = (try_count - 1) | ((try_count - 1) << 16);
+		status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES,
+				q_count, NULL, 0, &result);
+		if (status < 0)
+			return status;
+		else if (status == 0) {
+			nr_io_queues = min(result & 0xffff, result >> 16) + 1;
+			*count = min(try_count, nr_io_queues);
+			break;
+		}
+	}
 
 	/*
 	 * Degraded controllers might return an error when setting the queue
@@ -1150,10 +1160,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 	 */
 	if (status > 0) {
 		dev_err(ctrl->device, "Could not set queue count (%d)\n", status);
-		*count = 0;
-	} else {
-		nr_io_queues = min(result & 0xffff, result >> 16) + 1;
-		*count = min(*count, nr_io_queues);
+		*count = 1;
 	}
 
 	return 0;
--
2.20.1
On Wed, 2019-04-17 at 22:12 +0800, Aaron Ma wrote:
> Some controllers support limited IO queues, when over set
> the number, it will return invalid field error.
> Then NVME will be removed by driver.
>
> Find the max number of IO queues that controller supports.
> When it still got invalid result, set 1 IO queue at least to
> bring NVME online.
To be honest a spec compliant device should not need this.
The spec states:
"Number of I/O Completion Queues Requested (NCQR): Indicates the number of
I/O Completion Queues requested by software. This number does not include
the Admin Completion Queue. A minimum of one queue shall be requested,
reflecting that the minimum support is for one I/O Completion Queue. This
is a 0’s based value. The maximum value that may be specified is 65,534
(i.e., 65,535 I/O Completion Queues). If the value specified is 65,535,
the controller should return an error of Invalid Field in Command."
This implies that you can ask for any value and the controller must not respond
with an error, but rather indicate how many queues it supports.
Maybe it's better to add a quirk for the broken device that needs this?
Best regards,
Maxim Levitsky
On Wed, 2019-04-17 at 20:32 +0300, Maxim Levitsky wrote:
> On Wed, 2019-04-17 at 22:12 +0800, Aaron Ma wrote:
> > Some controllers support limited IO queues, when over set
> > the number, it will return invalid field error.
> > Then NVME will be removed by driver.
> >
> > Find the max number of IO queues that controller supports.
> > When it still got invalid result, set 1 IO queue at least to
> > bring NVME online.
>
> To be honest a spec compliant device should not need this.
> The spec states:
>
> "Number of I/O Completion Queues Requested (NCQR): Indicates the number of I/O
> Completion
> Queues requested by software. This number does not include the Admin
> Completion
> Queue. A
> minimum of one queue shall be requested, reflecting that the minimum support
> is
> for one I/O
> Completion Queue. This is a 0’s based value. The maximum value that may be
> specified is 65,534
> (i.e., 65,535 I/O Completion Queues). If the value specified is 65,535, the
> controller should return
> an error of Invalid Field in Command."
>
>
> This implies that you can ask for any value and the controller must not
> respond
> with an error, but rather indicate how many queues it supports.
>
> Maybe its better to add a quirk for the broken device, which needs this?
I forgot to add the relevant paragraph:
"Note: The value allocated may be smaller or larger than the number of
queues requested, often in virtualized implementations. The controller may
not have as many queues to allocate as are requested. Alternatively, the
controller may have an allocation unit of queues (e.g. power of two) and
may supply more queues to host software to satisfy its allocation unit."
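
The behavior described in the two quoted spec paragraphs can be sketched as a
small model. This is an illustration only, not driver or firmware code:
model_set_num_queues(), host_usable_queues(), the MODEL_* status values, and
the 'supported' parameter are all hypothetical names. The model assumes a
controller that always allocates everything it has, which the note above
permits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values for this model only. */
#define MODEL_SUCCESS        0
#define MODEL_INVALID_FIELD  1

/*
 * Model of a spec-compliant controller handling Set Features (Number of
 * Queues). 'dword11' carries NSQR in bits 15:0 and NCQR in bits 31:16,
 * both 0's based. 'supported' is the number of I/O queue pairs the
 * modeled controller can allocate (at least 1).
 */
static int model_set_num_queues(uint32_t dword11, uint16_t supported,
				uint32_t *result)
{
	uint16_t nsqr = dword11 & 0xffff;
	uint16_t ncqr = dword11 >> 16;
	uint16_t alloc;

	assert(supported >= 1);

	/* 65,535 is the only requested value the quoted text rejects. */
	if (nsqr == 0xffff || ncqr == 0xffff)
		return MODEL_INVALID_FIELD;

	/* Any other request succeeds; the allocation may differ from it. */
	alloc = supported - 1;		/* 0's based */
	*result = (uint32_t)alloc | ((uint32_t)alloc << 16);
	return MODEL_SUCCESS;
}

/* Host side: usable queues = min(requested, allocated). */
static int host_usable_queues(int requested, uint32_t result)
{
	int nsqa = (int)(result & 0xffff) + 1;
	int ncqa = (int)(result >> 16) + 1;
	int allocated = nsqa < ncqa ? nsqa : ncqa;

	return requested < allocated ? requested : allocated;
}
```

In this model, a host asking for 64 queues from a controller with 8 gets a
successful completion reporting 8, and only the reserved value 65,535
produces Invalid Field.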
Best regards,
Maxim Levitsky
On 4/17/19 7:12 AM, Aaron Ma wrote:
> Some controllers support limited IO queues, when over set
> the number, it will return invalid field error.
> Then NVME will be removed by driver.
>
> Find the max number of IO queues that controller supports.
> When it still got invalid result, set 1 IO queue at least to
> bring NVME online.
>
> Signed-off-by: Aaron Ma <[email protected]>
> ---
> drivers/nvme/host/core.c | 29 ++++++++++++++++++-----------
> 1 file changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 2c43e12b70af..fb7f05c310c8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1134,14 +1134,24 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
>
> int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
> {
> - u32 q_count = (*count - 1) | ((*count - 1) << 16);
> + u32 q_count;
> u32 result;
> - int status, nr_io_queues;
> -
> - status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
> - &result);
> - if (status < 0)
> - return status;
> + int status = -1;
> + int nr_io_queues;
> + int try_count;
> +
> + for (try_count = *count; try_count > 0; try_count--) {
> + q_count = (try_count - 1) | ((try_count - 1) << 16);
A macro here might help readability.
> + status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES,
> + q_count, NULL, 0, &result);
> + if (status < 0)
> + return status;
> + else if (status == 0) {
else following return is not needed.
> + nr_io_queues = min(result & 0xffff, result >> 16) + 1;
Likewise, a macro as above.
Ed
> + *count = min(try_count, nr_io_queues);
> + break;
> + }
> + }
>
> /*
> * Degraded controllers might return an error when setting the queue
> @@ -1150,10 +1160,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
> */
> if (status > 0) {
> dev_err(ctrl->device, "Could not set queue count (%d)\n", status);
> - *count = 0;
> - } else {
> - nr_io_queues = min(result & 0xffff, result >> 16) + 1;
> - *count = min(*count, nr_io_queues);
> + *count = 1;
> }
>
> return 0;
>
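
One possible shape for the macros suggested in the review comments above is
sketched here. The names QCOUNT_TO_DWORD11, RESULT_TO_QCOUNT, and
qcount_to_dword11() are hypothetical and do not appear in the patch; the
sketch uses standard C types so that it compiles outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; the patch under review does not define them. */

/* Pack a 1-based queue count into the 0's based NSQR/NCQR fields. */
#define QCOUNT_TO_DWORD11(n) \
	((uint32_t)((n) - 1) | ((uint32_t)((n) - 1) << 16))

/*
 * Unpack the allocated count: the smaller of NSQA (bits 15:0) and
 * NCQA (bits 31:16), converted back to a 1-based value.
 * Note that 'r' is evaluated more than once.
 */
#define RESULT_TO_QCOUNT(r) \
	((int)(((r) & 0xffff) < ((r) >> 16) ? ((r) & 0xffff) : ((r) >> 16)) + 1)

/* Checked wrapper: valid requests are 1..65,535 queues. */
static inline uint32_t qcount_to_dword11(int n)
{
	assert(n >= 1 && n <= 65535);
	return QCOUNT_TO_DWORD11(n);
}
```

With helpers like these, the two open-coded expressions in the loop would
each collapse to a single named call.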
On 4/18/19 1:33 AM, Maxim Levitsky wrote:
> On Wed, 2019-04-17 at 20:32 +0300, Maxim Levitsky wrote:
>> On Wed, 2019-04-17 at 22:12 +0800, Aaron Ma wrote:
>>> Some controllers support limited IO queues, when over set
>>> the number, it will return invalid field error.
>>> Then NVME will be removed by driver.
>>>
>>> Find the max number of IO queues that controller supports.
>>> When it still got invalid result, set 1 IO queue at least to
>>> bring NVME online.
>> To be honest a spec compliant device should not need this.
>> The spec states:
>>
>> "Number of I/O Completion Queues Requested (NCQR): Indicates the number of I/O
>> Completion
>> Queues requested by software. This number does not include the Admin
>> Completion
>> Queue. A
>> minimum of one queue shall be requested, reflecting that the minimum support
>> is
>> for one I/O
>> Completion Queue. This is a 0’s based value. The maximum value that may be
>> specified is 65,534
>> (i.e., 65,535 I/O Completion Queues). If the value specified is 65,535, the
>> controller should return
>> an error of Invalid Field in Command."
>>
>>
>> This implies that you can ask for any value and the controller must not
>> respond
>> with an error, but rather indicate how many queues it supports.
>>
>> Maybe its better to add a quirk for the broken device, which needs this?
Adding a quirk only makes the code more complicated.
This patch doesn't change the default behavior.
It only handles the NVMe error code.
Yes, the I/O queue number is 0's based, but the driver would return an error
and remove the NVMe device as dead.
So set it to 1 so that at least the NVMe device can be probed properly.
Regards,
Aaron
> I forgot to add the relevant paragraph:
>
> "Note: The value allocated may be smaller or larger than the number of queues
> requested, often in virtualized
> implementations. The controller may not have as many queues to allocate as are
> requested. Alternatively,
> the controller may have an allocation unit of queues (e.g. power of two) and may
> supply more queues to
> host software to satisfy its allocation unit."
>
>
> Best regards,
> Maxim Levitsky
>
On 4/18/19 5:30 AM, Edmund Nadolski (Microsoft) wrote:
> On 4/17/19 7:12 AM, Aaron Ma wrote:
>> Some controllers support limited IO queues, when over set
>> the number, it will return invalid field error.
>> Then NVME will be removed by driver.
>>
>> Find the max number of IO queues that controller supports.
>> When it still got invalid result, set 1 IO queue at least to
>> bring NVME online.
>>
>> Signed-off-by: Aaron Ma <[email protected]>
>> ---
>> drivers/nvme/host/core.c | 29 ++++++++++++++++++-----------
>> 1 file changed, 18 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 2c43e12b70af..fb7f05c310c8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1134,14 +1134,24 @@ static int nvme_set_features(struct nvme_ctrl
>> *dev, unsigned fid, unsigned dword
>> int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
>> {
>> - u32 q_count = (*count - 1) | ((*count - 1) << 16);
>> + u32 q_count;
>> u32 result;
>> - int status, nr_io_queues;
>> -
>> - status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
>> NULL, 0,
>> - &result);
>> - if (status < 0)
>> - return status;
>> + int status = -1;
>> + int nr_io_queues;
>> + int try_count;
>> +
>> + for (try_count = *count; try_count > 0; try_count--) {
>> + q_count = (try_count - 1) | ((try_count - 1) << 16);
>
> A macro here might help readability.
Will add in V2.
>
>> + status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES,
>> + q_count, NULL, 0, &result);
>> + if (status < 0)
>> + return status;
>> + else if (status == 0) {
>
> else following return is not needed.
* 0: successful read
* > 0: NVMe error status code
* < 0: Linux errno error code
All 3 conditions need to be handled.
status > 0 will be handled after the loop.
So the else is needed.
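
The three return conventions listed above can be sketched as a small
classifier. The enum and function names are hypothetical and exist only for
this illustration. The sketch also shows that, because the first branch
returns, a plain 'if' after it behaves the same as 'else if'.

```c
#include <assert.h>

/* Hypothetical outcome labels for this sketch only. */
enum qc_outcome {
	QC_ERRNO,	/* status < 0: Linux errno, propagate to the caller */
	QC_NVME_ERROR,	/* status > 0: NVMe status code from the controller */
	QC_SUCCESS,	/* status == 0: command completed successfully */
};

static enum qc_outcome classify_status(int status)
{
	if (status < 0)
		return QC_ERRNO;
	/* No 'else' required here: the branch above has already returned. */
	if (status > 0)
		return QC_NVME_ERROR;
	return QC_SUCCESS;
}
```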
>
>
>> + nr_io_queues = min(result & 0xffff, result >> 16) + 1;
>
> Likewise, a macro as above.
Will add in V2.
>
>
> Ed
>
>> + *count = min(try_count, nr_io_queues);
>> + break;
>> + }
>> + }
>> /*
>> * Degraded controllers might return an error when setting the
>> queue
>> @@ -1150,10 +1160,7 @@ int nvme_set_queue_count(struct nvme_ctrl
>> *ctrl, int *count)
>> */
>> if (status > 0) {
>> dev_err(ctrl->device, "Could not set queue count (%d)\n",
>> status);
>> - *count = 0;
>> - } else {
>> - nr_io_queues = min(result & 0xffff, result >> 16) + 1;
>> - *count = min(*count, nr_io_queues);
>> + *count = 1;
>> }
>> return 0;
>>
>
On Thu, 2019-04-18 at 14:21 +0800, Aaron Ma wrote:
> On 4/18/19 1:33 AM, Maxim Levitsky wrote:
> > On Wed, 2019-04-17 at 20:32 +0300, Maxim Levitsky wrote:
> > > On Wed, 2019-04-17 at 22:12 +0800, Aaron Ma wrote:
> > > > Some controllers support limited IO queues, when over set
> > > > the number, it will return invalid field error.
> > > > Then NVME will be removed by driver.
> > > >
> > > > Find the max number of IO queues that controller supports.
> > > > When it still got invalid result, set 1 IO queue at least to
> > > > bring NVME online.
> > >
> > > To be honest a spec compliant device should not need this.
> > > The spec states:
> > >
> > > "Number of I/O Completion Queues Requested (NCQR): Indicates the number of
> > > I/O
> > > Completion
> > > Queues requested by software. This number does not include the Admin
> > > Completion
> > > Queue. A
> > > minimum of one queue shall be requested, reflecting that the minimum
> > > support
> > > is
> > > for one I/O
> > > Completion Queue. This is a 0’s based value. The maximum value that may be
> > > specified is 65,534
> > > (i.e., 65,535 I/O Completion Queues). If the value specified is 65,535,
> > > the
> > > controller should return
> > > an error of Invalid Field in Command."
> > >
> > >
> > > This implies that you can ask for any value and the controller must not
> > > respond
> > > with an error, but rather indicate how many queues it supports.
> > >
> > > Maybe its better to add a quirk for the broken device, which needs this?
>
> Adding quirk only makes the code more complicated.
> This patch doesn't change the default behavior.
> Only handle the NVME error code.
>
> Yes the IO queues number is 0's based, but driver would return error and
> remove the nvme device as dead.
No, no, the spec says that no matter how many queues you ask for, unless
the value is 65,535, the controller should not fail with an error, but rather
indicate in the return value (in the completion entry) the actual number of
queues it allocated, which can be larger or smaller than what you asked for.
If the controller returns an error, that means its firmware has a bug, which is
not unusual, but those cases are usually handled with a quirk rather than
with a general code change.
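
As a rough illustration of the quirk approach, a per-device flag could gate
the non-standard handling so that compliant devices are unaffected. Everything
below is hypothetical: the EX_QUIRK_LIMITED_IO_QUEUES bit, the table, the
placeholder IDs, and the idea of capping the request are assumptions for this
sketch, not an existing kernel quirk or the fix proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit; not an existing kernel quirk. */
#define EX_QUIRK_LIMITED_IO_QUEUES	(1UL << 0)

struct ex_device_id {
	uint16_t vendor;
	uint16_t device;
	unsigned long quirks;
};

/* Placeholder IDs, not a real device. */
static const struct ex_device_id ex_quirk_table[] = {
	{ 0x1234, 0x5678, EX_QUIRK_LIMITED_IO_QUEUES },
};

/* Return the quirk bits for a device, or 0 if it is not listed. */
static unsigned long ex_lookup_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(ex_quirk_table) / sizeof(ex_quirk_table[0]); i++) {
		if (ex_quirk_table[i].vendor == vendor &&
		    ex_quirk_table[i].device == device)
			return ex_quirk_table[i].quirks;
	}
	return 0;
}

/* Cap the initial request only for devices that carry the quirk. */
static int ex_initial_queue_request(unsigned long quirks, int wanted,
				    int quirk_cap)
{
	if ((quirks & EX_QUIRK_LIMITED_IO_QUEUES) && wanted > quirk_cap)
		return quirk_cap;
	return wanted;
}
```

The point of this shape is that the table entry documents which device needs
the workaround, while every other controller keeps the single standard
request.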
But anyway, that is just my opinion, as someone who studied and implemented
(hopefully mostly correctly) the spec very recently (I am the author of the
nvme-mdev device).
It doesn't really matter to me whether this is implemented this way or another,
as long as it doesn't break things.
Best regards,
Maxim Levitsky
>
> So set it as 1 at least the NVME can be probed properly.
>
> Regards,
> Aaron
>
> > I forgot to add the relevant paragraph:
> >
> > "Note: The value allocated may be smaller or larger than the number of
> > queues
> > requested, often in virtualized
> > implementations. The controller may not have as many queues to allocate as
> > are
> > requested. Alternatively,
> > the controller may have an allocation unit of queues (e.g. power of two) and
> > may
> > supply more queues to
> > host software to satisfy its allocation unit."
> >
> >
> > Best regards,
> > Maxim Levitsky
> >
On 4/18/19 3:21 PM, Aaron Ma wrote:
> On 4/18/19 1:33 AM, Maxim Levitsky wrote:
>> On Wed, 2019-04-17 at 20:32 +0300, Maxim Levitsky wrote:
>>> On Wed, 2019-04-17 at 22:12 +0800, Aaron Ma wrote:
>>>> Some controllers support limited IO queues, when over set
>>>> the number, it will return invalid field error.
>>>> Then NVME will be removed by driver.
>>>>
>>>> Find the max number of IO queues that controller supports.
>>>> When it still got invalid result, set 1 IO queue at least to
>>>> bring NVME online.
>>> To be honest a spec compliant device should not need this.
>>> The spec states:
>>>
>>> "Number of I/O Completion Queues Requested (NCQR): Indicates the number of I/O
>>> Completion
>>> Queues requested by software. This number does not include the Admin
>>> Completion
>>> Queue. A
>>> minimum of one queue shall be requested, reflecting that the minimum support
>>> is
>>> for one I/O
>>> Completion Queue. This is a 0’s based value. The maximum value that may be
>>> specified is 65,534
>>> (i.e., 65,535 I/O Completion Queues). If the value specified is 65,535, the
>>> controller should return
>>> an error of Invalid Field in Command."
>>>
>>>
>>> This implies that you can ask for any value and the controller must not
>>> respond
>>> with an error, but rather indicate how many queues it supports.
>>>
>>> Maybe its better to add a quirk for the broken device, which needs this?
>
> Adding quirk only makes the code more complicated.
> This patch doesn't change the default behavior.
> Only handle the NVME error code.
>
> Yes the IO queues number is 0's based, but driver would return error and
> remove the nvme device as dead.
IMHO, if a controller indicates an error with this set_feature command, then
we need to figure out why the controller was returning the error to the host.
If you really want to use at least a single queue to see a live I/O queue, the
controller should not return the error because, as you mentioned above,
NCQA and NSQA will be returned as 0-based. If an error is there, that could
mean that the controller may not be able to provide even a single queue for I/O.
Thanks,
Minwoo Im
>
> So set it as 1 at least the NVME can be probed properly.
>
> Regards,
> Aaron
>
>> I forgot to add the relevant paragraph:
>>
>> "Note: The value allocated may be smaller or larger than the number of queues
>> requested, often in virtualized
>> implementations. The controller may not have as many queues to allocate as are
>> requested. Alternatively,
>> the controller may have an allocation unit of queues (e.g. power of two) and may
>> supply more queues to
>> host software to satisfy its allocation unit."
>>
>>
>> Best regards,
>> Maxim Levitsky
>>
>
On 4/18/19 8:13 PM, Minwoo Im wrote:
>> Yes the IO queues number is 0's based, but driver would return error and
>> remove the nvme device as dead.
>
> IMHO, if a controller indicates an error with this set_feature command,
> then
> we need to figure out why the controller was returning the error to host.
>
> If you really want to use at least a single queue to see an alive I/O
> queue,
> controller should not return the error because as you mentioned above,
> NCQA, NSQA will be returned as 0-based. If an error is there, that could
> mean that controller may not able to provide even a single queue for I/O.
I was thinking about trying to set 1 I/O queue in the driver to try to probe
the NVMe device.
If it works, at least the system can boot up for debugging, instead of the
NVMe device being removed and the kernel boot hanging at loading rootfs.
If you are still concerned about this 1 I/O queue, I can still set it as
*count = 0;
That way we have at least tried every count before concluding that the NVMe
device failed to respond.
Regards,
Aaron
>
> Thanks,
> Minwoo Im
On 4/18/19 9:52 PM, Aaron Ma wrote:
>
>
> On 4/18/19 8:13 PM, Minwoo Im wrote:
>>> Yes the IO queues number is 0's based, but driver would return error and
>>> remove the nvme device as dead.
>>
>> IMHO, if a controller indicates an error with this set_feature command,
>> then
>> we need to figure out why the controller was returning the error to host.
>>
>> If you really want to use at least a single queue to see an alive I/O
>> queue,
>> controller should not return the error because as you mentioned above,
>> NCQA, NSQA will be returned as 0-based. If an error is there, that could
>> mean that controller may not able to provide even a single queue for I/O.
>
> I was thinking about try to set 1 I/O queue in driver to try to probe
> NVME device.
> If it works, at least system can bootup to debug instead of just remove
> NVME device and kernel boot hang at loading rootfs.
If the controller returns an error for that command, how can we be sure that
the controller would support a single I/O queue?
>
> If you still concern this 1 I/O queue I can still set it as
> *count = 0;
>
> At least we try all count, NVME device still failed to respond.
>
> Regards,
> Aaron
>
>>
>> Thanks,
>> Minwoo Im
On 4/18/19 9:33 PM, Minwoo Im wrote:
> If the controller returns error for that command, how can we assure that
> the controller would support a single I/O queue ?
Makes sense, I will keep *count = 0 in V2.
Thanks,
Aaron
On Thu, Apr 18, 2019 at 02:21:57PM +0800, Aaron Ma wrote:
> On 4/18/19 1:33 AM, Maxim Levitsky wrote:
> >>
> >> Maybe its better to add a quirk for the broken device, which needs this?
>
> Adding quirk only makes the code more complicated.
> This patch doesn't change the default behavior.
> Only handle the NVME error code.
It does change the default behavior. If I have a degraded controller that
can't do IO in a machine with 1000's of CPUs, I have to iterate this
non-standard behavior 1000's of times before the drive is serviceable
again. We currently figure that out in just a single try.
At least the quirks document *why* the driver is doing non-standard
behavior. We do the IO queue quirks for Macbooks, for example.
But why don't you file a bug report with the device vendor instead? Surely
a firmware fix provides the best possible outcome, and would make this
device work not only in all versions of Linux, but also every standard
compliant driver for any OS.
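
The cost described above can be made concrete with a small simulation. This is
a sketch under stated assumptions: degraded_set_features() is a hypothetical
stand-in for a degraded controller that fails every Set Features (Number of
Queues) command with a positive NVMe status, and the two counting functions
are illustrative, not driver code.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a degraded controller: every command fails
 * with a positive NVMe status, and each call counts as one command sent.
 */
static int degraded_set_features(int try_count, int *commands_sent)
{
	(void)try_count;
	(*commands_sent)++;
	return 1;	/* any status > 0 */
}

/* Retry strategy from the patch: step down from 'count' to 1. */
static int commands_with_retry_loop(int count)
{
	int sent = 0;
	int try_count;

	for (try_count = count; try_count > 0; try_count--) {
		if (degraded_set_features(try_count, &sent) == 0)
			break;
	}
	return sent;
}

/* Existing behavior: a single attempt, then give up on I/O queues. */
static int commands_with_single_try(int count)
{
	int sent = 0;

	degraded_set_features(count, &sent);
	return sent;
}
```

With a starting count of 1000, the retry loop issues 1000 admin commands
against such a controller before giving up, while the single-try path issues
one.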
On 4/18/19 10:38 PM, Aaron Ma wrote:
>
>
> On 4/18/19 9:33 PM, Minwoo Im wrote:
>> If the controller returns error for that command, how can we assure that
>> the controller would support a single I/O queue ?
>
> Make sense, I will keep *count = 0 in V2.
IMHO, if you would like to set *count to 0, then what is V2 going to be?
I guess if some device fails, we can handle it as a quirk.
>
> Thanks,
> Aaron
>
On 4/18/19 9:48 PM, Keith Busch wrote:
> It does change the default behavior. If I have a degraded controller that
> can't do IO in a machine with 1000's of CPUs, I have to iterate this
> non-standard behavior 1000's of times before the drive is serviceable
> again. We currently figure that out in just a single try.
>
> At least the quirks document *why* the driver is doing non-standard
> behavior. We do the IO queue quirks for Macbooks, for example.
>
> But why don't you file a bug report with the device vendor instead? Surely
> a firmware fix provides the best possible outcome, and would make this
> device work not only in all versions of Linux, but also every standard
> compliant driver for any OS.
I will do it, no v2 for now.
Thanks,
Aaron
On Thu, Apr 18, 2019 at 10:21:03PM +0800, Aaron Ma wrote:
>
>
> On 4/18/19 9:48 PM, Keith Busch wrote:
> > It does change the default behavior. If I have a degraded controller that
> > can't do IO in a machine with 1000's of CPUs, I have to iterate this
> > non-standard behavior 1000's of times before the drive is serviceable
> > again. We currently figure that out in just a single try.
> >
> > At least the quirks document *why* the driver is doing non-standard
> > behavior. We do the IO queue quirks for Macbooks, for example.
> >
> > But why don't you file a bug report with the device vendor instead? Surely
> > a firmware fix provides the best possible outcome, and would make this
> > device work not only in all versions of Linux, but also every standard
> > compliant driver for any OS.
>
> I will do it, no v2 for now.
Honestly, unless this is a device shipping in a mass market consumer
product already, I don't think we should work around this crap at all,
given that this device has obviously never been tested at all. It
really needs a firmware fix instead of a host workaround.
On 4/25/19 10:39 PM, Christoph Hellwig wrote:
> Honestly, unless this is a device shipping in a mass market consumer
> product already, I don't think we should work around this crap at all,
> given that this device has obviously never been tested at all. It
> really needs a firmware fix instead of a host workaround.
I have already pushed this issue to the firmware engineering team.
They will try to fix it.
As far as I know, we don't need this host workaround.
Thanks,
Aaron