2019-02-20 18:49:54

by Benjamin Block

Subject: [PATCH] scsi: replace GFP_ATOMIC with GFP_KERNEL for sdev allocation

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838] sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATOMIC here. All the different
call-contexts take a mutex at some point, and nothing in between
requires a non-sleeping context, as far as I could see. Additionally, the
code that allocates the block queue for the device later
(scsi_mq_alloc_queue()) already uses GFP_KERNEL.

So replace it, and give the allocation a bit of a better chance to succeed,
with more ways of reclaim.

Signed-off-by: Benjamin Block <[email protected]>
---
drivers/scsi/scsi_scan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index dd0d516f65e2..e49e6099b852 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -220,7 +220,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);

sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
- GFP_ATOMIC);
+ GFP_KERNEL);
if (!sdev)
goto out;

--
2.20.1



2019-02-20 19:12:21

by Bart Van Assche

Subject: Re: [PATCH] scsi: replace GFP_ATOMIC with GFP_KERNEL for sdev allocation

On Wed, 2019-02-20 at 19:48 +0100, Benjamin Block wrote:
> We had a test-report where, under memory pressure, adding LUNs to the
> systems would fail (the tests add LUNs strictly in sequence):

Hi Benjamin,

There are two more instances of GFP_ATOMIC in scsi_scan.c. Have you verified
whether or not it is safe to change these into GFP_KERNEL too?

Thanks,

Bart.

2019-02-20 20:06:17

by Benjamin Block

Subject: Re: [PATCH] scsi: replace GFP_ATOMIC with GFP_KERNEL for sdev allocation

On Wed, Feb 20, 2019 at 11:11:31AM -0800, Bart Van Assche wrote:
> On Wed, 2019-02-20 at 19:48 +0100, Benjamin Block wrote:
> > We had a test-report where, under memory pressure, adding LUNs to the
> > systems would fail (the tests add LUNs strictly in sequence):
>
> Hi Benjamin,
>
> There are two more instances of GFP_ATOMIC in scsi_scan.c. Have you verified
> whether or not it is safe to change these into GFP_KERNEL too?
>

No, I was lazy, but I can take a look tomorrow and fix them up as well
if they are similar to scsi_alloc_sdev().

--
With Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Systems & Technology Group / IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Matthias Hartmann / Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294


2019-02-21 09:18:51

by Benjamin Block

Subject: [PATCH v2 2/2] scsi: whitespace cleanup in scsi_scan.c

Noticed during editing that vim would remove some trailing spaces.

Signed-off-by: Benjamin Block <[email protected]>
---
drivers/scsi/scsi_scan.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 53380e07b40e..7e1a6c3dd42c 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -108,7 +108,7 @@ MODULE_PARM_DESC(scan, "sync, async, manual, or none. "
static unsigned int scsi_inq_timeout = SCSI_TIMEOUT/HZ + 18;

module_param_named(inq_timeout, scsi_inq_timeout, uint, S_IRUGO|S_IWUSR);
-MODULE_PARM_DESC(inq_timeout,
+MODULE_PARM_DESC(inq_timeout,
"Timeout (in seconds) waiting for devices to answer INQUIRY."
" Default is 20. Some devices may need more; most need less.");

@@ -604,7 +604,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
* not-ready to ready transition [asc/ascq=0x28/0x0]
* or power-on, reset [asc/ascq=0x29/0x0], continue.
* INQUIRY should not yield UNIT_ATTENTION
- * but many buggy devices do so anyway.
+ * but many buggy devices do so anyway.
*/
if (driver_byte(result) == DRIVER_SENSE &&
scsi_sense_valid(&sshdr)) {
@@ -850,7 +850,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
* Don't set the device offline here; rather let the upper
* level drivers eval the PQ to decide whether they should
* attach. So remove ((inq_result[0] >> 5) & 7) == 1 check.
- */
+ */

sdev->inq_periph_qual = (inq_result[0] >> 5) & 7;
sdev->lockable = sdev->removable;
@@ -994,7 +994,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
}

#ifdef CONFIG_SCSI_LOGGING
-/**
+/**
* scsi_inq_str - print INQUIRY data from min to max index, strip trailing whitespace
* @buf: Output buffer with at least end-first+1 bytes of space
* @inq: Inquiry buffer (input)
@@ -1495,7 +1495,7 @@ EXPORT_SYMBOL(__scsi_add_device);
int scsi_add_device(struct Scsi_Host *host, uint channel,
uint target, u64 lun)
{
- struct scsi_device *sdev =
+ struct scsi_device *sdev =
__scsi_add_device(host, channel, target, lun, NULL);
if (IS_ERR(sdev))
return PTR_ERR(sdev);
--
2.20.1


2019-02-21 09:19:14

by Benjamin Block

Subject: [PATCH v2 1/2] scsi: replace GFP_ATOMIC with GFP_KERNEL for allocations in scsi_scan.c

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838] sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATOMIC here. All the different
call-contexts take a mutex at some point, and nothing in between
requires a non-sleeping context, as far as I could see. Additionally, the
code that later allocates the block queue for the device
(scsi_mq_alloc_queue()) already uses GFP_KERNEL.

There are similar allocations in two other functions,
scsi_probe_and_add_lun() and scsi_add_lun(), that can also be done
with GFP_KERNEL.

Here are the contexts for the three functions so far:

scsi_alloc_sdev()
    scsi_probe_and_add_lun()
        scsi_sequential_lun_scan()
            __scsi_scan_target()
                scsi_scan_target()
                    mutex_lock()
                scsi_scan_channel()
                    scsi_scan_host_selected()
                        mutex_lock()
        scsi_report_lun_scan()
            __scsi_scan_target()
                ...
        __scsi_add_device()
            mutex_lock()
        __scsi_scan_target()
            ...
    scsi_report_lun_scan()
        ...
    scsi_get_host_dev()
        mutex_lock()

scsi_probe_and_add_lun()
    ...

scsi_add_lun()
    scsi_probe_and_add_lun()
        ...

So replace all these, and give them a bit of a better chance to succeed,
with more chances of reclaim.

Signed-off-by: Benjamin Block <[email protected]>
---
drivers/scsi/scsi_scan.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index dd0d516f65e2..53380e07b40e 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -220,7 +220,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);

sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
- GFP_ATOMIC);
+ GFP_KERNEL);
if (!sdev)
goto out;

@@ -788,7 +788,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
*/
sdev->inquiry = kmemdup(inq_result,
max_t(size_t, sdev->inquiry_len, 36),
- GFP_ATOMIC);
+ GFP_KERNEL);
if (sdev->inquiry == NULL)
return SCSI_SCAN_NO_RESPONSE;

@@ -1079,7 +1079,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
if (!sdev)
goto out;

- result = kmalloc(result_len, GFP_ATOMIC |
+ result = kmalloc(result_len, GFP_KERNEL |
((shost->unchecked_isa_dma) ? __GFP_DMA : 0));
if (!result)
goto out_free_sdev;
--
2.20.1


2019-02-21 18:46:10

by Bart Van Assche

Subject: Re: [PATCH v2 1/2] scsi: replace GFP_ATOMIC with GFP_KERNEL for allocations in scsi_scan.c

On Thu, 2019-02-21 at 10:18 +0100, Benjamin Block wrote:
> Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
> there seems to be no reason to use GFP_ATOMIC here. All the different
> call-contexts take a mutex at some point, and nothing in between
> requires a non-sleeping context, as far as I could see. Additionally, the
> code that later allocates the block queue for the device
> (scsi_mq_alloc_queue()) already uses GFP_KERNEL.
>
>
> [ ... ]
>
> So replace all these, and give them a bit of a better chance to succeed,
> with more chances of reclaim.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>



2019-02-27 14:48:27

by Martin K. Petersen

Subject: Re: [PATCH v2 1/2] scsi: replace GFP_ATOMIC with GFP_KERNEL for allocations in scsi_scan.c


Benjamin,

> We had a test-report where, under memory pressure, adding LUNs to the
> systems would fail (the tests add LUNs strictly in sequence):

Applied to 5.1/scsi-queue, thanks!

--
Martin K. Petersen Oracle Linux Engineering