LinuxLists.cc - attempting to format brd device results in OOM kills

2017-06-18 16:30:37

Subject: attempting to format brd device results in OOM kills

I've run across a regression from v4.11. If I boot a v4.12-rc1 or later
kernel, make a large brd device and try to format it, it quickly slows
down to a crawl and then the OOM killer kicks in.

I ran a bisect and it landed here:

commit f09a06a193d942a12c1a33c153388b3962222006 (HEAD, refs/bisect/bad)
Author: Christoph Hellwig <[email protected]>
Date: Wed Apr 5 19:21:16 2017 +0200

brd: remove discard support

It's just a in-driver reimplementation of writing zeroes to the pages,
which fails if the discards aren't page aligned.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

I've been reproducing it in a VM with ~8G allocated to it:

I have a modprobe.d file with this in it:

options brd rd_nr=1 rd_size=1073741824

I then just:

# modprobe brd
# mkfs -t ext2 /dev/ram0

It keels over pretty quickly after that.

My .config is attached.

Cheers,
--
Jeff Layton <[email protected]>

Attachments:

config (180.66 kB)

2017-06-18 22:21:42

by Jens Axboe

Subject: Re: attempting to format brd device results in OOM kills

On 06/18/2017 10:30 AM, Jeff Layton wrote:
> I've run across a regression from v4.11. If I boot a v4.12-rc1 or later
> kernel, make a large brd device and try to format it, it quickly slows
> down to a crawl and then the OOM killer kicks in.
>
> I ran a bisect and it landed here:
>
> commit f09a06a193d942a12c1a33c153388b3962222006 (HEAD, refs/bisect/bad)
> Author: Christoph Hellwig <[email protected]>
> Date: Wed Apr 5 19:21:16 2017 +0200
>
> brd: remove discard support
>
> It's just a in-driver reimplementation of writing zeroes to the pages,
> which fails if the discards aren't page aligned.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
>
>
> I've been reproducing it in a VM with ~8G allocated to it:
>
> I have a modprobe.d file with this in it:
>
> options brd rd_nr=1 rd_size=1073741824
>
> I then just:
>
> # modprobe brd
> # mkfs -t ext2 /dev/ram0
>
> It keels over pretty quickly after that.

Just checked, and creating a 1TB ram disk and then running mkfs.ext2 on it
writes 16851MiB of data. I can't say I'm surprised you OOM, if you run that
in an 8G VM, as you're about 8G short.
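
The "about 8G short" estimate can be checked with a little arithmetic. This is a minimal sketch, assuming brd allocates a backing page the first time a sector is written and keeps it, and (optimistically) that all of the VM's memory could go to the ram disk. The function name is hypothetical; the figures are the ones quoted in this thread.

```python
MIB_PER_GIB = 1024


def shortfall_mib(written_mib: int, ram_gib: int) -> int:
    """MiB written beyond what RAM could hold.

    Assumption: every page of RAM is usable by the ram disk, which is
    optimistic, so the real shortfall would be somewhat larger.
    """
    return max(0, written_mib - ram_gib * MIB_PER_GIB)


written_mib = 16851  # data written by mkfs.ext2, as reported above
ram_gib = 8          # approximate memory of the VM in the report

short = shortfall_mib(written_mib, ram_gib)
print(f"{short} MiB short (~{short / MIB_PER_GIB:.1f} GiB)")
```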

I'm puzzled as to why the discard change would make any difference, however.

--
Jens Axboe

2017-06-18 22:28:01

by Jens Axboe

Subject: Re: attempting to format brd device results in OOM kills

On 06/18/2017 04:21 PM, Jens Axboe wrote:
> On 06/18/2017 10:30 AM, Jeff Layton wrote:
>> I've run across a regression from v4.11. If I boot a v4.12-rc1 or later
>> kernel, make a large brd device and try to format it, it quickly slows
>> down to a crawl and then the OOM killer kicks in.
>>
>> I ran a bisect and it landed here:
>>
>> commit f09a06a193d942a12c1a33c153388b3962222006 (HEAD, refs/bisect/bad)
>> Author: Christoph Hellwig <[email protected]>
>> Date: Wed Apr 5 19:21:16 2017 +0200
>>
>> brd: remove discard support
>>
>> It's just a in-driver reimplementation of writing zeroes to the pages,
>> which fails if the discards aren't page aligned.
>>
>> Signed-off-by: Christoph Hellwig <[email protected]>
>> Reviewed-by: Hannes Reinecke <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>>
>>
>> I've been reproducing it in a VM with ~8G allocated to it:
>>
>> I have a modprobe.d file with this in it:
>>
>> options brd rd_nr=1 rd_size=1073741824
>>
>> I then just:
>>
>> # modprobe brd
>> # mkfs -t ext2 /dev/ram0
>>
>> It keels over pretty quickly after that.
>
> Just checked, and creating a 1TB ram disk and then running mkfs.ext2 on it
> writes 16851MiB of data. I can't say I'm surprised you OOM, if you run that
> in an 8G VM, as you're about 8G short.
>
> I'm puzzled as to why the discard change would make any difference, however.

Reverted the patch, and I see identical behavior. The only difference is that
the whole device is trimmed first, as expected. But it still writes ~16G
afterwards.

Are you sure this commit is what broke things for you? Honestly, I don't see
how it could ever work with 1TB ram disk, 8G of RAM, and 16G of data written.

--
Jens Axboe

2017-06-18 22:43:47

by Jeff Layton

Subject: Re: attempting to format brd device results in OOM kills

On Sun, 2017-06-18 at 16:27 -0600, Jens Axboe wrote:
> On 06/18/2017 04:21 PM, Jens Axboe wrote:
> > On 06/18/2017 10:30 AM, Jeff Layton wrote:
> > > I've run across a regression from v4.11. If I boot a v4.12-rc1 or later
> > > kernel, make a large brd device and try to format it, it quickly slows
> > > down to a crawl and then the OOM killer kicks in.
> > >
> > > I ran a bisect and it landed here:
> > >
> > > commit f09a06a193d942a12c1a33c153388b3962222006 (HEAD, refs/bisect/bad)
> > > Author: Christoph Hellwig <[email protected]>
> > > Date: Wed Apr 5 19:21:16 2017 +0200
> > >
> > > brd: remove discard support
> > >
> > > It's just a in-driver reimplementation of writing zeroes to the pages,
> > > which fails if the discards aren't page aligned.
> > >
> > > Signed-off-by: Christoph Hellwig <[email protected]>
> > > Reviewed-by: Hannes Reinecke <[email protected]>
> > > Signed-off-by: Jens Axboe <[email protected]>
> > >
> > >
> > > I've been reproducing it in a VM with ~8G allocated to it:
> > >
> > > I have a modprobe.d file with this in it:
> > >
> > > options brd rd_nr=1 rd_size=1073741824
> > >
> > > I then just:
> > >
> > > # modprobe brd
> > > # mkfs -t ext2 /dev/ram0
> > >
> > > It keels over pretty quickly after that.
> >
> > Just checked, and creating a 1TB ram disk and then running mkfs.ext2 on it
> > writes 16851MiB of data. I can't say I'm surprised you OOM, if you run that
> > in an 8G VM, as you're about 8G short.
> >
> > I'm puzzled as to why the discard change would make any difference, however.
>
> Reverted the patch, and I see identical behavior. The only difference is that
> the whole device is trimmed first, as expected. But it still writes ~16G
> afterwards.
>
> Are you sure this commit is what broke things for you? Honestly, I don't see
> how it could ever work with 1TB ram disk, 8G of RAM, and 16G of data written.
>

My mistake! My brd rd_size parameter was too large by a factor of 1024
(I missed that it was in kbytes and not bytes). With it sanely sized to
1G (as I had actually intended), it works fine.
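
The unit mix-up is easy to see once the conversion is written out. The following is a minimal sketch, assuming (as stated above) that rd_size is given in kbytes of 1024 bytes each. The helper names are hypothetical and are not part of brd or modprobe.

```python
KIB = 1024  # bytes per kbyte, the unit rd_size is assumed to use


def rd_size_to_bytes(rd_size_kib: int) -> int:
    """Device size in bytes for a given rd_size value."""
    return rd_size_kib * KIB


def bytes_to_rd_size(size_bytes: int) -> int:
    """rd_size value (in kbytes) for a desired device size in bytes."""
    if size_bytes % KIB:
        raise ValueError("size must be a whole number of kbytes")
    return size_bytes // KIB


mistaken = 1073741824               # value from the modprobe.d line
intended = bytes_to_rd_size(1 << 30)  # rd_size for a 1 GiB device

print(rd_size_to_bytes(mistaken) // (1 << 40), "TiB")  # size actually requested
print("rd_size for 1 GiB:", intended)
```

Under that assumption, the modprobe.d line for a 1G device would read `options brd rd_nr=1 rd_size=1048576`.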

It's interesting that the older kernel survives this and the newer one
doesn't, but since it's such a pathological setup I'm not too worried
about it.

As far as that commit...no, I'm not sure that's what "broke" it for me.
That's where the bisect landed (and I think I did it right), but I
didn't independently verify whether reverting it helps or not.

Anyway here's the bisect log if you're interested:

$ git bisect log
# bad: [2ea659a9ef488125eb46da6eb571de5eae5c43f6] Linux 4.12-rc1
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect start 'v4.12-rc1' 'v4.11'
# bad: [221656e7c4ce342b99c31eca96c1cbb6d1dce45f] Merge tag 'sound-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 221656e7c4ce342b99c31eca96c1cbb6d1dce45f
# bad: [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
# good: [cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag 'mac80211-next-for-davem-2017-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
git bisect good cec381919818a9a0cb85600b3c82404bdd38cf36
# bad: [6dc2cce9321198172cd96f955a5fc798a4cc35a6] Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 6dc2cce9321198172cd96f955a5fc798a4cc35a6
# bad: [477d7caeede0e3a933368440fc877b12c25dbb6d] Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration
git bisect bad 477d7caeede0e3a933368440fc877b12c25dbb6d
# bad: [a5695a79088653c73c92ae8d48658cbc49f31884] coda: Convert to separately allocated bdi
git bisect bad a5695a79088653c73c92ae8d48658cbc49f31884
# good: [ee056f98126170ca8b16b9a4a6e20aae7c5c184e] blk-mq-sched: provide hooks for initializing hardware queue data
git bisect good ee056f98126170ca8b16b9a4a6e20aae7c5c184e
# bad: [2a79efd833dd51c4362af655b9b011393c423f18] lightnvm: fix some WARN() messages
git bisect bad 2a79efd833dd51c4362af655b9b011393c423f18
# bad: [48920ff2a5a940cd07d12cc79e4a2c75f1185aee] block: remove the discard_zeroes_data flag
git bisect bad 48920ff2a5a940cd07d12cc79e4a2c75f1185aee
# good: [ee472d835c264a4cb77f8cf878603e1e40f3559e] block: add a flags argument to (__)blkdev_issue_zeroout
git bisect good ee472d835c264a4cb77f8cf878603e1e40f3559e
# good: [19372e2769179ddd154a0d6fbbdb719eb5d0af12] loop: implement REQ_OP_WRITE_ZEROES
git bisect good 19372e2769179ddd154a0d6fbbdb719eb5d0af12
# bad: [5d1429fead5beacce6df052c31b28a97a11e250b] mmc: remove the discard_zeroes_data flag
git bisect bad 5d1429fead5beacce6df052c31b28a97a11e250b
# bad: [93c1defedcae701512957c279b850659d1dae78f] rbd: remove the discard_zeroes_data flag
git bisect bad 93c1defedcae701512957c279b850659d1dae78f
# bad: [f09a06a193d942a12c1a33c153388b3962222006] brd: remove discard support
git bisect bad f09a06a193d942a12c1a33c153388b3962222006
# first bad commit: [f09a06a193d942a12c1a33c153388b3962222006] brd: remove discard support

Anyway, sorry for the noise!
--
Jeff Layton <[email protected]>

2017-06-19 01:42:30

by Jens Axboe

Subject: Re: attempting to format brd device results in OOM kills

On 06/18/2017 04:43 PM, Jeff Layton wrote:
>>> Just checked, and creating a 1TB ram disk and then running mkfs.ext2 on it
>>> writes 16851MiB of data. I can't say I'm surprised you OOM, if you run that
>>> in an 8G VM, as you're about 8G short.
>>>
>>> I'm puzzled as to why the discard change would make any difference, however.
>>
>> Reverted the patch, and I see identical behavior. The only difference is that
>> the whole device is trimmed first, as expected. But it still writes ~16G
>> afterwards.
>>
>> Are you sure this commit is what broke things for you? Honestly, I don't see
>> how it could ever work with 1TB ram disk, 8G of RAM, and 16G of data written.
>>
>
> My mistake! My brd rd_size parameter was too large by a factor of 1024
> (I missed that it was in kbytes and not bytes). With it sanely sized to
> 1G (as I had actually intended), it works fine.
>
> It's interesting that the older kernel survives this and the newer one
> doesn't, but since it's such a pathological setup I'm not too worried
> about it.

Beats me, I don't see how anything could make a 16G ram disk work on an
8G setup? If the above has any change in behavior, I'd be inclined to
point at vm changes.

Puzzled!

--
Jens Axboe