Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751940AbdFVOXF (ORCPT ); Thu, 22 Jun 2017 10:23:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42328 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751149AbdFVOXE (ORCPT ); Thu, 22 Jun 2017 10:23:04 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8AF3614289A Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx10.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=oleg@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 8AF3614289A Date: Thu, 22 Jun 2017 16:23:00 +0200 From: Oleg Nesterov To: Cyrill Gorcunov Cc: Hugh Dickins , Andrey Vagin , LKML , Pavel Emelyanov , Dmitry Safonov , Andrew Morton , Adrian Reber Subject: Re: [criu] 1M guard page ruined restore Message-ID: <20170622142300.GA762@redhat.com> References: <20170620075206.GB1909@uranus.lan> <20170621152256.GC31050@uranus> <20170621155730.GA32554@redhat.com> <20170621160410.GF31050@uranus> <20170621170129.GA32752@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170621170129.GA32752@redhat.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Thu, 22 Jun 2017 14:23:03 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3963 Lines: 117 Cyrill, I am replying to my own email because I got lost in numerous threads/emails connected to stack guard/gap problems. IIRC you confirmed that the 1st load doesn't fail and the patch fixes the problem. So everything is clear, and we will discuss this change in another thread. But let me add that (imo) you should not change this test-case. You simply should not run it if kerndat_mm_guard_page_maps() detects the new kernel at startup. The new version makes no sense for criu, afaics. Yes, yes, thank you very much for this test-case, it found the kernel regression ;) But criu has nothing to do with this problem, and it is not clear right now if we are going to fix it or not. With the recent kernel changes criu should never look outside of start-end region reported by /proc/maps; and restore doesn't even need to know if a GROWSDOWN region will actually grow or not, because (iiuc) you do not need to auto-grow the stack vma during restore, criu re-creates the whole vma with the same length using MAP_FIXED and it should never write below the addr returned by mmap(MAP_FIXED). So (afaics) the only complication is that the process can be dumped on a system running with (say) stack_guard_gap=4K kernel parameter, and then restored on another system running with stack_guard_gap=1M. In this case the application may fail after restore if it tries to auto-grow the stack, but this is unlikely and this is another story. Oleg. On 06/21, Oleg Nesterov wrote: > > On 06/21, Cyrill Gorcunov wrote: > > > > On Wed, Jun 21, 2017 at 05:57:30PM +0200, Oleg Nesterov wrote: > > > > > > > > p = fake_grow_down; > > > > *p-- = 'c'; > > > > > > I guess this works? I mean, *p-- = 'c' should not fail... > > > > It fails. > > Hmm. Impossible ;) could you add the additional printf's to re-check? > > > Here is the complete code. It supposed to _extend_ stack but it fails > > on the latest master + Hugh's [PATCH] mm: fix new crash in unmapped_area_topdown() > > --- > > [root@fc2 criu]# ~/st2 > > start_addr 7fe6162a8000 > > start_addr 7fe6163d9000 > > Segmentation fault (core dumped) > > --- > > #include > > #include > > #include > > #include > > #include > > #include > > > > #include > > > > #define PAGE_SIZE 4096 > > > > int main(int argc, char **argv) > > { > > char *start_addr, *start_addr1, *fake_grow_down, *test_addr, *grow_down; > > volatile char *p; > > > > start_addr = mmap(NULL, PAGE_SIZE * 512, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); > > if (start_addr == MAP_FAILED) { > > printf("Can't mal a new region"); > > return 1; > > } > > printf("start_addr %lx\n", start_addr); > > munmap(start_addr, PAGE_SIZE * 512); > > > > start_addr += PAGE_SIZE * 300; > > > > fake_grow_down = mmap(start_addr + PAGE_SIZE * 5, PAGE_SIZE, > > PROT_READ | PROT_WRITE, > > MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED | MAP_GROWSDOWN, -1, 0); > > if (fake_grow_down == MAP_FAILED) { > > printf("Can't mal a new region"); > > return 1; > > } > > printf("start_addr %lx\n", fake_grow_down); > > > > p = fake_grow_down; > > *p-- = 'c'; > > once again, I can't believe this STORE can fail... > > > *p = 'b'; > > Ah. I forgot about another kernel "feature" ;) not related to the recent guard > page changes... > > Could you test the patch below? > > Oleg. > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index 8ad91a0..edc5d68 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1416,7 +1416,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > * and pusha to work. ("enter $65535, $31" pushes > * 32 pointers and then decrements %sp by 65535.) > */ > - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) { > +if (0) if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) { > bad_area(regs, error_code, address); > return; > }