Date: Tue, 20 Jun 2017 03:23:20 -0700 (PDT)
From: Hugh Dickins
To: Cyrill Gorcunov
cc: Hugh Dickins, Andrey Vagin, LKML, Pavel Emelyanov, Dmitry Safonov,
    Andrew Morton, Oleg Nesterov
Subject: Re: [criu] 1M guard page ruined restore
In-Reply-To: <20170620075206.GB1909@uranus.lan>
References: <20170620075206.GB1909@uranus.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 20 Jun 2017, Cyrill Gorcunov wrote:

> Hi Hugh!
> We're running our tests on latest vanilla kernel all the time,
> and recently we've got an issue on restore:
>
> https://github.com/xemul/criu/issues/322
>
> | (00.410614) 4: cg: Cgroups 1 inherited from parent
> | (00.410858) 4: Opened local page read 3 (parent 0)
> | (00.410961) 4: premap 0x00000000400000-0x00000000406000 -> 00007fe65badf000
> | (00.410981) 4: premap 0x00000000605000-0x00000000606000 -> 00007fe65bae5000
> | (00.410997) 4: premap 0x00000000606000-0x00000000607000 -> 00007fe65bae6000
> | (00.411013) 4: premap 0x000000025a0000-0x000000025c1000 -> 00007fe65bae7000
> | (00.411036) 4: Error (criu/mem.c:726): Unable to remap a private vma: Invalid argument
> | (00.412779) 1: Error (criu/cr-restore.c:1465): 4 exited, status=1
>
> Andrew has narrowed it down to the commit
>
> | commit 1be7107fbe18eed3e319a6c3e83c78254b693acb
> | Author: Hugh Dickins
> | Date: Mon Jun 19 04:03:24 2017 -0700
> |
> | mm: larger stack guard gap, between vmas
>
> and looking into the patch I see the procfs output has been changed
>
> | diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> | index f0c8b33..520802d 100644
> | --- a/fs/proc/task_mmu.c
> | +++ b/fs/proc/task_mmu.c
> | @@ -300,11 +300,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
> |
> |     /* We don't show the stack guard page in /proc/maps */
> |     start = vma->vm_start;
> | -   if (stack_guard_page_start(vma, start))
> | -           start += PAGE_SIZE;
> |     end = vma->vm_end;
> | -   if (stack_guard_page_end(vma, end))
> | -           end -= PAGE_SIZE;
> |
> |     seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
> |     seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
>
> For which we of course are not ready because we've been implying the
> guard page is returned here so we adjust addresses locally when saving
> them into images.
>
> So now we need to figure out somehow if show_map_vma accounts [PAGE_SIZE|guard_area] or not,
> I guess we might use kernel version here but it won't be working fine on custom kernels,
> or kernels with the patch backported.
>
> Second I guess we might need to detect @stack_guard_gap runtime as
> well but not yet sure because we only have found this problem and
> hasn't been investigating it deeply yet. Hopefully will do in a
> day or couple (I guess we still have some time before the final
> kernel release).

Sorry for breaking you: we realized there was some risk of that.

Would it be acceptable to you, to judge which kind of a kernel it is,
by whether it has a global variable stack_guard_gap? I don't know if
that would be a horrible hack, or the kind of thing that you're used
to doing all over the place. Judging by kernel version will be awkward,
since the patch is being backported to stable kernels.

But I'm surprised by your explanation above: maybe I'm confused, or
maybe the explanation is different. Because as I see it, the change
I made in that patch *maintained* consistency for CRIU:

It used to be the case that there was a gap page included in the extent
of the stack vma, but it didn't really belong in there, therefore
show_map_vma() massaged the addresses shown to conceal it.

Whereas now with the 1be7107fbe18 commit, the gap (page or more) is
not included in the extent of the stack vma, so there's no longer any
need to massage the addresses shown to conceal it.

We do need to understand this fairly quickly, since those stable
backports will pose more of a problem for you than the v4.12 release
itself.

Hugh
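
For illustration only, here is a minimal sketch of how userspace might
test for a stack_guard_gap symbol, along the lines suggested above.
It is not taken from CRIU or from the patch. The helper names
stream_has_symbol() and kernel_has_stack_guard_gap() are hypothetical,
and the sketch assumes /proc/kallsyms is readable and that the kernel
lists data symbols there (which generally needs CONFIG_KALLSYMS_ALL),
so a missing symbol is not conclusive proof of an old kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a kallsyms-style stream lists a symbol called `name`,
 * 0 otherwise. Each line is expected to look like
 * "address type name [module]"; lines that do not parse are skipped.
 */
int stream_has_symbol(FILE *f, const char *name)
{
	char line[512], sym[256], type;
	unsigned long long addr;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%llx %c %255s", &addr, &type, sym) != 3)
			continue;
		if (strcmp(sym, name) == 0)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical probe: 1 if stack_guard_gap is listed, 0 if it is not
 * listed (old kernel, or data symbols not exported to kallsyms),
 * -1 if /proc/kallsyms cannot be opened at all.
 */
int kernel_has_stack_guard_gap(void)
{
	FILE *f = fopen("/proc/kallsyms", "r");
	int ret;

	if (!f)
		return -1;
	ret = stream_has_symbol(f, "stack_guard_gap");
	fclose(f);
	return ret;
}
```

A caller would treat a return of 1 as "the gap is outside the stack vma",
and would need some other fallback for 0 or -1, since neither of those
results says for certain which layout the kernel uses.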