Date: Mon, 8 Jul 2019 12:35:44 +0200
From: Max Kellermann
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Kernel 5.1.15 stuck in compaction
Message-ID: <20190708103543.GA10364@swift.blarg.de>

Hi,

one of our web servers got repeatedly stuck in the memory compaction
code; two PHP processes have been spinning at 100% CPU inside memory
compaction after a page fault:

  100.00%     0.00%  php-cgi7.0  [kernel.vmlinux]  [k] page_fault
          |
          ---page_fault
             __do_page_fault
             handle_mm_fault
             __handle_mm_fault
             do_huge_pmd_anonymous_page
             __alloc_pages_nodemask
             __alloc_pages_slowpath
             __alloc_pages_direct_compact
             try_to_compact_pages
             compact_zone_order
             compact_zone
             |
             |--61.30%--isolate_migratepages_block
             |          |
             |          |--20.44%--node_page_state
             |          |
             |          |--5.88%--compact_unlock_should_abort.isra.33
             |          |
             |           --3.28%--_cond_resched
             |                      |
             |                       --2.19%--rcu_all_qs
             |
              --3.37%--pageblock_skip_persistent
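The top of that graph is the THP fault path: a write to a not-yet-populated
anonymous mapping that the kernel tries to back with a 2 MiB page, falling
into direct compaction when no such block is free. For illustration only
(this is not our workload and not a reproducer of the hang), a minimal
program that takes the same path when THP is set to "always" and memory is
fragmented:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	const size_t len = 512UL << 20;		/* 512 MiB, size is arbitrary */
	const size_t huge = 2UL << 20;		/* 2 MiB, x86-64 THP size */

	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* one write per 2 MiB region -> one huge-page fault each, i.e.
	 * do_huge_pmd_anonymous_page(), which may end up in direct
	 * compaction depending on the THP defrag setting */
	for (size_t off = 0; off < len; off += huge)
		p[off] = 1;

	puts("mapping populated");
	munmap(p, len);
	return 0;
}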
ftrace:

 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: _cond_resched <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: rcu_all_qs <-_cond_resched
 <...>-962300 [033] .... 236536.493919: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: pageblock_skip_persistent <-compact_zone
 <...>-962300 [033] .... 236536.493919: isolate_migratepages_block <-compact_zone
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493920: _cond_resched <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493920: rcu_all_qs <-_cond_resched
 <...>-962300 [033] .... 236536.493920: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
 <...>-962300 [033] .... 236536.493920: pageblock_skip_persistent <-compact_zone
 <...>-962300 [033] .... 236536.493920: isolate_migratepages_block <-compact_zone
 <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block

Nothing useful in /proc/PID/{stack,wchan,syscall}.

slabinfo's kmalloc-{16,32} are going through the roof (~15 GB each), and
this memleak lookalike, which triggers the oomkiller all the time, is
what drew our attention to this server.

Right now, the server is still stuck, and I can attempt to collect more
information on request.

Max
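P.S. The ~15 GB figures correspond to <num_objs> * <objsize> in
/proc/slabinfo. For illustration, a throwaway helper along these lines
(hypothetical, not something that ran on the affected machine) prints the
same estimate; it needs root, since /proc/slabinfo is usually mode 0400:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* /proc/slabinfo data lines: <name> <active_objs> <num_objs> <objsize> ... */
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512];

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char name[64];
		unsigned long active_objs, num_objs, objsize;

		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active_objs, &num_objs, &objsize) != 4)
			continue;	/* header or malformed line */

		if (!strcmp(name, "kmalloc-16") || !strcmp(name, "kmalloc-32"))
			printf("%-12s %12lu objs * %2lu B = %7.2f GiB\n",
			       name, num_objs, objsize,
			       (double)num_objs * objsize / (1 << 30));
	}

	fclose(f);
	return 0;
}

Comparing two runs a few minutes apart gives the growth rate.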