From: "Zi Yan"
To: "Mel Gorman", "Kirill A. Shutemov"
Cc: "Andrew Morton", "Andrea Arcangeli", "Rik van Riel", "Michal Hocko", "Vlastimil Babka", linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm, numa: Fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
Date: Mon, 10 Apr 2017 11:45:08 -0500
Message-ID: <84B5E286-4E2A-4DE0-8351-806D2102C399@cs.rutgers.edu>
In-Reply-To: <20170410094825.2yfo5zehn7pchg6a@techsingularity.net>

On 10 Apr 2017, at 4:48, Mel Gorman wrote:

> A user reported a bug against a distribution kernel while running
> a proprietary workload described as "memory intensive that is not
> swapping" that is expected to apply to mainline kernels.
> The workload is read/write/modifying ranges of memory and checking
> the contents. They reported that within a few hours a bad PMD would
> be reported, followed by a memory corruption where expected data was
> all zeros. A partial report of the bad PMD looked like
>
> [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
> [ 5195.341184] ------------[ cut here ]------------
> [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
> ....
> [ 5195.410033] Call Trace:
> [ 5195.410471] [] change_protection_range+0x7dd/0x930
> [ 5195.410716] [] change_prot_numa+0x18/0x30
> [ 5195.410918] [] task_numa_work+0x1fe/0x310
> [ 5195.411200] [] task_work_run+0x72/0x90
> [ 5195.411246] [] exit_to_usermode_loop+0x91/0xc2
> [ 5195.411494] [] prepare_exit_to_usermode+0x31/0x40
> [ 5195.411739] [] retint_user+0x8/0x10
>
> Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
> was a false detection. The bug does not trigger if automatic NUMA balancing
> or transparent huge pages is disabled.
>
> The bug is due to a race in change_pmd_range between a pmd_trans_huge and
> pmd_none_or_clear_bad check without any locks held. During the pmd_trans_huge
> check, a parallel protection update under lock can have cleared the PMD
> and filled it with a prot_numa entry between the transhuge check and the
> pmd_none_or_clear_bad check.
>
> While this could be fixed with heavy locking, it's only necessary to
> make a copy of the PMD on the stack during change_pmd_range and avoid
> races. A new helper is created for this as the check is quite subtle and the
> existing similar helper is not suitable. This passed 154 hours of testing
> (usually triggers between 20 minutes and 24 hours) without detecting bad
> PMDs or corruption. A basic test of an autonuma-intensive workload showed
> no significant change in behaviour.
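
For reference, the race and the on-stack copy described above can be
modelled in userspace. This is only a sketch under stated assumptions, not
the code from the patch: the flag bits assume the x86-64 page table layout,
and every name in it (model_pmd_t, check_racy, check_snapshot,
install_prot_numa) is hypothetical. The "parallel update" is simulated by a
callback so the interleaving is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as laid out on x86-64 (assumed architecture). */
#define F_PRESENT  (UINT64_C(1) << 0)
#define F_RW       (UINT64_C(1) << 1)
#define F_USER     (UINT64_C(1) << 2)
#define F_ACCESSED (UINT64_C(1) << 5)
#define F_DIRTY    (UINT64_C(1) << 6)
#define F_PSE      (UINT64_C(1) << 7)  /* entry maps a huge page */
#define F_PROTNONE (UINT64_C(1) << 8)  /* only meaningful when !PRESENT */
#define F_MASK     UINT64_C(0xfff)
#define F_TABLE    (F_PRESENT | F_RW | F_ACCESSED | F_DIRTY)

/* Hypothetical stand-in for a PMD slot shared between CPUs. */
typedef _Atomic uint64_t model_pmd_t;
typedef void (*interloper_fn)(model_pmd_t *);

enum verdict { SKIP_NONE, HANDLE_HUGE, REPORT_BAD, WALK_PTES };

static inline int is_huge(uint64_t v) { return (v & F_PSE) != 0; }
static inline int is_bad(uint64_t v)  { return ((v & F_MASK) & ~F_USER) != F_TABLE; }
static inline int is_prot_numa(uint64_t v)
{
    return !(v & F_PRESENT) && (v & F_PROTNONE);
}

/* Decide what to do with one already-read PMD value. */
static enum verdict classify(uint64_t v)
{
    if (v == 0)
        return SKIP_NONE;
    if (is_huge(v))
        return HANDLE_HUGE;
    if (is_bad(v))
        return REPORT_BAD;
    return WALK_PTES;
}

/* Racy pattern: the huge check and the none/bad check each re-read the slot. */
static enum verdict check_racy(model_pmd_t *pmd, interloper_fn between)
{
    assert(pmd != NULL);
    if (is_huge(atomic_load(pmd)))
        return HANDLE_HUGE;
    if (between)
        between(pmd);               /* simulated parallel protection update */
    uint64_t v = atomic_load(pmd);  /* second, independent read */
    if (v == 0)
        return SKIP_NONE;
    if (is_bad(v))
        return REPORT_BAD;
    return WALK_PTES;
}

/* Snapshot pattern: read once into a local, make every decision on the copy. */
static enum verdict check_snapshot(model_pmd_t *pmd, interloper_fn between)
{
    assert(pmd != NULL);
    uint64_t v = atomic_load(pmd);
    if (between)
        between(pmd);               /* update lands, but the copy is unaffected */
    return classify(v);
}

/* Simulated updater: installs the entry value from the report above. */
static void install_prot_numa(model_pmd_t *pmd)
{
    atomic_store(pmd, UINT64_C(0x000002e0396009e2));
}
```

Starting from a transiently cleared slot (value 0) and running
install_prot_numa between the two reads, check_racy returns REPORT_BAD for
the value from the report, while check_snapshot returns SKIP_NONE on that
pass and HANDLE_HUGE on the next one. Under the assumed bit layout the
reported value has PSE and PROTNONE set and PRESENT clear, which is
consistent with "a valid prot_numa PMD".
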
>
> Signed-off-by: Mel Gorman
> Cc: stable@vger.kernel.org

Does this patch fix the same problem fixed by Kirill's patch here?
https://lkml.org/lkml/2017/3/2/347

--
Best Regards
Yan Zi