From: Michał Mirosław
Date: Mon, 24 Jul 2023 18:10:49 +0200
Subject: Re: [v2] fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
To: Muhammad Usama Anjum
Cc: Michał Mirosław, Andrei Vagin, Danylo Mocherniuk, Alex Sierra,
 Alexander Viro, Andrew Morton, Axel Rasmussen, Christian Brauner,
 Cyrill Gorcunov, Dan Williams, David Hildenbrand, Greg KH,
 Gustavo A. R. Silva, Liam R. Howlett, Matthew Wilcox, Mike Rapoport,
 Nadav Amit, Pasha Tatashin, Paul Gofman, Peter Xu, Shuah Khan,
 Suren Baghdasaryan, Vlastimil Babka, Yang Shi, Yun Zhou,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, kernel@collabora.com
In-Reply-To: <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com>
References: <20230713101415.108875-6-usama.anjum@collabora.com>
 <7eedf953-7cf6-c342-8fa8-b7626d69ab63@collabora.com>
 <382f4435-2088-08ce-20e9-bc1a15050861@collabora.com>
 <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com>

On Mon, 24 Jul 2023 at 17:22, Muhammad Usama Anjum wrote:
>
> On 7/24/23 7:38 PM, Michał Mirosław wrote:
> > On Mon, 24 Jul 2023 at 16:04, Muhammad Usama Anjum wrote:
> >>
> >> Fixed the bugs found. Testing it further.
> >>
> >> - Split and back off in the buffer-full case as well
> >> - Fix wrongly breaking out of the loop if a page isn't interesting; skip it instead
> >> - Untag the addresses and save them into the struct
> >> - Round the end address up to the next page
> >>
> >> Signed-off-by: Muhammad Usama Anjum
> >> ---
> >>  fs/proc/task_mmu.c | 54 ++++++++++++++++++++++++++--------------------
> >>  1 file changed, 31 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> index add21fdf3c9a..64b326d0ec6d 100644
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -2044,7 +2050,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start,
> >>           * Break huge page into small pages if the WP operation
> >>           * need to be performed is on a portion of the huge page.
> >>           */
> >> -        if (end != start + HPAGE_SIZE) {
> >> +        if (end != start + HPAGE_SIZE || ret == -ENOSPC) {
> >
> > Why is it needed? If `end == start + HPAGE_SIZE` then we're handling a
> > full hugepage anyway.
> If we weren't able to add the complete THP to the output buffer and we need
> to perform WP on the entire page, we should split and roll back. Otherwise
> we'll WP the entire THP and lose the state of the rest of the THP, which
> wasn't added to the output.
>
> Let's say max=100:
> only 100 pages would be added to the output,
> so we need to split and roll back; otherwise the other 412 pages would get WP.

In this case *end will be truncated by output() to match the number of
pages that fit.

> >> @@ -2066,8 +2072,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
> >>  {
> >>          struct pagemap_scan_private *p = walk->private;
> >>          struct vm_area_struct *vma = walk->vma;
> >> +        unsigned long addr, categories, next;
> >>          pte_t *pte, *start_pte;
> >> -        unsigned long addr;
> >>          bool flush = false;
> >>          spinlock_t *ptl;
> >>          int ret;
> >> @@ -2088,12 +2094,14 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
> >>          }
> >>
> >>          for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
> >> -                unsigned long categories = p->cur_vma_category |
> >> -                        pagemap_page_category(vma, addr, ptep_get(pte));
> >> -                unsigned long next = addr + PAGE_SIZE;
> >> +                categories = p->cur_vma_category |
> >> +                             pagemap_page_category(vma, addr, ptep_get(pte));
> >> +                next = addr + PAGE_SIZE;
> >
> > Why move the variable declarations out of the loop?
> Saving space inside the loop. What are the pros of declaring variables in the loop?

Informing the reader that the variables have scope limited to the loop body.

[...]
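To make the point about the THP split check concrete, here is a small user-space model, not kernel code. `struct out_buf`, `output()` and `needs_split()` are hypothetical stand-ins that assume only what is stated in the reply: when the buffer cannot hold the whole huge page, output() truncates *end to the pages that fit and returns -ENOSPC. It also assumes 4 KiB base pages and 2 MiB THPs (512 base pages). Under those assumptions the pre-existing `end != start + HPAGE_SIZE` test already selects the split path in the max=100 example.

```c
#include <assert.h>
#include <errno.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB huge pages (512 base pages). */
#define PAGE_SIZE  4096UL
#define HPAGE_SIZE (512UL * PAGE_SIZE)

/* Hypothetical model of the output buffer: capacity and pages used so far. */
struct out_buf {
	unsigned long max_pages;
	unsigned long found_pages;
};

/*
 * Hypothetical stand-in for output(): account for [start, *end). If the
 * range does not fit, truncate *end to the pages that do fit, mark the
 * buffer full and return -ENOSPC.
 */
int output(struct out_buf *buf, unsigned long start, unsigned long *end)
{
	unsigned long n_pages, room;

	assert(*end >= start && buf->found_pages <= buf->max_pages);
	n_pages = (*end - start) / PAGE_SIZE;
	room = buf->max_pages - buf->found_pages;

	if (n_pages > room) {
		*end = start + room * PAGE_SIZE;
		buf->found_pages = buf->max_pages;
		return -ENOSPC;
	}
	buf->found_pages += n_pages;
	return 0;
}

/* The split condition as it was before the patch added `ret == -ENOSPC`. */
int needs_split(unsigned long start, unsigned long end)
{
	return end != start + HPAGE_SIZE;
}
```

With max_pages = 100, output() returns -ENOSPC, end is pulled back to cover 100 pages, and needs_split() is true without looking at ret. When the whole huge page fits, end is left untouched and needs_split() is false.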
> >> @@ -2219,22 +2225,24 @@ static int pagemap_scan_get_args(struct pm_scan_arg *arg,
> >>              arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES)
> >>                  return -EINVAL;
> >>
> >> -        start = untagged_addr((unsigned long)arg->start);
> >> -        end = untagged_addr((unsigned long)arg->end);
> >> -        vec = untagged_addr((unsigned long)arg->vec);
> >> +        arg->start = untagged_addr((unsigned long)arg->start);
> >> +        arg->end = untagged_addr((unsigned long)arg->end);
> >> +        arg->vec = untagged_addr((unsigned long)arg->vec);
> >
> > BTW, we should keep the tag in args writeback().
> Sorry, what?
> start, end and vec are used after this function. We need to make
> sure that the addresses are untagged before that.

We do write back the address the walk ended at to arg->start in
userspace. I think this pointer needs the tag reconstructed so that
retrying the ioctl() will be possible.

Best Regards
Michał Mirosław
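A minimal sketch of the tag reconstruction suggested above, assuming arm64 Top Byte Ignore, where the tag lives in bits 63:56. `untag()` is a simplified stand-in for the kernel's architecture-specific untagged_addr(), and `retag()` is a hypothetical helper, not an existing kernel function: it re-applies the tag from the address userspace originally passed before the walk's end address is written back to arg->start.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: arm64 Top Byte Ignore, tag stored in bits 63:56. */
#define TAG_SHIFT 56
#define TAG_MASK  (UINT64_C(0xff) << TAG_SHIFT)

/* Simplified stand-in for untagged_addr(): clear the tag byte. */
uint64_t untag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/*
 * Hypothetical writeback helper: walk_end is the untagged address where the
 * walk stopped; user_start is the tagged value userspace passed in. Copy the
 * tag back so a retried ioctl() gets a pointer carrying the caller's tag.
 */
uint64_t retag(uint64_t walk_end, uint64_t user_start)
{
	assert(untag(walk_end) == walk_end);
	return walk_end | (user_start & TAG_MASK);
}
```

For a user start of 0x2a00ffff00001000 and a walk that stops 0x5000 bytes later, the written-back value would be 0x2a00ffff00006000, so a retried ioctl() passes the same tag as the first call while the kernel side keeps working on untagged addresses.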