Date: Thu, 26 Oct 2023 17:35:27 -0400
From: Steven Rostedt
To: Mathieu Desnoyers
Cc: Peter Zijlstra, LKML, Thomas Gleixner, Ankur Arora, Linus Torvalds,
 linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
 luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
 mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 willy@infradead.org, mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
 raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
 jgross@suse.com, andrew.cooper3@citrix.com, Joel Fernandes, Youssef Esmat,
 Vineeth Pillai, Suleiman Souhlal, Ingo Molnar, Daniel Bristot de Oliveira
Subject: Re: [POC][RFC][PATCH v2] sched: Extended Scheduler Time Slice
Message-ID: <20231026173527.2ad215cc@gandalf.local.home>
In-Reply-To: <20231026152022.668ca0f3@gandalf.local.home>
References: <20231025235413.597287e1@gandalf.local.home>
 <20231026105944.GJ33965@noisy.programming.kicks-ass.net>
 <20231026071413.4ed47b0e@gandalf.local.home>
 <20231026152022.668ca0f3@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/=uzH+ow/_xOFCsWFh3XaCEE"

--MP_/=uzH+ow/_xOFCsWFh3XaCEE
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Thu, 26 Oct 2023 15:20:22 -0400
Steven Rostedt wrote:

> Anyway, I changed the code to use:
>
> static inline unsigned clrbit(volatile unsigned *ptr)
> {
> 	unsigned ret;
>
> 	asm volatile("andb %b1,%0"
> 		     : "+m" (*(volatile char *)ptr)
> 		     : "iq" (0x2)
> 		     : "memory");
>
> 	ret = *ptr;
> 	*ptr = 0;
>
> 	return ret;
> }

Mathieu also told me that glibc's rseq has some extra padding at the end,
that happens to be big enough to hold this feature. That means you can run
the code without adding:

	GLIBC_TUNABLES=glibc.pthread.rseq=0

Attached is the updated test program.

-- Steve

--MP_/=uzH+ow/_xOFCsWFh3XaCEE
Content-Type: text/x-c++src
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=extend-sched.c

// Run with: GLIBC_TUNABLES=glibc.pthread.rseq=0

/*
 * The system header names were lost from this copy of the attachment.
 * The list below is an assumption: it covers the symbols this file uses.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>
#include <pthread.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/syscall.h>

#include "rseq-abi.h"

#include <tracefs.h>

#define rseq(rseq, len, flags, sig) syscall(SYS_rseq, rseq, len, \
					    flags, sig);

#define __weak __attribute__((weak))

//#define barrier() asm volatile ("" ::: "memory")
#define rmb() asm volatile ("lfence" ::: "memory")
#define wmb() asm volatile ("sfence" ::: "memory")

static pthread_barrier_t pbarrier;

static __thread struct rseq_abi __attribute__((aligned(sizeof(struct rseq_abi)))) rseq_map;
static __thread struct rseq_abi *rseq_ptr;

static bool no_rseq;

static void init_extend_map(void)
{
	extern ptrdiff_t __rseq_offset;
	extern unsigned int __rseq_size;
	int ret;

	if (no_rseq)
		return;

	if (__rseq_size) {
		if (__rseq_size < sizeof(rseq_map)) {
			printf("glibc rseq less than required mapping\n");
			return;
		}
		rseq_ptr = __builtin_thread_pointer() + __rseq_offset;
		printf("Using glibc rseq %p\n", rseq_ptr);
		return;
	}

	rseq_ptr = &rseq_map;

	ret = rseq(rseq_ptr, sizeof(rseq_map), 0, 0);
	perror("rseq");
	printf("ret = %d (%zd) %p\n", ret, sizeof(rseq_map), &rseq_map);
	if (ret < 0)
		rseq_ptr = NULL;
}

struct data;

struct thread_data {
	unsigned long long start_wait;
	unsigned long long x_count;
	unsigned long long total;
	unsigned long long max;
	unsigned long long min;
	unsigned long long total_wait;
	unsigned long long max_wait;
	unsigned long long min_wait;
	struct data *data;
};

struct data {
	unsigned long long x;
	unsigned long lock;
	struct thread_data *tdata;
	bool done;
};

static inline unsigned long
cmpxchg(volatile unsigned long *ptr, unsigned long old, unsigned long new)
{
	unsigned long prev;

	asm volatile("lock; cmpxchg %b1,%2"
		     : "=a"(prev)
		     : "q"(new), "m"(*(ptr)), "0"(old)
		     : "memory");
	return prev;
}

static inline unsigned clrbit(volatile unsigned *ptr)
{
	unsigned ret;

	asm volatile("andb %b1,%0"
		     : "+m" (*(volatile char *)ptr)
		     : "iq" (0x2)
		     : "memory");

	ret = *ptr;
	*ptr = 0;

	return ret;
}

static void extend(void)
{
	if (!rseq_ptr)
		return;

	rseq_ptr->cr_flags = 1;
}

static void unextend(void)
{
	unsigned prev;

	if (!rseq_ptr)
		return;

	prev = clrbit(&rseq_ptr->cr_flags);
	if (prev & 2) {
		tracefs_printf(NULL, "Yield!\n");
		sched_yield();
	}
}

#define sec2usec(sec) (sec * 1000000ULL)
#define usec2sec(usec) (usec / 1000000ULL)

static unsigned long long get_time(void)
{
	struct timeval tv;
	unsigned long long time;

	gettimeofday(&tv, NULL);

	time = sec2usec(tv.tv_sec);
	time += tv.tv_usec;

	return time;
}

static void grab_lock(struct thread_data *tdata, struct data *data)
{
	unsigned long long start, end, delta;
	unsigned long long end_wait;
	unsigned long long last;
	unsigned long prev;

	if (!tdata->start_wait)
		tdata->start_wait = get_time();

	while (data->lock && !data->done)
		rmb();

	extend();
	start = get_time();

	prev = cmpxchg(&data->lock, 0, 1);
	if (prev) {
		unextend();
		return;
	}
	end_wait = get_time();
	tracefs_printf(NULL, "Have lock!\n");

	delta = end_wait - tdata->start_wait;
	tdata->start_wait = 0;
	if (!tdata->total_wait || tdata->max_wait < delta)
		tdata->max_wait = delta;
	if (!tdata->total_wait || tdata->min_wait > delta)
		tdata->min_wait = delta;
	tdata->total_wait += delta;

	data->x++;
	last = data->x;

	if (data->lock != 1) {
		printf("Failed locking\n");
		exit(-1);
	}
	prev = cmpxchg(&data->lock, 1, 0);
	end = get_time();
	if (prev != 1) {
		printf("Failed unlocking\n");
		exit(-1);
	}
	tracefs_printf(NULL, "released lock!\n");
	unextend();

	delta = end - start;
	if (!tdata->total || tdata->max < delta)
		tdata->max = delta;

	if (!tdata->total || tdata->min > delta)
		tdata->min = delta;

	tdata->total += delta;
	tdata->x_count++;

	/* Let someone else have a turn */
	while (data->x == last && !data->done)
		rmb();
}

static void *run_thread(void *d)
{
	struct thread_data *tdata = d;
	struct data *data = tdata->data;

	init_extend_map();

	pthread_barrier_wait(&pbarrier);

	while (!data->done) {
		grab_lock(tdata, data);
	}
	return NULL;
}

int main (int argc, char **argv)
{
	unsigned long long total_wait = 0;
	unsigned long long secs;
	pthread_t *threads;
	struct data data;
	int cpus;

	memset(&data, 0, sizeof(data));

	cpus = sysconf(_SC_NPROCESSORS_CONF);

	threads = calloc(cpus + 1, sizeof(*threads));
	if (!threads) {
		perror("threads");
		exit(-1);
	}

	data.tdata = calloc(cpus + 1, sizeof(*data.tdata));
	if (!data.tdata) {
		perror("Allocating tdata");
		exit(-1);
	}

	tracefs_print_init(NULL);
	pthread_barrier_init(&pbarrier, NULL, cpus + 2);

	for (int i = 0; i <= cpus; i++) {
		int ret;

		data.tdata[i].data = &data;
		ret = pthread_create(&threads[i], NULL, run_thread, &data.tdata[i]);
		if (ret < 0) {
			perror("creating threads");
			exit(-1);
		}
	}

	pthread_barrier_wait(&pbarrier);
	sleep(5);

	printf("Finish up\n");
	data.done = true;
	wmb();

	for (int i = 0; i <= cpus; i++) {
		pthread_join(threads[i], NULL);
		printf("thread %i:\n", i);
		printf(" count:\t%lld\n", data.tdata[i].x_count);
		printf(" total:\t%lld\n", data.tdata[i].total);
		printf(" max:\t%lld\n", data.tdata[i].max);
		printf(" min:\t%lld\n", data.tdata[i].min);
		printf(" total wait:\t%lld\n", data.tdata[i].total_wait);
		printf(" max wait:\t%lld\n", data.tdata[i].max_wait);
		printf(" min wait:\t%lld\n", data.tdata[i].min_wait);
		total_wait += data.tdata[i].total_wait;
	}

	secs = usec2sec(total_wait);

	printf("Ran for %lld times\n", data.x);
	printf("Total wait time: %lld.%06lld\n", secs, total_wait - sec2usec(secs));
	return 0;
}

--MP_/=uzH+ow/_xOFCsWFh3XaCEE--
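
The extend()/unextend() pair in the attached program reads as a two-bit handshake on cr_flags: the thread writes 1 before taking the lock, and on the way out clrbit() returns the old value so that bit 1 (the `prev & 2` test) can trigger sched_yield(). A minimal portable sketch of that handshake follows. It is not the code from the patch: it works on an ordinary C11 atomic instead of the rseq area, it swaps in atomic_exchange() for the andb/load/store sequence, and the names CR_EXTEND, CR_YIELD, extend_begin(), extend_end() and fake_kernel_request_yield() are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two bits the test program uses in cr_flags. */
#define CR_EXTEND 0x1u	/* written by the thread in extend() */
#define CR_YIELD  0x2u	/* checked as (prev & 2) in unextend() */

/* Enter the critical section: ask for an extended time slice. */
static void extend_begin(atomic_uint *flags)
{
	assert(flags != NULL);
	atomic_store(flags, CR_EXTEND);
}

/*
 * Leave the critical section: clear the word and report whether the
 * yield bit was set in the meantime. The caller would then call
 * sched_yield(), as unextend() does in the attachment.
 */
static bool extend_end(atomic_uint *flags)
{
	unsigned prev;

	assert(flags != NULL);
	/* Assumption: a single atomic swap stands in for clrbit(). */
	prev = atomic_exchange(flags, 0);
	return (prev & CR_YIELD) != 0;
}

/* Stand-in for whoever sets bit 1; for demonstration only. */
static void fake_kernel_request_yield(atomic_uint *flags)
{
	assert(flags != NULL);
	atomic_fetch_or(flags, CR_YIELD);
}
```

A caller would wrap its lock/unlock sequence as `extend_begin(&flags); ... if (extend_end(&flags)) sched_yield();`. The swap makes "read the old flags" and "clear them" one step, so a yield request arriving between the two cannot be lost in this sketch; whether that matters for the real interface depends on the kernel side, which is outside this example.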