L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.

D. W. Walker and J. J. Dongarra, MPI: A standard message passing interface, Supercomputer, vol.12, pp.56-68, 1996.

L. S. Blackford, A. Petitet, and R. Pozo, An updated set of basic linear algebra subprograms (BLAS), ACM Transactions on Mathematical Software, vol.28, issue.2, pp.135-151, 2002.

J. Dongarra, P. Beckman, and T. Moore, The international exascale software project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011.

P. Thoman, K. Dichev, and T. Heller, A taxonomy of task-based parallel programming technologies for high-performance computing, Springer Journal of Supercomputing, vol.74, issue.4, pp.1422-1434, 2018.

B. Acun, A. Langer, and E. Meneses, Power, reliability, and performance: One system to rule them all, IEEE Computer, vol.49, issue.10, pp.30-37, 2016.

C. Augonnet, S. Thibault, R. Namyst, and P. A. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

V. Kale and W. D. Gropp, A user-defined schedule for OpenMP, OpenMP, Stony Brook, 2017.

V. Kale, C. Iwainsky, M. Klemm, M. Kordörfer, J. H. Ciorba et al., Towards A Standard Interface for User-Defined Scheduling in OpenMP, Proceedings of the International Workshop on OpenMP (IWOMP), 2019.

S. Bak, Y. Guo, P. Balaji, and V. Sarkar, Optimized Execution of Parallel Loops via User-Defined Scheduling Policies, Proceedings of the 48th International Conference on Parallel Processing, vol.38, pp.1-38, 2019.

M. Grossman, V. Kumar, N. Vrvilo, Z. Budimlic, and V. Sarkar, A pluggable framework for composable HPC scheduling libraries, Proceedings of the International Parallel and Distributed Processing Symposium Workshops, Orlando, FL, US, pp.723-732, 2017.

O. Aumage, J. Bigot, H. Coullon, C. Pérez, and J. Richard, Combining both a component model and a task-based model for HPC applications: a feasibility study on Gysela, Proceedings of the International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Madrid, Spain, pp.635-644, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01518730

A. Santana, V. Freitas, L. L. Pilla, M. Castro, and J. F. Méhaut, Reducing Global Schedulers Complexity through Runtime System Decoupling, Proceedings of the Brazilian Symposium on High Performance Computing Systems (WSCAD), pp.38-44, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01873526

T. L. Casavant and J. G. Kuhl, A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Transactions on Software Engineering, vol.14, pp.141-154, 1988.

P. H. Penna, A. T. A. Gomes, M. Castro et al., A Comprehensive Performance Evaluation of the BinLPT Workload-aware Loop Scheduler, Concurrency and Computation: Practice and Experience, p.5170, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01986361

L. L. Pilla, C. P. Ribeiro, and P. Coucheney, A topology-aware load balancing algorithm for clustered hierarchical multi-core machines, Future Generation Computer Systems, vol.30, pp.191-201, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00953132

E. Jeannot, E. Meneses, G. Mercier, F. Tessier, and G. Zheng, Communication and Topology-aware Load Balancing in Charm++ with TreeMatch, Proceedings of the International Conference on Cluster Computing (CLUSTER), Indianapolis, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00851148

E. L. Padoin, M. Diener, P. Navaux, and J. Méhaut, Managing Power Demand and Load Imbalance to Save Energy on Systems with Heterogeneous CPU Speeds, Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.72-79, 2019.

D. Unat, A. Dubey, and T. Hoefler, Trends in data locality abstractions for HPC systems, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.10, pp.3007-3020, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01621371

M. S. Mollison and J. H. Anderson, Bringing theory into practice: A userspace library for multicore real-time scheduling, Proceedings of the Real-Time and Embedded Technology and Applications Symposium, pp.283-292, 2013.

M. Frasca, K. Madduri, and P. Raghavan, NUMA-aware graph mining techniques for performance and energy efficiency, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2012.

P. Penna, M. Castro, P. Plentz, H. Freitas, F. Broquedis et al., BinLPT: A Novel Workload-Aware Loop Scheduler for Irregular Parallel Loops, Proceedings of the Brazilian Symposium on High Performance Computing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01596427

M. Durand, F. Broquedis, T. Gautier, and B. Raffin, An efficient OpenMP loop scheduler for irregular applications on large-scale NUMA machines, International Workshop on OpenMP (IWOMP), pp.141-155, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00867438

A. Bhatele, S. Fourestier, H. Menon, L. V. Kale, and F. Pellegrini, Applying graph partitioning methods in measurement-based dynamic load balancing, Tech. rep., 2011.

J. L. Fattebert, D. Richards, and J. Glosli, Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions, Computer Physics Communications, vol.183, issue.12, pp.2608-2615, 2012.

C. Mei, Y. Sun, and G. Zheng, Enabling and Scaling Biomolecular Simulations of 100 Million Atoms on Petascale Machines with a Multicore-optimized Message-driven Runtime, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p.11, 2011.

V. Freitas, A. Santana, M. Castro, and L. L. Pilla, A Batch Task Migration Approach for Decentralized Global Rescheduling, Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.1-12, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01860626

A. Sen, A quick introduction to the Google C++ Testing Framework, IBM DeveloperWorks, vol.20, pp.1-10, 2010.

G. Zheng, A. Bhatelé, E. Meneses, and L. V. Kalé, Periodic hierarchical load balancing for large supercomputers, The International Journal of High Performance Computing Applications, vol.25, issue.4, pp.371-385, 2011.

I. Karlin, A. Bhatele, and J. Keasler, Exploring Traditional and Emerging Parallel Programming Models using a Proxy Application, Proceedings of the International Parallel & Distributed Processing Symposium, Boston, US, 2013.

V. Mehta, LeanMD: A Charm++ framework for high performance molecular dynamics simulation on large parallel machines, North Goodwin Avenue, Urbana, IL 61801-2302, 2004.

S. Che, M. Boyer, and J. Meng, Rodinia: A benchmark suite for heterogeneous computing, Proceedings of the International Symposium on Workload Characterization (IISWC), pp.44-54, 2009.

A. Santana, V. Freitas, M. Castro, L. Pilla, and J. F. Méhaut, ARTful: A specification for user-defined schedulers targeting multiple HPC runtime systems, XXXXXXXXXXXXXXXXXX, vol.00, p.1, 2020.
URL : https://hal.archives-ouvertes.fr/hal-02454426