OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998. ,
MPI: A standard message passing interface, Supercomputer, vol.12, pp.56-68, 1996. ,
An updated set of basic linear algebra subprograms (BLAS), ACM Transactions on Mathematical Software, vol.28, issue.2, pp.135-151, 2002. ,
The international exascale software project roadmap, International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011. ,
A taxonomy of task-based parallel programming technologies for high-performance computing, Springer Journal of Supercomputing, vol.74, issue.4, pp.1422-1434, 2018. ,
Power, reliability, and performance: One system to rule them all, IEEE Computer, vol.49, issue.10, pp.30-37, 2016. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
A user-defined schedule for OpenMP, OpenMP, 2017. ,
, , 2017.
Towards A Standard Interface for User-Defined Scheduling in OpenMP, Proceedings of the International Workshop on OpenMP (iWomp, 2019. ,
Optimized Execution of Parallel Loops via User-Defined Scheduling Policies, 48. Proceedings of the International Conference on Parallel Processing, vol.38, pp.1-38, 2019. ,
A pluggable framework for composable HPC scheduling libraries, Proceedings of the International Parallel and Distributed Processing Symposium Workshops, 2017. ,
, , pp.723-732
Combining both a component model and a task-based model for hpc applications: a feasibility study on gysela, Proceedings of the International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01518730
, , pp.635-644
Reducing Global Schedulers Complexity through Runtime System Decoupling, Proceedings of the Brazilian Symposium on High Performance Computing Systems (WSCAD), pp.38-44, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01873526
A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Trans. Softw. Eng, vol.14, pp.141-154, 1988. ,
A Comprehensive Performance Evaluation of the BinLPT Workload-aware Loop Scheduler. Concurrency and Computation: Practice and Experience, p.5170, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-01986361
A topology-aware load balancing algorithm for clustered hierarchical multi-core machines, Future Generation Computer Systems, vol.30, pp.191-201, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00953132
Communication and Topology-aware Load Balancing in Charm++ with TreeMatch, Proceedings of the International Conference on Cluster Computing (CLUSTER), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00851148
,
Managing Power Demand and Load Imbalance to Save Energy on Systems with Heterogeneous CPU Speeds, Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.72-79, 2019. ,
Trends in data locality abstractions for HPC systems, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.10, pp.3007-3020, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01621371
Bringing theory into practice: A userspace library for multicore real-time scheduling, Proceedings of the Real-Time and Embedded Technology and Applications Symposium, pp.283-292, 2013. ,
NUMA-aware graph mining techniques for performance and energy efficiency, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC), 2012. ,
BinLPT: A Novel Workload-Aware Loop Scheduler for Irregular Parallel Loops, Proceedings of the Brazilian Symposium on High Performance Computing Systems, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01596427
An efficient openmp loop scheduler for irregular applications on large-scale numa machines, International Workshop on OpenMP (iWomp), pp.141-155, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00867438
Applying graph partitioning methods in measurementbased dynamic load balancing. tech. rep, 2011. ,
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions, Computer Physics Communications, vol.183, issue.12, pp.2608-2615, 2012. ,
Enabling and Scaling Biomolecular Simulations of 100 Million Atoms on Petascale Machines with a Multicore-optimized Message-driven Runtime, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p.11, 2011. ,
A Batch Task Migration Approach for Decentralized Global Rescheduling, Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.1-12, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01860626
A quick introduction to the Google C++ Testing Framework, IBM DeveloperWorks, vol.20, pp.1-10, 2010. ,
Periodic hierarchical load balancing for large supercomputers, The International Journal of High Performance Computing Applications, vol.25, issue.4, pp.371-385, 2011. ,
Exploring Traditional and Emerging Parallel Programming Models using a Proxy Application, Proceedings of the International Parallel & Distributed Processing Symposium, 2013. ,
,
LeanMD: A Charm++ framework for high performance molecular dynamics simulation on large parallel machines, North Goodwin Avenue, pp.61801-2302, 2004. ,
Rodinia: A benchmark suite for heterogeneous computing, Proceedings of the International Symposium on Workload Characterization (IISWC), pp.44-54, 2009. ,
ARTful: A specification for userdefined schedulers targeting multiple HPC runtime systems, XXXXXXXXXXXXXXXXXX, vol.00, p.1, 2020. ,
URL : https://hal.archives-ouvertes.fr/hal-02454426