W. Gropp, E. Lusk, and A. Skjellum, Using MPI: portable parallel programming with the message-passing interface, 1999.

B. Dupont de Dinechin, P. Guironnet de Massas, G. Lager, C. Léger, B. Orgogozo et al., A Distributed Run-Time Environment for the Kalray MPPA®-256 Integrated Manycore Processor, Procedia Computer Science, vol. 18, pp. 1654-1663, 2013.
DOI: 10.1016/j.procs.2013.05.333

B. Dupont de Dinechin, R. Ayrignac, P. Beaucamps, P. Couvert, B. Ganne et al., A clustered manycore processor architecture for embedded and accelerated applications, 2013 IEEE High Performance Extreme Computing Conference (HPEC), 2013.

J. Mottin, M. Cartron, and G. Urlini, The STHORM Platform, Smart Multicore Embedded Systems, pp. 35-43, 2014.
DOI: 10.1007/978-1-4614-8800-2_3

N. Hemsoth, The Tiny Chip That Could Disrupt Exascale Computing, 2015.

A. Petitet and J. Dongarra, HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, 2008.

J. J. Dongarra, P. Luszczek, and A. Petitet, The LINPACK benchmark: past, present and future, Concurrency and Computation: Practice and Experience, pp. 803-820, 2003.

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, ACM Transactions on Mathematical Software (TOMS), vol. 5, no. 3, pp. 308-323, 1979.

J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software (TOMS), vol. 16, no. 1, pp. 1-17, 1990.

X. Zhang, Q. Wang, and Z. Chothia, OpenBLAS, version 0.2.8, http://www.openblas.net/, 2013.

S. Khorsandi and A. Leon-Garcia, Robust non-probabilistic bounds for delay and throughput in credit-based flow control, Proceedings of IEEE INFOCOM '96, Conference on Computer Communications, pp. 577-584, 1996.
DOI: 10.1109/INFCOM.1996.493351

D. Bertozzi, A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou et al., NoC synthesis flow for customized domain specific multiprocessor systems-on-chip, IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 2, pp. 113-129, 2005.

T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, Active messages: a mechanism for integrated communication and computation, 1992.

Kalray Inc., MPPA-256 Cluster and I/O Subsystem Architecture, 2015.

S. Kumar, A. Jantsch, J. Soininen, M. Forsell, M. Millberg et al., A network on chip architecture and design methodology, Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Paradigms for VLSI Systems Design (ISVLSI 2002), pp. 105-112, 2002.
DOI: 10.1109/ISVLSI.2002.1016885

Kalray Inc., Kalray platforms and boards, 2015.

J. Jeffers and J. Reinders, Intel Xeon Phi Coprocessor High Performance Programming, 2013.

L. Prylli and B. Tourancheau, BIP: A new protocol designed for high performance networking on Myrinet, Parallel and Distributed Processing, pp. 472-485, 1998.
DOI: 10.1007/3-540-64359-1_721

L. Prylli, B. Tourancheau, and R. Westrelin, Modeling of a high speed network to maximize throughput performance: the experience of BIP over Myrinet, Parallel and Distributed Processing Techniques and Applications (PDPTA), pp. 341-349, 1998.

Freescale Semiconductor, QorIQ P4080 Communications Processor Product Brief, Sep. 2008.

M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat et al., The Raw microprocessor: a computational fabric for software circuits and general-purpose programs, IEEE Micro, vol. 22, no. 2, pp. 25-35, 2002.

J. Psota and A. Agarwal, rMPI: Message Passing on Multicore Processors with On-Chip Interconnect, High Performance Embedded Architectures and Compilers, pp. 22-37, 2008.
DOI: 10.1007/978-3-540-77560-7_3

P. G. Paulin, OpenCL Programming Tools for the STHORM Multi-Processor Platform: Application to Computer Vision, 13th International Forum on Embedded MPSoC and Multicore, p. 24, 2013.

S. N. Agathos, V. V. Dimakopoulos, A. Mourelis, and A. Papadogiannakis, Deploying OpenMP on an embedded multicore accelerator, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII), pp. 180-187, 2013.

M. Kang, E. Park, M. Cho, J. Suh, D. Kang et al., MPI performance analysis and optimization on Tile64/Maestro, Proceedings of the Workshop on Multi-core Processors for Space: Opportunities and Challenges, held in conjunction with SMC-IT, pp. 19-23, 2009.

B. d'Ausbourg, M. Boyer, E. Noulard, and C. Pagetti, Deterministic Execution on Many-Core Platforms: application to the SCC, 4th Many-core Applications Research Community (MARC) Symposium, p. 43, 2012.

T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas et al., The 48-core SCC processor: the programmer's view, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11, 2010.

C. Clauss, S. Lankes, P. Reble, and T. Bemmerl, Evaluation and improvements of programming models for the Intel SCC many-core processor, 2011 International Conference on High Performance Computing & Simulation, pp. 525-532, 2011.
DOI: 10.1109/HPCSim.2011.5999870