Using MPI: portable parallel programming with the message-passing interface, 1999. ,
A Distributed Run-Time Environment for the Kalray MPPA??-256 Integrated Manycore Processor, Procedia Computer Science, vol.18, pp.1654-1663, 2013. ,
DOI : 10.1016/j.procs.2013.05.333
Frédéric Riss, et al. A clustered manycore processor architecture for embedded and accelerated applications, High Performance Extreme Computing Conference (HPEC), pp.2013-2014, 2013. ,
The STHORM Platform, Smart Multicore Embedded Systems, pp.35-43, 2014. ,
DOI : 10.1007/978-1-4614-8800-2_3
The Tiny Chip That Could Disrupt Exascale Computing, 2015. ,
HPL -A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, 2008. ,
The LINPACK benchmark: past, present and future. Concurrency and Computation: practice and experience, pp.803-820, 2003. ,
Basic linear algebra subprograms for Fortran usage, ACM Transactions on Mathematical Software (TOMS), vol.5, issue.3, pp.308-323, 1979. ,
A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software (TOMS), vol.16, issue.1, pp.1-17, 1990. ,
OpenBLAS, version 0.2. 8. URL http://www. openblas. net/. Fe tched, pp.9-13, 2013. ,
Robust non-probabilistic bounds for delay and throughput in credit-based flow control, Proceedings of IEEE INFOCOM '96. Conference on Computer Communications, pp.577-584, 1996. ,
DOI : 10.1109/INFCOM.1996.493351
NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. Parallel and Distributed Systems, IEEE Transactions on, vol.16, issue.2, pp.113-129, 2005. ,
Active messages: a mechanism for integrated communication and computation, 1992. ,
MPPA-256 Cluster and I/O Subsystem Architecture, 2015. ,
A network on chip architecture and design methodology, Proceedings IEEE Computer Society Annual Symposium on VLSI. New Paradigms for VLSI Systems Design. ISVLSI 2002, pp.105-112, 2002. ,
DOI : 10.1109/ISVLSI.2002.1016885
Kalray platforms and boards, 2015. ,
Intel Xeon Phi Coprocessor High Performance Programming, 2013. ,
BIP: A new protocol designed for high performance networking on Myrinet, Parallel and Distributed Processing, pp.472-485, 1998. ,
DOI : 10.1007/3-540-64359-1_721
Modeling of a high speed network to maximize throughput performance: the experience of BIP over Myrinet, Parallel and Distributed Processing Techniques and Applications-PDPTA, pp.341-349, 1998. ,
QorIQ P4080 Communcations Processor Product Brief, Sep, 2008. ,
The Raw microprocessor: A computational fabric for software circuits and general-purpose programs, pp.2225-2260, 2002. ,
rMPI: Message Passing on Multicore Processors with On-Chip Interconnect, High Performance Embedded Architectures and Compilers, pp.22-37, 2008. ,
DOI : 10.1007/978-3-540-77560-7_3
OpenCL Programming Tools for the STHORM Multi-Processor Platform: Application to Computer Vision, 2013. 13th International Forum on Embedded MPSoC and Multicore, p.24, 2013. ,
Aggelos Mourelis, and Antonis Papadogiannakis Deploying OpenMP on an embedded multicore accelerator, Embedded Computer Systems: Architectures, Modeling , and Simulation (SAMOS XIII), 2013 International Conference on, pp.180-187, 2013. ,
MPI performance analysis and optimization on tile64/maestro, Proceedings of Workshop on Multi-core Processors for SpaceOpportunities and Challenges Held in conjunction with SMC- IT, pp.19-23, 2009. ,
Deterministic Execution on Many-Core Platforms: application to the SCC, 4th Many-core Applications Research Community (MARC) Symposium, p.43, 2012. ,
The 48-core SCC processor: the programmer's view, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010. ,
Evaluation and improvements of programming models for the Intel SCC many-core processor, 2011 International Conference on High Performance Computing & Simulation, pp.525-532, 2011. ,
DOI : 10.1109/HPCSim.2011.5999870