• a comparison of the proposed token representations to other contextualized word representations proposed in the literature

• an evaluation of the constructed token representations on a wide range of sentence understanding tasks

• a set of ablation studies over the proposed model as well as a discussion of the obtained results

J. Henderson and D. N. Popa, A vector space for distributional semantics for entailment, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, 2016.

D. N. Popa, J. Perez, J. Henderson, and E. Gaussier, Implicit discourse relation classification with syntax-aware contextualized word representations, Proceedings of the 32nd International Florida Artificial Intelligence Research Society Conference, 2019.

D. N. Popa, J. Perez, J. Henderson et al., Towards Syntax-aware Token Embeddings, peer-reviewed international journal article, 2019.

Bibliography

E. Bach, Informal lectures on formal semantics, 1989.

C. F. Baker, C. J. Fillmore, and J. B. Lowe, The Berkeley FrameNet Project, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol.1, 1998.

M. Bansal, K. Gimpel, and K. Livescu, Tailoring continuous word representations for dependency parsing, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.

M. Baroni, R. Bernardi, N. Do, and C. Shan, Entailment above the word level in distributional semantics, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012.

M. Baroni and A. Lenci, Distributional memory: A general framework for corpus-based semantics, Journal of Computational Linguistics, 2010.

M. Baroni and A. Lenci, How we BLESSed distributional semantic evaluation, Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, 2011.

M. Baroni and R. Zamparelli, Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010.

L. Bentivogli, R. Bernardi, M. Marelli, S. Menini, M. Baroni et al., SICK through the SemEval glasses. Lesson learned from the evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment, Journal of Language Resources and Evaluation, 2016.

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics, 2017.

A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, Translating embeddings for modeling multi-relational data, Advances in Neural Information Processing Systems 26, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00920777

S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, A large annotated corpus for learning natural language inference, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.

C. Braud and P. Denis, Comparing word representations for implicit discourse relation classification, Proceedings of Empirical Methods in Natural Language Processing, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01185927

C. Braud and P. Denis, Learning connective-based word representations for implicit discourse relation identification, Proceedings of Empirical Methods in Natural Language Processing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01397318

O. Camburu, T. Rocktäschel, T. Lukasiewicz, and P. Blunsom, e-SNLI: Natural language inference with natural language explanations, Advances in Neural Information Processing Systems, vol.31, 2018.

D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco et al., Universal sentence encoder for English, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2018.

H. Chang, Z. Wang, L. Vilnis, and A. McCallum, Distributional inclusion vector embedding for unsupervised hypernymy detection, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1, 2018.

C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants et al., One billion word benchmark for measuring progress in statistical language modeling, 2013.

J. Chen, Q. Zhang, P. Liu, X. Qiu, and X. Huang, Implicit discourse relation detection via a deep architecture with gated relevance network, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.

Q. Chen, X. Zhu, Z. Ling, S. Wei, H. Jiang et al., Enhanced LSTM for natural language inference, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, 2017.

X. Chen, Z. Liu, and M. Sun, A unified model for word sense representation and disambiguation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.

J. Cheng and D. Kartsaklis, Syntax-aware multi-sense word embeddings for deep compositional models of meaning, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.

S. Clark, B. Coecke, and M. Sadrzadeh, A compositional distributional model of meaning, Proceedings of the Second AAAI Symposium on Quantum Interaction, 2008.

S. D. Clark and S. G. Pulman, Combining symbolic and distributional models of meaning, AAAI Spring Symposium: Quantum Interaction, 2007.

R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, Proceedings of the 25th International Conference on Machine Learning, 2008.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, 2011.

A. Conneau and D. Kiela, SentEval: An evaluation toolkit for universal sentence representations, Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018.

A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, Supervised learning of universal sentence representations from natural language inference data, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01897968

I. Dagan, O. Glickman, and B. Magnini, The PASCAL recognising textual entailment challenge, Proceedings of the First International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment, 2006.

I. Dagan, D. Roth, M. Sammons, and F. M. Zanzotto, Recognizing textual entailment: Models and applications, Synthesis Lectures on Human Language Technologies, 2013.

Z. Dai and R. Huang, Improving implicit discourse relation classification by modeling inter-dependencies of discourse units in a paragraph, 2018.

I. Dasgupta, D. Guo, A. Stuhlmüller, S. J. Gershman, and N. D. Goodman, Evaluating compositionality in sentence embeddings, 2018.

P. Dasigi, W. Ammar, C. Dyer, and E. H. Hovy, Ontology-aware token embeddings for prepositional phrase attachment, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, 1990.

P. W. Foltz, W. Kintsch, and T. K. Landauer, The measurement of textual coherence with latent semantic analysis, Discourse Processes, 1998.

G. Frege, Die Grundlagen der Arithmetik: eine logisch mathematische Untersuchung über den Begriff der Zahl, 1884.

R. Fu, J. Guo, B. Qin, W. Che, H. Wang et al., Learning semantic hierarchies: A continuous vector space approach, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

J. Ganitkevitch, B. Van Durme, and C. Callison-Burch, PPDB: The paraphrase database, Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013.

H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang et al., Are you talking to a machine? Dataset and methods for multilingual image question answering, Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015.

M. Geffet and I. Dagan, The distributional inclusion hypotheses and lexical entailment, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005.

J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. Dauphin, Convolutional sequence to sequence learning, Proceedings of the 34th International Conference on Machine Learning, 2017.

S. Ghannay, B. Favre, Y. Estève, and N. Camelin, Word embeddings evaluation and combination, Journal of Language Resources and Evaluation, 2016.

S. Gidaris and N. Komodakis, Object detection via a multi-region and semantic segmentation-aware CNN model, Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015.

D. Gildea and D. Jurafsky, Automatic labeling of semantic roles, Journal of Computational Linguistics, 2002.

G. Glavaš and I. Vulić, Explicit retrofitting of distributional word vectors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol.1, 2018.

X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, International Conference on Artificial Intelligence and Statistics, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752497

A. Graves and J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM networks, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005.

E. Grefenstette and M. Sadrzadeh, Experimental support for a categorical compositional distributional model of meaning, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011.

J. Guo, W. Che, H. Wang, and T. Liu, Learning sense-specific word embeddings by exploiting bilingual resources, Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, 2014.

Z. Harris, Distributional structure, 1954.

H. He, K. Gimpel, and J. Lin, Multi-perspective sentence similarity modeling with convolutional neural networks, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.

J. Henderson and D. N. Popa, A vector space for distributional semantics for entailment, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, 2016.

F. Hill, R. Reichart, and A. Korhonen, SimLex-999: Evaluating semantic models with (genuine) similarity estimation, Journal of Computational Linguistics, 2015.

D. Hindle and M. Rooth, Structural ambiguity and lexical relations, Journal of Computational Linguistics, 1993.

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 1997.

M. Honnibal and M. Johnson, An improved non-monotonic transition system for dependency parsing, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.

M. Hu and B. Liu, Mining and summarizing customer reviews, Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.

E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng, Improving word representations via global context and multiple word prototypes, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol.1, 2012.

O. Irsoy and C. Cardie, Bidirectional recursive neural networks for token-level labeling with structure, 2013.

F. Issa, M. Damonte, S. B. Cohen, X. Yan, and Y. Chang, Abstract meaning representation for paraphrase detection, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

M. Iyyer, V. Manjunatha, J. Boyd-Graber, and H. Daumé III, Deep unordered composition rivals syntactic methods for text classification, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, 2015.

Y. Ji and J. Eisenstein, One vector is not enough: Entity-augmented distributed semantics for discourse relations, Transactions of the Association for Computational Linguistics, 2015.

Y. Ji, G. Haffari, and J. Eisenstein, A latent variable recurrent neural network for discourse-driven language models, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.

R. Józefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu, Exploring the limits of language modeling, 2016.

N. Kalchbrenner, E. Grefenstette, and P. Blunsom, A convolutional neural network for modelling sentences, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.

D. Kartsaklis and M. Sadrzadeh, Prior disambiguation of word tensors for constructing sentence vectors, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.

K. Kawakami and C. Dyer, Learning to represent words in context with multilingual supervision, 2015.

T. Kenter and M. de Rijke, Short text similarity with word embeddings, Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015.

S. Kim, J. Hong, I. Kang, and N. Kwak, Semantic sentence matching with densely-connected recurrent and co-attentive information, Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.

Y. Kim, Convolutional neural networks for sentence classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.

Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, Character-aware neural language models, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, Proceedings of the 2014 International Conference on Learning Representations, 2014.

W. Kintsch, Predication, Cognitive Science, 2001.

R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba et al., Skip-thought vectors, Proceedings of the 28th International Conference on Neural Information Processing Systems, vol.2, 2015.

F. Kokkinos and A. Potamianos, Structural attention neural networks for improved sentiment analysis, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017.

A. Komninos and S. Manandhar, Dependency based embeddings for sentence classification tasks, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.

L. Kotlerman, I. Dagan, I. Szpektor, and M. Zhitomirsky-Geffet, Directional distributional similarity for lexical inference, Journal of Natural Language Engineering, 2010.

G. Kruszewski, D. Paperno, and M. Baroni, Deriving boolean structures from distributional vectors, Transactions of the Association for Computational Linguistics, 2015.

M. Labeau and A. Allauzen, Character and subword-based word representation for neural language modeling prediction, 2017.

M. Lan, J. Wang, Y. Wu, Z. Niu, and H. Wang, Multi-task attention-based neural networks for implicit discourse relationship representation and identification, Proceedings of Empirical Methods in Natural Language Processing, 2017.

M. Lan, Y. Xu, and Z. Niu, Leveraging synthetic discourse data via multi-task learning for implicit discourse relation recognition, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013.

Y. Lan and J. Jiang, Embedding WordNet knowledge for textual entailment, Proceedings of the 27th International Conference on Computational Linguistics, 2018.

T. Landauer, D. Laham, and R. Rehder, How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans, Proceedings of the 19th Annual Conference of the Cognitive Science Society, 1997.

Q. Le and T. Mikolov, Distributed representations of sentences and documents, Proceedings of the 31st International Conference on Machine Learning, vol.32, 2014.

W. Lei, X. Wang, M. Liu, I. Ilievski, X. He et al., SWIM: a simple word interaction model for implicit discourse relation recognition, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017.

A. Lenci and G. Benotto, Identifying hypernyms in distributional semantic spaces, Proceedings of the First Joint Conference on Lexical and Computational Semantics, 2012.

O. Levy, I. Dagan, and J. Goldberger, Focused entailment graphs for open IE propositions, Proceedings of the Eighteenth Conference on Computational Natural Language Learning, 2014.

O. Levy and Y. Goldberg, Dependency-based word embeddings, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.

O. Levy, S. Remus, C. Biemann, and I. Dagan, Do supervised distributional methods really learn lexical inference relations?, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.

X. Li and D. Roth, Learning question classifiers, Proceedings of the 19th International Conference on Computational Linguistics, 2002.

D. Lin, Automatic retrieval and clustering of similar words, Proceedings of the 17th International Conference on Computational Linguistics, 1998.

Z. Lin, M. Kan, and H. T. Ng, Recognizing implicit discourse relations in the Penn Discourse Treebank, Proceedings of Empirical Methods in Natural Language Processing, 2009.

W. Ling, T. Luís, L. Marujo, R. F. Astudillo, S. Amir et al., Finding function in form: Compositional character models for open vocabulary word representation, 2015.

P. Liu, X. Qiu, and X. Huang, Learning context-sensitive word embeddings with neural tensor skip-gram model, Proceedings of the 24th International Conference on Artificial Intelligence, 2015.

Y. Liu and S. Li, Recognizing implicit discourse relations via repeated reading: Neural networks with multi-level attention, Proceedings of Empirical Methods in Natural Language Processing, 2016.

Y. Liu, S. Li, X. Zhang, and Z. Sui, Implicit discourse relation classification via multi-task neural networks, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.

S. MacAvaney and A. Zeldes, A deeper look into dependency-based word embeddings, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, Building a large annotated corpus of English: The Penn Treebank, Journal of Computational Linguistics, Special issue on using large corpora, 1993.

B. McCann, J. Bradbury, C. Xiong, and R. Socher, Learned in translation: Contextualized word vectors, Advances in Neural Information Processing Systems, vol.30, 2017.

D. McCarthy and J. Carroll, Disambiguating nouns, verbs, and adjectives using automatically acquired selectional preferences, Journal of Computational Linguistics, 2003.

O. Melamud, J. Goldberger, and I. Dagan, context2vec: Learning generic context embedding with bidirectional LSTM, Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, 2016.

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, CoRR, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems 26, 2013.

D. Milajevs, D. Kartsaklis, M. Sadrzadeh, and M. Purver, Evaluating neural word representations in tensor-based compositional settings, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.

G. A. Miller, WordNet: A lexical database for English, Communications of the ACM, 1995.

S. Mirkin, L. Specia, N. Cancedda, I. Dagan, M. Dymetman et al., Source-language entailment modeling for translating unknown terms, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009.

J. Mitchell and M. Lapata, Vector-based models of semantic composition, Proceedings of the 46th Annual Meeting on Association for Computational Linguistics, 2008.

J. Mitchell and M. Lapata, Composition in distributional models of semantics, Journal of Cognitive Science, 2010.

N. Mrkšić, D. Ó Séaghdha, B. Thomson, M. Gašić, L. Rojas-Barahona et al., Counter-fitting word vectors to linguistic constraints, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.

N. Mrkšić, I. Vulić, D. Ó Séaghdha, I. Leviant, R. Reichart et al., Semantic specialization of distributional word vector spaces using monolingual and cross-lingual constraints, Transactions of the Association for Computational Linguistics, 2017.

S. Necsulescu, S. Mendes, D. Jurgens, N. Bel, and R. Navigli, Reading between the lines: Overcoming data sparsity for accurate classification of lexical relationships, Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, 2015.

A. Neelakantan, J. Shankar, A. Passos, and A. McCallum, Efficient non-parametric estimation of multiple embeddings per word in vector space, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.

K. A. Nguyen, M. Köper, S. Schulte im Walde, and N. T. Vu, Hierarchical embeddings for hypernymy detection and directionality, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017.

M. Nickel and D. Kiela, Poincaré embeddings for learning hierarchical representations, Advances in Neural Information Processing Systems, vol.30, 2017.

M. Nickel, V. Tresp, and H. Kriegel, A three-way model for collective learning on multi-relational data, Proceedings of the 28th International Conference on Machine Learning, 2011.

M. Nickel, V. Tresp, and H. Kriegel, Factorizing YAGO: scalable machine learning for linked data, Proceedings of the 21st international conference on World Wide Web, 2012.

Y. Nie, Y. Wang, and M. Bansal, Analyzing compositionality-sensitivity of NLI models, Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.

S. Padó, M. Galley, D. Jurafsky, and C. D. Manning, Textual entailment features for machine translation evaluation, Proceedings of the Fourth Workshop on Statistical Machine Translation, 2009.

S. Padó and M. Lapata, Dependency-based construction of semantic space models, Journal of Computational Linguistics, 2007.

B. Pang and L. Lee, A sentimental education: Sentiment analysis using subjectivity, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004.

B. Pang and L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 2005.

B. Partee, Lexical semantics and compositionality, An Invitation to Cognitive Science: Language, 1995.

B. Partee, A. ter Meulen, and R. Wall, Mathematical Methods in Linguistics, 1990.

R. Pasunuru, H. Guo, and M. Bansal, Towards improving abstractive summarization via entailment generation, Proceedings of the Workshop on New Frontiers in Summarization, 2017.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.

M. Peters, W. Ammar, C. Bhagavatula, and R. Power, Semi-supervised sequence tagging with bidirectional language models, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

E. Pitler, A. Louis, and A. Nenkova, Automatic sense prediction for implicit discourse relations in text, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009.

E. Pitler, M. Raghupathy, H. Mehta, A. Nenkova, A. Lee et al., Easily identifiable discourse relations, Proceedings of the 22nd International Conference on Computational Linguistics, 2008.

T. Plate, Holographic reduced representations: Convolution algebra for compositional distributed representations, Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991.

D. N. Popa, J. Perez, J. Henderson, and E. Gaussier, Implicit discourse relation classification with syntax-aware contextualized word representations, Proceedings of the 32nd International Florida Artificial Intelligence Research Society Conference, 2019.

D. N. Popa, J. Perez, J. Henderson, and E. Gaussier, Satoke: How can syntax-aware contextualized word representations benefit implicit discourse relation classification?, Conférence sur l'Apprentissage automatique, CAp, 2019.

R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo et al., The Penn Discourse TreeBank 2.0, Proceedings of the 6th International Conference on Language Resources and Evaluation, 2008.

L. Qin, Z. Zhang, and H. Zhao, Implicit discourse relation recognition with context-aware character-enhanced embeddings, Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, 2016.

L. Qin, Z. Zhang, and H. Zhao, A stacking gated neural architecture for implicit discourse relation classification, Proceedings of Empirical Methods in Natural Language Processing, 2016.

L. Qin, Z. Zhang, H. Zhao, Z. Hu, and E. Xing, Adversarial connective-exploiting networks for implicit discourse relation classification, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

M. Rei and T. Briscoe, Looking for hyponyms in vector space, Proceedings of the Eighteenth Conference on Computational Natural Language Learning, 2014.

T. Rocktäschel, E. Grefenstette, K. M. Hermann, T. Kocisky, and P. Blunsom, Reasoning about entailment with neural attention, Proceedings of the 2016 International Conference on Learning Representations, 2016.

S. Roller, K. Erk, and G. Boleda, Inclusive yet selective: Supervised distributional hypernymy detection, Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, 2014.

S. Rudolph and E. Giesbrecht, Compositional matrix-space models of language, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.

A. Rutherford and N. Xue, Discovering implicit discourse relations through Brown cluster pair representation and coreference patterns, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014.

A. Rutherford and N. Xue, Improving the inference of implicit discourse relations via classifying explicit discourse connectives, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.

B. Sacaleanu, C. Orasan, C. Spurk, S. Ou, O. Ferrandez et al., Entailment-based question answering for structured data, 22nd International Conference on Computational Linguistics: Demonstration Papers, 2008.

S. Salant and J. Berant, Contextualized word representations for reading comprehension, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

E. Santus, A. Lenci, Q. Lu, and S. Schulte im Walde, Chasing hypernyms in vector spaces with entropy, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014.

H. Schütze, Word space, Advances in Neural Information Processing Systems 5, 1993.

E. Shelhamer, J. Long, and T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

V. Shwartz, E. Santus, and D. Schlechtweg, Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol.1, 2017.

S. Singh, T. Rocktäschel, and S. Riedel, Towards Combined Matrix and Tensor Factorization for Universal Schema Relation Extraction, The North American Chapter of the Association for Computational Linguistics Workshop on Vector Space Modeling for NLP (VSM), 2015.

P. Smolensky, Tensor product variable binding and the representation of symbolic structures in connectionist systems, Journal of Artificial Intelligence, 1990.

R. Snow, D. Jurafsky, and A. Y. Ng, Semantic taxonomy induction from heterogenous evidence, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006.

R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning, Dynamic pooling and unfolding recursive autoencoders for paraphrase detection, Proceedings of the 24th International Conference on Neural Information Processing Systems, 2011.

R. Socher, B. Huval, C. D. Manning, and A. Y. Ng, Semantic compositionality through recursive matrix-vector spaces, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.

R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, Grounded compositional semantics for finding and describing images with sentences, Transactions of the Association for Computational Linguistics, 2014.

R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning, Semi-supervised recursive autoencoders for predicting sentiment distributions, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011.

R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning et al., Recursive deep models for semantic compositionality over a sentiment treebank, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.

R. K. Srivastava, K. Greff, and J. Schmidhuber, , 2015.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014.

K. S. Tai, R. Socher, and C. D. Manning, Improved semantic representations from tree-structured long short-term memory networks, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol.1, 2015.

J. Tang, M. Qu, and Q. Mei, PTE: Predictive text embedding through large-scale heterogeneous text networks, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.

T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, Complex embeddings for simple link prediction, Proceedings of the 33rd International Conference on Machine Learning, 2016.

L. Tu, K. Gimpel, and K. Livescu, Learning to embed words in context for syntactic tasks, 2017.

L. A. Tuan, Y. Tay, S. C. Hui, and S. K. Ng, Learning term embeddings for taxonomic relation identification using dynamic weighting neural network, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.

J. Turian, L. Ratinov, and Y. Bengio, Word representations: A simple and general method for semi-supervised learning, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010.

P. D. Turney and S. M. Mohammad, Experiments with three approaches to recognizing lexical entailment, Journal of Natural Language Engineering, 2014.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones et al., Attention is all you need, Advances in Neural Information Processing Systems, vol.30, 2017.

L. Vilnis and A. McCallum, Word representations via Gaussian embedding, Proceedings of the 2015 International Conference on Learning Representations, 2015.

I. Vulić and N. Mrkšić, Specialising word vectors for lexical entailment, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

E. Vylomova, L. Rimell, T. Cohn, and T. Baldwin, Take and took, gaggle and goose, book and read: Evaluating the utility of vector differences for lexical relation learning, 2015.

X. Wang, S. Li, J. Li, and W. Li, Implicit discourse relation recognition by selecting typical training examples, Proceedings of the 24th International Conference on Computational Linguistics, 2012.

Y. Wang, S. Li, J. Yang, X. Sun, and H. Wang, Tag-enhanced tree-structured neural networks for implicit discourse relation classification, Proceedings of the 8th International Joint Conference on Natural Language Processing, 2017.

J. Weeds, D. Clarke, J. Reffin, D. Weir, and B. Keller, Learning to distinguish hypernyms and co-hyponyms, Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, 2014.

J. Weeds and D. Weir, A general framework for distributional similarity, Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003.

J. Weeds, D. Weir, and D. McCarthy, Characterising measures of lexical distributional similarity, Proceedings of the 20th International Conference on Computational Linguistics, 2004.

D. Weir, J. Weeds, J. Reffin, and T. Kober, Aligning packed dependency trees: A theory of composition for distributional semantics, Journal of Computational Linguistics, 2016.

R. F. West and K. E. Stanovich, Robust effects of syntactic structure on visual word processing, Memory & Cognition, 1986.

M. Westera and G. Boleda, Don't blame distributional semantics if it can't do entailment, Proceedings of the 13th International Conference on Computational Semantics, 2019.

D. Widdows, Semantic vector products: Some initial investigations, Proceedings of the Second AAAI Symposium on Quantum Interaction, 2008.

L. Wittgenstein, Philosophical investigations, 1953.

C. Xu, Y. Bai, J. Bian, B. Gao, X. Liu et al., RC-NET: A general framework for incorporating knowledge into word representations, Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014.

Y. Zhou, C. Liu, and Y. Pan, Modelling sentence pairs with tree-structured attentive encoder, Proceedings of the 26th International Conference on Computational Linguistics, 2016.

A. Yessenalina and C. Cardie, Compositional matrix-space models for sentiment analysis, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.

M. Yu and M. Dredze, Improving lexical embeddings with semantic knowledge, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol.2, 2014.

Z. Yu, H. Wang, X. Lin, and M. Wang, Learning term embeddings for hypernymy identification, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar, Estimating linear models for compositional distributional semantics, Proceedings of the 23rd International Conference on Computational Linguistics, 2010.

P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao et al., Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling, Proceedings of the 26th International Conference on Computational Linguistics, 2016.

W. Y. Zou, R. Socher, D. Cer, and C. D. Manning, Bilingual word embeddings for phrase-based machine translation, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.