Technical reports
Deep embodiment: grounding semantics in perceptual modalities
February 2017, 128 pages
This technical report is based on a dissertation submitted July 2016 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Darwin College.
DOI: 10.48456/tr-899
Abstract
Multi-modal distributional semantic models address the grounding problem from which text-based semantic models, which represent word meanings as distributions over other words, suffer. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag-of-visual-words method for obtaining visual features, and that these representations can be applied successfully to a variety of natural language processing tasks. Second, it presents the first experiments with grounding in the non-visual modalities of auditory and olfactory perception using raw data, where deep learning, a natural fit for deriving grounded representations, yields higher-quality representations than more traditional approaches. Multi-modal representation learning leads to improvements over language-only models on a variety of tasks. If we want to move towards human-level artificial intelligence, we will need to build multi-modal models that represent the full complexity of human meaning, including its grounding in our various perceptual modalities.
Full text
PDF (5.8 MB)
BibTeX record
@TechReport{UCAM-CL-TR-899,
  author      = {Kiela, Douwe},
  title       = {{Deep embodiment: grounding semantics in perceptual modalities}},
  year        = 2017,
  month       = feb,
  url         = {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-899.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  doi         = {10.48456/tr-899},
  number      = {UCAM-CL-TR-899}
}