NLP | Natural Language Processing Researcher
.01

ABOUT

PERSONAL DETAILS
8690 Paul Street, San Francisco
amrk@itu.dk
Researcher in Natural Language Processing, Computational Linguistics, Structured Prediction, Morphology and Syntax, Program Synthesis

BIO

ABOUT ME

I have been a postdoctoral researcher at the Department of Computer Science and Technology, University of Cambridge, since 3rd September 2020. I work with Dr Andreas Vlachos, and my research focuses on claim verification using knowledge bases. Prior to this, I was a postdoc at ITU Copenhagen for a year. I began my PhD under the supervision of Prof Pawan Goyal at the Dept. of Computer Science and Engineering, IIT Kharagpur, India, in July 2015. I defended my thesis, titled ‘Addressing Language Specific Characteristics for Data-Driven Modelling of Lexical, Syntactic and Prosodic Tasks in Sanskrit’, in October 2019. Broadly, I am interested in anything that comes under computational linguistics and Natural Language Processing. Specifically, my research interests lie in morphology, syntax, semantics, structured prediction and program synthesis.

FACTS

NUMBERS ABOUT ME

920
CUPS OF COFFEE
65
PROJECTS COMPLETED
2965
HOURS OF CODING
35
WORKSHOPS
2M
LINES OF CODE
100
SATISFIED CUSTOMERS

.02

RESUME

  • EDUCATION
  • 2015
    2019
    Kharagpur

    COMPUTER SCIENCE & ENGINEERING - Ph.D.

    IIT Kharagpur

    I began my PhD under the supervision of Prof Pawan Goyal at the Dept. of Computer Science and Engineering, IIT Kharagpur, India, in July 2015. I defended my thesis, titled ‘Addressing Language Specific Characteristics for Data-Driven Modelling of Lexical, Syntactic and Prosodic Tasks in Sanskrit’, in October 2019.
  • 2013
    2015
    Kharagpur

    COMPUTER SCIENCE & ENGINEERING - M.Tech

    IIT Kharagpur

    Graduated with a CGPA of 9.28 (of 10). Courses include Machine Learning, Natural Language Processing, Information Retrieval, Complex Networks, Graph Theory.
  • 2008
    2012
    Cochin, Kerala

    COMPUTER SCIENCE & ENGINEERING - B.Tech

    Federal Institute of Science and Technology (FISAT)

    Recipient of the "Special Recognition Award" for the graduating students of 2012, from the Department of CSE at FISAT.
  • VOLUNTEERING AND ACADEMIC SERVICES
  • 2017
    2019
    website

    ORGANISING COMMITTEE MEMBER

    CODS-COMAD 2018 & 2019

    Served as web chair for both editions of the Joint CODS-COMAD conference.
  • 2018
    Web Site

    PROGRAM COMMITTEE MEMBER

    COLING 2018, 2020, LREC 2020

    Reviewer for the 27th and 28th International Conferences on Computational Linguistics (COLING 2018, 2020) and for LREC 2020.
  • 2018
    Web Site

    STUDENT VOLUNTEER

    EMNLP 2018

    Student Volunteer at the 2018 Conference on Empirical Methods in Natural Language Processing, October 31–November 4, Brussels, Belgium.
  • 2015
    2018
    website

    WEBMASTER

    CNeRG

    Served as webmaster and social media manager for the CNeRG group at IIT Kharagpur.
  • 2016
    2019

    WEB CHAIR

    INTERNATIONAL WORKSHOPS AND SYMPOSIUMS

    Served as web chair for various international events, including the 6th International Sanskrit Computational Linguistics Symposium, Data Science in India 2017 (KDD 17), and the Workshop on SCL (ICON 2016).
  • HONORS AND AWARDS
  • CONFERENCE TRAVEL GRANT

    RECIPIENT OF CONFERENCE TRAVEL GRANTS FROM MICROSOFT, EMNLP, ACM-IARCS, CNeRG

  • 2015

    PROJECT GRANT UNDER GOOGLE-IIT PILOT PROGRAM

    INDIC VIEW - MOBILE-OCR FOR INDIC LANGUAGES

    Received a project grant of INR 500,000.
  • 2015

    IBM DAY BEST DEMO AWARD

    FeRoSA RECOMMENDER ENGINE

    FeRoSA is a faceted recommendation engine for the papers in the ACL Web Anthology.
  • 2012

    SPECIAL RECOGNITION AWARD

    FISAT AWARD FOR OUTGOING STUDENTS OF 2012

    Awarded for designing and co-ordinating a course on HTML5 and CSS3 as part of the FISAT extension programme.
.03

PUBLICATIONS

PUBLICATIONS LIST - My complete publications list can be found on my Google Scholar profile
11 MAY 2020

An Interface for Morpho-syntactic annotation of Sanskrit Corpora.

LREC 2020, Marseille

Authors: Amrith Krishna, Shiv Vidhyut, Dilpreet Chawla, Sruti Sambhavi, Pawan Goyal.

Conferences

We propose a web-based annotation framework, SHR++, for morpho-syntactic annotation of corpora in Sanskrit. SHR++ is designed to generate annotations for the word-segmentation, morphological parsing and dependency analysis tasks in Sanskrit. It incorporates analyses and predictions from various tools designed for processing texts in Sanskrit, and utilises them to ease the cognitive load of the human annotators. Specifically, SHR++ uses Sanskrit Heritage Reader (Goyal and Huet, 2016), a lexicon driven shallow parser for enumerating all the phonetically and lexically valid word splits along with their morphological analyses for a given string. This would help the annotators in choosing the solutions, rather than performing the segmentations by themselves. Further, predictions from a word segmentation tool (Krishna et al., 2018) are added as suggestions that can aid the human annotators in their decision making. Our evaluation shows that enabling this segmentation suggestion component reduces the annotation time by 20.15 %. SHR++ can be accessed online at http://vidhyut97.pythonanywhere.com/ and the codebase, for the independent deployment of the system elsewhere, is hosted at https://github.com/iamdsc/smart-sanskrit-annotator.

29 JUL 2019

Poetry to Prose Conversion in Sanskrit as a Linearisation Task: A Case for Low-Resource Languages

ACL 2019, Florence, Italy

Authors: Amrith Krishna, Vishnu Sharma, Bishal Santra, Aishik Chakraborty, Pavankumar Satuluri, Pawan Goyal.

Conferences Selected Paper (Short)
2 NOV 2018

FREE AS IN FREE WORD ORDER: AN ENERGY BASED MODEL FOR WORD SEGMENTATION AND MORPHOLOGICAL TAGGING IN SANSKRIT

EMNLP 2018, BRUSSELS, BELGIUM

Authors: Amrith Krishna, Bishal Santra, Sasi Prasanth Bandaru, Gaurav Sahu, Vishnu Dutt Sharma, Pavankumar Satuluri, Pawan Goyal.

Conferences Selected Paper | Code

Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit

Amrith Krishna, Bishal Santra, Sasi Prasanth Bandaru, Gaurav Sahu, Vishnu Dutt Sharma, Pavankumar Satuluri, Pawan Goyal. In the proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (ACL).

The configurational information in sentences of a free word order language such as Sanskrit is of limited use. Thus, the context of the entire sentence will be desirable even for basic processing tasks such as word segmentation. We propose a structured prediction framework that jointly solves the word segmentation and morphological tagging tasks in Sanskrit. We build an energy based model where we adopt approaches generally employed in graph based parsing techniques (McDonald et al., 2005a; Carreras, 2007). Our model outperforms the state of the art with an F-Score of 96.92 (percentage improvement of 7.06%) while using less than one tenth of the task-specific training data. We find that the use of a graph based approach instead of a traditional lattice-based sequential labelling approach leads to a percentage gain of 12.6% in F-Score for the segmentation task.

1 NOV 2018

UPCYCLE YOUR OCR: REUSING OCRS FOR POST-OCR TEXT CORRECTION IN ROMANISED SANSKRIT

CoNLL 2018, BRUSSELS, BELGIUM

Authors: Amrith Krishna, Bodhisattwa Prasad Majumder, Rajesh Shreedhar Bhat, Pawan Goyal

Conferences Selected Paper | Code

Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit

Amrith Krishna, Bodhisattwa Prasad Majumder, Rajesh Shreedhar Bhat, Pawan Goyal. Proceedings of the 22nd Conference on Computational Natural Language Learning, Association for Computational Linguistics.

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. Owing to the lack of resources, our approach uses OCR models trained for other languages written in Roman. Currently, there exists no dataset available for Romanised Sanskrit OCR. So, we bootstrap a dataset of 430 images, scanned in two different settings, and their corresponding ground truth. For training, we synthetically generate training images for both the settings. We find that the use of a copying mechanism (Gu et al., 2016) yields a percentage increase of 7.69 in Character Recognition Rate (CRR) over the current state of the art model in solving monotone sequence-to-sequence tasks (Schnober et al., 2016). We find that our system is robust in combating OCR-prone errors, as it obtains a CRR of 87.01% from an OCR output with a CRR of 35.76% for one of the dataset settings. A human judgement survey performed on the models shows that our proposed model results in predictions which are faster to comprehend and faster to improve for a human than those of the other systems.

24 MAY 2018

BUILDING A WORD SEGMENTER FOR SANSKRIT OVERNIGHT

LREC 2018, MIYAZAKI, JAPAN

Authors: Vikas Reddy, Amrith Krishna, Vishnu Dutt Sharma, Prateek Gupta, Pawan Goyal.

Conferences

Building a Word Segmenter for Sanskrit Overnight

Vikas Reddy, Amrith Krishna, Vishnu Dutt Sharma, Prateek Gupta, Pawan Goyal. In the proceedings of the Eleventh International Conference on Language Resources and Evaluation.

There is an abundance of digitised texts available in Sanskrit. However, the word segmentation task in such texts is challenging due to the issue of Sandhi. In Sandhi, words in a sentence often fuse together to form a single chunk of text, where the word delimiter vanishes and sounds at the word boundaries undergo transformations, which is also reflected in the written text. Here, we propose an approach that uses a deep sequence to sequence (seq2seq) model that takes only the sandhied string as the input and predicts the unsandhied string. The state of the art models are linguistically involved and have external dependencies for the lexical and morphological analysis of the input. Our model can be trained “overnight” and be used for production. In spite of the knowledge lean approach, our system performs better than the current state of the art, with a percentage increase of 16.79%.

1 AUG 2017

A DATASET FOR SANSKRIT WORD SEGMENTATION

LaTeCH-CLfL Workshop, ACL, VANCOUVER, CANADA

Authors: Amrith Krishna, Pavankumar Satuluri, Pawan Goyal

Workshops

A Dataset for Sanskrit Word Segmentation

Authors: Amrith Krishna, Pavankumar Satuluri, Pawan Goyal. Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature.

The last decade saw a surge in digitisation efforts for ancient manuscripts in Sanskrit. Due to various linguistic peculiarities inherent to the language, even the preliminary tasks such as word segmentation are non-trivial in Sanskrit. Elegant models for Word Segmentation in Sanskrit are indispensable for further syntactic and semantic processing of the manuscripts. Current works in word segmentation for Sanskrit, though commendable in their novelty, often have variations in their objective and evaluation criteria. In this work, we set the record straight. We formally define the objectives and the requirements for the word segmentation task. In order to encourage research in the field and to alleviate the time and effort required in pre-processing, we release a dataset of 115,000 sentences for word segmentation. For each sentence in the dataset we include the input character sequence, ground truth segmentation, and additionally lexical and morphological information about all the phonetically possible segments for the given sentence. In this work, we also discuss the linguistic considerations made while generating the candidate space of the possible segments.

31 JUL 2017

A GRAPH BASED SEMI-SUPERVISED APPROACH FOR ANALYSIS OF DERIVATIONAL NOUNS IN SANSKRIT

TEXTGRAPHS 11, ACL, VANCOUVER, CANADA

Authors: Amrith Krishna, Pavankumar Satuluri, Harshavardhan Ponnada, Muneeb Ahmed, Gulab Arora, Kaustubh Hiware, Pawan Goyal

Workshops

A Graph Based Semi-Supervised Approach for Analysis of Derivational Nouns in Sanskrit

Derivational nouns are widely used in Sanskrit corpora and are a prevalent means of productivity in the language. Currently there exists no analyser that identifies the derivational nouns. We propose a semi-supervised approach for identification of derivational nouns in Sanskrit. We not only identify the derivational words, but also link them to their corresponding source words. The novelty of our work is primarily in its design of the network structure for the task. The edge weights are featurised based on the phonetic, morphological, syntactic and semantic similarity shared between the words to be identified. We find that our model is effective for the task, even when we employ a labelled dataset which is only 5% of the entire dataset.

13 DEC 2016

WORD SEGMENTATION IN SANSKRIT USING PATH CONSTRAINED RANDOM WALKS

COLING 16, OSAKA, JAPAN

Authors: Amrith Krishna, Bishal Santra, Pavankumar Satuluri, Sasi Prasanth Bandaru, Bhumi Faldu, Yajuvendra Singh, Pawan Goyal

Conferences Selected

Word Segmentation in Sanskrit Using Path Constrained Random Walks

In Sanskrit, the phonemes at the word boundaries undergo changes to form new phonemes through a process called sandhi. A fused sentence can be segmented in multiple possible ways. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random walks framework to predict the correct segments.

12 DEC 2016

COMPOUND TYPE IDENTIFICATION IN SANSKRIT: WHAT ROLES DO THE CORPUS AND GRAMMAR PLAY?

WSSANLP, COLING 16, OSAKA, JAPAN

Authors: Amrith Krishna, Pavankumar Satuluri, Shubham Sharma, Apurv Kumar, Pawan Goyal

Workshops

Compound Type Identification in Sanskrit: What Roles do the Corpus and Grammar Play?

Amrith Krishna, Pavankumar Satuluri, Shubham Sharma, Apurv Kumar, Pawan Goyal.

We propose a classification framework for semantic type identification of compounds in Sanskrit. We broadly classify the compounds into four different classes, namely Avyayībhāva, Tatpuruṣa, Bahuvrīhi and Dvandva. Our classification is based on the traditional classification system as mentioned in the ancient grammar treatise Aṣṭādhyāyī by Pāṇini, written 25 centuries back. We construct an elaborate feature space for our system by combining conditional rules from the grammar Aṣṭādhyāyī, semantic relations between the compound components from a lexical database Amarakoṣa and linguistic structures from the data using Adaptor Grammars. Our in-depth analysis of the feature space highlights the inadequacy of Aṣṭādhyāyī, a generative grammar, in classifying the data samples. Our experimental results validate the effectiveness of using lexical databases as suggested by Kulkarni and Kumar (2013) and put forward a new research direction by introducing linguistic patterns obtained from Adaptor grammars for effective identification of compound type. We utilise an ensemble based approach, specifically designed for handling skewed datasets, and we achieve an overall accuracy of 0.77 using random forest classifiers.

22 APR 2016

FeRoSA: A FACETED RECOMMENDATION SYSTEM FOR SCIENTIFIC ARTICLES

PAKDD 2016, AUCKLAND, NEW ZEALAND

Authors: Tanmoy Chakraborty, Amrith Krishna, Mayank Singh, Niloy Ganguly, Pawan Goyal, Animesh Mukherjee

Conferences

FeRoSA: A Faceted Recommendation System for Scientific Articles

Tanmoy Chakraborty, Amrith Krishna, Mayank Singh, Niloy Ganguly, Pawan Goyal, Animesh Mukherjee.

The overwhelming number of scientific articles over the years calls for smart automatic tools to facilitate the process of literature review. Here, we propose for the first time a framework of faceted recommendation for scientific articles (abbreviated as FeRoSA) which, apart from ensuring quality retrieval of scientific articles for a query paper, also efficiently arranges the recommended papers into different facets (categories). Providing users with an interface which enables the filtering of recommendations across multiple facets can increase users’ control over how the recommendation system behaves. FeRoSA is built on a random walk based framework on an induced subnetwork consisting of nodes related to the query paper in terms of either citations or content similarity. Rigorous analysis based on experts’ judgment shows that FeRoSA outperforms two baseline systems in terms of faceted recommendations (overall precision of 0.65). Further, we show that the faceted results of FeRoSA can be appropriately combined to design a better flat recommendation system as well.

22 JUL 2015

TOWARDS AUTOMATING THE GENERATION OF DERIVATIVE NOUNS IN SANSKRIT BY SIMULATING PANINI

WSC 15, BANGKOK, THAILAND

Authors: Amrith Krishna, Pawan Goyal

Conferences

About 1115 rules in Astadhyayi, from A.4.1.76 to A.5.4.160, deal with generation of derivative nouns, making it one of the largest topical sections in Astadhyayi, called the Taddhita section owing to the head rule A.4.1.76. This section is a systematic arrangement of rules that enumerates various affixes that are used in the derivation under specific semantic relations. We propose a system that automates the process of generation of derivative nouns as per the rules in Astadhyayi. The proposed system follows a completely object oriented approach that models each rule as a class of its own and then groups them as rule groups. The rule groups are decided on the basis of selective grouping of rules by virtue of anuvrtti. The grouping of rules results in an inheritance network of rules which is a directed acyclic graph. Every rule group has a head rule, and the head rule notifies all the direct member rules of the group about the environment, which contains all the details about the data entities participating in the derivation process. The system implements this mechanism using multilevel inheritance and observer design patterns. The system focuses not only on generation of the desired final form, but also on the correctness of the sequence of rules applied, to make sure that the derivation has taken place in strict adherence to Astadhyayi. The proposed system's design allows the incorporation of various conflict resolution methods mentioned in authentic texts, and hence the effectiveness of those rules can be validated with the results from the system.
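
The rule-group mechanism described in this abstract can be illustrated with a small sketch. This is not the system from the paper: the class names, the Environment fields, the rule identifiers and the placeholder affixes "-X" and "-Y" are all assumptions made for illustration, and no actual rule of Astadhyayi is encoded. It only shows one way a head rule could notify its member rules about a shared environment while recording the order in which rules were applied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Environment:
    """Data entities taking part in one derivation (hypothetical fields)."""
    stem: str
    relation: str                       # semantic relation requested
    applied: List[str] = field(default_factory=list)  # rule ids, in order
    result: Optional[str] = None


class Rule:
    """Base class: every rule is modelled as a class of its own."""
    rule_id = "base"

    def applies(self, env: Environment) -> bool:
        return False

    def apply(self, env: Environment) -> None:
        # Record the rule so the sequence of applied rules can be checked.
        env.applied.append(self.rule_id)


class HeadRule(Rule):
    """Head of a rule group; plays the subject role of the observer pattern."""
    rule_id = "head"

    def __init__(self) -> None:
        self.members: List[Rule] = []

    def attach(self, rule: Rule) -> None:
        self.members.append(rule)

    def notify(self, env: Environment) -> Environment:
        # Record the head rule, then hand the environment to direct members.
        self.apply(env)
        for rule in self.members:
            if env.result is None and rule.applies(env):
                rule.apply(env)
        return env


class ToyAffixRule(Rule):
    """Member rule; relation and affix are placeholders, not real grammar."""
    rule_id = "toy-1"
    relation = "descendant"
    affix = "-X"

    def applies(self, env: Environment) -> bool:
        return env.relation == self.relation

    def apply(self, env: Environment) -> None:
        super().apply(env)
        env.result = env.stem + self.affix


class ToyAffixRuleVariant(ToyAffixRule):
    """Multilevel inheritance: overrides only what differs from its parent."""
    rule_id = "toy-2"
    relation = "collection"
    affix = "-Y"


def derive(stem: str, relation: str) -> Environment:
    """Build a one-group network and run a single derivation through it."""
    head = HeadRule()
    head.attach(ToyAffixRule())
    head.attach(ToyAffixRuleVariant())
    return head.notify(Environment(stem=stem, relation=relation))
```

Calling derive("stem", "collection") returns an environment whose applied list holds the head rule followed by the one member rule that matched, which is the kind of trace one would inspect to check the sequence of rules rather than only the final form.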

.04

TALKS & TEACHING

  • TALKS
  • 13 AUG
    2018

    Amrita Vishwa Vidyapeetham, Kochi

    AN INTRODUCTION TO MACHINE LEARNING AND DEEP LEARNING

    An introductory lecture on ML and DL.
  • 30 OCT
    2017

    International Seminar on Paradigm Shift in Indian Linguistics and its Implications for Applied Disciplines, IIAS Shimla

    SYNTHESISING GRAMMARS AND PROGRAMS FOR NATURAL LANGUAGE FROM DATA

    Preliminary results and observations on program synthesis for paradigm learning of inflectional morphology.
  • 18 DEC
    2016

    Workshop for Bridging the gap between Sanskrit Computational Linguistics tools and management of Sanskrit Digital Libraries

    A DATASET FOR WORD SEGMENTATION IN SANSKRIT

    Annotation scheme for preparing dataset for word segmentation in Sanskrit.
  • 10 JAN
    2016

    ASTRA International Conference 2016, Deccan College, Pune

    AUTOMATED SANSKRIT TEXT SEGMENTATION AIDED BY STATISTICAL ANALYSIS

    An HMM based model for word segmentation in Sanskrit.
  • 10 JAN
    2016

    ASTRA International Conference 2016, Deccan College, Pune

    NAMED ENTITY RECOGNITION IN BHAGAVATHAM WITH RICH LINGUISTIC FEATURES

    A feature rich CRF based model for NER in Sanskrit.
.05

CONTACT

Get in touch


Amrith Krishna
4E02, Dept. of CS
ITU, Copenhagen


Reach out


Email - amrk [AT] itu [DOT] dk
Email - amrith [AT] iitkgp [DOT] ac [DOT] in (delayed response)
Twitter - @krishanmrith12