This study compares simple hash-based methods for computing distributional similarity with the more complex singular value decomposition (SVD) approach. It concludes that the simpler methods both produce better quality and can be computed in a fraction of the time required for SVD. (2011)
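To illustrate what a hash-based approach to distributional similarity can look like, here is a minimal sketch using the generic hashing trick: each context word is hashed into a fixed-size vector, and words are compared by cosine similarity. This is not necessarily the method evaluated in the study. The dimensionality `DIM`, the window size and all function names are assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64  # assumed size of the hashed context space (illustrative only)


def hashed_index_and_sign(token, dim=DIM):
    """Map a context token to a (bucket, sign) pair using a stable hash."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    bucket = value % dim
    sign = 1.0 if (value >> 63) & 1 == 0 else -1.0
    return bucket, sign


def context_vectors(sentences, window=2, dim=DIM):
    """Accumulate a hashed context vector for every word in the corpus."""
    vectors = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    bucket, sign = hashed_index_and_sign(tokens[j], dim)
                    vectors[word][bucket] += sign
    return vectors


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

No matrix factorisation is needed: the vectors are built in a single pass over the corpus, which is the kind of cost advantage over SVD that the study refers to.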
As the title says, several measures of word co-occurrence and distributional similarity are systematically compared with each other on large data. The quality differences were found to be quite strong. (2008)
We participated in an effort to build a user-centric and realistic evaluation for email-related NLP technologies. Special attention was paid to cost-effectiveness as well as data protection. (2013)
How can users work effectively with different, interrelated timelines, especially when navigating them? This patent application covers some of our UI experiments, mostly on tablets, that allow users to interactively move informational entities on the screen to filter timelines. (2013)
Early work on automatic learning of meaningful relations between words. It was superseded by more recent work on distributional semantics, but it is nonetheless interesting. (2004)
One of the techniques that we use to build up a knowledge graph. The Cognitive Workbench actually uses multiple techniques depending on the type and structure of the incoming information. (2013)
A technology to crowdsource semantics from unstructured data gathered from a massive number of mobile devices. Notably, it covers the privacy aspect, i.e. the data extracted does not allow for guessing the identity, name or any personal details of the user. The technology is currently not part of our publicly available products. (2013)
In this patent application, we cover the Magneto technology for differential semantic tag clouds. (2013)
This comparison evaluates the exploitation of unstructured data in industrial quality analysis methods. It shows that, for some tasks, textual resources provide far more, and far more detailed, information than established data mining methods on structured data.
Hänig, C., Schierle, M. and Trabold, D.: Comparison of Structured vs. Unstructured Data for Industrial Quality Analysis. In: Proceedings of the World Congress on Engineering and Computer Science 2010 Vol I (WCECS 2010), IAENG, 2010
(Best Paper Award)
In this study, we investigated differences between “heavy” (“take a computer”) and “light” verbs (“take a shower”).
Brain Research, Volume 1249, 16 January 2009, Pages 173–180
Theoretical and practical background for some of our core NLP technology. The goal was to improve our understanding of language-independent algorithms that produce language-specific knowledge, which can then be used in more specific solutions. Areas covered include lexical knowledge, lexical ambiguity, morphology and syntax. (2007)
In this paper we present the specifics of the ExB algorithm, which obtained 2nd position in the Gland Segmentation Challenge (GlaS) organised at MICCAI 2015. Our method is based on a Multi-Path Convolutional Neural Network architecture for image segmentation. A major innovation of our model is the specialized border identification network, which improves accuracy at the borders of glands and substantially improves the overall segmentation accuracy.
Sirinukunwattana, Korsuk, Josien PW Pluim, Hao Chen, Xiaojuan Qi, Pheng-Ann Heng, Yun Bo Guo, Li Yang Wang, Bogdan J Matuszewski, Elia Bruni, Urko Sanchez, Anton Böhm, Olaf Ronneberger, Bassem Ben Cheikh, Daniel Racoceanu, Philipp Kainz, Michael Pfeiffer, Martin Urschler, David RJ Snead, Nasir M Rajpoot (2016): Gland Segmentation in Colon Histology Images: The GlaS Challenge Contest
Neuro-scientific research into how the brain builds up syntactic representations. This is fundamental research following a bottom-up approach to understanding how the human brain understands a sentence.
June 2007, Vol. 19, No. 6, Pages 971-980 doi:10.1162/jocn.2007.19.6.971
This work extends our previous unsupervised parsing model by head detection and phrase type clustering and significantly improves the capability to parse sentences without labelled training data.
Hänig, C.: Improvements in Unsupervised Co-Occurrence Based Parsing. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Association for Computational Linguistics, 2010
This work presents an algorithm that is able to detect verbs relying on completely unsupervised language processing methods. Being able to recognize action clues in textual resources enables our applications to apply methods for deeper language understanding (such as relation extraction) in a completely unsupervised manner.
Hänig, C.: Knowledge-free Verb Detection through Tag Sequence Alignment. In: Proceedings of the 18th International Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia, Northern European Association for Language Technology (NEALT), 2011
Early description of a large-scale language resources project and the technologies required around it. (2004)
This work presents our state-of-the-art multilingual text summarizer, capable of single- as well as multi-document text summarization. The algorithm is based on repeated application of TextRank on a sentence similarity graph, a bag-of-words model for sentence similarity and a number of linguistic pre- and post-processing steps using standard NLP tools. We submitted this algorithm for two different tasks of the MultiLing 2015 summarization challenge: Multilingual Single-document Summarization and Multilingual Multi-document Summarization.
Thomas, Stefan, Christian Beutenmüller, Xose de la Puente, Robert Remus, and Stefan Bordag (2015): 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 260
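The core idea of the summarizer described above (TextRank on a sentence similarity graph with bag-of-words similarity) can be sketched as follows. This is a simplified illustration, not the submitted system: it applies TextRank only once rather than repeatedly, omits all linguistic pre- and post-processing, and the tokenizer, damping factor, iteration count and function names are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased word counts; a deliberately naive tokenizer."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bows = [bag_of_words(s) for s in sentences]
    weights = [
        [cosine(bows[i], bows[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k=1):
    """Return the k best-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many other sentences accumulate score, while a sentence with no lexical overlap keeps only the baseline score contributed by the damping term.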
Early experiments in automatic classification of brain activity using neural networks. In the setup, single words were presented visually and processed by the human brain. Neuromagnetic brain activity was recorded and then fed into an artificial neural network to classify which of the visually presented words had actually been processed by the brain.
Emergent Neural Computational Architectures Based on Neuroscience
Lecture Notes in Computer Science Volume 2036, 2001, pp 311-319
The patent application covers some of our technologies to analyse a document that the user is viewing or editing, to compute relevant contextual information and to show it to the user at the right point in time. (2013)
This work points out the challenges of analyzing polarity within a specific domain and of dealing with user-generated textual resources. Two comprehensively annotated corpora (English and German) consisting of user-generated data were made publicly available as gold standard data sets for experiments and evaluations.
C. Hänig, A. Niekler and C. Wünsch: PACE Corpus: a Multilingual Corpus of Polarity-Annotated Textual Data from the Domains Automotive and Cellphone. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA), 2014
Hänig, C. and Schierle, M.: Relation Extraction based on Unsupervised Syntactic Parsing. In: Proceedings of the conference on Text Mining Services (TMS 2009), 2009
A verb’s argument structure defines the number and relationships of participants needed for a complete event. One-argument (intransitive) verbs require only a subject to make a complete sentence, while two- and three-argument verbs (transitives and ditransitives) normally take direct and indirect objects. In this MEG study, we scrutinised the neuro-magnetic brain response to different argument structures.
BMC Neuroscience 2008, 9:69 doi:10.1186/1471-2202-9-69
This patent application describes how we can efficiently search huge document collections in a resource-constrained system such as a mobile phone or tablet. The applied techniques also improve performance in server-based implementations. (2014)
In the past four years, Deep Learning (DL) has re-entered the computer vision scene dramatically, completely shifting the design paradigm of the previous 20 years. Whereas error rates in image analysis had been more or less stagnant before, since 2012 DL has kept halving them each year, in some recent cases even achieving super-human performance! All typical tasks such as classification, detection and segmentation have benefited, across all related applications such as traffic sign recognition, natural image analysis and automatic captioning. These developments move computer vision from a scientific playground to a productizable technology.
Bruni, Elia, and Stefan Bordag (2016): Significant Advances in Medical Image Analysis. Bildverarbeitung für die Medizin 2016, 1-1
Bottom-up, neuro-scientific research into the verb’s argument structure: “John gave Jim a book.” How does the brain know and represent the parts of the sentence that turn the verb “gave” into a story?
BMC Neuroscience 2009, 10:3 doi:10.1186/1471-2202-10-3
Our approach to sentiment analysis shows that the polarity of a phrase can be composed from the polarities of its words. Our polarity model is language-independent and thus can easily be adapted to new languages and domains.
Robert Remus and Christian Hänig: Towards Well-grounded Phrase-level Polarity Analysis. In: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Springer, 2011
The approach presented in this paper applies unsupervised syntactic parsing and language-independent statistical relation extraction to noisy data. A taxonomy is used to abstract away from the language-dependent word level to language-independent concepts; thus, the relation extraction approach can be adapted to new languages and domains without huge manual effort.
Hänig, C., Bordag, S. and Quasthoff, U.: UnsuParse: Unsupervised Parsing with unsupervised Part of Speech tagging. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008), 2008
This work describes an algorithm that can analyse the morphological structure of words without knowing anything about the language in advance. It is therefore a completely unsupervised approach, and it produced decent results in the Morpho Challenge competition. (2008)
Early work on viewing language resources and language technology as a potential source of web services. (2004)
A graph-based word clustering approach shows that it is possible and feasible to have a completely unsupervised algorithm determine the various meanings of words. For example, for the word "space" in a newspaper corpus, it would yield several meanings, one being outer space, where spacecraft fly, and another being rentable office space. (2006)
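The "space" example can be illustrated with a toy sketch of graph-based sense induction. It builds a sentence-level co-occurrence graph, takes the neighbours of the target word and groups them into connected components once the target itself is removed. Connected components stand in here for the paper's actual clustering algorithm, which is not described in this summary. The function names and the toy corpus are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Link every pair of distinct words that share a sentence."""
    graph = defaultdict(set)
    for tokens in sentences:
        for a, b in combinations(set(tokens), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def induce_senses(graph, target):
    """Group the target's neighbours into connected components.

    Each component is read as one candidate sense (a simplifying assumption).
    """
    neighbours = set(graph.get(target, ()))
    senses, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Only follow edges that stay inside the target's neighbourhood.
            stack.extend((graph[node] & neighbours) - component)
        seen |= component
        senses.append(sorted(component))
    return senses


corpus = [
    ["rocket", "space", "orbit"],
    ["astronaut", "space", "rocket"],
    ["office", "space", "rent"],
    ["rent", "space", "lease"],
]
senses = induce_senses(cooccurrence_graph(corpus), "space")
# senses == [['astronaut', 'orbit', 'rocket'], ['lease', 'office', 'rent']]
```

On real corpora the neighbourhood graph is far denser, so a clustering method that tolerates spurious edges would be needed in place of plain connected components.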