Information Extraction

For instance, given a set of news articles about changes in management posts at companies, it may be useful to fill in templates with data about each event reported in the newswire articles. For example, by analysing the following text,

John Smith leaves the post of vicepresident in Company Ltd. on March 4th, 2005. He will be substituted by Mary Brown.

the system should be able to find that:

John Smith and Mary Brown are people, March 4th, 2005 is a date, and Company Ltd. is an organisation. This first task, which consists of identifying entities of different kinds in the text, is called Named Entity Recognition.
There are two events in the text: leaving a post, and occupying that same post. Note that, in order to know that both events refer to the same post, it may be necessary to resolve the anaphora, that is, to determine that the pronoun He refers to John Smith.
The two events have the same date and refer to the same post. The person involved in each event is different: John Smith in the first one, and Mary Brown in the second one.
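
The three steps above can be illustrated with a small rule-based sketch. It is only a toy under strong assumptions: the regular expressions, the template field names and the "closest preceding person" rule for resolving He are all invented for this one example sentence, and they do not describe any particular system.

```python
import re

TEXT = ("John Smith leaves the post of vicepresident in Company Ltd. "
        "on March 4th, 2005. He will be substituted by Mary Brown.")

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hand-written toy patterns, one per entity type (assumptions for this example).
# Order matters: more specific patterns are tried first.
PATTERNS = [
    ("DATE", re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)?, \d{{4}}")),
    ("ORGANISATION", re.compile(r"\b[A-Z][a-z]+ Ltd\.")),
    ("PERSON", re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")),
]


def recognise_entities(text):
    """Step 1: return (type, surface string, offset) triples, skipping overlaps."""
    found, taken = [], []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # span already claimed by an earlier pattern
            taken.append(m.span())
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])


def resolve_pronoun(pronoun_offset, entities):
    """Naive anaphora resolution: the closest PERSON mentioned before the pronoun."""
    candidates = [e for e in entities if e[0] == "PERSON" and e[2] < pronoun_offset]
    return candidates[-1][1] if candidates else None


def fill_templates(text):
    """Steps 2 and 3: build one template per event, sharing post, organisation and date."""
    entities = recognise_entities(text)
    by_type = {}
    for label, surface, _ in entities:
        by_type.setdefault(label, []).append(surface)
    shared = {
        "post": re.search(r"post of ([\w-]+)", text).group(1),
        "organisation": by_type["ORGANISATION"][0],
        "date": by_type["DATE"][0],
    }
    pronoun = re.search(r"\bHe\b", text)
    leaver = resolve_pronoun(pronoun.start(), entities)
    successor = re.search(r"substituted by ([A-Z][a-z]+ [A-Z][a-z]+)", text).group(1)
    return [
        {"event": "leave_post", "person": leaver, **shared},
        {"event": "occupy_post", "person": successor, **shared},
    ]


if __name__ == "__main__":
    for template in fill_templates(TEXT):
        print(template)
```

Real systems replace each of these hand-written rules with learned or much more general components; the sketch only shows how the output of entity recognition feeds the event templates.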

In some cases, the entity types are divided into sub-types. For instance, organisations might be classified as governmental or private, and locations might be classified as cities, countries or geological formations. In turn, cities might be classified as capital or non-capital, and geological formations as mountains, valleys, etc. In these cases, the problem of Named Entity Recognition and Classification is very similar to that of Ontology population.
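
A minimal sketch of such a hierarchy of sub-types is given below. The type names and the child-to-parent table are assumptions taken from the examples in the paragraph above, not from an actual ontology.

```python
# Hypothetical type hierarchy: each sub-type points to its parent type.
PARENT = {
    "governmental organisation": "organisation",
    "private organisation": "organisation",
    "city": "location",
    "country": "location",
    "geological formation": "location",
    "capital city": "city",
    "non-capital city": "city",
    "mountain": "geological formation",
    "valley": "geological formation",
}


def type_path(entity_type):
    """Return the chain of types from the most general one down to entity_type."""
    path = [entity_type]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def is_a(entity_type, ancestor):
    """True if entity_type equals ancestor or is one of its (indirect) sub-types."""
    return ancestor in type_path(entity_type)


if __name__ == "__main__":
    print(type_path("capital city"))
    print(is_a("mountain", "location"))
```

Classifying an entity into the deepest applicable type then amounts to attaching an instance to a node of this tree, which is why the task resembles ontology population.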

Our work in this field includes the automatic identification, resolution and normalisation of temporal expressions and, more recently, the generalisation of patterns for named entity (NE) recognition.
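
As an illustration of what normalising a temporal expression means, the sketch below maps explicit dates such as March 4th, 2005 to the ISO 8601 form 2005-03-04. It is a simplified example written for this page, with an assumed date pattern and a hypothetical function name; it does not handle relative expressions such as "next Monday" and does not reproduce the method used in our work.

```python
import re
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed surface form: "<Month> <day>[st|nd|rd|th], <year>".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r") (\d{1,2})(?:st|nd|rd|th)?, (\d{4})\b"
)


def normalise_dates(text):
    """Return (surface expression, ISO 8601 date) pairs for explicit dates in text."""
    results = []
    for match in DATE_RE.finditer(text):
        month, day, year = match.group(1), int(match.group(2)), int(match.group(3))
        try:
            value = date(year, MONTH_NUMBER[month], day)
        except ValueError:
            continue  # skip impossible dates such as "February 30th, 2005"
        results.append((match.group(0), value.isoformat()))
    return results


if __name__ == "__main__":
    print(normalise_dates("John Smith leaves the post on March 4th, 2005."))
```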
