Introduction
Information Extraction consists of extracting, from a text or a set
of texts, entities, events and relationships amongst them.
For instance, given a set of news items about changes in management
posts in companies, it might be useful to fill in templates with data
about each event reported in the newswire articles. By analysing the
following text,
John Smith leaves the post of vice-president in Company Ltd. on
March 4th, 2005. He will be substituted by Mary Brown.
the system should be able to find that:
- John Smith and Mary Brown are people, March 4th, 2005 is a date,
and Company Ltd. is an organisation. This first task, which consists
of identifying entities of different kinds in the text, is called
Named Entity Recognition.
- There are two events in the text: leaving a post, and occupying
that same post. Note that, in order to know that both refer to the
same post, it may be necessary to resolve the anaphora, i.e. to
determine that the pronoun He refers to John Smith.
- The two events have the same date, and refer to the same post. The
person involved in each event is different: John Smith in the first
one, and Mary Brown in the second one.
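
The output described above can be sketched as a pair of filled-in
templates. This is only an illustration of what such a system might
produce for the example text: the class names, field names and event
labels (Entity, PostChangeEvent, "leave_post", "occupy_post") are
assumptions made here, not part of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # hypothetical labels: "person", "date", "organisation"


@dataclass(frozen=True)
class PostChangeEvent:
    kind: str  # hypothetical labels: "leave_post", "occupy_post"
    person: str
    post: str
    organisation: str
    date: str


# Result of Named Entity Recognition on the example text.
entities = [
    Entity("John Smith", "person"),
    Entity("Mary Brown", "person"),
    Entity("March 4th, 2005", "date"),
    Entity("Company Ltd.", "organisation"),
]

# One filled-in template per event. Linking the second event to the same
# post assumes the pronoun "He" has already been resolved to John Smith.
events = [
    PostChangeEvent("leave_post", "John Smith", "vice-president",
                    "Company Ltd.", "March 4th, 2005"),
    PostChangeEvent("occupy_post", "Mary Brown", "vice-president",
                    "Company Ltd.", "March 4th, 2005"),
]


def same_post_and_date(a: PostChangeEvent, b: PostChangeEvent) -> bool:
    """Check that two events concern the same post on the same date."""
    return (a.post, a.organisation, a.date) == (b.post, b.organisation, b.date)
```

Here same_post_and_date(events[0], events[1]) holds, while the person
field differs between the two templates.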
On some occasions, the kinds of entities are divided into sub-types.
For instance, organisations might be classified as governmental or
private, and locations as cities, countries or geological formations.
In turn, cities might be classified as capital or non-capital, and
geological formations as mountains, valleys, etc. In these cases, the
problem of Named Entity Recognition and Classification is very
similar to that of ontology population.
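
The sub-type hierarchy mentioned above can be pictured as a small tree.
The sketch below encodes exactly the example categories from the text
as nested dictionaries; the representation and the helper path_to are
illustrative assumptions, not a description of an actual ontology.

```python
# Example entity types from the text, as a tree of nested dictionaries.
TYPE_HIERARCHY = {
    "organisation": {"governmental": {}, "private": {}},
    "location": {
        "city": {"capital": {}, "non-capital": {}},
        "country": {},
        "geological formation": {"mountain": {}, "valley": {}},
    },
}


def path_to(label, tree=TYPE_HIERARCHY, prefix=()):
    """Return the chain of types leading to `label`, or None if absent."""
    for name, children in tree.items():
        current = prefix + (name,)
        if name == label:
            return current
        found = path_to(label, children, current)
        if found is not None:
            return found
    return None
```

Classifying an entity as "capital" then implies the whole chain
returned by path_to("capital"), which is what makes the task resemble
placing an instance in an ontology.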
Our work in this field includes the automatic identification,
resolution and normalisation of temporal expressions and, more
recently, the generalisation of patterns for Named Entity (NE)
recognition.
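
As a rough illustration of what identifying and normalising a temporal
expression means, the sketch below finds fully specified dates such as
"March 4th, 2005" and rewrites them in ISO 8601 form. It is a
deliberately minimal example and not the approach used in our work:
the pattern, the MONTHS table and the function normalise_dates are
assumptions, and relative expressions such as "next Monday", which
need resolution against a reference date, are not handled.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches e.g. "March 4th, 2005" or "June 21, 1999".
DATE_PATTERN = re.compile(
    r"\b([A-Z][a-z]+) (\d{1,2})(?:st|nd|rd|th)?, (\d{4})\b"
)


def normalise_dates(text):
    """Return (expression, ISO 8601 date) pairs found in `text`."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        month = MONTHS.get(match.group(1))
        if month is None:
            continue  # capitalised word that is not a month name
        try:
            value = date(int(match.group(3)), month, int(match.group(2)))
        except ValueError:
            continue  # impossible calendar date, e.g. February 30th
        results.append((match.group(0), value.isoformat()))
    return results
```

Applied to the example sentence about John Smith, this yields the
single pair ("March 4th, 2005", "2005-03-04").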
Publications
Click here to access our publications on
Information Extraction.
Demos
Coming soon...
Some external links