Tech Talk: Bruno Martins

Location: online

Challenges in resolving place names over text

Bruno Martins

University of Lisbon

11:30 a.m. Tuesday, November 17, 2020 | Zoom*

 

Abstract: Toponym resolution concerns the disambiguation of place names in textual documents, envisioning the support for applications such as geographical search or the mapping of textually encoded information. Place names are first recognized through a named entity recognition model, and the disambiguation is then achieved by associating each of the place references to a unique position on the Earth’s surface, e.g., through the assignment of geospatial coordinates. The toponym resolution task is particularly challenging, given that place references are highly ambiguous (i.e., distinct locations can share the same place name, and multiple names can be used to refer to the same place). In this talk, I will discuss techniques for toponym resolution, with a particular emphasis on a novel deep learning approach. Contrary to most previous methods, the novel approach does not involve matching references in the text against entries in a gazetteer, instead directly predicting geospatial coordinates. In brief, the neural network architecture considers multiple inputs (e.g., the toponym to disambiguate together with the surrounding words), leveraging pre-trained contextual word embeddings for modeling the textual data. The intermediate representations are then used to predict a probability distribution over possible geospatial regions, and finally to predict the coordinates for the input toponym. I will present evaluation results over different types of corpora (e.g., modern newswire text or historical documents), and I will discuss the impact of model extensions related to (i) the use of external information concerning geophysical terrain properties, including information on terrain development or elevation, among others, and (ii) additional training data collected from Wikipedia articles, to guide and further help with model training.
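
As a rough illustration of the last step described in the abstract (a probability distribution over geospatial regions, followed by a coordinate prediction), here is a minimal Python sketch. It is not the speaker's implementation: the abstract does not say how the distribution is turned into coordinates, so the probability-weighted average of region centroids used here is an assumption, as are the function names, the example regions, and the scores. The haversine distance is included only as one common way to measure how far a predicted point lies from a reference point.

```python
import math

def softmax(scores):
    """Turn raw region scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def expected_coordinates(region_probs, region_centroids):
    """Probability-weighted mean of region centroids, averaged in 3D so that
    longitudes near the antimeridian do not cancel out incorrectly.
    (Assumed decoding step; the talk may use a different one.)"""
    x = y = z = 0.0
    for p, (lat, lon) in zip(region_probs, region_centroids):
        vx, vy, vz = to_unit_vector(lat, lon)
        x, y, z = x + p * vx, y + p * vy, z + p * vz
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

# Hypothetical region centroids (roughly Lisbon and London) and made-up scores.
centroids = [(38.72, -9.14), (51.51, -0.13)]
probs = softmax([2.0, 0.5])
prediction = expected_coordinates(probs, centroids)
error_km = haversine_km(prediction, centroids[0])
```

Averaging on the sphere rather than directly over latitude and longitude is a design choice made for this sketch; it keeps the estimate sensible when candidate regions straddle the 180° meridian.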

Bio: Bruno Martins is an assistant professor at the Computer Science and Engineering Department of Instituto Superior Técnico of the University of Lisbon (IST/UL), and a researcher at the Information and Decision Support Systems Lab of INESC-ID, where he works on problems related to the general areas of information retrieval, text mining, and the geographical information sciences. He received his MSc and PhD degrees from the Faculty of Sciences of the University of Lisbon, both in Computer Science. Bruno has been involved in several research projects related to geospatial aspects in information access and retrieval, and he has accumulated significant expertise in addressing challenges at the intersection of language technologies, machine learning, and the geographical information sciences. He and his students have worked on many different application areas, and he is proudest of the many PhD/MSc students who have graduated under his supervision and are now building wonderful careers.

The Spatial Tech Talks are designed to stimulate discussion and interaction within the university’s spatial technology community, as well as to promote sharing of tools and techniques for mapping and spatial analysis.