ICDAR2017 Competition on Text Extraction from Biomedical Literature Figures
(ICDAR2017 DeTEXT Competition)
Figures are ubiquitous in biomedical literature, and they represent important biomedical
knowledge. The sheer volume of biomedical publications has made it necessary to
develop computational approaches for accessing figures. Consequently, during the
last few years, figure classification, retrieval and mining have garnered
significant attention in the biomedical research communities. Since text
frequently appears in figures, semantic analysis of such text may assist the
task of mining information from figures. Little research, however, has
specifically explored automated text extraction from biomedical figures and
their semantic analysis.
Unlike images in the open domain, biomedical figures present unique challenges. For example, they typically have complex layouts, small font sizes, short and domain-specific text, complex symbols, and irregular text arrangement. Figure quality also varies across publishers. Consequently, conventional OCR technologies and systems, which are typically trained on open-domain images, do not work well on biomedical figures. To better leverage biomedical figures in future research and analysis, and to make them more searchable and computable, we propose Semantic Interpretation of Biomedical Figure Mining to address the challenges of semantic biomedical figure mining.
The Semantic Interpretation of Biomedical Figure Mining Challenge is being conducted to assess the capability of text detection, recognition, mining, and even NLP algorithms to correctly detect and recognize text appearing in biomedical literature figures. This ICDAR2017 Competition (the ICDAR2017 DeTEXT Competition) focuses on extracting (detecting and recognizing) text from biomedical literature figures.
2017-Jan-22 -- ICDAR2017 DeTEXT Competition Announcement
This ICDAR2017 Competition is based on an open dataset, DeTEXT: A Database for Extracting TEXT from biomedical literature figures (Dataset Paper). DeTEXT is used to evaluate text detection and recognition algorithms for complex images (specifically for biomedical literature figures), and has several important features. First, DeTEXT is composed of 500 typical biomedical literature figures that appear in about 300 full-text articles randomly picked from PubMed Central. Second, figures in DeTEXT are annotated not only with each text region’s orientation, location, and ground-truth text, but also with the image quality and the text importance. Third and foremost, DeTEXT is the first public image dataset for biomedical literature figure detection, recognition, and retrieval. It can easily be extended to a larger set by adding more figures randomly selected from PubMed Central.
The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017) will take place in Kyoto, Japan. ICDAR is the premier international forum, held every two years, for researchers and practitioners in the document analysis and recognition community to identify, encourage, and exchange ideas on state-of-the-art technology in document analysis, understanding, retrieval, and performance evaluation, as well as related pattern recognition methods and trends.
The competition is set up around three tasks:
(1) Text Localization (Text Detection), where the objective is to obtain a rough estimation of the text areas in the figure, in terms of bounding boxes (with four corner points) that correspond to parts of text (words or text lines).
(2) Word Recognition, where the locations (bounding boxes) of words in the figure are assumed to be known and the corresponding text transcriptions are sought (without specific dictionaries).
(3) End-to-End Text Recognition, where the objective is to localize and recognize all words in the figure in a single step (without specific dictionaries).
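To make the inputs and outputs of the three tasks concrete, the following is a minimal Python sketch of how a text region with four corner points and an optional transcription might be represented. The class name `TextRegion`, its fields, the coordinates, and the sample word are all hypothetical illustrations; they are not the competition's annotation or submission format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class TextRegion:
    """One word or text line in a figure (hypothetical structure, not the official format)."""
    corners: List[Point]                 # four (x, y) corner points, listed in order around the box
    transcription: Optional[str] = None  # None when only localization has been performed

    def area(self) -> float:
        """Area of the quadrilateral, computed with the shoelace formula."""
        total = 0.0
        n = len(self.corners)
        for i in range(n):
            x1, y1 = self.corners[i]
            x2, y2 = self.corners[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

    def enclosing_rectangle(self) -> Tuple[float, float, float, float]:
        """Axis-aligned (x_min, y_min, x_max, y_max) rectangle around the corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


# Task 1 (Text Localization): only the corner points are produced.
detected = TextRegion(corners=[(10, 20), (110, 20), (110, 50), (10, 50)])

# Task 2 (Word Recognition): the corners are given and the transcription is sought.
# The word "p53" is an invented sample value.
recognized = TextRegion(corners=detected.corners, transcription="p53")

# Task 3 (End-to-End): corners and transcription are produced together,
# so a result has the same shape as `recognized`.
```

Using four corner points rather than an axis-aligned rectangle lets the same structure describe rotated or skewed text, which matters when text arrangement is irregular.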
This ICDAR2017 Competition (on Text Extraction from Biomedical Literature Figures) will share the Robust Reading Competition (RRC) portal for organizing the competition, serving the participants, and evaluating the submissions.
HERE is the competition platform for ICDAR2017 Competition on Text Extraction from Biomedical Literature Figures (ICDAR2017 DeTEXT Competition).
Registration, DeTEXT training set release: Until May 31, 2017
DeTEXT data (testing set) release: June 10, 2017
Submission of results deadline: June 30, 2017
November 10-15, 2017
For further information please contact Xu-Cheng YIN (xuchengyin AT ustb.edu.cn) and Hong YU (Hong.Yu AT umassmed.edu).
Chun YANG, PhD, at Department of Computer Science and Technology, University of Science and Technology Beijing, China.
Xu-Cheng YIN, Professor and Deputy Chair at Department of Computer Science and Technology, University of Science and Technology Beijing, China.
Hong YU, Professor, Dept of Quantitative Health Sciences, University of Massachusetts Medical School Worcester; Adjunct Professor, School of Computer Science, University of Massachusetts Amherst; Research Health Scientist, VA Central Western Massachusetts, USA.
Dimosthenis KARATZAS, Senior Research Fellow and Associate Director at the Computer Vision Centre, Universitat Autònoma de Barcelona, Spain.
of Computer Science, Co-director, UMass Center for Digital Health, University of Massachusetts Lowell, MA, USA.
NDEx Project Director, Department of Medicine, University of California, San Diego, CA, USA.
GAO, Professor, Dept of Microbiology and Physiological Systems, University of Massachusetts Medical School Worcester, MA, USA.