Robust Text Detection in Natural Scenes and Web Images
Text detection in natural scenes and web images is an important prerequisite for many content-based image analysis tasks. We propose an accurate and robust method for detecting text in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using the strategy of minimizing regularized variations. Character candidates are grouped into text candidates by the single-link clustering algorithm, where the distance weights and the clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probabilities of text candidates corresponding to non-text are estimated with a character classifier; text candidates with high non-text probabilities are eliminated, and text is identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation and even born-digital databases also demonstrate the effectiveness of the proposed method. Our technology won first place in both "Text Localization in Real Scenes" and "Text Localization in Born-Digital Images (Web and Email)" at the ICDAR 2013 Robust Reading Competition.
2. Framework of Robust Text Detection System
By integrating several improvements over traditional MSER-based methods with distance metric learning techniques, we propose a new scene text detection system. The structure of the proposed system is presented in Figure 1, which also shows, for a sample image, the intermediate result of each stage.
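The first stage extracts MSERs, i.e. extremal regions whose area stays stable as the intensity threshold varies; the paper prunes the MSER tree by minimizing a *regularized* variation. As an illustrative sketch only (the plain, unregularized variation criterion, not the paper's exact pruning algorithm), the stability measure over a nested sequence of region areas can be written as:

```python
def variations(areas, delta=2):
    """Variation of each region in a nested sequence of extremal regions,
    indexed by intensity level. areas[i] is the pixel area of the region
    at level i (areas are non-decreasing along the nesting)."""
    n = len(areas)
    v = []
    for i in range(n):
        lo = max(i - delta, 0)
        hi = min(i + delta, n - 1)
        # Relative growth of the region across +/- delta threshold levels.
        v.append((areas[hi] - areas[lo]) / areas[i])
    return v

def most_stable(areas, delta=2):
    """Index of the maximally stable region (minimum variation)."""
    v = variations(areas, delta)
    return min(range(len(v)), key=v.__getitem__)

# A region whose area stays flat around its level is the most stable one.
areas = [10, 12, 13, 13, 13, 20, 40]
print(most_stable(areas))  # → 2 (inside the flat plateau of area 13)
```

The paper's pruning additionally regularizes this variation so that overly small or overly large character candidates are penalized; the regularization term is omitted here.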
Our robust scene text detection system mainly includes the following stages:
1) Character candidate extraction: character candidates are extracted as MSERs with the pruning algorithm that minimizes regularized variations.
2) Text candidate construction: character candidates are grouped into text candidates by single-link clustering, where the distance weights and the clustering threshold are learned by self-training distance metric learning.
3) Text candidate elimination: the posterior probability of each text candidate corresponding to non-text is estimated with a character classifier, and candidates with high non-text probabilities are eliminated.
4) Text classification: the remaining text candidates are identified as text or non-text with a text classifier.
Figure 1. Flowchart of the proposed system and the corresponding experimental results after each step of a sample image. Text candidates are labeled by blue bounding rectangles; character candidates identified as characters are colored green, others red.
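The grouping of character candidates into text candidates uses single-link clustering: two candidates fall into the same cluster if they are connected by a chain of pairwise distances below a threshold. A minimal pure-Python sketch follows; the `(x, height)` features, the distance weights `w_pos` and `w_size`, and the threshold are all placeholders for illustration, whereas the paper learns the weights and the threshold by self-training distance metric learning:

```python
def single_link_clusters(items, dist, threshold):
    """Single-link (agglomerative) clustering via union-find: items joined
    by a chain of pairwise distances below `threshold` share a cluster."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical character candidates as (x, height) features; the weighted
# distance below stands in for the learned metric.
chars = [(0, 20), (15, 21), (31, 19), (200, 40)]
w_pos, w_size = 1.0, 2.0  # placeholder weights (learned in the paper)
dist = lambda a, b: w_pos * abs(a[0] - b[0]) + w_size * abs(a[1] - b[1])
print(single_link_clusters(chars, dist, threshold=25))
# → [[0, 1, 2], [3]]: the three nearby, similarly sized candidates form
#   one text candidate; the distant large region stays separate.
```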
Note that, in order to evaluate the proposed system on the ICDAR 2011 competition dataset, text candidates identified as text are further partitioned into words by classifying inner character distances into character spacings and word spacings with an AdaBoost classifier. This last stage is named word partitioning.
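Word partitioning amounts to scanning the gaps between consecutive character boxes in a text line and deciding which gaps are word spacings. A toy sketch is given below; a fixed gap threshold stands in for the AdaBoost classifier actually trained in our system, and the `(x, width)` box representation is an assumption for illustration:

```python
def partition_words(char_boxes, gap_threshold):
    """Split a text line into words by classifying the horizontal gap
    between consecutive character boxes (x, width) as a character spacing
    (small) or a word spacing (large). A fixed threshold stands in for
    the AdaBoost classifier used in the actual system."""
    boxes = sorted(char_boxes)
    words, current = [], [boxes[0]]
    for prev, box in zip(boxes, boxes[1:]):
        gap = box[0] - (prev[0] + prev[1])
        if gap > gap_threshold:  # word spacing: start a new word
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words

# Two words of two characters each: small gaps inside words, a large
# gap between them.
line = [(0, 10), (12, 10), (40, 10), (52, 10)]
print(partition_words(line, gap_threshold=5))
# → [[(0, 10), (12, 10)], [(40, 10), (52, 10)]]
```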
3. Experiment and Competition Results
The performance on the ICDAR 2011 Robust Reading Competition dataset (Challenge 2: Reading Text in Scene Images) is presented in Table 1.
Table 1. Performance (%) comparison of text localization algorithms for the ICDAR 2011 Robust Reading Competition dataset.
| Methods | Recall | Precision | f | Reference |
| --- | --- | --- | --- | --- |
| Shi et al.'s method | 63.1 | 83.3 | 71.8 | Pattern Recognition Letters, 34(2): 107-116, 2013 |
| Neumann and Matas's method | 64.7 | 73.1 | 68.7 | CVPR 2012: 3538-3545 |
| Kim's method | 62.47 | 82.98 | 71.28 | 1st in the ICDAR 2011 competition |
| Yi's method | 58.09 | 67.22 | 62.32 | 2nd in the ICDAR 2011 competition |
| TH-TextLoc system | 57.68 | 66.97 | 61.98 | 3rd in the ICDAR 2011 competition |
We also evaluate the performance of our approach on a multilingual (Chinese and English) dataset. This multilingual set includes 248 images for training and 239 images for testing, and was first collected and used by Pan et al. (IEEE TIP, 20(3): 800-813, 2011). Experimental results are shown in Table 2.
Table 2. Performance (%) comparison of text localization algorithms for the multilingual dataset.
| Methods | Recall | Precision | f | Speed per image (image size: 796×878) |
| --- | --- | --- | --- | --- |
| Pan et al.'s method | 65.9 | 64.5 | 65.2 | 3.11 |
The results of the ICDAR 2013 Robust Reading Competition (http://dag.cvc.uab.es/icdar2013competition/) (Challenge 1: Text Localization in Born-Digital Images; Challenge 2: Text Localization in Real Scenes) are presented in Table 3 and Table 4 respectively, where "USTB_TexStar" is our technology, which won first place for text detection in both challenges.
Table 3. Results for the ICDAR 2013 Robust Reading Competition (Challenge 1: Text Localization in Born-Digital Images (Web and Email)).
| Methods | Recall | Precision | f | Affiliation |
| --- | --- | --- | --- | --- |
| TH-TextLoc | 75.85 | 86.82 | 80.96 | Prof. Xiaoqing Ding's group, Tsinghua University |
| I2R_NUS_FAR | 71.42 | 84.17 | 77.27 | Prof. Tan Chew Lim's group, National University of Singapore |
| Text Detection | 73.18 | 78.62 | 75.81 | Prof. Séverine Dubuisson's group, UPMC |
| I2R_NUS | 67.52 | 85.19 | 75.34 | Prof. Tan Chew Lim's group, National University of Singapore |
| OTCYMIST | 74.85 | 67.69 | 71.09 | Prof. A. G. Ramakrishnan's group, Indian Institute of Science Bangalore |
Table 4. Results for the ICDAR 2013 Robust Reading Competition (Challenge 2: Text Localization in Real Scenes).
| Methods | Recall | Precision | f | Affiliation |
| --- | --- | --- | --- | --- |
| TextSpotter | 64.84 | 87.51 | 74.49 | Prof. Jiri Matas's group, Czech Technical University |
| CASIA_NLPR | 62.24 | 78.89 | 73.18 | Prof. Cheng-Lin Liu's group, Institute of Automation, Chinese Academy of Sciences |
| Text_detector_CASIA | 62.85 | 84.70 | 72.16 | Prof. Chunheng Wang's group, Institute of Automation, Chinese Academy of Sciences |
| I2R_NUS_FAR | 69.00 | 75.08 | 71.91 | Prof. Tan Chew Lim's group, National University of Singapore |
| I2R_NUS | 66.17 | 72.54 | 69.21 | Prof. Tan Chew Lim's group, National University of Singapore |
| TH-TextLoc | 65.19 | 69.96 | 67.49 | Prof. Xiaoqing Ding's group, Tsinghua University |
| Text Detection | 53.42 | 74.15 | 62.10 | Prof. Séverine Dubuisson's group, UPMC |
4. Online Demos
1) Demo of English text detection in natural scene images
This demo is constructed and trained on the ICDAR 2011 Robust Reading Competition training set (Challenge 2: Reading Text in Scene Images).
[The web server is currently down; the demo will be back online soon.]
2) Demo of multilingual (Chinese and English) text detection
This demo is constructed and trained on Pan et al.'s multilingual training set (IEEE TIP, 20(3): 800-813, 2011).
[The web server is currently down; the demo will be back online soon.]
3) APP Demo for mobile phones and tablets
We have also implemented an Android application for camera-based real-time text detection on smartphones and tablets. An example running on an Android tablet is shown in the following figure.
If you are interested in this app demo, please email Dr. Yin.
5. Publications
Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, "Robust text detection in natural scene images," IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), vol. 36, no. 5, pp. 970-983, 2014.
Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, "Accurate and robust text detection: a step-in for text retrieval in natural scene images," Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), 2013.
Xuwang Yin, Xu-Cheng Yin, Hong-Wei Hao, and Khalid Iqbal, "Effective text localization in natural scene images with MSER, geometry-based grouping and AdaBoost," Proceedings of the 21st International IAPR Conference on Pattern Recognition (ICPR'12), 2012.
6. Corresponding Author
Discussion of and cooperation on this technology and its applications are welcome! We have also prepared a technical white paper (in Chinese) on our technology. If you are interested in it, please email Dr. Yin.
Xu-Cheng Yin Ph.D., Associate Professor
Mail: Department of Computer Science and Technology, School of Computer and Communication Engineering,
University of Science and Technology Beijing,
No. 30, Xueyuan Road, Haidian District, Beijing 100083, China
Office: Room 1005, Information Building
Last Modified: May 25, 2015