Robust Text Detection in Natural Scenes and Web Images

  

1. Abstract

     Text detection in natural scenes and web images is an important prerequisite for many content-based image analysis tasks. We propose an accurate and robust method for detecting text in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using the strategy of minimizing regularized variations. Character candidates are grouped into text candidates by the single-link clustering algorithm, where the distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probabilities of text candidates corresponding to non-text are estimated with a character classifier; text candidates with high non-text probabilities are eliminated, and texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation and even born-digital databases also demonstrate the effectiveness of the proposed method. Our technology won first place in both "Text Localization in Real Scenes" and "Text Localization in Born-Digital Images (Web and Email)" at the ICDAR 2013 Robust Reading Competition.

2. Framework of Robust Text Detection System

      By integrating several improvements over traditional MSER based methods and distance metric learning techniques, we propose a new scene text detection system. The structure of the proposed system is presented in Figure 1. The figure also shows the intermediate result of each stage for a sample image.

     Our robust scene text detection system mainly includes:
     1) In the character candidates extraction stage, character candidates are extracted using the MSER algorithm; most of the non-characters are removed by the proposed MSER pruning algorithm using the strategy of minimizing regularized variations.
     2) In the text candidates construction stage, distance weights and the clustering threshold are learned simultaneously using the proposed metric learning algorithm; character candidates are clustered into text candidates by the single-link clustering algorithm using the learned parameters.
     3) In the text candidates elimination stage, the posterior probabilities of text candidates corresponding to non-text are estimated using the character classifier, and text candidates with high non-text probabilities are removed.
     4) In the text candidates classification stage, text candidates corresponding to true text are identified by the text classifier. An AdaBoost classifier is trained to decide whether a text candidate corresponds to true text. As characters in the same text tend to have similar features, the uniformity of the character candidates' features is used as the text candidate's feature to train the classifier.
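The pruning idea in step 1 can be sketched in a few lines: along every single-child chain of the MSER component tree, only the region with the smallest variation survives. This is a minimal illustrative sketch; the node layout is an assumption, the variation here is the raw (unregularized) score, and the full system additionally applies the regularization and tree-accumulation steps described in the paper.

```python
# Sketch of linear-reduction style pruning over an MSER component tree.
# Each node holds a (regularized) variation score; on one-child chains
# the parent and child compete, and the smaller-variation region is kept.

class Node:
    def __init__(self, variation, children=None):
        self.variation = variation
        self.children = children or []

def linear_reduction(node):
    """Return the pruned subtree root; on single-child chains keep the
    node with minimal variation and lift the grandchildren."""
    node.children = [linear_reduction(c) for c in node.children]
    if len(node.children) == 1:
        child = node.children[0]
        if child.variation <= node.variation:
            return child                      # child is the better candidate
        node.children = child.children        # keep parent, lift grandchildren
    return node
```

For a chain of nested regions with variations 0.5 -> 0.3 -> 0.4, the pruning keeps only the middle region (variation 0.3), which is the behaviour the paper's linear reduction aims at.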

Figure 1. Flowchart of the proposed system and the corresponding experimental results after each step for a sample image. Text candidates are labeled with blue bounding rectangles; character candidates identified as characters are colored green, the others red.

    Note that, in order to measure the performance of the proposed system on the ICDAR 2011 competition dataset, text candidates identified as text are further partitioned into words by classifying inter-character distances into character spacings and word spacings with an AdaBoost classifier. This last stage is named word partitioning.
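The text candidates construction stage (single-link clustering under a learned weighted distance and threshold) can be sketched as connected components of a "link" graph. The feature tuples, weights and threshold below are illustrative assumptions; in the actual system they are produced by the self-training distance metric learning algorithm.

```python
# Single-link clustering sketch: two character candidates link when
# their weighted L1 distance falls below the learned threshold, and
# clusters are the connected components of the resulting link graph
# (computed here with a small union-find structure).

def single_link_clusters(candidates, weights, threshold):
    """candidates: list of feature tuples; returns clusters as lists
    of candidate indices."""
    n = len(candidates)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            d = sum(w * abs(a - b)
                    for w, a, b in zip(weights, candidates[i], candidates[j]))
            if d < threshold:
                parent[find(i)] = find(j)   # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With four toy candidates at x-positions 0, 1, 10 and 11 and a threshold of 2, the sketch links the two nearby pairs and yields two clusters, mirroring how neighboring characters chain into a text line while distant ones stay separate.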

3. Experiment and Competition Results

      The performance on the ICDAR 2011 Robust Reading Competition dataset (Challenge 2: Reading Text in Scene Images) is presented in Table 1.

Table 1. Performance (%) comparison of text localization algorithms for the ICDAR 2011 Robust Reading Competition dataset.

Methods                     Recall  Precision  f      Remarks
Our method                  68.26   86.29      76.22
Shi et al.'s method         63.1    83.3       71.8   Pattern Recognition Letters, 34(2): 107-116, 2013.
Neumann and Matas's method  64.7    73.1       68.7   CVPR 2012: 3538-3545.
Kim's method                62.47   82.98      71.28  1st in the ICDAR 2011 competition.
Yi's method                 58.09   67.22      62.32  2nd in the ICDAR 2011 competition.
TH-TextLoc system           57.68   66.97      61.98  3rd in the ICDAR 2011 competition.
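The f column in these tables is the standard f-measure, i.e. the harmonic mean of precision and recall. A quick check against the first row of Table 1:

```python
# f-measure as the harmonic mean of precision and recall.
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(86.29, 68.26), 2))  # → 76.22
```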

       We also evaluate the performance of our approach on a multilingual (Chinese and English) dataset. This multilingual dataset includes 248 images for training and 239 images for testing, and was first collected and used by Pan et al. (IEEE TIP, 20(3): 800-813, 2011). Experimental results are shown in Table 2.

Table 2. Performance (%) comparison of text localization algorithms for the multilingual dataset.

Methods              Recall  Precision  f      Speed per image in seconds (image size: 796x878)
Our method           68.45   82.63      74.58  0.22
Pan et al.'s method  65.9    64.5       65.2   3.11

      The results of the ICDAR 2013 Robust Reading Competition (http://dag.cvc.uab.es/icdar2013competition/) (Challenge 1: Text Localization in Born-Digital Images; Challenge 2: Text Localization in Real Scenes) are presented in Table 3 and Table 4 respectively, where "USTB_TexStar" is our technology, which won first place for text detection in both challenges.

Table 3. Results for the ICDAR 2013 Robust Reading Competition (Challenge 1: Text Localization in Born-Digital Images (Web and Email)).

Methods         Recall  Precision  f      Remarks
USTB_TexStar    82.38   93.83      87.74
TH-TextLoc      75.85   86.82      80.96  Prof. Xiaoqing Ding's group, Tsinghua University
I2R_NUS_FAR     71.42   84.17      77.27  Prof. Tan Chew Lim's group, National University of Singapore
Text Detection  73.18   78.62      75.81  Prof. Séverine Dubuisson's group, UPMC
I2R_NUS         67.52   85.19      75.34  Prof. Tan Chew Lim's group, National University of Singapore
BDTD_CASIA      67.05   78.98      72.53
OTCYMIST        74.85   67.69      71.09  Prof. A. G. Ramakrishnan's group, Indian Institute of Science, Bangalore

 

Table 4. Results for the ICDAR 2013 Robust Reading Competition (Challenge 2: Text Localization in Real Scenes).

Methods              Recall  Precision  f      Remarks
USTB_TexStar         66.45   88.47      75.89
TextSpotter          64.84   87.51      74.49  Prof. Jiri Matas's group, Czech Technical University
CASIA_NLPR           62.24   78.89      73.18  Prof. Cheng-Lin Liu's group, Institute of Automation, Chinese Academy of Sciences
Text_detector_CASIA  62.85   84.70      72.16  Prof. Chunheng Wang's group, Institute of Automation, Chinese Academy of Sciences
I2R_NUS_FAR          69.00   75.08      71.91  Prof. Tan Chew Lim's group, National University of Singapore
I2R_NUS              66.17   72.54      69.21  Prof. Tan Chew Lim's group, National University of Singapore
TH-TextLoc           65.19   69.96      67.49  Prof. Xiaoqing Ding's group, Tsinghua University
Text Detection       53.42   74.15      62.10  Prof. Séverine Dubuisson's group, UPMC

 

4. Online Demos

     1) Demo of English text detection in natural scene images

         This demo is constructed and trained on the ICDAR 2011 Robust Reading Competition training set (Challenge 2: Reading Text in Scene Images).

         [The web server is currently down; the demo will be restored soon.]

     2) Demo of multilingual (Chinese and English) text detection

         This demo is constructed and trained on Pan et al.'s multilingual training set (IEEE TIP, 20(3): 800-813, 2011).

         [The web server is currently down; the demo will be restored soon.]

     3) APP Demo for mobile phones and tablets

        We have also implemented camera-based real-time text detection as an app for Android smartphones and tablets. An example running on an Android tablet is shown in the following figure.

 

        If you are interested in this app demo, please email Dr. Yin.

5. References

      [1] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, "Robust text detection in natural scene images," IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), vol. 36, no. 5, pp. 970-983, 2014.

      [2] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, "Accurate and robust text detection: a step-in for text retrieval in natural scene images," Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'13), 2013.

      [3] Xuwang Yin, Xu-Cheng Yin, Hong-Wei Hao, and Khalid Iqbal, "Effective text localization in natural scene images with MSER, geometry-based grouping and AdaBoost," Proceedings of the 21st International IAPR Conference on Pattern Recognition (ICPR'12), 2012.

6. Corresponding Author

       Discussion of and cooperation on this technology and its applications are welcome! We have also prepared a technical white paper (in Chinese) on our technology; if you are interested in it, please email Dr. Yin.

        Xu-Cheng Yin, Ph.D., Associate Professor

   Mail:      Department of Computer Science and Technology, School of Computer and Communication Engineering,

                 University of Science and Technology Beijing,

                 No. 30, Xueyuan Road, Haidian District, Beijing 100083, China

    Office:   ROOM 1005, Information Building

    Tel:       +86-10-8237-1191

    Fax:      +86-10-6233-2873   

    Email:      

---------------------------------------------------------------------

Last Modified:  May 25, 2015

 
