Shiraz University - CSE Dept - Machine Learning Project # 3 by M.M.Haji
mehdi.haji@gmail.com

Text Segmentation Using Naive Bayes Classifiers

I. Introduction

In this project, we apply naive bayes classifier to separate text from non-text blocks in a document image. This is an important part of Document Image Analysis and Recognition, because the text must be detected first and then recognized. Hence, various approaches have been proposed to solve the problem but most of them use texture classification and image segmentation methods based on Fourier power spectrum, mean and covariance, spatial texture energy, DCT and wavelet transform (and ...) and finally a classification algorithm such as 2-means may be applied on these features to classify the image blocks as text or non-text. These methods usually does not need training data. In our new method, we generate discrete training instances from the real dataset of project 1 to learn a naive bayes classifier for the text segmentation problem. In the rest of this report we describe our new algorothm, show some experimental results, compare it with the methods described in project 2 and finally draw some conclusions.

II. Proposed Segmentation Method

In project 1 we generated 100,000 instances for the text concept, of course we don't need such a large training set here, so we selected 10,000 of them randomly and convert each attribute value to discrete (nominal) form to be appropriate for naive bayes learner. Each real attribute value will be converted to 'S2', 'S1', 'CE', 'B1' or 'B2', where 'S2' means very small, 'S1' means small, 'CE' means center, 'B1' means big and 'B2' means very big. In order to have approximately 2000 instances in each of the 5 bins for each attribute, we used different sets of rules for each of our 18 attributes as follows: (implemented  in discretize_textdata.m)

  A1   

   continuous    

  discrete    

(-inf,-15.8] S2
(-15.8,-0.7] S1
(-0.7,0.8] CE
(0.8,16.1] B1
(16.1,inf) B2
  A2   

   continuous    

  discrete    

(-inf,-13.1] S2
(-13.1,-0.4] S1
(-0.4,0.3] CE
(0.3,11.3] B1
(11.3,inf) B2

  A3   

   continuous    

  discrete    

(-inf,-9.5]

S2

(-9.5,-0.3]

S1

(-0.3,0.4]

CE

(0.4,11.4]

B1

(11.4,inf)

B2

 

  A4   

   continuous    

  discrete    

(-inf,-11.5] S2
(-11.5,-0.5] S1
(-0.5,0.4] CE
(0.4,11.3] B1
(11.3,inf) B2
  A5   

   continuous    

  discrete    

(-inf,-10] S2
(-10,-0.3] S1
(-0.3,0.2] CE
(0.2,9.4] B1
(9.4,inf) B2

  A6   

   continuous    

  discrete    

(-inf,-6.3]

S2

(-6.3,-0.3]

S1

(-0.3,0.2]

CE

(0.2,6.6]

B1

(6.6,inf)

B2

 

  A7   

   continuous    

  discrete    

(-inf,-10.6] S2
(-10.6,-0.4] S1
(-0.4,0.3] CE
(0.3,8.5] B1
(8.5,inf) B2
  A8   

   continuous    

  discrete    

(-inf,-7.3] S2
(-7.3,-0.2] S1
(-0.2,0.2] CE
(0.2,6.2] B1
(6.2,inf) B2

  A9   

   continuous    

  discrete    

(-inf,-5.2]

S2

(-5.2,-0.2]

S1

(-0.2,0.2]

CE

(0.2,4.8]

B1

(4.8,inf)

B2

 

  A10   

   continuous    

  discrete    

(-inf,-4.6] S2
(-4.6,-0.2] S1
(-0.2,0.2] CE
(0.2,4.3] B1
(4.3,inf) B2
  A11   

   continuous    

  discrete    

(-inf,-3.3] S2
(-3.3,-0.1] S1
(-0.1,0.2] CE
(0.2,3.7] B1
(3.7,inf) B2

  A12   

   continuous    

  discrete    

(-inf,-3.4]

S2

(-3.4,-0.2]

S1

(-0.2,0.2]

CE

(0.2,2.9]

B1

(2.9,inf)

B2

 

  A13   

   continuous    

  discrete    

(-inf,-3.4] S2
(-3.4,-0.1] S1
(-0.1,0.1] CE
(0.1,3.3] B1
(3.3,inf) B2
  A14   

   continuous    

  discrete    

(-inf,-2] S2
(-2,-0.1] S1
(-0.1,0.1] CE
(0.1,2] B1
(2,inf) B2

  A15   

   continuous    

  discrete    

(-inf,-2]

S2

(-2,-0.1]

S1

(-0.1,0.1]

CE

(0.1,2]

B1

(2,inf)

B2

 

  A16   

   continuous    

  discrete    

(-inf,-2] S2
(-2,-0.2] S1
(-0.2,0.2] CE
(0.2,3] B1
(3,inf) B2
  A17   

   continuous    

  discrete    

(-inf,-2.3] S2
(-2.3,-0.1] S1
(-0.1,0.1] CE
(0.1,2.4] B1
(2.4,inf) B2

 A18   

   continuous    

  discrete    

(-inf,-2.2]

S2

(-2.2,-0.1]

S1

(-0.1,0.2]

CE

(0.2,2.3]

B1

(2.3,inf)

B2

For the IsText? concept, let V1 = 'Yes' and V2 = 'No', now we need to evaluate P(Ai | Vj) for i = 1,2, ..., 18 and j = 1,2. In order to avoid zero conditional probabilities we used m-estimate with m = 1 and p = 0.2, this is done in priprob_text.m which results:

  P(A1 | Vj )    

V

    Yes             No       

A

S2  0.3199
0.1496
0.0687
0.1369
0.3250
0.0938
0.2496
0.3212
0.2462
0.0893
S1
CE
B1
B2

  P(A2 | Vj )    

V

    Yes             No       

A

S2  0.3116
0.1656
0.0490
0.1428
0.3309
0.0821
0.2458
0.3475
0.2274
0.0972
S1
CE
B1
B2
   P(A3 | Vj )   

V

     Yes            No      

A

S2  0.3228
0.1513
0.0566
0.1654
0.3038
 0.1018
0.2661
0.3112
0.2329
0.0881
S1
CE
B1
B2

 

  P(A4 | Vj )    

V

    Yes             No       

A

S2 0.3454
0.1384
0.0442
0.1291
0.3429
0.0635
0.2568
0.3333
0.2869
0.0595
S1
CE
B1
B2

  P(A5 | Vj )    

V

    Yes             No       

A

S2  0.3459
0.1344
0.0416
0.1304
0.3478
0.0684
0.2390
0.3576
0.2674
0.0677
S1
CE
B1
B2
   P(A6 | Vj )   

V

     Yes            No      

A

S2  0.3323
0.1386
0.0448
0.1403
0.3440
0.0771
0.2522
0.3191
0.2822
0.0694
S1
CE
B1
B2

 

  P(A7 | Vj )    

V

    Yes             No       

A

S2  0.3342
0.1359
0.0473
0.1166
0.3659
 0.0584
0.2464
0.3578
0.2666
0.0709
S1
CE
B1
B2

  P(A8 | Vj )    

V

    Yes             No       

A

S2 0.3387
0.1367
0.0395
0.1209
0.3643
0.0572
0.2585
0.3655
0.2515
0.0673
S1
CE
B1
B2
   P(A9 | Vj )   

V

     Yes            No      

A

S2  0.3412
0.1318
0.0452
0.1314
0.3503
 0.0578
0.2795
0.3174
0.2733
0.0720
S1
CE
B1
B2

 

  P(A10 | Vj )    

V

    Yes             No       

A

S2 0.3537
0.1359
0.0450
0.1221
0.3433
0.0610
0.2941
0.3212
0.2608
0.0629
S1
CE
B1
B2

  P(A11 | Vj )    

V

    Yes             No       

A

S2  0.3697
0.1287
0.0404
0.1192
0.3421
0.0553
0.3015
0.3358
0.2505
0.0569
S1
CE
B1
B2
   P(A12 | Vj )   

V

     Yes            No      

A

S2  0.3473
0.1285
0.0570
0.1206
0.3465
 0.0623
0.2623
0.3667
0.2445
0.0642
S1
CE
B1
B2

 

  P(A13 | Vj )    

V

    Yes             No       

A

S2  0.3543
0.1295
0.0334
0.1380
0.3448
0.0618
0.3151
0.2672
0.2952
0.0606
S1
CE
B1
B2

  P(A14 | Vj )    

V

    Yes             No       

A

S2 0.3609
0.1177
0.0385
0.1143
0.3687
0.0457
0.2782
0.3523
0.2738
0.0500
S1
CE
B1
B2
   P(A15 | Vj )   

V

     Yes            No      

A

S2  0.3571
0.1105
0.0408
0.1280
0.3636
0.0584
0.2994
0.2738
0.3047
0.0637
S1
CE
B1
B2

 

  P(A16 | Vj )    

V

    Yes             No       

A

S2 0.3719
0.0968
0.0570
0.1344
0.3400
0.0853
0.2168
0.3597
0.2714
0.0669
S1
CE
B1
B2

  P(A17 | Vj )    

V

    Yes             No       

A

S2 0.3495
0.1280
0.0427
0.1346
0.3452
0.0661
0.2894
0.3057
0.2818
0.0570
S1
CE
B1
B2
   P(A18 | Vj )   

V

     Yes            No      

A

S2 0.3355
0.1361
0.0488
0.1354
0.3442
0.0669
0.3142
0.3324
0.2265
0.0601
S1
CE
B1
B2

It is clear that we have no information about the source image and hence P( V1 ) = P( V2 ) = 0.5, so we can classify a new instance this way: if  P(A1 | V1 ).P(A2 | V1 ). ... P(A18 | V1 ) > P(A1 | V2 ).P(A2 | V2 ). ... P(A18 | V2 ) then input is a text block and otherwise is a non-text block. We used the same set of rule-based smoothing filters of project 2 for post processing. This simple naive classifier is implemented in remove_nontext5.m. Below you see the segmentation result of this classifier on two test images:

input image

segmented image using naive bayes classifier

 

input image

segmented image using naive bayes classifier

By some other expriments, we concluded this new method is more accurate and faster than the one we proposed in project 2. We estimated the classification accuracy of this method using 10-fold cross-validation and it was about 85%. In order to improve classification accuracy we used another version of naive bayes classifier that estimates probabilities based on analysis of the original continuous training data [2] the effect was interesting: the classification accuracy reduced to 82%.

References:

1. Tom M. Mitchell, "Machine Learning", WCB/McGraw-Hill, 1997.

2. George H. John and Pat Langley, "Estimating Continuous Distributions in Bayesian Classifiers", Proceedings of the 11'th Conference on Uncertainty in Artificial Intelligence, pp. 338-345, 1995.