Shiraz University - CSE Dept - Machine Learning Project # 3 by M.M.Haji
mehdi.haji@gmail.com
Text Segmentation Using Naive Bayes Classifiers
I. Introduction
In this project, we apply naive bayes classifier to separate text from
non-text blocks
in a document image. This is an important part of
Document Image Analysis and Recognition, because the text must be detected first
and then recognized. Hence, various approaches have been
proposed to solve the problem but most of them use texture classification and
image segmentation methods based on Fourier power spectrum, mean and covariance, spatial
texture energy, DCT and wavelet transform (and ...) and finally a classification algorithm
such as 2-means may be applied on these features to classify
the image blocks as text or non-text. These methods usually does not need training
data. In our new method, we generate discrete training instances from the real
dataset of
project 1 to learn a naive bayes classifier for the text
segmentation problem. In the rest of this report we describe our new algorothm,
show some experimental results, compare it with the methods described in
project 2 and
finally draw some conclusions.
II. Proposed Segmentation Method
In project 1 we generated 100,000 instances for
the text concept, of course we don't need such a large training set here, so we
selected 10,000 of them randomly and convert each attribute value to discrete
(nominal) form to be appropriate for naive bayes learner. Each real attribute
value will be converted to 'S2', 'S1', 'CE', 'B1' or 'B2', where 'S2' means very
small, 'S1' means small, 'CE' means center, 'B1' means big and 'B2' means very
big. In order to have approximately 2000 instances in each of the 5 bins for
each attribute, we used different sets of rules for each of our 18 attributes as
follows: (implemented in
discretize_textdata.m)
|
A1 |
continuous |
discrete |
|
(-inf,-15.8] |
S2 |
|
(-15.8,-0.7] |
S1 |
|
(-0.7,0.8] |
CE |
|
(0.8,16.1] |
B1 |
|
(16.1,inf) |
B2 |
|
|
A2 |
continuous |
discrete |
|
(-inf,-13.1] |
S2 |
|
(-13.1,-0.4] |
S1 |
|
(-0.4,0.3] |
CE |
|
(0.3,11.3] |
B1 |
|
(11.3,inf) |
B2 |
|
|
A3 |
continuous
|
discrete |
|
(-inf,-9.5] |
S2 |
|
(-9.5,-0.3] |
S1 |
|
(-0.3,0.4] |
CE |
|
(0.4,11.4] |
B1 |
|
(11.4,inf) |
B2 |
|
|
A4 |
continuous |
discrete |
|
(-inf,-11.5] |
S2 |
|
(-11.5,-0.5] |
S1 |
|
(-0.5,0.4] |
CE |
|
(0.4,11.3] |
B1 |
|
(11.3,inf) |
B2 |
|
|
A5 |
continuous |
discrete |
|
(-inf,-10] |
S2 |
|
(-10,-0.3] |
S1 |
|
(-0.3,0.2] |
CE |
|
(0.2,9.4] |
B1 |
|
(9.4,inf) |
B2 |
|
|
A6 |
continuous
|
discrete |
|
(-inf,-6.3] |
S2 |
|
(-6.3,-0.3] |
S1 |
|
(-0.3,0.2] |
CE |
|
(0.2,6.6] |
B1 |
|
(6.6,inf) |
B2 |
|
|
A7 |
continuous |
discrete |
|
(-inf,-10.6] |
S2 |
|
(-10.6,-0.4] |
S1 |
|
(-0.4,0.3] |
CE |
|
(0.3,8.5] |
B1 |
|
(8.5,inf) |
B2 |
|
|
A8 |
continuous |
discrete |
|
(-inf,-7.3] |
S2 |
|
(-7.3,-0.2] |
S1 |
|
(-0.2,0.2] |
CE |
|
(0.2,6.2] |
B1 |
|
(6.2,inf) |
B2 |
|
|
A9 |
continuous
|
discrete |
|
(-inf,-5.2] |
S2 |
|
(-5.2,-0.2] |
S1 |
|
(-0.2,0.2] |
CE |
|
(0.2,4.8] |
B1 |
|
(4.8,inf) |
B2 |
|
|
A10 |
continuous |
discrete |
|
(-inf,-4.6] |
S2 |
|
(-4.6,-0.2] |
S1 |
|
(-0.2,0.2] |
CE |
|
(0.2,4.3] |
B1 |
|
(4.3,inf) |
B2 |
|
|
A11 |
continuous |
discrete |
|
(-inf,-3.3] |
S2 |
|
(-3.3,-0.1] |
S1 |
|
(-0.1,0.2] |
CE |
|
(0.2,3.7] |
B1 |
|
(3.7,inf) |
B2 |
|
|
A12 |
continuous
|
discrete |
|
(-inf,-3.4] |
S2 |
|
(-3.4,-0.2] |
S1 |
|
(-0.2,0.2] |
CE |
|
(0.2,2.9] |
B1 |
|
(2.9,inf) |
B2 |
|
|
A13 |
continuous |
discrete |
|
(-inf,-3.4] |
S2 |
|
(-3.4,-0.1] |
S1 |
|
(-0.1,0.1] |
CE |
|
(0.1,3.3] |
B1 |
|
(3.3,inf) |
B2 |
|
|
A14 |
continuous |
discrete |
|
(-inf,-2] |
S2 |
|
(-2,-0.1] |
S1 |
|
(-0.1,0.1] |
CE |
|
(0.1,2] |
B1 |
|
(2,inf) |
B2 |
|
|
A15 |
continuous
|
discrete |
|
(-inf,-2] |
S2 |
|
(-2,-0.1] |
S1 |
|
(-0.1,0.1] |
CE |
|
(0.1,2] |
B1 |
|
(2,inf) |
B2 |
|
|
A16 |
continuous |
discrete |
|
(-inf,-2] |
S2 |
|
(-2,-0.2] |
S1 |
|
(-0.2,0.2] |
CE |
|
(0.2,3] |
B1 |
|
(3,inf) |
B2 |
|
|
A17 |
continuous |
discrete |
|
(-inf,-2.3] |
S2 |
|
(-2.3,-0.1] |
S1 |
|
(-0.1,0.1] |
CE |
|
(0.1,2.4] |
B1 |
|
(2.4,inf) |
B2 |
|
|
A18 |
continuous
|
discrete |
|
(-inf,-2.2] |
S2 |
|
(-2.2,-0.1] |
S1 |
|
(-0.1,0.2] |
CE |
|
(0.2,2.3] |
B1 |
|
(2.3,inf) |
B2 |
|
For the IsText? concept, let V1 = 'Yes' and V2 = 'No', now we need to
evaluate P(Ai | Vj) for i = 1,2, ..., 18 and j = 1,2. In order to avoid zero
conditional probabilities we used m-estimate with m = 1 and p = 0.2, this is done in
priprob_text.m which results:
|
P(A1 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3199
0.1496
0.0687
0.1369
0.3250 |
0.0938
0.2496
0.3212
0.2462
0.0893 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A2 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3116
0.1656
0.0490
0.1428
0.3309 |
0.0821
0.2458
0.3475
0.2274
0.0972 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A3 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3228
0.1513
0.0566
0.1654
0.3038 |
0.1018
0.2661
0.3112
0.2329
0.0881 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A4 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3454
0.1384
0.0442
0.1291
0.3429 |
0.0635
0.2568
0.3333
0.2869
0.0595 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A5 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3459
0.1344
0.0416
0.1304
0.3478 |
0.0684
0.2390
0.3576
0.2674
0.0677 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A6 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3323
0.1386
0.0448
0.1403
0.3440 |
0.0771
0.2522
0.3191
0.2822
0.0694 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A7 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3342
0.1359
0.0473
0.1166
0.3659 |
0.0584
0.2464
0.3578
0.2666
0.0709 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A8 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3387
0.1367
0.0395
0.1209
0.3643 |
0.0572
0.2585
0.3655
0.2515
0.0673 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A9 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3412
0.1318
0.0452
0.1314
0.3503 |
0.0578
0.2795
0.3174
0.2733
0.0720 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A10 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3537
0.1359
0.0450
0.1221
0.3433 |
0.0610
0.2941
0.3212
0.2608
0.0629 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A11 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3697
0.1287
0.0404
0.1192
0.3421 |
0.0553
0.3015
0.3358
0.2505
0.0569 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A12 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3473
0.1285
0.0570
0.1206
0.3465 |
0.0623
0.2623
0.3667
0.2445
0.0642 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A13 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3543
0.1295
0.0334
0.1380
0.3448 |
0.0618
0.3151
0.2672
0.2952
0.0606 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A14 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3609
0.1177
0.0385
0.1143
0.3687 |
0.0457
0.2782
0.3523
0.2738
0.0500 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A15 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3571
0.1105
0.0408
0.1280
0.3636 |
0.0584
0.2994
0.2738
0.3047
0.0637 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A16 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3719
0.0968
0.0570
0.1344
0.3400 |
0.0853
0.2168
0.3597
0.2714
0.0669 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A17 | Vj ) |
V |
|
Yes
|
No
|
|
A |
S2 |
0.3495
0.1280
0.0427
0.1346
0.3452 |
0.0661
0.2894
0.3057
0.2818
0.0570 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
|
P(A18 | Vj )
|
V |
|
Yes
|
No
|
|
A |
S2 |
0.3355
0.1361
0.0488
0.1354
0.3442 |
0.0669
0.3142
0.3324
0.2265
0.0601 |
|
S1 |
|
CE |
|
B1 |
|
B2 |
|
It is clear that we have no information about the source image and hence P( V1
) = P( V2 ) = 0.5, so we can classify a new instance this
way: if P(A1 | V1 ).P(A2
| V1 ). ... P(A18 | V1
) > P(A1 | V2 ).P(A2
| V2 ). ... P(A18 | V2
) then input is a text block and otherwise is a non-text block. We used the same
set of rule-based smoothing filters of project 2 for
post processing. This simple
naive classifier is implemented in remove_nontext5.m.
Below you see the segmentation result of this classifier on two test images:
|

input image |

segmented image using naive bayes
classifier |
|

input image |

segmented image using naive bayes
classifier |
By some other expriments, we concluded this new method
is more accurate and faster than the one we proposed in
project 2. We estimated the classification accuracy of this method using
10-fold cross-validation and it was about 85%. In
order to improve classification accuracy we used another version of naive bayes
classifier that estimates probabilities based on analysis of the original
continuous training data [2] the effect was interesting: the classification
accuracy reduced to 82%.
References:
1. Tom M. Mitchell, "Machine Learning", WCB/McGraw-Hill, 1997.
2. George H. John and Pat Langley, "Estimating Continuous
Distributions in Bayesian Classifiers", Proceedings of the 11'th Conference on
Uncertainty in Artificial Intelligence, pp. 338-345, 1995.