BLEU_SCORE gives bad scores - what am I doing wrong?

Question

I want to calculate the BLEU_SCORE, but it gives me bad results and I don't know why.
For example, these are my reference and predicted sentences:
ref : 
['electron', 'and', 'a', 'proton']
predicted : 
['electron', 'and', 'a', 'proton']
ref : 
['to', 'reach', 'the', 'nectar', 'at', 'the', 'bottom', 'of', 'flowers']
predicted : 
['to', 'reach', 'the', 'nectar', 'at', 'the', 'bottom', 'of', 'flowers']
ref : 
['during', 'the', 'summer', 'near', 'the', 'north', 'pole']
predicted : 
['during', 'the', 'summer', 'near', 'the', 'north', 'pole']
ref : 
['only', 'blue', 'light', 'is', 'reflected', 'by', 'the', 'block']
predicted : 
['only', 'blue', 'light', 'is', 'reflected', 'by', 'the', 'block']
ref : 
['between', '20', 'and', '40', 'degrees', 'latitude']
predicted : 
['between', '20', 'and', '40', 'degrees', 'latitude']
ref : 
['external', 'and', 'internal', 'combustion', 'engines']
predicted : 
['external', 'and', 'internal', 'combustion', 'engines']
ref : 
['cleaning', 'disinfecting', 'and', 'in', 'swimming', 'pools']
predicted : 
['cleaning', 'disinfecting', 'and', 'in', 'swimming', 'pools']
ref : 
['body', 'mass', 'index', 'bmi']
predicted : 
['body', 'mass', 'index', 'bmi']
ref : 
['they', 'put', 'nutrients', 'into', 'the', 'soil', 'that', 'plants', 'use', 'to', 'grow']
predicted : 
['they', 'put', 'nutrients', 'into', 'the', 'soil', 'that', 'plants', 'use', 'to', 'grow']
ref : 
['structure', 'of', 'earth', 'interior']
predicted : 
['structure', 'of', 'earth', 'interior']

And here is the code I used to calculate the BLEU_SCORE:
from nltk.translate.bleu_score import corpus_bleu

print("Individual n-gram")
print("Individual 1-gram")
print('BLEU-1: %f' % corpus_bleu(ref, pre, weights=(1.0, 0, 0, 0)))
print("Individual 2-gram")
print('BLEU-2: %f' % corpus_bleu(ref, pre, weights=(0, 1.0, 0, 0)))
print("Individual 3-gram")
print('BLEU-3: %f' % corpus_bleu(ref, pre, weights=(0, 0, 1.0, 0)))
print("Individual 4-gram")
print('BLEU-4: %f' % corpus_bleu(ref, pre, weights=(0, 0, 0, 1.0)))

OUTPUT:
Individual n-gram
Individual 1-gram
BLEU-1: 0.015625
Individual 2-gram
BLEU-2: 0.000000
Individual 3-gram
BLEU-3: 0.000000
Individual 4-gram
BLEU-4: 0.000000

Can anyone help with this? I don't know why it doesn't give me good results.

BlackCurrant · Accepted Answer

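You must be getting a warning with this output. The warning pretty much tells you why the scores are 0: there are no overlapping 2-grams, 3-grams or 4-grams in your example as corpus_bleu reads it. corpus_bleu expects a list of reference sentences for each prediction. If ref is a flat list of token lists, every token string is treated as a separate reference and is compared character by character, so almost nothing matches even though the sentences are identical.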

Here is a detailed explanation; I couldn't explain it better:
https://github.com/nltk/nltk/issues/1838
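
To make the nesting problem concrete, here is a minimal pure-Python sketch of clipped unigram precision. It is not NLTK's implementation: unigram_precision is a hypothetical helper written for illustration, and it assumes the ref in the question was a flat list of token lists (the question does not show how ref and pre were built).

```python
from collections import Counter

def unigram_precision(references, hypothesis):
    """Clipped unigram precision of one hypothesis against its references.

    Hypothetical helper for illustration only, not part of NLTK.
    """
    hyp_counts = Counter(hypothesis)
    # For each unigram, keep the highest count seen in any single reference.
    max_ref_counts = Counter()
    for reference in references:
        for unigram, count in Counter(reference).items():
            max_ref_counts[unigram] = max(max_ref_counts[unigram], count)
    clipped = sum(min(count, max_ref_counts[unigram])
                  for unigram, count in hyp_counts.items())
    return clipped / len(hypothesis)

tokens = ['electron', 'and', 'a', 'proton']

# Missing nesting level: each token string becomes its own "reference",
# and iterating over a string yields single characters.
wrong = unigram_precision(tokens, tokens)

# Correct nesting: a list of references, each reference a list of tokens.
right = unigram_precision([tokens], tokens)

print(wrong)  # 0.25, only the token 'a' equals a single character
print(right)  # 1.0
```

Under the same assumption, 'a' is the only single-character token in the ten example sentences, which together contain 64 tokens. One match out of 64 is 0.015625, which appears consistent with the BLEU-1 value reported in the question.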

EDIT:
Solution:

Although the warning gives the reason, here is how you can fix this.

Notice the nesting of ref and pre: ref has one more level of lists than pre.

from nltk.translate.bleu_score import corpus_bleu
ref =[[['electron', 'and', 'a', 'proton']]]
pre =[['electron', 'and', 'a', 'proton']]

print("Individual n-gram")
print("Individual 1-gram")
print('BLEU-1: %f' % corpus_bleu(ref, pre, weights=(1.0, 0, 0, 0)))
print("Individual 2-gram")
print('BLEU-2: %f' % corpus_bleu(ref, pre, weights=(0, 1.0, 0, 0)))
print("Individual 3-gram")
print('BLEU-3: %f' % corpus_bleu(ref, pre, weights=(0, 0, 1.0, 0)))
print("Individual 4-gram")
print('BLEU-4: %f' % corpus_bleu(ref, pre, weights=(0, 0, 0, 1.0)))
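
The snippet above covers a single sentence. For a whole corpus, a sketch along the following lines wraps every reference in its own list before calling corpus_bleu. It assumes ref and pre start out as flat lists of token lists, as the question's data suggests, and the name list_of_references is introduced here only for illustration. Only three of the question's sentences are used to keep it short.

```python
from nltk.translate.bleu_score import corpus_bleu

# Flat token lists, one per sentence (the shape assumed for the question's data).
ref = [
    ['electron', 'and', 'a', 'proton'],
    ['during', 'the', 'summer', 'near', 'the', 'north', 'pole'],
    ['body', 'mass', 'index', 'bmi'],
]
# Predictions identical to the references, as in the question.
pre = [list(sentence) for sentence in ref]

# corpus_bleu expects one list of references per prediction,
# so each single reference is wrapped in its own list.
list_of_references = [[sentence] for sentence in ref]

bleu_1 = corpus_bleu(list_of_references, pre, weights=(1.0, 0, 0, 0))
bleu_4 = corpus_bleu(list_of_references, pre, weights=(0, 0, 0, 1.0))
print('BLEU-1: %f' % bleu_1)
print('BLEU-4: %f' % bleu_4)
```

With identical predictions and references, both scores should come out as 1.0. Every sentence here has at least four tokens, so the 4-gram score is defined for all of them.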

You can also refer to Python's built-in help for corpus_bleu.
