Code Golf Asked on November 30, 2021
The task is to compete for the shortest regex (in bytes) in your preferred programming language that can distinguish between English and Spanish with at least 90% accuracy (raised from the original 60%).
Silvio Mayolo's submission (pinned as Best Answer) has secured his spot as the winner of the original contest, beyond any chance of being contested. To provide room for further submissions, he has generously allowed the scoring requirement to be raised to 90% accuracy.
The following word lists (based on these) must be used: English, Spanish
The Spanish word list is already transliterated into ASCII, and no word appears in both lists.
A naive approach to distinguishing Spanish from English might be to match if the word ends in a vowel:
[aeiou]$ (flags: i), 9 bytes
Here's a live example, where 6 of 8 words are successfully identified, for 75% accuracy:
const regex = /[aeiou]$/i;
const words = [
  'hello',
  'hola',
  'world',
  'mundo',
  'foo',
  'tonto',
  'bar',
  'barra'
];
words.forEach(word => {
  // A match means the word is classified as Spanish.
  const match = word.match(regex);
  const langs = ['English', 'Spanish'];
  const lang = langs[+!!match];
  console.log(word, lang);
});
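The scoring rule implied by the example can be sketched as a small helper: a Spanish word counts as correct when the regex matches it, and an English word counts as correct when it does not. The function name `accuracy` and the two sample arrays are illustrative assumptions, not part of the challenge; the real word lists would be loaded in place of the samples.

```javascript
// Hypothetical helper (not part of the challenge): fraction of words that a
// regex classifies correctly, where "match" means "Spanish".
// Pass a regex without the g flag, since test() is stateful on global regexes.
function accuracy(regex, englishWords, spanishWords) {
  const englishCorrect = englishWords.filter(w => !regex.test(w)).length;
  const spanishCorrect = spanishWords.filter(w => regex.test(w)).length;
  const total = englishWords.length + spanishWords.length;
  return (englishCorrect + spanishCorrect) / total;
}

// The eight sample words from the live example, split by language.
const sampleEnglish = ['hello', 'world', 'foo', 'bar'];
const sampleSpanish = ['hola', 'mundo', 'tonto', 'barra'];

console.log(accuracy(/[aeiou]$/i, sampleEnglish, sampleSpanish)); // 0.75
```

On the sample, "hello" and "foo" end in a vowel and are misclassified as Spanish, which is where the 6 of 8 comes from.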
(a(d?|is|r|se?)|dor|eis|ese|je|n|[ns]te|os?|res?)$
For 18,004 of the 20,000 words in es_clean.json and en_clean.json, this regex matches if and only if the input word is Spanish.
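As a quick check of the arithmetic, 18,004 correct out of 20,000 is 90.02%, just over the 90% threshold. The sketch below applies the regex to a few of the sample words from the question; the `classify` helper is a hypothetical name introduced for illustration, and the sample words are not a substitute for the full word lists.

```javascript
// The regex from this answer; a match is read as "Spanish".
const spanishSuffixes = /(a(d?|is|r|se?)|dor|eis|ese|je|n|[ns]te|os?|res?)$/;

// Hypothetical helper for illustration only.
const classify = word => (spanishSuffixes.test(word) ? 'Spanish' : 'English');

console.log(classify('mundo')); // Spanish (ends in "o")
console.log(classify('world')); // English (no listed suffix)
console.log(classify('bar'));   // Spanish (ends in "ar"), a misclassification

// Reported accuracy on the full lists, as a percentage with two decimals.
const reportedPercent = (18004 / 20000 * 100).toFixed(2);
console.log(reportedPercent); // 90.02
```

The "bar" case shows the kind of error that makes up the remaining 1,996 words: short English words that happen to end in a common Spanish suffix.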
Answered by Lynn on November 30, 2021
a
No, I'm not joking. The single-character regular expression a successfully identifies Spanish words over English given your input files 60.6064% of the time, which makes it a valid submission.
Here's a complete, runnable Perl script that checks the accuracy of this regular expression, assuming you've downloaded english.json and spanish.json into the same folder as the script.
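#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my @english;
my @spanish;
my $fh;

# Collect the first double-quoted word on each line of the English list.
open $fh, '<', 'english.json' or die "Cannot open english.json: $!";
while (<$fh>) {
    push @english, $1 if /"(\w+)"/;
}
close $fh;

# Same for the Spanish list.
open $fh, '<', 'spanish.json' or die "Cannot open spanish.json: $!";
while (<$fh>) {
    push @spanish, $1 if /"(\w+)"/;
}
close $fh;

my $correct = 0;
my $total = 0;
my $re = qr/a/;

# An English word is correct when the regex does not match it.
for (@english) {
    $total++;
    $correct++ unless /$re/;
}

# A Spanish word is correct when the regex matches it.
for (@spanish) {
    $total++;
    $correct++ if /$re/;
}

say "$correct / $total (@{[100*$correct/$total]}%)";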
Answered by Silvio Mayolo on November 30, 2021