XCIN Mail-list
| Subject: | Re: Checking that tsi.src is self-consistent |
| From: | Kuang-che Wu <kcwu@camel.ck.tp.edu.tw> |
| Organization: | Taipei Chien-kuo Senior High School |
| Date: | 27 Dec 2000 17:02:33 GMT |
| To: | xcin@tlug.sinica.edu.tw |
| Delivered-To: | xcin-gate@tlug.sinica.edu.tw |
| Delivered-To: | xcin-list@tlug.sinica.edu.tw |
| Reply-To: | xcin@tlug.sinica.edu.tw |
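Kuang-che Wu <kcwu@camel.ck.tp.edu.tw> wrote:
> This uses a program to check whether tsi.src-20001130 is itself consistent,
This is the program I used to
1. find four-character words that "might" be problematic last time; this part is probably no longer needed
2. check whether tsi.src itself is consistent; a run takes only about half a minute :Q
It is badly written perl code, thrown together in a hurry.
It is released under the BSD license.
The code quality is poor but it is still usable; if you need it, feel free to make do with it :p
It would probably be better to write this in C with libtabe ^^;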
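Item 1 above corresponds to the check_bad4word routine in the script: it reports a four-character entry when the very next entry is longer and begins with the same four characters. A minimal standalone sketch of that idea follows. The word list and the name prefixed_pairs are hypothetical, and ASCII letters stand in for the two-byte characters of the real file, so "four characters" means a length of 4 here rather than 8 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical word list in file order; each ASCII letter stands in for
# one two-byte character of the real data.
my @words = qw(ABCD ABCDE ABCX WXYZ WXYZQR);

# Return [short, long] pairs where a four-character entry is immediately
# followed by a longer entry that begins with it.
sub prefixed_pairs {
    my @list = @_;
    my @pairs;
    my $prev = '';
    for my $w (@list) {
        if (length($prev) == 4 && length($w) > 4 && substr($w, 0, 4) eq $prev) {
            push @pairs, [$prev, $w];
        }
        $prev = $w;
    }
    return @pairs;
}

# Print each pair on two lines, in the same layout the script uses.
print "$_->[0]\n$_->[1]\n" for prefixed_pairs(@words);
```

Only adjacent entries are compared, so the check relies on the input being sorted so that a word and its extensions sit next to each other.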
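#!/usr/bin/perl
# Consistency checks for tsi.src. Strings are handled as raw bytes and
# each character is assumed to take 2 bytes (presumably Big5).
use strict;
my @tsi;      # $tsi[$n]{$word}: readings of every $n-character word
my @tsis;     # all words, in file order
my %orig;     # word => original input line
my %tsifreq;  # word => frequency count
my %word2yin; # single character => list of known readings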
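# Read tsi.src; each line is "word count reading reading ...".
sub load_tsi
{
    my ($t,$c,@line);
    open(F,"tsi.src") or die "cannot open tsi.src: $!";
    while(<F>) {
        ($t,$c,@line)=split ' ';
        next unless defined $t;   # skip blank lines
        push @tsis,$t;
        $orig{$t}=$_;
        $tsifreq{$t}=$c;
        push @{$tsi[length($t)/2]->{$t}},@line;
    }
    close F;
}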
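# Collect the known readings of every single-character entry.
sub build_word2yin
{
    for my $w(keys %{$tsi[1]}) {
        for my $yins(@{$tsi[1]->{$w}}) {
            # drop the empty field that split yields before a leading "["
            push @{$word2yin{$w}},grep { length $_ } split(/\[|\]|,/,$yins);
        }
    }
}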
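# Print each four-character word (8 bytes) that is immediately followed
# by a longer word starting with the same four characters.
sub check_bad4word {
    my(@line,@lastline);
    open(F,"tsi.src") or die "cannot open tsi.src: $!";
    while(<F>) {
        @line=split ' ';
        if(@lastline and @line and
           length($lastline[0])==8 and length($line[0])>8 and
           substr($line[0],0,8) eq $lastline[0]) {
            printf("%s\n%s\n",$lastline[0],$line[0]);
        }
        @lastline=@line;
    }
    close F;
}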
# For every word, check that each character has at least one reading that
# also appears among the readings of that character's own entry.
sub check_badyin
{
    for my $word(@tsis) {
        my $len=length($word)/2;            # number of 2-byte characters
        my $yinlist=$tsi[$len]->{$word};
        next if 0==@$yinlist;
        for my $i(0 .. $len-1) {
            my $ch=substr($word,$i*2,2);
            # Readings for character $i: the first one plus up to three
            # alternates, at strides of length($word) as in the original.
            my @yins;
            for my $k(0 .. 3) {
                my $y=$yinlist->[$i+$k*length($word)];
                push @yins,grep { length $_ } split(/\[|\]|,/,$y) if $y;
            }
            my $flag=0;
            for my $yin(@yins) {
                for my $y(@{$word2yin{$ch}}) {
                    if ($yin eq $y) {
                        $flag=1; last;
                    }
                }
                last if $flag;
            }
            if($flag==0) {
                # report word, character, candidate readings, then the
                # source line (printed twice, as in the original)
                print "$word:$ch:";
                print join ",",@yins;
                print "\n$orig{$word}$orig{$word}\n";
            }
        }
    }
}
# Main: load the data, build the reading table, run the consistency check.
load_tsi;
build_word2yin;
check_badyin;
#check_bad4word;
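
The consistency check in check_badyin can be tried on a few in-memory entries without a real tsi.src. The following is only a sketch under stated assumptions: the sample entries and the name inconsistent_words are hypothetical, ASCII letters stand in for characters and romanized syllables for the phonetic readings, and alternate readings of a multi-character word are assumed to be appended in groups of one syllable per character (so the stride is the character count, which differs from the byte-length stride used in the script above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical entries in the "word count reading..." layout that the
# script above parses; one ASCII letter stands in for one character.
my @entries = (
    "A 10 ka",
    "B 20 ti to",
    "AB 5 ka ti",
    "BA 3 to xx",
);

# Return a report line for every character of a multi-character word
# whose readings are not attested in that character's own entry.
sub inconsistent_words {
    my @lines = @_;
    my (%readings, @order);
    for my $line (@lines) {
        my ($word, $count, @yins) = split ' ', $line;
        next unless defined $word;
        push @order, $word;
        push @{ $readings{$word} }, @yins;
    }
    my @bad;
    for my $word (@order) {
        my $n = length $word;
        next if $n < 2;
        my @yins = @{ $readings{$word} };
        for my $i (0 .. $n - 1) {
            my $ch    = substr($word, $i, 1);
            my %known = map { $_ => 1 } @{ $readings{$ch} || [] };
            # Assumption: position $i of each alternate reading sits at
            # index $i, $i + $n, $i + 2 * $n, ...
            my @cand;
            for (my $k = $i; $k < @yins; $k += $n) {
                push @cand, $yins[$k];
            }
            push @bad, "$word:$ch:" . join(",", @cand)
                unless grep { $known{$_} } @cand;
        }
    }
    return @bad;
}

print "$_\n" for inconsistent_words(@entries);
```

With the sample data only "BA" is reported, because "xx" is not among the readings listed for "A". Using a hash lookup for the known readings instead of the nested loops is a design choice of this sketch, not something taken from the original script.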