Great Circle Associates

XCIN Mail-list
(December 2000)



Subject: Re: Checking whether tsi.src itself is consistent
From: Kuang-che Wu <kcwu@camel.ck.tp.edu.tw>
Organization: Taipei Chien-kuo Senior High School
Date: 27 Dec 2000 17:02:33 GMT
To: xcin@tlug.sinica.edu.tw
Reply-To: xcin@tlug.sinica.edu.tw

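Kuang-che Wu <kcwu@camel.ck.tp.edu.tw> wrote:
> This is a program that checks whether tsi.src-20001130 itself is consistent,
This is the program I used to
1. find the "possibly" problematic four-character phrases last time (that part should no longer be needed), and
2. check whether tsi.src itself is consistent; running it for about half a minute is enough :Q
It is badly written Perl, thrown together as a stopgap,
released under the BSD license.
The code quality is poor but it works; if you need it, make do with it :p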
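
Item 1 comes down to comparing neighbouring lines, which is what check_bad4word in the script does: a four-character phrase that is immediately followed by a longer phrase beginning with it gets reported. Below is a minimal Perl sketch of the same idea on an in-memory list. It assumes the phrases are already decoded Perl strings, so length counts characters rather than the 8 Big5 bytes the script compares, and that the list is sorted so such pairs end up adjacent. The function name find_4char_prefixes and the sample phrases are illustrative only.

```perl
#!/usr/bin/perl
# Sketch (not the original script): flag a 4-character phrase that is
# immediately followed by a longer phrase starting with the same 4
# characters. Assumes decoded strings, so length() counts characters.
use strict;
use warnings;
use utf8;

# Returns a list of [four_char_phrase, longer_phrase] pairs.
sub find_4char_prefixes {
    my @phrases = @_;
    my @pairs;
    for my $i (1 .. $#phrases) {
        my ($prev, $cur) = @phrases[$i - 1, $i];
        if (length($prev) == 4 && length($cur) > 4
            && substr($cur, 0, 4) eq $prev) {
            push @pairs, [$prev, $cur];
        }
    }
    return @pairs;
}

# Illustrative sample, not taken from tsi.src.
binmode STDOUT, ':encoding(UTF-8)';
for my $pair (find_4char_prefixes('中華', '中華民國', '中華民國政府', '人民')) {
    print "$pair->[0]\n$pair->[1]\n";
}
```

With this sample the sketch prints 中華民國 followed by 中華民國政府, the same two-line layout check_bad4word uses.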

It would probably be better to write this in C with libtabe ^^;

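#!/usr/bin/perl
# Consistency checks for tsi.src.
# The file is read as raw bytes and every character is assumed to be
# 2 bytes (Big5), so length($phrase)/2 is the number of characters.
use strict;
use warnings;

my @tsi;       # $tsi[$n]{$phrase} = [yin fields] for n-character phrases
my @tsis;      # every phrase, in file order
my %orig;      # phrase => its (last) original source line
my %tsifreq;   # phrase => frequency (second column)
my %word2yin;  # single character => list of its readings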

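# Read tsi.src. Each line is: phrase frequency yin1 yin2 ...
# with one yin field per character. If a phrase appears on several
# lines its yin fields are appended, so occurrence k (0-based) keeps
# the yin of character i at index i + k * (number of characters).
sub load_tsi
{
  open my $fh, '<', 'tsi.src' or die "cannot open tsi.src: $!";
  while (my $line = <$fh>) {
    my ($t, $c, @yins) = split ' ', $line;
    next unless defined $t;            # skip blank lines
    push @tsis, $t;
    $orig{$t} = $line;
    $tsifreq{$t} = $c;
    push @{$tsi[length($t)/2]->{$t}}, @yins;
  }
  close $fh;
}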

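# Map each single character to every reading listed for it.
# Splitting on '[', ']' and ',' handles fields such as "[yin1,yin2]";
# the empty pieces that splitting leaves behind are dropped.
sub build_word2yin
{
  for my $w (keys %{$tsi[1]}) {
    for my $yins (@{$tsi[1]->{$w}}) {
      push @{$word2yin{$w}}, grep { length } split /[\[\],]/, $yins;
    }
  }
}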

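# Item 1: report a four-character phrase (8 bytes) that is
# immediately followed by a longer phrase starting with it.
sub check_bad4word {
  open my $fh, '<', 'tsi.src' or die "cannot open tsi.src: $!";
  my $last = '';
  while (my $line = <$fh>) {
    my ($t) = split ' ', $line;
    next unless defined $t;            # skip blank lines
    if (length($last) == 8 and length($t) > 8 and
        substr($t, 0, 8) eq $last) {
      printf("%s\n%s\n", $last, $t);
    }
    $last = $t;
  }
  close $fh;
}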


# Item 2: for every character of every phrase, at least one of the
# readings given for it in the phrase must also be a reading of that
# character on its own. When none match, print the phrase, the
# character, the readings found and the original source line.
sub check_badyin
{
  my %seen;
  for my $tsi (@tsis) {
    next if $seen{$tsi}++;             # check each phrase only once
    my $len = length($tsi)/2;          # characters (2 bytes each)
    my $fields = $tsi[$len]->{$tsi};
    next if 0 == @$fields;
    for my $i (0 .. $len-1) {
      my $ch = substr($tsi, $i*2, 2);
      # Collect the yin of character $i from every occurrence of the
      # phrase: occurrence k stores it at index $i + k*$len.
      my @yins;
      for (my $k = $i; $k < @$fields; $k += $len) {
        push @yins, grep { length } split /[\[\],]/, $fields->[$k];
      }
      my %known = map { $_ => 1 } @{$word2yin{$ch} || []};
      unless (grep { $known{$_} } @yins) {
        print "$tsi:$ch:";
        print join ",", @yins;
        print "\n$orig{$tsi}$orig{$tsi}\n";
      }
    }
  }
}
load_tsi;
build_word2yin;
check_badyin;

#check_bad4word;
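
The heart of the script is check_badyin: every character of a multi-character phrase must carry at least one reading that the same character also has as a single-character entry. Because the script reads Big5 bytes from a fixed file name, it is awkward to try on a small example, so here is a self-contained Perl sketch of the same check over in-memory lines. It assumes the line layout the script relies on (phrase, frequency, then one yin field per character, alternatives optionally written as "[yin1,yin2]") and decoded UTF-8 strings instead of Big5 bytes. The names parse_tsi, split_yin and check_consistency and the sample entries are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
# Sketch (not the original script) of the tsi.src consistency check.
# Assumed line layout: "phrase freq yin1 yin2 ...", one yin field per
# character; a field may hold alternatives such as "[yin1,yin2]".
use strict;
use warnings;
use utf8;

# phrase => [yin fields]; repeated phrases have their fields appended.
sub parse_tsi {
    my %tsi;
    for my $line (@_) {
        my ($phrase, undef, @yins) = split ' ', $line;
        next unless defined $phrase;
        push @{ $tsi{$phrase} }, @yins;
    }
    return \%tsi;
}

# Split a yin field on '[', ']' and ',' and drop empty pieces.
sub split_yin { return grep { length } split /[\[\],]/, $_[0] }

# Returns [phrase, character, readings...] for every character whose
# readings in the phrase never match its single-character readings.
sub check_consistency {
    my ($tsi) = @_;
    my %allowed;    # character => { reading => 1 }
    for my $ch (grep { length($_) == 1 } keys %$tsi) {
        $allowed{$ch}{$_} = 1 for map { split_yin($_) } @{ $tsi->{$ch} };
    }
    my @problems;
    for my $phrase (sort keys %$tsi) {
        my $len    = length $phrase;
        my $fields = $tsi->{$phrase};
        next if $len < 2 || !@$fields;
        for my $i (0 .. $len - 1) {
            my $ch = substr $phrase, $i, 1;
            # Occurrence k of the phrase stores this yin at $i + k * $len.
            my @yins;
            for (my $k = $i; $k < @$fields; $k += $len) {
                push @yins, split_yin($fields->[$k]);
            }
            push @problems, [$phrase, $ch, @yins]
                unless grep { $allowed{$ch}{$_} } @yins;
        }
    }
    return @problems;
}

# Invented sample: the last line gives 中 a reading (ㄓㄨㄥˇ) that 中 lacks.
binmode STDOUT, ':encoding(UTF-8)';
my $sample = parse_tsi(
    '中 100 ㄓㄨㄥ', '中 50 ㄓㄨㄥˋ', '文 80 ㄨㄣˊ',
    '中文 30 ㄓㄨㄥ ㄨㄣˊ', '文中 5 ㄨㄣˊ ㄓㄨㄥˇ',
);
for my $p (check_consistency($sample)) {
    my ($phrase, $ch, @yins) = @$p;
    print "$phrase:$ch:", join(',', @yins), "\n";
}
```

Run as is, the sketch prints 文中:中:ㄓㄨㄥˇ in the same phrase:character:readings layout the script uses. Stepping through the fields with a stride equal to the phrase length, rather than probing a fixed number of offsets, picks up the readings from every line on which a phrase appears.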
