XCIN Mail-list
| Subject: | Re: Some ideas about improving libtabe |
| From: | Chih-Hao Tsai <hao520@yahoo.com> |
| Organization: | Taiwan Linux User Group News Server |
| Date: | Thu, 19 Apr 2001 02:47:44 -0500 |
| To: | xcin@tlug.sinica.edu.tw |
| Delivered-To: | xcin-gate@tlug.sinica.edu.tw |
| Delivered-To: | xcin-list@tlug.sinica.edu.tw |
| Reply-To: | xcin@tlug.sinica.edu.tw |
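Pai-Hsiang Hsiao wrote:

> What I have now is a medium-sized Chinese TreeBank. It is basically
> Simplified Chinese newswire text and similar material, about 100,000
> words. The corpus was segmented and given part-of-speech tags by hand,
> which makes Simplified-to-Traditional conversion and Zhuyin annotation
> simpler and more reliable. The Simplified-to-Traditional conversion is
> basically done: some of the one-to-many cases were picked out and
> checked against a lexicon, then proofread by hand. The Zhuyin
> annotation works much the same way: look up the Ministry of Education
> dictionary first, then check against tsi.src, and finally add the
> rest by hand. (From the results so far, tsi.src is close to becoming
> a superset of the Ministry of Education clc dict in this respect.
> Thanks to everyone who contributes, great work!)

If you do not need part-of-speech tags, you might consider the PH
corpus. It is a GB-encoded, word-segmented news corpus of more than two
million words.

ftp://ftp.cogsci.ed.ac.uk/pub/chinese/

--
Chih-Hao Tsai | ICQ#5734422 | http://www.geocities.com/hao520

To Unsubscribe: send mail to majordomo@linux.org.tw with
"unsubscribe xcin" in the body of the message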