# Building the Seed Vocabulary
* [Gluonnlp GBW Dataset](https://gluon-nlp.mxnet.io/api/modules/data.html#gluonnlp.data.GBWStream)
* [Original GBW Dataset](http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz)
* TOEFL word list

Then extract every single word from these sources to form the seed vocabulary. Note: phrases do not need to be considered at this stage.
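The extraction step could be sketched roughly as follows. This is only an illustration: the tokenization rule, the function name `build_seed_words`, and the `min_count` parameter are assumptions, not something the task prescribes.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a "word" is a run of ASCII letters, optionally joined by
# internal apostrophes or hyphens. The task text does not define tokenization.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def build_seed_words(
    sentences: Iterable[str],
    extra_words: Iterable[str] = (),
    min_count: int = 1,
) -> set[str]:
    """Collect lowercase single words from corpus sentences and a word list."""
    counts: Counter[str] = Counter()
    for sentence in sentences:
        counts.update(token.lower() for token in WORD_RE.findall(sentence))

    # Keep corpus words that occur at least `min_count` times.
    seeds = {word for word, n in counts.items() if n >= min_count}

    # Word-list entries (for example from a TOEFL list) are added directly;
    # multi-word entries are skipped because phrases are handled later.
    for entry in extra_words:
        entry = entry.strip().lower()
        if entry and " " not in entry:
            seeds.add(entry)
    return seeds
```

Raising `min_count` is one possible way to drop rare tokens and typos from a large corpus such as GBW.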
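# Expanding the Seed Vocabulary and Building the Seed Phrase List
Use [WordNet](http://wordnetweb.princeton.edu/perl/webwn) to expand the seed vocabulary, and build the phrase list at the same time.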
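One possible shape for the expansion step is sketched below. It assumes the WordNet lookup is passed in as a function (`related_lemmas`, a hypothetical name) that returns lemma names in which multi-word entries are joined by underscores, as NLTK's WordNet interface does. The toy lookup table is made up for the example and is not real WordNet output.

```python
from typing import Callable, Iterable


def expand_seed_words(
    seed_words: Iterable[str],
    related_lemmas: Callable[[str], Iterable[str]],
) -> tuple[set[str], set[str]]:
    """Return (expanded single words, phrases) found via the lookup."""
    seed_words = set(seed_words)
    words = set(seed_words)
    phrases: set[str] = set()
    for seed in seed_words:
        for lemma in related_lemmas(seed):
            text = lemma.replace("_", " ").lower()
            # Lemmas containing a space are multi-word phrases.
            if " " in text:
                phrases.add(text)
            else:
                words.add(text)
    return words, phrases


# Toy stand-in for a WordNet lookup (illustrative data only). With NLTK
# installed, a lookup could instead be written as:
#   [name for s in wordnet.synsets(word) for name in s.lemma_names()]
TOY_LEMMAS = {"come": ["come", "arrive", "come_up", "come_across"]}


def toy_related_lemmas(word: str) -> list[str]:
    return TOY_LEMMAS.get(word, [])
```

Injecting the lookup keeps the splitting logic testable without downloading WordNet data.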
# Building the Structured Dictionary
The target website to crawl is https://www.merriam-webster.com/
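A minimal fetch helper might look like the sketch below. The `/dictionary/<term>` URL pattern, the User-Agent string, and both function names are assumptions made for illustration. Check the site's terms of use and robots.txt, and add rate limiting, before crawling at scale.

```python
import urllib.parse
import urllib.request

# Assumption: entries live under /dictionary/<term>; verify before relying on it.
BASE_URL = "https://www.merriam-webster.com/dictionary/"


def entry_url(term: str) -> str:
    """Build the lookup URL for a word or phrase (spaces are percent-encoded)."""
    return BASE_URL + urllib.parse.quote(term.strip().lower())


def fetch_entry_html(term: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of one entry page."""
    request = urllib.request.Request(
        entry_url(term),
        # Hypothetical identifier; use one that describes your own crawler.
        headers={"User-Agent": "seed-dict-crawler/0.1 (course project)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Parsing the returned HTML into the storage formats described in this document is left out here, because it depends on the page markup at crawl time.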
## Word Storage Format
"word": "$word",
"type": "$type1", // the word's type, e.g. transitive verb, noun
"detailed": [
[["$definition1-1", "$definition1-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
[["$definition2-1", "$definition2-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
"type": "$type2",
"word": "come",
"type": "intransitive verb",
"detailed": [
[["to move toward something", "approach"], ["come here."], ["Entry 1", "1", "a"]],
[["to move or journey to a vicinity with a specified purpose"], ["come see us.","come and see what's going on."], ["Entry 1", "1", "b"]],
"type": "transitive verb",
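As a rough illustration of how one `type` block could be assembled in code, the sketch below models a single sense as a small dataclass and serializes it to the `[definitions, examples, path]` triple used in `detailed`. The names `Sense` and `make_type_block` are hypothetical, and how several type blocks nest under one `word` is not shown in the format above, so the sketch stops at a single block.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One row of "detailed": definitions, examples, and the entry path."""
    definitions: list[str]
    examples: list[str] = field(default_factory=list)
    path: list[str] = field(default_factory=list)

    def to_row(self) -> list[list[str]]:
        return [self.definitions, self.examples, self.path]


def make_type_block(word_type: str, senses: list[Sense]) -> dict:
    """Build the {"type": ..., "detailed": [...]} part for one word type."""
    return {"type": word_type, "detailed": [s.to_row() for s in senses]}


block = make_type_block(
    "intransitive verb",
    [
        Sense(["to move toward something", "approach"], ["come here."], ["Entry 1", "1", "a"]),
        Sense(
            ["to move or journey to a vicinity with a specified purpose"],
            ["come see us.", "come and see what's going on."],
            ["Entry 1", "1", "b"],
        ),
    ],
)
print(json.dumps(block, indent=2))
```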
## Phrase Storage Format
"phrase": "$phrase",
"center word": "$center_word", // e.g. the center word of come up is come
"type": "$type1", // the phrase's type, e.g. transitive verb, noun
"detailed": [
[["$definition1-1", "$definition1-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
[["$definition2-1", "$definition2-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
"type": "$type2",
"external_explation": // basic definitions obtained by a second, direct lookup of the phrase; stored as a list
"type": "$type1", // the phrase's type, e.g. transitive verb, noun
"detailed": [
[["$definition1-1", "$definition1-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
[["$definition2-1", "$definition2-2"], ["$example1","$example2"], ["$path_to_ex_1","$path_to_ex_2"]],
"type": "$type2",
"phrase": "come a cropper",
"center word": "come",
"type": "transitive verb", // the phrase's type, e.g. transitive verb, noun
"detailed": [
[["to fail completely"], ["The plan came a cropper."], ["Entry 1", "2"]]
"external_explation": []
"phrase": "come across",
"center word": "come",
"type": "transitive verb",
"detailed": [
[["to meet, find, or encounter especially by chance"], ["Researchers have come across important new evidence."]],
"external_explation": [
"type": "intransitive verb",
"detailed": [
[["to give over or furnish something demanded"],[],["1"]],
[["to produce an impression"],["comes across as a good speaker"],["2"]],
[["come through"],[],["3"]]
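Note: a phrase's explanation is the phrase entry listed under its center word. For example, the explanation of come across above is taken from the come across entry on the page for come, whereas `external_explation` is obtained by looking up come across directly.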
## Additional Requirements (Bonus)
### Verb Inflections (1 point)
"verb": "$verb",
"past": "$past_tense", // past is short for past tense
"pp": "$past_participle", // pp is short for past participle
"present": "$present_participle", // present is short for present participle
"ts": "$third_person_singular", // ts is short for present tense third-person singular
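A small helper for building and checking such a record might look like this. The function names are hypothetical, and the sketch assumes the four inflected forms have already been scraped from the entry page.

```python
VERB_KEYS = ("verb", "past", "pp", "present", "ts")


def make_inflection_record(verb: str, past: str, pp: str, present: str, ts: str) -> dict:
    """Build a verb inflection record using the abbreviated keys above."""
    return dict(zip(VERB_KEYS, (verb, past, pp, present, ts)))


def is_complete(record: dict) -> bool:
    """Check that every expected key is present with a non-empty string."""
    return all(
        isinstance(record.get(key), str) and record[key].strip()
        for key in VERB_KEYS
    )


come = make_inflection_record("come", "came", "come", "coming", "comes")
```

Running `is_complete` before writing records to disk is one way to catch entries whose page listed only some of the forms.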
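## Grading Criteria
- Crawl at least 30,000 words (1 point)
- Crawl the phrases associated with each word (1 point)
- Store all verb inflections for the crawled entries (1 point)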