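Given an ISO 639-2T language code of scope individual, how can I programmatically find the matching macrolanguage code, if one exists?
For example, how do I get from "nob" (Norwegian Bokmål, scope individual) to "nor" (Norwegian, scope macrolanguage)?
In general, a single country can have several individual languages that do not belong to the same macrolanguage, so simply grouping by country would produce false positives.
java.util.Locale knows the ISO 639 three-letter language codes and recognizes both codes in the example above, but it has no concept of scope and no concept of macrolanguage.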
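A quick sketch of that gap (the class name LocaleMacrolanguageGap is hypothetical): Locale maps the two-letter tags "nb" and "no" to the three-letter codes "nob" and "nor", and it accepts "nob" as-is, but nothing in the API relates "nob" to "nor".

```java
import java.util.Locale;

public final class LocaleMacrolanguageGap {
    public static void main(String[] args) {
        // Two-letter tags are translated to their three-letter ISO 639 codes.
        Locale bokmal = Locale.forLanguageTag("nb");
        Locale norwegian = Locale.forLanguageTag("no");
        System.out.println(bokmal.getISO3Language());    // nob
        System.out.println(norwegian.getISO3Language()); // nor

        // A three-letter code is kept as the language subtag unchanged.
        System.out.println(Locale.forLanguageTag("nob").getLanguage()); // nob

        // There is no Locale method that answers "which macrolanguage contains nob?";
        // the two codes are just two unrelated language values.
    }
}
```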
In my case, a heuristic with no false positives would also help.
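You can build your own list of macrolanguages together with their corresponding individual languages.
Here is a selection I put together some time ago: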
import java.util.HashMap;
import java.util.Map;

public final class MacroLanguages {

    // Maps an individual-language code to its macrolanguage code (partial selection).
    public static final Map<String, String> macroLanguages = new HashMap<>();

    static {
        macroLanguages.put("aao", "ara"); //https://iso639-3.sil.org/code/ara
        macroLanguages.put("abh", "ara");
        macroLanguages.put("abv", "ara");
        macroLanguages.put("acm", "ara");
        macroLanguages.put("acq", "ara");
        macroLanguages.put("acw", "ara");
        macroLanguages.put("acx", "ara");
        macroLanguages.put("acy", "ara");
        macroLanguages.put("adf", "ara");
        macroLanguages.put("aeb", "ara");
        macroLanguages.put("aec", "ara");
        macroLanguages.put("afb", "ara");
        macroLanguages.put("ajp", "ara");
        macroLanguages.put("apc", "ara");
        macroLanguages.put("apd", "ara");
        macroLanguages.put("arb", "ara");
        macroLanguages.put("arq", "ara");
        macroLanguages.put("ars", "ara");
        macroLanguages.put("ary", "ara");
        macroLanguages.put("arz", "ara");
        macroLanguages.put("auz", "ara");
        macroLanguages.put("avl", "ara");
        macroLanguages.put("ayh", "ara");
        macroLanguages.put("ayl", "ara");
        macroLanguages.put("ayn", "ara");
        macroLanguages.put("ayp", "ara");
        macroLanguages.put("bbz", "ara");
        macroLanguages.put("pga", "ara");
        macroLanguages.put("shu", "ara");
        macroLanguages.put("ssh", "ara");
        macroLanguages.put("ekk", "est"); //https://iso639-3.sil.org/code/est
        macroLanguages.put("vro", "est");
        macroLanguages.put("bos", "hbs"); //https://iso639-3.sil.org/code/hbs
        macroLanguages.put("hrv", "hbs");
        macroLanguages.put("srp", "hbs");
        macroLanguages.put("cnr", "hbs");
        macroLanguages.put("ltg", "lav"); //https://iso639-3.sil.org/code/lav
        macroLanguages.put("lvs", "lav");
        macroLanguages.put("nno", "nor"); //https://iso639-3.sil.org/code/nor
        macroLanguages.put("nob", "nor");
        macroLanguages.put("aae", "sqi"); //https://iso639-3.sil.org/code/sqi
        macroLanguages.put("aat", "sqi");
        macroLanguages.put("aln", "sqi");
        macroLanguages.put("als", "sqi");
        macroLanguages.put("ydd", "yid"); //https://iso639-3.sil.org/code/yid
        macroLanguages.put("yih", "yid");
        macroLanguages.put("ccx", "zha"); //https://iso639-3.sil.org/code/zha
        macroLanguages.put("ccy", "zha");
        macroLanguages.put("zch", "zha");
        macroLanguages.put("zeh", "zha");
        macroLanguages.put("zgb", "zha");
        macroLanguages.put("zgm", "zha");
        macroLanguages.put("zgn", "zha");
        macroLanguages.put("zhd", "zha");
        macroLanguages.put("zhn", "zha");
        macroLanguages.put("zlj", "zha");
        macroLanguages.put("zln", "zha");
        macroLanguages.put("zlq", "zha");
        macroLanguages.put("zqe", "zha");
        macroLanguages.put("zyb", "zha");
        macroLanguages.put("zyg", "zha");
        macroLanguages.put("zyj", "zha");
        macroLanguages.put("zyn", "zha");
        macroLanguages.put("zzj", "zha");
        macroLanguages.put("cdo", "zho"); //https://iso639-3.sil.org/code/zho
        macroLanguages.put("cjy", "zho");
        macroLanguages.put("cmn", "zho");
        macroLanguages.put("cpx", "zho");
        macroLanguages.put("czh", "zho");
        macroLanguages.put("czo", "zho");
        macroLanguages.put("gan", "zho");
        macroLanguages.put("hak", "zho");
        macroLanguages.put("hsn", "zho");
        macroLanguages.put("lzh", "zho");
        macroLanguages.put("mnp", "zho");
        macroLanguages.put("nan", "zho");
        macroLanguages.put("wuu", "zho");
        macroLanguages.put("yue", "zho");
        macroLanguages.put("cnp", "zho");
        macroLanguages.put("csp", "zho");
        macroLanguages.put("pes", "fas"); //https://iso639-3.sil.org/code/fas
        macroLanguages.put("prs", "fas");
    }
}
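Instead of maintaining the list by hand, a sketch like the one below could build the same kind of map from SIL's published macrolanguage mappings. It assumes the tab-separated layout of the iso-639-3-macrolanguages.tab file from the ISO 639-3 download area (a header row, then columns M_Id, I_Id, I_Status, where I_Status is "A" for active and "R" for retired); verify that against the actual file before relying on it. The class and method names (MacroLanguageLookup, parse, macrolanguageOf) are hypothetical. The lookup only answers for codes that are explicitly listed and otherwise returns Optional.empty(), so it never guesses, which fits the "no false positives" requirement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public final class MacroLanguageLookup {

    private MacroLanguageLookup() {}

    /**
     * Parses a table assumed to be laid out like iso-639-3-macrolanguages.tab
     * (header row, then M_Id TAB I_Id TAB I_Status) into a map from
     * individual-language code to macrolanguage code. Rows that are malformed
     * or whose status is not "A" (active) are skipped.
     */
    public static Map<String, String> parse(Reader source) {
        Map<String, String> result = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header row
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t");
                if (cols.length < 3 || !"A".equals(cols[2].trim())) {
                    continue; // malformed row or non-active (e.g. retired) mapping
                }
                result.put(cols[1].trim(), cols[0].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Returns the macrolanguage only when the code is explicitly listed; never guesses. */
    public static Optional<String> macrolanguageOf(String code, Map<String, String> table) {
        return Optional.ofNullable(table.get(code.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Hypothetical excerpt in the assumed layout; the last row uses
        // private-use codes (qaa-qtz) purely to illustrate a retired entry.
        String sample = "M_Id\tI_Id\tI_Status\n"
                + "nor\tnno\tA\n"
                + "nor\tnob\tA\n"
                + "qab\tqaa\tR\n";
        Map<String, String> table = parse(new StringReader(sample));
        System.out.println(macrolanguageOf("nob", table)); // Optional[nor]
        System.out.println(macrolanguageOf("NOB", table)); // Optional[nor]
        System.out.println(macrolanguageOf("qaa", table)); // Optional.empty
        System.out.println(macrolanguageOf("deu", table)); // Optional.empty
    }
}
```

macrolanguageOf works just as well with the hand-written macroLanguages map above, e.g. macrolanguageOf("nob", macroLanguages). Skipping non-active rows is a design choice: drop that check if retired individual codes should still resolve to their former macrolanguage.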