This paper designs and tests a Chinese news topic detection system that uses an improved time-window strategy and a self-adaptive inverse document frequency.
This paper analyzes and compares several feature selection methods for text categorization and, on that basis, proposes a new feature selection method, TDF, based on term frequency and inverse document frequency.
This paper gives a term weighting formula for Chinese Web text that combines three factors: position (HTML tags), frequency (inverse document frequency), and the length of the Chinese phrase. It also presents a method for selecting Web text features based on the messy genetic algorithm, a genetic algorithm with variable-length chromosomes.
Comparison of the Out Document Frequency Weight Method with the Inverse Document Frequency (IDF) Weight Method for Chinese Documents
Traditional feature weighting algorithms consider only factors such as TF (term frequency) and IDF (inverse document frequency); they do not consider DI (distribution information) among and within classes, or LFHW (low frequency but high weight) terms.