To overcome the shortcomings of information gain as a dimensionality-reduction method in text categorization, this paper proposes a feature-reduction method based on relative-document-frequency balanced information gain (RDFBIG).
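For reference, the classical information gain score that RDFBIG sets out to improve can be sketched as follows. This is a minimal sketch of the standard formula IG(t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t) only. The relative document frequency and the balancing step of RDFBIG are not specified here, so they are not reproduced. The document representation (a list of `(set_of_terms, class_label)` pairs) and the function names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def information_gain(docs, term):
    """Classical IG of `term` over labelled documents.

    `docs` is a hypothetical list of (set_of_terms, class_label) pairs.
    """
    n = len(docs)
    all_counts = Counter(label for _, label in docs)
    with_t = Counter(label for terms, label in docs if term in terms)
    without_t = all_counts - with_t          # class counts of docs lacking the term
    n_t = sum(with_t.values())
    ig = entropy(all_counts.values())        # H(C)
    if n_t:
        ig -= (n_t / n) * entropy(with_t.values())            # P(t) H(C|t)
    if n - n_t:
        ig -= ((n - n_t) / n) * entropy(without_t.values())   # P(not t) H(C|not t)
    return ig
```

On a toy corpus where "vote" appears only in the politics documents, `information_gain(docs, "vote")` is 1.0 bit, while a term spread evenly over both classes scores 0.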
The method first applies a new document frequency measure for preliminary feature selection, filtering out some terms, and then applies the proposed attribute reduction algorithm to eliminate redundancy.
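The preliminary filtering stage has the general shape of a document-frequency threshold filter. The sketch below uses plain document frequency, not the paper's new document frequency measure, whose definition is not given here. The `min_df` threshold and the function names are hypothetical.

```python
from collections import Counter

def document_frequency(docs):
    """Count the documents each term occurs in.

    `docs` is a hypothetical list of (set_of_terms, class_label) pairs.
    """
    df = Counter()
    for terms, _ in docs:
        df.update(set(terms))   # each term counted at most once per document
    return df

def df_filter(docs, min_df):
    """Keep only terms that occur in at least `min_df` documents (hypothetical threshold)."""
    df = document_frequency(docs)
    return {t for t, n in df.items() if n >= min_df}
```

The surviving vocabulary would then be passed to the redundancy-elimination (attribute reduction) stage, which is not sketched here.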
For feature selection, document frequency is combined with mutual information, and both measures are improved.
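The mutual information half of that combination is conventionally the pointwise MI between a term and a class, log2(P(t,c) / (P(t)P(c))). Below is a minimal sketch of that standard quantity. How the paper combines it with document frequency, and how it modifies either measure, is not described here and is not reproduced. The data layout and the function name are assumptions.

```python
import math
from collections import Counter

def mutual_information(docs, term):
    """Pointwise MI between `term` and each class: log2(P(t,c) / (P(t) P(c))).

    `docs` is a hypothetical list of (set_of_terms, class_label) pairs.
    Classes that never co-occur with the term map to -inf.
    """
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    joint = Counter(label for terms, label in docs if term in terms)  # docs with term in class c
    n_t = sum(joint.values())                                        # docs containing the term
    mi = {}
    for c, n_c in class_counts.items():
        a = joint[c]
        mi[c] = math.log2(a * n / (n_t * n_c)) if a else float("-inf")
    return mi
```

A common way to obtain one score per term is to take the maximum, or a class-prior-weighted average, over classes. Either choice is a design decision, not something the abstract specifies.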