Text Categorization Based on Hidden Markov Model
Submitted: 2012-10-10
Keywords: text categorization; Hidden Markov Model; information gain; χ2 test; Poisson distribution
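Authors and affiliations:
Liu Xiaofei (刘晓飞), School of Information Science and Technology, Shijiazhuang Tiedao University
Di Shuling (邸书灵), School of Information Science and Technology, Shijiazhuang Tiedao University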
Abstract:
      After years of development, text classification has produced many mature and efficient algorithms. This paper applies the Hidden Markov Model (HMM) to text classification by building one HMM for each text class. A χ2 test selects the feature word set of each class. The state transitions of a class's HMM represent a traversal of that class's feature word set in a specified order, and the output symbol of each state is the frequency of the corresponding feature word, so the state transition process implicitly represents how a text belonging to the class is formed. A text is assigned to the class whose HMM classifier yields the highest probability. The algorithm thus takes into account not only the feature words but also their frequencies, and the experimental results show that the method achieves relatively high classification efficiency.
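The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: word-frequency emissions are modeled as Poisson distributed (the keywords mention the Poisson distribution but do not say how it is used), the "specified order" is alphabetical, and every class model traverses the union of the per-class feature sets so that likelihoods are comparable across classes. Because the traversal order is fixed, each transition has probability 1 and the HMM likelihood reduces to a product of per-state emission probabilities. All function names and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def chi2_score(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: in-class docs containing the term, n10: out-of-class docs containing it,
    n01: in-class docs lacking it,          n00: out-of-class docs lacking it.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k words most positively associated with class `target`."""
    in_class = [set(d) for d, y in zip(docs, labels) if y == target]
    out_class = [set(d) for d, y in zip(docs, labels) if y != target]
    scored = []
    for term in set().union(*in_class):
        n11 = sum(term in d for d in in_class)
        n10 = sum(term in d for d in out_class)
        n01 = len(in_class) - n11
        n00 = len(out_class) - n10
        # Assumption: keep only terms that are over-represented in the class.
        if n11 * n00 > n10 * n01:
            scored.append((chi2_score(n11, n10, n01, n00), term))
    scored.sort(key=lambda p: (-p[0], p[1]))  # ties broken alphabetically
    return [term for _, term in scored[:k]]


def train(docs, labels, k, smoothing=0.1):
    """Build one model per class: an ordered list of (word, Poisson rate).

    Each list plays the role of a left-to-right chain of states, one state
    per feature word, visited in a fixed (here alphabetical) order.
    """
    classes = sorted(set(labels))
    # Assumption: all classes share the union of the per-class feature sets.
    features = sorted({w for c in classes
                       for w in select_features(docs, labels, c, k)})
    models = {}
    for c in classes:
        counts = [Counter(d) for d, y in zip(docs, labels) if y == c]
        models[c] = [
            (w, (sum(n[w] for n in counts) + smoothing) / len(counts))
            for w in features
        ]
    return models


def log_likelihood(doc, model):
    """Log-probability of the document's feature-word counts under one model.

    Transitions are deterministic, so only the Poisson emissions contribute.
    """
    counts = Counter(doc)
    total = 0.0
    for word, rate in model:
        k = counts[word]
        total += k * math.log(rate) - rate - math.lgamma(k + 1)
    return total


def classify(doc, models):
    """Assign the document to the class whose model gives the highest score."""
    return max(models, key=lambda c: log_likelihood(doc, models[c]))
```

Here `docs` is a list of tokenized documents (lists of words) and `labels` the matching class names; `train(docs, labels, k)` returns the per-class models and `classify(tokens, models)` returns the predicted class. Sharing one feature list across classes is a design choice of this sketch: scoring each class only on its own feature words would compare probabilities of different observation sequences, which can favor the wrong class on short documents.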
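Liu Xiaofei, Di Shuling. Text Categorization Based on Hidden Markov Model[J]. Journal of Shijiazhuang Tiedao University (Natural Science Edition), 2013, (1): 101-105, 110.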