A Neural Network Language Document Representation Technique for Web-Page Classification

OSANYIN, Q. A. and AJOSE-ISMAIL, B. M. (2020) A Neural Network Language Document Representation Technique for Web-Page Classification. International Journal of Computer Applications, 176 (14). pp. 38-46.

Abstract

The task of assigning a web page to the correct category is becoming cumbersome because of the influx of digital documents on the World Wide Web. The performance of applications such as web directories, question answering systems and web content filtering systems depends on the performance of automatic web page classification systems. According to the extant literature, the performance of a web page classification system depends on an adequate textual representation of the web content. Several statistical document representation techniques, such as bag-of-words models, n-gram models and topic models, have been proposed to capture the real semantics of web documents, but they are fraught with challenges such as semantic mismatch and words with multiple meanings. This paper therefore proposes using a recent neural network language model (Doc2Vec), which utilizes document embeddings, to solve the document representation problem of web page classification systems. The results obtained confirm the earlier assumption that Doc2Vec performs robustly on very high-dimensional text such as web documents, and that it also captures the real semantics of the web document.
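
The semantic mismatch problem mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the page snippets and the helper names `bag_of_words` and `cosine_similarity` are assumptions made for illustration. It shows how a plain bag-of-words representation rates two snippets on the same topic as completely dissimilar when they share no words, while rating snippets on different topics as similar because of word overlap.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-count vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical page snippets, invented for illustration only.
page_a = bag_of_words("cheap automobile rentals")
page_b = bag_of_words("affordable car hire")
page_c = bag_of_words("cheap automobile parts")

sim_ab = cosine_similarity(page_a, page_b)  # same topic, no shared words
sim_ac = cosine_similarity(page_a, page_c)  # different topic, two shared words

print(f"rentals vs. hire:  {sim_ab:.2f}")
print(f"rentals vs. parts: {sim_ac:.2f}")
```

Dense document embeddings such as those produced by Doc2Vec are intended to place semantically related documents close together even when their vocabularies differ, which is the gap this sketch exposes.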

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: Mr. Bolanle Yisau I.
Date Deposited: 07 Jun 2021 09:46
Last Modified: 07 Jun 2021 09:46
URI: http://eprints.federalpolyilaro.edu.ng/id/eprint/1644
