Web Mining

In Customer Relationship Management (CRM), web mining is the integration of data gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. (Mining means extracting something valuable from a baser substance, such as mining gold from the earth.)

Web mining is the application of data mining techniques to discover information patterns in web data. It helps improve web search engines by identifying web pages and classifying web documents.

Web mining uses data mining techniques and algorithms to extract data directly from the web: from web documents and services, web content, hyperlinks and server logs. The goal is to find patterns in web data by collecting and analyzing it, in order to gain insight into trends, the industry and users in general.

The contents of data mined from the web may be a collection of facts that web pages are meant to contain, and these may consist of text, structured data like lists and tables, and even pictures, video and audio.

Web mining lets you look for patterns in data through content mining, structure mining, and usage mining. Content mining examines the information collected by search engines and web spiders. Structure mining examines data related to the structure of a specific site. Usage mining examines data related to a specific user’s browser, as well as information gathered from forms the user may have submitted during web transactions.

The information gathered through web mining is evaluated using traditional data mining techniques such as clustering, classification, association, and the examination of sequential patterns.
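As a rough illustration of the association technique mentioned above, the following Python sketch counts how often pairs of pages appear in the same visit and how often a visit to one page also includes another. The session data, page paths, and function names are all invented for this example; a real analysis would start from pre-processed log data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages each visitor viewed in one visit.
sessions = [
    {"/home", "/books", "/cart"},
    {"/home", "/books"},
    {"/home", "/music"},
    {"/books", "/cart"},
]


def pair_support(sessions):
    """Count how many sessions contain each unordered pair of pages."""
    counts = Counter()
    for pages in sessions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(pages), 2):
            counts[pair] += 1
    return counts


def confidence(sessions, antecedent, consequent):
    """Share of sessions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [s for s in sessions if antecedent in s]
    if not with_antecedent:
        return 0.0
    return sum(consequent in s for s in with_antecedent) / len(with_antecedent)


support = pair_support(sessions)
books_to_cart = confidence(sessions, "/books", "/cart")
```

Here `support` tells you which pages tend to be viewed together, and `books_to_cart` estimates how often a visit that includes the books page also reaches the cart.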

There are three general classes of data that can be discovered by web mining:

  • Web activity, from server logs and browser activity tracking.
  • Web graph, from links between pages, people and other data.
  • Web content, from the information found on web pages and inside documents.

At Scale Unlimited, we concentrate on the last one – extracting value from web pages and other documents found on the web. Search is the biggest web miner by far and generates the most revenue.

There are several other valuable end uses for web mining results. A partial list includes:

  • Business intelligence
  • Competitive intelligence
  • Pricing analysis
  • Events
  • Product information
  • Popularity
  • Reputation

 

Applications of Web Mining

With the rapid growth of the World Wide Web, web mining has become a very popular topic in web research. E-commerce and e-services are claimed to be the killer applications for web mining. Web mining now also plays a vital role in helping e-commerce websites and e-services understand how their websites and services are used, so that they can provide better services for their clients. A few applications are:

  • E-commerce customer behavior analysis
  • E-commerce transaction analysis
  • E-commerce website design
  • E-banking
  • M-commerce
  • Web advertisement
  • Search engines
  • Online auctions

 

Types Of Web Mining

There are three types of web mining:

  1. Web Content Mining

  • Web content mining extracts useful information, data, and knowledge from web page content and web documents, which are mostly text, pictures and audio/video files. Techniques used in this discipline draw heavily on natural language processing (NLP) and information retrieval.
  • By contrast, web structure mining looks for useful data or information patterns in the structure of hyperlinks rather than in page content.
  • Because web data is heterogeneous and lacks structure, automated discovery of new information patterns can be challenging.
  • Web content mining scans and mines the text, pictures, and groups of web pages according to the content of the input (query), and displays the resulting list in search engines.

For example: If a user needs to look for a specific book, then search engine provides the list of suggestions.
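The book-search example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny in-memory set of pages and a naive term-count score; the page names, HTML snippets, and helper names are made up for this example, and real search engines use far more elaborate ranking.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def bag_of_words(html):
    """Return word counts for the visible text of one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


def rank(pages, query):
    """Order page names by how often the query terms occur in each page."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scores = {}
    for name, html in pages.items():
        words = bag_of_words(html)
        scores[name] = sum(words[t] for t in terms)
    matches = [name for name, score in scores.items() if score > 0]
    return sorted(matches, key=lambda name: -scores[name])


# Hypothetical pages, invented for illustration.
pages = {
    "a.html": "<html><body><h1>Data mining book</h1>"
              "<p>A book about mining web data.</p></body></html>",
    "b.html": "<html><head><style>p{color:red}</style></head>"
              "<body><p>Cooking recipes</p></body></html>",
    "c.html": "<html><body><p>Used book store</p>"
              "<script>var book = 1;</script></body></html>",
}

results = rank(pages, "mining book")
```

The extractor ignores script and style content so that only text a reader would see contributes to the word counts. Pages that contain none of the query terms are left out of the result list.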

 

  2. Web Usage Mining

  • This is the process of extracting patterns and information from server logs to gain insight into user activity, including where users come from, how many users clicked which item on the site, and the types of activity taking place on the site.
  • Web usage mining mines web log records and helps to find user access patterns for web pages.
  • The web server registers a web log entry for every page request on a website.
  • Analysis of similarities in web log records can help to identify potential customers for e-commerce companies.
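The log-based analysis described above might look like the following Python sketch. It assumes the server writes entries in the widely used Common Log Format; the sample lines, IP addresses, and function names are invented for illustration, and real logs often carry extra fields that the pattern would need to accommodate.

```python
import re
from collections import Counter

# Pattern for Common Log Format lines (an assumed log layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical log entries.
LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /books HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:14:02:10 +0000] "GET /books HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:14:02:40 +0000] "GET /missing HTTP/1.1" 404 310',
    'malformed line',
]


def parse(lines):
    """Yield one dict per line that matches the log pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def access_patterns(lines):
    """Count successful hits per page and list the pages each visitor saw."""
    page_hits = Counter()
    pages_by_visitor = {}
    for entry in parse(lines):
        if entry["status"] != "200":
            continue  # ignore failed requests
        page_hits[entry["path"]] += 1
        pages_by_visitor.setdefault(entry["host"], []).append(entry["path"])
    return page_hits, pages_by_visitor


page_hits, pages_by_visitor = access_patterns(LOG_LINES)
```

Visitors with similar page lists in `pages_by_visitor` are the kind of similarity that could point to a shared customer segment.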

Some of the techniques to find and analyze the web usage pattern are:

  i) Session and visitor analysis

  • Session analysis works on pre-processed data, which includes records of visitors, days, sessions and so on. This data can be used to analyze visitor behavior.
  • A report is generated after this analysis. It contains details of frequently visited web pages and common entry and exit points.

  ii) OLAP (Online Analytical Processing)

  • It performs multi-dimensional analysis of complex data.
  • It can be performed on different parts of the log data within a given time interval.
  • OLAP tools are used to derive the most important business intelligence metrics.
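Both techniques can be sketched in Python on a small set of pre-processed hits. The 30-minute inactivity timeout used to split sessions is a common convention assumed here, not something the text prescribes. The visitor names, timestamps, and function names are invented, and `roll_up` is only a toy stand-in for what a real OLAP tool does.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)

# Hypothetical pre-processed hits: (visitor, timestamp, page).
HITS = [
    ("alice", "2023-10-10 09:00", "/home"),
    ("alice", "2023-10-10 09:05", "/books"),
    ("alice", "2023-10-10 11:00", "/home"),
    ("bob", "2023-10-11 10:00", "/books"),
    ("bob", "2023-10-11 10:10", "/cart"),
]


def sessionize(hits):
    """Split each visitor's hits into sessions using the inactivity timeout."""
    by_visitor = defaultdict(list)
    for visitor, stamp, page in hits:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_visitor[visitor].append((when, page))
    sessions = []
    for visitor, visits in by_visitor.items():
        visits.sort()
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append((visitor, current))
                current = []
            current.append(cur)
        sessions.append((visitor, current))
    return sessions


def entry_exit_report(sessions):
    """Count the first (entry) and last (exit) page of every session."""
    entries = Counter(visits[0][1] for _, visits in sessions)
    exits = Counter(visits[-1][1] for _, visits in sessions)
    return entries, exits


def roll_up(hits, *dims):
    """Count hits grouped by the chosen dimensions: 'visitor', 'day', 'page'."""
    counts = Counter()
    for visitor, stamp, page in hits:
        row = {"visitor": visitor, "day": stamp[:10], "page": page}
        counts[tuple(row[d] for d in dims)] += 1
    return counts


sessions = sessionize(HITS)
entries, exits = entry_exit_report(sessions)
hits_per_day = roll_up(HITS, "day")
hits_per_day_and_page = roll_up(HITS, "day", "page")
```

Calling `roll_up` with fewer or more dimensions mimics moving between coarser and finer views of the same log data, which is the basic idea behind multi-dimensional analysis.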

 

  3. Web Structure Mining

This is the process of analyzing the nodes and connection structure of a website using graph theory. Two things can be obtained from it: the structure of a website in terms of how it is connected to other sites, and the document structure of the site itself, that is, how its pages are connected to each other.

  • Web structure mining can be used to discover the link structure of hyperlinks.
  • It is used to identify whether web pages are linked by data or by a direct link connection.
  • The purpose of structure mining is to produce a structural summary of a website and of similar web pages.

Example: web structure mining can be very useful to companies that want to determine the connection between two commercial websites.
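The graph view of a website can be illustrated with a short Python sketch. It treats pages as nodes and hyperlinks as directed edges, counts incoming links, and uses a breadth-first search to find the shortest chain of links between two sites. The link data, domain names, and function names are made up for this example, and breadth-first search is just one simple way to test for a connection.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "shop-a.example/home": ["shop-a.example/books", "partner.example/deals"],
    "shop-a.example/books": ["shop-a.example/home"],
    "partner.example/deals": ["shop-b.example/home"],
    "shop-b.example/home": ["shop-b.example/offers"],
    "shop-b.example/offers": [],
}


def in_degree(links):
    """Count how many hyperlinks point at each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def shortest_link_path(links, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of links connects the two pages


incoming = in_degree(LINKS)
path = shortest_link_path(LINKS, "shop-a.example/home", "shop-b.example/home")
```

Because hyperlinks are directed, a path from one site to another does not imply a path back: in this data the second shop never links to the first.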

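Differences Between Web Content Mining, Web Structure Mining, and Web Usage Mining

Web content mining is described from two viewpoints: the information retrieval (IR) view and the database (DB) view.

View of Data
  • Web content mining (IR view): unstructured, structured
  • Web content mining (DB view): semi-structured, website as DB
  • Web structure mining: link structure
  • Web usage mining: interactivity

Main Data
  • Web content mining (IR view): text documents, hypertext documents
  • Web content mining (DB view): hypertext documents
  • Web structure mining: link structure
  • Web usage mining: server logs, browser logs

Representation
  • Web content mining (IR view): bag of words, n-gram terms, phrases, concepts or ontology, relational
  • Web content mining (DB view): edge-labeled graph, relational
  • Web structure mining: graph
  • Web usage mining: relational table, graph

Method
  • Web content mining (IR view): machine learning, statistical (including NLP)
  • Web content mining (DB view): proprietary algorithms, association rules
  • Web structure mining: proprietary algorithms
  • Web usage mining: machine learning, statistical, association rules

Application Categories
  • Web content mining (IR view): categorization, clustering, finding extraction rules, finding patterns in text
  • Web content mining (DB view): finding frequent substructures, website schema discovery
  • Web structure mining: categorization, clustering
  • Web usage mining: site construction, adaptation and management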
