Mining Frequent Patterns without Tree Generation
- This methodology helps organizations make proactive, knowledge-driven decisions by focusing on the important information in their data rather than the entire database.
- The frequent patterns are generated without any candidate set or tree construction, using a numerical approach.
- This approach uses the “2^n series” (powers of two) to represent the frequent items.
- The unique property of this approach is that the sum of any subset of the 2^n series is unique, i.e., no two rows with different itemsets will share the same value.
- The entire database is encoded and can be stored in main memory, which in turn reduces I/O access while mining frequent patterns.
- This reduces both space and time complexity.
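The encoding idea above can be illustrated with a minimal sketch. It assumes each item is mapped to a distinct power of two and each transaction is stored as the sum of its item codes; the function names and the sample transactions are hypothetical, and the sketch shows only the encoding and a support count, not the full mining procedure.

```python
def build_item_codes(transactions):
    """Assign each distinct item its own power of two (1, 2, 4, ...)."""
    items = sorted({item for t in transactions for item in t})
    return {item: 1 << i for i, item in enumerate(items)}


def encode(itemset, codes):
    """Encode an itemset as the sum of its item codes.

    Because the codes are distinct powers of two, two different
    itemsets can never produce the same sum.
    """
    return sum(codes[item] for item in set(itemset))


def support(itemset, encoded_rows, codes):
    """Count the encoded rows that contain every item of the itemset."""
    mask = encode(itemset, codes)
    # A row contains the itemset exactly when all bits of the mask are set.
    return sum(1 for row in encoded_rows if row & mask == mask)


# Made-up transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
codes = build_item_codes(transactions)
encoded_rows = [encode(t, codes) for t in transactions]
```

With these sample rows, `support({"bread", "milk"}, encoded_rows, codes)` counts the two transactions that contain both items, using only integer comparisons on the in-memory encoded values.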
Sentiment Classification of Tweets using a Scoring Model
- Food-price-related tweets are collected over a specific period of time and analyzed to assess the impact of food price crises.
- The tweets are downloaded using the Twitter Streaming API.
- A statistical scoring model is used to classify the relevant tweets according to the sentiment they express (i.e., “positive tweet”, “negative tweet”, or “neutral tweet”). A sample of tweets is then used to train the model to place tweets in the correct category and to identify the sentiment of new tweets.
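The text above does not specify the scoring model, so the following is only a sketch of one possible approach: a smoothed log-ratio word score learned from a labeled sample, summed per tweet and thresholded into the three categories. The function names, the threshold value, and the sample tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_word_scores(labeled_tweets):
    """Learn a per-word score from (text, label) pairs.

    Words seen more often in positive tweets get a positive score,
    words seen more often in negative tweets get a negative score.
    Add-one smoothing avoids division by zero.
    """
    pos, neg = Counter(), Counter()
    for text, label in labeled_tweets:
        if label == "positive":
            pos.update(tokenize(text))
        elif label == "negative":
            neg.update(tokenize(text))
    vocab = set(pos) | set(neg)
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in vocab}


def classify(text, scores, threshold=0.5):
    """Sum the word scores and map the total to a sentiment category."""
    total = sum(scores.get(w, 0.0) for w in tokenize(text))
    if total > threshold:
        return "positive tweet"
    if total < -threshold:
        return "negative tweet"
    return "neutral tweet"


# Made-up training sample, for illustration only.
sample = [
    ("onion prices finally dropping, great relief", "positive"),
    ("good harvest, cheaper vegetables this week", "positive"),
    ("food prices rising again, terrible", "negative"),
    ("cannot afford rice, prices too high", "negative"),
]
scores = train_word_scores(sample)
```

Words that never appeared in the training sample contribute a score of zero, so a tweet made up entirely of unseen words falls into the neutral category.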
Elimination of Redundant Association Rules
- Association rule mining plays an important role in data mining and knowledge discovery.
- Traditional association rule mining algorithms generate a large number of rules based on the support and confidence values, and many of the rules generated are redundant.
- Redundant association rules degrade the quality of the information.
- The proposed algorithm removes redundant association rules to improve the quality of the rules and decrease the size of the rule list.
- It also reduces memory consumption for further processing of association rules.
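The redundancy criterion used by the proposed algorithm is not stated above. As a sketch under one commonly used definition, a rule can be treated as redundant when another rule with the same consequent has a strictly smaller antecedent and at least the same confidence. The rule representation and the sample rules below are assumptions for illustration.

```python
def remove_redundant(rules):
    """Drop rules that are covered by a simpler rule.

    Each rule is a tuple (antecedent, consequent, confidence), where
    antecedent and consequent are frozensets of items. A rule is
    dropped when some other rule has the same consequent, a proper
    subset of its antecedent, and confidence at least as high.
    """
    kept = []
    for ante, cons, conf in rules:
        redundant = any(
            o_cons == cons and o_ante < ante and o_conf >= conf
            for o_ante, o_cons, o_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept


# Made-up rules, for illustration only.
rules = [
    (frozenset({"a"}), frozenset({"c"}), 0.9),
    (frozenset({"a", "b"}), frozenset({"c"}), 0.8),  # covered by {a} -> {c}
    (frozenset({"b"}), frozenset({"c"}), 0.6),
    (frozenset({"a", "b"}), frozenset({"d"}), 0.7),
]
pruned = remove_redundant(rules)
```

Here the rule with antecedent `{a, b}` and consequent `{c}` is removed because the shorter rule `{a} -> {c}` predicts the same consequent with higher confidence, leaving three rules in `pruned`.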
Finding Impacting Factors Through Social Media Analytics
- Social media use is widespread around the world, especially Twitter, which is an excellent channel where people can freely express their views on various topics.
- The food price crisis is an important issue that affects people in all walks of life and, in turn, the economy of the country.
- Our objective is to analyze social media data to determine the factors that impact the food price.
- Association rules are extracted and summarized for the large set of tweets after extracting relevant keywords/features.
- The aspects or features clustered from the Twitter conversations are then analyzed to see how they correlate with the impact of the food price crisis.
- Geographical mentions are also included to capture the variability among different cities/regions.
- Capture user intent and recommend the web page that contains the information the user expects.
- An important challenge is that such a system must be self-adaptive, because the needs of online users may change dynamically; another is designing an accurate classifier to improve the accuracy of the recommendation system.
- Personalized recommendation using collaborative filtering (CF) and content-based filtering (CBF) methods in conjunction with social network data and geo-tags, including handling the cold-start problem in CF-based recommendation systems.
Data Mining in Biological Data
- Design a rough-set-based biclustering algorithm for efficiently finding useful patterns in gene expression data.
- Addressing research challenges of summarizing and analyzing the sentiment of the vast amounts of user-generated content (Big Data Analytics).
- Analyzing the information in social networking websites.
- Monitor Twitter conversations in India to understand how tweets relate to world events.
- Collect food-price-related tweets from January 2015 to date (around 10,000 tweets) and analyze the impact of food price crises.
- Data mining techniques are used to extract the relevant keywords/features related to food price crises.
- Soft-computing-based integration of data from heterogeneous sources.
- Application of data mining techniques in crowdsourcing.