Big Data and Web: An Efficient Algorithm Design for DISC
Abstract: Data-intensive computing is a paradigm that addresses the data gap and a platform that advances research by processing massive amounts of data and enabling applications previously considered impractical or infeasible. Existing one-pass analytics algorithms are data-intensive and therefore require the ability to process high volumes of data efficiently. MapReduce is a programming model for processing large datasets on a cluster of machines. However, the existing MapReduce model is not well suited to high-volume streaming data, because it is geared towards batch processing and requires the data set to be fully loaded into the cluster before analytical queries can run. This paper examines, from an efficiency standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model and streaming algorithms to incremental versions of the existing MR algorithms.
INTRODUCTION
“Data-intensive scalable computing is a class of parallel computing applications that use a data-parallel approach to process terabytes or petabytes of data, commonly referred to as big data. Computing applications are classified as compute-intensive or data-intensive according to their computational requirements and data volumes.”
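The data-parallel approach described above can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: the function names, the word-count task, and the use of local threads in place of cluster nodes are all assumptions made for illustration. The same function is applied independently to each partition of the data, and the partial results are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(partition):
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def data_parallel_word_count(partitions, workers=4):
    # Hypothetical helper: threads stand in for the machines of a cluster.
    # Every partition is processed by the same function, with no shared state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, partitions))
    return reduce(merge_counts, partials, Counter())


# Two small partitions stand in for terabyte-scale inputs.
partitions = [
    ["big data big", "web data"],
    ["data intensive computing"],
]
totals = data_parallel_word_count(partitions)
```

Because no partition depends on another, adding workers or machines lets the same code handle more partitions, which is the property that makes the approach attractive for large data volumes.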
“The advent of the Internet and World Wide Web has led to large amounts of information being stored and presented online. Business and government organizations create large amounts of both structured and unstructured information, which needs to be processed, analyzed, and linked. An IDC white paper sponsored by EMC Corporation estimated the amount of information stored in digital form in 2007 at 281 exabytes and the overall compound growth rate at 57%, with information in organizations growing at an even faster rate.” [3] “The storing, managing, accessing, and processing of this vast amount of data represents a fundamental need and an immense challenge in order to satisfy the need to Search-Analyze-Mine-Visualize (SAMV) this data as information.”