Last updated: 2021-07-02 19:27:06
Cover
Copyright information
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Too Big or Not Too Big
What is big data?
A brief history of data
Dawn of the information age
Dr. Alan Turing and modern computing
The advent of the stored-program computer
From magnetic devices to SSDs
Why we are talking about big data now if data has always existed
Definition of big data
Building blocks of big data analytics
Types of Big Data
Structured
Unstructured
Semi-structured
Sources of big data
The 4Vs of big data
When do you know you have a big data problem and where do you start your search for the big data solution?
Summary
Big Data Mining for the Masses
What is big data mining?
Big data mining in the enterprise
Building the case for a Big Data strategy
Implementation life cycle
Stakeholders of the solution
Implementing the solution
Technical elements of the big data platform
Selection of the hardware stack
Selection of the software stack
The Analytics Toolkit
Components of the Analytics Toolkit
System recommendations
Installing on a laptop or workstation
Installing on the cloud
Installing Hadoop
Installing Oracle VirtualBox
Installing CDH in other environments
Installing Packt Data Science Box
Installing Spark
Installing R
Steps for downloading and installing Microsoft R Open
Installing RStudio
Installing Python
Big Data With Hadoop
The fundamentals of Hadoop
The fundamental premise of Hadoop
The core modules of Hadoop
Hadoop Distributed File System - HDFS
Data storage process in HDFS
Hadoop MapReduce
An intuitive introduction to MapReduce
A technical understanding of MapReduce
Block size and number of mappers and reducers
Hadoop YARN
Job scheduling in YARN
Other topics in Hadoop
Encryption
User authentication
Hadoop data storage formats
New features expected in Hadoop 3
The Hadoop ecosystem
Hands-on with CDH
WordCount using Hadoop MapReduce
Analyzing oil import prices with Hive
Joining tables in Hive
Big Data Mining with NoSQL
Why NoSQL?
The ACID, BASE, and CAP properties
ACID and SQL
The BASE property of NoSQL
The CAP theorem
The need for NoSQL technologies
Google Bigtable
Amazon Dynamo
NoSQL databases
In-memory databases
Columnar databases
Document-oriented databases
Key-value databases
Graph databases