Cover page
Pig Design Patterns
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Setting the Context for Design Patterns in Pig
Understanding design patterns
The scope of design patterns in Pig
Hadoop demystified – a quick reckoner
Pig – a quick intro
Understanding Pig through the code
Summary
Chapter 2. Data Ingest and Egress Patterns
The context of data ingest and egress
Types of data in the enterprise
Ingest and egress patterns for multistructured data
The ingress and egress patterns for the NoSQL data
The ingress and egress patterns for structured data
The ingress and egress patterns for semi-structured data
JSON ingress and egress patterns
Summary
Chapter 3. Data Profiling Patterns
Data profiling for Big Data
Rationale for using Pig in data profiling
The data type inference pattern
The basic statistical profiling pattern
The pattern-matching pattern
The string profiling pattern
The unstructured text profiling pattern
Summary
Chapter 4. Data Validation and Cleansing Patterns
Data validation and cleansing for Big Data
Choosing Pig for validation and cleansing
The constraint validation and cleansing design pattern
The regex validation and cleansing design pattern
The corrupt data validation and cleansing design pattern
The unstructured text data validation and cleansing design pattern
Summary
Chapter 5. Data Transformation Patterns
Data transformation processes
The structured-to-hierarchical transformation pattern
The data normalization pattern
The data integration pattern
The aggregation pattern
The data generalization pattern
Summary
Chapter 6. Understanding Data Reduction Patterns
Data reduction – a quick introduction
Data reduction considerations for Big Data
Dimensionality reduction – the Principal Component Analysis design pattern
Numerosity reduction – the histogram design pattern
Numerosity reduction – sampling design pattern
Numerosity reduction – clustering design pattern
Summary
Chapter 7. Advanced Patterns and Future Work
The clustering pattern
The topic discovery pattern
The natural language processing pattern
The classification pattern
Future trends
Summary
Index
Last updated: 2021-07-16 12:08:11