Kevin R. Crook
kevincrook.com
Book Recommendations for Data Science
These are my personal recommendations for Data Science books at the present time. My recommendations change
over time as new technologies come into play, old technologies fade, and new books and new editions of older
books reach the market.
- Linux
  - Helmke, Matthew, and Sobell, Mark G.,
    A Practical Guide to Linux Commands, Editors, and Shell Programming, 4th Edition,
    make sure you have the 4th edition of January 2018
    - Command Line: Part I, Chapters 1 through 5
    - vi and vim Editors: Part II, Chapter 6
    - Bash Shell (using): Part III, Chapter 8
    - Bash Shell (programming): Part IV, Chapter 10
    - awk: Part IV, Chapter 14
    - sed: Part IV, Chapter 15
    - Regular Expressions: Part VII, Appendix A
- SQL
  - Plew, Ron, Jones, Arie, and Stephens, Ryan,
    Sams Teach Yourself SQL in 24 Hours
- Python
  - Core Python
    - Lubanovic, Bill, Introducing Python
  - NumPy and pandas
    - McKinney, William Wesley, Python for Data Analysis, 2nd Edition,
      make sure you have the 2nd edition, which is based on Python 3;
      older editions are based on Python 2, which you don't want
  - Machine Learning and Deep Learning
    - Geron, Aurelien, Hands-On Machine Learning with Scikit-Learn and TensorFlow,
      in my opinion, this is currently the best book for beginners learning
      machine learning and deep learning
- Hadoop Ecosystem (Hadoop is fading in popularity - consider skipping and going straight to Spark)
  - Beginner Books:
    - Achari, Shiva, Hadoop Essentials - Tackling the Challenges of Big Data with Hadoop
    - Eadline, Douglas, Hadoop 2 Quick-Start Guide, Learn the Essentials of Big Data Computing
      in the Apache Hadoop 2 Ecosystem
  - Advanced Books:
    - White, Tom, Hadoop: The Definitive Guide, 4th Edition,
      make sure you have the latest 4th edition from April 2015, as
      earlier editions covered Hadoop 1, not Hadoop 2 - big difference.
    - See also the Data Algorithms book under my Spark section, which covers many
      algorithms in both Hadoop MapReduce and Spark
- Spark
  - Beginner Book:
    - Aven, Jeffrey, Sams Teach Yourself Apache Spark in 24 Hours
  - Advanced Books:
    - Parsian, Mahmoud, Data Algorithms, covers both Hadoop MapReduce and Spark
    - Laserson, Uri, Ryza, Sandy, Owen, Sean, and Wills, Josh, Advanced Analytics with Spark