Practical Data Analysis Using Jupyter Notebook

Installing Python and using Jupyter Notebook

I'm going to start by admitting this chapter may become obsolete in the future because installing open source software on your workstation can be a painful process and, in some cases, is being replaced by preinstalled virtual machines or cloud versions. For example, Microsoft offers a free Azure subscription option for a cloud-hosted Jupyter Notebook.

Understanding all of the dependencies between software versions, hardware, Operating System (OS) differences, and libraries can be complex. Further, in enterprise environments, your IT department's rules on software installations may include security restrictions that prohibit access to your workstation's filesystem. In all likelihood, with more innovation in cloud computing, most of these steps will already be done ahead of time, eliminating the need to install software altogether.

With that said, I'm going to walk you through the process of installing Python and Jupyter Notebook, pointing out tips and pitfalls to educate you on key concepts along the way. I would compare using these technology tools to work with data to driving a car. The ability to drive should not depend on your ability to repair the car engine! Just knowing that you need an engine should be sufficient to drive and move forward. So, my focus is on setting up your workstation for data analysis quickly, without dwelling on the layers of detail behind these powerful technologies.

The open source project that created the Jupyter Notebook app evolved from IPython in 2014. Many of the features that existed in IPython still exist today in Jupyter, for example, the interactive GUI for running Python commands and parallel processing. There is a kernel that controls the input/output between your computer's CPU, memory, and filesystem. Finally, there is the notebook itself, which collects all of the commands, code, charts, and comments in a single shareable file with the .ipynb extension.
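Because a notebook is a single shareable file, it helps to see what is actually inside one. The following is a minimal sketch, assuming the nbformat version 4 JSON layout used by current Jupyter releases: an .ipynb file is plain JSON holding a list of cells plus some metadata. The function names and cell contents are hypothetical, for illustration only; in practice, Jupyter (or the nbformat library) writes these files for you.

```python
import json
from pathlib import Path


def build_minimal_notebook() -> dict:
    """Return a tiny two-cell notebook in the nbformat 4 JSON layout."""
    return {
        "cells": [
            {
                # Markdown cells hold the comments and narrative text.
                "cell_type": "markdown",
                "id": "intro",
                "metadata": {},
                "source": ["# My first notebook\n", "Comments live in Markdown cells."],
            },
            {
                # Code cells hold the commands; outputs are filled in when run.
                "cell_type": "code",
                "execution_count": None,
                "id": "hello",
                "metadata": {},
                "outputs": [],
                "source": ["print('Hello, Jupyter!')"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook dictionary to disk as JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")
```

For example, `save_notebook(build_minimal_notebook(), "first_notebook.ipynb")` should produce a file that Jupyter can open. Real notebooks usually also carry kernel and language details in their metadata, and store outputs such as charts inside the code cells.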

Just to give some context of how popular Jupyter notebooks have become for data analysis, I discovered a public GitHub repository by Peter Parente that collects a daily count of the number of public .ipynb files found on GitHub since 2014. The growth is exponential: the number grew from just over 65,000 to 5.7 million by November 2019, which means it has more than doubled every year, on average, for the last five years!
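To check the growth claim, you can compute the average year-over-year (compound) growth factor from the two counts quoted above. This is a quick arithmetic sketch that assumes the span from 2014 to November 2019 is roughly five years; the function name is illustrative.

```python
def average_annual_growth(start: float, end: float, years: float) -> float:
    """Return the average year-over-year growth factor (compound growth)."""
    return (end / start) ** (1 / years)


# Counts quoted in the text; treating the period as five years is an approximation.
factor = average_annual_growth(65_000, 5_700_000, 5)
print(f"Average growth factor: {factor:.2f}x per year")  # prints: Average growth factor: 2.45x per year
print(f"Pure doubling over 5 years: {2 ** 5}x; observed: {5_700_000 / 65_000:.0f}x")  # prints: 32x; observed: 88x
```

Doubling every year for five years would only multiply the count by 32, whereas the observed increase is roughly 88-fold, or about 2.45 times per year.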

The first prerequisite for using a Jupyter notebook is installing Python. We are going to use Python version 3.3 or greater, and there are two methods you can use to install the software: a direct download or a package manager. A direct download gives you more control over the installation on your workstation, but it then requires additional time to manage dependent libraries. That said, using a package manager to install Python has become the preferred method, so I cover this method in this chapter.
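Once Python is installed, you can confirm that it meets the 3.3-or-greater requirement with a short script. This is a minimal sketch; the file name `check_python.py` and the function name are hypothetical. Save it and run `python check_python.py` from a command prompt, or paste it into an interactive `python` session.

```python
import sys

MIN_VERSION = (3, 3)  # minimum version suggested in this chapter


def python_is_recent_enough(minimum: tuple = MIN_VERSION) -> bool:
    """Return True if the running interpreter's (major, minor) is at least `minimum`."""
    return sys.version_info[:2] >= minimum


# Show which interpreter is running and where it lives on disk.
print(f"Running Python {sys.version.split()[0]} from {sys.executable}")
print("Version OK" if python_is_recent_enough() else "Please upgrade Python")
```

Printing `sys.executable` is also handy when several Python installations exist on one machine, because it shows which one actually answered.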

Python is a powerful coding language with support on multiple OS platforms, including Windows, macOS, and Linux. I encourage you to read more about the history of this powerful software language and the creator, Guido van Rossum.

Python, at its core, is a command-line programming language, so you must be comfortable with running some commands from a prompt. When we have finished installation, you will have a Python command-line window, which will look like the following screenshot if your workstation has Windows OS:

Think of the Python installation as a means to an end because what we really want to use as data analysts is a Jupyter notebook, which is also known as an Integrated Development Environment (IDE) used to run code and call libraries in a self-contained Graphical User Interface (GUI).

Since I recommend using a package manager for installation, the first decision you must make is which package manager to use on your computer. A package manager is designed to streamline the versions and layers of dependencies between the open source libraries, your OS, and software. The most common ones are conda, pip, and Docker.

From researching the differences, I prefer conda over pip for someone just getting started, especially if you are unfamiliar with running command-line commands and managing software installs directly on your PC. For an app-store-like experience, where all you have to do is download, install with a few prompts, and then launch the software, I recommend Anaconda, especially since it includes Python, several popular libraries required for data analysis, and Jupyter all as part of the download package.
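If you are unsure whether the Python you are running was installed by conda (for example, through Anaconda) or some other way, a common heuristic is to look for the `conda-meta` folder that conda creates inside each environment it manages. This is a sketch under that assumption, not an official conda API; the function name is hypothetical.

```python
import sys
from pathlib import Path


def looks_like_conda_env(prefix: str = sys.prefix) -> bool:
    """Heuristic: conda-managed environments contain a `conda-meta` folder."""
    return (Path(prefix) / "conda-meta").is_dir()


# sys.prefix is the root folder of the environment the interpreter belongs to.
print("Interpreter prefix:", sys.prefix)
print("Managed by conda:", looks_like_conda_env())
```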

Remember, the goal is to get Jupyter Notebook up and running on your workstation, so feel free to choose installation alternatives, especially if you prefer a Command-Line Interface (CLI).

Installing Anaconda

Follow these steps to install Anaconda. For this walkthrough, I have selected a Windows OS installer, but the installation screenshots will be similar regardless of which one you select:

  1. Download the software by choosing the installer that matches your workstation's OS. To do this, navigate to the Anaconda Distribution page, which can be found at https://www.anaconda.com/ and should look similar to the following screenshot:
  2. After you download the software and launch the installer on your PC, you should see the Setup wizard, as shown in the following screenshot:
  3. Select the default options in the install wizard, and you should see a message similar to the following screenshot:
  4. Now that Anaconda has completed the installation, launch the Anaconda Navigator application from your PC, which is shown in the following screenshot using Windows OS. Since there are multiple OS options available, such as Windows, macOS, or Ubuntu, your screen may differ from the following screenshot:

I think of the installation process as similar to an artist buying a canvas, easel, and supplies before beginning to paint. Now that you have a working Python environment called Anaconda installed and available to use, you are ready to launch Jupyter and create your first notebook.
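Before launching Jupyter, you may want to confirm that the data analysis libraries you expect are importable from your new environment. The following sketch uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found. The package list is our assumption about what a typical Anaconda installation bundles; adjust it to the libraries you plan to use.

```python
from importlib.util import find_spec

# Assumed list of libraries a typical Anaconda install provides; edit as needed.
EXPECTED = ["numpy", "pandas", "matplotlib", "notebook"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_packages(EXPECTED).items():
    print(f"{name:<12} {'installed' if found else 'MISSING'}")
```

Using `find_spec` rather than importing each library keeps the check fast, because it only locates the package without running its import code.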

Running Jupyter and installing Python packages for data analysis

Once the software is installed on your PC, you can launch Jupyter Notebook in either of two ways. The first is from a command-line prompt, by running the jupyter notebook command from Anaconda Prompt, which will look similar to the following screenshot:

You can also use the Anaconda Navigator software and click the Launch button for Jupyter Notebook under My Applications, which is shown in the following screenshot:

Either option will launch a new web browser session at the http://localhost:8888/tree URL, which is known as the Jupyter dashboard. If you do not see something similar to the following screenshot, you may need to reinstall the Anaconda software or check whether firewall ports are blocking commands or dependencies. In an enterprise setting, you may have to review your corporate policies or request IT support:
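If the dashboard does not appear, one quick troubleshooting step is to check whether anything is accepting connections on the port in that URL. This sketch assumes Jupyter's default port of 8888; if that port is already taken, Jupyter typically picks another one and prints the actual URL in the prompt window, so check there first. A successful connection only shows that something is listening on the port, not necessarily Jupyter.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused or timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Jupyter port reachable:", is_port_open())
```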

If you would like to try JupyterLab instead of Jupyter Notebook, either solution will work. JupyterLab uses the same Notebook server and file format as the classic Jupyter Notebook, so it is fully compatible with existing notebooks and kernels. The classic Notebook and JupyterLab can run side by side on the same computer, and you can easily switch between the two interfaces.

Notice that, by default, Jupyter starts with access to your workstation's filesystem from a folder determined by how it was installed. This should be sufficient in most cases, but if you would like to change the default project home/root folder, you can easily do so using Anaconda Prompt. Just run the cd command to change directory before you type the jupyter notebook command.

For example, I first created a projects folder on the local C:\ drive of my Windows PC and then ran the following commands in the Anaconda Prompt window:

    >cd \
    >cd projects
    >jupyter notebook
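If you prefer to script these steps, the following Python sketch does the same thing: it creates the project folder if it is missing and then starts `jupyter notebook` with that folder as the working directory, which is what `cd` followed by `jupyter notebook` achieves in the prompt. It assumes the `jupyter` command is on your PATH (as it is inside Anaconda Prompt); both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def prepare_project_dir(path: str) -> Path:
    """Create the project folder (and any parents) if needed; return its absolute path."""
    project = Path(path).expanduser().resolve()
    project.mkdir(parents=True, exist_ok=True)
    return project


def launch_jupyter(project: Path) -> subprocess.Popen:
    """Start `jupyter notebook` with `project` as the working directory.

    Mirrors `cd projects` followed by `jupyter notebook`; assumes `jupyter` is on PATH.
    """
    return subprocess.Popen(["jupyter", "notebook"], cwd=str(project))
```

For example, `launch_jupyter(prepare_project_dir(r"C:\projects"))` would start a session rooted in the same folder used in the prompt example.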

If you walk through this example, your Command Prompt window should look like the following screenshot if you're using Windows OS:

Once complete, the list of files and folders displayed in the Jupyter session will be blank and your session will look similar to the following screenshot:

You should now have the Jupyter software actively running on your workstation, ready to walk through all of the features available, which we will cover next.