Practical Data Analysis Using Jupyter Notebook

Storing and retrieving data files

What I like about using Jupyter is that it is a self-contained solution for data analysis. What I mean by that statement is that you can interact with the filesystem to add, update, and delete folders and files, plus run Python commands, all in one place. As you continue using this tool, I think you will find it much easier to navigate by staying in one ecosystem compared to hopping between multiple windows, apps, or systems on your workstation.

Let's begin by getting comfortable navigating the menu options to add, edit, or delete files. By default, the Jupyter dashboard lists all files and folders that are accessible on your workstation from the directory path where it was installed. This can be configured to change the starting folder, but we will use the Windows default. In the following screenshot, I have highlighted the important sections of the Jupyter dashboard with letters for easy reference:

In the A section, the URL defaults to http://localhost:8888/tree when running on your personal workstation. This URL will change if the notebook is hosted on a server or cloud. Notice that as you select folders or files in the B section, the URL will change to follow the location and path of your selections.
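To make the link between your selections and the address bar concrete, here is a minimal sketch of how a folder path could map onto the dashboard URL. It assumes the default local address with port 8888, and dashboard_url is a hypothetical helper I made up for illustration; it is not part of Jupyter.

```python
from urllib.parse import quote

# Assumption: the default local address when running on a personal workstation.
BASE_URL = "http://localhost:8888"


def dashboard_url(relative_path=""):
    """Build the dashboard URL for a path relative to the starting folder.

    Hypothetical helper for illustration only; not a Jupyter function.
    """
    # Normalize Windows-style separators and trim stray slashes.
    cleaned = relative_path.replace("\\", "/").strip("/")
    url = f"{BASE_URL}/tree"
    if cleaned:
        # Percent-encode spaces and special characters, keeping "/" intact.
        url += "/" + quote(cleaned)
    return url


print(dashboard_url())
print(dashboard_url("Documents\\my project"))
```

The second call shows what happens with a Windows-style path that contains a space: the backslash becomes a forward slash and the space is percent-encoded, which is what you would expect to see in the browser.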

In the B section, you will find a hierarchy of accessible folders and files that are visible to the dashboard. If you click on any file, Jupyter will attempt to open it in the editor, whether or not the file is usable by Jupyter. File extensions that the editor can read include images in formats such as .jpeg, .jpg, and .svg; semi-structured data files such as .json, .csv, and .xml; and code such as .html, .py (Python), and .js (JavaScript). Note that the URL path will change from the tree parameter word to edit as it opens the file.
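If you want to check a folder of files against these groups before clicking through them, a small lookup like the following can help. This is only a sketch: the groups contain just the example extensions listed above (the editor may well accept others), and classify_file is a hypothetical helper name.

```python
from pathlib import Path

# Extension groups taken from the examples in the text; this is not an
# exhaustive list of what the editor can open.
READABLE_EXTENSIONS = {
    "image": {".jpeg", ".jpg", ".svg"},
    "data": {".json", ".csv", ".xml"},
    "code": {".html", ".py", ".js"},
}


def classify_file(filename):
    """Return the group for a file name, or None if its extension is not listed."""
    # Compare in lowercase so that "SALES.CSV" and "sales.csv" match the same group.
    suffix = Path(filename).suffix.lower()
    for group, extensions in READABLE_EXTENSIONS.items():
        if suffix in extensions:
            return group
    return None


for name in ["sales.CSV", "logo.svg", "clean_data.py", "archive.zip"]:
    print(name, "->", classify_file(name))
```

A file that returns None here is a reasonable candidate for the kind of editor error described next.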

If the editor does not recognize a file, it will provide an error in the first line and tell you why, similar to the following screenshot:

In the C section, you can select and filter one or more files or folders displayed on the dashboard. This can be used to organize your project workspace when creating multiple notebooks and organizing data files for analysis. Once any file or folder is selected, the title Select items to perform actions on them will change to the action buttons Rename and Duplicate and a red trashcan icon, which deletes the selected files or folders, as shown in the following screenshot:
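Because you can also run Python commands from the same place, the rename, duplicate, and delete actions have straightforward equivalents in code. The following is a minimal sketch using the standard library, not what the dashboard buttons do internally. It works inside a temporary folder so nothing on your workstation is touched, and the file names are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway folder so no real files are affected.
workspace = Path(tempfile.mkdtemp())

# Create a small sample file to act on (hypothetical name and contents).
original = workspace / "sales.csv"
original.write_text("region,amount\nwest,100\n")

# Rename: the same idea as the Rename button.
renamed = original.rename(workspace / "sales_2020.csv")

# Duplicate: the same idea as the Duplicate button.
duplicate = workspace / "sales_2020-Copy1.csv"
shutil.copy2(renamed, duplicate)

# Delete: the same idea as the trashcan icon. This cannot be undone.
renamed.unlink()

# Only the duplicate should be left at this point.
remaining = sorted(p.name for p in workspace.iterdir())
print(remaining)

# Clean up the throwaway folder.
shutil.rmtree(workspace)
```

Keep in mind that deleting a file this way does not send it to the recycle bin, so it is worth testing on copies first.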

In the dashboard, you will also notice the tabs labeled Files, Running, and Clusters. These are used by the Jupyter app to keep you oriented and track processes that are actively running. Clusters is an advanced feature and beyond the scope of this book. We have already covered the Files tab from section B.

Let's discuss the Running tab. It has two sections: Terminals, which lists system shell sessions such as PowerShell on Windows, and Notebooks, which shows you all active notebooks that are in use. Once we create a few notebooks, I encourage you to refresh the browser to see which notebook files are active to better understand this feature. Use the Shutdown button if it becomes necessary to kill an active notebook that is unresponsive or taking up too much of your computer's resources (CPU/RAM).

In the D section, you will see an Upload button that allows you to add files to the dashboard in any folder you have navigated to. The New button includes a submenu to create a Text File, Folder, or Python 3 Notebook.
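The three options in the New submenu can also be approximated in code, which helps show what ends up on disk. The sketch below assumes the version 4 notebook file layout, where a notebook is a JSON document. It writes only a bare skeleton into a temporary folder; a notebook created through the dashboard normally includes kernel metadata as well. All file and folder names here are made up for the example.

```python
import json
import tempfile
from pathlib import Path

# Work in a temporary folder; delete it manually when you are done exploring.
workspace = Path(tempfile.mkdtemp())

# New > Folder
project = workspace / "analysis_project"
project.mkdir()

# New > Text File
notes = project / "notes.txt"
notes.write_text("Data sources and open questions go here.\n")

# New > Python 3 Notebook: a bare skeleton with no cells.
# Assumption: version 4 notebook layout; kernel metadata is left empty.
empty_notebook = {
    "cells": [],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
notebook = project / "first_notebook.ipynb"
notebook.write_text(json.dumps(empty_notebook, indent=1))

print(sorted(p.name for p in project.iterdir()))
```

Opening the .ipynb file in a text editor is a quick way to see that a notebook is plain JSON, even though the dashboard presents it as an interactive document.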