In this post, I present step-by-step instructions for installing Anaconda, a Python distribution and package manager for data science, along with the data science tools it bundles, such as Spyder and Jupyter Notebook.
What is Anaconda?
If you want to do data science with Python, Anaconda is highly recommended. According to the Anaconda website, “Anaconda® is a package manager, an environment manager, a Python/R data science distribution, and a collection of over 1,500+ open source packages.” In other words, it installs and manages Python packages for data science for you, which is far easier than untangling the interactions and dependencies between the various versions of those packages yourself.
Why do you want to use Anaconda? Again, according to the Anaconda Starter Guide:
“Many scientific packages require a specific version of Python to run. It’s difficult to keep various Python installations on one computer from interacting and breaking, and harder to keep them up-to-date. Anaconda Distribution makes management of multiple Python versions on one computer easier, and provides a large collection of highly optimized, commonly used data science libraries to get you started faster.”
What is included with Anaconda? From that Starter Guide, here are some highlights of what is either installed with Anaconda or can be installed through it:
- NumPy: scientific computing library for Python
- SciPy: scientific computing library for Python
- Matplotlib: 2D plotting library for Python
- Pandas: powerful Python data structures and data analysis toolkit
- Seaborn: statistical graphics library for Python
- Bokeh: interactive web visualization library
- Scikit-Learn: Python modules for machine learning and data mining
- NLTK: natural language toolkit
- Jupyter Notebook: web app that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
- R essentials: 80+ of the most used R packages for data science
Those are just a few of the highlights. There are over 1,500 packages that can be installed with Anaconda.
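Once Anaconda is installed, one quick way to see which of these packages are actually present is to ask Python whether it can locate each one. The following is a minimal sketch using only the standard library. The mapping from display names to import names (for example, sklearn for Scikit-Learn) is my own assumption based on how these projects are commonly imported, not something taken from the Starter Guide, and is_installed is a hypothetical helper name.

```python
import importlib.util

# Display name -> import name. These import names are assumptions based on
# common usage; they are not listed in the Starter Guide.
PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "Pandas": "pandas",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
    "Scikit-Learn": "sklearn",
    "NLTK": "nltk",
    "Jupyter Notebook": "notebook",
}


def is_installed(import_name):
    """Return True if Python can locate the top-level module (without importing it)."""
    return importlib.util.find_spec(import_name) is not None


for display_name, import_name in PACKAGES.items():
    status = "installed" if is_installed(import_name) else "missing"
    print(f"{display_name:18} {status}")
```

Run it with the Python interpreter that Anaconda installs; a package reported as missing can usually be added with conda.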
This post leads you through the steps to install Anaconda on a computer running the Debian Linux distribution. My Debian box (actually, a Vagrant VirtualBox VM running Debian) uses the Xfce desktop environment; you’ll need a GUI to use Anaconda Navigator, but you can use the command-line “conda” program instead if you don’t want one.
Installing Anaconda
The Anaconda website is at https://www.anaconda.com. The documentation for it is at https://docs.anaconda.com/anaconda/. You can find the downloads at https://www.anaconda.com/distribution/; at the time of this writing, the latest version for Linux is Anaconda 2018.12. I’m going to use the Python 3.7 version; if you want to use the Python 2.7 version, download that instead. You can see all the downloads at https://repo.anaconda.com/archive/. Either download the installer with your browser, or download it from a Terminal window. I downloaded it from a Terminal window.
Open a Terminal window on the GUI, and download the latest 64-bit version of the Anaconda installer:
wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh
It’s quite large (652.5M), so it takes a while to download.
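Because the download is so large, it can be worth confirming that the file arrived intact before running it. Here is a minimal sketch that computes the SHA-256 hash of the installer with Python's standard library, so that you can compare the printed value by eye against whatever hash Anaconda publishes for this file. It assumes a system python3 is available, and that the installer sits in the current directory under the file name used in the wget command above; sha256_of is a hypothetical helper name.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed file name: the installer downloaded by the wget command above.
installer = "Anaconda3-2018.12-Linux-x86_64.sh"
if os.path.exists(installer):
    print(sha256_of(installer))
else:
    print("{} not found in the current directory".format(installer))
```

If the printed hash does not match the published one, download the file again rather than running it.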
Installation instructions are at https://docs.anaconda.com/anaconda/install/linux/. To install:
bash Anaconda3-2018.12-Linux-x86_64.sh
Be sure you execute this in the directory that you downloaded the file to!
- Press enter to review the license agreement, hitting space to scroll through the agreement. When asked if you accept the license terms, type yes and press enter to accept.
- Press enter to confirm the install location of /home/username/anaconda3 (my username is vagrant).
- The installer now installs a number of packages, and then asks if you want it to initialize Anaconda in your .bashrc file.
- Typically, you’ll want the installer to update your .bashrc file. Type yes and press enter to do so. You might not want it to if you have multiple installations of Anaconda, but that’s pretty rare.
- After your .bashrc file is updated, you’ll be asked if you want to install Microsoft VSCode.
- I’m not going to install it. Type no and press enter to NOT install Microsoft VSCode. Note: if you want to install it, you would need to run the installer as root.
- Close and re-open the Terminal window so that the changes to your .bashrc file are read in.
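With the new Terminal window open, you can confirm that the .bashrc change took effect by checking that the Anaconda commands are now found on your PATH. This is a minimal sketch, assuming you let the installer update .bashrc as described above; find_commands is a hypothetical helper name.

```python
import shutil


def find_commands(names):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


for name, path in find_commands(["conda", "anaconda-navigator", "python"]).items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If everything went well, the paths printed should point into the anaconda3 directory you chose during installation. If a command is not found, check that you really did open a fresh Terminal window.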
You can check for successful installation by running “anaconda-navigator” from the terminal.
anaconda-navigator
When it starts up, you’ll be asked if you want to provide anonymized usage information. If you’re OK with that, click the “OK, and don’t show again” button (otherwise, it asks every time you start it up); if not, uncheck the “Yes, I’d like to help improve Anaconda.” checkbox, then click the “OK, and don’t show again” button.
And here’s Anaconda Navigator in all its glory!
Notice the applications that have been installed. I’ve got:
- Jupyter Lab 0.35.3
- Jupyter Notebook 5.7.4
- Jupyter Qt console 4.4.3
- Spyder 3.3.2
Exit anaconda-navigator (File / Quit). If told that Anaconda Navigator is still busy, click Yes to exit.
The installer and the packages that it includes are quite large. Let’s free up some space now. First, delete the file Anaconda3-2018.12-Linux-x86_64.sh.
rm Anaconda3-2018.12-Linux-x86_64.sh
Then, using conda, the command-line package manager that comes with Anaconda, remove the downloaded package tarballs. From the Terminal, run:
conda clean --tarballs
You’ll be given a list of all of the package installers that will be deleted, as well as the size of those files. You’ll then be asked whether to proceed; type y and press enter to delete them.
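If you're curious how much space this step reclaims, you can measure the tarballs yourself, once before running the command and once after. This is a minimal sketch, assuming Anaconda was installed to the default location (so that the package cache is the pkgs directory under ~/anaconda3) and that the cached packages are stored as .tar.bz2 files; tarball_bytes is a hypothetical helper name.

```python
from pathlib import Path


def tarball_bytes(pkgs_dir):
    """Total size in bytes of the .tar.bz2 files directly inside pkgs_dir."""
    pkgs_dir = Path(pkgs_dir)
    if not pkgs_dir.is_dir():
        return 0
    return sum(f.stat().st_size for f in pkgs_dir.glob("*.tar.bz2") if f.is_file())


# Assumed location: the package cache under the default install directory.
pkgs = Path.home() / "anaconda3" / "pkgs"
print(f"{tarball_bytes(pkgs) / 1024**2:.1f} MiB of package tarballs in {pkgs}")
```

The figure will not match conda's own report exactly if your install location or cache layout differs from these assumptions.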
Updating packages installed with Anaconda
Now, let’s make sure we’ve got the latest versions of all the packages that were installed with Anaconda. Rather than using Anaconda Navigator, we’ll use the command-line tool, conda. The reason is that Navigator makes you update packages one at a time; with conda, you can update everything at once.
In your Terminal window, execute:
conda update --all
This will check to see which packages can be updated, list them, and ask you to proceed. Type y and press enter.
After doing this, free up space again:
conda clean --tarballs
And with that, you’ve got Anaconda installed and up to date!
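As a final sanity check, you can print the versions of a few key packages; running this before and after the update shows what changed. This is a minimal sketch: the short list of import names is my own choice, version_of is a hypothetical helper name, and I'm relying on the common (but not universal) convention that a package exposes its version as a __version__ attribute.

```python
import importlib


def version_of(import_name):
    """Return the package's __version__ string, or None if it is missing or cannot be imported."""
    try:
        module = importlib.import_module(import_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Assumed selection of packages to report on; adjust to taste.
for name in ["numpy", "scipy", "pandas", "matplotlib", "notebook"]:
    print(f"{name:12} {version_of(name) or 'not available'}")
```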