Kaggle to Google Colab

Every data science and machine learning enthusiast has heard of two popular names: Kaggle and Google Colab. For newcomers, here is a quick introduction to both.

  1. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
  2. Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education.

Many newcomers struggle to download datasets from Kaggle into Google Colab. The easiest way I have found is to go through Google Drive: the datasets are stored in Drive so that Colab can use them later. Let's get right into it.

Follow the steps below carefully:

Step 1: Create your Kaggle API Token

  • On Kaggle, go to Your Profile and click Edit Profile.
  • Scroll down to the API section and click the Create New API Token button.
[Screenshot: API section]
  • A file named kaggle.json, containing your username and API key, will be downloaded.
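Because kaggle.json is a small JSON file holding your username and API key, a quick check can confirm you have the right file before uploading it. The sketch below assumes the file uses the `username` and `key` fields that Kaggle's token download provides; the helper name `load_kaggle_credentials` is hypothetical, and it returns the key rather than printing it so the key does not end up in a shared notebook's output.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Read a kaggle.json file and return (username, key).

    Hypothetical helper: assumes the token file contains the
    "username" and "key" fields written by Kaggle's token download.
    """
    data = json.loads(Path(path).read_text())
    # Collect any required field that is absent or empty.
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data["username"], data["key"]
```

For example, `username, _ = load_kaggle_credentials("kaggle.json")` lets you confirm which account the token belongs to while keeping the key itself out of any printed output.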

Step 2: Upload kaggle.json to Google Drive

  • Create a folder in Google Drive (suggested name: Kaggle) where we will store our Kaggle datasets.
  • Upload the downloaded kaggle.json file to this folder.
[Screenshot: kaggle.json uploaded to Google Drive]

Step 3: Open Colab notebook

Open the Google Colab notebook in which you want to use the Kaggle datasets.

Step 4: Mount Google Drive to Google Colab notebook

  • Run the following code to mount your Google Drive:
from google.colab import drive
drive.mount('/content/gdrive/')
  • Click the link and sign in with your Google account
[Screenshot: Authorize your Google account]
  • Copy the authentication code
  • Paste the code into the input box in the cell output
  • Congrats! Your Google Drive is now mounted.
[Screenshot: Google Drive mounted]

Step 5: Configure Kaggle

The code below sets KAGGLE_CONFIG_DIR, the directory in which the Kaggle command-line tool looks for kaggle.json. If you used a different folder name or location for kaggle.json, replace /Kaggle in the path below accordingly.

import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/Kaggle"
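A common mistake is pointing KAGGLE_CONFIG_DIR at the kaggle.json file itself instead of the folder that contains it. Since `!` commands in Colab run in a shell that inherits the notebook's environment, setting the variable through `os.environ` is enough for the later `kaggle` commands. The sketch below uses a hypothetical helper, `set_kaggle_config_dir`, that accepts either the folder or the full path to kaggle.json and fails early if the file is not there.

```python
import os
from pathlib import Path


def set_kaggle_config_dir(location):
    """Point KAGGLE_CONFIG_DIR at the folder holding kaggle.json.

    Hypothetical helper: `location` may be the folder itself or the full
    path to kaggle.json; either way the variable is set to the folder.
    """
    path = Path(location)
    folder = path.parent if path.name == "kaggle.json" else path
    # Fail now rather than when the kaggle command runs later.
    if not (folder / "kaggle.json").is_file():
        raise FileNotFoundError(f"kaggle.json not found in {folder}")
    os.environ["KAGGLE_CONFIG_DIR"] = str(folder)
    return str(folder)
```

In the notebook this would be called as `set_kaggle_config_dir("/content/gdrive/My Drive/Kaggle")`, adjusting the path if your folder has a different name.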

Step 6: Change the present working directory

The code below changes the present working directory to /content/gdrive/My Drive/Kaggle, so the datasets downloaded in the next step are saved in that Google Drive folder:

%cd /content/gdrive/My Drive/Kaggle
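`%cd` is an IPython magic command, so it only works inside notebook cells. In plain Python, `os.chdir` does the same job. The sketch below is an illustration only: the helper `enter_dataset_dir` is hypothetical, and it also creates the folder if it does not exist yet, which the original step does not do.

```python
import os


def enter_dataset_dir(path):
    """Hypothetical helper: create `path` if needed, make it the working
    directory, and return the new working directory."""
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    return os.getcwd()


# Same folder as in this step; change it if you used a different one.
DATASET_DIR = "/content/gdrive/My Drive/Kaggle"
```

Calling `enter_dataset_dir(DATASET_DIR)` in a Colab cell has the same effect as the `%cd` line above. The space in "My Drive" needs no escaping because the path is an ordinary Python string.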

Step 7: Download the Kaggle datasets

  • Go to the dataset's page on Kaggle and click Copy API Command
  • The copied command will look like kaggle datasets download -d <username>/<dataset-name>
  • Run it in a Colab cell, prefixed with the ! symbol so that it executes as a shell command:
!kaggle datasets download -d heptapod/titanic
  • You can check the downloaded file with the ls command:
!ls
[Screenshot: Titanic dataset downloaded from Kaggle as zip file]
  • Note that datasets are downloaded as zip files, which you would otherwise have to unzip manually. Adding the --unzip flag extracts the files right after the download and deletes the zip file:
!kaggle datasets download -d heptapod/titanic --unzip
[Screenshot: Titanic dataset downloaded and instantly unzipped]
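If you downloaded without --unzip, the archive can be extracted from Python with the standard-library zipfile module. The sketch below mirrors the --unzip behaviour described above (extract, then delete the zip). The helper name `unzip_and_remove` is hypothetical, and the archive name titanic.zip in the usage note is an assumption, so check the actual file name with ls first.

```python
import os
import zipfile


def unzip_and_remove(zip_path, dest="."):
    """Hypothetical helper: extract every member of `zip_path` into `dest`,
    then delete the zip. Returns the list of extracted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    # The archive is closed here, so it is safe to delete.
    os.remove(zip_path)
    return names
```

For example, `unzip_and_remove("titanic.zip")` run from the Kaggle folder would leave only the extracted files in Google Drive.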

Congratulations! We have successfully downloaded a Kaggle dataset to Google Drive, ready to use in Google Colab. Happy learning…

Reference:

  1. Refer to the Kaggle API documentation to learn more about the Kaggle API command-line tool.

How to download Kaggle Datasets into Google Colab via Google Drive was originally published in Analytics Vidhya on Medium.