How to Use R with Your Data Stored in Google Drive

Freddy Domínguez
4 min read · Dec 19, 2022

The goal of this post is to show how to use R in Colaboratory, in the classic environment, with our data stored in Google Drive.

Running R in Google Drive, widescreen, powered by Adobe Firefly

First, we upload the data to Drive. Here you can find the comma-separated values (CSV) file, which contains sowing prices in Peru from January 2020 to July 2021. All the data was collected from the open data sources of the Peruvian government.

Drive folder containing the CSV file

The latest version of Colaboratory does not support the R magic command (updated: December 19, 2022).

Colaboratory error message for R magic command

To avoid this error, we install an earlier version of the rpy2 package that works correctly:

!pip install -q rpy2==3.5.1
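
To confirm that the pinned version is the one actually installed, a small check from a Python cell can help. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# In the notebook, this is expected to report the version pinned above.
print("rpy2 version:", installed_version("rpy2"))
```

If the printed version is not the pinned one, restarting the runtime after the install is usually worth trying before going further.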

To make the notebook recognize R code in a cell, we need to load the rpy2 extension:

%load_ext rpy2.ipython

To run R code in the notebook, we must tell the notebook that the cell contains R. rpy2 provides two essential magic commands for this. The first, %R, runs a single line of R and must be placed at the beginning of that line. The second, %%R, must be on the first line of the cell, followed by the whole body of R statements:

%R data <- c('banana')

%%R
install.packages('reshape2')
library(reshape2)
install.packages('lubridate')
library(lubridate)
library(plyr)
library(tidyverse)
library(ggplot2)
library(data.table)
install.packages('DBI')
library(DBI)
install.packages('RMySQL')
library(RMySQL)

As the examples above show, we can install R packages just as we would in any other R environment.

Now let's read the file from our Drive. We mount Google Drive in the Colaboratory environment and then move into the directory that contains the CSV file:

from google.colab import drive
drive.mount('/content/drive')
path="/content/drive/My Drive/Colab Notebooks/"
%cd {path}
  1. Mount Google Drive.
  2. Store the file location in a Python variable.
  3. Use the cd magic command to change the working directory.
Mount drive (1)
Mount drive (2)

To check that everything is correct, we inspect the current directory with the ls magic (%ls); the CSV file should be listed.
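
The same check can also be scripted from a Python cell, which gives a clearer error when Drive is not mounted or the path is misspelled. This is only a sketch: the helper `list_csv_files` is a made-up name, and the demonstration runs on a temporary folder rather than on the real Drive path.

```python
from pathlib import Path
import tempfile


def list_csv_files(folder):
    """Return the sorted names of the .csv files directly inside `folder`."""
    directory = Path(folder)
    if not directory.is_dir():
        # Typically means Drive is not mounted or the path is misspelled.
        raise FileNotFoundError(f"Folder not found: {directory}")
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


# Demonstration on a temporary folder, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "PERU.GOB.PE.csv").write_text("Producto\n")
    (Path(tmp) / "notes.txt").write_text("not a CSV\n")
    demo_files = list_csv_files(tmp)

print(demo_files)
```

In the notebook, calling `list_csv_files(path)` with the `path` variable defined earlier should list `PERU.GOB.PE.csv`.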

Now let's do some data analysis:

%%R
precios <- read.csv(file = 'PERU.GOB.PE.csv',sep=',')

We can read the first row:

%%R
head(precios,1)
output (1)

Display all the column names of the dataset:

%%R
names(precios)
output (2)
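
For readers who want to peek at a CSV from Python before handing it to R, the header and first row can be read with the standard csv module. The sample below is made up and only assumes a wide layout (one row per product, one column per month); the real file's columns may differ. The helper name `peek` is hypothetical.

```python
import csv
import io

# Made-up sample in the assumed wide layout: one row per product,
# one column per month. The real file's columns may differ.
SAMPLE = """Producto,Jan.20,Feb.20
PAPA,1.25,1.30
ARROZ,2.10,
"""


def peek(csv_file):
    """Return (column names, first data row) from an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)
    first_row = next(reader, None)  # None when the file has no data rows
    return header, first_row


header, first_row = peek(io.StringIO(SAMPLE))
print(header)
print(first_row)
```

With a real file, the same function can be given `open('PERU.GOB.PE.csv', newline='')` instead of the in-memory sample.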

We can convert the name of each product to lower case:

%%R
precios$Producto <- tolower(precios$Producto)
head(precios)
output (3)

Let's melt the data by Producto: every column other than "Producto" is collapsed into a single column (Fecha), and each of its values, the price of the product in that month, goes into another column (Precio):

%%R
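library(reshape2)
# Wide to long: one row per product and month
precios_melt <- melt(precios,variable.name = "Fecha",value.name = "Precio")
# Month labels in "%b.%y" format become dates (parse_date_time comes from lubridate)
precios_melt$Fecha <- as.Date(parse_date_time(precios_melt$Fecha, "%b.%y"))
names(precios_melt) <- toupper(names(precios_melt))
str(precios_melt)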
output (4)
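
To make the reshaping concrete, here is a plain-Python sketch of what melting does, followed by the date parsing. It is not reshape2's implementation: the rows are made up, the column labels such as "Jan.20" are an assumption based on the "%b.%y" format, and the resulting row order differs from reshape2's.

```python
from datetime import datetime

# Made-up rows in the assumed wide layout (one column per month).
wide_rows = [
    {"Producto": "papa", "Jan.20": 1.25, "Feb.20": 1.30},
    {"Producto": "arroz", "Jan.20": 2.10, "Feb.20": None},
]


def melt(rows, id_column, variable_name, value_name):
    """Turn each non-id column of each row into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = melt(wide_rows, "Producto", "Fecha", "Precio")

# Parse month labels such as "Jan.20" into dates (first day of the month).
# %b depends on the locale; English abbreviations are assumed here.
for row in long_rows:
    row["Fecha"] = datetime.strptime(row["Fecha"], "%b.%y").date()

for row in long_rows:
    print(row)
```

Two wide rows with two month columns each become four long rows, one per product and month, which is the shape the later cleaning steps work on.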

Continuing with the cleaning process, we omit the NA values, round the prices to 2 decimal places in accordance with Peruvian regulations, and discard values that are not valid (prices that are not greater than zero).

%%R
precios_melt <- na.omit(precios_melt)
precios_melt$PRECIO <- round(precios_melt$PRECIO,2)
precios_melt <- precios_melt[precios_melt$PRECIO>0,]
head(precios_melt)
output (5)
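
The three cleaning rules can be sketched in plain Python as well. The rows are made up, `None` stands in for R's NA, and `clean_prices` is a hypothetical helper; unlike na.omit, which drops a row when any column is NA, this sketch only checks the price.

```python
# Made-up long-format rows; None stands in for R's NA.
rows = [
    {"PRODUCTO": "papa", "PRECIO": 1.256},
    {"PRODUCTO": "arroz", "PRECIO": None},
    {"PRODUCTO": "maiz", "PRECIO": 0.0},
    {"PRODUCTO": "trigo", "PRECIO": -3.0},
]


def clean_prices(rows):
    """Drop missing prices, round to 2 decimals, keep only positive prices."""
    cleaned = []
    for row in rows:
        price = row["PRECIO"]
        if price is None:
            continue  # plays the role of na.omit for the price column
        price = round(price, 2)
        if price > 0:  # zero or negative prices are treated as invalid
            cleaned.append({**row, "PRECIO": price})
    return cleaned


cleaned = clean_prices(rows)
print(cleaned)
```

Only the "papa" row survives here, with its price rounded to 1.26.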

Conclusion

As we can see, Colaboratory is a great advantage when the necessary tools are not installed on our assigned machine and we urgently need to run R scripts to inspect and handle large datasets stored in Google Drive.

You can find the Colaboratory notebook here.

Thank you for being here. Please share your views in the comments; if any mistakes are found, the article will be updated.

Freddy Domínguez

Peruvian #Software #Engineer CIP 206863, #Business #Intelligence #Data #Science. I work with people to create ideas, deliver outcomes, and drive change