
Visualizations with R and Python

Visualizing the R for Excel workshop with both R and Python
Author

Jeffrey Sumner

Published

May 8, 2023

library(reticulate)
reticulate::use_virtualenv("r_python_worksessions")

Reading data into R/Python

Below we read our data from the /data directory with both R and Python. R users will use the readr and readxl packages, while Python users will read the data with pandas. Let’s look at each in turn.

For R we will need to initialize a few packages.

library(tidyverse)
library(readxl)
library(here)

ca_np <- readr::read_csv(here::here("data", "ca_np.csv"))
ci_np <- readxl::read_xlsx(here::here("data", "ci_np.xlsx"))

glimpse(ca_np)
Rows: 789
Columns: 7
$ region    <chr> "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", "PW", …
$ state     <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", …
$ code      <chr> "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHIS", "CHI…
$ park_name <chr> "Channel Islands National Park", "Channel Islands National P…
$ type      <chr> "National Park", "National Park", "National Park", "National…
$ visitors  <dbl> 1200, 1500, 1600, 300, 15700, 31000, 33100, 32000, 24400, 31…
$ year      <dbl> 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, …

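For Python we will need to import pandas and Path from pathlib. Path plays a role similar to the here() function used in the R code: it builds file paths in a platform-independent way. Unlike here(), a relative path such as ../../data is resolved against the current working directory rather than the project root.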

import pandas as pd
from pathlib import Path
ca_np = pd.read_csv(Path("../../data/ca_np.csv"))
ci_np = pd.read_excel(Path("../../data/ci_np.xlsx"), engine = 'openpyxl')
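
The relative path `../../data` only works when the code runs from the expected folder. A closer analogue to `here::here()` can be sketched with pathlib by walking up the directory tree until a project marker is found. The helper names `find_project_root` and `project_path`, and the marker files `.git` and `.here`, are assumptions made for illustration; they are not part of pathlib or pandas.

```python
from pathlib import Path


def find_project_root(start=None, markers=(".git", ".here")):
    """Walk upward from `start` until a directory containing a marker is found.

    Hypothetical helper: the name and the marker list are illustrative assumptions.
    """
    current = Path(start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent in turn
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No project marker {markers} found above {current}")


def project_path(*parts, start=None):
    """Build a path relative to the project root, much like here::here() does."""
    return find_project_root(start).joinpath(*parts)
```

With a helper like this, `pd.read_csv(project_path("data", "ca_np.csv"))` would resolve to the same file regardless of which subfolder the script is run from, provided a marker file exists at the project root.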

ca_np.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 789 entries, 0 to 788
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     789 non-null    object
 1   state      789 non-null    object
 2   code       789 non-null    object
 3   park_name  789 non-null    object
 4   type       789 non-null    object
 5   visitors   789 non-null    int64 
 6   year       789 non-null    int64 
dtypes: int64(2), object(5)
memory usage: 43.3+ KB
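
Comparing the two outputs, readr parsed `visitors` and `year` as doubles while pandas inferred `int64`. If both sessions need to agree on types, the columns can be cast explicitly. The sketch below rebuilds three columns from the first three rows of the `glimpse()` output as an inline frame (an assumption, so that the example runs without the data files) and casts the numeric columns to float.

```python
import pandas as pd

# First three values of code, visitors, and year as shown in the glimpse()
# output; built inline (an assumption) so the example runs without /data.
sample = pd.DataFrame(
    {
        "code": ["CHIS", "CHIS", "CHIS"],
        "visitors": [1200, 1500, 1600],
        "year": [1963, 1964, 1965],
    }
)

# pandas infers integer columns here, as it did for the full file
print(sample.dtypes)

# Cast to float64 to match the <dbl> columns that readr produced
as_r = sample.astype({"visitors": "float64", "year": "float64"})
print(as_r.dtypes)
```

Whether to cast at all is a judgment call: integer columns are usually fine for plotting, and the difference mainly matters when data is passed between the R and Python sessions.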

Creating visuals: Visitors to Channel Islands NP

Below we will create our first basic visual using both R and Python. With the R code we will use only ggplot2. For Python, we will use plotnine, whose syntax is modeled on R’s ggplot2.

The R version needs no new packages: ggplot2 was already loaded as part of the tidyverse above.

ggplot2::ggplot(data = ci_np, aes(x = year, y = visitors)) +
  geom_line()

import matplotlib.pyplot as plt
from plotnine import ggplot, aes, geom_line
plt.switch_backend('agg')

# Creating the plot
plot = (
    ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
    + geom_line()
)

# Displaying the plot (show() replaces the deprecated print(plot))
plot.show()

Now that we have a basic plot in each language, let’s create a base plot object and build variations from it, similar to the approach in the R for Excel workshop.

gg_base_r <- ggplot(data = ci_np, aes(x = year, y = visitors))

gg_base_r +
  geom_point()

from plotnine import ggplot, aes, geom_point
gg_base_py = ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
# show() replaces the deprecated repr-based display of the plot
(gg_base_py + geom_point()).show()
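
The base object can be reused because adding a layer returns a new plot and leaves the base untouched. The toy class below illustrates that pattern in plain Python. `ToyPlot` and its fields are hypothetical names invented for this sketch, and it is not how plotnine or ggplot2 is implemented internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPlot:
    """Hypothetical stand-in for a plot object: a mapping plus a tuple of layers."""

    mapping: dict
    layers: tuple = ()

    def __add__(self, layer):
        # Return a new object so the base plot is never modified
        return ToyPlot(self.mapping, self.layers + (layer,))


base = ToyPlot({"x": "year", "y": "visitors"})
points = base + "geom_point"
lines = base + "geom_line"

print(base.layers)    # ()
print(points.layers)  # ('geom_point',)
print(lines.layers)   # ('geom_line',)
```

Because `base` never changes, any number of variations can be derived from it without one affecting another, which is what makes `gg_base_r` and `gg_base_py` convenient starting points.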