Visualizations with R and Python

R
Python
Visualization
Visualizing the R for Excel workshop with both R and Python
Author

Jeffrey Sumner

Published

May 8, 2023

Reading data into R/Python

Below we read the data from this post’s local data/ directory with both R and Python. R users will use the readr and readxl packages, while Python users will read the data with pandas. Let’s view the examples below.

For R we will need to initialize a few packages.

library(readr)
library(readxl)

ca_np <- readr::read_csv("data/ca_np.csv")
ci_np <- readxl::read_xlsx("data/ci_np.xlsx")

str(ca_np)
spc_tbl_ [789 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ region   : chr [1:789] "PW" "PW" "PW" "PW" ...
 $ state    : chr [1:789] "CA" "CA" "CA" "CA" ...
 $ code     : chr [1:789] "CHIS" "CHIS" "CHIS" "CHIS" ...
 $ park_name: chr [1:789] "Channel Islands National Park" "Channel Islands National Park" "Channel Islands National Park" "Channel Islands National Park" ...
 $ type     : chr [1:789] "National Park" "National Park" "National Park" "National Park" ...
 $ visitors : num [1:789] 1200 1500 1600 300 15700 ...
 $ year     : num [1:789] 1963 1964 1965 1966 1967 ...
 - attr(*, "spec")=
  .. cols(
  ..   region = col_character(),
  ..   state = col_character(),
  ..   code = col_character(),
  ..   park_name = col_character(),
  ..   type = col_character(),
  ..   visitors = col_double(),
  ..   year = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

For Python, we will need to import pandas and Path from pathlib. Path plays a role similar to the here function that the R book uses: it gives us a path object to pass to the reader functions instead of a bare string.

import pandas as pd
from pathlib import Path

ca_np_path = Path("data/ca_np.csv")
ci_np_path = Path("data/ci_np.xlsx")

ca_np = pd.read_csv(ca_np_path)
ci_np = pd.read_excel(ci_np_path, engine='openpyxl')

ca_np.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 789 entries, 0 to 788
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   region     789 non-null    object
 1   state      789 non-null    object
 2   code       789 non-null    object
 3   park_name  789 non-null    object
 4   type       789 non-null    object
 5   visitors   789 non-null    int64 
 6   year       789 non-null    int64 
dtypes: int64(2), object(5)
memory usage: 43.3+ KB
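
The comparison between Path and here can be made a bit more concrete. The sketch below assumes that here resolves files relative to a project root, and imitates that by walking up from a starting directory until it finds a folder that contains data/. The helper name find_project_root and its marker argument are hypothetical; they are not part of pathlib or of this post’s code.

```python
import tempfile
from pathlib import Path


def find_project_root(start, marker="data"):
    """Hypothetical helper: return the nearest ancestor of `start`
    (including `start` itself) that contains a `marker` directory."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains {marker}/")


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "data").mkdir()
    nested = root / "posts" / "viz"
    nested.mkdir(parents=True)

    # Build the CSV path from the discovered root, not the working directory.
    csv_path = find_project_root(nested) / "data" / "ca_np.csv"
    print(csv_path.relative_to(root).as_posix())
```

In a real project the same idea would read find_project_root(Path.cwd()) / "data" / "ca_np.csv", so the script keeps working when it is run from a subfolder.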

Creating visuals: Visitors to Channel Islands NP

Below we create our first basic visual using both R and Python. The R code uses only ggplot2. For Python, we will use plotnine, which follows ggplot2’s syntax.

For R, we call the ggplot2 functions with the ggplot2:: prefix, so no library() call is needed.

ggplot2::ggplot(data = ci_np, ggplot2::aes(x = year, y = visitors)) +
  ggplot2::geom_line()

For Python, we import the plotnine functions and switch matplotlib to the non-interactive agg backend.

import matplotlib.pyplot as plt
from plotnine import ggplot, aes, geom_line
plt.switch_backend('agg')

# Creating the plot
plot = (
    ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
    + geom_line()
)

# Displaying the plot
print(plot)
<ggplot: (640 x 480)>

Now that we have a basic plot in each language, let’s create a base plot object and build variations from it, similar to the approach in the R for Excel workshop.

gg_base_r <- ggplot2::ggplot(data = ci_np, ggplot2::aes(x = year, y = visitors))

gg_base_r +
  ggplot2::geom_point()

from plotnine import ggplot, aes, geom_point
gg_base_py = ggplot(data=ci_np, mapping=aes(x='year', y='visitors'))
gg_base_py + geom_point()

You can find the public source repository for this and other posts at JeffreySumner/rpy-blog.