# String Basics

Wednesday August 31, 2016

# What are strings?

The simplest distinction:

• Character: a symbol in a written language, like letters, numerals, punctuation, space, etc.

• String: a sequence of characters bound together

class("r")
## [1] "character"
class("Ryan")
## [1] "character"

# Why do we care?

• A lot of interesting data out there is in character form!
• Webpages, emails, surveys, logs, search queries, etc.
• Even if you just care about numbers eventually, you'll need to understand how to get numbers from text

# How to make strings

Just use double quotes or single quotes and type anything in between

str.1 = "Statistical"
str.2 = 'Computing'

We often prefer double quotes to single quotes, because then the string can contain apostrophes without any escaping

str.3 = "isn't that bad"
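
When a string has to contain the same kind of quote that delimits it, the quote can be escaped with a backslash. A minimal sketch; the names str.4 and str.5 are made up for illustration and are not used elsewhere in these slides:

```r
# Escape a double quote inside a double-quoted string
str.4 = "She said \"hello\""

# Escape an apostrophe inside a single-quoted string
str.5 = 'isn\'t that bad'

# Both quoting styles produce the same string
identical(str.5, "isn't that bad")
## [1] TRUE
```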

# Whitespaces

Whitespaces count as characters and can be included in strings:

• " " for space
• "\n" for newline
• "\t" for tab

message = "Dear Mr. Carnegie,\n\nThanks for the great school!\n\nSincerely, Ryan"

# Printing strings

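Typing a string's name at the console shows it in quotes, with escape sequences such as \n left as typed. To print the formatted text, use the cat() function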

message
## [1] "Dear Mr. Carnegie,\n\nThanks for the great school!\n\nSincerely, Ryan"
cat(message)
## Dear Mr. Carnegie,
##
## Thanks for the great school!
##
## Sincerely, Ryan
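
A small sketch of how print() and cat() differ, and how cat() handles several arguments. The string greeting is a made-up example, not part of the original slides:

```r
greeting = "Line one\n\tLine two (indented with a tab)"

print(greeting)     # Shows the string in quotes, escape sequences as typed
cat(greeting, "\n") # Interprets \n and \t; cat() adds no trailing newline on its own

# cat() accepts several arguments, joined by a space unless sep says otherwise
cat("Statistical", "Computing", "\n")
cat("Statistical", "Computing", "\n", sep = "")

# An escape sequence such as "\n" counts as a single character
nchar("\n")
## [1] 1
```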

# Vectors of strings

The character is a basic data type in R (like numeric, or logical), so we can make vectors out of them, just like we would with numbers

str.vec = c(str.1, str.2, str.3) # Collect 3 strings
str.vec # All elements of the vector
## [1] "Statistical"    "Computing"      "isn't that bad"
str.vec[3] # The 3rd element
## [1] "isn't that bad"
str.vec[-(1:2)] # All but the 1st and 2nd
## [1] "isn't that bad"
head(str.vec, 2) # The first 2 elements
## [1] "Statistical" "Computing"
tail(str.vec, 2) # The last 2 elements
## [1] "Computing"      "isn't that bad"
rev(str.vec) # Reverse the order
## [1] "isn't that bad" "Computing"      "Statistical"
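
Since character vectors behave like any other vector, logical indexing and type coercion work as usual. A short sketch, assuming a fresh vector words with the same contents as str.vec (the names words and mixed are illustrative only):

```r
words = c("Statistical", "Computing", "isn't that bad")

length(words) # Number of strings in the vector
## [1] 3
nchar(words) # Number of characters in each string
## [1] 11  9 14
words[nchar(words) > 10] # Logical indexing, just as with numbers
## [1] "Statistical"    "isn't that bad"

mixed = c("a", 1, TRUE) # Mixing types coerces everything to character
class(mixed)
## [1] "character"
mixed
## [1] "a"    "1"    "TRUE"
```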

# Matrices of strings

Same idea with matrices

str.mat = matrix("", 2, 3) # Build an empty 2 x 3 matrix
str.mat[1,] = str.vec # Fill the 1st row with str.vec
str.mat[2,1:2] = str.vec[1:2] # Fill the 2nd row, only entries 1 and 2, with those of str.vec
str.mat[2,3] = "isn't a fad" # Fill the 2nd row, 3rd entry, with a new string
str.mat # All elements of the matrix
##      [,1]          [,2]        [,3]
## [1,] "Statistical" "Computing" "isn't that bad"
## [2,] "Statistical" "Computing" "isn't a fad"
t(str.mat) # Transpose of the matrix
##      [,1]             [,2]
## [1,] "Statistical"    "Statistical"
## [2,] "Computing"      "Computing"
## [3,] "isn't that bad" "isn't a fad"
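
A string matrix can also be built directly from a vector. The sketch below uses made-up single-letter strings to show that matrix() fills column by column unless byrow = TRUE is given:

```r
letters.vec = c("a", "b", "c", "d", "e", "f") # Hypothetical example data

m = matrix(letters.vec, nrow = 2) # Filled column by column (the default)
m[1,] # 1st row
## [1] "a" "c" "e"

m.byrow = matrix(letters.vec, nrow = 2, byrow = TRUE) # Filled row by row
m.byrow[1,] # 1st row
## [1] "a" "b" "c"

dim(m) # Dimensions work as for numeric matrices
## [1] 2 3
```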

# Converting other data types to strings

Easy! Make things into strings with as.character()

as.character(0.8)
## [1] "0.8"
as.character(0.8e+10)
## [1] "8e+09"
as.character(1:5)
## [1] "1" "2" "3" "4" "5"
as.character(TRUE)
## [1] "TRUE"
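
as.character() picks the format for you, which is why the large number above comes out in scientific notation. When more control is needed, base R's format() and sprintf() are one option. A minimal sketch; the variable x is illustrative:

```r
x = 0.8e+10

as.character(x) # Scientific notation, as above
## [1] "8e+09"
format(x, scientific = FALSE) # Force fixed notation
## [1] "8000000000"
sprintf("%.2f", 3.14159) # Fixed number of decimal places
## [1] "3.14"
```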

# Converting strings to other data types

Not as easy! The result depends on the given string, of course: anything that can't be parsed becomes NA

as.numeric("0.5")
## [1] 0.5
as.numeric("0.5 ")
## [1] 0.5
as.numeric("0.5e-10")
## [1] 5e-11
as.numeric("Hi!")
## Warning: NAs introduced by coercion
## [1] NA
as.logical("True")
## [1] TRUE
as.logical("TRU")
## [1] NA
as.numeric(c("0.5", "TRUE"))
## Warning: NAs introduced by coercion
## [1] 0.5  NA
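
In practice it helps to find out which strings failed to convert. One possible approach, sketched with made-up data (raw, nums and failed are hypothetical names), is to convert first and then look for the new NAs:

```r
raw = c("0.5", " 12 ", "TRUE", "Hi!", "1e3")

# suppressWarnings() hides the coercion warning; failures still become NA
nums = suppressWarnings(as.numeric(raw))
nums
## [1]    0.5   12.0     NA     NA 1000.0

# Entries that were not NA before conversion but are NA afterwards
failed = is.na(nums) & !is.na(raw)
raw[failed]
## [1] "TRUE" "Hi!"
```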

# Converting to lower or upper case

Use the tolower() or toupper() functions

tolower("I'M NOT ANGRY I SWEAR")
## [1] "i'm not angry i swear"
toupper("Mom, I don't want my veggies")
## [1] "MOM, I DON'T WANT MY VEGGIES"
toupper("Hulk, sMasH")
## [1] "HULK, SMASH"
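
A common use of tolower() is comparing strings without regard to case. A brief sketch on a made-up vector of survey answers:

```r
answers = c("Yes", "YES", "yes", "No") # Hypothetical example data

answers == "yes" # Exact comparison is case-sensitive
## [1] FALSE FALSE  TRUE FALSE
tolower(answers) == "yes" # Normalize the case first, then compare
## [1]  TRUE  TRUE  TRUE FALSE
```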