CSE

  • Associate Analytics

    UNIT - I

    Introduction to R, RStudio (GUI)

    The R environment

    R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

    It has

    * An effective data handling and storage facility.

    * A suite of operators for calculations on arrays, in particular matrices.

    * A large, coherent, integrated collection of intermediate tools for data analysis.

    * Graphical facilities for data analysis and display.

    * A well developed, simple and effective programming language.

    R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly and has been extended by a large collection of packages.

    R can be regarded as an implementation of the S language, which was developed at Bell Laboratories by Rick Becker, John Chambers and Allan Wilks.

    R and Statistics

    Many people use R as a statistical system. It is an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. About 25 packages are supplied with R (the “standard” and “recommended” packages), and many more are available through the CRAN family of Internet sites.

    Obtaining and installing R

    R can be downloaded from the ‘Comprehensive R Archive Network’ (CRAN). The base file has a size of around 29MB.

    R and the window environment

    To use R, you first need to install the R program on your computer.

    1. Go to http://cran.r-project.org.

    2. Under “Download and Install R”, click on the “Windows” link.

    3. Under “Subdirectories”, click on the “base” link.

    4. On the next page, you should see a link saying something like “Download R 2.10.1 for Windows” (or R X.X.X, where X.X.X gives the version of R, eg. R 2.11.1). Click on this link.

    5. You may be asked if you want to save or run a file “R-2.10.1-win32.exe”. Choose “Save” and save the file on the Desktop. Then double-click on the icon for the file to run it.

    6. You will be asked what language to install it in - choose English.

    7. The R Setup Wizard will appear in a window. Click “Next” at the bottom of the R Setup wizard window.

    8. The next page says “Information” at the top. Click “Next” again.

    9. The next page says “Information” at the top. Click “Next” again.

    10. The next page says “Select Destination Location” at the top. By default, it will suggest to install R in “C:\Program Files” on your computer.

    11. Click “Next” at the bottom of the R Setup wizard window.

    12. The next page says “Select components” at the top. Click “Next” again.

    13. The next page says “Startup options” at the top. Click “Next” again.

    14. The next page says “Select start menu folder” at the top. Click “Next” again.

    15. The next page says “Select additional tasks” at the top. Click “Next” again.

    16. R should now be installed. This will take about a minute. When R has finished, you will see “Completing the R for Windows Setup Wizard” appear. Click “Finish”.

    17. To start R, you can either follow step 18, or 19:

    18. Check if there is an “R” icon on the desktop of the computer that you are using. If so, double-click on the “R” icon to start R. If you cannot find an “R” icon, try step 19 instead.

    19. Click on the “Start” button at the bottom left of your computer screen, and then choose “All programs”, and start R by selecting “R” (or R X.X.X, where X.X.X gives the version of R, eg. R 2.10.0) from the menu of programs.

    20. The R console (a rectangle) should pop up:

         

    Using R interactively

    When you use the R program, it issues a prompt when it expects input commands. The default prompt is ‘>’.

    To find the current working directory

    >getwd()

    The working directory is the folder where R reads and saves files by default.

    To change the working directory to another folder

    >setwd("<path to folder>")

    Ex: setwd("c:/programfiles/newfolder")
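
    The two commands above can be combined into a small script that changes the working directory and then puts it back. This is only a sketch: the variable names old_dir and in_temp are made up for illustration, and tempdir() is used as the target because that folder always exists during an R session.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# Switch to R's per-session temporary folder.
setwd(tempdir())
in_temp <- getwd()

# Restore the original working directory.
setwd(old_dir)
restored <- identical(getwd(), old_dir)
```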

    To quit the R program the command is

    >q()

    At this point you will be asked whether you want to save the data from your R session. Data which is saved will be available in the future R sessions.

    R commands, case sensitivity

    R is an expression language with a very simple syntax. It is case sensitive, so ‘A’ and ‘a’ are different symbols and would refer to different variables.

    All alphanumeric symbols are allowed in R names, along with ‘.’ and ‘_’, with the restriction that a name must start with ‘.’ or a letter, and if it starts with ‘.’ the second character must not be a digit. Names are effectively unlimited in length.
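
    One way to try out the naming rules is to ask R whether a string would be accepted as a name. The sketch below assumes a simple check: make.names() rewrites a string into a syntactically valid name, so a string that comes back unchanged was already valid. The vector candidates is made up for illustration.

```r
# Candidate names; the last two break the rules described above.
candidates <- c("age", ".age", "my_var", "1a", ".2x")

# make.names() returns a syntactically valid version of each string,
# so a string that comes back unchanged was already a valid name.
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
print(is_valid)
```

    Reserved words such as "if" are also rewritten by make.names(), so this check reports them as not usable too.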

    Your first session

    Start the R system; the main window (RGui) with a sub-window (R console) will appear.

         

    The editor in RGui

    The console window in R is only useful when we want to enter one or two statements. It is not useful when we want to write or edit a larger block of R code. To open a new script in the RGui window, go to the ‘File’ menu and select ‘New script’. An empty R editor will appear where we can enter R code. This code can be saved as a normal text file with the .R extension. To run code in the R editor, select the code and press <ctrl>+R.

    R as a basic calculator

    In the console window, type R commands at the prompt. R can be used as a simple calculator.

    Basic Arithmetic Operations

    > 4+5 # adding two numbers

    [1] 9

    > 4-5 # subtracting two numbers

    [1] -1

    > 4*5 # multiplying two numbers

    [1] 20

    > 4/5 # division

    [1] 0.8

    > 4%%5 # modulus (remainder after division)

    [1] 4

    > 4%/%5 # integer division (quotient without the fractional part)

    [1] 0

    Logarithmic and Trigonometric operations

    > log10(100) #log with base 10

    [1] 2

    > log2(10) #log with base 2

    [1] 3.321928

    > sin(90) # the argument is in radians, not degrees

    [1] 0.8939967

    > sin(pi/2) # 90 degrees expressed in radians

    [1] 1

    > cos(0)

    [1] 1

    Power operations

    > sqrt(4)

    [1] 2

    > 2^3

    [1] 8

    > 2^10

    [1] 1024

    Boundary valued functions

    ceiling() gives the smallest integer not less than the value (upper boundary)

    floor() gives the largest integer not greater than the value (lower boundary)

    round() rounds the value to the nearest integer

    > ceiling(3.6)

    [1] 4

    > floor(3.6)

    [1] 3

    > round(3.6)

    [1] 4

    > round(3.1)

    [1] 3
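
    The three functions differ most visibly on negative numbers and on values that end in .5. The following sketch illustrates this; the variable names are chosen only for the example.

```r
# Rounding toward +Inf, toward -Inf, and to the nearest integer
# behave differently for negative numbers.
up      <- ceiling(-3.6)   # -3
down    <- floor(-3.6)     # -4
nearest <- round(-3.6)     # -4

# The second argument of round() sets the number of decimal places.
two_dp <- round(3.14159, 2)   # 3.14

# A value exactly halfway is rounded to the even neighbour.
half <- round(0.5)            # 0
```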

    Assignment operator

    Results of calculation can be stored in objects using the assignment operator.

    * An arrow (<-) formed by a less-than character and a hyphen, without a space between them.

    * The equals character (=)

    >x <- 3+4

    >y = 4

    To print an object, just enter the name of the object.

    >x

    [1] 7 #output

    >y

    [1] 4 #output

    To list the objects that you have in your current R session use the function ls()

    >ls()

    [1] "x" "y"

    Strings

    > "hello world"

    [1] "hello world"

    > print("hello")

    [1] "hello"

    Examples of saving values into variables

    > a<-5

    > b<-a+5

    > c<-b-2

    > d<-c*2

    > e<-d%%2

    > f<-e%/%3

    > a

    [1] 5

    > b

    [1] 10

    > c

    [1] 8

    > d

    [1] 16

    > e

    [1] 0

    > f

    [1] 0

    > str1<-"welcome to R"

    > str1

    [1] "welcome to R"

    > .age<-22 # a name may start with a dot

    > .age

    [1] 22

    > age<-22

    > age

    [1] 22

    > 1a<-10

    Error: unexpected symbol in "1a"

    > 1a

    Error: unexpected symbol in "1a"

    >rm(list=ls())

    # rm(list=ls()) removes all the objects stored in the current R session

    R data objects

    In contrast to programming languages like C and Java, variables in R are not declared with a data type. Variables are assigned R objects, and the data type of the R object becomes the data type of the variable. There are many types of R objects. The frequently used ones are

    1. Double: To do calculations on numbers, we can use the data type double to represent them. Decimal values are called numerics in R. Double is the default computational data type.

    Ex: 3.1415, 8.0, 8.1

    Doubles are used to represent continuous variables like the weight or height of a person.

    >x<-8.14

    >y<-8.0

    >z<-87.0+12.9

    To check if an object is double use function is.double() or we can use typeof() to ask R the type of the object.

    > typeof(x)

    [1] "double"

    > decimal<-25.6

    > decimal

    [1] 25.6

    > typeof(decimal)

    [1] "double"

    > is.double(decimal)

    [1] TRUE

    2. Integer: Integers are whole numbers. They can be used to represent counting variables.

    Example: number of children. To create an integer variable in R, we invoke the as.integer() function.

    >nchild<-as.integer(3)

    >is.integer(nchild)

    [1] TRUE

    >nchild<-3.0

    >is.integer(nchild)

    [1] FALSE

    We can coerce a numeric value into an integer with the same as.integer() function

    >age<-22

    > class(age)

    [1] "numeric"

    > new<-as.integer(age)

    > new

    [1] 22

    > typeof(new)

    [1] "integer"

    > class(new)

    [1] "integer"
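
    Besides as.integer(), base R also lets us write an integer directly with the L suffix. The sketch below compares the two routes; the variable names are made up for illustration.

```r
# The L suffix creates an integer literal directly.
count <- 5L
count_type <- typeof(count)    # "integer"

# A plain number is stored as a double, even without a decimal part.
plain_type <- typeof(5)        # "double"

# as.integer() drops the fractional part (truncation toward zero).
truncated <- as.integer(3.9)   # 3

# The colon operator produces integers.
range_type <- typeof(1:3)      # "integer"
```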

    3. Complex: Objects of the type ‘complex’ are used to represent complex numbers. A complex value is written as a real part plus an imaginary part marked with i. A complex object can be created by writing the value directly or with the function as.complex().

    >z=1+2i

    >z

    [1] 1+2i

    >typeof(z)

    [1] "complex"

    >class(z)

    [1] "complex"

    > test1<-as.complex(-25+4i)

    > is.complex(test1)

    [1] TRUE

    > typeof(test1)

    [1] "complex"

    > class(test1)

    [1] "complex"

    > test2<-6

    > typeof(test2)

    [1] "double"

    > as.complex(test2)

    [1] 6+0i

    > is.complex(test2)

    [1] FALSE

    > typeof(test2)

    [1] "double"

    > class(test2)

    [1] "numeric"

    > test3<-2+3i

    > test4<-3-4i

    > test3+test4

    [1] 5-1i

    > test3-test4

    [1] -1+7i

    > test3*test4

    [1] 18+1i

    > test3/test4

    [1] -0.24+0.68i

    > test3%%test4

    Error: unimplemented complex operation

    > test3%/%test4

    Error: unimplemented complex operation

    > test3^3

    [1] -46+9i

    > test4^2

    [1] -7-24i

    > test3^test4

    [1] -1369.949-1957.09i
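
    The parts of a complex number can be pulled out with the base R functions Re(), Im(), Mod() and Conj(). A minimal sketch, reusing the value 3-4i from the session above under the made-up name z:

```r
z <- 3 - 4i

real_part <- Re(z)     # 3
imag_part <- Im(z)     # -4
modulus   <- Mod(z)    # 5, since sqrt(3^2 + 4^2) = 5
conjugate <- Conj(z)   # 3+4i
```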

    4. Logical: An object of data type logical can have the value TRUE or FALSE and is used to indicate if a condition is true or false. Such objects are usually the result of logical expressions.

    Logical expressions are often built from comparison operators

        < smaller than

        <= smaller than or equal to

        > larger than

        >= larger than or equal to

        == equal to

        != is not equal to

    The logical operators and, or and not are given by &, | and !.

    > x<-9

    > y<-x>10

    > y

    [1] FALSE

    > typeof(y)

    [1] "logical"

    > class(y)

    [1] "logical"

    > u<-TRUE

    > v<-FALSE

    > u|v

    [1] TRUE

    > u&v

    [1] FALSE

    > u==v

    [1] FALSE

    > u!=v

    [1] TRUE

    > c1<-c(9,166) # c() combines (concatenates) values into a single vector

    > c2<-(3<c1)&(c1<=10)

    > c2

    [1] TRUE FALSE
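
    A logical vector like c2 is often summarised rather than printed in full. The sketch below uses a slightly longer, made-up vector c1 to show any(), all(), which() and sum() on the same range test.

```r
c1 <- c(9, 166, 4, 10)
in_range <- (3 < c1) & (c1 <= 10)   # TRUE FALSE TRUE TRUE

any_in   <- any(in_range)     # TRUE: at least one element qualifies
all_in   <- all(in_range)     # FALSE: 166 is out of range
where    <- which(in_range)   # positions 1, 3 and 4
how_many <- sum(in_range)     # 3, because each TRUE counts as 1
```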

    5. Character: A character object is represented by a collection of characters between double quotes (" "). They are used to represent string values in R.

    Example: "x", "my character"

    > char1<-c("welcome","to","new","data","statistical","tool")

    > char1

    [1] "welcome"     "to"          "new"         "data"        "statistical" "tool"

    > typeof(char1)

    [1] "character"

    > class(char1)

    [1] "character"

    > char2<-3

    > typeof(char2)

    [1] "double"

    > class(char2)

    [1] "numeric"

    > as.character(char2)

    [1] "3"

    > typeof(char2)

    [1] "double"

    > class(char2)

    [1] "numeric"

    > is.character(char2)

    [1] FALSE

    > char3<-as.character(char2)

    > typeof(char3)

    [1] "character"

    > class(char3)

    [1] "character"

    > is.character(char3)

    [1] TRUE

    Two character values can be concatenated with the paste() function.

    > fname<-"Data"

    > sname<-"Science"

    > fullname<-paste(fname,sname)

    > fullname

    [1] "Data Science"

    > nchar(fullname)

    [1] 12
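
    By default paste() puts a single space between the pieces. The sketch below shows the sep and collapse arguments and the paste0() shortcut; the variable names are only for illustration.

```r
# sep sets the separator placed between the arguments.
dashed <- paste("Data", "Science", sep = "-")   # "Data-Science"

# paste0() is paste() with no separator at all.
joined <- paste0("Data", "Science")             # "DataScience"

# collapse merges the elements of one vector into a single string.
one_str <- paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"
```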

    It is often more convenient to create a readable string with the sprintf() function, which has a C language syntax.

    > sprintf("%s has %d rupees", "sam", 100)

    [1] "sam has 100 rupees"

    To extract a substring, we can apply the substr() function.

    > substr("welcome to R class",start=4,stop=12)

    [1] "come to R"

    Note : in R the array subscript starts from 1.

    To replace occurrence of words we use sub() function.

    > sub("welcome","hello","welcome to R class") # replacing welcome with hello.

    [1] "hello to R class"

    The sub() function changes only the first occurrence; to replace every occurrence, the gsub() function is used

    > gsub("welcome","hello","welcome to R class welcome to kmit")

    [1] "hello to R class hello to kmit"

    > up<-toupper(fullname)

    > low<-tolower(fullname)

    > up

    [1] "DATA SCIENCE"

    > low

    [1] "data science"

    R provides help for every function through the help command.

    >help("sum")

    >help("sub")

    6. Factor: The factor data type is used to represent data in categorical form (data whose value range is a collection of codes). It is a vector of categorical data.

    factor() function is used to create a factor variable.

    Example: variable gender with values male and female. Variable blood type with values A, AB ,O.

    An individual code of the value range is called the level of the factor variable.

    Example: gender has 2 levels male and female.

    Factor objects can be created from character objects or from numeric objects using the function factor().

    >gender<- c("male","male","female","male","female")

    The object gender is a character object. We need to transform it to factor.

    >gender<-factor(gender)

    >gender

    [1] male   male   female male   female

    Levels: female male

    Use the function levels() to see the different levels a factor variable has

    >levels(gender)

    [1] "female" "male"
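
    Once a variable is a factor, base R can count and encode its levels. A short sketch, rebuilding the gender factor from above so the block runs on its own:

```r
gender <- factor(c("male", "male", "female", "male", "female"))

counts <- table(gender)     # female: 2, male: 3
n_lev  <- nlevels(gender)   # 2

# Internally each value is stored as the position of its level,
# with levels sorted alphabetically: female = 1, male = 2.
codes <- as.integer(gender)   # 2 2 1 2 1
```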

    > v<-c(1,4,6,7,3,4,2,0,10)

    > v

    [1] 1 4 6 7 3 4 2 0 10

    > is.factor(v)

    [1] FALSE

    > factor(v)

    [1] 1 4 6 7 3 4 2 0 10

    Levels: 0 1 2 3 4 6 7 10

    > v<-factor(v)

    > v

    [1] 1 4 6 7 3 4 2 0 10

    Levels: 0 1 2 3 4 6 7 10

    > is.factor(v)

    [1] TRUE

    > one<-factor(v,levels=c(1,3))

    > levels(one)=c("one","three")

    > one

    [1] one <NA> <NA> <NA> three <NA> <NA> <NA> <NA>

    Levels: one three

    Ordering: If the order of the levels is important, we need to use an ordered factor. Use the function ordered() and specify the order with the levels argument.

    > income<-c("high","low","average", "low", "average", "high", "low")

    > typeof(income)

    [1] "character"

    > income<-factor(income)

    > income

    [1] high       low       average low       average high       low

    Levels: average high low

    > income<-ordered(income,levels=c("low","average","high"))

    > income

    [1] high       low       average low       average high       low

    Levels: low < average < high

    The last line indicates the ordering of the levels within the factor variable.
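
    The practical benefit of an ordered factor is that its values can be compared and sorted by level. The following sketch rebuilds the income factor from above and assumes nothing beyond base R:

```r
income <- ordered(c("high", "low", "average", "low", "average", "high", "low"),
                  levels = c("low", "average", "high"))

# Comparisons follow the level order low < average < high.
below_high <- income < "high"   # FALSE TRUE TRUE TRUE TRUE FALSE TRUE

top    <- max(income)    # high
sorted <- sort(income)   # low low low average average high high
```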

    7. Dates and Times: R has several date and time related functions.

    The date() function returns the current date and time as a character string. Sys.Date() and Sys.time() return the system’s date and time as Date and POSIXct objects respectively.

    To represent a calendar date in R, use the function as.Date() to create an object of class Date.

    > date()

    [1] "Tue Feb 02 16:05:37 2016"

    > Sys.time()

    [1] "2016-02-02 16:05:37 IST"

    > class(date())

    [1] "character"

    > Sys.Date()

    [1] "2016-02-02"

    > class(Sys.Date())

    [1] "Date"

    > class(Sys.time())

    [1] "POSIXct" "POSIXt"

    > mydate=as.Date(c("4-08-2014","2-02-2016"),format="%d-%m-%Y") # day-month-year input needs an explicit format

    > class(mydate)

    [1] "Date"

    > mydate2<-as.Date(c("28-02-2015","20-01-2016"),format="%d-%m-%Y")

    > days<-mydate[2]-mydate[1]

    > days

    Time difference of 547 days

    > class(days)

    [1] "difftime"

    The format() function is used to print a date in different formats.

    Symbol    Description              Example
    ------    ---------------------    --------
    %d        day as a number          01-31
    %a        abbreviated weekday      Mon
    %A        unabbreviated weekday    Monday
    %m        month as a number        01-12
    %b        abbreviated month        Jan
    %B        unabbreviated month      January
    %y        2 digit year             16
    %Y        4 digit year             2016


    > Sys.Date()

    [1] "2016-02-02"

    > format(Sys.Date(),format="%d")

    [1] "02"

    > format(Sys.Date(),format="%a")

    [1] "Tue"

    > format(Sys.Date(),format="%A")

    [1] "Tuesday"

    > format(Sys.Date(),format="%m")

    [1] "02"

    > format(Sys.Date(),format="%b")

    [1] "Feb"

    > format(Sys.Date(),format="%B")

    [1] "February"

    > format(Sys.Date(),format="%y")

    [1] "16"

    > format(Sys.Date(),format="%Y")

    [1] "2016"

    > format(Sys.Date(),format="%d %a %A %Y %y %b %B")

    [1] "02 Tue Tuesday 2016 16 Feb February"
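
    The same symbols describe the layout of a date string when reading it with as.Date(). The sketch below parses a month/day/year string, prints it in another layout and does simple date arithmetic. The variable names are made up for illustration; only numeric symbols are used so the result does not depend on the system language.

```r
# Parse a month/day/year string by describing its layout.
adm <- as.Date("10/15/2009", format = "%m/%d/%Y")

# Print the same date in year-month-day layout.
iso_text <- format(adm, "%Y-%m-%d")   # "2009-10-15"

# Adding a number to a Date moves it forward by that many days.
later <- adm + 30                     # 2009-11-14

# Subtracting two Dates gives the gap in days.
gap <- as.numeric(as.Date("2016-02-02") - as.Date("2014-08-04"))   # 547
```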

    8. Vectors: A vector is a sequence of values, which can be numbers, strings, logical values or any other type, as long as they are all the same type.

    The c() function (combine/concatenate) creates a new vector by combining a list of values.

    > c('a','b','c')

    [1] "a" "b" "c"

    > c(4,7,9)

    [1] 4 7 9

    > vect_x<-c(4,7,9)

    > vect_x

    [1] 4 7 9

    A vector cannot hold values of different modes.

    > c(1,TRUE,"three")

    [1] "1" "TRUE" "three"

    All the values were converted to a single mode (character) so that the vector can hold them all.

    > test5<-c(1,"hello",2.5)

    > test5

    [1] "1" "hello" "2.5"

    > typeof(test5)

    [1] "character"

    > class(test5)

    [1] "character"

    Vector access:

    We can retrieve an individual value within a vector by providing its numeric index in square brackets ([]).

    We can create a vector with some strings in it.

    > sent1<-c("welcome","to","R")

    > sent1

    [1] "welcome"   "to"    "R"

    > sent1[1]

    [1] "welcome"

    In R vector indices start at 1.

    We can use a vector within the square brackets to access multiple values or we can retrieve range of values.

    > sent1[c(1,3)]

    [1] "welcome"    "R"

    > sent1[1:3]

    [1] "welcome"   "to"    "R"

    We can assign new values within an existing vector.

    >sent1[1]="hello"

    >sent1

    [1] "hello"   "to"    "R"

    We can also add new element to the existing vector

    >sent1[4:6] <-c("data","science","class")

    >sent1

    [1] "hello"    "to"    "R"    "data"    "science"    "class"

    Vector names

    We can assign names to a vector’s elements by assigning a second vector filled with names through the names() function.

    > rank<-1:3

    > names(rank)<-c("first","second","third")

    > rank

    first    second    third

      1         2           3

    > rank["first"]

    first

    1

    > rank["third"]=4

    > rank

    first    second    third

      1         2           4

    Vector Math:

    Most arithmetic operations work on vectors just as they do on single values.

    > a<-c(1,3,5)

    > a+1

    [1] 2 4 6

    > a-1

    [1] 0 2 4

    > a*2

    [1] 2 6 10

    > a/2

    [1] 0.5 1.5 2.5

    > a%%2

    [1] 1 1 1

    > a%/%2

    [1] 0 1 2

    > b<-c(2,4,6)

    > b

    [1] 2 4 6

    >a+b

    [1] 3 7 11

    > a==b

    [1] FALSE FALSE FALSE

    Sequence vector:

    If we need a vector with a sequence of numbers, we can create it with start : end notation or using seq() function.

    > seq(4,8)

    [1] 4 5 6 7 8

    >4:8

    [1] 4 5 6 7 8

    seq() also allows us to use an increment other than 1.

    > seq(4,8,0.4)

    [1] 4.0 4.4 4.8 5.2 5.6 6.0 6.4 6.8 7.2 7.6 8.0

    > seq(9,5)

    [1] 9 8 7 6 5

    > seq(9,5,by=-2)

    [1] 9 7 5
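
    Instead of a step size, seq() can be given the number of values wanted. The sketch below also shows seq_len(), a base R helper for simple 1-to-n sequences. The variable names are chosen only for the example.

```r
# Five evenly spaced values from 0 to 1; the step is worked out for us.
grid <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) gives 1, 2, ..., n and is empty when n is 0.
idx   <- seq_len(4)   # 1 2 3 4
empty <- seq_len(0)   # integer(0)
```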

    Recycling and repeating:

    When an operation on two vectors requires them to be the same length and they are not, R automatically recycles (repeats) the shorter one until it is long enough to match the longer one.

    > rep<-c(1,2,3,4,5)

    > rep

    [1] 1 2 3 4 5

    > rep[c(TRUE,FALSE)]

    [1] 1 3 5
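
    Recycling applies to arithmetic as well as to logical indexing. A minimal sketch with two made-up vectors, long and short:

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is recycled to 10 20 10 20 before the addition.
recycled <- long + short   # 11 22 13 24

# A logical index is recycled the same way: keep, drop, keep, drop.
odd_positions <- long[c(TRUE, FALSE)]   # 1 3

# rep() repeats values explicitly.
explicit <- rep(short, times = 2)   # 10 20 10 20
```

    If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.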


    9. Matrices: A matrix is a 2-dimensional array. A vector is just a simple sequence of values, but if we need data in rows and columns we use a matrix. A matrix is created with the matrix() function

    eg:

    >matrix(0,2,2) #first parameter is the values, second the number of rows and third the columns.

           [,1] [,2]

    [1,]     0 0

    [2,]     0 0

    > m1<-1:16

    > matrix(m1,4,4)

         [,1] [,2] [,3] [,4]

    [1,]  1  5   9   13

    [2,]  2  6  10  14

    [3,]  3  7  11  15

    [4,]  4  8  12  16

    or it can also be represented as

    > dim(m1)<-c(4,4)

    > m1

         [,1] [,2] [,3] [,4]

    [1,]  1  5   9   13

    [2,]  2  6  10  14

    [3,]  3  7  11  15

    [4,]  4  8  12  16
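
    As the output above shows, matrix() fills values column by column. The byrow argument of matrix() changes this. A small sketch with made-up names by_col and by_row:

```r
# Default: values fill the matrix column by column.
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE fills row by row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

first_row_by_col <- by_col[1, ]   # 1 3 5
first_row_by_row <- by_row[1, ]   # 1 2 3
```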

    Accessing of matrix elements

    To print 2nd row and 3rd column value.

    > m1[2,3]

    [1] 10

    To assign new values to 2nd row and 3rd column.

    > m1[2,3]<-22

    > m1[2,3]

    [1] 22

    A whole row is selected by leaving the column index empty

    > m1[4,]

    [1] 4 8 12 16

    A whole column is selected by leaving the row index empty

    > m1[,4]

    [1] 13 14 15 16

    Arithmetic and logical operations can also be performed on matrices

    > m3<-1:16

    > dim(m3)<-c(4,4)

    > m3

         [,1] [,2] [,3] [,4]

    [1,]  1  5   9   13

    [2,]  2  6  10  14

    [3,]  3  7  11  15

    [4,]  4  8  12  16

    >m1

         [,1] [,2] [,3] [,4]

    [1,]  1  5   9   13

    [2,]  2  6  22  14

    [3,]  3  7  11  15

    [4,]  4  8  12  16

    > m1 + m3 # addition of two matrices

         [,1] [,2] [,3] [,4]

    [1,]  2  10  18  26

    [2,]  4  12  32  28

    [3,]  6  14  22  30

    [4,]  8  16  24  32

    >m1-m3 # subtraction of two matrices

         [,1] [,2] [,3] [,4]

    [1,]  0  0  0  0

    [2,]  0  0  12 0

    [3,]  0  0  0  0

    [4,]  0  0  0  0

    > m1/m3

         [,1] [,2] [,3] [,4]

    [1,]  1  1  1.0  1

    [2,]  1  1  2.2  1

    [3,]  1  1  1.0  1

    [4,]  1  1  1.0  1

    > m1%%m3

         [,1] [,2] [,3] [,4]

    [1,]  0  0  0  0

    [2,]  0  0  2  0

    [3,]  0  0  0  0

    [4,]  0  0  0  0

    > m1 * m3 # element-wise multiplication (use %*% for the matrix product)

         [,1] [,2] [,3] [,4]

    [1,]  1  25  81  169

    [2,]  4  36  220  196

    [3,]  9  49  121  225

    [4,]  16  64  144  256

    > a_matrix=matrix(1:16,nrow=4,ncol=4,

    + dimnames=list(

    + c("r1","r2","r3","r4"),

    + c("c1","c2","c3","c4"))

    + )

    > a_matrix

        c1 c2 c3 c4

    r1  1  5  9  13

    r2  2  6  10  14

    r3  3  7  11  15

    r4  4  8  12  16

    identical() function

    We can also check whether two matrices are identical, where b_matrix is a second matrix to compare against

    identical(a_matrix,b_matrix)

    function          specifies

    ---------     -------------

    dim()            dimension of a matrix

    t()                matrix transpose

    %*%            matrix multiplication

    solve()           matrix inverse

    as.matrix()      used to coerce the argument into a matrix object
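
    The functions in the table can be tried on a small matrix. The sketch below uses a made-up 2 x 2 matrix m that is known to be invertible (its determinant is 6).

```r
# Columns are (2, 1) and (0, 3).
m <- matrix(c(2, 1, 0, 3), nrow = 2)

shape      <- dim(m)         # 2 2
transposed <- t(m)           # rows and columns swapped
inverse    <- solve(m)       # matrix inverse
product    <- m %*% inverse  # matrix product, approximately the identity
```

    Because of floating point rounding, product should be compared with the identity matrix using all.equal() rather than ==.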

    Arrays: Arrays are similar to matrices but can have more than two dimensions.

    We can create an array easily with the array() function, where we give the data as the first argument and a vector with the sizes of the dimensions as the second argument.

    >my.array <- array(1:24,dim=c(3,4,2))

    > my.array

    , , 1

         [,1] [,2] [,3] [,4]

    [1,]  1  4  7  10

    [2,]  2  5  8  11

    [3,]  3  6  9  12

    , , 2

         [,1] [,2] [,3] [,4]

    [1,]  13  16  19  22

    [2,]  14  17  20  23

    [3,]  15  18  21  24

    >my.array1 <-c(1:12)

    > arr2<-array(my.array1,

    + dim=c(2,3,2),

    + dimnames=list(c("r1","r2"),

    + c("c1","c2","c3")))

    > arr2

    , , 1

        c1 c2 c3

    r1  1  3  5

    r2  2  4  6

    , , 2

        c1 c2 c3

    r1  7  9  11

    r2  8  10  12
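
    Array elements are addressed with one index per dimension. A minimal sketch, rebuilding my.array from above so the block runs on its own:

```r
my.array <- array(1:24, dim = c(3, 4, 2))

# Index order is [row, column, layer].
in_layer1 <- my.array[2, 3, 1]   # 8
in_layer2 <- my.array[2, 3, 2]   # 20

# Leaving the first two indices empty extracts a whole layer as a matrix.
layer2      <- my.array[, , 2]
layer_shape <- dim(layer2)       # 3 4
```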

    Row and column dimensions

    We can count the number of rows and the number of columns

    > nrow(a_matrix)

    [1] 4

    > ncol(a_matrix)

    [1] 4

    Reshaping a matrix is also possible

    > dim(a_matrix)=c(2,8)

    > a_matrix

         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]

    [1,]  1    3    5    7    9    11    13    15

    [2,]  2    4    6    8    10    12    14    16

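    The new dimensions must match the number of elements of the original matrix a_matrix.

    We can use either nrow() or NROW() and ncol() or NCOL(); NROW() and NCOL() also work on vectors, treating a vector as a one-column matrix.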

    Number Ranges

    > display<-1:15

    > display

    [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

    > sum(display>8)

    [1] 7

    Here sum() is not adding the values themselves; it counts how many elements satisfy the condition, because each TRUE is treated as 1 and each FALSE as 0.

    List: A list is a generic vector that can contain other objects, possibly of different types.

    > num<-c(2,3,5)

    > string<-c("hai","hello","welcome")

    > bool<-c(TRUE,FALSE,TRUE)

    > list1<-list(num,string,bool)

    > list1

    [[1]]

    [1] 2 3 5

    [[2]]

    [1] "hai"   "hello"   "welcome"

    [[3]]

    [1] TRUE FALSE TRUE

    Example

    >n =c(2,3,5)

    >s = c("aa","bb","cc","dd","ee")

    >b=c(TRUE,FALSE,TRUE,FALSE,FALSE)

    >x=list(n,s,b)

    List slicing:

    We can retrieve a list slice with the single square bracket “[ ]” operator.

    >x[2]

    [[1]]

    [1] "aa" "bb" "cc" "dd" "ee"

    We can also retrieve a slice with multiple members with an index vector.

    >x[c(2,3)]

    [[1]]

    [1] "aa" "bb" "cc" "dd" "ee"

    [[2]]

    [1] TRUE FALSE TRUE FALSE FALSE

    Member reference: In order to reference a list member directly we need to use the double square bracket [[ ]] operator.

    >x[[2]]

    [1] "aa" "bb" "cc" "dd" "ee"

    We can modify its content directly

    >x[[2]][1] = "ta"

    Members of a named list can also be accessed by name with the $ operator. Name the members with = inside list(); using <- there would create unnamed members.

    > mylist<-list(vect=1:5,str6="available",num2=5)

    > mylist$vect

    [1] 1 2 3 4 5

    > mylist$str6

    [1] "available"

    > mylist$num2

    [1] 5

    > names(mylist)

    [1] "vect" "str6" "num2"

    > mylist$new<-"new item added"

    > typeof(mylist)

    [1] "list"

    > class(mylist)

    [1] "list"

    > mylist$str<-"NULL"

    > mylist$vect<-mylist$vect+1

    > mylist

    $vect

    [1] 2 3 4 5 6

    $str6

    [1] "available"

    $num2

    [1] 5

    $new

    [1] "new item added"

    $str

    [1] "NULL"
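
    The difference between the single and double bracket operators is easiest to see by checking what each one returns. A minimal sketch with a made-up named list x:

```r
x <- list(n = c(2, 3, 5), s = c("aa", "bb", "cc"))

slice   <- x["n"]     # still a list, holding one member
member  <- x[["n"]]   # the numeric vector itself
by_name <- x$n        # same as x[["n"]]

slice_is_list  <- is.list(slice)    # TRUE
member_is_list <- is.list(member)   # FALSE
```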

    Data Frame: A data frame is used for storing data tables. A data frame is more general than a matrix in that different columns can contain different modes of data (numeric, character, and so on). Data frames are the most common data structure you’ll deal with in R. A data frame is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors n, s, b.

    > n = c(2, 3, 5) 
    > s = c("aa", "bb", "cc") 
    > b = c(TRUE, FALSE, TRUE) 
    > df = data.frame(n, s, b)    

    The function data.frame() creates data frames, tightly coupled collections of variables which share many of the properties of matrices and of lists, used as the fundamental data structure by most of R's modeling software. A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame". If no variables are included, the row names determine the number of rows.

    The student dataset in Table 1 consists of numeric and character data.

    Table 1: A student dataset

    SID    AdmDate       Branch    Status
    ---    ----------    ------    ------
    1      10/15/2009    CSE       Passed
    2      11/01/2009    ECE       Passed
    3      10/21/2009    CSE       Passed
    4      10/28/2009    ECE       Passed

    Because there are multiple modes of data, you can’t contain this data in a matrix. In this case, a data frame would be the structure of choice.

    A data frame is created with the data.frame() function:

    mydata <- data.frame(col1, col2, col3,…)

    where col1, col2, col3, … are column vectors of any type (such as character, numeric, or logical). Names for each column can be provided with the names function. Each column must have only one mode, but you can put columns of different modes together to form the data frame.

    To set the row or column names of a data frame:

    rownames(x) <- value

    colnames(x) <- value

    To check whether it is a data frame or not

    >is.data.frame(df)

    [1] TRUE
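
    As an illustration, the student dataset of Table 1 can be entered as a data frame. This is a sketch, not the only way to do it: the object name students and the decision to convert AdmDate to the Date class are assumptions made for the example.

```r
students <- data.frame(
  SID     = 1:4,
  AdmDate = c("10/15/2009", "11/01/2009", "10/21/2009", "10/28/2009"),
  Branch  = c("CSE", "ECE", "CSE", "ECE"),
  Status  = c("Passed", "Passed", "Passed", "Passed"),
  stringsAsFactors = FALSE   # keep text columns as character vectors
)

# Convert the month/day/year text into Date objects.
students$AdmDate <- as.Date(students$AdmDate, format = "%m/%d/%Y")

n_rows    <- nrow(students)                       # 4
cse_only  <- students[students$Branch == "CSE", ] # rows for the CSE branch
col_types <- sapply(students, class)              # class of each column
```

    Each column keeps its own mode (integer, Date, character), which is what a matrix could not do.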

    Built-in Data Frame

    We use built-in data frames in R for our tutorials. For example, here is a built-in data frame in R, called mtcars.

    > mtcars
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
                         ............

    The top line of the table, called the header, contains the column names. Each horizontal line afterward denotes a data row, which begins with the name of the row, and then followed by the actual data. Each data member of a row is called a cell.

    To retrieve data in a cell, we would enter its row and column coordinates in the single square bracket "[]" operator. The two coordinates are separated by a comma. In other words, the coordinates begins with row position, then followed by a comma, and ends with the column position. The order is important.

    Here is the cell value from the first row, second column of mtcars.

    > mtcars[1, 2] 
    [1] 6

    Moreover, we can use the row and column names instead of the numeric coordinates.

    > mtcars["Mazda RX4", "cyl"] 
    [1] 6

    Lastly, the number of data rows in the data frame is given by the nrow function.

    > nrow(mtcars)    # number of data rows 
    [1] 32

    And the number of columns of a data frame is given by the ncol function.

    > ncol(mtcars)    # number of columns 
    [1] 11

    Further details of the mtcars data set are available in the R documentation.

    > help(mtcars)

    Retrieving the first and last few elements:

    head(): obtains the first several rows of a data frame.

    tail(): obtains the last several rows. These functions may also be applied to obtain the first or last values in a vector.

    head(x, n=6)

       x – A matrix, data frame, or vector.

       n – The first n rows (or values if x is a vector) will be returned.

    tail(x, n=6)

       x – A matrix, data frame, or vector.

       n – The last n rows (or values if x is a vector) will be returned.

    Example:

    > head(mtcars)
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

    > tail(mtcars)
                    mpg cyl  disp  hp drat    wt qsec vs am gear carb
    Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
    Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
    Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
    Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
    Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
    Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
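    The n argument and the vector case described above are not shown in the example, so here is a minimal sketch of both. The variable names (first_three, last_two, v) are illustrative only and do not come from the original notes.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.
first_three <- head(mtcars, n = 3)   # first 3 rows instead of the default 6
last_two    <- tail(mtcars, n = 2)   # last 2 rows

print(rownames(first_three))
print(rownames(last_two))

# head() and tail() also accept a plain vector.
v <- 1:10
print(head(v, n = 4))   # first 4 values of v
print(tail(v, n = 4))   # last 4 values of v
```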

    Data Frame Column Vector:

    We can reference a data frame column with the double square bracket "[[]]" operator.

    For example, the following retrieves the ninth column vector of the built-in data set mtcars.

    > mtcars[[9]]

    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

    We can also retrieve the same column vector by its name.

    > mtcars[['am']]

    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

    > mtcars[["am"]]

    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

    We can also retrieve with the "$" operator instead of the double square bracket operator.

    > mtcars$am

    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

    We can also retrieve the same column vector with the single square bracket "[]" operator. We put a comma before the column name, leaving the row position empty, which signals a wildcard match for every row.

    > mtcars[,"am"]

    [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
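    As a quick check, the four forms above can be compared with identical(). This is only a sketch, and the variable names are illustrative.

```r
# Four ways of pulling the "am" column out of mtcars as a vector.
by_position <- mtcars[[9]]       # double brackets, column position
by_name     <- mtcars[["am"]]    # double brackets, column name
by_dollar   <- mtcars$am         # dollar operator
by_bracket  <- mtcars[, "am"]    # single brackets with an empty row position

# Each comparison prints TRUE when the two vectors are exactly the same.
print(identical(by_position, by_name))
print(identical(by_name, by_dollar))
print(identical(by_dollar, by_bracket))
```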

    Data Frame Column Slice

    We retrieve a data frame column slice with the single square bracket "[]" operator.

    Numeric Indexing

    The following is a slice containing the first column of the built-in data set mtcars.

    > mtcars[1] 
                       mpg 
    Mazda RX4         21.0 
    Mazda RX4 Wag     21.0 
    Datsun 710        22.8 
                       ............

    Name Indexing

    We can retrieve the same column slice by its name.

    > mtcars["mpg"] 
                       mpg 
    Mazda RX4         21.0 
    Mazda RX4 Wag     21.0 
    Datsun 710        22.8 
                       ............

    To retrieve a data frame slice with the two columns mpg and hp, we pack the column names in an index vector inside the single square bracket operator.

    > mtcars[c("mpg", "hp")] 
                       mpg  hp 
    Mazda RX4         21.0 110 
    Mazda RX4 Wag     21.0 110 
    Datsun 710        22.8  93 
                       ............
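    The difference between a column slice and a column vector is easy to miss: single brackets keep the data frame structure, while double brackets return the bare vector. The sketch below illustrates this. The names slice, column and two_cols are illustrative.

```r
slice    <- mtcars["mpg"]             # single brackets: a one-column data frame
column   <- mtcars[["mpg"]]           # double brackets: a plain numeric vector
two_cols <- mtcars[c("mpg", "hp")]    # index vector of names: a two-column data frame

print(class(slice))
print(class(column))
print(dim(two_cols))   # rows, then columns
```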

    Data Frame Row Slice

    We retrieve rows from a data frame with the single square bracket operator, just as we did with columns. However, in addition to an index vector of row positions, we append an extra comma character. This is important, as the extra comma signals a wildcard match for the second coordinate, the column position.

    Numeric Indexing

    For example, the following retrieves a row record of the built-in data set mtcars. Note the extra comma in the square bracket operator; it is not a typo. The record states that the 1974 Camaro Z28 has a gas mileage of 13.3 miles per gallon, an eight cylinder 245 horsepower engine, and so on.

    > mtcars[24,] 
                mpg cyl disp  hp drat   wt  ... 
    Camaro Z28 13.3   8  350 245 3.73 3.84  ...

    To retrieve more than one row, we use a numeric index vector.

    > mtcars[c(3, 24),] 
                mpg cyl disp  hp drat   wt  ... 
    Datsun 710 22.8   4  108  93 3.85 2.32  ... 
    Camaro Z28 13.3   8  350 245 3.73 3.84  ...

    Name Indexing

    We can retrieve a row by its name.

    > mtcars["Camaro Z28",] 
                mpg cyl disp  hp drat   wt  ... 
    Camaro Z28 13.3   8  350 245 3.73 3.84  ...

    And we can pack the row names in an index vector in order to retrieve multiple rows.

    > mtcars[c("Datsun 710", "Camaro Z28"),] 
                mpg cyl disp  hp drat   wt  ... 
    Datsun 710 22.8   4  108  93 3.85 2.32  ... 
    Camaro Z28 13.3   8  350 245 3.73 3.84  ...

    Logical Indexing

    Lastly, we can retrieve rows with a logical index vector. In the following vector L, the member value is TRUE if the car has automatic transmission (am equal to 0 in mtcars), and FALSE otherwise.

    > L = mtcars$am == 0 
    > L 
     [1]   FALSE FALSE FALSE  TRUE ...

    Here is the list of vehicles with automatic transmission.

    > mtcars[L,] 
                         mpg cyl  disp  hp drat    wt  ... 
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215  ... 
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440  ... 
                     ............

    And here is the gas mileage data for automatic transmission.

    > mtcars[L,]$mpg 
     [1] 21.4 18.7 18.1 14.3 24.4 ...
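    Logical index vectors can also be combined with & (and) and | (or). The following sketch selects the automatic cars that have four cylinders. The names is_automatic and auto_four are illustrative, not from the original notes.

```r
# TRUE for cars with automatic transmission (am == 0 in mtcars).
is_automatic <- mtcars$am == 0

# Combine two conditions element by element with &.
auto_four <- mtcars[is_automatic & mtcars$cyl == 4, ]

print(sum(is_automatic))        # how many cars are automatic
print(rownames(auto_four))      # automatic cars with 4 cylinders
print(mean(auto_four$mpg))      # their average gas mileage
```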

    Reading Datasets using R

    We can import datasets from various sources and file types, such as .txt and .csv.

    Comma separated value (.csv) file:

    The sample data can be in .csv format. The cells in such a data file are separated by a special character (usually a comma), although other characters can be used as well.

    The first row of the data file should contain the column names instead of actual data.

    Example:

    col1    col2    col3
    1       a1      b1
    2       a2      b2
    3       a3      b3

    Create an Excel file with the values above and save it with the .csv extension.

    > mydata = read.csv("mydata.csv")

    > mydata

    > help(read.csv)    # help for the read.csv( ) function
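    If no spreadsheet program is at hand, the same file can be produced from R itself. The sketch below writes the example rows to a temporary file and reads them back. The temporary path and the variable name csv_path are assumptions made for illustration; in practice you would pass the path of your own file.

```r
# Write the example table to a temporary .csv file (comma-separated, header first).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,a1,b1",
             "2,a2,b2",
             "3,a3,b3"), csv_path)

# read.csv() treats the first row as column names by default.
mydata <- read.csv(csv_path)
print(mydata)
print(names(mydata))
print(nrow(mydata))

unlink(csv_path)   # remove the temporary file
```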

    Table file

    A data table can reside in a text file. The cells inside the table are separated by blank characters.

    Example:

    100        a1        b1

    200        a2        b2

    300        a3        b3

    Save the data above in a text file with the .txt extension (mydata.txt).

    > mydata = read.table("mydata.txt")

    > mydata
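    Because the text file above has no header row, read.table() assigns default column names. The sketch below demonstrates this with a temporary file. The names txt_path and tbl are illustrative.

```r
# Write the example table to a temporary text file, cells separated by blanks.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("100 a1 b1",
             "200 a2 b2",
             "300 a3 b3"), txt_path)

# With no header row, read.table() names the columns V1, V2, V3.
tbl <- read.table(txt_path)
print(tbl)
print(names(tbl))

unlink(txt_path)   # remove the temporary file
```

    If the file does start with a row of column names, passing header = TRUE to read.table() makes it use that row instead.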

    Control Structures: R has the standard control structures.

    if-else:

               if (cond) expr

               if (cond) expr1 else expr2

    > x <- 3

    > if (x > 0) {
        print("x is a positive number")
      }

    > if (x < 0) {
        print("x is a negative number")
      } else {
        print("x is zero or positive")
      }

    > if (x < 0) {
        print("x is a negative number")
      } else if (x == 0) {
        print("zero")
      } else {
        print("x is a positive number")
      }
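    A three-way test like this is often wrapped in a function so it can be reused for different values. The following is a minimal sketch; describe_sign is a hypothetical helper name, not a built-in R function.

```r
# Hypothetical helper: classify a single number by its sign.
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

print(describe_sign(3))
print(describe_sign(0))
print(describe_sign(-2))
```

    Note that if() expects a single TRUE or FALSE value, so this helper is meant for one number at a time rather than a whole vector.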

    While loop:

    A while( ) loop executes a block of commands repeatedly for as long as its condition holds; it stops once the condition is no longer satisfied.

                   while(condition) expr

    > x <- 1

    > while (x < 5) {
        x <- x + 1
        print(x)
      }

    Output:

    [1] 2
    [1] 3
    [1] 4
    [1] 5
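    A while loop is most useful when the number of repetitions is not known in advance. As an illustrative sketch (the variable names value and steps are assumptions), the following counts how many doublings it takes for 1 to exceed 100.

```r
value <- 1   # running value, doubled on every pass
steps <- 0   # number of doublings performed so far

# Keep doubling while the value has not yet exceeded 100.
while (value <= 100) {
  value <- value * 2
  steps <- steps + 1
}

print(steps)
print(value)
```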

    For loop:

    The for loop in R can be written as

               for (i in arr) {
                          expr1; expr2; ...
               }

    It steps through the vector arr one element at a time, assigning each element to i in turn and executing the group of commands inside the braces.

    Example:

    > arr <- c(1:10)

    > for (i in arr) {
        print(i)
      }
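    A for loop is commonly used to accumulate a result across the elements of a vector. The sketch below adds up the values of arr; the variable name total is illustrative.

```r
arr <- 1:10
total <- 0   # running sum

# Visit each element of arr in turn and add it to the running sum.
for (i in arr) {
  total <- total + i
}

print(total)
print(total == sum(arr))   # the built-in sum() gives the same result
```

    For simple cases like this one, vectorized functions such as sum() are usually preferred in R; the loop is shown only to illustrate the control structure.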
