The R programming language is a well-known statistical environment that is widely used for data analysis. So I am going to talk a bit about R and the things I have learned, as a short introduction to help you get to know this environment.
First of all, to install it on Ubuntu or Debian, run
$ sudo apt-get install r-base
Once that is installed, just type R to enter the environment. You will see a message with the R version and then the R prompt, which is the symbol >
The environment works with variables and functions, like any programming language, and also as a calculator. For example, you can do simple math operations like
> (2*5) + 4
[1] 14
or assign and store things in variables to work with them later, like
> x = (2*5) + 4
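As a small sketch of what happens after the assignment: typing the variable name prints its value, and the variable can be used in further expressions. Many R users write assignment with `<-` instead of `=`; both work here. The extra expressions are only illustrative.

```r
# Assign with <- (same effect as x = (2*5) + 4)
x <- (2*5) + 4

# Typing the name prints the stored value
x            # 14

# The variable can be reused in other expressions
x / 2        # 7
sqrt(x + 2)  # 4
```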
To see the variables you have stored so far, use the ls() function. To remove a variable you no longer need, use rm(), like this
> rm(x)
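Here is a minimal sketch of how ls() and rm() fit together. The variable names subtotal and half_total are made up for this illustration, and the output of ls() will depend on whatever else is in your workspace.

```r
# Hypothetical variables, used only for this illustration
subtotal = (2*5) + 4
half_total = subtotal / 2

ls()                   # lists the names defined so far
exists("subtotal")     # TRUE: the variable is there

rm(subtotal)           # remove one variable
exists("subtotal")     # FALSE: it is gone
exists("half_total")   # TRUE: other variables are untouched

# rm(list = ls()) would remove everything in the workspace at once
```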
There are also ordered collections of values called vectors, on which you can do arithmetic or apply many other functions. This is an example of creating one
> y = c(5,2,5,3,2,4)
and these are examples of operations on it
> y + 1
[1] 6 3 6 4 3 5
> y[3]
[1] 5
> y[c(1, 5)]
[1] 5 2
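A few more vector operations, as a quick sketch using the same y as above. These are all base R functions; the particular choice of examples is mine.

```r
y = c(5, 2, 5, 3, 2, 4)

length(y)    # number of elements: 6
sum(y)       # 21
mean(y)      # 3.5
y * 2        # element by element: 10 4 10 6 4 8

# Indexing with a condition keeps only the matching elements
y[y > 3]     # 5 5 4

# A negative index drops that position
y[-1]        # 2 5 3 2 4
```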
But maybe the most important data type is the data frame, which is a list of vectors of equal length. In other words, it is used to store tables. For example, you can have this
> id = c(1, 2, 3)
> name = c("Steve", "Cristine", "John")
> age = c(30, 34, 21)
> sex = c("M", "F", "M")
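> people = data.frame(id, name, age, sex, stringsAsFactors = TRUE)
where people is the data frame that stores the people data. Passing stringsAsFactors = TRUE stores the text columns as factors, which was the default before R 4.0 and is what the plot() call later in this post relies on. With this people data frame you can do more interesting things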
> people[1,2]
[1] Steve
Levels: Cristine John Steve
> people$name
[1] Steve    Cristine John
Levels: Cristine John Steve
> people$age[2]
[1] 34
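To give an idea of what else a data frame allows, here is a short sketch that rebuilds the same people table and filters, sorts and summarizes it. The specific queries are just examples I picked; stringsAsFactors = TRUE is set explicitly so the text columns are factors regardless of the R version.

```r
people = data.frame(
  id   = c(1, 2, 3),
  name = c("Steve", "Cristine", "John"),
  age  = c(30, 34, 21),
  sex  = c("M", "F", "M"),
  stringsAsFactors = TRUE
)

nrow(people)                        # number of rows: 3
names(people)                       # column names: id, name, age, sex

# Keep only the rows where age is above 25 (Steve and Cristine)
older = people[people$age > 25, ]

# Names sorted by age, youngest first: John, Steve, Cristine
by_age = people[order(people$age), "name"]

mean(people$age)                    # average age, about 28.33
summary(people)                     # per-column summary of the table
```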
With a data frame you can also make a plot using the plot() function
> plot(age ~ name, people)
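When the variable on the right of the ~ is a factor, plot() draws one box per level. As a sketch under my own assumptions (grouping by sex instead of name, and a made-up file name in the temporary directory), this is how the plot could be labelled and saved to a PDF file instead of shown on screen.

```r
people = data.frame(
  name = c("Steve", "Cristine", "John"),
  age  = c(30, 34, 21),
  sex  = c("M", "F", "M"),
  stringsAsFactors = TRUE
)

# Hypothetical output file in R's temporary directory
out_file = file.path(tempdir(), "age_by_sex.pdf")

pdf(out_file)                      # open a PDF graphics device
plot(age ~ sex, data = people,
     xlab = "Sex", ylab = "Age", main = "Age by sex")
invisible(dev.off())               # close the device so the file is written
```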
And finally, you can even connect to MySQL, run your own queries and work with the results as a data frame, using the RMySQL package
> library(RMySQL)
> connection = dbConnect(MySQL(), user = "username", password = "thepassword",
+   dbname = "my_database")
> resultset = dbGetQuery(connection, "select * from table1")
> dbDisconnect(connection)
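The same three steps can be wrapped in a small function so the connection is always closed, even if the query fails. This is only a sketch: run_query is a helper name I made up, it assumes the DBI and RMySQL packages are installed and a MySQL server is reachable, and the host default of "localhost" is an assumption.

```r
# run_query is a hypothetical helper, not part of DBI or RMySQL.
# It needs both packages installed and a reachable MySQL server when called.
run_query = function(query, user, password, dbname, host = "localhost") {
  connection = DBI::dbConnect(RMySQL::MySQL(), user = user,
                              password = password, dbname = dbname,
                              host = host)
  # Close the connection when the function exits, even after an error
  on.exit(DBI::dbDisconnect(connection))
  # The result comes back as a data frame
  DBI::dbGetQuery(connection, query)
}

# Example call, with the same placeholder values as above:
# resultset = run_query("select * from table1", "username",
#                       "thepassword", "my_database")
```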
Once we are here we can treat the resultset as a data frame and use it for whatever we want, as we saw before. So this is a quick summary of the main R functionality, and I hope it has helped you take a quick look at the R language.
To conclude, I would also like to recommend RStudio as an IDE for R, since it is very intuitive and simple and has helpful tools like the plot panel, where you can see all the plots you make, or the package panel, where you can easily manage all the libraries and packages you might need.
Posted on February 16, 2011