One of my new year’s resolutions was to revitalize the Alamo R User Group in order to reach out to the San Antonio community and give Trinity students a chance to learn more about R, discuss topics around R and data analytics in the real world, and mingle with people from the San Antonio business … Continue Reading »
Jeff Leek posted a very interesting guide for anyone who needs to share data with a statistician. The goals of this guide are to provide some instruction on the best way to share data to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis. The Leek group works … Continue Reading »
R provides many ways of selecting data from a data frame. You can use, e.g., logical functions and functions like subset. If you know SQL, you might think that all this would be much easier if you could just use some of the SQL commands that you know. As I found on the Revolution Analytics blog, there is … Continue Reading »
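The excerpt above is cut off before it reaches the SQL approach, so the following is only a minimal sketch of the base R ways of selecting rows that it names (logical conditions and subset). The data frame and its column names are made up for illustration.

```r
# Hypothetical example data; not taken from the original post.
df <- data.frame(
  name  = c("a", "b", "c", "d"),
  score = c(10, 25, 40, 55)
)

# Selection with a logical condition inside the index brackets:
# keeps every row whose score is above 20.
by_index <- df[df$score > 20, ]

# The same kind of selection with subset(), combining two conditions
# and keeping only the "name" column.
by_subset <- subset(df, score > 20 & score < 50, select = name)

print(by_index)
print(by_subset)
```

Both calls return a data frame, so the results can be passed on to further selections or calculations in the same way.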
Cluster analysis is a useful method for finding structure in a mass of data. The main question in cluster analysis is: “Which objects are similar and which are not?” To answer this question, cluster analysis algorithms try to separate the data into clusters, where the clusters have a maximized similarity within the cluster and a … Continue Reading »
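To make the idea of grouping similar objects concrete, here is a small sketch using k-means from base R. The excerpt does not say which algorithm the full post uses, so k-means is an assumption chosen for illustration, and the data are artificial.

```r
set.seed(42)

# Two artificial groups of 20 points each in two dimensions,
# centred around (0, 0) and (3, 3). Purely illustrative data.
group_a <- matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2)
group_b <- matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2)
xy <- rbind(group_a, group_b)

# Ask k-means for two clusters; nstart = 10 repeats the random
# initialisation and keeps the best solution.
fit <- kmeans(xy, centers = 2, nstart = 10)

# Number of points assigned to each cluster.
print(table(fit$cluster))
```

Because the two groups are far apart relative to their spread, each artificial group should end up in its own cluster. Real data are rarely this clean, and the number of clusters usually has to be chosen with care.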
If you work in a scientific environment, you might work with LaTeX. The question is how to get the table which we calculated with R into our LaTeX paper without typing it by hand. One answer (as usual, there is always more than one way) is to use the package “xtable”. To use this package, we have … Continue Reading »
Sometimes, especially if you use a variable as a container for a lot of calculation results, e.g., if there are correlations to calculate for a set of variables, you want to save the output to a file to print it out or use it otherwise. For tables that is not a problem. Here we can … Continue Reading »
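The excerpt stops before it names a method, so the following is just one possible sketch: base R's capture.output() writes whatever would be printed to the console into a file. The correlation matrix from the built-in mtcars data set and the temporary file name are illustrative choices, not taken from the original post.

```r
# A variable holding calculation results: correlations between
# three columns of the built-in mtcars data set.
results <- cor(mtcars[, c("mpg", "hp", "wt")])

# Write the printed output to a text file. A temporary file is used
# here; replace it with a real path to keep the output.
out_file <- tempfile(fileext = ".txt")
capture.output(print(results), file = out_file)

# Read the file back to check what was saved.
saved <- readLines(out_file)
print(saved)
```

An alternative in base R is sink(), which redirects all console output to a file until sink() is called again without arguments.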
In some cases it is helpful to store the data not in a file, but in a database. Databases have some advantages when it comes to large amounts of data. The most important one is that only the data actually needed for a calculation has to be loaded into the random access memory (RAM). Another advantage is that some database engines can run calculations themselves (stored procedures), which can speed up complex calculations on large data sets. In addition, the results no longer have to be exported for other programs, e.g., to plot the data with a GIS. A very good and easy way to implement the database connection in R is with RODBC.
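A minimal sketch of what such a connection could look like with RODBC follows. The data source name "my_dsn", the table "measurements", and its columns are hypothetical, and the two helper functions are defined here for illustration only. The actual database call is left commented out because it needs the RODBC package and an ODBC data source configured on your system.

```r
# Build a query that lets the database do the filtering, so that only
# the rows needed for the calculation are loaded into R's memory.
# Table and column names are hypothetical.
build_query <- function(table, min_value) {
  sprintf("SELECT id, value FROM %s WHERE value > %d",
          table, as.integer(min_value))
}

# Open a connection, run the query, and always close the connection.
fetch_rows <- function(dsn, query) {
  channel <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 instead of a channel if it fails.
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to data source: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlQuery(channel, query)
}

query <- build_query("measurements", 100)
print(query)

# Uncomment once a data source named "my_dsn" exists on your system:
# result <- fetch_rows("my_dsn", query)
# summary(result$value)
```

The result of sqlQuery() is an ordinary data frame, so everything after the query works exactly as it would with data read from a file.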