
My experiments with R

 

Artwork by @allison_horst

 

After framing the title I realised that it is reminiscent of the title of ‘My Experiments with Truth’ by Mahatma Gandhi. As I have never read that book or its summary, I cannot put any disclaimer here on whether or not this article is inspired by that work. These tidbits aside, let us move on to the topic.
I was first introduced to this magical language called ‘R’ during my master's. To say that my master's course exposed me to areas of research and methods of teaching science that I had not encountered in my bachelor's course (which was basically an extension of school work) would be an understatement. It widened my perspective on science, which I would otherwise have treated the same way as textbook material.
Even though we were taught bioinformatics in bachelor's, the need to learn programming wasn’t mentioned even in passing. A course like that will definitely not improve your career prospects. It was just about using a set of tools for some analysis and then proceeding to experiments. I understand that nowadays the onus is on the students to learn. But if you don’t guide them towards emerging fields and the future, then what good is the coursework?
We had bioinformatics again in master's. There the theory was taught in a much better manner and we were exposed to myriad ways of utilizing the tools. But again, how do the tools come into the picture? The idea of writing the programs you need as part of bioinformatics was never impressed upon our minds.
I guess that’s where we are lagging. Programming is thought of only in terms of subject-specific areas. The basics were taught to us in computer science classes, but the interdisciplinary approach was severely lacking. Even though the NEP has now proposed teaching programming from class 6 onwards, I doubt policymakers will see it as anything more than that, or that even our common universities will think about integrating subjects with each other.
The reason I was exposed to R was my new professor for biostatistics. That was the first thing she started with: teaching us the basics of using R. Even though I had biostatistics in bachelor's, and again later in the OCES training program, the teaching had always focussed on learning the methods. We were supposed to fill pages of answer sheets with tables while calculating ANOVA. The concession? Later on, we were allowed to use scientific calculators. That’s it. It is good that we are supposed to know the principle. But that should not be it. Why do we stop there? Why is it not taught that once you know the principle, you can do it in a snap by just applying the formula in Excel? Why are we left to figure that out on our own? Many of us do, but it’s a lack of vision on the part of curriculum designers.
But I digress. It's something we could go on and on about. Coming back to my professor: she had an IISc pedigree and was an ecologist (and you could see her passion for it in her eyes while she was teaching ecology). Apparently that’s one field where they had to do a lot of statistics and had exposure to R. Almost everyone working at IISc either has a working knowledge of R or has at least heard of it (and it helps that it avoids spending on expensive software). But if you move towards colleges and other central and state universities, it is a novelty there. And I am just talking about R. Unless you are a career bioinformatician, hardly anyone knows Java, C++, or Python, the backbone of many of the programs we use. So this is definitely a question of exposure. And we have got a long way to go here.
It was while learning R that I got a first-hand account of how the stereotype of boys being better at technical stuff is just not true. I knew I was comparatively tech-savvy and better than some of my counterparts in bioinformatics. But it was here that I realised I was even better than some of the misogynist boys who make casual remarks (sexism always manifests in the form of jokes first… we are not serious, you know). Learning it faster, finishing it faster, and doing it even better is what makes you realise you are good at stuff, not your belief in the equality of genders, however deep and rational you think your belief is.
I fell in love with using it. I am not an expert in it. I always want to learn more and use it more. R is amazing. I guess I feel about it the same way other people feel about Python. But R is one of the most used programming languages in biology. If you know how to use it, this free and open-source program can be used to do almost anything. With your dataset, you can plot beautiful graphs and do complex statistical analyses. If you want, you can do all kinds of genomics analysis or use it as an alternative to FlowJo. It all depends on whether you are willing to invest time and effort instead of buying software that costs a bomb.

 

Artwork by @allison_horst

 

Right now I am mostly using it for creating graphs. I have previously used it for flow cytometry analysis. If you can learn it and don't have the money to spare on FlowJo, it can produce superior plots and analysis. But the support for beginners is very limited, so I was using it mostly for statistical analysis. I turned to R for producing plots because at that time I was in a place where I had to constantly modify some minute details in each of 20-25 plots (if you know better, you won't ask why). And like all novices, I was doing that in Excel. Excel is great if you want to plot 1-2 graphs and be done with it. Beyond that, manually modifying each required parameter of a plot for each dataset again and again becomes a chronic pain. Excel was never meant for that kind of work, yet we all still use it for that. Then I came across the ggplot2 package. It creates such beautiful plots, and in so many formats. If you use ggpubr in addition to it, creating publication-ready plots is a breeze. I admit, once you get into it, the pull of embellishing your plots here and there is almost irresistible (but you will get out of it once a deadline is hanging over your head). Once you establish your code for a set layout, it's just a matter of changing the name of the dataset to be used. That's it. And any further modification to the layout, if required, can easily be done by modifying the code once and applying it to all of your datasets. No more selecting each graph and making its axis line bold manually for dozens of graphs. (I could not help but sneak in one of my own example plots.)
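To give a feel for the "write the layout once, swap the dataset" idea, here is a minimal sketch in R. It is only an illustration, not the actual code behind my plots: the data are made up, and the helper name `plot_layout`, the column names `group` and `value`, and the output file names are all hypothetical. It assumes ggplot2 3.4.0 or later, where line thickness is set with `linewidth`.

```r
library(ggplot2)

# Hypothetical helper: the layout is defined once and reused for every dataset.
# Any change made here (e.g. bolder axis lines) applies to all plots at once.
plot_layout <- function(data, title) {
  ggplot(data, aes(x = group, y = value)) +
    geom_boxplot() +
    labs(title = title, x = "Group", y = "Value") +
    theme_classic() +
    # Bold axis lines, set in one place instead of by hand per graph.
    # (`linewidth` assumes ggplot2 >= 3.4.0; older versions used `size`.)
    theme(axis.line = element_line(linewidth = 1))
}

# Made-up example data standing in for real experiments.
set.seed(1)
datasets <- list(
  experiment_1 = data.frame(
    group = rep(c("control", "treated"), each = 10),
    value = rnorm(20, mean = 5)
  ),
  experiment_2 = data.frame(
    group = rep(c("control", "treated"), each = 10),
    value = rnorm(20, mean = 8)
  )
)

# Apply the same layout to every dataset; only the data changes.
plots <- lapply(names(datasets), function(name) {
  plot_layout(datasets[[name]], title = name)
})
names(plots) <- names(datasets)

# Save each plot to a temporary folder (file names are illustrative).
out_dir <- tempdir()
for (name in names(plots)) {
  ggsave(file.path(out_dir, paste0(name, ".png")),
         plot = plots[[name]], width = 4, height = 3)
}
```

With this shape, adding a third dataset means adding one entry to `datasets`, and restyling every plot means editing `plot_layout` once.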

After using this, I have a newfound respect for community-developed FOSS programs. They remove the limits from the analysis part. I wish more researchers here would start incorporating R into their workflows.

 
