Beyond Excel: An engineers perspective
“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” - Abraham Maslow, 1966
I have been using Microsoft Excel for the better part of a decade now, from university into the workplace and in every professional role I have worked. It is a fantastic tool for rapidly prototyping ideas, performing calculations and manipulating data and it is also the bane of my existence. Now, I am guilty of being a hammer; I have made databases, calculators, simulations, budgets, forecasts, plans and a million other one-off analysis sheets in Excel and I need to pay my penance for the poor engineers in my wake. I am not apologising because I make poor sheets, quite the opposite, but because we did not use the right tool for the right job and now it is difficult to backtrack as a sheet becomes embedded into a companies processes.
Unwieldy files, multiple versions, copying, pasting and transposing data from sheet to sheet, hidden changes that manifest as embarrassing errors or skew a narrative, messy data tables that are painful to analyse, broken references and the utterly infuriating process of getting a chart to behave led me to look for alternatives a few years ago.
Through university, I gained exposure to MATLAB in my Control Engineering courses and earned respect for it as a powerful tool. However, the high cost and esoteric syntax left me unsatisfied and unsure of its place. I was exposed to R in my Biotechnology courses for statistical analysis, and while it did the job, I thought its scope was too narrow for general use and syntax a bit clunky. R’s history of being tacked onto by multiple academics to solve their particular problem showed. Functions arguments were not consistent, and there wasn’t an overarching philosophy. I was reintroduced a few years later to R and the work of Hadley Wickham and the team at RStudio have entirely changed my opinion of R from a niche and weird stats language into an elegant programming and analysis platform.
R has been evolving at a frenetic pace and surging in popularity. We are generating data at an ever-increasing rate, and better tools are required to tease the signals from the noise. Engineers and metallurgists must incorporate these advances into their daily workflows if we as a profession want to stay relevant in the information age, leaders are not going to be satisfied with half-baked tools or analysis that is ignorant of statistics and variance. It is the potent combination of deep domain knowledge and data analysis skills that are needed to unlock hidden value in mineral value chains.
I have been embracing this challenge and refined my statistical and analytical skills through Massive Open Online Courses (Coursera and EdX) and taught myself R through a subscription to DataCamp. I have found that writing my data analyses in R alleviates many of the problems I have had with excel, namely:
- Excellent version control for tracking changes
- Reproducible results
- Re-usability of past analysis and code
- Clear separation of data, analysis and visualisation
- Self-describing analysis where it is clear what are the inputs and outputs
- Easy integration of statistical tests and measures
While there is a learning curve, having R as a tool at your disposal allows you to attack problems that you could not in Excel or solve existing ones to a higher standard in tight time-frames. Below is an example of a workflow that I used to solve real metallurgical problems however the underlying data has been obfuscated. I hope to post more examples of workflows in the future.