Academia is attractive for a lot of reasons, but it’s a field in which forging a career can be extremely challenging. That helps to explain why, over the last 10 years, ~60% of new physics PhDs have ended up taking jobs in industry, in areas such as engineering, computer software and hardware, business, and non-STEM fields.
According to the July Employment Situation Summary from the U.S. Bureau of Labor Statistics, the unemployment rate, while no longer at a record low, sits at an impressive 3.7%. However, a recent report from the Society for Human Resource Management showed that 83% of HR professionals are having difficulty finding the right candidates, and 75% cite a skills shortage among applicants as the reason. The fact that fewer people are seeking employment than (almost) ever is only compounding the problem.
There is a shortage of skilled technology employees, and a shortage of funding for skilled scientists. How can we surmount both of these pressing crises?
Tech companies should consider more seriously hiring from fields outside the traditional software engineering degree paradigm.
As a former astrophysicist and current data scientist in cybersecurity, I’ve seen the overlap between the two fields, and it is substantial at a fundamental level.
Exercises in Statistics
In practice, astronomy and data science are both exercises in statistics. The sorts of problems astronomers work on are big questions, like: What sorts of things do we find in the universe? How did they all form, and how do they evolve in time? How does it all connect up into a single picture?
When we try to answer these questions, we don’t have the luxury of being able to set up and repeat experiments in a laboratory. Instead, we collect data over large portions of the sky, and then apply statistical methods to extract information. We ask, what conclusions can we support from this data? How confident are we in the conclusions? What are all the ways we could be wrong? How does our choice of data set influence the conclusions we can draw?
Data science operates the same way, and sometimes even applies the same methods. In both cases, we need to know the nitty-gritty details and pitfalls about collecting data, processing it and cleaning it, extracting information, and drawing conclusions.
Demanding Good Research
As professionals, both physicists and data scientists have to be good researchers. In both cases, this means finding relevant research in the literature, reading and writing scientific papers, publishing in journals, attending conferences and presenting one’s research, collaborating with colleagues, and working as a team.
In their day-to-day work, astronomers and data scientists use tools and methods that are broadly the same. In terms of statistics and math, both disciplines are concerned with probabilities and Bayesian inference. We ask: how likely is this particular event? Are we able to predict future events? What trends and relationships can we find in the data? Can we find a model that explains the data, without overfitting?
In both worlds, I’ve seen Monte Carlo / Markov Chain Monte Carlo simulations, Bayesian statistics, regression methods, maximum likelihood methods, etc. Mathematically, both fields require familiarity with linear algebra and numerical approximation techniques. Data science makes light use of some calculus concepts that you find in physics as well.
In practice, this is done with specialized software and by writing code. Both kinds of scientist will be writing code that uses scientific libraries, likely in Python or R, or, more rarely, heavy-duty simulation code in C/C++.
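As a small illustration of the kind of code both fields write, here is a minimal sketch of a random-walk Metropolis sampler, one simple form of Markov Chain Monte Carlo. It assumes Gaussian noise with a known spread and a flat prior, and the measurements are simulated; the function names and parameters are hypothetical and chosen only for this example.

```python
import math
import random


def log_likelihood(mu, data, sigma=1.0):
    """Gaussian log-likelihood for a candidate mean mu (constant terms dropped)."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def metropolis(data, n_steps=5000, step_size=0.2, seed=0):
    """Random-walk Metropolis sampler for mu, assuming a flat prior."""
    rng = random.Random(seed)
    mu = 0.0  # arbitrary starting guess
    current = log_likelihood(mu, data)
    samples = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step_size)
        candidate = log_likelihood(proposal, data)
        # Accept with probability min(1, L(proposal) / L(current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples


# Simulated (not real) measurements: 200 noisy draws around a true value of 3.0.
data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(200)]

samples = metropolis(data)
burned = samples[1000:]  # discard the burn-in portion of the chain
posterior_mean = sum(burned) / len(burned)
sample_mean = sum(data) / len(data)  # the maximum likelihood estimate of mu
print(posterior_mean, sample_mean)
```

With a flat prior, the posterior mean from the chain should land close to the sample mean, which is also the maximum likelihood estimate for this model. A real analysis would typically use an established library rather than a hand-written sampler.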
Largest Data Sets
At these scales, entirely new methods need to be invented to collect, store, process, and transport the data.
Take, for example, that first-ever image of a black hole that you probably saw in the news a couple of months ago: the data used to make that single image was collected for years from telescopes all around the world, and it was of such volume that they couldn’t send it over the internet. Instead, they physically shipped hard drives to a single location where the data could be processed. At this petabyte scale, we frequently require supercomputers or distributed computing methods, and/or lots of processing time.
Data science isn’t always at that extreme scale, but the datasets are still large enough that special consideration is needed. Both scientists will be concerned with algorithmic efficiency (can you make your code run faster if you change a few things?) and building robust data pipelines.
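To make the point about algorithmic efficiency concrete, here is a minimal sketch comparing two ways of counting matching pairs in a list of records. The event labels and function names are hypothetical, invented for this example; the idea is only that a small change in approach can turn quadratic work into linear work.

```python
from collections import Counter


def count_matching_pairs_quadratic(values):
    """Compare every pair of items directly: O(n^2) comparisons."""
    n = len(values)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if values[i] == values[j]
    )


def count_matching_pairs_linear(values):
    """Tally each value once, then count the pairs within each group: O(n)."""
    counts = Counter(values)
    return sum(c * (c - 1) // 2 for c in counts.values())


# Hypothetical event labels, standing in for a much larger log.
events = ["login", "logout", "login", "scan", "login", "scan"]

slow = count_matching_pairs_quadratic(events)
fast = count_matching_pairs_linear(events)
print(slow, fast)
```

Both functions return the same count, but on millions of records the first becomes impractical while the second still runs in a single pass.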
In both fields, we don’t always start out knowing which questions we are trying to answer. Often we don’t even know whether a particular question can be answered! We make discoveries that are sometimes surprising, and we have to think outside the box. Research requires creativity, which a lot of people seem to forget.
Emerging technical fields like data science offer an exciting, innovative way to apply the software, math and statistics skills learned in non-traditional backgrounds like astrophysics, and to make real contributions to a field that is moving rapidly. Data scientists have the opportunity to apply cutting-edge techniques to difficult, real-world problems. And those who studied subjects not traditionally associated with tech possess the core skills needed to solve those problems creatively, proficiently and efficiently.
About the author: Ryan Foltz is a data scientist at Exabeam, where he applies the latest machine learning approaches to cybersecurity. Prior to Exabeam, he finished his PhD in astrophysics at the University of California, Riverside, where he worked with an international collaboration of astronomers to study the processes that can violently affect galaxies as they grow and evolve over cosmic time. His research focused on galaxy clusters, the largest objects in the universe. In his free time he makes indie video games, which he has done since the early 2000s as the founder and lead game designer of Epic Banana Studios.
This article originally appeared on Datanami.