Most serious data scientists prefer R to Python, but if you want to work in data science or machine learning in an investment bank, you're probably going to have to put your partiality to R aside. Banks overwhelmingly use Python instead.
"Python is preferred to R in banks for a number of reasons," says the New York-based head of data science at one leading bank. "There's greater availability of machine learning packages like sklearn in Python; it's better for generic programming tasks and is more easily productionized; plus Python's better for data cleaning (like Perl used to be) and for text analysis."
For these reasons, he says, banks have moved their data analysis to Python almost entirely. There are a few exceptions: some strats jobs use R, but for the most part Python predominates.
Nonetheless, R still has its fans. Jeffrey Ryan, the former star quant at Citadel, is a big proponent of R and runs an annual conference on R in finance (canceled this year due to COVID-19). "R was designed to be data-centric and was researcher built," says Ryan. "Whereas Python co-opted R's data frame and time series, via Pandas [the open source software library for data manipulation in Python built by Wes McKinney, a former software developer at Two Sigma]."
R is still used in statistical work and research, says Ryan. By comparison, Python is the tool of "popular data analysis," and is easy to use without learning statistics. "Python found a whole new audience of programmers at the exact right moment in history," Ryan reflects. "When programmers (more numerous than statisticians) want to work with data, Python has the appeal of a single language that 'does it all' - even if it technically does none of this by design."
Given the importance of data in financial services, it might be presumed that banks would favor the more capable language, even if it does require extra effort to master. However, Graham Giller, chief executive officer at Giller Investments and a former head of data science research at JPMorgan and Deutsche Bank, says banks have settled on Python over R because banks' IT departments are predominantly run by computer scientists rather than people who care a lot about data.
"Personally I like R a lot," says Giller. "R is much more of a tool for professional statisticians, meaning people who are interested in inference about data, rather than computer scientists who are people interested in code." As the computer scientists in banks have gained traction, Giller says banks have "replaced quants with IT professionals or with quants who deep down want to be IT professionals," and they've brought Python with them.
For the pure mathematicians in finance, it's all a bit frustrating. Pandas was built on the back of R, but has taken on a life of its own. "Pandas started out as a way to bring an R-like environment to Python," says Giller, observing that Pandas can be "horrifically slow and inefficient" by comparison.
Most people don't care about this, though: the more that Python and Pandas are used, the more use cases they have. "R has a relatively smaller user base than Python at this point," says Ryan. "This in turn means a lot of tools start to get created around Python and data, and it builds upon its success."