I’ve previously complained about MATLAB, but continued to use it because I didn’t have a better tool. After seeing so many engineers and scientists agree that Python is a superior tool compared to MATLAB, I finally decided to make the switch. In retrospect, it’s one of the best tool decisions I’ve ever made. Here are some lovely things that I’ve gained from using a more general-purpose tool for algorithm development.
1. Memory management: it’s super-effective
The recent Mac OS X versions of MATLAB (R2012 and beyond) have well-known memory management deficiencies. MATLAB fails to properly free memory when certain variables (particularly plot figures) are no longer needed, due primarily to the Java Virtual Machine (JVM) that MATLAB’s desktop and figure windows run on. Thus, even with a huge memory allocation (1 GB+), MATLAB will eventually start using swap space, slow down, and grind to a halt.
This garbage collection failure is especially bad for plots. Even when figures are closed manually or with the close all command, the memory is still allocated. Depending on MATLAB’s memory allocation, creating too many plots (30-100) will eventually cause the engine to run out of memory and enter an endless loop of “Not enough heap memory to complete operation” messages. At that point, MATLAB must be restarted. For a tool whose primary use is fast and efficient plotting of complex data, this is a serious drawback.
Meanwhile… Python is extremely stable. The most popular version of Python at this moment (2.7) is the backbone of many popular web services that require >99% uptime (Quora, LinkedIn, Amazon, Pinterest). All of Dropbox’s infrastructure, including desktop clients for Windows, Mac, and Linux, is written in Python.
Plot-wise, Matplotlib (Python’s MATLAB-like plotting library) isn’t the most memory-efficient tool, but it certainly doesn’t send Python into an infinite loop of useless error messages if it runs out of memory.
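One practical point: pyplot keeps every figure registered until it is closed, so a long plotting run should close each figure once it has been saved. A minimal sketch for Python 3, assuming the non-interactive Agg backend; the save_plots helper and the file names are hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, open no windows
import matplotlib.pyplot as plt
import numpy as np


def save_plots(n_plots, out_dir):
    """Create n_plots figures, save each to out_dir, and free it right away."""
    paths = []
    x = np.linspace(0.0, 2.0 * np.pi, 200)
    for i in range(n_plots):
        fig, ax = plt.subplots()
        ax.plot(x, np.sin((i + 1) * x))
        path = os.path.join(out_dir, "plot_%03d.png" % i)
        fig.savefig(path)
        plt.close(fig)  # drop pyplot's reference so the figure can be freed
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        saved = save_plots(30, out_dir)
        print(len(saved), "plots saved;", len(plt.get_fignums()), "figures still open")
```

After the loop, plt.get_fignums() should report no open figures, no matter how many plots were produced.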
2. Error handling like a boss
MATLAB does not deal with errors well, neither at the user level nor the system level. At the user level, errors are distinguished by identifier strings rather than by type, so catching specific errors becomes a maze of bookkeeping about which identifiers correspond to which failures. At the system level, major errors such as memory allocation failures do not trigger the correct operations.
For example, when the heap memory error mentioned above occurs, the virtual machine should catch the error, immediately stop all processing, and perform garbage collection. Instead, at least on my MacBook Pro, MATLAB sits there until the user manually discovers the hangup (typically by clicking in the window and seeing the eternally spinning Mac rainbow wheel).
Meanwhile… Python’s error handling is terrific. The error handling system supports the creation of new error and exception classes that inherit from a base class. Specific error types can be caught, and the program will correctly terminate when an unhandled error is encountered.
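As a sketch of what this looks like (Python 3; the exception classes and the parse_record and load_all functions are hypothetical examples, not part of any library), a small hierarchy lets the caller catch exactly the failures it knows how to handle while everything else propagates:

```python
class DataError(Exception):
    """Hypothetical base class for errors raised while loading device data."""


class MissingFieldError(DataError):
    """Raised when a required field is absent from a record."""

    def __init__(self, field):
        super().__init__("missing required field: %s" % field)
        self.field = field


class BadValueError(DataError):
    """Raised when a field is present but cannot be parsed."""


def parse_record(record):
    """Return the 'voltage' entry of a dict as a float, raising specific errors."""
    if "voltage" not in record:
        raise MissingFieldError("voltage")
    try:
        return float(record["voltage"])
    except ValueError as exc:
        # Chain the original exception so the traceback keeps the root cause.
        raise BadValueError("voltage is not a number: %r" % record["voltage"]) from exc


def load_all(records):
    """Parse every record, skipping bad values but not missing fields."""
    values = []
    for record in records:
        try:
            values.append(parse_record(record))
        except BadValueError as exc:
            print("skipping record:", exc)
    return values
```

Here BadValueError is handled and skipped, while MissingFieldError is deliberately left uncaught, so it stops the program with a traceback unless an outer caller handles it. Because both inherit from DataError, a caller can also catch the whole family with a single except DataError clause.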
3. Embarrassingly parallel processing, if you’re into that sort of thing
MATLAB simply does not support multithreading in user code. Apart from the built-in multithreading of some numeric functions, each operation runs on its own, which is grossly inefficient for naturally parallelizable tasks like running a test on many device datasets. The expensive Parallel Computing Toolbox is essentially an automated way to spawn multiple MATLAB instances (workers), as opposed to a true multithreading solution.
Meanwhile… Python can run any number of threads and has no fixed heap size to configure (although its garbage collection becomes problematic above ~1-2 GB). With the standard multiprocessing module, a single Python program can put all of the computing power available on a given machine to work on a processing operation, allowing the user to focus on improving code instead of babysitting large-scale computations.
Note: because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at a time, so the speedup from threads is not proportional to their number, particularly for CPU-bound work; separate processes avoid the lock. Still, having the possibility of multiple threads is better than being limited to one thread.
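For CPU-bound jobs, the standard-library multiprocessing module is one common way around the GIL: it spreads work across worker processes from a single script. A minimal sketch for Python 3; analyze_dataset stands in for a real per-device test and is hypothetical:

```python
import multiprocessing
import statistics


def analyze_dataset(samples):
    """Hypothetical per-device test: return (mean, population std dev)."""
    return statistics.mean(samples), statistics.pstdev(samples)


def analyze_all(datasets, processes=None):
    """Run analyze_dataset on every dataset in parallel worker processes.

    processes=None lets multiprocessing start one worker per CPU core.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        # map preserves input order, so results[i] belongs to datasets[i]
        return pool.map(analyze_dataset, datasets)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that start workers with "spawn".
    datasets = [[float(i + j) for j in range(1000)] for i in range(8)]
    for i, (mean, std) in enumerate(analyze_all(datasets)):
        print("device %d: mean=%.1f std=%.1f" % (i, mean, std))
```

Each worker is a separate process with its own interpreter, so the datasets are pickled and copied to the workers; this pays off when the per-dataset computation outweighs the cost of that copy.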
4. Lovely, lovely object-oriented design
MATLAB’s support for typical abstraction forms is poor. MATLAB classes are needlessly complicated, can only be instantiated as static objects, and deliver slower performance than non-object implementations. In addition, class data is not cleared each time a given program is run, even if the scope of an instantiated class is within a function. This makes for strange bugs and undermines the variable scoping that is supposed to be a hallmark of OOP.
These drawbacks incentivize developers to abandon object-oriented design in favor of procedural programming, which carries its own set of problems for large-scale applications. Procedural programming means every small change to a component requires considering every level of the application that depends on that component, instead of hiding the implementation behind a wall of abstraction. Thus, as MATLAB programs become larger, they become big balls of mud.
Meanwhile… Python supports objects every bit as powerful and fully-featured as C++ (although its support for inheritance is a bit iffy). Class-level attributes act as static data shared by every instance, and objects can be shared between threads, with access synchronized through locks.
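A short sketch of what static, shared, and synchronized state can look like in practice (Python 3; the Counter class and hammer function are hypothetical): a class attribute is shared by every instance, an instance attribute belongs to one object, and a threading.Lock serializes updates when several threads touch the same object.

```python
import threading


class Counter:
    """Hypothetical example: per-instance state plus class-level (static) state."""

    instances_created = 0           # class attribute: shared by every Counter
    _class_lock = threading.Lock()  # guards instances_created

    def __init__(self, name):
        self.name = name
        self._value = 0                # instance attribute: private to this object
        self._lock = threading.Lock()  # guards _value
        with Counter._class_lock:
            Counter.instances_created += 1

    def increment(self):
        with self._lock:  # only one thread at a time may update _value
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, n_threads=4, per_thread=10000):
    """Increment one shared Counter from several threads at once."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

The implementation details stay hidden behind increment and value, so callers never need to know that a lock is involved.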
5. Package management like it’s FedEx
MATLAB has only limited support for packages and namespaces, relying instead on a global search path (an approach most modern languages have replaced with explicit imports). Since programs cannot explicitly declare which packages are required, MATLAB code running perfectly on one machine may deliver unexpected results on another, for example when a different function with the same name sits earlier on the path. In addition, since resources cannot be placed within well-contained, organized, well-documented buckets, MATLAB code tends to end up as a sprawling mess of interconnected m-files.
Meanwhile… Python’s package system is one of the best in the industry, and all of a program’s package dependencies can be determined from the source code alone, through its import statements. Several tools (easy_install, pip) are available for downloading and installing Python packages, and all of them are open source.
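Because dependencies are declared with import statements, the standard library alone can list them. A rough sketch using the ast module (Python 3; the required_modules function is hypothetical):

```python
import ast


def required_modules(source):
    """Return the sorted top-level module names imported by Python source text."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import utils"
            names.add(node.module.split(".")[0])
    return sorted(names)


example = """
import numpy as np
import os.path
from scipy import linalg
from .helpers import load  # relative import: part of this package
"""
print(required_modules(example))  # ['numpy', 'os', 'scipy']
```

This sketch reports import names, which do not always match the name of the package to install (for example, import yaml is provided by the PyYAML package), and it cannot see modules loaded dynamically through importlib.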
6. Data-centric resources that aren’t Excel
MATLAB lacks the concept of a ‘data table’. MATLAB’s primary data structure is a two-dimensional array of double values which, although powerful for linear algebra-based operations, is unacceptable for the vast majority of the world’s data, which usually has timestamps, string entries, and multiple types of tables.
Meanwhile… Python incorporates data-centric features from industry-standard tools like R thanks to the popular pandas package. The essential feature of this package is a single object (pandas.DataFrame) that allows columns of different types (numbers, strings, timestamps) to be organized into a single table.
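A minimal sketch of such a table, assuming pandas is installed; the device measurements are made-up illustration data:

```python
import pandas as pd

# Made-up measurements: timestamps, strings, and floats in one table.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2013-06-01 09:00", "2013-06-01 09:05",
        "2013-06-01 09:00", "2013-06-01 09:05",
    ]),
    "device": ["A", "A", "B", "B"],
    "status": ["ok", "ok", "ok", "fault"],
    "voltage": [3.30, 3.28, 5.01, 4.62],
})

print(df.dtypes)  # each column keeps its own type

# Select rows by a string condition, then summarize a numeric column per device.
faults = df[df["status"] == "fault"]
mean_voltage = df.groupby("device")["voltage"].mean()
print(faults)
print(mean_voltage)
```

Each column keeps its own dtype (datetime, string, float), which a plain MATLAB matrix of doubles cannot do.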
7. A Swiss Army Knife of string processing resources
MATLAB uses an arcane C-style method of string processing. While fast and easily learned, it is not very flexible and leads to major issues parsing even the simplest text formats.
Meanwhile… Python is one of the most popular string processing tools in the world. A file parser that would have taken over an hour to write in MATLAB can be written in ten lines of Python in under ten minutes.
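As an illustration of the kind of parser meant here, the sketch below (Python 3) reads a hypothetical log format of the form “date time key=value …” using only the standard re module; the format and the parse_log name are assumptions, not a real file type:

```python
import re

# Hypothetical log format: "<date> <time> key=value key=value ..."
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_log(text):
    """Turn each well-formed log line into a dict; skip blanks, comments, junk."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # ignore lines that don't fit the format
        date, time, rest = match.groups()
        fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
        records.append({"date": date, "time": time, **fields})
    return records


sample = """
# header comment
2013-06-01 09:00:00 device=A voltage=3.30 status=ok
2013-06-01 09:05:00 device=B voltage=4.62 status=fault
garbage line
"""
for rec in parse_log(sample):
    print(rec)
```

Values come back as strings; converting a field such as voltage is one float() call away once the structure is known.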
8. Cloud computing like it’s 2013
MATLAB requires a license to run on a given machine, and thus can’t be used natively in the Amazon Web Services Elastic Compute Cloud (EC2), where machine instances are only temporary. To run cloud instances, you have to use the MATLAB Distributed Computing Server with special-purpose EC2 instances, and the whole thing gets really expensive really quickly.
Meanwhile… not only is Python easy to install on EC2 instances, but Ubuntu instances also come with Python 2.6 and 2.7 pre-installed. Installing the other necessary packages (NumPy, SciPy, Matplotlib, pandas) takes a single apt-get line:
$ sudo apt-get install python-numpy python-scipy python-matplotlib python-pandas
MATLAB requires a rather expensive license per user that must be renewed yearly. Corporations spend $2000 per user per year on a program that provides subpar performance and little online support.
Meanwhile… Python is free. And open-source.
But what about linear algebra?
MATLAB’s chief claim to fame is its efficient linear algebra capabilities, delivered mostly through bindings to the famous Fortran-based LAPACK library. However, Python’s NumPy and SciPy packages not only recreate this capability; they are, in fact, also bound to LAPACK. Moreover, their commands are so similar to MATLAB’s that it’s sometimes difficult to tell the difference.
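For a feel of how close the two are, here is a small linear system solved with NumPy and SciPy, assuming both are installed (the matrix is arbitrary example data). The MATLAB equivalents are in the comments:

```python
import numpy as np
from scipy import linalg

# MATLAB:  A = [4 -2; 1 1];  b = [2; 3];
A = np.array([[4.0, -2.0],
              [1.0,  1.0]])
b = np.array([2.0, 3.0])

x = np.linalg.solve(A, b)       # MATLAB: x = A \ b  (square system)
eigvals = np.linalg.eigvals(A)  # MATLAB: eig(A)
P, L, U = linalg.lu(A)          # MATLAB: [L, U, P] = lu(A), see note below

print(x)
print(eigvals)
```

One difference to watch: MATLAB’s [L, U, P] = lu(A) satisfies P*A = L*U, while scipy.linalg.lu returns a permutation matrix such that A = P @ L @ U.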
It is important to note that Python’s support for MATLAB-like arrays and matrices, along with its software-centric data structures (lists and dictionaries), make it a formidable data tool. Matrices can be placed into lists, which can be organized as dictionary entries, all within a single object. The flexibility of the language is extraordinarily empowering.
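A brief sketch of that nesting, assuming NumPy; the campaign layout and device names are hypothetical:

```python
import numpy as np

# Hypothetical test campaign: per-device lists of calibration matrices,
# plus metadata, all held in one dictionary.
campaign = {
    "name": "bench test",
    "devices": {
        "A": [np.eye(2), 2.0 * np.eye(2)],
        "B": [np.array([[1.0, 2.0], [3.0, 4.0]])],
    },
}

# Walk the structure: dictionary -> list -> matrix
for device, matrices in sorted(campaign["devices"].items()):
    dets = [float(np.linalg.det(m)) for m in matrices]
    print(device, len(matrices), "matrices, determinants:", dets)
```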
So do I give anything up by switching from MATLAB to Python?
Yes. The one MATLAB feature that Python does not offer out of the box is an IDE that lets the user view code, variables, the console, and files graphically in a single place. Developing in Python typically revolves around editing text in an editor of your choice (vim, emacs, Sublime, etc.) and then running it from the command line. However, one good option is Enthought Canopy, a Python distribution that comes bundled with a pretty nice interface. In addition, it installs NumPy, SciPy, and Matplotlib for you, so it’s pretty much ready to go straight out of the box.
If you’ve got a lot of legacy code written in MATLAB, then it probably doesn’t make sense for you to invest a lot of time rewriting everything in Python. But if you’re starting out on a new project…give Python a chance. Chances are you’ll be happy you did. Happy coding.