Profiling memory


The simplest way to monitor memory allocation is with the library memory_profiler. This comes with neither Python nor Anaconda, but is quick and easy to install for both, provided you have the appropriate install permissions. If you have a standard Python install, details of how to install it can be found on pypi. For Anaconda, it is best to integrate it with the rest of Anaconda using the "conda" management system, thus (you may need to restart Spyder afterwards):
conda install memory_profiler
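For a standard Python install, the usual route is pip (the exact command may vary with how your Python is set up):

```shell
pip install memory_profiler
```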

Once installed, the easiest way to use it is to add:
from memory_profiler import profile
to the top of your code. You can then put:
@profile
immediately above any function you want to profile. Although it will only profile functions, it is relatively simple to shift all of a script into a function and call it. This @profile is what is known as a "decorator" – there is a wide range of decorators for doing different jobs in Python (list); they are essentially a signal to run some other code around your function, in this case code in the memory_profiler library.
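To see the mechanism, here is a minimal decorator of our own (the name announce is made up purely for illustration): it wraps a function so that extra code runs every time the function is called, which is exactly how @profile arranges to measure memory around your function.

```python
import functools

def announce(func):
    """A minimal decorator: wraps func so extra code runs on every call."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")  # the "other code" the decorator signals
        return func(*args, **kwargs)
    return wrapper

@announce
def add(a, b):
    return a + b

print(add(2, 3))  # prints "calling add", then 5
```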

Here, then, is our code marked up for this: test2.py, and here's an example of the output (just one of the loop runs) for size == 100:

Line #    Mem usage    Increment   Line Contents
================================================
     9     24.4 MiB      0.0 MiB   @profile
    10                             def fill_row():
    11     24.4 MiB      0.0 MiB       data_row = []
    12     24.4 MiB      0.0 MiB       for i in range(size):
    13     24.4 MiB      0.0 MiB           data_row.append(random.random())
    14     24.4 MiB      0.0 MiB       return data_row
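Since test2.py is linked rather than shown, the profiled function can be reconstructed from the listing above. Here is a self-contained sketch: size is assumed to be set earlier in the real script, and the try/except fallback (my addition, not part of test2.py) lets it run even where memory_profiler is not installed.

```python
import random

try:
    from memory_profiler import profile
except ImportError:
    def profile(func):  # no-op stand-in if memory_profiler is absent
        return func

size = 100  # assumed to be defined earlier in the real script

@profile
def fill_row():
    data_row = []
    for i in range(size):
        data_row.append(random.random())
    return data_row

row = fill_row()
```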

The first column is the line number. The second is the memory in use after the line has run. The third is the increment between that and the previous state. The last is the line itself. As you can see, the memory allocation in this specific run was too small to register as changing. The units are, unusually but correctly, listed in mebibytes (MiB). You can increase the precision of the output by adjusting the decorator to, for example:
@profile(precision=6)
I had to crank the size up to 1000 to see any impact:


Line #      Mem usage           Increment     Line Contents
===========================================================
     9  24.3671875000 MiB   0.0000000000 MiB  @profile(precision=10)
    10                                        def fill_row():
    11  24.3671875000 MiB   0.0000000000 MiB      data_row = []
    12  24.3789062500 MiB   0.0117187500 MiB      for i in range(size):
    13  24.3789062500 MiB   0.0000000000 MiB          data_row.append(random.random())
    14  24.3789062500 MiB   0.0000000000 MiB      return data_row

The allocation of memory for each line might seem surprising – why does the allocation happen on the for-loop line but not on the appending of a new number? The answer is that range generates a new sequence of numbers when the loop starts (in Python 2, a full list). random.random() itself allocates very little, drawing on the random number generator's existing internal state; in general, random number generation is optimised to heck, because it is so important in so many things. There are a number of additional nice features in memory_profiler, and it is worth reading the short documentation.
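As an aside, how much range itself costs depends on the Python version: Python 2's range built a full list up front, while Python 3's range is a small lazy object that produces values on demand. sys.getsizeof shows the difference in scale (exact sizes are implementation-dependent):

```python
import sys

row = [0.0] * 1000   # a populated list: its size grows with element count
lazy = range(1000)   # Python 3 range: values generated on demand

print(sys.getsizeof(row))   # several thousand bytes (the list object alone)
print(sys.getsizeof(lazy))  # a few dozen bytes, however large the range
```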

So, using all this, can you estimate how long this code will take to run with 10000 as the size, and how much memory it will use?