Make plots

General remarks

There are many different programs for making figures, e.g. gnuplot, matplotlib, and Inkscape. When designing figures, keep the following things in mind:

Clarity - avoid ambiguity and confusion: choose colors and symbols that stay distinguishable in both color and b/w printing and for colorblind readers, label all axes (with units!) and write figure captions, match font size and type to the manuscript text (although serif vs. sans-serif is debated), avoid excessive abbreviations, and keep all plots/graphs consistent in style
Precision - present results truthfully and without distortion; be careful with unconventional scaling and x/y axis choices, 3D representations, and y-axes that do not start at zero
Efficiency - no unnecessary “junk” in your graphs; make the figure appropriate for the intended medium (talk, paper, poster)
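
As a minimal sketch of the clarity points above, the following example gives every curve its own color, line style, and marker, so the curves stay distinguishable in b/w print and for colorblind readers. The colors are taken from the Okabe-Ito palette, which is one common choice and not the only one; the data and the file name are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# four colors from the Okabe-Ito palette (blue, vermillion, bluish green, reddish purple)
colors = ["#0072B2", "#D55E00", "#009E73", "#CC79A7"]
linestyles = ["-", "--", "-.", ":"]
markers = ["o", "s", "^", "D"]

x = np.linspace(0, 2 * np.pi, 50)  # made-up example data
fig, ax = plt.subplots(figsize=(6, 4))
for k in range(4):
    shift = 0.5 * k
    # color, line style and marker all differ, so no curve relies on color alone
    ax.plot(x, np.sin(x + shift), color=colors[k], linestyle=linestyles[k],
            marker=markers[k], markevery=5, label=f"shift = {shift:.1f} rad")

# axis labels with units
ax.set_xlabel("x [rad]")
ax.set_ylabel("sin(x + shift) [dimensionless]")
ax.legend()
fig.savefig("clarity_example.pdf")  # hypothetical file name
```

Printing the resulting file in grayscale is a quick check of whether the curves can still be told apart.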
Resources:

Histograms:

Picking bin widths: https://onlinelibrary.wiley.com/doi/10.1002/pmrj.12145
General tips: https://towardsdatascience.com/histograms-why-how-431a5cfbfcd5
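
To get a feeling for how much the bin width rule matters, NumPy can compute bin edges with several built-in rules. The sketch below compares three of them on made-up, normally distributed data; these are NumPy's rules and not necessarily the ones recommended in the links above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)  # made-up example data

# compare the number of bins chosen by three built-in rules
for rule in ("sturges", "scott", "fd"):
    edges = np.histogram_bin_edges(samples, bins=rule)
    n_bins = len(edges) - 1
    width = edges[1] - edges[0]
    print(f"{rule:8s}: {n_bins:3d} bins, width {width:.3f}")

# the same rule names can be passed directly to np.histogram
counts, edges = np.histogram(samples, bins="fd")
```

The same strings are also accepted by the bins argument of matplotlib's ax.hist(), which passes them on to NumPy.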

Matplotlib

Using the matplotlib package is often the easiest way of plotting data, especially if the analysis scripts are already written in Python.

import matplotlib.pyplot as plt
import numpy as np
 
def LJ_potential(r,sigma,epsilon,r_cut):
    conds = [r < r_cut, r >= r_cut]
    funcs = [lambda x: 4*epsilon*((sigma/x)**(12)-(sigma/x)**6) , lambda x: 0]
    return np.piecewise(r, conds, funcs)
 
fig, ax = plt.subplots(1, 1)
 
 
r = np.arange(0.6,4,0.01)
ax.plot(r, LJ_potential(r,1.0,1.0,3.0),c='k')
 
ax.set_xlim([0.8,3.1])
ax.set_ylim([-1.1,2])
 
ax.set_xlabel(r'$r \ [\sigma]$')
 
# the potential is an energy, so it is given in units of epsilon
ax.set_ylabel(r'$U_\mathrm{LJ}(r) \ [\epsilon]$')
 
plt.savefig("example.pdf")
plt.show()

This produces a pretty nice plot right away. There are many ways of customizing plots. If the same settings are used over and over again, they can be collected in a style sheet, as described in the matplotlib documentation on style sheets and rcParams. Alternatively, each setting can be changed manually through rcParams, or in place through the arguments of the individual matplotlib functions:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset
 
# render all text with LaTeX (requires a working LaTeX installation)
mpl.rcParams['text.usetex'] = True
mpl.rcParams['font.family'] = "serif"
mpl.rcParams['font.serif'] = "Computer Modern Roman"
 
mpl.rcParams['legend.fontsize'] = 12
mpl.rcParams['axes.labelsize'] = 12
mpl.rcParams['font.size'] = 12
mpl.rcParams['xtick.labelsize'] = 12
mpl.rcParams['ytick.labelsize'] =  12
 
mpl.rcParams['legend.frameon'] = False
mpl.rcParams['legend.shadow'] = False
mpl.rcParams['legend.numpoints'] = 1
mpl.rcParams['legend.scatterpoints'] = 1
mpl.rcParams['legend.handlelength']  = 0.3
mpl.rcParams['legend.handletextpad'] = 0.2
 
mpl.rcParams['lines.linewidth'] = 1.5
mpl.rcParams['lines.markersize']= 5
 
mpl.rcParams['savefig.bbox'] = "tight"
mpl.rcParams['savefig.pad_inches'] = 0.1
mpl.rcParams['savefig.transparent'] = True
 
 
mpl.rcParams['xtick.major.size'] =  4
mpl.rcParams['ytick.major.size'] =  4
mpl.rcParams['xtick.minor.size'] =  1.5
mpl.rcParams['ytick.minor.size'] =  1.5
mpl.rcParams['xtick.minor.visible'] =  True
mpl.rcParams['ytick.minor.visible'] =  True
mpl.rcParams['ytick.right'] =  True
mpl.rcParams['xtick.top'] = True
mpl.rcParams['xtick.direction']   = "in"
mpl.rcParams['ytick.direction']   = "in"
 
def LJ_potential(r,sigma,epsilon,r_cut):
    conds = [r < r_cut, r >= r_cut]
    funcs = [lambda x: 4*epsilon*((sigma/x)**(12)-(sigma/x)**6) , lambda x: 0]
    return np.piecewise(r, conds, funcs)
 
fig, ax = plt.subplots(1, 1, figsize=(6,4))
axins = inset_axes(ax, width="60%", height="50%", loc=1,
                   bbox_to_anchor=(.2, .4, .6, .5),
                   bbox_transform=ax.transAxes)
 
 
r = np.arange(0.6,4,0.001)
ax.plot(r, LJ_potential(r,1.0,1.0,3.0),c='red',label='Lennard-Jones potential')
axins.plot(r, LJ_potential(r,1.0,1.0,3.0),c='red')
axins.axhline(y=0,c='k',linestyle='--')
mark_inset(ax, axins, loc1=3, loc2=1, fc="none", ec="0.5")
 
axins.set_xlim([2.95,3.05])
axins.set_ylim([-0.01,0.005])
 
ax.set_xlim([0.8,3.1])
ax.set_ylim([-1.1,2])
 
ax.set_xlabel(r'$r \ [\sigma]$')
# the potential is an energy, so it is given in units of epsilon
ax.set_ylabel(r'$U_\mathrm{LJ}(r) \ [\epsilon]$')
ax.legend(loc=4)
 
plt.savefig("example.pdf")
plt.show()
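
If the rcParams settings above are needed in several scripts, one option is to collect them in a dictionary and apply them with matplotlib's rc_context, so they do not leak into other figures. This is a minimal sketch with only a subset of the settings; the dictionary name my_style and the plotted data are made up.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# a subset of the settings used above, collected in one place
my_style = {
    "font.size": 12,
    "axes.labelsize": 12,
    "legend.frameon": False,
    "lines.linewidth": 1.5,
    "xtick.direction": "in",
    "ytick.direction": "in",
    "xtick.top": True,
    "ytick.right": True,
}

direction_before = mpl.rcParams["xtick.direction"]

# the settings only apply inside the with-block
with mpl.rc_context(my_style):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([0, 1, 2], [0, 1, 4], label="example data")
    ax.legend()
    direction_inside = mpl.rcParams["xtick.direction"]

# afterwards the previous settings are restored
direction_after = mpl.rcParams["xtick.direction"]
```

The same key/value pairs can also be written to a style file (one "key: value" per line) and loaded with plt.style.use(), which is convenient for sharing a style between projects.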

Format matplotlib for a two-column LaTeX document

Preparing figures for a manuscript often means keeping the final document formatting in mind. As mentioned, there are many different design philosophies, but if you want your plot to match the fonts and fit the column width of a revtex4-1 document, this should get you started:

import matplotlib
import matplotlib.pyplot as plt
 
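# math in Computer Modern (the LaTeX default), text in the STIX serif font;
# these two settings only matter where matplotlib's own mathtext renders the text
matplotlib.rcParams['mathtext.fontset'] = 'cm'
matplotlib.rcParams['font.family'] = 'STIXGeneral'
 
# make the tick labels a bit smaller (default font size is 10)
matplotlib.rcParams['xtick.labelsize'] = 9
matplotlib.rcParams['ytick.labelsize'] = 9
 
# this enables full LaTeX rendering of all text (requires a LaTeX installation)
matplotlib.rcParams['text.usetex'] = True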
 
# a column in a two-column revtex4-1 LaTeX article is 8.6 cm wide
# you can also use 8 cm for a slightly smaller figure 
cm = 1/2.54
fig, ax = plt.subplots(1,2, figsize=(8.6*cm,6*cm))
 
# plot ...
 
# make sure nothing overlaps
plt.tight_layout()
plt.savefig("my_shiny_plot.pdf")
plt.show()

Then, in your LaTeX document, you can simply include this figure with:

\begin{figure}[!t]
    \centering
    \includegraphics{my_shiny_plot.pdf}
    \caption{A shiny plot}
    \label{fig:shiny}
\end{figure}

This should result in the correct font and font sizes. You can also explicitly specify the figure size with \includegraphics[width=8cm]{my_shiny_plot.pdf}; note that this rescales the whole figure, including its fonts, if it was created at a different width. It is always useful to render one figure, include it in the manuscript, and tinker with it until it looks right, and then apply the desired formatting to all figures.
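
Since matplotlib expects figure sizes in inches, a small helper can keep the cm-to-inch conversion in one place. The function name figsize_cm is made up for this sketch; the 8.6 cm x 6 cm size is the one used in the example above.

```python
def figsize_cm(width_cm, height_cm):
    """Convert a figure size given in cm to the inches matplotlib expects."""
    cm_per_inch = 2.54
    return (width_cm / cm_per_inch, height_cm / cm_per_inch)

# one column of the two-column layout, as in the example above
size = figsize_cm(8.6, 6.0)
print("figsize in inches: %.3f x %.3f" % size)
```

The result can be passed directly as plt.subplots(figsize=figsize_cm(8.6, 6.0)). Keep in mind that saving with bbox_inches='tight' crops or pads the figure, so the saved file may differ slightly from the requested size.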

Plot data from files

For reading data from files use numpy.genfromtxt():

import matplotlib.pyplot as plt
import numpy as np
 
fig, ax = plt.subplots(1, 1,figsize=(6, 4))
 
FS = np.genfromtxt('Frenkel_casestudy_09.dat')
# raw strings (r"...") keep the LaTeX backslashes intact
ax.scatter(FS[:,0], FS[:,1], label=r"$T = 0.9 \;\epsilon/ k_\mathrm{B}$", color='r')
 
FS = np.genfromtxt('Frenkel_casestudy_20.dat')
ax.scatter(FS[:,0], FS[:,1], label=r"$T = 2.0 \;\epsilon/ k_\mathrm{B}$", color='b')
 
ax.set_xlim([0.0,1.0])
ax.set_ylim([-0.5,2.5])
 
ax.set_xlabel(r'$\rho \ [m/\sigma^3]$')
 
ax.set_ylabel(r'$p \ [\epsilon / \sigma^3]$')
ax.legend()
 
plt.savefig("example.pdf")
plt.show()
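
Real data files often contain header lines, extra columns, or missing values. The sketch below shows a few numpy.genfromtxt() options that help in such cases; it reads from an in-memory stand-in for a file, and the column layout and values are made up for illustration.

```python
import io
import numpy as np

# stand-in for a data file on disk; the contents are made up
fake_file = io.StringIO(
    "# rho  p  error\n"
    "0.1  0.08  0.01\n"
    "0.2  0.11  0.01\n"
    "0.3  nan   0.02\n"
    "0.4  0.25  0.02\n"
)

# lines starting with '#' are skipped by default;
# usecols selects columns, unpack=True returns one array per column
rho, p = np.genfromtxt(fake_file, usecols=(0, 1), unpack=True)

# drop rows where the pressure is missing
mask = ~np.isnan(p)
rho, p = rho[mask], p[mask]
print(rho, p)
```

For files with a fixed number of header lines that are not marked as comments, the skip_header argument can be used instead.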

Plotting/Analyzing all files in a folder

Assuming there are many files which are all supposed to be plotted by the same script, Python can automatically find all files matching a certain pattern:

import numpy as np
import os
import matplotlib.pyplot as plt
 
# find all files in the folder 'out' (and its subfolders) which end in ".pressure.dat"
files_to_plot = []
for subdir, dirs, files in os.walk('./out/'):
    for filename in files:
        # os.path.join inserts the path separator where needed, also for subfolders
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(".pressure.dat"):
            files_to_plot.append(filepath)
 
 
fig, ax = plt.subplots(1, 1)
 
for f in files_to_plot:
    data = np.genfromtxt(f) # read the file into an array
    ax.plot(data[:,0],data[:,1], label=f) # plot the second column against the first
 
ax.set_xlabel(r'"time" $\ [\mathrm{ MC \ cycles}]$')
ax.set_ylabel(r'$p \ [\epsilon / \sigma^3]$')
l = ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
 
plt.savefig("example.pdf",bbox_extra_artists=(l,), bbox_inches='tight')
plt.show()

If only some files need to be plotted, a simple list can be provided:

files_to_plot = ['./out/file1.dat','./out/file2.dat','./out/file3.dat']

This is also useful if the files don't follow a common pattern.
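
As an alternative to os.walk, the standard library's pathlib can collect matching files in a single call. The following sketch builds a small temporary folder with made-up file names, so that it runs without any existing data, and then searches it recursively.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # build a small, made-up folder structure to search
    out = Path(tmp) / "out"
    (out / "run1").mkdir(parents=True)
    (out / "a.pressure.dat").write_text("0 1.0\n")
    (out / "run1" / "b.pressure.dat").write_text("0 2.0\n")
    (out / "run1" / "b.energy.dat").write_text("0 3.0\n")

    # rglob searches the folder and all subfolders for the pattern
    files_to_plot = sorted(out.rglob("*.pressure.dat"))
    names = [f.relative_to(out).as_posix() for f in files_to_plot]

print(names)
```

With real data, only the rglob line is needed, e.g. sorted(Path('./out').rglob('*.pressure.dat')); the resulting Path objects can be passed to np.genfromtxt() directly.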

A more complicated example, where all pressure data is averaged and plotted:

import numpy as np
import os, re
import matplotlib.pyplot as plt
from matplotlib import cm
 
def P_correction(rho,sigma,epsilon,cutoff):
    c = 16/3.*np.pi*rho**2*sigma**3*epsilon*(2/3.0*(sigma/cutoff)**9-(sigma/cutoff)**3)
    return c
 
# find all files in a folder which end on "pressure.dat"
files_to_plot = []
L = []
N = []
T = []
for subdir, dirs, files in os.walk('./out/'):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(".pressure.dat"):
            # find all numbers in the path - expecting at least three,
            # in the order T, L, N
            all_floats = re.findall(r"[-+]?\d*\.\d+|\d+", filepath)
            if len(all_floats) >= 3:
                T.append(float(all_floats[0]))
                L.append(float(all_floats[1]))
                N.append(float(all_floats[2]))
                files_to_plot.append(filepath)
 
 
# sort all lists by the values in L, e.g. to make sure we plot in order
# (not strictly necessary); a single index list keeps the lists aligned
# with each other, even if several files share the same L
order = sorted(range(len(L)), key=lambda k: L[k])
files_to_plot = [files_to_plot[k] for k in order]
N = [N[k] for k in order]
T = [T[k] for k in order]
L = [L[k] for k in order]
 
 
# Define the colors to be used using rainbow map (or any other map), or any other color list, or just a single color 
colors = [cm.rainbow(i) for i in np.linspace(0, 1, len(files_to_plot))]
 
# now to the actual plotting:
fig, (ax1,ax2) = plt.subplots(1, 2)
 
FS = np.genfromtxt('Frenkel_casestudy_09.dat')
ax1.plot(FS[:,0], FS[:,1], label="FS", color='k')
 
 
FS = np.genfromtxt('Frenkel_casestudy_20.dat')
ax2.plot(FS[:,0], FS[:,1], label="FS", color='k')
 
# constants
sigma = 1.
epsilon = 1.
cutoff = 3.
 
 
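for i,f in enumerate(files_to_plot):
    data = np.genfromtxt(f)
    data = data[20:] # skip the first 20 rows, e.g. to discard equilibration
    rho = N[i]/L[i]**3 # number density
    P = np.average(data[:,1])
    P_ex = P_correction(rho,sigma,epsilon,cutoff) # correction for the potential cutoff
    # compare floats with a tolerance instead of ==
    if np.isclose(T[i], 0.9):
        ax1.scatter(rho, P+P_ex, label="%1.3f"%rho, color=colors[i])
    else:
        ax2.scatter(rho, P+P_ex, label="%1.3f"%rho, color=colors[i])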
 
ax1.set_xlabel(r'$\rho \ [m/\sigma^3]$')
ax1.set_ylabel(r'$p \ [\epsilon / \sigma^3]$')
ax1.legend()
 
ax2.set_xlabel(r'$\rho \ [m/\sigma^3]$')
ax2.set_ylabel(r'$p \ [\epsilon / \sigma^3]$')
ax2.legend()
 
plt.savefig("example.pdf")
plt.show()
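
Extracting parameters by position, as in the example above, breaks easily when the file names change. A regular expression with named groups is a more explicit alternative. The naming scheme T..._L..._N... used below is hypothetical, as are the file names; adapt the pattern to the actual files.

```python
import re

# hypothetical naming scheme: T<temperature>_L<box length>_N<particles>.pressure.dat
pattern = re.compile(
    r"T(?P<T>\d+\.\d+)_L(?P<L>\d+\.\d+)_N(?P<N>\d+)\.pressure\.dat$"
)

filenames = [
    "./out/T2.0_L10.0_N500.pressure.dat",
    "./out/T0.9_L8.0_N100.pressure.dat",
    "./out/T0.9_L10.0_N500.pressure.dat",
    "./out/notes.txt",  # does not match and is skipped
]

runs = []
for name in filenames:
    match = pattern.search(name)
    if match is None:
        continue
    # keep everything belonging to one file together in one dictionary
    runs.append({"file": name,
                 "T": float(match["T"]),
                 "L": float(match["L"]),
                 "N": int(match["N"])})

# sort by temperature first, then by box length
runs.sort(key=lambda run: (run["T"], run["L"]))
densities = [run["N"] / run["L"] ** 3 for run in runs]
```

Keeping the parameters of each file in one record means a single sort reorders everything consistently, so no separate lists can get out of step.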

Extracting data from plot images

Often, you want the data behind a plot/graph in a publication or book. It can be pretty difficult to read the values off by eye, especially if the axes are logarithmic. g3data is a nice, but old, tool to do this.

install g3data
take screenshot of the plot in question and save it as png somewhere
open the png with g3data
set reference points/values on the axes, set logarithmic axes if needed
click on the data points you want to extract
save as txt somewhere
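
The idea behind the reference points in the steps above is a simple interpolation: two known positions on an axis fix the mapping from pixel coordinates to data values, either linearly or in log space. The sketch below illustrates that mapping; it is not g3data's actual code, and the function name and all pixel values are made up.

```python
import math

def pixel_to_value(pixel, pixel_ref, value_ref, log_scale=False):
    """Map a pixel coordinate to a data value using two reference points."""
    (p0, p1), (v0, v1) = pixel_ref, value_ref
    # relative position of the pixel between the two reference pixels
    fraction = (pixel - p0) / (p1 - p0)
    if log_scale:
        # on a logarithmic axis, interpolate the exponents instead of the values
        log_v = math.log10(v0) + fraction * (math.log10(v1) - math.log10(v0))
        return 10 ** log_v
    return v0 + fraction * (v1 - v0)

# made-up calibration: pixel columns 100 and 500 correspond to x = 0 and x = 4
x = pixel_to_value(300, (100, 500), (0.0, 4.0))

# made-up calibration on a log axis: pixel rows 400 and 50 correspond to y = 1 and y = 1000
y = pixel_to_value(225, (400, 50), (1.0, 1000.0), log_scale=True)

print(x, y)
```

Image rows usually count from top to bottom, which is why the reference pixels for the y-axis decrease while the values increase; the formula handles this without changes.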

Install g3data on Mac

brew install automake autoconf gtk+
git clone https://github.com/pn2200/g3data.git
cd g3data/
autoreconf -i
./configure 
make
./g3data/g3data

Install g3data on Ubuntu

sudo apt-get install g3data
