# Log-log plots in R

06 Apr 2015Plotting data on log-log scale can be helpful when your data cover a large range of values and/or to detect power law behaviors in your data.
Create such a plot in R is quite simple, you just have to add **log = “xy”** as parameter when you call the plot function. In the same way, you
can use **log = “x”** or **log = “y”** to plot only the x-axis or the y-axis on log scale.

For example, we can plot the histogram of a sample **X** on a log-log scale. First, we use **hist** to compute the histogram of **X** with 500 bins.
Then we extract the midpoint **x** and the density **y** of each histogram cell. Finally, we plot the result on a log-log scale (**log = “xy”**).
I prefer to label the axis using exponents rather than scientific notation so I use the function **eaxis** (library **sfsmisc**).
Here is the code:

And the result:

The result is quite nice but the tail of the distribution is very noisy, there are probably few values in the last bins.
To avoid that we need to bin the data into exponentially wider bins so that they appear equally spaced on the logarithmic scale.
The idea is to compute the breakpoints between histogram cells with equally spaced values in log scale and then apply the **exp**
function to obtain the breakpoints in the original scale. The updated code:

And the new plot:

Now the tail of the distribution is less messy and we can clearly identify a truncated power law distribution with probably an exponential cutoff. To check that we need to fit the truncated power law’s PDF to the data. The PDF of a truncated power law distribution with an exponential cutoff is:

So now we need to adjust the three parameters, \( C \) the normalizing constant, \( \mathbf{\beta} \) the power law exponent
and \( x_o \) the exponential cutoff to the data. To do so we define a function **TPL** which computes the residual sum of square
between the data **y** and **f(x)**.

Then, we use the function **optim** which returns the parameter values **par** minimizing **TPL**.

Finally, we can plot the data and the fitted curve. As you can see the fit looks reasonable. Of course to do things right it would
be necessary to have a more rigorous process for estimating the parameters with the maximum likelihood method for example. It
would also be important to test different distributions and compare them with a test and/or a goodness-of-fit. For those who
are interested in fitting power laws to data I recommend the paper of Clauset *et al.*