{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Calculate Prime Numbers and Plot Them\n",
    "\n",
    "### Authors: \n",
    "- Christian Michelsen (Niels Bohr Institute)\n",
    "- Troels C. Petersen (Niels Bohr Institute)\n",
    "\n",
    "### Date:    \n",
    "- 24-09-2018 (latest update)\n",
    "- 06-10-2015 (originally)\n",
    "\n",
    "***\n",
    "Script calculating all prime numbers up to 10000 and plots them in a histogram, effectively counting the number of primes in intervals of 100.\n",
    "***\n",
    "\n",
    "First, import the relevant packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np               \n",
    "import matplotlib.pyplot as plt\n",
    "from iminuit import Minuit\n",
    "from probfit import Chi2Regression #, BinnedChi2, UnbinnedLH, BinnedLH\n",
    "import sys                                         # Modules to see files and folders in directories\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "sys.path.append('../../External_Functions')\n",
    "from ExternalFunctions import nice_string_output, add_text_to_ax # useful functions to print fit results on figure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Decide whether or not the program should save the plot along with the maxmimum prime number and the starting point:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "save_plots = True\n",
    "Nmax = 10000              # Maximum prime number\n",
    "N    = 2                  # Starting value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main program:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculating prime numbers comes down to taking a number (here `N`), and testing if it is prime by dividing it by all numbers up the the square root of it. If no number divides `N` (tested using the `%` operator, which returns the remainder: `6%3 = 0`, `5%2 = 1`, etc.), then `N` is prime!\n",
    "***\n",
    "See if you can follow this in python program form below. There are only 9 lines in total! The central part consists of a While-loop, which loops as long as the statement is true. It is true to begin with (`N=2`, `Nmax = 10000`), but at some point `N` will have increased enough to equal `Nmax`, and the looping stops.\n",
    "***\n",
    "We start by creating empty list for primes. We dont create a numpy array since we don't know the size of it before creation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "primes = []\n",
    "\n",
    "while (N < Nmax):\n",
    "    # print(\"Potential Prime number I want to test: \", N)\n",
    "    hit = False                # Define a variable \"hit\", with starting value \"False\".\n",
    "\n",
    "    for i in range (2, int(np.sqrt(N)) + 1) :   # Loop over the possible divisors.\n",
    "        # print(\"  I'm testing the first number with the second and get the third: \", N, i, N%i)\n",
    "\n",
    "        if N%i == 0:           # If the remainder of the integer division is 0\n",
    "            hit = True         # hit = \"True\" (i.e. the number being tested had a divisor, and is not a prime)\n",
    "            break              # Skip the rest of the possible divisors.\n",
    "\n",
    "    if not hit:             # If no number gave a perfect integer division (i.e. with remainder 0)...\n",
    "        primes.append(N)\n",
    "        print(\"I got a prime\", N)   # ...the it is a prime number.\n",
    "\n",
    "    N += 1                     # Increase N by one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now convert the Python list into a Numpy array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "primes = np.array(primes)\n",
    "print(\"First 10 primes, \", primes[:10]) # prints the first 10 primes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check that there are no even primes except 2:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"  The even prime numbers are listed here: \", primes[(primes % 2) == 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Draw output:\n",
    "\n",
    "Having computed the prime numbers, we now try to plot the distribution of them to see if there are any interesting patterns. \n",
    "\n",
    "We first define the number of bins and the minimum/maximum x value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Nbins = 100\n",
    "xmin = 0\n",
    "xmax = Nmax\n",
    "binwidth = int((xmax-xmin)/Nbins)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case we want the distribution of prime numbers as graph with error bars and not a regular histogram.  \n",
    "Therefore we don't do `ax.hist(primes, bins=Nbins, ...)`, but rather bin it using Numpy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y, bin_edges = np.histogram(primes, bins=Nbins, range=(xmin, xmax))\n",
    "x = bin_edges[:-1] + np.diff(bin_edges)       # Calculate the x-values as the center of the bins.\n",
    "sy = np.sqrt(y)                                  # Assume Poissonian errors (more on that later!)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then plot it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(12, 6))\n",
    "ax.errorbar(x, y, sy, fmt='.', label='Prime number distribution')\n",
    "ax.set(xlim=(xmin, xmax), \n",
    "       xlabel=\"Prime Number\", \n",
    "       ylabel=f\"Frequency /{binwidth:4d}\",    # To put the bin width in the y-axis label \n",
    "       title=\"Prime number density distribution\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fitting the data:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The __[Prime Number Theorem](http://en.wikipedia.org/wiki/Prime_number_theorem)__ states, that the number of prime numbers follows roughly the function $f(x) = \\frac{1}{\\ln(x)}$. We have binned our prime numbers, so we must have a function with a few more degrees of freedom: $f(x) = c_0 + \\frac{c_1}{\\ln(x)}$, where $c_0$ and $c_1$ are constants that needs to be fitted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the function to fit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "def fit_prime(x, c0, c1) :\n",
    "    return c0 + c1 / np.log(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a chi2-regression object based on the fit function `fit_prime` and the `x`, `y` and `sy` values: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chi2_object = Chi2Regression(fit_prime, x, y, sy)\n",
    "minuit = Minuit(chi2_object, pedantic=False, c0=0, c1=10) # Initializes the fit with given initial values\n",
    "minuit.migrad(); # Fit the function to the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract the fit parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c0, c1 = minuit.args                    # Get the output arguments\n",
    "for name in minuit.parameters:          # Print the output (parameters of the fit)\n",
    "    print(f\"Fit value: {name} = {minuit.values[name]:.5f} +/- {minuit.errors[name]:.5f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And plot the fit along with some extra text to the plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x_fit = np.linspace(xmin, xmax, 1000)[1:]   # Create the x-axis for the plot of the fitted function - Not from x[0] since x[0] is 0, which is not defined for the log!\n",
    "y_fit = fit_prime(x_fit, c0, c1)    \n",
    "\n",
    "ax.plot(x_fit, y_fit, '-', label='Prime number fit result')\n",
    "\n",
    "ax.text(0.5, 0.6, r'Fit function: f(x) = $C_0 + \\frac{C_1}{\\log (x)}$', transform = ax.transAxes, fontdict={'size': 18}); # transform so relative coordinates\n",
    "ax.legend(loc='best')\n",
    "\n",
    "fig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "We also want to extract the $\\chi^2$ value (see `IntroToPlottingAndFitting.ipynb` if the command below is unclear):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chi2_val = minuit.fval"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the number of degrees of freedom:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N_NotEmptyBin = len(y > 0)\n",
    "Ndof = N_NotEmptyBin - len(minuit.args)            # Number of degrees-of-freedom"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We calculate the $\\chi^2$ probability:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats\n",
    "chi2_prob =  stats.chi2.sf(chi2_val, Ndof)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And add the fit information to the plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {'Entries':  N_NotEmptyBin,\n",
    "     'Chi2':     chi2_val,\n",
    "     'ndf':      Ndof,\n",
    "     'Prob':     chi2_prob,\n",
    "     'c0':       [minuit.values['c0'], minuit.errors['c0']],\n",
    "     'c1':       [minuit.values['c1'], minuit.errors['c1']]\n",
    "    }\n",
    "\n",
    "text = nice_string_output(d, extra_spacing=-2, decimals=2)\n",
    "add_text_to_ax(0.02, 0.95, text, ax, fontsize=12)\n",
    "fig.tight_layout()\n",
    "fig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we save the plots if save_plots is True:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if save_plots:\n",
    "    fig.savefig('PrimeNumberDistribution.pdf', dpi=600)"
   ]
  }
 ],
 "metadata": {
  "executable": "/usr/bin/env python",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "main_language": "python"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}