# Computing the PDF of the sum of N moves of an empirical PDF for USDJPY 1-minute moves

[Cross posted.]

Per-minute tick data for USDJPY is available here. Suppose we download this file to usdjpy.txt and then load it into a NumPy array in Python 3 as follows:

import numpy as np
with open('usdjpy.txt') as f:
    data = f.read().splitlines()
data = [x.split(',') for x in data][1:]  # drop the header row
jpy = np.array([float(close) for (ticker, yy, time, open, high, low, close, vol) in data])


The per-minute returns in USDJPY, expressed in basis points, will be:

djpy = 10000.0 * np.diff(jpy) / jpy[:-1]


Define a histogram function and empirical PDF function as follows:

def histc(X, bins):
    # MATLAB-style histc: count how many elements of X fall in each bin
    # defined by the edges in `bins`
    map_to_bins = np.digitize(X, bins)
    r = np.zeros(bins.shape)
    for i in map_to_bins:
        r[i - 1] += 1
    return [r, map_to_bins]

def epdf(S, numIntervals=100):
    minS = np.min(S)
    maxS = np.max(S)
    intervalWidth = (maxS - minS) / numIntervals
    x = np.arange(minS, maxS + intervalWidth / 2., intervalWidth)
    [ncount, ii] = histc(S, x)
    # If any single bin captures more than half the samples, zoom in
    # on a window around the median and re-bin
    if ncount.max() > len(S) / 2:
        medS = np.median(S)
        minS = 0.8 * medS
        maxS = 1.2 * medS
        intervalWidth = (maxS - minS) / numIntervals
        x = np.arange(minS, maxS + intervalWidth / 2., intervalWidth)
        [ncount, ii] = histc(S, x)
    relativefreq = ncount / ncount.sum()
    return (x, relativefreq)


The empirical PDF of USDJPY 1-minute returns (in basis points) is then:

(x,rf)=epdf(djpy,numIntervals=1000)


which, if we plot it

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
simple=[(x[i],int(100*rf[i])) for i in range(len(rf)) if int(100*rf[i]) > 0]
X=np.array([x for x,y in simple])
Y=np.array([y for x,y in simple])
p=figure(plot_width=600,plot_height=200,tools="pan,wheel_zoom,box_zoom,reset,resize")
p.line(X,Y)
show(p)


looks like this (plot omitted).

Now suppose I want to know what could happen in an hour (60 one-minute moves). Following the answer to this question, I could convolve 60 copies of the EPDF above together, and I should get the right answer. I think this would look something like this:

def step_pdf(pdf1, pdf2):
    pdf = np.convolve(pdf1, pdf2)
    # Downsample the length 2n-1 convolution back to length n by
    # averaging adjacent pairs, prepending a zero, and renormalizing
    pdf = (pdf[0:-1:2] + pdf[1::2]) / 2
    pdf = np.append(np.array([0.0]), pdf)
    pdf = pdf / pdf.sum()
    return pdf

from functools import reduce
pdf60=reduce(step_pdf,[rf for i in range(60)])


If I then plot the new pdf60 on top of the old pdf

p=figure(plot_width=600,plot_height=200,tools="pan,wheel_zoom,box_zoom,reset,resize")
p.line(x,rf,color='red')
p.line(x,pdf60,color='blue')
show(p)


I see the following (call this “Convolution PDF60”; plot omitted). The blue line is my 60-minute PDF from the above 60-fold convolution. It is smoother, which I expect, but it is still roughly in the same range as the original 1-minute PDF, which I do not expect.

So now I will try a more constructive way of generating the 60-minute PDF: I will construct as many 60-minute samples as I have 1-minute samples, by summing randomly selected vectors of 60 draws from my original population of 1-minute moves, and then compute the empirical PDF of the result. I completely trust this construction, so I will use it as a benchmark against my original one. So:

n = djpy.shape[0]
draws = np.random.randint(0, n, size=(n, 60))
djpy60 = np.array([djpy[draws[i]].sum() for i in range(n)])
(x, pdf60) = epdf(djpy60, numIntervals=1000)


Now if I plot pdf60:

p=figure(plot_width=600,plot_height=200,tools="pan,wheel_zoom,box_zoom,reset,resize")
p.line(x,rf,color='red')
p.line(x,pdf60,color='blue')
show(p)


I see a much wider distribution of 60-minute moves, which corresponds much more strongly to my intuition (call this “Monte Carlo PDF60”; plot omitted).

Question: Why aren’t my Convolution PDF60 and my Monte Carlo PDF60 in agreement?

# What proves that a random process with zero diffusion is not a martingale?

[Cross-posted.]

Consider the process $dX_t=W_t dt+0 dW_t$, alternatively $X_t=\int_0^t W_s ds$, where $W_t$ is Brownian motion. I read a proof that $X_t$ is not a martingale that simply states “Because the diffusion of $dX_t$ is 0, $X_t$ is not a martingale.”

By definition, a stochastic process $X_t$ adapted to a filtration $\{{\cal F}_t\}$ is a martingale iff $E(|X_t|) <\infty$ for all $t \geq 0$ and $E(X_t|{\cal F}_s)=X_s$ for all $0\leq s\leq t$.

Question: What exactly about either of these conditions establishes that if a random process has 0 diffusion, it is not a martingale?

I am asking because I see the 0-diffusion condition used often for this purpose, but in the above example of a process that is still random even though it has zero diffusion, I don’t see why it applies.
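One way to see what goes wrong with the second condition for this particular $X_t$ (a standard computation, using Fubini and $E(W_u|{\cal F}_s)=W_s$ for $u \geq s$):

$$E(X_t|{\cal F}_s)=E\left(X_s+\int_s^t W_u\,du\,\Big|\,{\cal F}_s\right)=X_s+\int_s^t E(W_u|{\cal F}_s)\,du=X_s+(t-s)W_s,$$

which equals $X_s$ only on the event $W_s=0$, so this particular zero-diffusion process fails the martingale property directly.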

# Write expectation of Brownian motion conditional on filtration as an integral?

[Cross-post.]

Let $W_t$ be a Brownian motion, so $W_t=z_t \sqrt{t}$ where $z_t \sim N(0,1)$ and the pdf of $z$ is $f(z)=\frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}$. So

$$E(W_t)=\int_{-\infty}^{\infty} W_t f(z)\, dz =\int_{-\infty}^{\infty} z \sqrt{t}\, \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\, dz =\int_{0}^{\infty} (z+(-z)) \sqrt{t}\, \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\, dz=0.$$

Now suppose ${\cal F}_t$ is the natural filtration for $W_t$. By construction of Brownian motion, we are given that $E(W_t|{\cal F}_s)=W_s, 0\leq s\leq t$.

Question: How do I write $E(W_t|{\cal F}_s)$ as a Riemann integral expression similar to the Riemann integral expression of $E(W_t)$ given above?

Note: I have done extensive Google search on this, without finding any responsive exposition. If this question is beside the point, please explain why. If it’s on point, please answer with the Riemann integral expression.
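One way to arrive at such an expression (a sketch, using the decomposition $W_t=W_s+(W_t-W_s)$, where the increment $W_t-W_s \sim N(0,t-s)$ is independent of ${\cal F}_s$):

$$E(W_t|{\cal F}_s)=W_s+E(W_t-W_s)=W_s+\int_{-\infty}^{\infty} z\sqrt{t-s}\,\frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\,dz=W_s,$$

where the last integral vanishes by the same symmetry argument used for $E(W_t)$ above.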

# Empirical PDF from Empirical CDF

(cross post)

Suppose I do an experiment $N$ times and get a vector $X$ of results. Let $C_X(y)$ be the empirical cumulative distribution function of $X$. Suppose $X$ is sorted so that $x_1 \leq x_2 \leq \cdots \leq x_N$. Approximately,

$$C_X(y)=\begin{cases}0 & \textrm{if } y \leq x_1 \\ 1 & \textrm{if } y > x_N \\ \frac{i+\frac{y-x_i}{x_{i+1}-x_i}}{N} & \textrm{if } x_i \leq y \leq x_{i+1}.\end{cases}$$

Question: What is the most efficient way to compute the corresponding empirical PDF of $X$? Just interpolate through the histogram?
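As a sketch of one approach (the function names `ecdf_interp` and `epdf_from_ecdf` are mine, not established API): evaluate the piecewise-linear ECDF above on a uniform grid and take its finite difference, which amounts to interpolating the histogram.

```python
import numpy as np

def ecdf_interp(X, y):
    """Piecewise-linear empirical CDF of sample X, evaluated at points y."""
    Xs = np.sort(X)
    N = len(Xs)
    # np.interp linearly interpolates between the knots (x_i, i/N),
    # clamping to 0 below x_1 and to 1 above x_N
    return np.interp(y, Xs, np.arange(1, N + 1) / N, left=0.0, right=1.0)

def epdf_from_ecdf(X, numIntervals=100):
    """Empirical PDF as the finite difference of the interpolated ECDF."""
    grid = np.linspace(np.min(X), np.max(X), numIntervals + 1)
    C = ecdf_interp(X, grid)
    width = grid[1] - grid[0]
    pdf = np.diff(C) / width                 # density on each interval
    centers = (grid[:-1] + grid[1:]) / 2     # bin midpoints
    return centers, pdf

X = np.random.normal(size=10000)
centers, pdf = epdf_from_ecdf(X, numIntervals=50)
print(np.sum(pdf) * (centers[1] - centers[0]))   # close to 1.0
```

Since the ECDF is monotone, the resulting density is automatically nonnegative, and differencing on a grid is $O(N \log N)$ (dominated by the sort), which is about as cheap as the histogram itself.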

# Two-step empirical CDF from one-step empirical CDF

(cross-post)

Suppose I have a random variable $X_i$ which changes by $X_{i+1}=X_i+\delta_i$ from one timestep to the next. Suppose I do an experiment where I observe $N$ values $d_1,d_2,\ldots,d_N$ of $\delta_0$ and make an empirical CDF of $\delta_0$, by sorting $d$ so that $d_1 \leq d_2 \leq \cdots \leq d_N$, and then approximating the CDF at $y$ as $\frac{i}{N}$ where $d_i \leq y \leq d_{i+1}$.

Question: What is the most efficient way to compute the empirical CDF of two steps of $X$, assuming that the process for going from $X_i$ to $X_{i+1}$ follows the same empirical distribution? The brute-force way that occurs to me is to create the set $E=\{d_i+d_j: 1\leq i\leq N, 1\leq j\leq N\}$, sorting $E$ as $e_1 \leq \ldots \leq e_{N^2}$, and then approximating the two-step CDF at $y$ as $\frac{i}{N^2}$ where $e_i \leq y \leq e_{i+1}$.
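The brute-force construction can be written compactly with NumPy (a sketch; `twostep_ecdf` is my own name, and the normal sample stands in for the observed increments):

```python
import numpy as np

def twostep_ecdf(d, y):
    """Empirical CDF of the two-step move delta_i + delta_j,
    built from all N^2 pairwise sums of the one-step sample d."""
    E = np.sort(np.add.outer(d, d).ravel())   # e_1 <= ... <= e_{N^2}
    # CDF(y) = (number of pairwise sums <= y) / N^2
    return np.searchsorted(E, y, side='right') / E.size

d = np.random.normal(size=500)   # stand-in for the observed one-step increments
print(twostep_ecdf(d, 0.0))      # near 0.5 for a roughly symmetric sample
```

Note the $O(N^2 \log N)$ cost from sorting all pairwise sums; for large $N$, convolving the one-step histogram with itself (as in the first question above) is the usual cheaper approximation.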