Computing the PDF of the sum of N moves drawn from the empirical PDF of USDJPY 1-minute moves

[Cross-posted.]

Per-minute price data for USDJPY is available here. Suppose we download this file to usdjpy.txt and load the closing prices into a NumPy array in Python 3 as follows:

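import numpy as np

# Read the CSV (skipping blank lines) and drop the header row.
with open('usdjpy.txt', 'r') as f:
    data = [line.split(',') for line in f if line.strip()][1:]
# Columns: ticker, yy, time, open, high, low, close, vol; keep the close (column 6).
jpy = np.array([float(row[6]) for row in data])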

The per-minute returns in USDJPY, expressed in basis points, will be:

djpy = 10000.0 * np.diff(jpy) / jpy[:-1]
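As a quick sanity check on units, here is the same formula applied to a few made-up closing prices (hypothetical numbers, not taken from the USDJPY file), next to the same moves measured in pips. For USDJPY one pip is 0.01 yen, so near a price of 110 one pip is roughly 0.9 basis points: the two units are close but not interchangeable.

```python
import numpy as np

# Hypothetical closing prices, for illustration only.
prices = np.array([110.00, 110.01, 109.99, 110.02])

# Per-minute returns in basis points, as in the text.
bp = 10000.0 * np.diff(prices) / prices[:-1]

# The same moves in pips (one pip = 0.01 yen for USDJPY).
pips = np.diff(prices) / 0.01

print(np.round(bp, 3))
print(np.round(pips, 3))
```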

Define a histogram function and an empirical PDF function as follows:

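def histc(X, bins):
    # r[k] counts X with bins[k] <= x < bins[k+1]; values >= bins[-1] go to r[-1],
    # and values below bins[0] are not counted.
    map_to_bins = np.digitize(X, bins)
    r = np.zeros(bins.shape)
    for i in map_to_bins:
        if i > 0:  # i == 0 means x < bins[0]; r[i-1] would silently wrap to r[-1]
            r[i-1] += 1
    return [r, map_to_bins]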

def epdf(S,numIntervals=100):
    minS=np.min(S)
    maxS=np.max(S)
    intervalWidth=(maxS-minS)/numIntervals
    x=np.arange(minS,maxS+intervalWidth/2.,intervalWidth)
    [ncount,ii]=histc(S,x)
    if ncount[1]>len(S)/2:
        medS=np.median(S)
        minS=0.8*medS
        maxS=1.2*medS
        intervalWidth=(maxS-minS)/numIntervals
        x=np.arange(minS,maxS+intervalWidth/2.,intervalWidth)
        [ncount,ii]=histc(S,x)
    relativefreq=ncount/sum(ncount)
    return (x,relativefreq)
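NumPy's own np.histogram covers most of what histc and epdf do. A minimal sketch, assuming the median-based re-binning branch of epdf is not needed; epdf_hist is a hypothetical name, not part of the original post. Unlike histc, np.histogram closes its last bin on the right, so the maximum lands in the last of num_intervals bins and the result is typically one element shorter than epdf's.

```python
import numpy as np

def epdf_hist(S, num_intervals=100):
    """Relative frequencies of S in num_intervals equal-width bins over [min(S), max(S)].

    Returns (left_edges, relative_freq); both have length num_intervals.
    """
    counts, edges = np.histogram(S, bins=num_intervals)
    return edges[:-1], counts / counts.sum()
```

With the data above this would be called as epdf_hist(djpy, num_intervals=1000).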

The empirical PDF of USDJPY 1-minute returns (in basis points) is then:

(x,rf)=epdf(djpy,numIntervals=1000)

We can plot it with Bokeh:

from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
# Keep only bins holding at least 1% of the samples, in whole percent.
pct = (100 * rf).astype(int)
keep = pct > 0
p = figure(width=600, height=200, tools="pan,wheel_zoom,box_zoom,reset")
p.line(x[keep], pct[keep])
show(p)

The result looks like this:

[Figure: PDF of 1-minute moves of USDJPY]

Now suppose I want to know what could happen over an hour (60 one-minute moves). Following the answer to this question, I could convolve the EPDF above with itself 60 times and should get the right answer. I think this would look something like this:

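def step_pdf(pdf1, pdf2):
    pdf = np.convolve(pdf1, pdf2)        # full convolution, length len(pdf1) + len(pdf2) - 1
    pdf = (pdf[0:-1:2] + pdf[1::2]) / 2  # average adjacent pairs (halves the length, rounding down)
    pdf = np.append(np.array([0]), pdf)  # prepend a zero so equal-length inputs give that length back
    pdf = pdf / pdf.sum()                # renormalise to total probability 1
    return pdf

from functools import reduce
# Fold step_pdf over 60 copies of the 1-minute PDF.
pdf60 = reduce(step_pdf, [rf] * 60)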
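For comparison, a minimal sketch of an n-fold convolution that carries the grid along with the probabilities, assuming x is uniformly spaced and each grid point is treated as a point mass. nfold_convolve is a hypothetical helper, not from the original post. The sum of n moves can range from n*x[0] to n*x[-1], so the sketch keeps the full, longer convolution and builds a matching grid for it instead of resampling back to len(x) points.

```python
import numpy as np

def nfold_convolve(x, pdf, n):
    """PMF of the sum of n independent draws from a PMF given on a uniform grid.

    x   : uniformly spaced support points (length m, spacing dx)
    pdf : probabilities at those points (should sum to 1)
    n   : number of draws to add (n >= 1)

    Returns (xn, pdfn): the grid n*x[0], n*x[0] + dx, ..., n*x[-1]
    (length n*(m-1) + 1) and the probabilities on it.
    """
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    dx = x[1] - x[0]
    pdfn = np.array([1.0])             # PMF of an empty sum: all mass at 0
    for _ in range(n):
        pdfn = np.convolve(pdfn, pdf)  # each step lengthens the support by m - 1 points
    xn = n * x[0] + dx * np.arange(pdfn.size)
    return xn, pdfn
```

Applied to the 1-minute EPDF this would be nfold_convolve(x, rf, 60), giving a grid of 60*(len(x)-1)+1 points.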

If I then plot the new pdf60 on top of the original 1-minute PDF:

p = figure(width=600, height=200, tools="pan,wheel_zoom,box_zoom,reset")
p.line(x, rf, color='red')      # original 1-minute PDF
p.line(x, pdf60, color='blue')  # 60-fold convolution
show(p)

I see (call this “Convolution PDF60”):

[Figure: Convolution PDF60]

The blue line is my 60-minute PDF from the 60-fold convolution above. It is smoother, which I expect, but it still covers roughly the same range as the original 1-minute PDF, which I do not expect. So I will try a more constructive way of generating the 60-minute PDF: I will construct as many random 60-minute samples as I have 1-minute samples, each one the sum of 60 one-minute moves drawn at random (with replacement) from my original population. Then I will compute the empirical PDF of the result. I trust this construction completely, so I will use it as a benchmark for the convolution. So:

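n = djpy.shape[0]
# Each row holds 60 random indices (drawn with replacement) into the 1-minute moves.
draws = np.random.randint(0, n, size=(n, 60))
# Summing each row gives one simulated 60-minute move.
djpy60 = djpy[draws].sum(axis=1)
# Use a separate grid name so that x, the 1-minute grid, is not overwritten.
(x60, pdf60) = epdf(djpy60, numIntervals=1000)

Now if I plot pdf60:

p = figure(width=600, height=200, tools="pan,wheel_zoom,box_zoom,reset")
p.line(x, rf, color='red')        # 1-minute PDF on its own grid
p.line(x60, pdf60, color='blue')  # Monte Carlo 60-minute PDF
show(p)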

I see a much wider distribution of 60-minute moves, which matches my intuition much better (call this “Monte Carlo PDF60”):

[Figure: Monte Carlo PDF60]
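One quick diagnostic for either construction, assuming the 1-minute moves are independent and identically distributed with finite variance: the standard deviation of a sum of n moves is √n times the 1-minute standard deviation, so a correct 60-minute PDF should be about √60 ≈ 7.7 times as wide as the 1-minute one. The sketch below checks this for the Monte Carlo construction, using synthetic Laplace-distributed moves as a stand-in for djpy (an assumption, not real data); bootstrap_std_ratio is a hypothetical helper.

```python
import numpy as np

def bootstrap_std_ratio(moves, n, rng):
    """Ratio of the std of n-move bootstrap sums to the std of single moves.

    Draws len(moves) sums, each of n moves sampled with replacement, the same way
    the Monte Carlo construction above builds djpy60.
    """
    draws = rng.integers(0, moves.size, size=(moves.size, n))
    sums = moves[draws].sum(axis=1)
    return sums.std() / moves.std()

rng = np.random.default_rng(0)
# Synthetic stand-in for djpy: fat-tailed Laplace moves (an assumption, not real data).
moves = rng.laplace(loc=0.0, scale=1.0, size=50_000)
ratio = bootstrap_std_ratio(moves, 60, rng)
print(ratio, np.sqrt(60))
```

The same check applies to any gridded PDF by computing its standard deviation as sqrt(sum((x - mean)**2 * pdf)) and comparing it with the 1-minute value.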

Question: Why aren’t my Convolution PDF60 and my Monte Carlo PDF60 in agreement?

What proves that a random process with zero diffusion is not a martingale?

[Cross-posted.]

Consider the process $dX_t = W_t\,dt + 0\,dW_t$, or equivalently $X_t = \int_0^t W_s\,ds$, where $W_t$ is a Brownian motion. I read a proof that $X_t$ is not a martingale that simply states: “Because the diffusion of $dX_t$ is 0, $X_t$ is not a martingale.”

By definition, a stochastic process $X_t$ adapted to a filtration $\{{\cal F}_t\}$ is a martingale iff $E(|X_t|) < \infty$ for all $t \geq 0$ and $E(X_t \mid {\cal F}_s) = X_s$ for $0 \leq s < t$.

Question: What exactly about either of these conditions establishes that if a random process has 0 diffusion, it is not a martingale?

I am asking because I often see the zero-diffusion condition used for this purpose, but I don't understand it in the example above, where the process is still random even though its diffusion is zero.
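For this particular $X_t$, the second condition can be checked directly. A sketch, assuming $X_0 = 0$ and using $X_t - X_s = \int_s^t W_u\,du$, the fact that $X_s$ is ${\cal F}_s$-measurable, $E(W_u \mid {\cal F}_s) = W_s$ for $u \geq s$, and an interchange of the conditional expectation with the time integral:

```latex
\begin{aligned}
E(X_t \mid {\cal F}_s) &= X_s + E\Big(\int_s^t W_u\,du \,\Big|\, {\cal F}_s\Big)
                        = X_s + \int_s^t E(W_u \mid {\cal F}_s)\,du \\
                       &= X_s + (t-s)\,W_s .
\end{aligned}
```

So $E(X_t \mid {\cal F}_s) \neq X_s$ whenever $W_s \neq 0$, which for $s > 0$ happens with probability one. The extra term $(t-s)\,W_s$ comes from the drift $W_t\,dt$.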

How do I write the expectation of Brownian motion conditional on a filtration as an integral?

[Cross-post.]

Let $W_t$ be a Brownian motion, so $W_t = z_t\sqrt{t}$ where $z_t \sim N(0,1)$ and the pdf of $z$ is $f(z)=\frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}$. So

$$E(W_t)=\int_{-\infty}^{\infty} W_t f(z)\, dz =\int_{-\infty}^{\infty} z \sqrt{t} \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\, dz =\int_{0}^{\infty} (z+(-z)) \sqrt{t} \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\, dz=0$$

Now suppose ${\cal F}_t$ is the natural filtration of $W_t$. By the construction of Brownian motion, we are given that $E(W_t \mid {\cal F}_s) = W_s$ for $0 \leq s \leq t$.

Question: How do I write $E(W_t \mid {\cal F}_s)$ as a Riemann integral expression similar to the one for $E(W_t)$ given above?

Note: I have searched Google extensively without finding a relevant exposition. If this question is beside the point, please explain why. If it is on point, please answer with the Riemann integral expression.
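One standard way to write it, sketched under the usual independent-increments property of Brownian motion: write $W_t = W_s + (W_t - W_s)$, where the increment $W_t - W_s$ is independent of ${\cal F}_s$ and has the same distribution as $z\sqrt{t-s}$ with $z \sim N(0,1)$. Because $W_s$ is known given ${\cal F}_s$, it is held fixed and only the increment is integrated against $f$:

```latex
E(W_t \mid {\cal F}_s)
  = \int_{-\infty}^{\infty} \big(W_s + z\sqrt{t-s}\big)\,\frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}}\,dz
  = W_s \int_{-\infty}^{\infty} f(z)\,dz + \sqrt{t-s}\int_{-\infty}^{\infty} z\,f(z)\,dz
  = W_s \cdot 1 + \sqrt{t-s}\cdot 0 = W_s .
```

Unlike $E(W_t)$, the result is still a random variable (a function of $W_s$), not a single number.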

Two-step empirical CDF from one-step empirical CDF

(cross-post)

Suppose I have a random variable $X_i$ which changes by $X_{i+1} = X_i + \delta_i$ from one timestep to the next. Suppose I do an experiment in which I observe $N$ values $d_1, d_2, \ldots, d_N$ of $\delta_0$ and make an empirical CDF of $\delta_0$ by sorting $d$ so that $d_1 \leq d_2 \leq \cdots \leq d_N$, and then approximating the CDF at $y$ as $\frac{i}{N}$ where $d_i \leq y < d_{i+1}$.

Question: What is the most efficient way to compute the empirical CDF of two steps of $X$ (that is, of $X_{i+2} - X_i = \delta_i + \delta_{i+1}$), assuming that each step from $X_i$ to $X_{i+1}$ follows the same empirical distribution? The brute-force way that occurs to me is to create the multiset $E = \{d_i + d_j : 1 \leq i \leq N,\ 1 \leq j \leq N\}$, sort it as $e_1 \leq \cdots \leq e_{N^2}$, and then approximate the two-step CDF at $y$ as $\frac{i}{N^2}$ where $e_i \leq y < e_{i+1}$.
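A sketch of the brute-force construction described above, in the same Python/NumPy setting as the USDJPY example; two_step_ecdf is a hypothetical name. It assumes the two steps are independent draws from the empirical distribution (every ordered pair $(i, j)$ is used) and returns $F(y) = \#\{e_k \leq y\}/N^2$, which equals $\frac{i}{N^2}$ for $e_i \leq y < e_{i+1}$.

```python
import numpy as np

def two_step_ecdf(d):
    """Brute-force empirical CDF of delta_i + delta_j from N one-step observations d.

    Builds all N^2 pairwise sums e = d_i + d_j, sorts them, and returns a
    function F(y) = #{e_k <= y} / N^2.
    """
    d = np.asarray(d, dtype=float)
    e = np.sort(np.add.outer(d, d).ravel())  # N^2 sums: O(N^2) memory, O(N^2 log N) sort

    def F(y):
        # side='right' counts every e_k <= y, including ties at y.
        return np.searchsorted(e, y, side='right') / e.size

    return F
```

For N in the tens of thousands the $N^2$ sums no longer fit comfortably in memory; a common alternative is to bin the $d_i$ on a uniform grid and convolve the histogram with itself, trading some discretization error for work that depends on the number of bins rather than on $N^2$.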