--- title: "Comparaison of methods" output: pdf_document --- # Scan statistique - Méthode de Monte Carlo et calcul de p-value ## 1. Proposition for simulations under $\mathcal{H}_1$ In this part, we propose a method that simulates a Poisson process under the hypothesis $\mathcal{H}_1$. The idea is to simulate a sample under $\mathcal{H}_0$, and add randomly a subsequence under the alternative hypothesis in this sequence. ```{r} PoissonProcess <- function(lambda,T) { return(sort(runif(rpois(1,lambda*T),0,T))) } SimulationH1 <- function(lambda0, lambda1,T,tau){ ppH0=PoissonProcess(lambda0,T) ppH1.segt=PoissonProcess(lambda1,tau) dbt=runif(1,0,T-tau) ppH0bis=PoissonProcess(lambda0,T) ppH1.repo=dbt+ppH1.segt ppH0_avant=ppH0bis[which(ppH0bisppH1.repo[length(ppH1.repo)])] ppH1=c(ppH0_avant,ppH1.repo,ppH0_apres) return (ppH1) } TimeBetweenEvent <- function(pp){ n=length(pp) tbe=pp[2:n]-pp[1:n1-1] tbe=c(0,tbe) return (tbe) } DataFrame <- function(pp,tbe){ list=data.frame(ProcessusPoisson=pp, TimeBetweenEvent=tbe) } ``` ## 2. Simulation of the sequences under $\mathcal{H}_0$ via a Monte Carlo Method In this part, we will try to simulate, using a Monte Carlo method, a set of $10^5$ independant samples, under the assumption that $\lambda=\lambda_0$, hence, that we are under the null hypothesis $\mathcal{H}_0$. ```{r} ScanStat <- function(pp, T, tau){ n=length(pp) stop=n-length(which(pp>(T-tau))) ScanStat=0 for (i in (1:stop)) { x=which((pp>=pp[i])&(pp<=(pp[i]+tau))) scan=length(x) if (scan>ScanStat) {ScanStat=scan} } return (c(i,ScanStat)) } ``` We test the scan statistic method for different values of $\lambda_0$. The method of scan statistic we implemented will allow us to have access to the scan test statistic and where it happens in the sequence. ```{r} EmpDistrib <- function(lambda, n_sample,T,tau){ pp=PoissonProcess(lambda,T) scan=c(ScanStat(pp,T, tau)[2]) index=c(ScanStat(pp,T, tau)[1]) for (i in 2:(n_sample)){ pp=PoissonProcess(lambda,T) scan=rbind(scan,ScanStat(pp,T, tau)[2]) index=rbind(index,ScanStat(pp,T, tau)[1]) } min_scan=min(scan)-1 max_scan=max(scan) table1=table(factor(scan, levels = min_scan:max_scan)) EmpDis=data.frame(cdf=cumsum(table1)/sum(table1), proba=table1/sum(table1), index_scan=min_scan:max_scan) EmpDis<-EmpDis[,-2] return(EmpDis) } ``` ```{r} library("latex2exp") Plot_CDF <- function(lambda,n_sample,T,tau){ Emp=EmpDistrib(lambda,n_sample,T,tau) title=TeX(paste(r'(Cumulative distribution function for $\lambda=$)', lambda)) plot(Emp$index_scan, Emp$cdf,type="s",xlab="Number of occurrences",ylab="Probability", main=title, col="red") return(Emp) } ``` ### 2.1 Test of $\mathcal{H}_0: \lambda=\lambda_0$ against $\mathcal{H}_0: \lambda=\lambda_1$, where $\lambda_1 > \lambda_0$ In this part, we will test different values for $\lambda_0$ and $\lambda_1$, and compute the probability of occurrence of a certain scan statistic. ```{r} #Empiricial distribution under H0 n_sample=10**4 lambda0=3 T=10 tau=1 ppH0=PoissonProcess(lambda0,T) CDF=Plot_CDF(lambda0,n_sample,T,tau) ``` ```{r} PValue <- function(Emp,ppH1, T, tau){ scanH1=ScanStat(ppH1,T,tau)[2] index=Emp$index_scan n=length(index) if (scanH1< min(Emp$index_scan)){ return (c(scanH1,1)) } else{ if(min(Emp$index_scan) Nb seq. tbe -> dist. tbe (vérif) + Nb seq. Scores -> distr scores A = 1/(lambda0-lambda1) B = A*log(lambda1/lambda0) ppH1 = PoissonProcess(lambda1,T) n1 = length(ppH1) tbe1 = ppH1[2:n1]-ppH1[1:n1-1] print(tbe1) print(ks.test(tbe1,'exp')) x = log(lambda1/lambda0)+(lambda0-lambda1)*tbe1 # ne pas mettre le floor ni le E (certes égale à 1) hist(x) print(summary(x)) # Calcul du maximum des scores E = 1 # THEO à faire !!! max.s = log(lambda1/lambda0) maxXk = floor(E*(log(lambda1/lambda0))) maxXk while (maxXk < 3) { E = E+1 maxXk = floor(E*(log(lambda1/lambda0))) } print(E) x = floor(E*(log(lambda1/lambda0)+(lambda0-lambda1)*tbe1)) dist.emp.scores = table(x)/sum(table(x)) dist.emp.scores hist(x) print(range(x)) x.verif = seq(range(x)[1],range(x)[2],1) dist.theo.scores = lambda0*exp(-lambda0*(A*x.verif-B)) print(dist.theo.scores) print(dist.emp.scores) ```