Clean up unused files
This commit is contained in:
parent
68cbc5498d
commit
955b06bac6
Binary file not shown.
|
@ -1,649 +0,0 @@
|
||||||
---
|
|
||||||
title: "Evaluation of the performance of the new statistical tool"
|
|
||||||
output: pdf_document
|
|
||||||
---
|
|
||||||
\tableofcontents
|
|
||||||
|
|
||||||
In this part, we will try to generate a dataset and study the premises of the scan statistic methods.
|
|
||||||
To do so, we will use different types of data: either it will be a dataset we created ourselves, or a dataset that we studied last semester in Statistical Modeling. This dataset, called _SAheart_. This dataset is a compilation of 462 observations following 10 variables to explain the occurrence of a coronary heart disease.
|
|
||||||
|
|
||||||
\section{1. Study of a random dataset, with given law of probability}
|
|
||||||
In this section, we will try to randomly simulate a sample, given a known law of probability. To do so, we will use the R built-in function to simulate a variable of distribution $\mathcal{U}[0,1]$, and use the Monte Carlo's method to create these sample. The idea is to see if there are any irregularities to this dataset for random laws.
|
|
||||||
|
|
||||||
\subsection{1.1. Simulation of random samples, for a continiuous law of probability}
|
|
||||||
We will first try to create a sample of known-distribution, for instance, an exponential distribution of whom we know the value of the parameter $\lambda$. Thus, we will take $\displaystyle\lambda=\frac{1}{1500}$.
|
|
||||||
For instance, let's consider that the sample created $X$ represent the lifetime of $n=100$ bulbs. Hence, if we create a dataset of law $\mathcal{E}(\lambda)$, then the average lifetime of a bulb. is equal to $$\mathbb{E}[X]=\frac{1}{\lambda}=1500$$.
|
|
||||||
To create this sample, we will use a Monte Carlo method. Indeed, if $U\sim \mathcal{U}[0,1]$, $X\sim \mathcal{E}(\lambda)$, and if we denote $u$ a simulation of law $U$, then we have that $$x=\frac{\ln(1-u)}{\lambda}$$ is a simulation of the same distribution as $X$.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
n=100
|
|
||||||
u=runif(n,0,1)
|
|
||||||
x=c(1:n)*0
|
|
||||||
lamb=1/1500
|
|
||||||
x=-log(1-u)/lamb
|
|
||||||
```
|
|
||||||
This Monte Carlo method generates automatically a $n$-sample (where $n=100$). We can verify, using Kolmogorov-Smirnov adequacy test, that the created sample is following an exponential distribution, considering a certain error $\alpha$. The alternative `two.sided` will test:
|
|
||||||
$$\mathcal{H}_0: F=F_X \quad \text{vs.} \quad \mathcal{H}_1: F\ne F_X $$
|
|
||||||
where $F$ denotes the cumulative distribution function of the exponential law $\mathcal{E}(\lambda)$, and $F_X$ the cumulative distribution function of the sample of law $X$.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
ks.test(x,pexp,1/1500, alternative="two.sided")
|
|
||||||
|
|
||||||
```
|
|
||||||
Here, since the p-value is larger than $\alpha=5\%$, we can conclude that the dataset we generated is following an exponential distribution of parameter $\lambda$, given a level of confidence $1-\alpha$.
|
|
||||||
|
|
||||||
\subsection{1.2. Insertion of random irregularities of a given law of probability}
|
|
||||||
In order to test the method of scan statistic, we will randomly change $m=30$ variables in the sample. Logically, if the n-sample is distributed given an exponential law, then, since we use a Monte-Carlo method to create the sample, then all the variables are distributed according to this law of probability. Hence, any subset of the sample is distributed given this law of probability. \
|
|
||||||
Inserting randomized values of another law of probability will allow us to test if we have some significant changes of data, and will illustrate the scan statistic method.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
m=30 #number of changes
|
|
||||||
index=floor(runif(m,1,n))
|
|
||||||
lamb2=1/500
|
|
||||||
y=x
|
|
||||||
|
|
||||||
for (i in 1:m){
|
|
||||||
u=runif(1,0,1)
|
|
||||||
y[index[i]]=-log(1-u)/lamb2}
|
|
||||||
```
|
|
||||||
Here if we apply Kolmogorov-Smirnov adequacy test, we obtain the following result:
|
|
||||||
```{r}
|
|
||||||
ks.test(y,pexp,1/1500, alternative="two.sided")
|
|
||||||
result=ks.test(y,pexp,1/1500, alternative="two.sided")
|
|
||||||
print(result$statistic)
|
|
||||||
```
|
|
||||||
|
|
||||||
We can see that if we change randomly 40 out of 100 data, then the Kolmogorov-Smirnov adequacy test reject the null hypothesis. Hence, the fact that the sample is distributed given an exponential law of parameter $\displaystyle\lambda=\frac{1}{1500}$.
|
|
||||||
|
|
||||||
\subsection{1.3. First implementation of a scan statistic method}
|
|
||||||
|
|
||||||
Then, we can try to implement a method of scan statistic that will study the Kolmogorov-Smirnov statistic of test. The idea is to find the range of elements of the sample that has the largest statistic of test, that is to say to find the range where the distribution is less-likely to be compared to an exponential distribution of parameter $\displaystyle \lambda=\frac{1}{1500}$.
|
|
||||||
We will try to implement this method of statistic scan on a constant window of size $10$. Then, each window will go through all the data set, until we reach its end.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
n=100
|
|
||||||
size=10
|
|
||||||
max=0
|
|
||||||
rangemax=c(1:size)*0
|
|
||||||
start=1
|
|
||||||
end=size
|
|
||||||
|
|
||||||
for (i in 1:(n-size)){
|
|
||||||
result=ks.test(y[start:end],pexp,1/1500, alternative="two.sided")
|
|
||||||
compared=result$statistic
|
|
||||||
if (compared > max){max=compared
|
|
||||||
rangemax=c(start:end)}
|
|
||||||
start=start+1
|
|
||||||
end=end+1
|
|
||||||
}
|
|
||||||
print(max)
|
|
||||||
print(rangemax)
|
|
||||||
```
|
|
||||||
This code is a first implementation of a scan statistic on a given set of known probability, with some randomization. Here, we can see that, if we consider a constant-sized window, then the set with the largest Kolmogorov-Smirnov test statistic is given by `maxrange`. Hence, if we consider a serial number on the production of bulbs, then the set we will try to adjust will be the one given by `maxrange`.
|
|
||||||
|
|
||||||
|
|
||||||
We can try to change the size of the window using the exact same code, and ammending the value of `size`.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
size=5
|
|
||||||
max=0
|
|
||||||
rangemax=c(1:size)*0
|
|
||||||
start=1
|
|
||||||
end=size
|
|
||||||
|
|
||||||
for (i in 1:(n-size)){
|
|
||||||
result=ks.test(y[start:end],pexp,1/1500, alternative="two.sided")
|
|
||||||
compared=result$statistic
|
|
||||||
if (compared > max){max=compared
|
|
||||||
rangemax=c(start:end)}
|
|
||||||
start=start+1
|
|
||||||
end=end+1
|
|
||||||
}
|
|
||||||
print(max)
|
|
||||||
print(rangemax)
|
|
||||||
```
|
|
||||||
With `size=5`, we find that the subset with maximal test statistic is a subset of the one found for `size=10`. In terms of p-value:
|
|
||||||
```{r}
|
|
||||||
print(ks.test(y[rangemax],pexp,1/1500, alternative="two.sided")$p.value)
|
|
||||||
```
|
|
||||||
At level $\alpha=5\%$, we reject the null hypothesis.
|
|
||||||
|
|
||||||
If we study the original `x` sample, we observe:
|
|
||||||
```{r}
|
|
||||||
size=10
|
|
||||||
max=0
|
|
||||||
rangemax=c(1:size)*0
|
|
||||||
start=1
|
|
||||||
end=size
|
|
||||||
|
|
||||||
for (i in 1:(n-size)){
|
|
||||||
result=ks.test(x[start:end],pexp,1/1500, alternative="two.sided")
|
|
||||||
compared=result$statistic
|
|
||||||
if (compared > max){max=compared
|
|
||||||
rangemax=c(start:end)}
|
|
||||||
start=start+1
|
|
||||||
end=end+1
|
|
||||||
}
|
|
||||||
print(max)
|
|
||||||
print(rangemax)
|
|
||||||
print(ks.test(y[rangemax],pexp,1/1500, alternative="two.sided")$p.value)
|
|
||||||
```
|
|
||||||
Hence, we can see that, in the dataset `x`, wihtout any irregularities, there are some window where the scan statistic is already detecting some problems. The range `rangemax` does not have to be the same as the one with random data.
|
|
||||||
|
|
||||||
Now we will use another method using the uniform distribution in order to implement a Poisson Process.
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
PoissonProcess <- function(lambda,T) {
|
|
||||||
return(sort(runif(rpois(1,lambda*T),0,T)))
|
|
||||||
|
|
||||||
pp1=PoissonProcess(lambda0,Ti)
|
|
||||||
print(pp1)
|
|
||||||
plot(c(0,pp1),0:length(pp1),type="s",xlab="time t",ylab="number of events by time t")
|
|
||||||
|
|
||||||
pp2=PoissonProcess(lambda1,Ti)
|
|
||||||
print(pp2)
|
|
||||||
plot(c(0,pp2),0:length(pp2),type="s",xlab="time t",ylab="number of events by time t")
|
|
||||||
|
|
||||||
#time between events
|
|
||||||
n1=length(pp1)
|
|
||||||
tbe1=pp1[2:n1]-pp1[1:n1-1]
|
|
||||||
tbe1
|
|
||||||
|
|
||||||
n2=length(pp2)
|
|
||||||
tbe2=pp2[2:n2]-pp2[1:n2-1]
|
|
||||||
tbe2
|
|
||||||
|
|
||||||
ks.test(tbe1,pexp,lambda0, alternative="two.sided")
|
|
||||||
|
|
||||||
ks.test(tbe2,pexp,lambda1, alternative="two.sided")
|
|
||||||
|
|
||||||
```
|
|
||||||
The Kolmogorov-Smirnov test rejects the hypothesis that the time between events sequence is following an exponential distribution.
|
|
||||||
|
|
||||||
\section{Proposition de simulation sous H1}
|
|
||||||
|
|
||||||
Je reprends votre code pour faire un data set :
|
|
||||||
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
PoissonProcess <- function(lambda,T) {
|
|
||||||
return(sort(runif(rpois(1,lambda*T),0,T)))
|
|
||||||
}
|
|
||||||
|
|
||||||
# Etape 1 : simu Poisson process sous H0
|
|
||||||
ppH0=PoissonProcess(lambda0,Ti)
|
|
||||||
|
|
||||||
# Etape 2 : creation d'un segment sous H1
|
|
||||||
tau= 3 # longeur de l'intervalle modifie, a fortiori tau < Ti
|
|
||||||
ppH1.segt=PoissonProcess(lambda1,tau)
|
|
||||||
|
|
||||||
# Etape 3 : insertion du segment dans la sequence H0
|
|
||||||
dbt=runif(1,0,Ti-tau) # choix de l'indice de temps ou va commencer le segment modifié
|
|
||||||
ppH0bis=PoissonProcess(lambda0,Ti)
|
|
||||||
ppH1.repo=dbt+ppH1.segt # repositionnement des observations dans le temps
|
|
||||||
ppH0_avant=ppH0bis[which(ppH0bis<ppH1.repo[1])]
|
|
||||||
ppH0_apres=ppH0bis[which(ppH0bis>ppH1.repo[length(ppH1.repo)])]
|
|
||||||
ppH1=c(ppH0_avant,ppH1.repo,ppH0_apres)
|
|
||||||
|
|
||||||
|
|
||||||
#time between events
|
|
||||||
n1=length(ppH1)
|
|
||||||
tbe1=ppH1[2:n1]-ppH1[1:n1-1]
|
|
||||||
tbe1=c(0,tbe1)
|
|
||||||
|
|
||||||
n0=length(ppH0)
|
|
||||||
tbe0=ppH0[2:n0]-ppH0[1:n0-1]
|
|
||||||
|
|
||||||
list1=data.frame(ProcessusPoissonH1=ppH1,
|
|
||||||
TimeBetweenEventH1=tbe1)
|
|
||||||
|
|
||||||
tbe0=c(0,tbe0)
|
|
||||||
|
|
||||||
list0=data.frame(ProcessusPoissonH0=ppH0,
|
|
||||||
TimeBetweenEventH0=tbe0)
|
|
||||||
|
|
||||||
list0
|
|
||||||
list1
|
|
||||||
```
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
|
|
||||||
lambda=2
|
|
||||||
Tps=10
|
|
||||||
tau=1
|
|
||||||
pp=PoissonProcess(lambda,Tps)
|
|
||||||
pp
|
|
||||||
n=length(pp)
|
|
||||||
n
|
|
||||||
which(pp>(Tps-tau))
|
|
||||||
stop=n-length(which(pp>(Tps-tau)))
|
|
||||||
stop
|
|
||||||
ScanStat=0
|
|
||||||
for (i in (1:stop)) {
|
|
||||||
cat('\n i=',i)
|
|
||||||
cat('\t ppi=',pp[i])
|
|
||||||
x=which((pp>=pp[i])&(pp<=(pp[i]+tau)))
|
|
||||||
x
|
|
||||||
cat('\t which=',x)
|
|
||||||
scan=length(x)
|
|
||||||
cat(' scan=',scan)
|
|
||||||
if (scan>ScanStat) ScanStat=scan
|
|
||||||
}
|
|
||||||
ScanStat
|
|
||||||
```
|
|
||||||
```{r}
|
|
||||||
|
|
||||||
lambda0=list(1:10)
|
|
||||||
lambda1=list(1:10)
|
|
||||||
#mu0=2
|
|
||||||
|
|
||||||
E=10
|
|
||||||
|
|
||||||
mu1=1
|
|
||||||
|
|
||||||
for (mu0 in 1:10 ) {
|
|
||||||
#for (mu1 in 1:10) {
|
|
||||||
repe=10^3
|
|
||||||
# longueur de chaque séquence
|
|
||||||
n=200
|
|
||||||
SL_vect=vector(length=repe) # vecteur contenant le score local pour chaque séquence
|
|
||||||
for (j in 1:repe)
|
|
||||||
{
|
|
||||||
cat('\n repe=',j)
|
|
||||||
w.E=0 # initialisation de W (processus de Lindley) pour la séquence j
|
|
||||||
SL=0 # init du score local pour la séquence j
|
|
||||||
for (i in 1:n) {
|
|
||||||
a=rexp(1,mu0) # ici simulation d'une observation loi normale ; on peut aussi aller lire une observation dans un fichier de données
|
|
||||||
s.E=floor(E*log(dexp(a,mu0)/dexp(a,mean=mu0,sd=s0))) # calcul du score LLR associé à l'observation a
|
|
||||||
w.E=max(0,w.E+s.E) # calcul de la valeur W à l'indice j
|
|
||||||
if (w.E>SL) SL=w.E # actualisation du score local, cf. SL=max_j Wj
|
|
||||||
SL_vect[j]=SL # remplissage du vecteur des valeur de score local
|
|
||||||
}
|
|
||||||
}
|
|
||||||
p <- hist(SL_vect)
|
|
||||||
print(p)
|
|
||||||
#}
|
|
||||||
}
|
|
||||||
SL_vect
|
|
||||||
|
|
||||||
#p <- hist(SL_vect)
|
|
||||||
#print(p)
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
library(stringr)
|
|
||||||
|
|
||||||
E=10
|
|
||||||
lambda0=1
|
|
||||||
lambda1=2
|
|
||||||
max=0
|
|
||||||
|
|
||||||
while (exp(-lambda0*max) > 10^(-9)){
|
|
||||||
max = max + 1
|
|
||||||
}
|
|
||||||
max
|
|
||||||
|
|
||||||
# en fonction de lambda0 tq P(Expo(lambda
|
|
||||||
#0)>=max)<10^-9
|
|
||||||
t=seq(0,max,0.1)
|
|
||||||
x=floor(E*log(lambda1/lambda0)+(lambda0-lambda1)*t)
|
|
||||||
head(x)
|
|
||||||
range(x)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
s=seq(-15,6,0.1)
|
|
||||||
##tronquer la queue des x (x négatifs)
|
|
||||||
A=1/(lambda0-lambda1)
|
|
||||||
B=A*log(lambda1/lambda0)
|
|
||||||
proba.s=lambda0*exp(-lambda0*(A*s-B))
|
|
||||||
proba.s = proba.s/sum(proba.s)
|
|
||||||
barplot(proba.s)
|
|
||||||
|
|
||||||
##trouver 1/lambda en fonction intervalle seq
|
|
||||||
## ecart ou rapport entre les lambda qui est le + important ?
|
|
||||||
|
|
||||||
for (lambda0 in c(5,50,500,5000,50000) ) {
|
|
||||||
for (lambda1 in c(2,20,200,2000,20000) ) {
|
|
||||||
probaS_vect=vector(length=length(s))
|
|
||||||
|
|
||||||
for (j in 1:length(x))
|
|
||||||
{
|
|
||||||
|
|
||||||
|
|
||||||
A=1/(lambda0-lambda1)
|
|
||||||
B=A*log(lambda1/lambda0)
|
|
||||||
|
|
||||||
probaS=floor(E*lambda0*exp(-lambda0*(A*x[j]-B)))
|
|
||||||
# print(s)
|
|
||||||
probaS_vect[j]=probaS
|
|
||||||
}
|
|
||||||
|
|
||||||
probaS_vect = probaS_vect/sum(probaS_vect)
|
|
||||||
print(probaS_vect)
|
|
||||||
p <- barplot(probaS_vect, main=str_c("lambda0=", as.character(lambda0), "et", "lambda1=", as.character(lambda1), sep=" " ))
|
|
||||||
p
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
```{r}
|
|
||||||
# Vérification de la distribution theorique des scores
|
|
||||||
T=10
|
|
||||||
E=1
|
|
||||||
lambda0=2
|
|
||||||
lambda1=3
|
|
||||||
|
|
||||||
## Choix du E
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
while (maxXk < 3) {
|
|
||||||
E = E+1
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
}
|
|
||||||
E
|
|
||||||
|
|
||||||
## Calcul score max
|
|
||||||
score_max=floor(E*log(lambda1/lambda0))
|
|
||||||
score_max
|
|
||||||
|
|
||||||
## Calcul score_min
|
|
||||||
score_min_c=floor(E*log(lambda1/lambda0)+E*(lambda0-lambda1)*T)
|
|
||||||
|
|
||||||
## Calcul distrib score
|
|
||||||
l=seq(score_min_c,score_max,1)
|
|
||||||
borne_inf=(l-E*log(lambda1/lambda0))/(E*(lambda0-lambda1))
|
|
||||||
borne_sup=(l+1-E*log(lambda1/lambda0))/(E*(lambda0-lambda1))
|
|
||||||
proba.l=pexp(rate=lambda0,borne_inf)-pexp(rate=lambda0,borne_sup)
|
|
||||||
proba.l
|
|
||||||
S=sum(proba.l)
|
|
||||||
new.proba.s=proba.l/S
|
|
||||||
new.proba.s
|
|
||||||
barplot(new.proba.s)
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
|
|
||||||
library(stringr)
|
|
||||||
|
|
||||||
# Effacement des objets de l'environnement de travail
|
|
||||||
rm(list=ls())
|
|
||||||
|
|
||||||
# Chgt du nbr de chiffres après la virgule
|
|
||||||
options("digits" = 15)
|
|
||||||
|
|
||||||
# Chemin du dossier à adapter
|
|
||||||
workpath = "."
|
|
||||||
|
|
||||||
|
|
||||||
E=1
|
|
||||||
lambda0=2
|
|
||||||
lambda1=3
|
|
||||||
|
|
||||||
|
|
||||||
# en fonction de lambda0 tq P(Expo(lambda
|
|
||||||
#0)>=max)<10^-9
|
|
||||||
probaseuilmin = 10^-9
|
|
||||||
|
|
||||||
# Vérification de la distribution theorique des scores
|
|
||||||
E=1
|
|
||||||
lambda0
|
|
||||||
lambda1
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
while (maxXk < 3) {
|
|
||||||
E = E+1
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
}
|
|
||||||
E
|
|
||||||
score_max=floor(E*log(lambda1/lambda0))
|
|
||||||
score_max
|
|
||||||
l=-50
|
|
||||||
borne_inf=(l-E*log(lambda1/lambda0))/(E*(lambda0-lambda1))
|
|
||||||
borne_sup=(l+1-E*log(lambda1/lambda0))/(E*(lambda0-lambda1))
|
|
||||||
pexp(rate=lambda0,borne_inf)-pexp(rate=lambda0,borne_sup)
|
|
||||||
|
|
||||||
l=seq(-100,score_max,1)
|
|
||||||
x=floor(E*log(lambda1/lambda0)+E*(lambda0-lambda1)*l)
|
|
||||||
proba.s=(lambda1/lambda0)*(exp(lambda0/(lambda0-lambda1)*(1-l/E))*exp(-lambda0/(E*lambda0-lambda1))-1)
|
|
||||||
cbind(x,proba.s)
|
|
||||||
tail(proba.s)
|
|
||||||
boxplot(proba.s)
|
|
||||||
sum(proba.s)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
score_min_c=floor(E*log(lambda1/lambda0)+E*(lambda0-lambda1)*max)
|
|
||||||
|
|
||||||
somme=(lambda1/lambda0)*(exp(-lambda0/(E*(lambda0-lambda1)))-1)* exp(lambda0/(lambda0-lambda1))*exp(lambda0/(E*(lambda0-lambda1))*(1-score_min_c))/(1-exp(lambda0/(E*(lambda0-lambda1))))
|
|
||||||
|
|
||||||
#proba.s = proba.s/sum(proba.s)
|
|
||||||
proba.s= proba.s/(1-somme)
|
|
||||||
proba.s
|
|
||||||
barplot(proba.s)
|
|
||||||
|
|
||||||
|
|
||||||
if (lambda0 < lambda1) {
|
|
||||||
# Troncage à un score minimal
|
|
||||||
minXk = as.numeric(s[max(which(proba.s<(probaseuilmin)))])# On définit la classe pour proba < probaseuilmin
|
|
||||||
names(proba.s) = as.character(s)
|
|
||||||
proba.s[as.character(minXk)] = sum(proba.s[which(proba.s<probaseuilmin)]) # probabilité de la classe SCORE < minXk
|
|
||||||
score = s
|
|
||||||
score=score[which(score>=minXk)]
|
|
||||||
proba.s = proba.s[which(s>=minXk)] # On ne garde que les scores supérieurs à minXk
|
|
||||||
subtitle = paste("lambda0=",lambda0,";lambda1=",lambda1,";E=",E,sep="")
|
|
||||||
barplot(proba.s, col="steelblue",xlab="Score",ylab="Probabilité", main = paste("Probabilité d'apparition de chaque score\nLoi géométrique : ",subtitle,sep=""))
|
|
||||||
print(score) }
|
|
||||||
|
|
||||||
probaseuilmax= 10^-5
|
|
||||||
|
|
||||||
if (lambda1 < lambda0) {
|
|
||||||
# Troncage à un score minimal
|
|
||||||
maxXk = as.numeric(s[max(which(proba.s>(probaseuilmax)))])# On définit la classe pour proba < probaseuilmax
|
|
||||||
names(proba.s) = as.character(s)
|
|
||||||
proba.s[as.character(maxXk)] = sum(proba.s[which(proba.s>probaseuilmax)]) # probabilité de la classe SCORE < maxXk
|
|
||||||
score = s
|
|
||||||
score=score[which(score<=maxXk)]
|
|
||||||
proba.s = proba.s[which(s<=maxXk)] # On ne garde que les scores inférieurs à maxXk
|
|
||||||
subtitle = paste("lambda0=",lambda0,";lambda1=",lambda1,";E=",E,sep="")
|
|
||||||
barplot(proba.s, col="steelblue",xlab="Score",ylab="Probabilité", main = paste("Probabilité d'apparition de chaque score\nLoi géométrique : ",subtitle,sep=""))
|
|
||||||
print(score) }
|
|
||||||
|
|
||||||
|
|
||||||
min(score)
|
|
||||||
max(score)
|
|
||||||
mean(score)
|
|
||||||
|
|
||||||
# Enregistrement du fichier
|
|
||||||
fichier_proba = paste(workpath,"/LLR_Theorique_lambda0",gsub('\\.','',as.character(lambda0)),"_lambda1",gsub('\\.','',as.character(lambda1)),"_E",E,"_CL.txt",sep="")
|
|
||||||
write.table(proba.s, file = fichier_proba)
|
|
||||||
|
|
||||||
# Lecture fichier
|
|
||||||
#prob.X = read.table(fichier_proba)
|
|
||||||
|
|
||||||
```
|
|
||||||
```{r}
|
|
||||||
PoissonProcess <- function(lambda,T) {
|
|
||||||
return(sort(runif(rpois(1,lambda*T),0,T)))
|
|
||||||
}
|
|
||||||
lambda0=2
|
|
||||||
lambda1=3
|
|
||||||
Ti=100000
|
|
||||||
pp1=PoissonProcess(lambda1,Ti)
|
|
||||||
print(pp1)
|
|
||||||
n1=length(pp1)
|
|
||||||
tbe1=pp1[2:n1]-pp1[1:n1-1]
|
|
||||||
tbe1
|
|
||||||
ks.test(tbe1,'exp')
|
|
||||||
x=log(lambda1/lambda0)+(lambda0-lambda1)*tbe1 # ne pas mettre le floor ni le E (certes égale à 1)
|
|
||||||
hist(x)
|
|
||||||
summary(x)
|
|
||||||
|
|
||||||
# Calcul du maximum des scores
|
|
||||||
E=1
|
|
||||||
# THEO à faire !!! max.s=log(lambda1/lambda0)
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
maxXk
|
|
||||||
while (maxXk < 3) {
|
|
||||||
E = E+1
|
|
||||||
maxXk = floor(E*(log(lambda1/lambda0)))
|
|
||||||
}
|
|
||||||
E
|
|
||||||
|
|
||||||
x=floor(E*(log(lambda1/lambda0)+(lambda0-lambda1)*tbe1))
|
|
||||||
dist.emp.scores=table(x)/sum(table(x))
|
|
||||||
dist.emp.scores
|
|
||||||
hist(x)
|
|
||||||
range(x)
|
|
||||||
x.verif=seq(range(x)[1],range(x)[2],1)
|
|
||||||
#dist.theo.scores=lambda0*exp(-lambda0*(A*x.verif-B))
|
|
||||||
#dist.theo.scores
|
|
||||||
dist.emp.scores
|
|
||||||
barplot(dist.emp.scores)
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
|
|
||||||
barplot(proba.s)
|
|
||||||
##trouver 1/lambda en fonction intervalle seq
|
|
||||||
## ecart ou rapport entre les lambda qui est le + important ?
|
|
||||||
|
|
||||||
for (lambda0 in c(5,50,500,5000,50000) ) {
|
|
||||||
for (lambda1 in c(2,20,200,2000,20000) ) {
|
|
||||||
probaS_vect=vector(length=length(s))
|
|
||||||
|
|
||||||
for (j in 1:length(x))
|
|
||||||
{
|
|
||||||
|
|
||||||
|
|
||||||
A=1/(lambda0-lambda1)
|
|
||||||
B=A*log(lambda1/lambda0)
|
|
||||||
|
|
||||||
probaS=floor(E*lambda0*exp(-lambda0*(A*x[j]-B)))
|
|
||||||
# print(s)
|
|
||||||
probaS_vect[j]=probaS
|
|
||||||
}
|
|
||||||
|
|
||||||
probaS_vect = probaS_vect/sum(probaS_vect)
|
|
||||||
print(probaS_vect)
|
|
||||||
p <- barplot(probaS_vect, main=str_c("lambda0=", as.character(lambda0), "et", "lambda1=", as.character(lambda1), sep=" " ))
|
|
||||||
p
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Import data of rainfall in France every 3 hours.
|
|
||||||
```{r}
|
|
||||||
Rain_Dataset = read.csv("data/synop.202202.csv", sep = ";")
|
|
||||||
print("Rain Dataset")
|
|
||||||
summary(Rain_Dataset)
|
|
||||||
|
|
||||||
|
|
||||||
Rain_Dataset_Red = Rain_Dataset[,c('date', 'rr3')]
|
|
||||||
Rain_Dataset_Red[,'rr3'] = as.numeric(Rain_Dataset_Red[,'rr3'])
|
|
||||||
|
|
||||||
summary(Rain_Dataset_Red)
|
|
||||||
head(Rain_Dataset_Red)
|
|
||||||
l=Rain_Dataset_Red[,1]
|
|
||||||
l
|
|
||||||
```
|
|
||||||
|
|
||||||
SST
|
|
||||||
```{r}
|
|
||||||
T = 1.
|
|
||||||
tau = .25
|
|
||||||
table_ppH1 = table(ppH1)
|
|
||||||
sst = -1
|
|
||||||
for (i in seq(0,(T-tau)/tau)){
|
|
||||||
t = i*tau
|
|
||||||
print(i)
|
|
||||||
partial_sst = length(table_ppH1[names(table_ppH1)>=t && names(table_ppH1)<=t+tau])
|
|
||||||
print(partial_sst)
|
|
||||||
if (sst<partial_sst) {
|
|
||||||
sst = partial_sst
|
|
||||||
}
|
|
||||||
}
|
|
||||||
sst
|
|
||||||
```
|
|
||||||
```{r}
|
|
||||||
lambda=2
|
|
||||||
Tps=10
|
|
||||||
tau=1
|
|
||||||
pp=PoissonProcess(lambda,Tps)
|
|
||||||
pp
|
|
||||||
n=length(pp)
|
|
||||||
n
|
|
||||||
which(pp>(Tps-tau))
|
|
||||||
stop=n-length(which(pp>(Tps-tau)))
|
|
||||||
ScanStat=0
|
|
||||||
for (i in (1:stop)) {
|
|
||||||
cat('\n i=',i)
|
|
||||||
cat('\t ppi=',pp[i])
|
|
||||||
x=which((pp>=pp[i])&(pp<=(pp[i]+tau)))
|
|
||||||
cat('\t which=',x)
|
|
||||||
scan=length(x)
|
|
||||||
cat(' scan=',scan)
|
|
||||||
if (scan>ScanStat) ScanStat=scan
|
|
||||||
}
|
|
||||||
ScanStat
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
```{r}
|
|
||||||
lambda=2
|
|
||||||
Tps=10
|
|
||||||
tau=1
|
|
||||||
pp=Rain_Dataset_Red[,1]
|
|
||||||
pp
|
|
||||||
n=length(pp)
|
|
||||||
n
|
|
||||||
which(pp>(Tps-tau))
|
|
||||||
stop=n-length(which(pp>(Tps-tau)))
|
|
||||||
ScanStat=0
|
|
||||||
for (i in (1:stop)) {
|
|
||||||
cat('\n i=',i)
|
|
||||||
#cat('\t ppi=',pp[i])
|
|
||||||
x=which((pp>=pp[i])&(pp<=(pp[i]+tau)))
|
|
||||||
cat('\t which=',x)
|
|
||||||
scan=length(x)
|
|
||||||
cat(' scan=',scan)
|
|
||||||
if (scan>ScanStat) ScanStat=scan
|
|
||||||
}
|
|
||||||
ScanStat
|
|
||||||
|
|
||||||
```
|
|
||||||
```{r}
|
|
||||||
|
|
||||||
pp=Rain_Dataset_Red
|
|
||||||
print(Rain_Dataset)
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
Local score
|
|
||||||
```{r}
|
|
||||||
library("localScore")
|
|
||||||
E = 10
|
|
||||||
X = floor(E*log(dexp(tbe1, rate = lambda1)/dexp(tbe1, rate = lambda0)))
|
|
||||||
|
|
||||||
max_X = max(X)
|
|
||||||
min_X = min(X)
|
|
||||||
P_X = table(factor(X, levels = min_X:max_X))/length(X)
|
|
||||||
|
|
||||||
LS=localScoreC(X)$localScore[1]
|
|
||||||
|
|
||||||
result = daudin(localScore = LS, score_probabilities = P_X, sequence_length = length(x), sequence_min = min_X, sequence_max = max_X)
|
|
||||||
result
|
|
||||||
```
|
|
File diff suppressed because it is too large
Load Diff
Loading…
Reference in New Issue