Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have this code:

df1<-tibble(x = sort(rnorm(1e5)),
       cumulative = cumsum(abs(10-x)/sum(abs(10-x)))/2.5)
df2<-tibble(x1 = sort(rbinom(1e5,1e5, 0.001)/1e5))

which was posted in my previous question. After some research I still can't understand several things, and I would be grateful if someone could explain them:

  1. What are the parameters of the distribution in df1?
  2. What are the parameters of the distribution in df2? Why do we divide by 1e5, and which formula is that based on?
  3. Why do we use 10 in the expression (10-x), and why 2.5?

I would be happy if someone could explain these points to me.

1 Answer

I think the answer to your previous question is wrong and misleading, although, to be fair, you did not ask the question very clearly.

I think what you are perhaps trying to do is compare the binomial distribution with the Normal approximation to it. The binomial distribution describes the number of successes you get if you do something N times, where each attempt independently succeeds with probability p. Its mean is Np and its standard deviation is sqrt(Np(1-p)), so it can be approximated by a Normal distribution with that mean and standard deviation.
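
As a quick sanity check on those two formulas, here is a minimal sketch in base R that compares them with the sample mean and standard deviation of simulated binomial draws. The values N = 100 and p = 0.1, the seed, and all variable names are assumptions chosen purely for illustration.

```r
# Illustrative values only (assumptions, not part of the formulas themselves)
set.seed(42)          # arbitrary seed so the run is reproducible
trials <- 100         # N
prob   <- 0.1         # p
sims   <- 100000      # number of simulated binomial draws

# Theoretical moments of the binomial distribution
theoretical_mean <- trials * prob                     # Np
theoretical_sd   <- sqrt(trials * prob * (1 - prob))  # sqrt(Np(1-p))

# Empirical moments from simulation
draws          <- rbinom(sims, size = trials, prob = prob)
empirical_mean <- mean(draws)
empirical_sd   <- sd(draws)

print(c(theoretical_mean = theoretical_mean, empirical_mean = empirical_mean))
print(c(theoretical_sd = theoretical_sd, empirical_sd = empirical_sd))
```

With this many draws the empirical values should land close to the theoretical ones, which is what justifies using them as the parameters of the approximating Normal distribution.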

One way to compare them using ggplot would be like this...

library(tidyverse)

trials <- 100   #i.e. N in the explanation above
prob <- 0.1     #i.e. p in the explanation above
sims <- 100000  #the number of simulations you want (1e5 in your previous question)

df <- tibble(n = 1:sims,
             normal = sort(rnorm(sims,                               #no of variates
                                 trials * prob,                      #mean
                                 sqrt(trials * prob * (1-prob)))),   #standard deviation
             binomial = sort(rbinom(sims, 
                                    trials, 
                                    prob))) 

Then, to compare the (discrete) histogram of the binomial distribution (in red) with the (continuous) density of the Normal approximation (in blue), you can do

df %>% ggplot() + 
  geom_density(aes(x = normal), 
               alpha = 0.5, 
               fill = "blue") +
  geom_histogram(aes(x = binomial, 
                     y = after_stat(density)), #rescales bars so the total area is 1 (stat() is deprecated in recent ggplot2)
                 alpha = 0.5, 
                 fill = "red", 
                 binwidth = 1)

And to compare the cumulative distributions (taking advantage of the fact that we have sorted the variates in our dataframe)...

df %>% ggplot(aes(y = n/sims)) + 
  geom_line(aes(x = normal), 
            colour = "blue") +
  geom_line(aes(x = binomial), 
            colour = "red")
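
The same cumulative comparison can also be sketched without simulation, using the exact distribution functions pbinom and pnorm from base R. This is only an illustrative check, not part of the approach above: the values of trials and prob repeat the ones used earlier, the variable names are made up for the example, and the shift by 0.5 is the usual continuity correction, which the plots above do not apply.

```r
trials <- 100   # N, as in the example above
prob   <- 0.1   # p, as in the example above

k <- 0:trials   # every possible number of successes

# Exact binomial CDF, P(X <= k)
exact_cdf <- pbinom(k, size = trials, prob = prob)

# Normal approximation with matching mean and standard deviation
mu    <- trials * prob
sigma <- sqrt(trials * prob * (1 - prob))
normal_cdf_plain     <- pnorm(k, mean = mu, sd = sigma)
normal_cdf_corrected <- pnorm(k + 0.5, mean = mu, sd = sigma)  # continuity correction (an addition)

# Largest vertical gap between the exact and approximate CDFs
max_gap_plain     <- max(abs(exact_cdf - normal_cdf_plain))
max_gap_corrected <- max(abs(exact_cdf - normal_cdf_corrected))

print(c(plain = max_gap_plain, corrected = max_gap_corrected))
```

The gap gives a single number for how far apart the red and blue cumulative curves get, and comparing the two versions shows how much of that gap comes from treating a discrete count as continuous.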

I hope this helps!

