How can I understand this R code for two distributions?

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have this code:

df1<-tibble(x = sort(rnorm(1e5)),
       cumulative = cumsum(abs(10-x)/sum(abs(10-x)))/2.5)
df2<-tibble(x1 = sort(rbinom(1e5,1e5, 0.001)/1e5))

which was posted in my previous question. After some research I still can't understand several things, and I would be grateful if someone could explain them to me:

What are the parameters of the distribution in df1?
What are the parameters of the distribution in df2? Why do we have to divide by 1e5, and which formula is that based on?
Why do we have to use 10 in (10-x), and why 2.5?

I would be happy if someone could explain these points to me.

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:21:30+0000

I think the answer to your previous question is wrong and misleading, although, to be fair, you did not ask the question very clearly.

I think what you are perhaps trying to do is compare the binomial distribution with the Normal approximation to it. The binomial distribution describes the number of successes you get if you attempt something N times and each attempt succeeds independently with probability p. Its mean is Np and its standard deviation is sqrt(Np(1-p)), so it can be approximated by a Normal distribution with that same mean and standard deviation.
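As a quick sanity check of those two formulas, here is a minimal base-R sketch that simulates binomial counts and compares their sample mean and standard deviation with Np and sqrt(Np(1-p)). The values of trials, prob and sims and the seed are illustrative assumptions, not anything taken from your code.

```r
set.seed(42)            # arbitrary seed, only so the run is reproducible

trials <- 100           # N: attempts per experiment (illustrative value)
prob   <- 0.1           # p: chance of success on each attempt (illustrative value)
sims   <- 100000        # number of simulated experiments

# rbinom(n, size, prob): n draws, each the number of successes in `size` attempts
counts <- rbinom(sims, size = trials, prob = prob)

theory_mean <- trials * prob                      # N * p
theory_sd   <- sqrt(trials * prob * (1 - prob))   # sqrt(N * p * (1 - p))

# The simulated values should sit close to the theoretical ones
c(simulated = mean(counts), theory = theory_mean)
c(simulated = sd(counts),   theory = theory_sd)

# Dividing the counts by N turns them into proportions of successes:
# the mean becomes p and the standard deviation sqrt(p * (1 - p) / N)
props <- counts / trials
c(simulated = mean(props), theory = prob)
c(simulated = sd(props),   theory = sqrt(prob * (1 - prob) / trials))
```

In rbinom(1e5, 1e5, 0.001) the first 1e5 is the number of draws and the second is N. If the intention in your original code was to work with proportions rather than counts, that would explain the division by 1e5, but that is my reading of it rather than something the code states.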

One way to compare them using ggplot would be like this...

library(tidyverse)

trials <- 100   #i.e. N in the explanation above
prob <- 0.1     #i.e. p in the explanation above
sims <- 100000  #the number of simulations you want (1e5 in your previous question)

df <- tibble(n = 1:sims,
             normal = sort(rnorm(sims,                               #no of variates
                                 trials * prob,                      #mean
                                 sqrt(trials * prob * (1-prob)))),   #standard deviation
             binomial = sort(rbinom(sims, 
                                    trials, 
                                    prob)))

Then, to compare the (discrete) histogram of the binomial distribution (in red) with the (continuous) density of the Normal approximation (in blue), you can do

df %>% ggplot() + 
  geom_density(aes(x = normal), 
               alpha = 0.5, 
               fill = "blue") +
  geom_histogram(aes(x = binomial, 
                     y = after_stat(density)), #rescales bars so their total area is 1 (stat() is deprecated since ggplot2 3.4.0)
                 alpha = 0.5, 
                 fill = "red", 
                 binwidth = 1)
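The plot above compares simulated draws. If you would rather compare the two distributions without simulation noise, a rough sketch is to tabulate the exact binomial probabilities from dbinom() next to the Normal density from dnorm() at the same integer values. The parameter values repeat the illustrative ones used above, and the column names are my own choice.

```r
trials <- 100   # N, same illustrative value as above
prob   <- 0.1   # p, same illustrative value as above

k <- 0:trials   # every possible number of successes

# Exact binomial probability of exactly k successes
exact <- dbinom(k, size = trials, prob = prob)

# Normal density with matching mean and standard deviation, evaluated at k
normal_approx <- dnorm(k,
                       mean = trials * prob,
                       sd   = sqrt(trials * prob * (1 - prob)))

comparison <- data.frame(k, exact, normal_approx,
                         diff = normal_approx - exact)

# Look at the region around the mean, where most of the probability sits
comparison[comparison$k %in% 5:15, ]

# Largest absolute gap between the two curves over all k
max(abs(comparison$diff))
```

With a bin width of 1, the height of each histogram bar estimates the same quantity as dbinom(), which is why the red bars and the blue density can share one axis.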

And to compare the cumulative distributions (taking advantage of the fact that we have sorted the variates in our dataframe)...

df %>% ggplot(aes(y = n/sims)) + 
  geom_line(aes(x = normal), 
            colour = "blue") +
  geom_line(aes(x = binomial), 
            colour = "red")
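The cumulative plot works because, once the draws are sorted, the i-th smallest value has a fraction i/sims of the sample at or below it, which is the empirical cumulative distribution function. Here is a small base-R sketch of that idea, using ecdf() as a cross-check. The seed and the smaller sims value are arbitrary choices for illustration.

```r
set.seed(1)     # arbitrary seed, only so the run is reproducible
sims <- 1000    # smaller than above, just to keep the check quick

# Continuous case: sorted Normal draws have no ties in practice,
# so rank / sims matches the built-in empirical CDF at every point
normal <- sort(rnorm(sims, mean = 10, sd = 3))
rank_fraction <- (1:sims) / sims
all.equal(rank_fraction, ecdf(normal)(normal))

# Discrete case: binomial draws contain many ties, so the line plot
# shows a vertical run at each value; the top of each run is the
# empirical CDF at that value
binomial <- sort(rbinom(sims, size = 100, prob = 0.1))
top_of_step <- as.numeric(tapply(rank_fraction, binomial, max))
all.equal(top_of_step, ecdf(binomial)(unique(binomial)))
```

Both all.equal() calls should return TRUE, which is a way of confirming that plotting n/sims against the sorted values draws the empirical cumulative distribution.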

I hope this helps!
