This afternoon a friend asked me a question. I can't describe it in English very accurately, but I will try my best. She has 30 sequences, each with 10,000 numeric elements, and they appear to follow a normal distribution. For a normal distribution, the density reaches its maximum at the mean. She wants to know at which values the density equals half of that maximum.
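For reference, if a sequence were exactly normal, the half-maximum points could be written down directly. The short derivation below assumes a mean $\mu$ and standard deviation $\sigma$ (my notation, not part of the original question). The data only appear to be normal, so this is a sanity check rather than the answer itself.

```latex
\begin{align*}
f(x) &= \frac{1}{\sigma\sqrt{2\pi}}
        \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \\
f(x) = \tfrac{1}{2} f(\mu)
  &\iff \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{2} \\
  &\iff (x-\mu)^2 = 2\sigma^2 \ln 2 \\
  &\iff x = \mu \pm \sigma\sqrt{2\ln 2} \approx \mu \pm 1.1774\,\sigma
\end{align*}
```

So there are always two such points, placed symmetrically around the mean.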
The original data is A.csv, in which each line is one sequence. First we need to transpose it:
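A minimal sketch of the transposition step, using a tiny made-up dataset in place of A.csv (the numbers and the 2 x 4 shape are only for illustration). It relies on Stata's `xpose` command, which swaps observations and variables.

```stata
* Toy stand-in for A.csv: 2 sequences stored as rows, 4 values each
clear
input v1 v2 v3 v4
1 2 3 4
5 6 7 8
end

* Swap rows and columns so that each sequence becomes one variable
xpose, clear

* Now there are 4 observations and 2 variables (v1 and v2)
list
```

For the real file, I would expect something like `import delimited using "A.csv", varnames(nonames) clear` before `xpose, clear`. Note that a row of 10,000 values means 10,000 variables before transposing, which as far as I know is more than Stata/BE allows, so this probably needs Stata/SE or Stata/MP.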
She gave me 30 sequences, each with 10,000 observations. For example, the kernel density estimate for the first sequence is:
use A, clear
From the diagram above, we can see that the density reaches its maximum, about 17, at about 0.1. What we want to know is at which values the density is about 17/2 = 8.5. Obviously, two values meet this requirement.
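As a rough plausibility check on these numbers, and only under the assumption that the first sequence is exactly normal, the peak height alone fixes how far the half-maximum points lie from the peak. Here $f_{\max}$ is the peak density and $w$ the distance from the peak (both are my notation).

```latex
\begin{align*}
f_{\max} = \frac{1}{\sigma\sqrt{2\pi}}
  &\;\Longrightarrow\; \sigma = \frac{1}{f_{\max}\sqrt{2\pi}} \\
w = \sigma\sqrt{2\ln 2}
  &= \frac{1}{f_{\max}}\sqrt{\frac{\ln 2}{\pi}}
   \approx \frac{0.4697}{f_{\max}} \\
f_{\max} \approx 17
  &\;\Longrightarrow\; w \approx 0.0276,
  \qquad x \approx 0.1 \pm 0.0276 \approx 0.072 \text{ and } 0.128
\end{align*}
```

The values read from the estimated density will differ somewhat, because the data are only approximately normal and 17 and 0.1 are read off the plot by eye.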
My idea is to first evaluate the kernel density estimate at 10,000 grid points, then sort the density values, take the largest one and divide it by 2, and finally find the two grid points whose density is closest to this half-maximum value. The code is:
use A, clear
These two points are exactly what we need!
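The idea described above can be sketched as follows. This is not the original code from this post, just one way to do it under stated assumptions: the data are simulated (mean 0.1 and standard deviation 0.024 are hypothetical values chosen so that the peak is near 17), and the variable names `x`, `d`, `gap`, and `side` are mine. Instead of a plain sort, the sketch looks for the closest grid point separately on each side of the peak, so that it cannot return two neighbouring points from the same side.

```stata
* Simulated stand-in for one sequence (hypothetical mean 0.1, sd 0.024)
clear
set seed 12345
set obs 10000
generate double v1 = rnormal(0.1, 0.024)

* Evaluate the kernel density at 10,000 grid points, without drawing it
kdensity v1, n(10000) generate(x d) nograph

* Peak density and the grid point where it occurs
summarize d, meanonly
scalar s_peak = r(max)
summarize x if d == s_peak, meanonly
scalar s_mode = r(mean)

* Distance of every grid point's density from half of the peak
generate double gap = abs(d - s_peak / 2)

* Keep the closest grid point on each side of the peak
generate byte side = cond(x < s_mode, 1, 2)
bysort side (gap): keep if _n == 1
list x d
```

After the last step, the dataset holds two observations: the half-maximum point to the left of the peak and the one to the right.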
Next, write a loop to get the half-maximum values for all 30 sequences:
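A loop of this kind might look like the sketch below. Again, this is a hedged illustration rather than the code I actually ran: it uses 3 simulated sequences of 2,000 values instead of the real 30 sequences of 10,000, and the result names `seq`, `peak`, `lo`, and `hi` are hypothetical. Results are collected with `postfile`, one row per sequence.

```stata
* Simulated stand-in for the transposed data: 3 sequences, 2,000 values each
clear
set seed 2024
set obs 2000
forvalues i = 1/3 {
    generate double v`i' = rnormal(0.1 * `i', 0.024)
}

* Collect one row of results per sequence in a temporary dataset
tempname results
tempfile halfmax
postfile `results' int seq double(peak lo hi) using `halfmax'

forvalues i = 1/3 {
    quietly {
        * Density on a grid, its peak, and where the peak occurs
        kdensity v`i', n(2000) generate(x d) nograph
        summarize d, meanonly
        scalar s_peak = r(max)
        summarize x if d == s_peak, meanonly
        scalar s_mode = r(mean)
        generate double gap = abs(d - s_peak / 2)

        * Closest grid point to the half-maximum, left of the peak
        summarize gap if x < s_mode, meanonly
        scalar s_gap = r(min)
        summarize x if x < s_mode & gap == s_gap, meanonly
        scalar s_lo = r(mean)

        * Closest grid point to the half-maximum, right of the peak
        summarize gap if x > s_mode, meanonly
        scalar s_gap = r(min)
        summarize x if x > s_mode & gap == s_gap, meanonly
        scalar s_hi = r(mean)
    }
    post `results' (`i') (s_peak) (s_lo) (s_hi)
    drop x d gap
}
postclose `results'

* One row per sequence: peak density and the two half-maximum points
use `halfmax', clear
list
```

For the real data, the two `forvalues` ranges would become `1/30` and `n(2000)` would become `n(10000)`. The final dataset could then be written out with `export delimited`.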
A2.csv looks like this:
To make sure the answers are correct, we can plot two comparison charts:
gen obs = _n
tw line maxd obs, lp(solid) lw(*2) || ///
Now, I’m sure these answers are correct!