Find Semi-max Density Values in Normal Distribution

Find Semi-max Density Values in Normal Distribution

Today afternoon, a friend asked me a question, I can’t describe this question in English accurately, but I will try my best. Now, she has 30 sequences, each has 10 thousand numeric elements, they seemingly follow a normal distribution. For a normal distribution, the max density value correspond to mean. she want to know at which value, the density value equals half of the max density value.

The original data is: A.csv,each line is a sequence. First we need to make a tranposition:

Stata
1
2
3
4
5
6
clear all
set maxvar 10000
import delimited using A.csv, clear
xpose, clear
save A, replace
`

She gives me 30 sequences, each have 10,000 observations. For example, for the fist sequence, the kenel density distribution is:

Stata
1
2
3
use A, clear
* h kdensity
kdensity v1, generate(nv1 nd1) n(10000)

From above diagrams, we can see at about 0.1, the density get its biggest value, about 17. What we want to know is at which value, the density value is about 17/2 = 8.5. Obviously, there are two values meet this requirement.

My idea is to first get 10000 lattice points of the distribution by kernel density estimation, then sort the density value of each lattice point, find the largest divided by 2, and then find the two lattice points closest to the semi-max value. The code is:

Stata
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
use A, clear
* h kdensity
kdensity v1, generate(nv1 nd1) n(10000)
gr export semimax1.png, replace
line nd1 nv1
keep nd1 nv1 v1
preserve
gsort nd1
order nd1 nv1
/* max value */
local maxd = `=nd1[_N]'
local maxv = `=nv1[_N]'
/* semi-max value */
local smaxd = `maxd' / 2
di "`smaxd'"
gen diff = abs(nd1 - `smaxd')
gsort diff
gen group = (nv1 - `maxv') > 0
gsort group diff
bysort group: keep if _n == 1
local smax1 = `=nv1[1]'
local smax2 = `=nv1[2]'
local smaxd1 = `=nd1[1]'
local smaxd2 = `=nd1[2]'
restore
tw line nd1 nv1 || ///
scatteri `smaxd1' `smax1' "semi-max1", msize(*2) || ///
scatteri `smaxd2' `smax2' "semi-max2", msize(*2)

These two points are exactly what we needed!

Next, program a loop to get all 30 sequences’ semi-max values:

Stata
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
clear all
use A, clear
mat semimax = J(30, 6, .)
mat list semimax
forval i = 1/30{
kdensity v`i', generate(nv`i' nd`i') n(10000) nograph
preserve
gsort nd`i'
local maxd = `=nd`i'[_N]'
local maxv = `=nv`i'[_N]'
local smaxd = `maxd' / 2
gen diff`i' = abs(nd`i' - `smaxd')
gsort diff`i'
gen group`i' = (nv`i' - `maxv') > 0
gsort group`i' diff`i'
bysort group`i': keep if _n == 1
local smax1`i' = `=nv`i'[1]'
local smax2`i' = `=nv`i'[2]'
local smaxd1`i' = `=nd`i'[1]'
local smaxd2`i' = `=nd`i'[2]'
restore
mat semimax[`i', 1] = `smax1`i''
mat semimax[`i', 2] = `smax2`i''
mat semimax[`i', 3] = `smaxd1`i''
mat semimax[`i', 4] = `smaxd2`i''
mat semimax[`i', 5] = `maxv'
mat semimax[`i', 6] = `maxd'

}
mat list semimax
* turn matrix semimax to dta dataset
clear
set obs 30
gen semimaxv1 = .
gen semimaxd1 = .
gen semimaxv2 = .
gen semimaxd2 = .
gen max = .
gen maxd = .
forval j = 1/30{
replace semimaxv1 = semimax[`j', 1] in `j'
replace semimaxv2 = semimax[`j', 2] in `j'
replace semimaxd1 = semimax[`j', 3] in `j'
replace semimaxd2 = semimax[`j', 4] in `j'
replace max = semimax[`j', 5] in `j'
replace maxd = semimax[`j', 6] in `j'
}
save results, replace
order semimaxv1 semimaxv2
export delimited using A2.csv, replace

A2.csv is like this:

semimaxv1 semimaxv2 semimaxd1 semimaxd2 max maxd
.078701444 .1300306 9.1357012 9.1339693 .10422645 18.27581
.06238056 .097713493 13.26233 13.267978 .081068322 26.527439
.027084824 .12270609 4.3110609 4.3171973 .060132872 8.6377687
.0097687775 .069909528 6.2074771 6.1685586 .02927785 12.341722
.0052797976 .074061677 5.0783057 5.0829434 .026011925 10.177743
.018993862 .12027971 4.0163946 4.0077281 .049942315 8.011528
.077044569 .1201292 10.900649 10.90671 .09677574 21.811472
.042942528 .087064542 10.498396 10.50073 .062700093 21.005436
.074877165 .11197978 12.780385 12.78516 .095224872 25.565735
.016825253 .12961259 3.506664 3.5183191 .048007637 7.0338864

To make sure the answers are correct, we can plot two comparasion charts:

1
2
3
4
gen obs = _n
tw line max obs, lp(solid) lw(*2) || ///
line semimaxv1 obs, lp(dash) || ///
line semimaxv2 obs, lp(dash)

Stata
1
2
3
tw line maxd obs, lp(solid) lw(*2) || ///
line semimaxd1 obs, lp(dash) || ///
line semimaxd2 obs, lp(dash)

Now, I’m sure these answers are correct!

# Stata

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×