Initial Exploration

RandomInteger [ { 0 , 1 } , 20 ]

{ 1 , 1 , 1 , 1 , 0 , 0 , 1 , 0 , 0 , 1 , 0 , 1 , 1 , 1 , 0 , 0 , 0 , 1 , 0 , 0 }

RandomVariate [ UniformDistribution [ { 0 , 1 } ] ]

0.6531459162175581

Count [ RandomInteger [ { 0 , 1 } , 20 ] , 0 ]

8

ListLinePlot[Table[Accumulate[RandomReal[{-1, 1}, 250]], {3}], Filling -> Axis, PlotLegends -> {"one", "two", "three"}]

ch_4_1.gif

Accumulate [ RandomInteger [ { 0 , 1 } , 10 ] ]

{ 1 , 1 , 2 , 2 , 3 , 4 , 5 , 5 , 6 , 7 }

ListLinePlot [ % ]

ch_4_2.gif

Table [ RandomInteger [ { 0 , 1 } , 1 ] , 10 ]

ch_4_3.gif

MovingAverage [ { { 0 } , { 1 } , { 1 } , { 1 } , { 0 } , { 1 } , { 0 } , { 1 } , { 0 } , { 1 } } ]

Accumulate [ RandomInteger [ BernoulliDistribution [ 0.5 ] , 1000 ] ] / Range [ 1000 ] // ListPlot

ch_4_4.gif

Accumulate [ RandomVariate [ NormalDistribution [ ] , 5000 ] ] / Range [ 5000 ] // ListPlot

ch_4_5.gif

ListPlot[RandomVariate[NormalDistribution[], 1000], Filling -> Axis]

ch_4_6.gif

ListPlot[RandomVariate[GeometricDistribution[1/2], 50], Filling -> Axis]

ch_4_7.gif

Accumulate [ RandomInteger [ { 0 , 1 } , 100 ] ] / Range [ 100 ] // ListPlot

ch_4_8.gif

Plot [ 6 x ( 1 - x ) , { x , 0 , 1 } , Filling -> Bottom ]

ch_4_9.gif

Integrate [ 6 x ( 1 - x ) x , { x , 0 , 1 } ]

1/2

data = SemanticImport["/Users/codingquark/Documents/Wolfram Mathematica/doing_bayesian_data_analysis/DBDA2Eprograms/HairEyeColor.csv"]

ch_4_10.gif

Mean [ data ]

<|"Hair" -> 1/16 (4 "Black" + 4 "Blond" + 4 "Brown" + 4 "Red"), "Eye" -> 1/16 (4 "Blue" + 4 "Brown" + 4 "Green" + 4 "Hazel"), "Count" -> 37|>

Length [ data ]

16

Normal [ data ]

{<|"Hair" -> "Black", "Eye" -> "Blue", "Count" -> 20|>, <|"Hair" -> "Black", "Eye" -> "Brown", "Count" -> 68|>, <|"Hair" -> "Black", "Eye" -> "Green", "Count" -> 5|>, <|"Hair" -> "Black", "Eye" -> "Hazel", "Count" -> 15|>, <|"Hair" -> "Blond", "Eye" -> "Blue", "Count" -> 94|>, <|"Hair" -> "Blond", "Eye" -> "Brown", "Count" -> 7|>, <|"Hair" -> "Blond", "Eye" -> "Green", "Count" -> 16|>, <|"Hair" -> "Blond", "Eye" -> "Hazel", "Count" -> 10|>, <|"Hair" -> "Brown", "Eye" -> "Blue", "Count" -> 84|>, <|"Hair" -> "Brown", "Eye" -> "Brown", "Count" -> 119|>, <|"Hair" -> "Brown", "Eye" -> "Green", "Count" -> 29|>, <|"Hair" -> "Brown", "Eye" -> "Hazel", "Count" -> 54|>, <|"Hair" -> "Red", "Eye" -> "Blue", "Count" -> 17|>, <|"Hair" -> "Red", "Eye" -> "Brown", "Count" -> 26|>, <|"Hair" -> "Red", "Eye" -> "Green", "Count" -> 14|>, <|"Hair" -> "Red", "Eye" -> "Hazel", "Count" -> 14|>}

data[Histogram, "Count"]

ch_4_11.gif

data[GroupBy["Hair"], Total, "Count"]

ch_4_12.gif

data[GroupBy["Eye"], Total, "Count"]

ch_4_13.gif

data[All, "Count"] // Total

592

ch_4_14.gif

ch_4_15.gif

ch_4_16.gif

data[GroupBy["Hair"], Mean, "Count"]

ch_4_17.gif

data[GroupBy["Eye"], Mean, "Count"]

ch_4_18.gif

data

ch_4_19.gif

titanic = ExampleData[{"Dataset", "Titanic"}]

ch_4_20.gif

titanic[Histogram, "age"]

ch_4_21.gif

titanic[GroupBy[Key["sex"]], Counts, "class"]

ch_4_22.gif

Watch https://www.wolfram.com/broadcast/video.php?c=377&p=2&v=1258

data[Histogram, "Count"]

ch_4_23.gif

data[ListPlot, "Count"]

ch_4_24.gif

data[GroupBy["Hair"], GroupBy["Eye"]]

ch_4_25.gif

ch_4_26.gif

592

592

data[Mean, "Count"]

37

data[Median, "Count"]

37/2

data[Counts, "Hair"]

ch_4_27.gif

data[Counts, "Eye"]

ch_4_28.gif

data[MinMax, "Count"]

ch_4_29.gif

data[StandardDeviation, "Count"]

Sqrt[6314/5]

ch_4_30.gif

ch_4_31.gif

ch_4_32.gif

PieChart[data[GroupBy["Eye"], Total, "Count"], ChartLegends -> Automatic]

ch_4_33.gif

PieChart[data[GroupBy["Hair"], Total, "Count"], ChartLegends -> Automatic]

ch_4_34.gif

data[GroupBy["Eye"]] // Normal

<|"Blue" -> {<|"Hair" -> "Black", "Eye" -> "Blue", "Count" -> 20|>, <|"Hair" -> "Blond", "Eye" -> "Blue", "Count" -> 94|>, <|"Hair" -> "Brown", "Eye" -> "Blue", "Count" -> 84|>, <|"Hair" -> "Red", "Eye" -> "Blue", "Count" -> 17|>}, "Brown" -> {<|"Hair" -> "Black", "Eye" -> "Brown", "Count" -> 68|>, <|"Hair" -> "Blond", "Eye" -> "Brown", "Count" -> 7|>, <|"Hair" -> "Brown", "Eye" -> "Brown", "Count" -> 119|>, <|"Hair" -> "Red", "Eye" -> "Brown", "Count" -> 26|>}, "Green" -> {<|"Hair" -> "Black", "Eye" -> "Green", "Count" -> 5|>, <|"Hair" -> "Blond", "Eye" -> "Green", "Count" -> 16|>, <|"Hair" -> "Brown", "Eye" -> "Green", "Count" -> 29|>, <|"Hair" -> "Red", "Eye" -> "Green", "Count" -> 14|>}, "Hazel" -> {<|"Hair" -> "Black", "Eye" -> "Hazel", "Count" -> 15|>, <|"Hair" -> "Blond", "Eye" -> "Hazel", "Count" -> 10|>, <|"Hair" -> "Brown", "Eye" -> "Hazel", "Count" -> 54|>, <|"Hair" -> "Red", "Eye" -> "Hazel", "Count" -> 14|>}|>

Given the structure so far, we need to figure out a way to have named rows, nested with further named rows. Like so: {“Brown Eye”->{“Black Hair”->4, “Blond Hair”->10}, “Blue Eye”->...}
Not sure if this should be done at import time, or as the next step after import.

Importing Data

What should it look like?

We want our Hair + Eye colour data to be structured “nicely” before we start doing analysis. What is this “nice” structure?

If imported simply, this is what it looks like:

ch_4_35.gif

ch_4_36.gif

This structure is not bad: it correctly figures out that the first row of the CSV holds the column names. But is this how we want things to be? Could we give it more structure?

One approach is to add named rows. We can imagine that the only actual data point in the entire dataset is the `Count` column; the rest are just names! That means we can have rows of rows.

Let’s try. What does the structure look like?

planets = ExampleData[{"Dataset", "Planets"}]

ch_4_37.gif

planets // Normal

<|"Mercury" -> <|"Mass" -> ..., "Radius" -> ..., "Moons" -> <||>|>, ..., "Mars" -> <|"Mass" -> ..., "Radius" -> ..., "Moons" -> <|"Phobos" -> <|"Mass" -> ..., "Radius" -> ...|>, "Deimos" -> <|"Mass" -> ..., "Radius" -> ...|>|>|>, ...|>

(Output abbreviated: each planet maps to its Mass, Radius, and a nested "Moons" association keyed by moon name, each moon again carrying Mass and Radius; unknown masses appear as Missing["NotAvailable"].)

Okay, now that we know how data is represented, we can try manually creating a row or two for our data.

dataset = Dataset[<|"Black" -> <|"Blue" -> <|"Count" -> 20|>, "Brown" -> <|"Count" -> 68|>|>|>]

ch_4_38.gif

Does this look good? We don’t have column names for Hair and Eye, but we do for Count. Is that normal? If you look at the Moons in the planets dataset, the names of the moons are not column-named, but their properties are. That is analogous to what we have for our dataset as well.

Let’s finalise on this structure for now, and find out how to convert the imported data.

Structuring Imported Dataset

So far, I have not found any simple way to morph the dataset. The important point so far: we cannot modify an existing Dataset; we have to create a new Dataset with the structure we want. This means we have to define the column/row structure and select the proper Count columns for the new Dataset definition.

This manual approach is too much hassle, and figuring out a “script-like” way to change the structure is not attractive right now.

Hence, we will try to roll with what we have and see what happens when we try doing operations on the data.
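That said, a chained GroupBy query may get close to the nested named-row structure without building a new Dataset by hand. A sketch, assuming the imported `data` with its string column names (untested against the actual import):

```mathematica
(* Group rows by eye colour, then by hair colour, and keep the
   total of the "Count" values at each leaf *)
data[GroupBy["Eye"], GroupBy["Hair"], Total, "Count"]
```

This follows the same query pattern as the single-level GroupBy queries used above, with one more level of grouping.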

References

https://www.wolfram.com/broadcast/video.php?c=377&p=2&v=1258

https://www.youtube.com/watch?v=XN8Y177doEE

https://www.wolfram.com/wolfram-u/courses/data-science/becoming-a-data-curator-dat906/

Basic Analysis

Exercises from 4.6

Exercise 4.1

The eye-color hair-color data gives frequencies of eye and hair color for males and females. Run the R code (some of it; we are going to ignore parts).

In your write-up, include each line above and its results. Explain what each line does (in a bit more detail than the inline comments). Extend the above commands by also computing the probabilities of the hair colors given brown eyes, and the probabilities of the eye colors given brown hair.

First, let’s import the data and show it.

ch_4_39.gif

ch_4_40.gif

Run equivalent of:
ch_4_41.gif

Funnily enough, the R code given by the book loads the CSV, generates the Male/Female split, and then stores it in the variable “HairEyeColor”. Applying “sum” then gives the data we have already got!

The next line divides each cell by the total.

ch_4_42.gif

ch_4_43.gif

To find total counts of each hair-color:

hairfrequencydataset = dataset[GroupBy["Hair"], Total, "Count"]

ch_4_44.gif

Like we added the Frequency column above for all rows, do the same but for the grouped data:

hairproportions = hairfrequencydataset[All, <|"Frequency" -> #/counttotal|> &]

ch_4_45.gif

Same operations now for eye colors:

ch_4_46.gif

ch_4_47.gif

ch_4_48.gif

Calculate conditional probability of Hair, given Blue eye color (table 4.2):

haireyeproportion[Select[#Eye == "Blue" &]][All, <|"Hair" -> #Hair, "Frequency" -> N[#Frequency/eyeproportions["Blue", "Frequency"]]|> &]

ch_4_49.gif

Calculate conditional probability of hair colors given brown eyes:

haireyeproportion[Select[#Eye == "Brown" &]][All, <|"Hair" -> #Hair, "Frequency" -> N[#Frequency/eyeproportions["Brown", "Frequency"]]|> &]

ch_4_50.gif

Calculate conditional probability of eye colors given brown hair:

haireyeproportion[Select[#Hair == "Brown" &]][All, <|"Eye" -> #Eye, "Frequency" -> N[#Frequency/hairproportions["Brown", "Frequency"]]|> &]

ch_4_51.gif

Let’s now try to understand, verbally, what the exercise was about.

We were given raw counts for hair+eye colour combos. Just like one of the sections in the chapter, we wanted to work our way to conditional probabilities.

To get to conditional probabilities, we needed some basic information derived first:

Proportions of each hair+eye colour combo

Proportions of each hair colour

Proportions of each eye colour

When we calculate hair+eye proportions, we are calculating joint probabilities. For example, the probability of finding someone with both Brown eyes and Black hair is 0.11 i.e. 11% and so on.

When we calculate the proportions of only hair or only eye colours, we are measuring the marginal probability of finding, for example, a person with Black hair, or a person with Blue eyes. For instance, the probability of finding a person with Black hair is 0.18 (18%); a person with Brown eyes, 0.37 (37%).

Finally, we can use the derived information to calculate conditional probabilities. To calculate the probability of a person having Hazel eyes given that the person has Blond hair, we divide the joint probability by the marginal probability: p(Hazel | Blond) = p(Hazel, Blond)/p(Blond). For this specific case, the calculation is 0.0168919/0.214527 ≈ 0.079 (~8%).
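As a sanity check, the same number falls straight out of the raw counts in the Normal[data] listing above, since the shared factor of 592 cancels. The Blond row has counts Blue 94, Brown 7, Green 16, Hazel 10:

```mathematica
(* Counts for Blond hair, taken from the Normal[data] listing *)
blond = <|"Blue" -> 94, "Brown" -> 7, "Green" -> 16, "Hazel" -> 10|>;
N[blond["Hazel"]/Total[blond]]  (* p(Hazel | Blond) = 10/127 ≈ 0.0787 *)
```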

Thus ends a successful Exercise 4.1! Notice how the entire exercise was about calculating proportions in various ways and it did not feel the way it did when reading the section of the book.

Exercise 4.2

Modify the coin flipping program in Section 4.5 to simulate a biased coin that has p(H)=0.8. Change the height of the reference line in the plot to match p(H). Comment your code.

Okay, this is more involved since it is mainly about R code. We will have to first write code to simulate coin flips, then make the coin flips biased.

Alright, let’s think about this. The R code uses the `sample` function, which takes probability values with which to draw the samples. We need something similar in Mathematica. But what is our real goal? We want to flip a coin - [H,T] - and, when it is flipped, we want to specify how biased the coin is. Having flipped the coin N times, we want to plot the running proportion of H and know the final proportion.

First, we have RandomVariate function that can take some distribution. The one we might find useful is DiscreteUniformDistribution. Let’s try.  

RandomVariate [ DiscreteUniformDistribution [ { 0 , 1 } ] , 10 ] // Short

{ 1 , 0 , 0 , 0 , 1 , 1 , 0 , 0 , 1 , 1 }

Histogram [ RandomVariate [ DiscreteUniformDistribution [ { 0 , 1 } ] , 500 ] ]

ch_4_52.gif

tosses = RandomVariate [ DiscreteUniformDistribution [ { 0 , 1 } ] , 500 ] ;

N[Count[tosses, 1]/500]

0.456

N[Count[tosses, 0]/500]

0.544

There, we got the end proportions of the fair coin after 500 tosses.

manytosses = RandomVariate [ DiscreteUniformDistribution [ { 0 , 1 } ] , 50000 ] ;

ListPlot[Accumulate[manytosses/Length[manytosses]]]

ch_4_53.gif

ListLinePlot [ Accumulate [ Select [ tosses , # == 1 & ] ] ]

ch_4_54.gif

Still no idea how to get the “running proportion”! FoldList works, but does not give access to the current index of the list. MapIndexed does not keep track of previous elements. Looks like we will have to write our own function. Not fun, I must say.
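In hindsight, the Accumulate[...]/Range[...] pattern used in the exploration section at the top of this page already computes the running proportion, with no explicit index needed:

```mathematica
(* Running proportion of heads: cumulative number of 1s divided by
   the number of tosses so far *)
runningProportion[t_List] := Accumulate[t]/Range[Length[t]]

runningProportion[{1, 1, 0, 1}]  (* {1, 1, 2/3, 3/4} *)
```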

Stolen the following code from https://mathematica.stackexchange.com/questions/190503/how-to-make-foldlistindexed-ie-foldlist-with-an-index-or-iterator

ch_4_55.gif

{ x , f [ x , a , 1 ] , f [ f [ x , a , 1 ] , b , 2 ] , f [ f [ f [ x , a , 1 ] , b , 2 ] , c , 3 ] , f [ f [ f [ f [ x , a , 1 ] , b , 2 ] , c , 3 ] , d , 4 ] }

fewtosses = { 1 , 1 , 0 , 1 , 1 , 0 , 1 , 0 , 0 , 0 }

{ 1 , 1 , 0 , 1 , 1 , 0 , 1 , 0 , 0 , 0 }

This approach may not be the thing we want. What we want is a list: { 1/1 , 2/2 , 2/3 , 3/4 , 4/5 , 4/6 , 5/7 , 5/8 , 5/9 , 5/10 } and so on.

For[i = 1; res = {}, i <= Length[tosses], i++, AppendTo[res, Last@FoldList[Plus, 0, tosses[[;; i]]]/i]]

ListLinePlot [ res ]

ch_4_56.gif

Well what do we have here! It worked! It is the ugliest way to do things because it is not functional, but it got the job done. We will refine this somehow.

MapIndexed[FoldList[Plus, 0, fewtosses[[;; First[#2]]]]/First[#2] &, fewtosses]

{{0, 1}, {0, 1/2, 1}, {0, 1/3, 2/3, 2/3}, {0, 1/4, 1/2, 1/2, 3/4}, {0, 1/5, 2/5, 2/5, 3/5, 4/5}, {0, 1/6, 1/3, 1/3, 1/2, 2/3, 2/3}, {0, 1/7, 2/7, 2/7, 3/7, 4/7, 4/7, 5/7}, {0, 1/8, 1/4, 1/4, 3/8, 1/2, 1/2, 5/8, 5/8}, {0, 1/9, 2/9, 2/9, 1/3, 4/9, 4/9, 5/9, 5/9, 5/9}, {0, 1/10, 1/5, 1/5, 3/10, 2/5, 2/5, 1/2, 1/2, 1/2, 1/2}}

MapIndexed [ List [ #1 , First @ #2 ] & , fewtosses ]

{ { 1 , 1 } , { 1 , 2 } , { 0 , 3 } , { 1 , 4 } , { 1 , 5 } , { 0 , 6 } , { 1 , 7 } , { 0 , 8 } , { 0 , 9 } , { 0 , 10 } }

MapIndexed[Total[tosses[[;; First@#2]]]/First@#2 &, tosses];

ListLinePlot [ % ]

ch_4_57.gif

There we go, it is now functional - nice and neat.

Now we need to bias the coin such that the probability of Heads is 0.8.

There’s BernoulliDistribution which “represents a Bernoulli distribution with probability parameter p. Also known as the coin toss distribution or Bernoulli trial”. Since it literally says “coin toss” in there, we should try it out.

RandomVariate [ BernoulliDistribution [ 0.8 ] , 10 ]

{ 1 , 0 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 }

ch_4_58.gif

0.746

0.254

Yes! Looks good! Let’s plot!

ListLinePlot[MapIndexed[Total[biasedtosses[[;; First@#2]]]/First@#2 &, biasedtosses], PlotRange -> All]

ch_4_59.gif

And thus concludes the exercise!
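For completeness - the exercise also asks for a reference line at p(H) and commented code - here is one compact version of the whole simulation. The variable names are my own, and GridLines is used to draw the reference line:

```mathematica
pHeads = 0.8;  (* biased coin: p(H) = 0.8 *)
flips = RandomVariate[BernoulliDistribution[pHeads], 500];  (* 500 tosses *)
(* running proportion of heads after each toss *)
runProp = N[Accumulate[flips]/Range[Length[flips]]];
ListLinePlot[runProp,
 PlotRange -> {0, 1},
 GridLines -> {None, {pHeads}},  (* horizontal reference line at p(H) *)
 AxesLabel -> {"toss", "proportion of heads"}]
```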

Exercise 4.4

Consider a spinner with a [0,1] scale on its circumference. Suppose that the spinner is slanted or magnetized or bent in some way such that it is biased, and its probability density function is p(x) = 6x(1-x) over the interval x ∈ [0,1].

Adapt the program IntegralOfDensity.R to plot this density function and approximate its integral. Comment your code. Be careful to consider values of x only in the interval [0,1]. Hint: You can omit the first couple of lines regarding meanval and sdval, because those parameter values pertain only to the normal distribution. Then set xlow=0 and xhigh=1, and set dx to some small value.

Derive the exact integral using calculus.

Does this function satisfy equation 4.3?

From inspecting the graph, what is the maximal value of p(x)?

Here is the R code from the file:

source("DBDA2E-utilities.R")
# Graph of normal probability density function, with comb of intervals.
meanval = 0.0               # Specify mean of distribution.
sdval = 0.2                 # Specify standard deviation of distribution.
xlow  = meanval - 3.5*sdval # Specify low end of x-axis.
xhigh = meanval + 3.5*sdval # Specify high end of x-axis.
dx = sdval/10               # Specify interval width on x-axis
# Specify comb of points along the x axis:
x = seq( from = xlow , to = xhigh , by = dx )
# Compute y values, i.e., probability density at each value of x:
y = ( 1/(sdval*sqrt(2*pi)) ) * exp( -.5 * ((x-meanval)/sdval)^2 )
# Plot the function. "plot" draws the intervals. "lines" draws the bell curve.
openGraph(width=7,height=5)
plot( x , y , type="h" , lwd=1 , cex.axis=1.5, xlab="x" , ylab="p(x)", cex.lab=1.5, main="Normal Probability Density" , cex.main=1.5 )
lines( x , y , lwd=3 ,  col="skyblue" )
# Approximate the integral as the sum of width * height for each interval.
area = sum( dx * y )
# Display info in the graph.
text( meanval-sdval , .9*max(y) , bquote( paste(mu ," = " ,.(meanval)) ), adj=c(1,.5) , cex=1.5 )
text( meanval-sdval , .75*max(y) , bquote( paste(sigma ," = " ,.(sdval)) ), adj=c(1,.5) , cex=1.5 )
text( meanval+sdval , .9*max(y) , bquote( paste(Delta , "x = " ,.(dx)) ), adj=c(0,.5) , cex=1.5 )
text( meanval+sdval , .75*max(y) ,bquote( sum({},x,{}) *" "* Delta *"x p(x) = "* .(signif(area,3)) ) ,adj=c(0,.5) , cex=1.5 )
# Save the plot to an EPS file.
saveGraph( file = "IntegralOfDensity" , type="eps" )

That’s a lot of code just to display graphs. Disregard those lines, and we are left with a few important lines:

# Specify comb of points along the x axis:
x = seq( from = xlow , to = xhigh , by = dx )

# Compute y values, i.e., probability density at each value of x:
y = ( 1/(sdval*sqrt(2*pi)) ) * exp( -.5 * ((x-meanval)/sdval)^2 )

# Approximate the integral as the sum of width * height for each interval.
area = sum( dx * y )

Fairly straightforward. Initially I thought we were going to use the NormalDistribution function, but no. This just calculates values of the “probability density” over a range (0 to 1) at some interval and plots the values. Can we find some excuse to use NormalDistribution?

Anyway, here we go:

ch_4_60.gif

ch_4_61.gif

area = Total [ y * δ ]

0.9996

I am curious how the plot looked with the original equation.

ch_4_62.gif

ch_4_63.gif

Integrate[6 x (1 - x), {x, 0, 1}]

1

integrate 6x(1-x) from 0 to 1

Max [ y ]

1.5

Thus concludes the exercise.

However, there’s something more we can try: finding more Mathematica-like ways of doing this.
Thinking about it, we are given a probability density function, 6x(1-x), and all we want is to see how it behaves for different values of the variable x. There is one potential function we can try.
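The hidden input presumably defines the distribution with ProbabilityDistribution; a sketch (I use the name dist here, since D is a protected built-in symbol):

```mathematica
dist = ProbabilityDistribution[6 x (1 - x), {x, 0, 1}];
Plot[PDF[dist, x], {x, 0, 1}, Filling -> Bottom]
```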

ch_4_65.gif

ch_4_66.gif

Ah! Look what we got here! Perfect.

Max[PDF[D, x]]

Piecewise[{{6 (1 - x) x, 0 < x < 1}}, 0]

ch_4_67.gif

ch_4_68.gif

To get the area under the curve between a and b, we compute CDF[b] - CDF[a].
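For example (again writing the distribution as dist, a name chosen here), the full area under the density comes out as 1, since the CDF is 3x² - 2x³ on [0,1]:

```mathematica
dist = ProbabilityDistribution[6 x (1 - x), {x, 0, 1}];
CDF[dist, 1] - CDF[dist, 0]  (* total area under the density: 1 *)
```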

ch_4_69.gif

1

ch_4_70.gif

0.999534741841929

As you can observe, this approach is much better. It may be less obvious what’s going on under the hood, but it is cleaner once you have understood what’s going on. If we had started directly with ProbabilityDistribution, we would not have known how 6x(1-x) is evaluated and plotted.

Created with the Wolfram Language