sand15

MaplePrimes Activity


These are replies submitted by sand15

@acer 

A way (the only one?) to place the text in the foreground and the hatching in the background.

Text_foreground_hatches_background.mw

@janhardo 

restart;

with(plots):
with(Statistics):

# Well-separated components
G13 := RandomVariable(Normal(-1, 0.7));
G14 := RandomVariable(Normal(10, 0.9)):
G15 := RandomVariable(Normal(4, 0.6)):

G16 := RandomVariable(Normal(-5, 0.7)):
G17 := RandomVariable(Normal(1, 0.9)):
G18 := RandomVariable(Normal(15, 0.6)):

G19 := RandomVariable(Normal(-2, 0.7)):
G20 := RandomVariable(Normal(-1, 0.9)):
G21 := RandomVariable(Normal(13, 0.6)):

_R

(1)


The three lines which follow can be useful to get the parameters of the nine random variables defined above.

Of course it would have been simpler to define mu_list and sigma_list before defining these random variables.
 

RVs := sort( select(type, [anames(alluser)], suffixed(_ProbabilityDistribution)) );

mu_list    := Mean~(RVs);
sigma_list := StandardDeviation~(RVs)

[_ProbabilityDistribution, _ProbabilityDistribution0, _ProbabilityDistribution1, _ProbabilityDistribution2, _ProbabilityDistribution3, _ProbabilityDistribution4, _ProbabilityDistribution5, _ProbabilityDistribution6, _ProbabilityDistribution7]

 

[-1, 10, 4, -5, 1, 15, -2, -1, 13]

 

[.7, .9, .6, .7, .9, .6, .7, .9, .6]

(2)


Define proportions from weights
 

weights_sep := [0.33, 0.34, 0.33, 0.35, 0.35, 0.30, 0.4, 0.4, 0.4]:
proportions := weights_sep /~ add(weights_sep);
 

[.1031250000, .1062500000, .1031250000, .1093750000, .1093750000, 0.9375000000e-1, .1250000000, .1250000000, .1250000000]

(3)


The simplest way, IMO, to generate a sample of the mixture is to randomly select each component according to its proportion within
the mixture and sample the selected component.

Formally
  
   sample := Vector(N):
   for n from 1 to N do
      # select a component C according to its proportion within the mixture
      sample[n] := Sample(C, 1)[1]
   end do:

This algorithm is very inefficient and a far better way to proceed is to define a random variable 'selector' with a ProbabilityTable distribution
whose probabilities are equal to 'proportions'.
A sample of 'selector' of size N is a collection of numbers 1, 2, ..., 9 where the proportion of '1', for instance, is equal to proportions[1].
Let N1 be the number of '1', N2 the number of '2', ....
For each component Ck draw a sample Sk of size Nk and assemble the nine samples.

Note that this method works whatever the distributions are.
Using method=[envelope, range=...] would be a very bad idea if some distributions have a bounded support (generally an error is returned
indicating that the pdf cannot be evaluated at the boundaries of the support).
 

selector := RandomVariable(ProbabilityTable(proportions)):

N := 5000:
selections := Sample(selector, N):

sub_sizes := rhs~(sort(Tally(selections), key=(x -> lhs(x))));

sample := convert(`<|>`( seq( Sample(RVs[i], sub_sizes[i]), i=1..numelems(RVs) ) ), Vector[column]):
 

[521, 547, 535, 536, 524, 479, 618, 590, 650]

(4)


The pdf of the mixture  can be easily obtained:
 

phi := unapply(PDF(Normal(a, b), x), (x, a, b)):

f := add( proportions[i] * phi(x, mu_list[i], sigma_list[i]), i=1..numelems(RVs) ):


Plotting domain
 

qq := (min..max)(sample)

-7.14040972574213040 .. 16.3984895268977589

(5)


Visualization

  # --------------------------------------------------
  # 5. VISUALIZATION
  # --------------------------------------------------
  
  hist_plot := Histogram(sample, minbins=round(sqrt(N)), color="LightGray", style=polygon):
  pdf_plot  := plot(f(x), x=qq, color=red, thickness=2):

  print(display(hist_plot, pdf_plot,
    title="Gaussian Mixture",
    labels=["x","density"],
    view=[qq, DEFAULT]
  ));

 
 


Download Improvements.mw

@Carl Love

I don't think I wrote something like that.

As a reminder I wrote

Then @Carl Love, probably confused by your relation (4), told you you were computing a sum and that Maple equation (5) was right.

and

So I'm sorry to say that if you really want to compute mixture properties, all the answers that have been given here are wrong in a general sense, largely due to the confusion you initiated between the weighted sum and the mixture of random variables.

OP's question began with him writing
I would like to build a Gaussian Mixture Model in Maple using the most straightforward approach:
and providing what I took as an illustration.
The original error comes from the OP defining a mixture of two Gaussian random variables as a sum of two Gaussian random variables, which is generally false unless the two (more generally, all) random variables have the same expectation. This led you in a direction that does not correspond to the one the use of the word mixture should imply.
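As a reminder (standard facts, stated here only to make the distinction explicit): for two independent Gaussian random variables X_1 ~ N(mu_1, sigma_1^2), X_2 ~ N(mu_2, sigma_2^2) and a weight 0 < w < 1, the weighted sum

    w X_1 + (1-w) X_2 \sim \mathcal{N}\left( w\mu_1 + (1-w)\mu_2,\; w^2\sigma_1^2 + (1-w)^2\sigma_2^2 \right)

is again Gaussian (hence unimodal), whereas the mixture with weights (w, 1-w) is the random variable whose pdf is

    f_{mix}(x) = w\,\varphi(x; \mu_1, \sigma_1) + (1-w)\,\varphi(x; \mu_2, \sigma_2),

which is not Gaussian in general and can even be bimodal.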

You write
I'd like to point out that I didn't say a single word about mixtures of random variables, nor did I claim to
Indeed, for the reason I explained above, you focused on OP's equation (4), not on the word mixture.

I think you misinterpreted my response and overreacted to my words, as it was not my intention to offend you.
I consider that the misunderstanding is over.

@janhardo 

The formula for the pdf you write in mixtureDist is correct, and so is the one for the mean, but your variance formula is correct only if the two Gaussian random variables share the same mean.
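For the record, the general two-component formulas (standard results, with weights w and 1-w) are

    E[X] = w\mu_1 + (1-w)\mu_2,
    Var[X] = w\sigma_1^2 + (1-w)\sigma_2^2 + w(1-w)(\mu_1 - \mu_2)^2,

so the variance reduces to the weighted sum of the variances only when \mu_1 = \mu_2.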

For decades, I have used C++ and FORTRAN debuggers.
They have continuously improved over the years, enabling the debugging of parallel codes on massively parallel architectures while becoming more and more user-friendly and easy to use.
Maple's debugger is really pitiful in comparison.

The only situation in which I keep using it is to inspect built-in procedures: when I have to debug my own code I prefer doing it the old-fashioned way, that is, using print/printf commands instead.

@CarlosRodrigues

I thought that a week would have been enough time for you to analyze the answers you received and have the courtesy to respond to us.

@acer 

It was indeed a good idea.
Even if it took me a lot of time to arrive where I wanted to, I finally discovered that Statistics:-LinearFit and Statistics:-Fit both use, at the very end, the NAG routine g02daf.
So the debugging ended there, but I nevertheless got some interesting information.

Thanks for your help

@acer 

I don't know what to say.

Indeed this proc calls LinearFitting:-LinearLS (of which I can't find any trace, likely a local name), whereas I expected code detailing the way the fitted model is computed.

In a few words, I just noticed accidentally that Statistics:-LinearFit (and Statistics:-Fit) didn't give the expected result on a quite simple test case, issued a meaningless warning, and that the help pages contained incorrect information.

When I write "expected result" I mean the result one can obtain "by hand" using different solving strategies, results which are the same as those that LinearAlgebra:-LeastSquares or (a hammer to kill a fly) Optimization:-LSSolve both provide.
After digging into the subject it appears that Statistics:-LinearFit probably uses its own code, that the solution method it truly uses is not the one the help page says it uses (which "Normally" [I quote the help page] is the QR decomposition), that it likely imposes its own Digits value (for the warning it displayed can be avoided by increasing Digits), ... and a lot of other things.
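To make "by hand" concrete, here is a hypothetical toy comparison (an exactly linear data set, not the test case I mention above):

X := Vector(5, i -> evalf(i)):
Y := Vector(5, i -> evalf(2 + 3*i)):
A := Matrix(5, 2, (i, j) -> evalf(X[i]^(j-1))):   # design matrix for the model c0 + c1*x

LinearAlgebra:-LeastSquares(A, Y);                # should return the coefficient vector <2., 3.>
Statistics:-LinearFit([1, x], X, Y, x);           # should return the model 2. + 3.*x (up to rounding)

On such a trivially well-conditioned case both should agree; the discrepancy I describe above shows up on the test case of my worksheet.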

With your reply it looks like the snake which bites its own tail, and I'm a bit in limbo.

I was writing a post on the subject and expected a definitive answer to conclude this post and send it.
Maybe filing an RCS too, where I would request that Statistics:-LinearFit use LinearAlgebra:-LeastSquares (whose help page contains reliable information).

If you want (and if you have time to spend on this subject) I can provide you with a worksheet which details all of this.

@nm 

The first thing is not to compute EllipticE(x, m) but EllipticE(sin(x), m)  (note this will work only where the map x → sin(x) is one-to-one, given the way EllipticE is defined in help(EllipticE)).
The reason why is given in the attached file, see equation (6).

Doing this gives MMA_EllipticE(x, m) = Maple_Elliptic(sin(x), sqrt(m)) iff -Pi/2 <= x <= Pi/2.

To get the same results for any x value you must "periodize" EllipticE this way:

  • For instance, to get the same result as MMA_EllipticE(3, -1), you must compute 
    Maple_Elliptic(sin(Pi/2), I ) + Maple_Elliptic(3-sin(Pi/2), I ).
    The procedure EllipticE_a_la_MMA shows you how to do this.
     
  • ... or you use the function MA in equation (2) below, which seems to me far simpler.

restart

kernelopts(version)

`Maple 2015.2, APPLE UNIVERSAL OSX, Dec 20 2015, Build ID 1097895`

(1)

# AS stands for Abramowitz & Stegun (or MMA/SageMath definitions).
# MA stands for Maple.
#
# Notations

AS := (x, m) -> evalf(Int(sqrt(1-m*sin(theta)^2), theta=0..x));
MA := (x, m) -> evalf(Int(sqrt(1-m^2*sin(theta)^2), theta=0..x));

proc (x, m) options operator, arrow; evalf(Int(sqrt(1-m*sin(theta)^2), theta = 0 .. x)) end proc

 

proc (x, m) options operator, arrow; evalf(Int(sqrt(1-m^2*sin(theta)^2), theta = 0 .. x)) end proc

(2)

# Evaluations

seqAS := [seq(AS(x, -1), x=-3..3, 0.5)]

[-3.678135309, -3.140058363, -2.507982114, -1.810019561, -1.123887723, -.5191743566, 0., .5191743566, 1.123887723, 1.810019561, 2.507982114, 3.140058363, 3.678135309]

(3)

seqMA := [seq(MA(x, I), x=-3..3, 0.5)]

[-3.678135309, -3.140058363, -2.507982114, -1.810019561, -1.123887723, -.5191743566, 0., .5191743566, 1.123887723, 1.810019561, 2.507982114, 3.140058363, 3.678135309]

(4)

seqAS -~ seqMA

[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]

(5)

# From MA above, after change of integration variable, one gets

with(IntegrationTools):

J := Int(sqrt(1-m^2*sin(theta)^2), theta=0..x);
K := Change(J, sin(theta)=t, t);

Int((1-m^2*sin(theta)^2)^(1/2), theta = 0 .. x)

 

Int((-m^2*t^2+1)^(1/2)/(-t^2+1)^(1/2), t = 0 .. sin(x))

(6)

# The expression of K differs from the one given in help(EllipticE)

EllipticE__help := Int(sqrt(-k^2*t^2+1)/sqrt(-t^2+1), t = 0 .. z)

Int((-k^2*t^2+1)^(1/2)/(-t^2+1)^(1/2), t = 0 .. z)

(7)

# So

isolate(op(2, GetRange(K)) = op(2, GetRange(EllipticE__help)), z)

z = sin(x)

(8)

MAK := (x, m) -> evalf(Int(sqrt(-m^2*t^2+1)/sqrt(-t^2+1), t=0..sin(x)));
EEH := (z, k) -> evalf(Int(sqrt(-k^2*t^2+1)/sqrt(-t^2+1), t = 0 .. z));

MAK(0.8, I);       # Maple gives the same result as MMA or SageMath
EEH(0.8, I);       # Maple EllipticE does not because of an incorrect upper integration range
EEH(sin(0.8), I);  # With corrected integration range

proc (x, m) options operator, arrow; evalf(Int(sqrt(-m^2*t^2+1)/sqrt(-t^2+1), t = 0 .. sin(x))) end proc

 

proc (z, k) options operator, arrow; evalf(Int(sqrt(-k^2*t^2+1)/sqrt(-t^2+1), t = 0 .. z)) end proc

 

.8699303662

 

1.029811941

 

.8699303662

(9)

# Note that AS and EEH give the same result only in the range where the application f : x --> sin(x)
# is a one-to-one map

interface(rtablesize=14):

`<,>`(
  `<|>`("x", "MMA", "EllipticE"),
  Matrix(13, 3, (i, j) -> piecewise(j=1, (i-7)/2, j=2, AS((i-7)/2, -1), EEH(sin((i-7)/2), I)))
)

Matrix([["x", "MMA", "EllipticE"], [-3, -3.678135309, -.1420624804], [-5/2, -3.140058363, -.6801394257], [-2, -2.507982114, -1.312215675], [-3/2, -1.810019561, -1.810019561], [-1, -1.123887723, -1.123887723], [-1/2, -.5191743566, -.5191743566], [0, 0., 0.], [1/2, .5191743566, .5191743566], [1, 1.123887723, 1.123887723], [3/2, 1.810019561, 1.810019561], [2, 2.507982114, 1.312215675], [5/2, 3.140058363, .6801394257], [3, 3.678135309, .1420624804]])

(10)

plot(
  ['AS'(x, -1), 'EEH'(sin(x), I)]
  , x=-3..3
  , color=[blue, red]
  , style=[line, point]
  , symbol=[NULL, circle]
  , numpoints=30
  , legend=["MMA llipticE", "Maple EllipticE"]
)

 

local EllipticE:

EllipticE := proc(x, m)
 local EEH, f, n, s, i:


 EEH := (z, k) -> evalf(Int(sqrt(-k^2*t^2+1)/sqrt(-t^2+1), t = 0 .. z)):
 f := proc(x, m)
        piecewise(
          x < Pi/2,
          EEH(sin(x), m),
          EEH(1, m) - EEH(-1, m) + EEH(sin(x-Pi), m)
        ):
      end proc:

 n := floor(x/Pi):
 s := 2*n*EEH(1, I) + f(x-n*Pi, m);
end proc:

plot(
  ['AS'(x, -1), 'EllipticE'(x, I)]
  , x=-Pi..Pi
  , color=[blue, red]
  , style=[line, point]
  , symbol=[NULL, circle]
  , numpoints=30
  , legend=["MMA llipticE", "Maple EllipticE"]
)

 

plot(
  ['AS'(x, -1), 'EllipticE'(x, I)]
  , x=-10..10
  , color=[blue, red]
  , style=[line, point]
  , symbol=[NULL, circle]
  , numpoints=30
  , legend=["MMA llipticE", "Maple EllipticE"]
):

 

 

Download EllipticE__MMA_vs_Maple.mw

@nm  @dharr

Maple notation and MMA/SageMath notation differ.

help(EllipticK)

Elliptic integrals and the related functions are well described in the Table of Integrals Series and Products, Gradshteyn and Ryzhik (G&R) and in the popular Handbook of Mathematical Functions edited by Abramowitz and Stegun (A&S). In A&S, these functions are expressed in terms of a parameter m, representing the square of the modulus k entering the definition of the Elliptic, JacobiPQ and InverseJacobiPQ functions in Maple and G&R. For example, the  K(m) function shown in A&S is numerically equal to the Maple EllipticK(sqrt(m)) command.

Both  MMA and SageMath use the Abramowitz-Stegun notation.
From §17.3 p 590
So Maple returning EllipticE(sin(x), I) is the same thing as MMA or SageMath returning EllipticE(sin(x), -1).
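A quick numerical check of the correspondence quoted from help(EllipticK) above (the value m := 0.3 is arbitrary):

m := 0.3:
evalf(Int(1/sqrt(1 - m*sin(theta)^2), theta = 0 .. Pi/2));   # K(m) in the A&S / MMA / SageMath convention
evalf(EllipticK(sqrt(m)));                                    # Maple's EllipticK with modulus k = sqrt(m)

Both commands should return the same value.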

@janhardo 

Of course Maple 2015 should ALWAYS have returned an error when computing R := A.B.A^+

The problem is it didn't  when A and B were floats.

@C_R 

I made a mistake... but Maple 2015 is bugged in a way which compensates for my mistake.
Look at this: A being a column vector, Maple 2015 correctly computes A.B.A^+ if A and B have floating-point elements!

restart

kernelopts(version)

`Maple 2015.2, APPLE UNIVERSAL OSX, Dec 20 2015, Build ID 1097895`

(1)

A := Vector(2, symbol=a):
B := Matrix(2$2, symbol=b):
R := A.B.A^+

Error, (in LinearAlgebra:-Multiply) cannot multiply a column Vector and a Matrix

 

A := Vector(2, i -> i):
B := Matrix(2$2, (i, j) -> i+j):
R := A.B.A^+

Error, (in LinearAlgebra:-Multiply) cannot multiply a column Vector and a Matrix

 

A := evalf(Vector(2, i -> i)):
B := evalf(Matrix(2$2, (i, j) -> i+j)):
R := A.B.A^+

30.

(2)
 

 

Download Maple_2015_Bug.mw

In my procedure I mistakenly wrote Wres := (res . Rinv . res^+); where res is a column vector.
With the Maple 2015 bug above I get the correct result.
In your more recent version (I guess) this bug has been fixed, hence the error you are receiving.

Here is my corrected worksheet Iterative_Gaussian_Process_Emulator.mw
Apologies

@C_R 

In your comments you use L as a synonym for lambda. Correct?: Right, sorry for not having been consistent in my notations.

About your error:

  1. In Maple 2015, X0, Y0, X and Y are column vectors: is it so with your Maple version?
    1. If not, convert all of them into Vector[column],
    2. if yes then 
      1. Download the attached file and duplicate it.
      2. Open them both in your Maple version.
      3. Keep the first one as the reference (do not execute it).
      4. Execute the second one.
      5. Tell me where the error occurs.


Compare_outputs_2015_your_version.mw

@C_R

In particular, the information you give about TuringBot is very interesting and I will have a look at it soon.

In case you would be interested in doing the same kind of thing in the future, here is a procedure which iteratively builds a GPE (Gaussian Process Emulator) of an unknown scalar function f : X → Y (X being 1D).
There is no stopping criterion: the decision to stop the process is yours as soon as you consider the GPE fits the data points well enough.

The process starts by selecting a small subset of the data set (which may be interesting if the latter is large) and by constructing a first GPE based on this subset.
Next the process enters an iterative step where a new point (belonging to the data set) is added to the subset at each iteration and the GPE is rebuilt.
The selection of the new point is based on the prediction error of the GPE built at the previous iteration.

The whole process converges quite rapidly.
For your data one can get a very good result by using less than 15% of the data set.
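For readers who prefer not to open the worksheet, here is a minimal sketch of the iteration only (a squared-exponential kernel with arbitrary fixed hyperparameters; the names GPE_mean, Xdata, Ydata are mine, not those used in the attached procedure):

GPE_mean := proc(X, Y, Xnew, ell, nugget)
  # mean prediction of a GPE built on the points (X, Y), evaluated at the points Xnew
  uses LinearAlgebra;
  local n, m, K, k;
  n := numelems(X):  m := numelems(Xnew):
  K := Matrix(n, n, (i, j) -> exp(-(X[i]-X[j])^2/(2*ell^2))) + nugget*IdentityMatrix(n);
  k := Matrix(m, n, (i, j) -> exp(-(Xnew[i]-X[j])^2/(2*ell^2)));
  return k . LinearSolve(K, Y);
end proc:

# Start from a small subset of the data, then repeatedly add the data point
# where the current emulator's prediction error is largest.
active := [1, 51, 101]:                                        # arbitrary initial indices
for iter from 1 to 12 do
  pred   := GPE_mean(Xdata[active], Ydata[active], Xdata, 0.5, 1e-8):
  active := [op(active), max[index](abs~(pred - Ydata))]:
end do:

The attached worksheet is the actual implementation; the sketch above only illustrates the selection loop described in the text.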

Iterative_Gaussian_Process_Emulator.mw

@dharr 

The attached file contains a variant, named QR_RationalFractionFit, of the procedure RationalFractionFit.
Compared to the latter, I replaced the call to LinearFit by the QRDecomposition of the matrix named inputs plus a resolution of the system R . C = Q^+ . output using LinearSolve.

The attached file presents the [9, 9] case, which generates a "Warning, model is not of full rank". You will see that the QR decomposition stabilizes the system and avoids the occurrence of this warning.

The new QR_RationalFractionFit procedure is incomplete because it does not return the many statistics that LinearFit returns. So there is still a lot of work to be done, but nothing complicated here, simply some time to spend on it.
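For what it is worth, the core of that change fits in a few lines (a sketch only; the names inputs and output come from the description above and are not necessarily those used in the worksheet):

with(LinearAlgebra):

Q, R := QRDecomposition(inputs):               # inputs is the floating-point design matrix
C := LinearSolve(R, Transpose(Q) . output):    # solves R . C = Q^+ . output for the coefficients C

Since R is upper triangular, the final solve is cheap and much better conditioned than forming the normal equations.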

QR_RationalFractionFit.mw




Using this QR_RationalFractionFit procedure makes it possible to compute the AIC criterion up to degrees greater than those at which you had to stop because of the warnings.
The best model from the AIC criterion point of view has a numerator of degree dn = 11 and a denominator of degree dd = 13, corresponding to a 23-parameter model.
Comparing this number to the 101 points the data set contains means that this "best" model has a complexity of about 23%, and so that it does not overfit the data.
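For reference, a usual least-squares form of the criterion is

    AIC = n \ln(\mathrm{RSS}/n) + 2k,

with n the number of data points (here 101), RSS the residual sum of squares and k the number of fitted parameters (stated here only as a reminder of how the parameter count enters the criterion).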



Note that the fittings have been done up to the [20, 20] rational fraction without receiving any warning. 

1 2 3 4 5 6 7 Last Page 2 of 38