mmcdara



These are Posts that have been published by mmcdara

Hi, 

In a recent post (Monte Carlo Integration), Radaar shared their work on the numerical integration, by the Monte Carlo method, of a function defined in polar coordinates.
Radaar used a raw strategy based on sampling in Cartesian coordinates plus an ad hoc transformation.
Radaar obtained reasonably good results, but I posted a comment to show how Monte Carlo summation in polar coordinates can be done in a much simpler way. The key is the choice of a "good" sampling distribution, which makes the integration problem as simple as Monte Carlo integration over a 2D rectangle with sides parallel to the coordinate axes.

That comment prompted me to share the present work on Monte Carlo integration over simple polygons ("simple" means that no two non-adjacent sides intersect).
Here again one can use raw Monte Carlo integration on the rectangle this polygon is inscribed in. But as in Radaar's post, a specific sampling distribution can be used that makes the summation method more elegant.

This work relies on three main ingredients:

  1. The Dirichlet distribution, one particular form of which samples the 2D simplex uniformly.
  2. The construction of a 1-to-1 mapping from this simplex onto any non-degenerate triangle (a mapping whose Jacobian is a constant equal to the ratio of the areas of the two triangles).
  3. A tessellation into triangles of the polygon to integrate over.
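
The first two ingredients can be sketched for a single triangle as follows. This is only a minimal illustration, not the code of the attached worksheet: the triangle, the integrand f and the sample size N are hypothetical choices, and the Dirichlet(1,1,1) weights are obtained by normalising three independent exponential draws. Because the mapped points are uniform on the triangle, the estimate is simply the area times the sample mean of f.

```maple
restart:
with(Statistics):

# Hypothetical triangle and integrand, chosen for illustration only
A := [0, 0]:  B := [2, 0]:  C := [0, 1]:
f := (x, y) -> x*y:
N := 10^5:

# Area of the triangle from the cross product of two edge vectors
area := abs((B[1]-A[1])*(C[2]-A[2]) - (B[2]-A[2])*(C[1]-A[1]))/2:

# Dirichlet(1,1,1) weights: three independent exponential samples, normalised
E1 := Sample(Exponential(1), N):
E2 := Sample(Exponential(1), N):
E3 := Sample(Exponential(1), N):
S  := E1 + E2 + E3:
w1 := E1 /~ S:  w2 := E2 /~ S:  w3 := E3 /~ S:

# Affine map from the simplex to the triangle (barycentric coordinates)
X := A[1]*w1 + B[1]*w2 + C[1]*w3:
Y := A[2]*w1 + B[2]*w2 + C[2]*w3:

# Monte Carlo estimate: area times the mean of f over the uniform sample
est   := area * Mean(f~(X, Y));

# Exact value for comparison (1/6 for this particular triangle and integrand)
exact := int(int(x*y, y = 0 .. 1 - x/2), x = 0 .. 2);
```

For a polygon, the same estimate would be computed on each triangle of the tessellation and the results summed.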


This work was carried out in Maple 2015, which required the development of a module to do the tessellation. More recent versions of Maple may contain built-in procedures to do that.
 

Monte_Carlo_Integration.mw

 

Hi, 

The present work aims to show how Bayesian inference methods can be used to infer (that is, to assess) the probability that a person detected as infected by SARS-CoV-2 will die (note that I did not write "will die of it", because one can never be sure of the cause of death).
A lot of details are available in the attached pdf file (I tried to be pedagogical enough that people not familiar with Bayesian inference can get a global understanding of the subject; many links are provided for quick access to the different notions).

In particular, I explain why simple mathematics cannot provide a reliable estimate of this probability of death (sometimes referred to as the "death rate") as long as the epidemic continues to spread.

Although the approach presented here is rather original, that is not the purpose of this post.
For a long time I have had in mind to post here an application of Bayesian methods. The COVID-19 outbreak has simply provided me with the most high-profile topic for doing so.
I will say no more about the inference procedure itself (all the material is given in the attached pdf file) and will concentrate only on the Maple implementation of the solution algorithm.

Bayesian inference generally uses simple algorithms such as MCMC (Markov Chain Monte Carlo) or ABC (Approximate Bayesian Computation), to mention just two, and the corresponding pseudocode generally takes a few tens of lines.
I have already done this in other languages, but I found the task comparatively more difficult in Maple. I was probably too obsessed with not coding in Maple the way one codes in Matlab or R, for instance.
In the end, the code I wrote is rather slow because of the amount of memory it allocates.
In a question I posed weeks ago (How can I prevent the creation of random variables...) Preben gave a solution to limit the growth of the memory: the trick works well, but I am still stuck with memory size problems (Acer also proposed a solution, but I was not able to make it work... maybe I was too lazy to modify my code deeply).
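
To give an idea of how short such an algorithm can be, here is a minimal sketch of an ABC rejection sampler in Maple. It is not the algorithm of the attached worksheet: the observation (n_cases, d_obs), the Uniform(0, 0.2) prior, the tolerance and the binomial model are all hypothetical, and the model deliberately ignores the delay between detection and death, so it illustrates only the mechanics of ABC. The sampling procedure returned by Sample(X) is created once and reused, which avoids creating a new random variable at every iteration.

```maple
restart:
with(Statistics):

# Hypothetical observation (made-up numbers): 30 deaths among 1000 cases
n_cases := 1000:  d_obs := 30:
tol     := 2:      # acceptance tolerance on the simulated death count
n_draws := 5000:   # number of candidates drawn from the prior

# Candidates for the death probability, drawn from a hypothetical prior
prior := Sample(Uniform(0, 0.2), n_draws):

# Reusable sampler: created once, called at every iteration
draw := Sample(Uniform(0, 1)):

accepted := table():  k := 0:
for i to n_draws do
  p := prior[i]:
  U := draw(n_cases):
  # Simulated number of deaths: count the uniform draws that fall below p
  d_sim := add(`if`(U[j] < p, 1, 0), j = 1 .. n_cases):
  # Keep the candidate if the simulation is close enough to the observation
  if abs(d_sim - d_obs) <= tol then
    k := k + 1:
    accepted[k] := p:
  end if:
end do:

# Accepted candidates approximate a sample from the posterior distribution
posterior := [seq(accepted[j], j = 1 .. k)]:
if k > 0 then Mean(posterior) end if;
```

Only a small fraction of the candidates is accepted, so n_draws has to be large enough for the posterior sample to be usable.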

Anyway, the code is there, in case anyone would like to take up the challenge to make it more efficient (in which case I'll take it).

Note 1: this code contains a small "Maplet" to help you choose any country in the data file on which you would like to run the inference.
Note 2: Be careful: doing statistics, even Bayesian statistics, needs enough data. Some countries have records covering only a few days, or no recorded deaths at all; inferring something from such sparse data will probably be disappointing.

The attached files:

  • The pdf file is the "companion document" where all, or most, of it is explained. It was written a few days ago for another purpose, and the results it presents were not obtained from the latest data (March 21, 2020 coronavirus)
  • The xls files are data files; they were downloaded yesterday (March 28, 2020) from here: coronavirus
  • the mw file... well, I guess you know what it is.
     

Bayesian_inference.pdf

total-cases-covid-19_NF.xls

total-deaths-covid-19_NF.xls

Bayesian_Inference_ABC+MCMC_NF_2.mw


 

Hi,

Two weeks ago, I started collecting data on the COVID-19 outbreak in order to understand, independently of any official communication from any country, what is really going on.

From February 29 to March 9 these data come from https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/ and from March 10 until now from https://www.worldometers.info/coronavirus/#repro. In all cases the loading was done manually, because I was not able to find CSV data (CSV data do exist here https://github.com/CSSEGISandData/COVID-19, but they end on February 15th).
I copied and pasted the results from the two sources above into a LibreOffice spreadsheet, adjusted the names of some countries that appeared differently (for instance "United States" instead of "USA"), removed the unnecessary commas and saved the result in an xls file.

I also used data from https://www.worldometers.info/world-population/population-by-country/ to get the populations of more than 260 countries around the world and, finally, CSV data from https://ourworldindata.org/coronavirus#covid-19-tests to get synthetic histories of confirmed cases and deaths (I discovered this site only yesterday evening and I think it could replace all the data I initially loaded).

The two worksheets here are intended for exploration and visualization only.
Another one is in progress; its goal is to infer the true death rate (also known as CFR, Case Fatality Rate).

No analysis is presented, if for no other reason than that the available data (except the numbers of deaths) are extremely dependent on the testing policies in place. But some features can be drawn from the data used here.
For instance, if you select country = "China" in the file Covid19_Evolution_bis.mw, you will observe a very well-known behaviour: the "Apparent Death Rate", which I define as the ratio of the cumulative number of deaths at time t to the cumulative number of confirmed cases at the same time, is always an underestimate of the death rate, which can only be known once the outbreak has ended. With this in mind, changing the country in this worksheet from China to Italy seems to lead to frightening extrapolations... But here again, without knowing the testing policy no solid conclusion can be drawn: maybe Italy mainly tests elderly people with acute symptoms, hence the huge "Apparent Death Rate" Italy seems to have?
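
In symbols, the definition above can be written as follows, where D(t) and C(t) denote the cumulative numbers of deaths and of confirmed cases at time t (a notation introduced here for illustration, not taken from the worksheets):

```latex
\mathrm{ADR}(t) = \frac{D(t)}{C(t)}
```

One plausible reading of the underestimation is that, at time t, part of the C(t) confirmed cases are still unresolved, so the numerator lags behind the denominator while the outbreak is growing.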


The work was done with Maple 2015, and some graphics could be improved with a newer version (for instance, as Maple 2015 does not allow changing the direction of tickmarks, I worked around this limitation by assigning the date to the vertical axis on some plots).
The second Explore plot could probably be improved by using newer versions, Maplets or Embedded Components.

Explore data from https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/ and https://www.worldometers.info/coronavirus/#repro
Files to use
Covid19_Evolution.mw
Covid19_Data.m.zip
Population.xls

Explore data from  https://ourworldindata.org/coronavirus#covid-19-tests
Files to use
Covid19_Evolution_bis.mw
daily-deaths-covid-19-who.xls
total-cases-covid-19-who.xls
Population.xls


I would welcome any open collaboration with people interested in this post (it is not my intention to write papers on the subject; my only motivation is scientific curiosity).

 

Here is a little animation to wish all of you a Merry Christmas

FireWorks.mw


Hi, 

This is more of an open discussion than a real question. Maybe it would be better placed in the Posts section?

Working with discrete random variables I found several inconsistencies or errors.
In no particular order: 

  • The support of a discrete RV is not defined correctly (a real range instead of a countable set)
  • The plot of the probability function (which, in my opinion, would be better named "Probability Mass Function", see https://en.wikipedia.org/wiki/Probability_mass_function) is not correct.
  • The ProbabilityFunction of a discrete RV with an EmpiricalDistribution can be computed at any point, but its formal expression doesn't exist (or at least is not accessible).
  • Defining the discrete RV "toss of a fair die" with EmpiricalDistribution and with DiscreteUniform gives different results.


The details are given in the attached file, and I hope that the companion text is clear enough to point out the issues.
I believe there are no major issues here, but Maple suffers from some inconsistencies in the treatment of (at least some) discrete RVs. Nothing that could not easily be fixed.


As I said above, if some think this question has no place here and ought to be moved to the Posts section, please feel free to do so.

Thanks for your attention.


 

restart:

with(Statistics):


Two alternate ways to define a discrete random variable on a finite set
of equally likely outcomes.

Universe    := [$1..6]:
toss_1_dice := RandomVariable(EmpiricalDistribution(Universe));
TOSS_1_DICE := RandomVariable(DiscreteUniform(1, 6));

_R

 

_R0

(1)


Let's look at the ProbabilityFunction of each RV

ProbabilityFunction(toss_1_dice, x);
ProbabilityFunction(TOSS_1_DICE, x);

"_ProbabilityFunction[Typesetting:-mi("x",italic = "true",mathvariant = "italic")]"

 

piecewise(x < 1, 0, x <= 6, 1/6, 6 < x, 0)

(2)


It looks as if the procedure ProbabilityFunction is not an attribute of an RV with an EmpiricalDistribution.
Let's verify

law := [attributes(toss_1_dice)][3]:
lprint(exports(law))

Conditions, ParentName, Parameters, CDF, DiscreteValueMap, Mean, Median, Mode, ProbabilityFunction, Quantile, Specialize, Support, RandomSample, RandomVariate

 


Clearly ProbabilityFunction is an attribute of toss_1_dice.

In fact, it appears that the difference in behaviour comes from different definitions
of the set of outcomes of toss_1_dice and TOSS_1_DICE

LAW := [attributes(TOSS_1_DICE)][3]:
exports(LAW):

law:-Conditions;
LAW:-Conditions;

[(Vector(6, {(1) = 1, (2) = 2, (3) = 3, (4) = 4, (5) = 5, (6) = 6}))::rtable]

 

[1 < 6]

(3)


From :-Conditions one can see that toss_1_dice is really a discrete RV defined on a countable set of outcomes,
but that nothing is said about the set over which TOSS_1_DICE is defined.

The truly discrete definition of toss_1_dice is confirmed here
(the second result is correct):

ProbabilityFunction(toss_1_dice, x) = { 0 if x < 1, 0 if x > 6, 1/6 if x::integer, 0 otherwise }

ProbabilityFunction~(toss_1_dice, Universe);
ProbabilityFunction~(toss_1_dice, [seq(0..7, 1/2)]);

[1/6, 1/6, 1/6, 1/6, 1/6, 1/6]

 

[0, 0, 1/6, 0, 1/6, 0, 1/6, 0, 1/6, 0, 1/6, 0, 1/6, 0, 0]

(4)


One can also see that the Support of both of these RVs is wrong

(see for instance https://en.wikipedia.org/wiki/Discrete_uniform_distribution)

It should be {1, 2, 3, 4, 5, 6}, not a RealRange.

Support(toss_1_dice);
Support(TOSS_1_DICE);

RealRange(1, 6)

 

RealRange(1, 6)

(5)

 

0

 

{1, 2, 3, 4, 5, 6}

 

 


Now this is the surprising ProbabilityFunction of TOSS_1_DICE.
This obviously wrong result is probably linked to the weak definition of the conditions for this RV.

# plot(ProbabilityFunction(TOSS_1_DICE, x), x=0..7);
plot(ProbabilityFunction(TOSS_1_DICE, x), x=0..7, discont=true)

 


These differences of treatment raise a lot of questions:
    -  Why is a DiscreteUniform RV not defined on a countable set?
    -  Why does the ProbabilityFunction of an EmpiricalDistribution return no result
        if its second parameter is not set to one of its outcomes?

 All this without even mentioning the wrong plot shown above.
 

I believe something that works like the module below would be much better than what is done right now

 

EmpiricalRV := module()
export MassDensityFunction, PlotMassDensityFunction, Support:

MassDensityFunction := proc(rv, x)
  local u, v, N:
  u := [attributes(rv)][3]:
  if u:-ParentName = EmpiricalDistribution then
    v := op([1, 1], u:-Conditions);
    N := numelems(v):
    return piecewise(op(op~([seq([x=v[n], 1/N], n=1..N)])), 0)
  else
    error "The random variable does not have an EmpiricalDistribution"
  end if
end proc:

PlotMassDensityFunction := proc(rv, x1, x2)
  local u, v, a, b:
  u := [attributes(rv)][3]:
  if u:-ParentName = EmpiricalDistribution then
    v := op([1, 1], u:-Conditions);
    a := select[flatten](`>=`, v, x1);
    b := select[flatten](`<=`, a, x2);
    PLOT(seq(CURVES([[n, 0], [n, 1/numelems(v)]], COLOR(RGB, 0, 0, 1), THICKNESS(3)), n in b), VIEW(x1..x2, default))
  else
    error "The random variable does not have an EmpiricalDistribution"
  end if
end proc:

# Support: the finite set of outcomes (the bounds x1, x2 and locals a, b were unused)
Support := proc(rv)
  local u, v:
  u := [attributes(rv)][3]:
  if u:-ParentName = EmpiricalDistribution then
    v := op([1, 1], u:-Conditions);
    return {entries(v, nolist)}
  else
    error "The random variable does not have an EmpiricalDistribution"
  end if
end proc:

end module:
 

EmpiricalRV:-MassDensityFunction(toss_1_dice, x);
 

piecewise(x = 1, 1/6, x = 2, 1/6, x = 3, 1/6, x = 4, 1/6, x = 5, 1/6, x = 6, 1/6, 0)

(6)

f := unapply(EmpiricalRV:-MassDensityFunction(toss_1_dice, x), x):
f(2);
f(5/2);
 

1/6

 

0

(7)

EmpiricalRV:-PlotMassDensityFunction(toss_1_dice, 0, 7);

 

 


 

Download Discrete_RV.mw

 

 
