I must thank @Scot Gould for having asked this question more than a year ago and thus, without meaning to, having been the driving force behind this post.


There is an enormous literature on Monte-Carlo integration (MCI for short) and you might legitimately ask "Why another one?".



A personal experience.
Maybe if I tell you about my experience you will better understand why I believe that something is missing in the traditional courses and textbooks, even the most renowned ones.

For several years, I led training seminars in statistics for engineers working in the field of numerical simulation.
At some point I always came to speak about MCI and (as anyone does today) I introduced the subject by presenting the estimation of the area of a disk: pick points at random in its circumscribed square and estimate the area from the proportion of points that fall inside the disk.




Once that was done I switched (still as everybody does) to the Monte-Carlo summation formula (see Wikipedia for instance).

One day an attendee asked me this question: "Why do you say that this [1D] summation formula is the same thing as the [2D] counting of points in the [circle within a box] example you have just presented?"
I have to say I was surprised by this question, for it seemed quite evident to me that these two ways of assessing the area were nothing but two different views of, roughly, the same thing.
So I gave a quick, mostly informal, explanation (one I am not proud of) and, because the clock was running, went on with the class.

But this question really puzzled me and I looked for a simple but rigorous way to prove that these two approaches were (were they?) equivalent, at least in some reasonable sense.
The thing is that simple explanations based on counting alone are not enough: you have to resort to probabilistic arguments to get out of it. Indeed, sticking to the counting level leads to the more reasonable position that these two approaches are not equivalent.

The end of the story is that I spent more time on these two approaches to MCI during the trainings that followed, saying that, yes, the summation formula is the reference today, but that the old counting strategy still has some advantages and can even give access to information the summation formula cannot.



About this post.
This post focuses mainly on what I call the Historical viewpoint (counting points), and its first part aims to answer the question "Is this point of view equivalent or not to the Modern (summation formula) one?" (and if it is, in what sense is it so?).

Let me illustrate this with the example @Scot Gould presented in his question. The bold brown curve on the left figure is the graph of the function func(x) (whose expression is of no interest here) and the brown area represents the area we want to assess using MCI.
In the Historical approach I picked uniformly at random N=100 points within the gray box (of area 2.42), found that 26 of them were in the brown region and said the area of the latter is 2.42 x 26/100 = 0.6292. The Modern approach consists in picking uniformly N random points in the range x = [0.8, 3] and using the blue formula to get an estimation of this same area (Lbox is the x-length of the gray box, here equal to 2.2).
The question is: Am I assessing the same thing when I apply either method? And, perhaps more importantly, do my estimators have the same properties?
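To make the two recipes concrete, here is a minimal Maple sketch. The integrand, the box height and the seed are placeholders of my own (the actual func(x) of the figure is not reproduced here); only the structure of the two estimators matters:

    with(Statistics):
    f := x -> 1 + sin(x)^2:          # placeholder integrand, NOT the func(x) of the figure
    a, b := 0.8, 3.0:                # integration range (x-length Lbox = b - a = 2.2)
    H := 2.0:                        # box height, must satisfy H >= max of f on [a, b]
    N := 100:
    X := Sample(Uniform(a, b), N):   # abscissas, used by both approaches

    # Historical approach: count the points falling below the curve
    Y := Sample(Uniform(0, H), N):
    K := add(`if`(Y[i] <= f(X[i]), 1, 0), i = 1 .. N):
    A_hist := (b - a) * H * K / N;

    # Modern approach: average the function values
    A_mod := (b - a) * add(f(X[i]), i = 1 .. N) / N;

With the same X in both formulas one sees immediately that the Historical estimate only depends on the integer count K, while the Modern one uses the actual values f(X[i]).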




And here appears a first problem:

  • Whatever the number of times you repeat the Historical sampling method, even with different points, you will always get a number of points in the brown region between 0 and N inclusive, meaning that, if S is the area of the gray box, the estimation of the brown area is always one of the numbers {0, S/N, 2S/N, ..., S}.
  • By contrast, repetitions of the Modern approach lead to a continuum of possible values for this brown area.
  • So, saying the two approaches might be equivalent would amount to saying that a finite, discrete set is equivalent to an uncountable one.

If we remain at this elementary counting level, the Historical and Modern viewpoints are therefore not equivalent.
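A quick experiment makes this tangible (same placeholder integrand and box as in the sketch above, so the numbers are purely illustrative):

    with(Statistics):
    f := x -> 1 + sin(x)^2:   a, b := 0.8, 3.0:   H := 2.0:   N := 100:
    S := (b - a)*H:                         # area of the box, here 4.4
    HistEstimate := proc()
      local X, Y, K;
      X := Sample(Uniform(a, b), N);
      Y := Sample(Uniform(0, H), N);
      K := add(`if`(Y[i] <= f(X[i]), 1, 0), i = 1 .. N);
      S*K/N
    end proc:
    ModernEstimate := proc()
      local X;
      X := Sample(Uniform(a, b), N);
      (b - a)*add(f(X[i]), i = 1 .. N)/N
    end proc:
    {seq(HistEstimate(), i = 1 .. 10)};     # only multiples of S/N = 0.044 can ever appear
    {seq(ModernEstimate(), i = 1 .. 10)};   # ten "continuous" values, almost surely all distinct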



Towards a probabilistic model of the Historical process.
This goes against everything you may have heard or read: so, are the authors of these statements all wrong?
Yes from a strict Historical point of view, but happily not if we interpret the Historical approach in a looser, probabilistic manner (although this still needs to be handled carefully, as shown in the main worksheet).

This probabilistic reading relies upon a probabilistic model of the Historical process, where the event "K points out of N belong to the brown area" is interpreted as the realization of a very special random variable named Poisson-Binomial (do not worry if you have never heard of it: a lot of statisticians have not either).
In a few words, whereas a Binomial random variable is the sum of several independent and identically distributed Bernoulli random variables, a Poisson-Binomial random variable is the sum of several independent but not necessarily identically distributed Bernoulli random variables. Thus the Poisson-Binomial distribution generalizes the Binomial one.
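For readers who want to see it at work, here is a tiny sketch (the probabilities are arbitrary, chosen only to stress that they need not be equal):

    with(Statistics):
    p := [0.2, 0.5, 0.7, 0.9]:                 # arbitrary, unequal success probabilities
    N := 10^4:
    # one Poisson-Binomial draw = the element-wise sum of one Bernoulli draw per p[i]
    PB := add(Sample(Bernoulli(p[i]), N), i = 1 .. nops(p)):
    Mean(PB);                                  # close to add(p[i], i = 1..4) = 2.3
    Variance(PB);                              # close to add(p[i]*(1 - p[i]), i = 1..4) = 0.71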


Using the properties of Poisson-Binomial random variables one can prove in a rigorous way that the expectations of the area estimators for both the Historical and Modern approaches are identical.
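In a nutshell, and written here for the uniform sampling of the figure, where each point falls in the brown region (of area A) with the same probability p_i = A/S (the worksheet treats the general Poisson-Binomial case, where the p_i may differ):

$$\mathbb{E}\!\left[\frac{S\,K}{N}\right]=\frac{S}{N}\sum_{i=1}^{N}p_i=\frac{S}{N}\cdot N\cdot\frac{A}{S}=A,
\qquad
\mathbb{E}\!\left[\frac{L_{\mathrm{box}}}{N}\sum_{i=1}^{N}f(X_i)\right]=L_{\mathrm{box}}\,\mathbb{E}[f(X)]=\int_a^b f(x)\,\mathrm{d}x=A.$$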
So, given this "trick", the two methods are equivalent after all, are they not? And that settles it.
In fact no, the matter of equivalence is still not settled.



When uncertainty enters the picture.
Generally we cannot satisfy ourselves with the sole estimation of the area: we would also like information about the reliability of this estimation. For instance, if I find this value is 0.6292, am I ready to bet my salary that I am right? Of course not, unless I am insane, but things would change if I were able to say, for instance, "I am 95% sure that the true value of the area is between 0.6 and 0.67".

For the Historical viewpoint the Poisson-Binomial model makes it possible to assess an uncertainty (not the uncertainty!) of the area estimation. But things are subtle, because there are different ways to compute an uncertainty:

  • At the elementary level the height of the gray box is an essential parameter, but it does not necessarily give a good estimation of this uncertainty (one can easily drive this estimation arbitrarily close to 0!); a minimal numerical sketch of such an elementary interval is given right after this list.
  • To get a reliable uncertainty estimation, a probability theory related to Extreme Value Theory (EVT for short) is necessary (all of this is explained in the attached worksheet).
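Here is such an elementary interval, applied to the numbers of the figure (S = 2.42, K = 26 points out of N = 100). I use a binomial normal-approximation recipe, which is my own choice for illustration (the worksheet may proceed differently), and it is exactly the kind of figure the first bullet warns about:

    with(Statistics):
    S := 2.42:   N := 100:   K := 26:              # numbers of the example above
    phat := evalf(K/N):
    z := Quantile(Normal(0, 1), 0.975, numeric):   # about 1.96
    hw := z*sqrt(phat*(1 - phat)/N):               # half-width on the proportion scale
    [S*(phat - hw), S*(phat + hw)];                # elementary 95% interval, roughly [0.42, 0.84]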


From the Modern point of view it is enough to observe that there is no concept of "box height", and that it is then impossible to assess any such uncertainty. Question: "If it is so, how can (all the) MCI procedures return an uncertainty value?"
The answer is simple: they consider a virtual encapsulating box whose height is the maximum of the values func(xi). This trick enables providing an uncertainty, but it is a non-conservative estimation (an over-optimistic one if you prefer; in other terms, an estimation we must regard very carefully).
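One way to see why such a virtual box is over-optimistic: the maximum of the sampled values func(x_i) can only underestimate (or at best equal) the true maximum of the function over the range. A tiny check on the placeholder integrand used earlier, whose true maximum on [0.8, 3] is 2:

    with(Statistics):
    f := x -> 1 + sin(x)^2:   a, b := 0.8, 3.0:   N := 100:   # placeholder, not the func(x) of the figure
    X := Sample(Uniform(a, b), N):
    Hvirt := max(seq(f(X[i]), i = 1 .. N));   # virtual box height: always <= 2, usually strictly below

Any uncertainty built on Hvirt therefore inherits this optimism, which is precisely the non-conservative behaviour pointed out above.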

So, in the end the Historical and Modern approaches are equivalent only if we restrict ourselves to the estimation of the area, but no longer as soon as we are interested in the quality of this estimation.



What does the attached file contain?
The attached file devotes a good deal of space to the estimation of the uncertainty of the estimator.
The core theory is named (Right) EndPoint Theory (I found nothing on Wikipedia nor any easy-to-read papers about this theory, so I more or less arbitrarily decided to use this name). Basically it enables assessing the (usually right) end-point of a distribution known only through (right-)censored data.
The simplest example is that of a New York pedestrian who looks at taxi numbers and asks himself how to assess the highest number a taxi has. Here we know this number exists (meaning that some related distribution is bounded), but the situation can be more complex if one does not even know whether this distribution is bounded or not (in which case one seeks a right end-point whose probability of being exceeded is less than some small value).
A conservative, and thus reliable, uncertainty on the area estimator can only be derived within the framework of end-point theory.
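To fix ideas on the taxi example, here is a toy simulation (the true number of taxis and the sample size are made up). It uses the classical bias-corrected estimator max + max/k - 1 for the maximum of a discrete uniform distribution; end-point theory is what extends this kind of reasoning to general, possibly unbounded, distributions:

    with(Statistics):
    M_true := 13000:                               # hypothetical true number of taxis (unknown in practice)
    k := 20:                                       # number of taxi numbers observed
    obs := Sample(DiscreteUniform(1, M_true), k):
    m := max(convert(obs, list)):                  # largest observed number
    M_hat := m + m/k - 1;                          # classical estimator of the right end-point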

Once the basics of this theory are understood it becomes relatively simple to enhance the Historical approach to get estimators with smaller uncertainties.
I present different ways to do this: one (even if derived differently here) is named Importance Sampling, and another leads in a straightforward way to algorithms which are quite close to some used in the CUBA library (partially accessible through evalf/Int).
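As a teaser for the Importance_Sampling.mw worksheet, here is a minimal, self-contained sketch of the idea on a toy integral (the integrand and the proposal density are mine, not those of the worksheet): sample from a density g that roughly follows the integrand and average the ratios f/g.

    with(Statistics):
    f := x -> exp(-x):                    # toy integrand on [0, 1]; exact integral = 1 - exp(-1)
    g := x -> 3/2 - x:                    # proposal density on [0, 1] (positive, integrates to 1)
    N := 10^4:
    U := Sample(Uniform(0, 1), N):
    X := map(u -> (3 - sqrt(9 - 8*u))/2, U):           # inverse-CDF sampling from g
    A_IS := add(f(X[i])/g(X[i]), i = 1 .. N)/N;        # importance-sampling estimate
    evalf(1 - exp(-1));                                # reference value, about 0.6321

The closer g follows the shape of f, the smaller the variance of the ratios f/g, hence the smaller the uncertainty of the estimate.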

The last important, if not fundamental, concept discussed in this article is the distinction between a dispersion interval and a confidence interval, two concepts that are unfortunately not properly distinguished, partly because of the imprecision of the English terminology (I apologize to native English speakers for these somewhat harsh words, but this is the reality here).
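To give a rough idea of the distinction on a toy integral (my own placeholder numbers; the worksheets give the precise definitions): a dispersion interval describes where the values of the estimator itself fall when the sampling is repeated, whereas a confidence interval is built from a single sample and is meant to cover the true value of the integral with a prescribed probability.

    with(Statistics):
    f := x -> exp(-x):   a, b := 0, 1:   N := 100:   R := 1000:   # toy integral, exact value 1 - exp(-1)
    Est := proc()
      local X;
      X := Sample(Uniform(a, b), N);
      (b - a)*add(f(X[i]), i = 1 .. N)/N
    end proc:
    # dispersion interval: empirical 2.5% and 97.5% quantiles of R replications of the estimator
    vals := Vector([seq(Est(), r = 1 .. R)]):
    [Quantile(vals, 0.025), Quantile(vals, 0.975)];
    # confidence interval: built from ONE sample (normal approximation, 1.96 = normal 97.5% quantile)
    X := Sample(Uniform(a, b), N):
    F := Vector(N, i -> f(X[i])):
    m := Mean(F):   s := StandardDeviation(F):
    [(b - a)*(m - 1.96*s/sqrt(N)), (b - a)*(m + 1.96*s/sqrt(N))];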
Some references are provided in the attached (main) worksheet, but please, if you don't want to end up even more confused than you were before, avoid Wikipedia.



To sum up.
This note is a non-orthodox presentation of MCI centered around the Historical viewpoint which, I am convinced, deserves a little more attention than the disk-in-the-square picture commonly displayed in MCI courses and textbooks.
And I am even more convinced of that because this old-fashioned (antiquated?) approach is an open door to some high-level probability theories such as EndPoint theory and EVT.

Of course this post is not a case against the Modern approach, and it does not mean that you have to ignore the classical texts or that the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are useless tools in MCI.




Maple, but not just Maple.
A part of the attached worksheet is devoted to presenting results I got with R (a programming language for statistical computing and data visualization), simply because Maple 2015 (and it is still true for Maple 2025) did not contain the functions I needed.
For instance R implements the CUBA library in a far more complete way than Maple does (I give a critical discussion of the way Maple does it), enabling, for instance, the change of the random seed.

Main worksheet (I apologize in advance for typos that could remain in the texts)
A_note_on_Monte-Carlo_Integration.mw

The main worksheet refers to this one
How_does_the_variance_of_f_impact_the_estimator_dispersion.mw


Extra worksheet: An introduction to Importance Sampling
Importance_Sampling.mw

