acer

MaplePrimes Activity


These are replies submitted by acer

Thanks for the comments. But I should mention that I knew full well what caused it; and the cause is not as interesting as the fact that the slowdown is so severe. It's a huge amount of overhead to impose (unnecessarily) on the inexperienced user in the default environment. Most users will not realize that their code incurs this performance penalty.

Do not for a moment forget that what is happening is not an initial parsing slowdown, but an ongoing computational slowdown. And that is all the more remarkable given that it is an effect of typesetting merely the code and not the output. And it is non-negligible -- each application of that dot-lookalike takes as much time as a 2x2 Matrix multiplication!

Working around the problem by using prefix operators is not the answer for the majority of users. The experienced user will likely have no trouble finding that or other deft workarounds. But the nonexpert user, who depends most on the default environment being effective and naturally efficient, will likely not even realize that there is any avoidable performance issue. For the majority of users the solution to this issue is to not use 2D Math input mode for writing procedures.
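
For reference, here is a minimal sketch of the kind of prefix form I mean, written in 1D (plain text) notation; it is the typeset infix dot of 2D Math input that incurs the overhead:

A := Matrix([[1, 2], [3, 4]]):
B := Matrix([[5, 6], [7, 8]]):
`.`(A, B);   # prefix call of the dot operator; no typeset infix dot involved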

acer

It was really unclear (to me) whether the OP just wanted to easily create a 3x2 Matrix from a list and have the scanning be done by column, or whether the goal was to produce something specifically with C order.

There seemed to be the possibility that she only really wanted the scanning of the flat list data by columns. (Doug seems to have inferred that, and ignored what would then be the C_order red herring.) In that case, the underlying "order" that Maple uses to store the data internally is irrelevant, and the OP should most certainly not be trying to get cheap transposition just by flipping the order. It seemed as if the OP might not realize that order=C_order refers to the internal storage in memory and not to how the data appears and gets printed.
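
A quick check of that storage-vs-display point (a sketch; P and Q are just illustrative names):

P := Matrix(3, 2, [$1..6]):                   # default Fortran_order storage
Q := Matrix(3, 2, [$1..6], order=C_order):    # same entries, different internal storage
LinearAlgebra:-Equal(P, Q);                   # true -- they print and compare entrywise the same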

The other possibility is that the OP wanted not only the data in A to be injected (scanned) into a Matrix by columns, but that she also very specifically wanted C_order storage underneath. (Who knows, maybe the OP wanted to use ImageTools instead of LinearAlgebra and failed to clarify it.)

It is confusing that the Matrix constructor reads a flat list as a row-scan, and never as a column-scan unless the list is split. I mean it seems almost a bug that this does not work,

A:=[$1..6]:
Matrix(3,2,A,scan=columns);

while this does work,

A:=[$1..6]:
Matrix(3,2,[ListTools:-LengthSplit(A,3)],scan=columns);

Fortunately, the answer that I gave (and I see since submission that Preben's is essentially the same) does not get garbled if one also slaps on an order=C_order option. So hopefully it works whatever the OP's actual goal was.

In case anyone cares, here's a tip. Do not try to transpose a Matrix by just flipping the order, even for square Matrices. It may print like what you want, but internally it is not what you want, and it won't always get treated as you expect by LinearAlgebra.

M:=Matrix([[1.0,2.0],[3.0,4.0]],datatype=float[8]);
rtable_options(M,order=C_order); # acts inplace on M
M;

And don't get beguiled into expecting that inplace 'order' flipping will "transpose" as neatly for nonsquare Matrices, just because it seems to work for square Matrices.

BTW, ArrayTools has lots of nifty exports for manipulation. I was able to cook up another three ways to get the effect that Erik showed, using mixes of DataTranspose and Reshape. But I advise using some of those only if one understands the storage-vs-display distinction.
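
Here is one such way (a sketch; it relies on the default Fortran_order storage, under which Reshape effectively refills down the columns):

A := [$1..6]:
ArrayTools:-Reshape(Vector(A), [3, 2]);   # 3x2 Matrix, filled column by column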

acer


If you look carefully enough at the results returned by `solve` in both my and John's earlier answers, you should be able to see that they do in fact contain such formulae. That is, they contain answers of the form a=..., b=... in terms of X.

But the right-hand sides of those answers like {a=..., b=...} in terms of X are implicit RootOfs, and not explicit arithmetic formulae in terms of radicals.

You may not like that implicit RootOf form. But as I showed, it is useful. By substituting a float value for X and then applying evalf or fsolve, all the roots for any given numeric X can be obtained. Hence, you can much more easily and reliably obtain all the roots of those RootOfs involving simple polynomials in X (for any given numeric X) than you can obtain roots of the full system of equations using fsolve.
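
The pattern looks like this (a sketch, with a made-up cubic standing in for the actual RootOf polynomials; only the shape of the calls matters here):

r := RootOf(_Z^3 - X*_Z + 1, _Z):        # hypothetical stand-in RootOf in parameter X
evalf([allvalues(eval(r, X = 2.5))]);    # all three roots at X = 2.5
fsolve(z^3 - 2.5*z + 1, z, complex);     # the same roots, via fsolve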

The extra RootFinding:-Parametric bit I added in my worksheet, before the `solve` stuff, was to show how many roots (i.e. distinct sets of values for a, b, c, and d) there are for separate ranges of the real-valued parameter X.

acer

Something has gone wrong with the last two paragraphs of the original post.

At some earlier date, those two paragraphs likely mentioned setting values for interface(prettyprint). But the text's meaning is now slightly garbled, as if instances of that term had disappeared from the text.

For those wondering, labeling can be enabled in the Standard GUI by first setting the value of the `prettyprint` interface setting to either 1 or 2. For example, by issuing,

interface(prettyprint=1):

Now, enabled labeling doesn't usually produce results as nice in Standard. Oh, sure, it allows some results to be neatly displayed which without it would be practically unviewable. But enabling it also seems to wreck the proper breaking of other, non-labeled terms in long lines. And so huge, long lines can appear, sometimes in the displayed output of nonscalar objects. And, as John mentioned, the mechanism tends to choose common subexpressions for labeling which are awkwardly short. Presumably that is why labeling is not enabled by default in Standard: it works too poorly.

But labeling should be an important aspect of functionality in any good CAS's user interface, and it really ought to be properly ported from commandline/Classic to Standard. After all, the Standard GUI was introduced seven years and six major releases ago.

Unfortunately, most Standard users don't realize what great stuff they are missing out on... because they've never seen the alternative. There are lots of simple symbolic solving examples where the unlabeled output either exceeds the permitted display length at the default settings or is too long to be sensibly legible.
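
A quick way to see labeling in action (a sketch; _EnvExplicit forces explicit radical solutions, and the exact output is elided here, but the big repeated radical subexpression typically gets displayed just once, as a %1 label):

_EnvExplicit := true:
interface(prettyprint=1):
solve(x^3 + x + 1 = 0, x);   # the shared Cardano-style subexpression appears as %1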

acer

Thanks, John. I had also considered using `eliminate`.

But I figured that the OP might be interested in knowing how many solutions lie in the various regions for X along the real line.

As a wise man once told me: people asking (vague) questions about parametric systems are almost always going to be disappointed by answers based upon guesses as to what they really want.

Hey! You and your fancy commandline/Classic subexpression labeling. You made my Windows Standard GUI "go away" with your semi-colon. ;)

> sol:=eliminate({a*b^2-b+c-d-3, a^3*b^2-b+c-d^3+4, 
> a^4+b^3-b^2+c-d+2, 2*a*b^2-b+c-d-1-X}, {a,b,c,d}):

> sol2:= SolveTools:-PolynomialSystem( {a*b^2-b+c-d-3, a^3*b^2-b+c-d^3+4,
> a^4+b^3-b^2+c-d+2, 2*a*b^2-b+c-d-1-X}, {a,b,c,d}, domain=parametric):

> length(sol), length(sol2);
5183500, 1398546

I suppose that I'd have to read the SolveTools help-page to figure out how to pick apart the essentially unviewable answer in the Standard GUI, and substitute in values for X, etc. It's ridiculous that the result sol2 can be viewed ok -- without the Standard GUI turning to sludge and finally printing something near unreadable -- but only after issuing something like,

> interface(prettyprint=1);

acer


@Christopher2222 It's really not clear what you are referring to when you write that Maple scores last.

I'm pretty sure that you are referring to the review Duncan mentioned, and not to the review mentioned in the parent post. But you didn't say which one you were discussing, and with the lack of comment-threading indentation it's ambiguous.

@DuncanA It's true that the ncrunch review is more in-depth. It has a slant toward computational statistics (which is fine, as that is pretty much declared explicitly). But it is also quite wrong in places. And the benchmark code is not all implemented in a fair or near-optimal way.

See here for a Mapleprimes post about it, from a few years ago.

The functionality charts are also wrong in places, and misleading in others. For example, some claims about Maple's debugger are just plain wrong. It says that Maple doesn't support FFT in more than 1D, while of course one can just use the 1D version to get a 2D calculation. And so on. It takes off points for not having dedicated commands for some things which are easily done as one-liners (e.g. I can think of at least 3 simple ways to make a "Pascal Matrix" generator in 1 line of code, so why would anyone want a dedicated routine for that!?). That's just a sample; there are more errors both large and small.
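
Here is one such one-liner (a sketch, using the symmetric Pascal Matrix whose (i,j) entry is binomial(i+j-2, i-1); the name PascalMatrix is just illustrative):

PascalMatrix := n -> Matrix(n, n, (i, j) -> binomial(i + j - 2, i - 1)):
PascalMatrix(4);   # rows of Pascal's triangle run along the antidiagonals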

And the weighting scheme for the various sections is quite arbitrary and subjective.

Please don't misunderstand. It's obviously the result of a lot of hard work and effort. And the author has tried to keep it regularly updated. But it needs better review and correction by experts before publication, or at least some sort of working feedback and correction mechanism.

There are more advisers/reviewers listed on the ncrunch site for Matlab and Mathematica than for Maple. (No offense is intended to anyone; more heads always do better.) Even so, I suspect that there might be errors related to each of the programs in the review, and significant improvements possible in the posted code for the others and not just for Maple.

Maybe comprehensive comparisons are just too much work to do really well. An interesting alternative might be tight, narrowly focused product comparisons of individual aspects of functionality, a bit like this earlier suggestion.

acer

Could you post the code here, by uploading it in a file?

If you aren't at liberty to show the code in public then maybe you could send it to Maplesoft's technical support.

acer

@marvin Presumably you are talking about hardware precision float Matrices and Vectors (float[8] or complex[8] datatypes) since you mention alternatively doing it on CUDA.

Are you sure that the external MKL (Intel Math Kernel Library) is not already making use of all or most of the multiple cores when doing each of the individual Matrix-Vector multiplications? If that were the case, then not much more speedup might even be possible. (You might check it most simply by using the Task Manager, to see the total load.)
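
A rough in-Maple check is also possible (a sketch; the commented line is a stand-in for your actual workload, and time[real] may require a recent Maple version). If the CPU time is a noticeable multiple of the elapsed time, then multiple cores are already in play:

ct := time():  rt := time[real]():
# ... perform the n Matrix-Vector multiplications here ...
time() - ct;        # total CPU time, summed across cores
time[real]() - rt;  # elapsed wall-clock time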

Investigation with compiler utilities reveals that dgemm and dgemv are not always called directly from Maple. Instead, an external library linalg.dll is called from Maple. (Just how seems to depend on whether one uses `.` or MatrixMatrixMultiply.) And linalg.dll is linked against nag.dll. And the Library-side define_external may not have the option enabled that allows multiple Threads to concurrently access that external library. There may be thread-safety issues here, related to MKL.

But if the MKL is already using multiple cores, then perhaps things are already near optimal for your example. As a rough general rule, the lower-level the threading, the better. Especially for array calculations where cache might be shared. Accelerated BLAS functions gain as much by virtue of being highly cache-tuned as they do by being threaded for 4-8 cores. (This touches a bit on another excellent post by Darin.)

I realize that you mention that example because it seems like a good test: easily parallelized across the list of Vectors. But in principle it would be suboptimal compared to storing all the Vectors in a single Matrix, so as to perform one Matrix-Matrix multiplication. If nothing else, that would avoid the overhead cost of `n` Maple function calls, which are relatively expensive compared to C function calls. But it would also allow a single call to dgemm to be more cleverly cache-tuned than the overall task of `n` dgemv calls. Apologies if you considered that -- as mentioned, yours is likely intended as a test example of the Threads functionality. But it may not be a clean example, on practical considerations.
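
A sketch of that alternative (with hypothetical sizes; the k Vectors become the columns of one Matrix, so a single dgemm call replaces k dgemv calls):

n := 1000:  k := 500:
M := LinearAlgebra:-RandomMatrix(n, n, outputoptions = [datatype = float[8]]):
V := LinearAlgebra:-RandomMatrix(n, k, outputoptions = [datatype = float[8]]):  # k column Vectors stored together
R := M . V:   # one Matrix-Matrix product instead of k Matrix-Vector products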

In summary, I see at least these considerations:

  • MKL may already be using multiple cores for each Matrix-Vector multiplication
  • multiple calls to linalg.dll may not be allowed concurrently (MKL higher-access thread-safety?)
  • `n` Matrix-Vector calls adds Maple overhead
  • `n` dgemv calls subverts cache-tuning specific to dgemm
  • `n` dgemv calls may subvert any Strassen or other superior dgemm algorithm

Changing topic slightly, things are really going to get interesting (I think) when machines commonly get hundreds of cores. Consider the case of solving a (non-modular) integer linear system using modular techniques. Sure, for each prime modulus a particular external call might use lots of cores, but perhaps not with linear speedup. Or maybe there are a huge number of cores available. At some point, it could be desirable to perform the modular LA operations in parallel, even though each one uses many cores. There would be enough cores that one would actually want to apportion them out to many concurrent modular subtasks to be run concurrently. In such a state of affairs it would be desirable to "fix" or work around the second bullet point above.

Let's remain calm, as things are as they were yesterday and the day before that...

There are two situations to be careful about. The first is computing with Digits<5. The solution is simple: don't do it. Don't assign Digits<5. Similarly, don't do evalf[d](...) or evalf(...,d) for d<5. This is for all sorts of expressions, both compound and simple.

Now for the second situation. For compound expressions, yes, one must be careful of roundoff error. That is true for any system with a working precision that is fixed (constant, or at best fixed per computation), and number of guard digits that is limited (constant, or at best limited according to the working precision). It is as true for a great deal of Maple as it is for, oh say, most of Matlab and compiled C/Fortran, etc.

Here is an example.

> restart:

> S:=eval(exp(abs(tan(x)-x)),x=1/10^4)-1;
                     /   /  1  \     1  \    
                  exp|tan|-----| - -----| - 1
                     \   \10000/   10000/    

> seq(evalf[d](S),d=10..15); # mostly all wrong, and only partly slightly correct
                                   -13        -13
               0., 0., 0., 0., 3 10   , 3.3 10   

> Chris2222evalf:=proc(a,b)
>   evalf(evalf(a,b+2),b)
> end proc:

> seq(Chris2222evalf(S,i),i=10..15); # just as bad, really (seriously)
                  -13        -13         -13          -13
      0., 0., 3 10   , 3.3 10   , 3.33 10   , 3.333 10   

> evalf[23](S): evalf[10](%); # thank you very much...
                                     -13
                       3.333333347 10   

The extra tough question is: how high do you have to raise the working precision X (including allowing a set number of guard digits Y) in order to certifiably attain a requested number Z of correct digits? For arbitrary expressions involving transcendental functions this is a very hard question. How could one ascertain in advance and without doing as much work as the actual computation that at least Digits=23 is required in order to produce 10 correct digits in the result? What about for arbitrary compound expressions, with arbitrarily large exact coefficients thrown in (for fun)?

A very small part of Maple will try to raise Digits arbitrarily high in order to compute a floating-point solution to a problem. `signum` and `is` will generally not do that, and they often rely on `evalf` to gauge relative magnitudes, sort, or ascertain sign. The routine `fsolve/polyill` will do it, by Digits-doubling until it gets as many roots as it expects, but there are examples known to make it run until hitting the Digits upper bound. Even `evalr` and `shake` do not really do it.

Mathematica's behaviour is different. It appears to strive to achieve a supplied accuracy, even by raising the internal working precision as high as it needs(?). But it is a closed system, and the internals and the workings of its numeric scheme are not public. UC Berkeley Professor Richard Fateman has several times taken Mathematica to task for their numeric model. A lot of it is on sci.math.symbolic for public consumption. (1 and 2 are in two such threads)

Another link, for your amusement/edification, by Dave Rusin: calc_errors

With no intended offense, I would suggest that even in High School the basics of roundoff error could be introduced in any course which did floating-point computations. It needn't be advanced, as a simple example can at least demonstrate that the sand underfoot is shifting.

restart:
Digits:=15: # something similar for the students' calculators should work
a:=1.0: b:=3.0*10^(-21):
(a + b) - a;
                               0.
(a - a) + b;
                                         -21
                       3.00000000000000 10

acer

@hirnyk I'm not sure that I understand. My construction is very similar to your own with procedure f. The only significant difference is that I inlined the proc right into the Matrix() call, while you assigned it first outside that call. And I used an operator (which is just a simple form of a procedure) while you used a general procedure.

The ?Matrix help-page describes that as `init` in its Calling Sequences, mentioning that it may be a procedure (as opposed to a list of lists or other valid objects). The ?Matrix help-page has this one in its Examples section (again, differing only by the pre-assignment of operator f),

f:= (i,j) -> x^(i+j-1):
Matrix(2,f);


@Christopher2222 Isn't the point made above that, if this is 2D Math input, then only the restart is happening in the single-line case? If nothing else on that single line gets executed (the assignment to a, or the gc call) then that could explain why it doesn't affect the memory use at that time -- things which don't happen have little effect.
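
In other words, the usual advice applies (a sketch; the assignment and gc are stand-ins for whatever else was on the original line): put restart alone in its own execution group, and the rest in the next one.

restart;            # alone, in its own execution group

a := 'a':  gc();    # in a separate, subsequent execution group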
