acer

32385 Reputation

29 Badges

19 years, 343 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

@belief111 

I'm sorry to disappoint you, but there really isn't much information to be had on this.

This topic is a bit advanced. It hinges on the fact that Maple uses a special form of structure, as an intermediary object, for the purpose of 2-dimensional mathematical typesetting (a.k.a. 2D Math). There isn't a full set of commands for constructing such objects in a program. I accomplished it above by programmatically (crudely) concatenating together the various portions of such objects.

There are a few other ways to construct such objects (which get interpreted by Maple's GUI as things to pretty-print as 2D Math). In a 2009 workshop, Paulina Chin described some of the basics, labelling the internal form of typeset objects as TypeMK. That is the form that has been used in this post. So I could also do the following:

You might get some insight by issuing these commands as 1D Maple Notation input in a Worksheet.

expr:=f[x]; # a table reference, not an atomic identifier
lprint(expr); # demonstrating that it is a table reference
Typesetting:-Typeset(f[x]); # now as function calls to unassigned Typesetting exports
lprint(%); # demonstrating the last claim
convert(%,`global`); # now as function calls to global function names
cat(`#`,convert(%,name)); # now a so-called atomic identifier
lprint(%);
`#msub(mi("f"),mi("x"))`;

Apart from such translation of an expression as a whole to TypeMK (as done above) there are few if any other commands for stitching together TypeMK structures. Hence the few people who accomplish that in programs usually do so by brute-force concatenation.

acer

@epostma sort[inplace] is still nifty, and a nice suggestion, even if it only sorts in a few possible ways for float[8].

If I recall correctly, the BLAS function daxpy might not accept negative strides in all platforms' implementations. But an `adder` might still be compiled, and made to act on the various negative and/or positive parts of the Aliased Array. It could be made to perform specialized adding, striding backward or forward from the neg/pos boundary as you suggest.

I believe that it can all be done easily, both for using the 'method' option of evalf/Int and for avoiding cut and paste (perhaps via a handy use of `subs`, even if not more easily). If you upload the worksheet, we could more easily demonstrate it to you. That's less work for us than having to cook up an example, and besides, your sheet may have other aspects that pertain. The upload avoids guesswork.

acer

sort[inplace] isn't documented, and it is buggy for float[8] rtables when used with custom sorting procs (except maybe for `>` and friends).

v := Vector[row]([-3,-7,1,9],datatype=float[8]);
                           v := [-3., -7., 1., 9.]

cv1:=copy(v):
cv2:=copy(v):

p:=proc(a,b)
   if signum(a)<>signum(b) then
      a<b;
   else
      abs(a)<abs(b);
   end if;
end proc:

sort(v); # ok
                             [-7., -3., 1., 9.]

sort(cv1,p); # ok
                             [-3., -7., 1., 9.]

sort[inplace](cv1,p); # oops, should be same as last
                             [-7., -3., 1., 9.]

cv1;
                             [-7., -3., 1., 9.]

sort[inplace](cv2,`>`); # ok
                             [9., 1., -3., -7.]

cv2;
                             [9., 1., -3., -7.]

sort(v,`>`); # ok
                             [9., 1., -3., -7.]

sort[inplace](v,(a,b)->a>b); # oops, should be same as last
                             [-7., -3., 1., 9.]

v;
                             [-7., -3., 1., 9.]


I'll submit some SCRs.

acer

@Axel Vogt 

The following call, to display internal Maple Library routine source, shows me that ArrayTools:-AddAlongDimension is using NAG's f06ecf for the datatype=float[8] case.

showstat(ArrayTools::AddAlongDimension2D);

And Google shows me that NAG's f06ecf is the same as the standard BLAS function daxpy.

But there are two extra arguments in that external call from the `AddAlongDimension2D` routine. I.e.,

    ExtCall(ncols,alpha,y,i-1,incy,x,0,incx)

The two extra arguments are the ones right after arguments y and x: the `i-1` and the `0`. These are offsets into the contiguous memory used for the double-precision data of those two Maple float[8] rtables.

Such offsets allow the daxpy routine (which is "technically" for 1-D arrays of data) to operate along any row or column of a 2-D array of data. Simply offset to the entry that starts the row or column in question, and then stride (incx, or incy) through the data by m, n, or 1, according to whether it is stored in Fortran order or C order (column-major or row-major). For example, when passing a Fortran-order array `x` in C one could (carefully) pass it as daxpy(..., x+3, ...) and so offset the initial address that daxpy sees by that many double widths.

My procedure `P` calls the Nag name of the BLAS function daxpy, I believe. And that does this (in pseudo-syntax):

  x -> alpha*y + x

which is an inplace operation on `x`. So `P` uses a 1x1 float[8] Vector for x. And it walks y with a stride of 1, and x with a stride of 0 (!). So x ends up accumulating the sum of all that is in the n-Vector `y`. Something like that.

And `y` is just an Alias of the original 2D Array as a 1D Vector. So the end result is that all the entries in the Array are summed, in one big "add-along" shot.

You could try writing/finding an inplace sorter, and compiling it. (...and maybe give its define_external call the THREAD_SAFE option, down the road.) It needn't sort inplace on the data rtable (the Aliased Vector, say). It might be easier to find one which merely wrote out the sorted result to another temp Vector. And that Vector might be re-usable. (That way is far easier for handling, and is good for non-threaded parent code. But when parallelizing it becomes tricky again, as each thread needs its own re-usable temp Vector to hold sorted results.) This is the interesting bit. I see where you are at now. You'd like to test a fast version of this, sorting to get it ready for Kahan or similar careful summation. Fun.

The BLAS on Windows will be MKL. I'm not sure whether their daxpy is threaded, or just cache-tuned. As you can see, 5000 summations of 110x110 float[8] Arrays in under 1 second is not bad. Unfortunately, the Intel people say that one is not advised to call MKL from within a thread of another threaded app. They don't come right out and say that MKL is wholly, or partly, thread-unsafe. I've never tried to force it by placing the option on that daxpy's define_external call from within Maple. On Linux the BLAS is ATLAS, and I've never seen a described limitation on whether it can be called from within a higher-level thread.

You may note that `P` operated on full rectangular storage 2D Arrays (albeit, Aliased). Having my two procs be clever about handling triangular data could be a lot of work... for only a factor-of-two benefit. The daxpy is so quick, and copying out subsections from below the main diagonal (or making repeated calls with tricky offsets) might well blow away such a small speedup. I'd be tempted to stick with full rectangular storage, thus permitting Maple to do the Alias and a single external daxpy call. I'd likely only suggest divvying up the data if it were very sparse. (And in that case we could have even more fun discussing the sparse float[8] structure.)

It'd be interesting to hear how you get along with this task.

nb. I'm not saying that your code is affected by the following. But you might care to see this post on 2D Math runtime efficiency. I saw a wickedly brutal double-whammy of this in the last month or so. Someone had gotten bitten not only by the delayed multiplication parsing within a proc defn, but had alongside that used the syntax A[i][j] instead of A[i,j] for Matrix entry access. The former creates, for A[i], a whole new temporary row Vector, which is then accessed at its jth entry and then disposed of as garbage. The overall penalty was a slowdown of a few hundred times...

acer

This is interesting. I'd like to look at it, but likely cannot work on it for a few days...

acer

@Alejandro Jakubi I hadn't noticed that this was branched from another post. Thanks.

@Alejandro Jakubi That may work for this example. But it also might be entirely beside the point. (Ok, so we don't know for sure what the OP's actual criteria are. But we may guess intelligently. I suggested that it was for u,v,w to appear less often in the result. Nobody's suggested otherwise, or another guess. Who knows, maybe the only criterion is expression length or lack of RootOfs. I was discussing the situation under my own guess.)

Your picking V to exclude merely happens to produce a result in which u, v, w appear less often. But that's just a particular of this example. I see no method at all in your selecting V for exclusion from the trig terms as a `solve` variable, aside from visual inspection or noticing the nature of eqn2, etc.

So your suggestion is much like my own first `solve` suggestion above, rather than what I was attempting with `G`.

Of course, I expect that you, Alejandro, are quite aware of these distinctions. So I'm commenting more for the benefit of the OP.

One could also simply invoke `isolate` on eqn2, in order to get the supposedly "better" formula for sin(b). That's even easier. But I didn't suggest it because it shares the aspect of arbitrariness.

As mentioned, we don't know what the OP really wants. Consider the sin(a) solution from A.J.'s `solve` call. Is it more, less, or equally desirable compared with the one below?

> solve({eqn1, eqn2, eqn3}, {sin(a),cos(a),sin(b)});


       /            u                  w               v\ 
      { cos(a) = --------, sin(a) = --------, sin(b) = - }
       \         V cos(b)           V cos(b)           V/ 

For some other, different example, including V rather than excluding it might be key. It's problem specific. By trying a Groebner basis methodology for handling it as a polynomial system, I was trying to enforce priority amongst the variables (where I made a supposition that u,v,w were undesirable). Maybe it could be done better. Or maybe the OP just wants short solutions. Or the desire might be as simple as just wanting any solutions without RootOfs.
