Carl Love

Carl Love

28035 Reputation

25 Badges

12 years, 317 days
Himself
Wayland, Massachusetts, United States
My name was formerly Carl Devore.

MaplePrimes Activity


These are replies submitted by Carl Love

There are no files attached to your Question. Please try attaching them again.

@Joe Riel It's frustrating that the expository section of help page ?object,create has 8 paragraphs devoted to "Inheritance", but the "Examples" section doesn't have a single example of it.

@sursumCorda The specific example that you show can be done by

M:= <4, 1, 1 | 0, 5, 2 | 7, 8, 9>:
<sort(convert(M, list, dimension= 1), key= (R-> [seq](R[[1,3,2]])))[]>;

Higher dimensions can also be accomodated.

@Mike Mc Dermott Okay, here's an example verifying real-time enhancement on both integer and hardware-float data:

nV:= 6:  V:= [x||(1..nV)]:
P0:= codegen:-makeproc(randpoly(V, dense, degree= nV), V):
P1:= codegen:-optimize(P0):
P2:= codegen:-optimize(P0, tryhard):
(lprint@codegen:-cost)~([P0, P1, P2]):
918*additions+4727*multiplications
221*storage+221*assignments+1827*multiplications+918*additions
108*storage+108*assignments+855*multiplications+917*additions

DataInt:= 
    'rtable(1..2^13, random(-9..9), datatype= integer[4], subtype= Vector[row])' 
    $ nV
:
for k from 0 to 2 do gc(); R[k]:= CodeTools:-Usage((P||k)~(DataInt)) od:
memory used=375.70MiB, alloc change=0 bytes, 
cpu time=1.95s, real time=1.96s, gc time=0ns

memory used=72.91MiB, alloc change=0 bytes, 
cpu time=844.00ms, real time=847.00ms, gc time=0ns

memory used=39.18MiB, alloc change=0 bytes, 
cpu time=469.00ms, real time=473.00ms, gc time=0ns

#Verify accuracy:
LinearAlgebra:-Norm~([R[0]-R[1], R[1]-R[2], R[0]-R[2]], infinity);
                           [0, 0, 0]
DataHF:=
    'rtable(1..2^13, frandom(-2..2), datatype= hfloat, subtype= Vector[row])' 
    $ nV
:
for k from 0 to 2 do gc(); R[k]:= CodeTools:-Usage((P||k)~(DataHF)) od:
memory used=404.77MiB, alloc change=0 bytes, 
cpu time=7.02s, real time=4.07s, gc time=3.53s

memory used=195.47MiB, alloc change=0 bytes, 
cpu time=1.38s, real time=1.38s, gc time=0ns

memory used=92.04MiB, alloc change=0 bytes,
cpu time=563.00ms, real time=558.00ms, gc time=0ns

#Verify accuracy:
(lprint@LinearAlgebra:-Norm)~([R[0]-R[1], R[1]-R[2], R[0]-R[2]], infinity):
.123691279441118240e-9
.436557456851005554e-10
.130967237055301666e-9

 

I was rushed when I wrote the Answer above. Here's a more-complete example:

Statistics:-ColumnGraph(
    [1,2,3,4], #column heights
    width= 2, distance= 1,
    axis[1]= [
        tickmarks= [
            [seq](k+1 = sprintf("%a - %a", k, k+2), k= 0..11, 3), 
            rotation= Pi/4
        ]
    ],
    axesfont= ["Arial", 18]
);

 

@2cUniverse You can never make a valid comparison between two measurements both of which have been rounded down to 0. You must either use a finer measuring instrument, or increase the scale of the problem until the instrument returns positive values. The latter is usually easier in Maple.

iremFrac2a:=proc(n,d, b)
   local r1,r,i,li:
   li:=NULL:
   r:= irem(n*b,d):
   if NumberTheory:-AreCoprime(d,b) then
     while true do
        r1:= irem(b*r,d):
        li:=li,[[r,r],[r,r1]],[[r,r1],[r1,r1]]:
        if r1=n then break fi:
        r:=r1:
     end do:
   else print("Denom-Base not coprime "): return [] fi:
   [li]
end proc
:   
iremFrac2b:= proc(n, d, b) local N:= irem(n,d), r:= irem(N*b, d), s; 
    [if igcd(b,d) = 1 then 
        (do [[r,r], [r, (s:= irem(b*r, d))]], [[r,s], [s,s]] until (r:= s)=N)
    else
        print("Denom-Base not coprime")
    fi]
end proc
:
d:= nextprime(2^17):  b:= irem(rand(), d);
                          b := 116956

(L1, CT1, RT1):= CodeTools:-Usage(
    iremFrac2a(1, d, b), output= [output, cputime, realtime]
)[]:
memory used=14.24GiB, alloc change=-32.00MiB, 
cpu time=5.09m, real time=79.49s, gc time=4.93m

(L2, CT2, RT2):= CodeTools:-Usage(
    iremFrac2b(1, d, b), output= [output, cputime, realtime]
)[]:
memory used=14.89MiB, alloc change=0 bytes, 
cpu time=172.00ms, real time=165.00ms, gc time=0ns

#Verify that computed results are equal:
evalb(L1 = L2);
                              true

#time ratios (real and CPU):
RT1/RT2, CT1/CT2;
                    481.7333333, 1776.529070


To be fair, those ratios are not as extreme if I increase the scale of the problem "horizontally" (i.e., smaller test cases but more of them) rather than "vertically" (i.e., a single large test case), as above. I'll explain why in a moment.

d:= nextprime(2^10):  b:= irem~(['rand()'$64], d);
b := [173, 917, 905, 706, 476, 906, 759, 1001, 827, 720, 297, 35, 
  933, 582, 61, 289, 139, 245, 817, 496, 11, 477, 1001, 541, 824, 
  911, 787, 901, 990, 853, 562, 160, 248, 313, 631, 873, 630, 
  281, 565, 282, 85, 761, 311, 1021, 475, 123, 593, 632, 209, 
  683, 931, 296, 348, 381, 890, 772, 720, 565, 529, 277, 861, 
  286, 147, 912]

(L1, CT1, RT1):= CodeTools:-Usage(
    iremFrac2a~(1, d, b), output= [output, cputime, realtime]
)[]:
memory used=283.35MiB, alloc change=8.00KiB,
cpu time=6.34s, real time=1.70s, gc time=6.02s

(L2, CT2, RT2):= CodeTools:-Usage(
    iremFrac2b~(1, d, b), output= [output, cputime, realtime]
)[]:
memory used=13.95MiB, alloc change=0 bytes,
cpu time=172.00ms, real time=172.00ms, gc time=0ns

#Verify that computed results are equal:
evalb(L1 = L2);
                              true

#time ratios (real and CPU):
RT1/RT2, CT1/CT2;
                    9.877906977, 36.88372093

The reason is that when you create a sequence (or list or set) by appending to an existing sequence, each assignment reconstructs the entire sequence. Of course, the time for each such reconstruction is proportional to the length of the sequence. Thus, the time for the whole loop (just for this reconstruction minutiae, not for the actual desired computation) is proportional to the square of the number of loop iterations. Using the standard asymtotic-order ("big O") notation, we say that the time for this process is O(n^2) (where n is the number of iterations).  Assuming that the actual computation time for each sequence element is independent of its position in the sequence (which is very often the case, and it's certainly true in this case), that time is O(n) in total. It's mathematically certain that there's a value of n, say n_c, such that the O(n^2) process will dominate the O(n) process for all n > n_c. In more-technical mathematical language, for any A > 0, B > 0,

limit(A*n^2 / (B*n) , n= infinity) = infinity.

@2cUniverse Your new version can be coded like this:

iremFrac2:= proc(n, d, b) local N:= irem(n,d), r:= irem(N*b, d), s; 
    [if igcd(b,d) = 1 then 
        (do [[r,r], [r, (s:= irem(b*r, d))]], [[r,s], [s,s]] until (r:= s)=N)
    else
        print("Denom-Base not coprime")
    fi]
end proc
:

The N:= irem(n,d) is needed because if n >= d then the loop will never reach n.

Oh, I just realized that this may have not been obvious before: The number of loop iterations is always MutiplicativeOrder(b,d), but dividing that number by two isn't always relevant.

@sursumCorda GMP is open source and publically licensed. You can find numerous websites with documentation of the whole package or of invidual components such as "GMP isprime". The Maple help page ?GMP contains a link to the source code as it's been modified for use in Maple.

@Mike Mc Dermott Your example is too small for a difference to be shown. For any expression of practical size for which I've used optimize(..., tryhard) over the past 20 years, it has made a great improvement in the optimization. The extra time taken by the optimizer has always been trivial compared to the improvement in execution time. Here's an example:
 

P0:= codegen:-makeproc(randpoly([x,y,z], dense), [x,y,z]);

proc (x, y, z) -27-16*x+18*y+51*z+33*x^5+59*y^5+52*z^5-89*x^4-46*y^4+36*z^4-60*x^3-34*y^3+91*z^3-20*x^2-10*y^2-22*z^2+12*x^3*y*z+7*x^2*y^2*z-70*x^2*y*z^2-89*x*y^3*z+69*x*y^2*z^2-42*x*y*z^3+34*x^2*y*z+80*x*y^2*z-33*x*y*z^2+21*x*y*z-25*x^3*y+10*x^4*y+7*x^4*z+65*x^3*y^2-96*x^3*z^2-42*x^2*y^3-60*x^2*z^3-4*x*y^4+97*x*z^4-69*y^4*z-33*y^3*z^2+40*y^2*z^3-65*y*z^4+50*x^3*z-89*x^2*y^2+16*x^2*z^2-77*x*y^3+30*x*z^3+87*y^3*z+77*y^2*z^2-85*y*z^3-68*x^2*y+52*x^2*z+28*x*y^2-64*x*z^2+y^2*z+54*y*z^2-35*x*y+89*x*z end proc

P1:= codegen:-optimize(P0);

proc (x, y, z) local t1, t10, t109, t13, t16, t19, t2, t22, t25, t28, t31, t34, t37, t40, t41, t44, t68, t9, t92; t1 := x^2; t2 := t1^2; t9 := t1*x; t10 := y^2; t13 := y*t9; t16 := z^2; t19 := t10*y; t22 := t10*t1; t25 := y*t1; t28 := t16*z; t31 := t10^2; t34 := t19*x; t37 := t10*x; t40 := -42*t19*t1-60*t1*t28+65*t10*t9+12*t13*z-70*t16*t25+69*t16*t37-96*t16*t9+33*t2*x+10*t2*y+7*t2*z+7*t22*z-4*t31*x-89*t34*z; t41 := x*y; t44 := t16^2; t68 := 16*t1*t16+40*t10*t28-33*t16*t19+34*t25*z-42*t28*t41+59*t31*y-69*t31*z+97*t44*x-65*t44*y+52*t44*z+50*t9*z-25*t13-89*t2-89*t22; t92 := 52*t1*z+77*t16*t10-33*t16*t41+87*t19*z+30*t28*x-85*t28*y+80*t37*z+21*t41*z-68*t25-46*t31-77*t34+28*t37+36*t44-60*t9; t109 := t10*z-64*t16*x+54*t16*y+89*x*z-20*t1-10*t10-22*t16-34*t19+91*t28-35*t41-16*x+18*y+51*z-27; t40+t68+t92+t109 end proc

P2:= codegen:-optimize(P0, tryhard);

proc (x, y, z) local result, t19, t2, t20, t3, t5, t6, t8, t9; t20 := 7*z-89; t19 := 52*z; t9 := x^2; t8 := x*t9; t6 := y^2; t5 := t6*y; t3 := z^2; t2 := t3*z; result := 91*t2-34*t5-60*t8-27+(87*t5+50*t8+51)*z+(-85*t2+18+(12*z-25)*t8)*y+(30*t2+89*z-16+(-89*z-77)*t5+(-42*t2+21*z-35)*y)*x+(-60*t2-42*t5+t19-20+(34*z-68)*y+(33*x+10*y+t20)*t9)*t9+(40*t2+65*t8-10+z+t20*t9+(80*z+28)*x+(59*y-46-69*z-4*x)*t6)*t6+(-33*t5+77*t6-96*t8-22+54*y+(-70*y+16)*t9+(69*t6-33*y-64)*x+(36+t19-65*y+97*x)*t3)*t3 end proc

P3:= codegen:-optimize(P1, tryhard);

proc (x, y, z) local t1, t10, t13, t16, t19, t2, t22, t25, t28, t31, t34, t37, t41, t44, t9, result; t1 := x^2; t2 := t1^2; t9 := t1*x; t10 := y^2; t13 := y*t9; t16 := z^2; t19 := t10*y; t22 := t10*t1; t25 := y*t1; t28 := t16*z; t31 := t10^2; t34 := t19*x; t37 := t10*x; t41 := x*y; t44 := t16^2; result := -27-20*t1-25*t13-34*t19-68*t25+91*t28-89*t22-46*t31-89*t2-16*x+18*y-4*t31*x-77*t34+36*t44-60*t9-35*t41+65*t9*t10+(-42*t19-60*t28)*t1+(-96*t9-70*t25+69*t37)*t16+(12*t13+7*t22-89*t34)*z+(33*x+10*y+7*z)*t2+(40*t10-42*t41)*t28+(16*t1-33*t19)*t16+(50*t9+34*t25)*z+(97*x-65*y+52*z)*t44+(59*y-69*z)*t31+(30*x-85*y)*t28+(77*t10-33*t41)*t16+(52*t1+87*t19+21*t41)*z+(80*z+28)*t37+(-64*x+54*y-22)*t16+(z-10)*t10+(89*x+51)*z end proc

P4:= codegen:-optimize(P2);

proc (x, y, z) local t19, t2, t20, t24, t3, t5, t6, t8, t9; t20 := 7*z-89; t19 := 52*z; t9 := x^2; t8 := x*t9; t6 := y^2; t5 := t6*y; t3 := z^2; t2 := t3*z; t24 := 89*z; 91*t2-34*t5-60*t8-27+(87*t5+50*t8+51)*z+(-85*t2+18+(12*z-25)*t8)*y+x*(30*t2+t24-16+t5*(-t24-77)+(-42*t2+21*z-35)*y)+(-60*t2-42*t5+t19-20+(34*z-68)*y+(33*x+10*y+t20)*t9)*t9+(40*t2+65*t8-10+z+t20*t9+(80*z+28)*x+(59*y-46-69*z-4*x)*t6)*t6+(-33*t5+77*t6-96*t8-22+54*y+(-70*y+16)*t9+(69*t6-33*y-64)*x+(36+t19-65*y+97*x)*t3)*t3 end proc

<codegen:-cost~([P||(0..4)])>;

Vector(5, {(1) = 54*additions+207*multiplications, (2) = 19*storage+19*assignments+104*multiplications+54*additions, (3) = 9*storage+9*assignments+53*additions+57*multiplications, (4) = 16*storage+16*assignments+82*multiplications+54*additions, (5) = 9*storage+9*assignments+53*additions+56*multiplications})

 

Download OptimizeTryhard.mw

@sursumCorda That's a good find on your part. Here are the new timings using option compile on all Iterators, and running twice so that the compilation time isn't counted.

LL:= [seq]([$0..k], k= 1..8): #So, product has 9! 8-tuples.
for k to 6 do cp[k]:= CodeTools:-Usage(CP[k](LL)) od:
combinat:-cartprod:
memory used=189.79MiB, alloc change=0 bytes,
cpu time=2.66s, real time=2.15s, gc time=703.12ms

Iterator:-CartesianProduct:
memory used=173.28MiB, alloc change=5.64MiB, 
cpu time=2.66s, real time=2.09s, gc time=796.88ms

Iterator:-MixedRadixTuples:
memory used=106.80MiB, alloc change=2.87MiB, 
cpu time=2.56s, real time=2.29s, gc time=390.62ms

Iterator:-MixedRadixGrayCode:
memory used=106.80MiB, alloc change=0 bytes, 
cpu time=2.69s, real time=2.36s, gc time=421.88ms

Iterator:-MultiSeq:
memory used=386.46MiB, alloc change=0 bytes,
cpu time=4.70s, real time=3.46s, gc time=1.72s

Carl's own cartesian product iterator:
memory used=138.49MiB, alloc change=-8.50MiB,
cpu time=1.89s, real time=1.62s, gc time=359.38ms

 

@2cUniverse My ideas regarding remainder -1 weren't intended for all moduli. 97 is prime; 49136 is not. The idea will work for moduli of the forms p^k and 2*p^k for odd primes p. If the remainder -1 occurs, then it corresponds to d-1 and is obviously the maximum of the list. But for a highly composite modulus such as 49136, it's unlikely that -1 will occur as a remainder for some arbitrary base b.

Your procedure remainders is very inefficient because it builds a sequence by appending, and it also has a few other much-more-minor inefficiencies. Here's an improvement:

remainders:= proc(n, d, b) local n1:= irem(n,d), r:= n1; 
    `if`(igcd(b,d) = 1, [r, (do r:= irem(b*r, d) until r=n1)][..-2], FAIL)   
end proc:

The remainder 1 is usuallly put at the end of the list and not the beginning, so that R[k] = b^k mod d. If you do it that way, the procedure can be simplified to

remainders:= proc(n, d, b) local n1:= irem(n,d), r:= n1; 
    `if`(igcd(b,d) = 1, [do r:= irem(b*r, d) until r=n1], FAIL)   
end proc:

@sursumCorda To my great surprise, combinat:-cartprod is the best of those 5, and Iterator:-CartesianProduct is by far the worst. (My timings below do not include the once-per-session compilation times for the Iterators.) To celebrate, I rewrote combinat:-cartprod and made a significant improvement in its time. Another difference among the methods is that they give the results in different orders.

Using my new CartProd iterator, I reduced the time for A161786__2(10^6, 10^6) to under 6 seconds and the time for A161786__2(1, 2*10^6) to under 7 seconds!

restart
:
CP[1]:= LL-> local nx:= combinat:-cartprod(LL)['nextvalue'];
    {to mul(nops~(LL)) do nx() od}
:
CP[2]:= LL-> local c;
    {for c in Iterator:-CartesianProduct(LL[]) do [seq](c) od}
:
CP[3]:= LL-> local c, k, j;
    {for c in Iterator:-MixedRadixTuples(nops~(LL)) do 
        [for k,j in c do LL[k][j+1] od]
    od}
:
CP[4]:= LL-> local c, k, j;
    {for c in Iterator:-MixedRadixGrayCode(nops~(LL)) do 
        [for k,j in c do LL[k][j+1] od]
    od}
:
CP[5]:= LL-> local c, k, j;
    {for c in Iterator:-MultiSeq(<[1$nops(LL)]>..<nops~(LL)>) do
        `?[]`~(LL, [seq](`[]`~(c)))
    od}
: 
#My improvement of combinat:-cartprod:
CartProd:= proc(LL::list({list, set})) 
option `Author: Carl Love <carl.j.love@gmail.com> 2023-Oct-24`;
local 
    n:= nops(LL), N:= nops~(LL), J:= rtable([1$n-1, 0], datatype= integer[4]),
    M:= rtable((1..n, 1..max(N)), [op]~(LL))
;
    proc() local i:= n, j;
        while J[i] = N[i] do J[i--]:= 1 od;
        J[i]++;
        [for i,j in J do M[i,j] od]
    end proc,
    mul(N)
end proc
:
CP[6]:= proc(LL) local (nx, N):= CartProd(LL);
    {to N do nx() od}
end proc
:
LL:= [seq]([$0..k], k= 1..8):

#I actually ran these tests twice after the restart 
#so that the Iterators would be compiled:
for k to 6 do cp[k]:= CodeTools:-Usage(CP[k](LL)) od:
memory used=189.79MiB, alloc change=2.77MiB,
cpu time=1.89s, real time=1.89s, gc time=0ns

memory used=407.77MiB, alloc change=5.64MiB,
cpu time=3.53s, real time=3.54s, gc time=0ns

memory used=106.80MiB, alloc change=2.87MiB, 
cpu time=3.25s, real time=2.37s, gc time=1.03s

memory used=106.80MiB, alloc change=5.64MiB,
cpu time=2.16s, real time=2.16s, gc time=0ns

memory used=386.45MiB, alloc change=5.64MiB, 
cpu time=2.92s, real time=2.93s, gc time=0ns

memory used=140.25MiB, alloc change=-19.78MiB,
cpu time=2.67s, real time=1.68s, gc time=1.17s

nops({entries}(cp, nolist));  #Show that all results are the same
                               1

 

Your worksheet shows two procedures for this---one named rho and one named InversePtMobius. Since those two procedures obviously give identical results, I don't understand what exactly you're asking.

@2cUniverse Since it's obvious that your list is the powers of 2 modulo 97, here are two more methods based on that:

NumberTheory:-ModularLog(-1, 2, 97);

NumberTheory:-MultiplicativeOrder(2, 97)/2;

Both of these return 24. You were getting 25 because you started your list at 2^0 rather than 2^1. The MultiplicativeOrder is likely more efficient because it makes use of the special status of -1 as the primitive square root of 1. The 2 in the denominator is to get the logarithm of -1, and it's independent of the base, 2, or the modulus, 97.  

I think that in your first line, you tried to enter a formula for h, but it got replaced by characters that look like rectangles with Xs in them.

First 34 35 36 37 38 39 40 Last Page 36 of 708