GNU Free Documentation License, Version 1.2

- User documentation for SmallFpLogImpl
- Maintainer documentation for SmallFpLogImpl
- Bugs, Shortcomings and other ideas

The class `SmallFpLogImpl`

is a very low level implementation class for fast
arithmetic in a small, prime finite field. It is **not intended** for use
by casual CoCoALib users, who should instead see the documentation in
`QuotientRing`

(in particular the function `NewZZmod`

), or possibly the
documentation in `RingFp`

, `RingFpLog`

, and `RingFpDouble`

.

Compared to `SmallFpImpl`

the only difference is an implementation
detail: multiplication and division are achieved using discrete log
tables -- this may be fractionally faster on some processors.

Note that the cost of construction of a `SmallFpLogImpl(p)`

object for
larger primes may be quite considerable (linear in `p`

), and the resulting
object may occupy quite a lot of space (*e.g.* probably about 6*p bytes).

**All operations** on values must be effected by calling member functions
of the `SmallFpLogImpl`

class. Here is a brief summary.

SmallFpLogImpl::IsGoodCtorArg(p); // true iff ctor SmallFpLogImpl(p) will succeed SmallFpLogImpl::ourMaxModulus(); // largest permitted modulus SmallFpLogImpl ModP(p, convention); // create SmallFpLogImpl object long n; BigInt N; BigRat q; SmallFpImpl::value_t a, b, c; ModP.myModulus(); // value of p (as a long) ModP.myReduce(n); // reduce mod p ModP.myReduce(N); // reduce mod p ModP.myReduce(q); // reduce mod p ModP.myExport(a); // returns a preimage (of type long) according to symm/non-neg convention. ModP.myNegate(a); // -a mod p ModP.myAdd(a, b); // (a+b)%p; ModP.mySub(a, b); // (a-b)%p; ModP.myMul(a, b); // (a*b)%p; ModP.myDiv(a, b); // (a*inv(b))%p; where inv(b) is inverse of b ModP.myPower(a, n); // (a^n)%p; where ^ means "to the power of" ModP.myIsZeroAddMul(a,b,c) // a = (a+b*c)%p; result is (a==0)

For `myExport`

the choice between least non-negative and symmetric
residues is determined by the convention specified when constructing
the `SmallFpLogImpl`

object. This convention may be either
`GlobalSettings::SymmResidues`

or
`GlobalSettings::NonNegResidues`

.

The only clever bit is the *economical* construction of the log/exp
tables in the constructor where we exploit the fact that `myRoot`

to
the power (p-1)/2 must be equal to -1.

This implementation uses discrete log/exp tables to effect multiplication
and division quickly. Note that the residues themselves (*i.e.* the values
of the ring elements) are held as machine integers whose value is the
least non-negative representative of the residue class (*i.e.* in the range
0 to p-1). In particular, although log tables are used, we do NOT use a
*logarithmic representation* for the field elements.

The log/exp tables are stored in C++ vectors: aside from their
construction during the `RingFpLogImpl`

constructor, these vectors are
never modified, and are used only for table look-up. The C++ vectors
are resized in the body of the constructor to avoid large memory
requests when overly large characteristics are supplied as argument.

Besides these tables `SmallFpLogImpl`

also remembers the characteristic in
`myModulus`

; `myRoot`

is the primitive root used to generate the log/exp
tables.

The members `myResidueUPBValue`

and `myIterLimit`

and `myHalfNormalize`

may be used for delayed normalization in loops: see the inner product example
in `SmallFpImpl`

.

As the code currently stands, the modulus must also be small enough that it
can fit into an `FpTableElem`

(an `unsigned short`

), and that its
square can fit into a `value_t`

. Using `short`

s in the tables gave
slightly better run-time performance in our tests. Furthermore, to permit
use of unnormalized products in some algorithms, twice the square of the
characteristic must fit into a `value_t`

(*i.e.* `myIterLimit`

must
be greater than zero). The constructor for a `RingFpLogImpl`

checks the
size restrictions on the characteristic.

Note that the log table has a slot with index 0 which is never written to nor read from. The exp table is double size so that multiplication can be achieved more easily: the highest slot which could ever be used is that with index 2p-3 (in division), but the constructor fills two extra slots (as this makes the code simpler/neater).

The only slick part of the implementation is the filling of the tables in the constructor, where some effort is made to avoid doing more reductions modulo p than necessary. Note that the primitive root is always calculated (potentially costly!); there is no memorized global table of primitive roots anywhere.

It is not as fast as I hoped -- perhaps cache effects?