bigIntegerAlgos

Overview

bigIntegerAlgos uses the C library GMP (GNU Multiple Precision Arithmetic) for efficiently factoring big integers. For very large integers, prime factorization is carried out by a variant of the quadratic sieve algorithm that implements multiple polynomials. For smaller integers, a constrained version of the Pollard’s rho algorithm is used (original code from https://gmplib.org/… this is the same algorithm found in the R gmp package (https://cran.r-project.org/web/packages/gmp/gmp.pdf) called by the function factorize). Finally, one can quickly obtain a complete factorization of given number n via divisorsBig.

Installation

install.packages("bigIntegerAlgos")

## Or install the development version
devtools::install_github("jwood000/bigIntegerAlgos")

Usage

First, we take a look at divisorsBig. It is vectorized and can also return a named list.

## Get all divisors of a given number:
divisorsBig(1000)
Big Integer ('bigz') object of length 16:
 1    2    4    5    8    10   20   25   40   50   100  125  200  250  500  1000

## Or, get all divisors of a vector:
divisorsBig(urand.bigz(nb = 2, size = 100, seed = 42), namedList = TRUE)
Seed initialisation
\$`153675943236425922379228498617`
Big Integer ('bigz') object of length 16:
 1                              3
 7                              9
 21                             27
 63                             189
 813100228764158319466817453    2439300686292474958400452359
 5691701601349108236267722171   7317902058877424875201357077
 17075104804047324708803166513  21953706176632274625604071231
 51225314412141974126409499539  153675943236425922379228498617

\$`261352009818227569107309994396`
Big Integer ('bigz') object of length 12:
 1                              2
 4                              155861
 311722                         623444
 419206873140534785974859       838413746281069571949718
 1676827492562139143899436      65338002454556892276827498599
 130676004909113784553654997198 261352009818227569107309994396

It is very efficient as well. It is equipped with a modified merge sort algorithm that significantly outperforms the std::sort/bigvec (the class utilized in the R gmp package) combination.

hugeNumber <- pow.bigz(2, 100) * pow.bigz(3, 100) * pow.bigz(5, 100)
system.time(overOneMillion <- divisorsBig(hugeNumber))
user  system elapsed
0.637   0.043   0.682
length(overOneMillion)
 1030301

## Output is in ascending order
tail(overOneMillion)
Big Integer ('bigz') object of length 6:
 858962534553352218394101882942702121170179203335000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1030755041464022662072922259531242545404215044002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1288443801830028327591152824414053181755268805002500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1717925069106704436788203765885404242340358406670000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 2576887603660056655182305648828106363510537610005000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 5153775207320113310364611297656212727021075220010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Another benefit is that it will return correct orderings on extremely large numbers when compared to sorting large vectors in base R. Typically in base R you must execute the following: order(asNumeric(myVectorHere)). When the numbers get large enough, precision is lost which leads to incorrect orderings. Observe:

set.seed(101)
testBaseSort <- do.call(c, lapply(sample(100), function(x) add.bigz(pow.bigz(10,80), x)))
testBaseSort <- testBaseSort[order(asNumeric(testBaseSort))]
myDiff <- do.call(c, lapply(1:99, function(x) sub.bigz(testBaseSort[x+1], testBaseSort[x])))

## Should return integer(0) as the difference should always be positive
which(myDiff < 0)
  1  3  4  7  9 11 14 17 19 22 24 25 26 28 31 32 33 36 37 38 40 42 45 47 48
 50 51 54 57 58 59 63 64 65 66 69 70 72 75 78 81 82 85 87 89 91 93 94 97 98

## N.B. The first and second elements are incorrect order (among others)
Big Integer ('bigz') object of length 6:
 100000000000000000000000000000000000000000000000000000000000000000000000000000038
 100000000000000000000000000000000000000000000000000000000000000000000000000000005
 100000000000000000000000000000000000000000000000000000000000000000000000000000070
 100000000000000000000000000000000000000000000000000000000000000000000000000000064
 100000000000000000000000000000000000000000000000000000000000000000000000000000024
 100000000000000000000000000000000000000000000000000000000000000000000000000000029

The function quadraticSieve implements the multiple polynomial quadratic sieve algorithm. Currently, quadraticSieve can comfortably factor numbers with less than 50 digits (~160 bits).

## Generate large semi-primes
semiPrime120bits <- prod(nextprime(urand.bigz(2,60,42)))
semiPrime130bits <- prod(nextprime(urand.bigz(2,65,1)))
semiPrime140bits <- prod(nextprime(urand.bigz(2,70,42)))

## Using factorize from gmp package which implements pollard's rho algorithm
## We did not test the 140 bit semi-prime as the 130 bit took a very long time

##**************gmp::factorize*********************
system.time(print(factorize(semiPrime120bits)))
Big Integer ('bigz') object of length 2:
 638300143449131711  1021796573707617139
user  system elapsed
126.603   0.052 126.694

system.time(print(factorize(semiPrime130bits)))
Big Integer ('bigz') object of length 2:
 14334377958732970351 29368224335577838231
user   system  elapsed
1513.055    1.455 1517.524

## quadraticSieve is much faster and scales better
Big Integer ('bigz') object of length 2:
 638300143449131711  1021796573707617139
user  system elapsed
4.028   0.008   4.036

Big Integer ('bigz') object of length 2:
 14334377958732970351 29368224335577838231
user  system elapsed
6.310   0.013   6.324

Big Integer ('bigz') object of length 2:
 143600566714698156857  1131320166687668315849
user  system elapsed
12.990   0.260  13.249

It can also be used as a general prime factoring function:

Seed initialisation
Big Integer ('bigz') object of length 5:
 5       31      307     2441    4702723

However gmp::factorize is more suitable for numbers smaller than 60 bits and should be used in such cases.

Current Research:

Improvements are being made to the section of the quadratic sieve algorithm that selects smooth numbers. Right now, we are only selecting M-smooth numbers where M is the largest prime in our sieving base. A potential efficiency gain is to also keep track of mostly M-smooth numbers (i.e. numbers that are almost completely factored by our sieving base but have one factor that contains a prime number greater than M). This way, we can obtain more smooth numbers by essentially eliminating these larger factors when we find a pair of them (N.B. square terms come out in the wash, so the large factors won’t influence the outcome).

We are also working on efficiently integrating divisorsBig with quadraticSieve as currently, divisorsBig utilizes gmp::factorize.

Contact

I welcome any and all feedback. If you would like to report a bug, have a question, or have suggestions for possible improvements, please contact me here: jwood000@gmail.com