
Libmemoize

Libmemoize provides a wrapper around C functions to memoize them,
i.e. only call them once for a given set of parameters.

If you are running a function which is expensive and gives a unique
result for a given set of input parameters (i.e. is "pure"), then
libmemoize may speed up your program.



Copyright 2018/2019 Robert Izzard.

Please see the file LICENCE for licensing conditions. In particular,
note the attribution clause (2a) and the requirements for use in commercial
projects, and note that no warranty is provided for any purpose.

Repository at time of writing:

https://gitlab.eps.surrey.ac.uk/ri0005/libmemoize

------------------------------------------------------------

Installation

See INSTALL file.

------------------------------------------------------------

Usage

You should use Memoize() whenever:

a) Your function takes a long time to run and is likely to be called
   with the same parameters, repeatedly. You should make the hash as large
   as the number of parameter combinations.

   Note that if the input is a floating point number which is (effectively)
   random, you should not use Memoize. The number of possible parameter combinations
   is so large that you will never have enough memory to store them all.
   
   If your function is *very* fast, the overheads associated with Memoize will
   usually outweigh any gain. (Although see (b) below.)

   If your function depends on any variables other than the input parameters,
   e.g. external or global variables, then you should either not use Memoize or you
   should call memoize_clear_hash_contents() when the external variable changes.

b) Your function is likely to be called several times in succession with
   the same input parameter(s). In this case, set the hash size to 1, so
   that only the previous call is checked and the hash is fast to scan.
   Used this way, even calls to system-library functions may be sped up
   by wrapping them in Memoize.
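
As a rough illustration of case (b), and of the caveat about external
variables in (a), here is a hand-written single-entry cache. It does not
use libmemoize: the names slow_scale, cached_scale, set_scale_factor and
call_count are hypothetical, and the code only sketches the idea of
remembering the previous call and discarding it when a global changes.

```c
#include <stdbool.h>

/* Hypothetical global that the "expensive" function depends on. */
static double scale_factor = 2.0;

/* Counts real evaluations so the effect of the cache is visible. */
static int call_count = 0;

/* Stand-in for an expensive function. It is not pure, because it
 * reads the global scale_factor. */
double slow_scale(double x)
{
    call_count++;
    return scale_factor * x;
}

/* Single-entry cache: remembers only the most recent call. */
static bool   cache_valid = false;
static double cache_key;
static double cache_value;

double cached_scale(double x)
{
    if (cache_valid && cache_key == x) {
        return cache_value;        /* hit: skip the expensive call */
    }
    cache_key   = x;
    cache_value = slow_scale(x);
    cache_valid = true;
    return cache_value;
}

/* Changing the global must invalidate the cache. This plays the same
 * role as calling memoize_clear_hash_contents() in libmemoize. */
void set_scale_factor(double f)
{
    scale_factor = f;
    cache_valid  = false;
}
```

Calling cached_scale(3.0) twice in a row evaluates slow_scale only once.
After set_scale_factor(10.0) the next call is evaluated afresh.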
   

Memoize(
     struct memoize_hash_t * RESTRICT memo,
     const char * RESTRICT funcname,
     const size_t nmax,
     (scalar|array),
     <parameter variable type>,
     const size_t n_parameters,
     const void * parameter,
     <result variable type>,
     const size_t n_results,
     result
       );
    
Where 

memo is a pointer to a struct memoize_hash_t.
     Note: to allocate memory manually, call memoize_initialize(&memo);
     Note: to free memory call memoize_free(&memo);

funcname is a string to identify your function. Use only one
     string per function. 

nmax is the maximum number of results that are stored for each function.
     Typically you want this number to be small, e.g. if you expect repeated
     calls with the same parameters, set it to 1.

(scalar|array) : choose one of scalar or array. If you choose scalar,
     you can send the (single) parameter directly into parameter,
     if you choose array you have to wrap the parameter data in 
     ((parameter_type[]){...}) (see example 2 below).

<parameter variable type> is the C variable type of the parameters, 
     e.g. int, char, double. We assume all parameters are of the same 
     type, sorry (if you require different types, consider wrapping them
     in a struct: this will then be compared byte-by-byte to the memoize hash).

n_parameters is the number of parameters pointed to by parameter (below),
     e.g. for a scalar this is 1.

parameter is the parameter data.

<result variable type> is the C variable type of the results,
     e.g. int, char, double.

n_results is the number of results provided by the function,
     e.g. for a scalar this is 1.

result is the result data. This can be an expression, a call to a 
     function, etc. and can be a pointer to memory (e.g. to 
     return an array of data or a struct). 


Generally, Memoize is designed for scalar input (n_parameters == n_results == 1)
but by setting larger numbers there's no reason why arrays of data won't work.

Similarly, Memoize has been tested on intrinsic C data types (e.g. double, int)
but should work on any complex data type.
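
A word of caution on the struct suggestion above: a byte-by-byte comparison
of a struct also compares any padding bytes, and C leaves their values
unspecified. One way to sidestep this, sketched below in plain C, is to pack
the fields into a byte buffer and compare that instead. The names KEY_SIZE,
pack_key and keys_equal are hypothetical and are not part of libmemoize.

```c
#include <stdbool.h>
#include <string.h>

/* Size of a key made of one int and one double, with no padding. */
#define KEY_SIZE (sizeof(int) + sizeof(double))

/* Copy the two parameters, of different types, into a contiguous
 * byte buffer. Every byte of the buffer is written. */
void pack_key(unsigned char key[KEY_SIZE], int zone, double radius)
{
    memcpy(key, &zone, sizeof zone);
    memcpy(key + sizeof zone, &radius, sizeof radius);
}

/* Byte-by-byte comparison of two packed keys. */
bool keys_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, KEY_SIZE) == 0;
}
```

Note that a bytewise comparison treats 0.0 and -0.0 as different keys,
because their bit patterns differ.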


Note: because we use a macro, result is not calculated until
      it is required, hence we gain some speed. The downside
      to this is that search_result and store_result must both
      check the hash for an existing key. This check is (presumably) faster
      than the function to be memoized. You can only know by trying!
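
The lazy evaluation described in the note can be sketched with a much
simpler macro. This is not the Memoize macro itself: struct slot_t,
slot_hit, slot_store, LAZY_CACHE and expensive_square are hypothetical
names, and the cache holds a single double. The point is only that the
result expression sits in the "miss" branch, so it is not evaluated on a hit.

```c
#include <stdbool.h>

/* Hypothetical one-slot cache for a double -> double function. */
struct slot_t {
    bool   valid;
    double key;
    double value;
};

/* Returns true when the slot already holds a result for this key. */
bool slot_hit(const struct slot_t *s, double key)
{
    return s->valid && s->key == key;
}

/* Stores a freshly computed value and returns it. */
double slot_store(struct slot_t *s, double key, double value)
{
    s->valid = true;
    s->key   = key;
    s->value = value;
    return value;
}

/* RESULT appears only in the miss branch of the conditional operator,
 * so the expression passed in is evaluated only when the lookup fails.
 * KEY is expanded more than once, so it must have no side effects. */
#define LAZY_CACHE(SLOT, KEY, RESULT)           \
    (slot_hit((SLOT), (KEY))                    \
         ? (SLOT)->value                        \
         : slot_store((SLOT), (KEY), (RESULT)))

/* Counts evaluations of the "expensive" expression. */
static int n_evaluations = 0;

double expensive_square(double x)
{
    n_evaluations++;
    return x * x;
}
```

With a slot initialised to { false, 0.0, 0.0 }, two successive uses of
LAZY_CACHE(&slot, 3.0, expensive_square(3.0)) call expensive_square once.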

------------------------------------------------------------

Example 1: scalar in, scalar out

double x = Memoize(
     disc->memo,
     "discT",
     1,
     scalar,
     double,
     1,
     radius,
     double,
     1,
     MAX(0.0,generic_power_law(radius,disc,POWER_LAW_TEMPERATURE))
      );

This example sets the result in the MAX(...) call, where the result 
is a single double and the parameter is also a single double ("radius").
The hash table at disc->memo is stored elsewhere, as is required.

------------------------------------------------------------

Example 2: array in, scalar out, to replace the pow function.

#define POW(x,y) Memoize( \
      binary_c_memo,\
      "pow",\
      10, \
      array,\
      double,\
      2, \
      ((double[]){x,y}), \
      double,\
      1,\
      pow(x,y))

In this case, we replace calls to pow(x,y), a standard C-library function,
with a macro POW(x,y) which instead calls the memoized version of pow.
You can see that two double-precision parameters are passed in, x and y. 
These must be encapsulated in an anonymous array, which is the
((double[]){x,y}). The outer brackets are required so that the comma
between x and y is not treated as a macro argument separator.
10 results are stored in the memo hash.
Now, instead of calling pow(x,y) you should call POW(x,y) to use the 
memoized version.
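
The need for the outer brackets can be seen in isolation with a small
sketch that does not use libmemoize. The names sum_doubles, SUM and
SUM_PAIR are hypothetical. The C preprocessor only protects commas that
are inside round brackets, so a compound literal passed to a macro has to
be wrapped in them.

```c
#include <stddef.h>

/* Sums the first n doubles at p. */
double sum_doubles(const double *p, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}

/* Hypothetical macro taking a count and a pointer to the data. */
#define SUM(N, DATA) sum_doubles((DATA), (N))

/* The outer brackets keep the comma inside {x,y} from being read as a
 * macro argument separator. SUM(2, (double[]){x,y}) would not compile,
 * because the preprocessor would see three arguments instead of two. */
#define SUM_PAIR(x, y) SUM(2, ((double[]){(x), (y)}))
```

Here SUM_PAIR(1.5, 2.5) passes a two-element anonymous array to
sum_doubles, in the same way that POW(x,y) passes one to Memoize.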

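NB: you can override the pow library call with something like the following,
and remember to link with "-ldl":

#define _GNU_SOURCE /* required for RTLD_NEXT, must precede all #includes */
#include <dlfcn.h>
/* set once at run time, before the first call to pow, e.g. in main(): */
/* orig_pow = (double (*)(double,double)) dlsym(RTLD_NEXT, "pow"); */
double (*orig_pow)(double,double);
#define pow(x,y) Memoize( \
      binary_c_memo, \
      "pow", \
      10, \
      array, \
      double, \
      2, \
      ((double[]){x,y}), \
      double, \
      1, \
      orig_pow(x,y))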

------------------------------------------------------------

Example 3: Fibonacci numbers

This example is compiled into test_memoize (built in the src directory).

The following code computes the Fibonacci numbers using libmemoize.
When computing the numbers up to the 42nd in the series, as in the loop
below, on an Intel i7-5960X, the memoized version is 6.5 million times
faster than a non-cached version, with an associated memory cost of
0.69KBytes.

---

long int __attribute__((const)) fibonacci_memoized(struct memoize_hash_t * memo,
                                                   const long int nfib,
                                                   const long int n);

#define Fibbo_memo(x) Memoize(                                          \
        memo,                                                           \
        "fib",                                                          \
        (const size_t)nfib,                                            \
        scalar,                                                         \
        long int,                                                       \
        1,                                                              \
        x,                                                              \
        long int,                                                       \
        1,                                                              \
        fibonacci_memoized(memo,nfib,x))

    long int i,nfib=42;
    for(i=0;i<=nfib;i++)
    {
        printf("Fibonacci (with memoize   ) %ld : %ld   \x0d",
               i,
               Fibbo_memo(i));
        fflush(stdout);
    }

long int __attribute__((const)) fibonacci_memoized(struct memoize_hash_t * memo,
                                                   const long int nfib,
                                                   const long int n)
{
    return
        n<2 ?
        n :
        (Fibbo_memo(n-1)+Fibbo_memo(n-2));
}

---
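
To see where a speed-up of this size can come from, the same recursion can
be cached by hand. The sketch below does not use libmemoize: fib_cached,
fib_cache, fib_calls and FIB_MAX are hypothetical names, and a plain array
stands in for the memo hash. Naive recursion recomputes the same values
over and over, whereas here each value from 2 to n is computed exactly once.

```c
#include <assert.h>

#define FIB_MAX 90   /* fib(90) still fits in a 64-bit signed integer */

/* Cache indexed by n. Zero means "not yet computed", which is safe
 * because only n >= 2 is cached and those values are all non-zero. */
static long long fib_cache[FIB_MAX + 1];

/* Counts the calls that actually perform an addition. */
static long fib_calls = 0;

long long fib_cached(int n)
{
    assert(n >= 0 && n <= FIB_MAX);
    if (n < 2) {
        return n;                  /* base cases need no cache */
    }
    if (fib_cache[n] != 0) {
        return fib_cache[n];       /* hit: no recursion at all */
    }
    fib_calls++;
    fib_cache[n] = fib_cached(n - 1) + fib_cached(n - 2);
    return fib_cache[n];
}
```

Starting from an empty cache, fib_cached(42) does real work 41 times, once
for each n from 2 to 42.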


------------------------------------------------------------

Three versions of the memoize library are built: libmemoize, libmemoize-debug
and libmemoize-stats.
libmemoize is the version to use in practice for maximum speed.
libmemoize-debug is the debugging version, which includes extra output.
libmemoize-stats includes statistics output and should be used for testing.


The memoize library is based on the general idea, best expounded at
   https://perldoc.perl.org/Memoize.html
Many thanks to the author of Memoize-perl.

Note:
   Memory allocation is done with MALLOC, so checking for success is
   only done if ALLOC_CHECKS is defined. Otherwise there is no checking.
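
The real MALLOC macro is defined in the library source and is not
reproduced here. Purely as a guess at the general pattern, a compile-time
switch of this kind could look like the following sketch, in which
checked_malloc and SKETCH_MALLOC are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates size bytes, exiting with a message if malloc fails. */
void *checked_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "malloc of %zu bytes failed at %s:%d\n",
                size, file, line);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Hypothetical stand-in for MALLOC: the check is compiled in only
 * when ALLOC_CHECKS is defined, so the default build pays no cost. */
#ifdef ALLOC_CHECKS
#define SKETCH_MALLOC(size) checked_malloc((size), __FILE__, __LINE__)
#else
#define SKETCH_MALLOC(size) malloc(size)
#endif
```

Without ALLOC_CHECKS the caller is responsible for testing the returned
pointer against NULL.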

------------------------------------------------------------

API

You should use the Memoize macro to access the library, but you can
also call the functions directly if you wish.

---

void memoize_initialize(struct memoize_hash_t * RESTRICT * RESTRICT m)

Initialize the memoize hash at *m : you should pass in a pointer to
the pointer, e.g.

struct memoize_hash_t * memo = NULL;
memoize_initialize(&memo);

---

struct memoize_hash_item_t * memoize_search_hash(const struct memoize_hash_t * RESTRICT memo,
                                                 const char * RESTRICT funcname)

An internal function to search the hash at memo for an entry
corresponding to the function label, funcname. Returns a pointer
to the entry, or NULL if there is no match.

---

void * memoize_search_result(struct memoize_hash_t * RESTRICT memo,
                             const char * RESTRICT funcname,
                             const void * parameter)

Searches the hash memo for a match to both the function label
funcname and the parameter set at *parameter.

Returns the matching hash result, or NULL if there is no match.
    
---

void * memoize_store_result(struct memoize_hash_t * RESTRICT memo,
                            const char * RESTRICT funcname,
                            const size_t nmax,
                            const size_t parameter_memsize,
                            const void * parameter,
                            const size_t result_memsize,
                            const void * result)

memoize_store_result() stores a given result in the memoize hash at memo.
The function name (which is the hash key) is funcname, the size
of the hash is nmax.
The parameters are in *parameter, which is a pointer to the parameter
if there is only one, or a pointer to the array in the case of an array.
The result is set in result.

parameter_memsize and result_memsize are the memory sizes (in bytes)
that are allocated in the memo hash. These are equal to sizeof(type)*n where
n is the number of parameters/results.
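
To make the role of the two memory sizes concrete, here is a minimal sketch
of a store/search pair keyed on raw bytes. It is not libmemoize's
implementation: struct entry_t, struct table_t and the table_* functions are
hypothetical, the table serves a single function, and overwriting the
oldest entry when the table is full is an assumption made for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* One cached call: raw copies of the parameter and result bytes. */
struct entry_t { void *parameter; void *result; };

/* Hypothetical fixed-capacity table for a single function. */
struct table_t {
    struct entry_t *entries;
    size_t nmax, n, next;      /* capacity, slots in use, next to write */
    size_t parameter_memsize;  /* sizeof(type) * n_parameters */
    size_t result_memsize;     /* sizeof(type) * n_results */
};

/* Returns 0 on success, -1 on failure. */
int table_init(struct table_t *t, size_t nmax, size_t psize, size_t rsize)
{
    if (nmax == 0) return -1;
    t->entries = malloc(nmax * sizeof *t->entries);
    if (t->entries == NULL) return -1;
    for (size_t i = 0; i < nmax; i++) {
        t->entries[i].parameter = NULL;
        t->entries[i].result = NULL;
    }
    t->nmax = nmax; t->n = 0; t->next = 0;
    t->parameter_memsize = psize; t->result_memsize = rsize;
    return 0;
}

/* Linear scan comparing parameter bytes: stored result or NULL. */
void *table_search(const struct table_t *t, const void *parameter)
{
    for (size_t i = 0; i < t->n; i++) {
        if (memcmp(t->entries[i].parameter, parameter,
                   t->parameter_memsize) == 0) {
            return t->entries[i].result;
        }
    }
    return NULL;
}

/* Copies parameter and result into the table, overwriting the oldest
 * entry once all nmax slots are in use. Returns the stored result,
 * or NULL if memory could not be allocated. */
void *table_store(struct table_t *t, const void *parameter,
                  const void *result)
{
    struct entry_t *e = &t->entries[t->next];
    if (e->parameter == NULL) {            /* first use of this slot */
        e->parameter = malloc(t->parameter_memsize);
        e->result = malloc(t->result_memsize);
        if (e->parameter == NULL || e->result == NULL) {
            free(e->parameter); free(e->result);
            e->parameter = e->result = NULL;
            return NULL;
        }
        t->n++;
    }
    memcpy(e->parameter, parameter, t->parameter_memsize);
    memcpy(e->result, result, t->result_memsize);
    t->next = (t->next + 1) % t->nmax;
    return e->result;
}

void table_free(struct table_t *t)
{
    for (size_t i = 0; i < t->n; i++) {
        free(t->entries[i].parameter);
        free(t->entries[i].result);
    }
    free(t->entries);
    t->entries = NULL;
    t->n = t->nmax = t->next = 0;
}
```

For a pow-like function of two doubles, the sizes passed to table_init
would be 2 * sizeof(double) for the parameters and sizeof(double) for the
result.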

---

void memoize_clear_hash_contents(struct memoize_hash_t * RESTRICT memo)

Clears the contents of the hash stored at memo.

---

void memoize_free(struct memoize_hash_t ** RESTRICT memo)

Frees the hash stored at *memo: pass in the address of your hash
pointer, e.g. memoize_free(&memo).

---

void memoize_status(const struct memoize_hash_t * memo)

Shows, on stdout, the status of the hash at memo.

---

size_t memoize_sizeof(struct memoize_hash_t * memo)

Returns the size of memory allocated in memo.

---



Author: Robert Izzard
You have no guarantee that this works, and the author accepts no liability. 
Originally part of the binary_c project https://gitlab.eps.surrey.ac.uk/ri0005/binary_c