5. API Reference

synkhronos.fork(n_parallel=None, use_gpu=True, master_rank=0, profile_workers=False, max_n_var=1000, max_dim=16)

Forks a Python process for each additional GPU and initializes the GPUs. Call this before building any Theano GPU variables or Synkhronos functions.

Parameters:
  • n_parallel (None, optional) – Number of GPUs to use (default uses all)
  • use_gpu (bool, optional) – Inactive (possibly future CPU-only mode)
  • master_rank (int, optional) – GPU to use in master process
  • profile_workers (bool, optional) – If True, records cProfiles of workers (see synkhronos/worker.py for details)
  • max_n_var (int, optional) – Max number of variables in a function call
  • max_dim (int, optional) – Max number of dimensions of any variable
Returns:

The number of GPUs in use.

Return type:

int

Raises:
  • NotImplementedError – If use_gpu==False
  • RuntimeError – If already forked.
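Example (a minimal sketch, assuming the package is imported as synk):

    import synkhronos as synk

    n_gpus = synk.fork()   # fork one worker process per additional GPU
    print("Using {} GPUs".format(n_gpus))
    # Build Theano GPU variables and Synkhronos functions only after this point.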
synkhronos.close()

Close workers and join their processes. Called automatically on exit.

synkhronos.function(inputs, outputs=None, bcast_inputs=None, updates=None, givens=None, sliceable_shareds=None, **kwargs)

Replacement for theano.function(), with a similar interface. Builds underlying Theano functions, including support for function slicing.

Parameters:
  • inputs – as in Theano, to be scattered among workers
  • outputs – as in Theano, with option to specify reduce operation (see notes below)
  • bcast_inputs – as inputs in Theano, to be broadcast to all workers
  • updates – as in Theano, with option to specify reduce operation (see notes below)
  • givens – as in Theano
  • sliceable_shareds – any implicit inputs (Theano shared variables) acting as data-parallel data (i.e. subject to the kwarg batch_s and/or to function slicing) must be listed here
  • **kwargs – passed on to all internal calls to theano.function()
Reduce Operations:

Outputs: May be specified simply as Theano tensor variables, as in normal Theano, or as two-tuples, as in (var, reduce-op), where reduce-op can be: “avg”, “sum”, “max”, “min”, “prod”, or None. Default is “avg”.

Updates: May be specified as a list of two-tuples, as in normal Theano, or may include triples, as in (var, update, reduce-op). Unlike for outputs, the reduce-op here applies only when using function slicing. Every slice is computed using the original values, and the update is accumulated over the slices. (This may impose some limits on the form of the update expression.) At the end of the function call, all updates are applied only locally, within each worker. This gives the user clear control over when to communicate.

Returns:

callable object, replacing a theano.Function

Return type:

synkhronos.function_module.Function

Raises:
  • RuntimeError – If Synkhronos is not yet forked, or if already distributed
  • TypeError – If incorrect format for arguments.
  • ValueError – If an entry in sliceable_shareds is not used in the function, or if an invalid reduce operation is requested.
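Example (a hypothetical sketch of building a function with explicit reduce operations; the variable names are illustrative, and a floatX of float32 is assumed):

    import numpy as np
    import theano
    import theano.tensor as T
    import synkhronos as synk

    synk.fork()

    x = T.fmatrix('x')
    w = theano.shared(np.zeros((8, 1), dtype='float32'), name='w')
    loss = T.sqr(x.dot(w)).mean()
    grad = theano.grad(loss, w)
    lr = np.float32(0.01)

    f_train = synk.function(
        inputs=[x],
        outputs=(loss, "avg"),                 # reduce outputs across GPUs by averaging (the default)
        updates=[(w, w - lr * grad, "avg")])   # the reduce-op here matters only when slicing
    # (call synk.distribute() before using f_train)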
synkhronos.distribute()

Replicates all Synkhronos functions and their Theano shared variables in worker processes / GPUs. It must be called after building the last Synkhronos function and before calling any Synkhronos function.

It pickles all underlying Theano functions into one file, which the workers unpickle. All Theano shared variable data is included, and correspondences between variables across functions are preserved. The pickle file is automatically deleted by a worker. The default file location is the directory synkhronos/pkl/, but this can be changed by modifying PKL_PATH in synkhronos/util.py.

Raises: RuntimeError – If not yet forked or if already distributed.
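Example (a sketch of the overall workflow; the function and data shown are illustrative):

    import numpy as np
    import theano.tensor as T
    import synkhronos as synk

    synk.fork()                               # 1. fork worker processes
    x = T.fmatrix('x')
    f_sum = synk.function(inputs=[x], outputs=(x.sum(), "sum"))
    synk.distribute()                         # 2. replicate functions and variables to workers

    x_data = synk.data(value=np.ones((1000, 4), dtype='float32'))
    total = f_sum(x_data)                     # 3. scatters x_data, sums the partial results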
class synkhronos.function_module.Function(inputs, bcast_inputs, to_cpu, return_list=True, **kwargs)

Class of instances returned by synkhronos.function().

__call__(*args, output_subset=None, batch=None, batch_s=None, num_slices=1, **kwargs)

Callable as in Theano function.

When called, a Synkhronos function:

  1. Assigns input data evenly across all GPUs,
  2. Signals to workers to start and which function to call,
  3. Calls the underlying Theano function on assigned data subset,
  4. Collects results from workers and returns them.
Parameters:
  • *args (Data) – Normal data inputs to Theano function
  • output_subset – as in Theano
  • batch – indexes to select from scattering input data (see notes)
  • batch_s – indexes to select from scattered implicit inputs (see notes)
  • num_slices (int) – call the function over this many slices of the selected, scattered data and accumulate the results (to avoid out-of-memory errors)
  • **kwargs (Data) – Normal data inputs to Theano function
Batching:

The kwarg batch can be of types: (int, slice, list (of ints), numpy array (1-d, int)). It applies before scattering, to the whole input data set. If type int, this acts as data[:int].

The kwarg batch_s can be of type (slice, list (of ints), numpy array (1-d, int)) or a list of all the same type (one of those three), with one entry for each GPU. It applies after scattering, to data already residing on the GPU. If only one of the above types is provided, rather than a list of them, it is used in all GPUs.

In both batch and batch_s, fully general slices are not supported; the start and stop fields must be ints, and step must be None.

Slicing:
Function slicing by the num_slices kwarg applies within each worker, after individual worker data assignment. Results are accumulated within each worker and reduced only once at the end. Likewise, any updates are computed and accumulated using the original variable values, and the updates are applied only once at the end.
Raises: RuntimeError – If not distributed or if Synkhronos is closed.
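Example (a hypothetical call; f_train, x_data, and y_data are assumed to have been created already via synkhronos.function(), synkhronos.distribute(), and synkhronos.data()):

    import numpy as np

    batch_idxs = np.random.randint(0, 10000, size=256)    # select one minibatch
    loss = f_train(x_data, y_data, batch=batch_idxs, num_slices=4)
    # batch selects rows before scattering; num_slices=4 computes each worker's
    # share in four slices and accumulates, limiting peak GPU memory use.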
as_theano(*args, **kwargs)

Call the function in the master process only, as normal Theano.

Parameters:
  • *args (data) – Normal inputs to the Theano function
  • **kwargs (data) – Normal inputs to the Theano function
build_inputs(*args, force_cast=False, oversize=1.0, minibatch=False, **kwargs)

Convenience method which internally calls synkhronos.data() for each input variable associated with this function. Provide data inputs as if calling the Theano function.

Parameters:
  • *args – data inputs
  • force_cast (bool, optional) – see synkhronos.data()
  • oversize (float [1,2], optional) – see synkhronos.data()
  • minibatch (bool, optional) – see synkhronos.data()
  • **kwargs – data inputs

The kwargs force_cast, oversize, and minibatch are passed to all calls to synkhronos.data().

Returns: data object for function input.
Return type: synkhronos.data_module.Data
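Example (hypothetical; f_train is a Synkhronos function with two inputs, and one Data object per input is assumed to be returned):

    import numpy as np

    x_values = np.random.rand(10000, 784).astype('float32')
    y_values = np.random.randint(0, 10, size=10000).astype('int32')

    x_data, y_data = f_train.build_inputs(x_values, y_values, oversize=1.2)
    loss = f_train(x_data, y_data)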
name

As in Theano functions.

output_modes

Returns the reduce operations used to collect function outputs.

update_modes

Returns the reduce operations used to accumulate updates (applied only when slicing).

synkhronos.data(value=None, var=None, dtype=None, ndim=None, shape=None, minibatch=False, force_cast=False, oversize=1, name=None)

Returns a synkhronos.Data object, for data input to functions. Similar to a Theano variable, a Data object has fixed ndim and dtype. Populating the object with actual data or assigning a shape (which allocates memory) at instantiation is optional.

Parameters:
  • value – Data values to be stored (e.g. numpy array)
  • var (Theano variable) – To infer dtype and ndim
  • dtype – Can specify dtype (if not implied by var)
  • ndim – Can specify ndim (if not implied by var)
  • shape – Can specify shape (if not implied by value)
  • minibatch (bool, optional) – Use for minibatch data inputs (compare to full dataset inputs)
  • force_cast (bool, optional) – If True, force value to specified dtype
  • oversize (int, [1,2], optional) – Factor for OS shared memory allocation in excess of given value or shape
  • name – As in Theano variables
Returns:

used for data input to functions

Return type:

synkhronos.Data

Raises:

TypeError – If dtype and ndim are incompletely specified.
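Example (a minimal sketch):

    import numpy as np
    import synkhronos as synk

    # From existing values (dtype, ndim, and shape inferred from the array):
    x_data = synk.data(value=np.zeros((1000, 784), dtype='float32'))

    # Or as an empty container, to be filled later:
    y_data = synk.data(dtype='int32', ndim=1)
    y_data.set_value(np.arange(1000, dtype='int32'))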

class synkhronos.data_module.Data(ID, dtype, ndim, minibatch=False, name=None)

Type of object required as input to functions (instead of numpy arrays). May also be used as input to some collectives. The underlying memory is OS shared memory, for multi-process access. Presents an interface similar to a numpy array, with some additions and restrictions. The terms “numpy wrapper” and “underlying numpy array” refer to the same object, which is a view onto the underlying memory allocation.

__getitem__(k)

Provided to read from underlying numpy data array, as in array[k]. Full numpy indexing is supported.

__len__()

Returns len() of underlying numpy data array.

__setitem__(k, v)

Provided to write to underlying numpy data array, as in array[k] = v. Full numpy indexing is supported.

alloc_size

Returns the size of the underlying memory allocation (units – number of items). This may be larger than the size of the underlying numpy array, which may occupy only a portion of the allocation (always starting at the same memory address as the allocation).

condition_data(input_data, force_cast=False)

Test the conditioning that would be applied to input data when calling set_value() or synkhronos.data().

Parameters:
  • input_data – e.g., numpy array
  • force_cast (bool, optional) – force data type
Returns:

numpy array of shape and dtype that would be used

Return type:

numpy array

data

Returns the underlying numpy data array. In general, it is not recommended to manipulate this object directly, aside from reading from or writing to it (without changing its shape). It may be passed to other Python processes and used as shared memory.

dtype

Returns data type.

free_memory()

Removes all references to the underlying memory (and numpy wrapper) in the master and workers; this is the only way to shrink the allocation size.

minibatch

Returns whether this data is treated as a minibatch. When using the batch kwarg in a function call, all minibatch data inputs will have their selection indexes shifted so that the lowest overall index present in batch corresponds to index 0. (It is enforced that all data is long enough to meet all requested indexes.)

name

Returns the name (may be None)

ndim

Returns number of dimensions.

set_length(length, oversize=1)

Change length of underlying numpy array. Will induce memory reallocation if necessary (for length larger than current memory).

Parameters:
  • length (int) – New length of underlying numpy array
  • oversize (int, [1,2]) – Used only if reallocating memory

Warning

Currently, memory reallocation loses old data.

set_shape(shape, oversize=1)

Change shape of underlying numpy array. Will induce memory reallocation if necessary (for shape larger than current memory).

Parameters:
  • shape (list, tuple) – New shape of underlying numpy array
  • oversize (int, [1,2]) – Used only if reallocating memory

Warning

Currently, memory reallocation loses old data.

set_value(input_data, force_cast=False, oversize=1)

(Over)Write data values. Change length, reshape, and/or reallocate shared memory if necessary (applies eagerly in workers).

Parameters:
  • input_data – e.g. numpy array, fed into numpy.asarray()
  • force_cast (bool, optional) – force input data to existing dtype of data object, without error or warning
  • oversize (int, [1,2]) – Factor for oversizing the memory allocation relative to the input data size

Oversize applies only to underlying shared memory. The numpy array wrapper will have the exact shape of input_data.
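Example (a sketch of typical Data manipulation; x_data is assumed to be an existing 2-dimensional float32 Data object, and set_length is assumed to act on the 0-th dimension):

    import numpy as np

    x_data.set_value(np.random.rand(512, 10).astype('float32'), oversize=2)
    x_data[:256] = 0.                 # write through the numpy wrapper
    first_row = x_data[0]             # read as with a numpy array
    x_data.set_length(800)            # grow within the oversized allocation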

shape

Returns shape of underlying numpy data array.

size

Returns size of underlying numpy data array.

synkhronos.broadcast(shared_vars, values=None, nccl=True)

Broadcast the master’s values (or optionally the input values) to all GPUs, resulting in the same values for the Theano shared variables on all GPUs.

Parameters:
  • shared_vars (Theano shared variables) – one or list/tuple
  • values (None, optional) – if provided, one value must be given for each shared variable; otherwise the existing value in the master is used
  • nccl (bool, optional) – If True, use NCCL if available
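Example (a sketch; params is assumed to be a list of existing Theano shared variables):

    # After modifying parameter values in the master only (e.g. via
    # Function.as_theano() or a variable's own set_value()), push them to all GPUs:
    synk.broadcast(params)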
synkhronos.scatter(shared_vars, values, batch=None)

Scatter values across all workers. Values are scattered as evenly as possible, by 0-th index. Optional param batch can specify a subset of the input data to be scattered.

Parameters:
  • shared_vars (Theano shared variables) – one or list/tuple
  • values (synk.Data or numpy array) – one value for each variable
  • batch (None, optional) – Slice or list of indexes, selects a subset of the input data (by 0-th index) to scatter
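Example (a sketch; x_dataset is assumed to be a Theano shared variable and full_inputs a synk.Data object or numpy array):

    synk.scatter(x_dataset, full_inputs)                        # split evenly by 0-th index
    synk.scatter(x_dataset, full_inputs, batch=slice(0, 5000))  # scatter only the first 5000 rows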
synkhronos.gather(shared_vars, nd_up=1, nccl=True)

Gather and return values in Theano shared variables from all GPUs. Does not affect the values present in the variables.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • nd_up (int, [0,1] optional) – number of dimensions to add during concatenation of results
  • nccl (bool, optional) – If True, use NCCL if available

Warning

If a Theano shared variable has different shapes on different GPUs (e.g. some are one longer than others), then using NCCL yields slightly corrupted results. (Extra rows will be added or dropped so that the same shape is collected from each GPU.) CPU-based gather always accurately reflects the actual values on each GPU, regardless of shape mismatch.

Returns: gathered values
Return type: array (or tuple)
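Example (a sketch; x_dataset is assumed to be a Theano shared variable previously scattered across GPUs, and the nd_up comments reflect one reading of the parameter description above):

    stacked = synk.gather(x_dataset)            # default nd_up=1: results stacked along a new leading axis
    concat = synk.gather(x_dataset, nd_up=0)    # concatenate along the existing 0-th axis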
synkhronos.all_gather(shared_vars, nccl=True)

Each GPU gathers the values of each variable and overwrites the variable’s data with the gathered result (the number of dimensions cannot change).

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • nccl (bool, optional) – If True, use NCCL if available.
synkhronos.reduce(shared_vars, op='avg', in_place=True, nccl=True)

Reduce the values of the shared variables from all GPUs into the master. Workers’ values are not affected; by default the master’s value is overwritten.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • op (str, optional) – reduce operation (avg, sum, min, max, prod)
  • in_place (bool, optional) – If True, overwrite master variable data, otherwise return new array(s)
  • nccl (bool, optional) – If True, use NCCL if available
Returns:

if in_place==False, results of reduction(s)

Return type:

array (or tuple)

synkhronos.all_reduce(shared_vars, op='avg', nccl=True)

Reduce the values of the shared variables across all GPUs, overwriting the data stored in each GPU with the result.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • op (str, optional) – reduce operation (avg, sum, min, max, prod)
  • nccl (bool, optional) – If True, use NCCL if available
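Example (a sketch of a typical data-parallel training step; f_train, x_data, y_data, and params are illustrative names for objects assumed to exist already):

    loss = f_train(x_data, y_data)        # each GPU applies its updates locally
    synk.all_reduce(params, op="avg")     # then average the parameters across GPUs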
synkhronos.set_value(rank, shared_vars, values, batch=None)

Set the value of Theano shared variable(s) in one GPU.

Parameters:
  • rank (int) – Which GPU to write to
  • shared_vars (Theano shared variable) – one, or list/tuple
  • values (synk.Data, numpy array) – one value for each variable
  • batch (None, optional) – slice or list of indexes, selects subset of input values to use
synkhronos.get_value(rank, shared_vars)

Get the value of Theano shared variable(s) from one GPU.

Parameters:
  • rank (int) – Which GPU to read from
  • shared_vars (Theano shared variable) – one, or list/tuple
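Example (a sketch; w is assumed to be a Theano shared variable):

    w_gpu0 = synk.get_value(0, w)         # read w from GPU 0
    synk.set_value(1, w, w_gpu0)          # write that value into GPU 1 only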
synkhronos.get_lengths(shared_vars)

Get lengths of Theano shared variable(s) from all GPUs.

Parameters:shared_vars (Theano shared variable) – one, or list/tuple
synkhronos.get_shapes(shared_vars)

Get shapes of Theano shared variable(s) from all GPUs.

Parameters:shared_vars (Theano shared variable) – one, or list/tuple