5. API Reference

synkhronos.fork(n_parallel=None, use_gpu=True, master_rank=0, profile_workers=False, max_n_var=1000, max_dim=16)

Forks a Python process for each additional GPU and initializes the GPUs. Call this before building any Theano GPU variables or Synkhronos functions.

Parameters:
  • n_parallel (None, optional) – Number of GPUs to use (default uses all)
  • use_gpu (bool, optional) – Inactive (possibly future CPU-only mode)
  • master_rank (int, optional) – GPU to use in master process
  • profile_workers (bool, optional) – If True, records cProfiles of workers (see synkhronos/worker.py for details)
  • max_n_var (int, optional) – Max number of variables in a function call
  • max_dim (int, optional) – Max number of dimensions of any variable
Returns:

The number of GPUs in use.

Return type:

int

Raises:
  • NotImplementedError – If use_gpu==False
  • RuntimeError – If already forked.
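Example (a minimal sketch, assuming the package is imported as synk):

    import synkhronos as synk

    n_gpus = synk.fork()   # fork one worker process per additional GPU
    print("Using {} GPUs".format(n_gpus))
    # Build Theano GPU variables and Synkhronos functions only after this point.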
synkhronos.close()

Close workers and join their processes. Called automatically on exit.

synkhronos.function(inputs, outputs=None, bcast_inputs=None, updates=None, givens=None, sliceable_shareds=None, **kwargs)

Replacement for theano.function(), with a similar interface. Builds underlying Theano functions, including support for function slicing.

Parameters:
  • inputs – as in Theano, to be scattered among workers
  • outputs – as in Theano, with option to specify reduce operation (see notes below)
  • bcast_inputs – as inputs in Theano, to be broadcast to all workers
  • updates – as in Theano, with option to specify reduce operation (see notes below)
  • givens – as in Theano
  • sliceable_shareds – any implicit inputs (Theano shared variables) acting as data-parallel data (i.e. subject to the kwarg batch_s and/or to function slicing) must be listed here
  • **kwargs – passed on to all internal calls to theano.function()
Reduce Operations:

Outputs: May be specified simply as Theano tensor variables, as in normal Theano, or as two-tuples, as in (var, reduce-op), where reduce-op can be: “avg”, “sum”, “max”, “min”, “prod”, or None. Default is “avg”.

Updates: May be specified as a list of two-tuples, as in normal Theano, or may include triples, as in (var, update, reduce-op). Unlike for outputs, the reduce-op here applies only when using function slicing. Every slice is computed using the original values, and the update is accumulated over the slices. (This may impose some limits on the form of the update expression.) At the end of the function call, all updates are applied only locally, within each worker. This gives the user clear control over when to communicate.

Returns:

callable object, replacing a theano.Function

Return type:

synkhronos.function_module.Function

Raises:
  • RuntimeError – If Synkhronos is not yet forked, or if already distributed
  • TypeError – If incorrect format for arguments.
  • ValueError – If an entry in sliceable_shareds is not used in the function, or if an invalid reduce operation is requested.
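Example (a hypothetical sketch of building a function with explicit reduce operations; the variable names are illustrative, and a floatX of float32 is assumed):

    import numpy as np
    import theano
    import theano.tensor as T
    import synkhronos as synk

    synk.fork()

    x = T.fmatrix('x')
    w = theano.shared(np.zeros((8, 1), dtype='float32'), name='w')
    loss = T.sqr(x.dot(w)).mean()
    grad = theano.grad(loss, w)
    lr = np.float32(0.01)

    f_train = synk.function(
        inputs=[x],
        outputs=(loss, "avg"),                 # reduce outputs across GPUs by averaging (the default)
        updates=[(w, w - lr * grad, "avg")])   # the reduce-op here matters only when slicing
    # (call synk.distribute() before using f_train)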
synkhronos.distribute()

Replicates all Synkhronos functions and their Theano shared variables in worker processes / GPUs. It must be called after building the last Synkhronos function and before calling any Synkhronos function.

It pickles all underlying Theano functions into one file, which the workers unpickle. All Theano shared variable data is included, and correspondences between variables across functions are preserved. The pickle file is automatically deleted by a worker. The default file location is the directory synkhronos/pkl/, but this can be changed by modifying PKL_PATH in synkhronos/util.py.

Raises: RuntimeError – If not yet forked or if already distributed.
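Example (a sketch of the overall workflow; the function and data shown are illustrative):

    import numpy as np
    import theano.tensor as T
    import synkhronos as synk

    synk.fork()                               # 1. fork worker processes
    x = T.fmatrix('x')
    f_sum = synk.function(inputs=[x], outputs=(x.sum(), "sum"))
    synk.distribute()                         # 2. replicate functions and variables to workers

    x_data = synk.data(value=np.ones((1000, 4), dtype='float32'))
    total = f_sum(x_data)                     # 3. scatters x_data, sums the partial results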
class synkhronos.function_module.Function(inputs, bcast_inputs, to_cpu, return_list=True, **kwargs)

Class of instances returned by synkhronos.function().

__call__(*args, output_subset=None, batch=None, batch_s=None, num_slices=1, **kwargs)

Callable as in Theano function.

When called, a Synkhronos function:

  1. Assigns input data evenly across all GPUs,
  2. Signals to workers to start and which function to call,
  3. Calls the underlying Theano function on assigned data subset,
  4. Collects results from workers and returns them.
Parameters:
  • *args (Data) – Normal data inputs to Theano function
  • output_subset – as in Theano
  • batch – indexes to select from scattering input data (see notes)
  • batch_s – indexes to select from scattered implicit inputs (see notes)
  • num_slices (int) – call the function over this many slices of the selected, scattered data and accumulate the results (to avoid out-of-memory errors)
  • **kwargs (Data) – Normal data inputs to Theano function
Batching:

The kwarg batch can be of types: (int, slice, list (of ints), numpy array (1-d, int)). It applies before scattering, to the whole input data set. If type int, this acts as data[:int].

The kwarg batch_s can be of type (slice, list (of ints), numpy array (1-d, int)) or a list of all the same type (one of those three), with one entry for each GPU. It applies after scattering, to data already residing on the GPU. If only one of the above types is provided, rather than a list of them, it is used in all GPUs.

In both batch and batch_s, fully general slices are not supported; the start and stop fields must be ints, and step must be None.

Slicing:
Function slicing by the num_slices kwarg applies within each worker, after individual worker data assignment. Results are accumulated within each worker and reduced only once at the end. Likewise, any updates are computed and accumulated using the original variable values, and the updates are applied only once at the end.
Raises: RuntimeError – If not distributed or if Synkhronos is closed.
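Example (a hypothetical call; f_train, x_data, and y_data are assumed to have been created already via synkhronos.function(), synkhronos.distribute(), and synkhronos.data()):

    import numpy as np

    batch_idxs = np.random.randint(0, 10000, size=256)    # select one minibatch
    loss = f_train(x_data, y_data, batch=batch_idxs, num_slices=4)
    # batch selects rows before scattering; num_slices=4 computes each worker's
    # share in four slices and accumulates, limiting peak GPU memory use.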
as_theano(*args, **kwargs)

Call the function in the master process only, as normal Theano.

Parameters:
  • *args (data) – Normal inputs to the Theano function
  • **kwargs (data) – Normal inputs to the Theano function
build_inputs(*args, force_cast=False, oversize=1.0, minibatch=False, **kwargs)

Convenience method which internally calls synkhronos.data() for each input variable associated with this function. Provide data inputs as if calling the Theano function.

Parameters:
  • *args – data inputs
  • force_cast (bool, optional) – see synkhronos.data()
  • oversize (float [1,2], optional) – see synkhronos.data()
  • minibatch (bool, optional) – see synkhronos.data()
  • **kwargs – data inputs

The kwargs force_cast, oversize, and minibatch are passed to all calls to synkhronos.data().

Returns: data object for function input.
Return type: synkhronos.data_module.Data
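Example (hypothetical; f_train is a Synkhronos function with two inputs, and one Data object per input is assumed to be returned):

    import numpy as np

    x_values = np.random.rand(10000, 784).astype('float32')
    y_values = np.random.randint(0, 10, size=10000).astype('int32')

    x_data, y_data = f_train.build_inputs(x_values, y_values, oversize=1.2)
    loss = f_train(x_data, y_data)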
name

As in Theano functions.

output_modes

Returns the reduce operations used to collect function outputs.

update_modes

Returns the reduce operations used to accumulate updates (applied only when slicing).

synkhronos.data(value=None, var=None, dtype=None, ndim=None, shape=None, minibatch=False, force_cast=False, oversize=1, name=None)

Returns a synkhronos.Data object, for data input to functions. Similar to a Theano variable, a Data object has fixed ndim and dtype. Populating the object with actual data or assigning a shape (which allocates memory) at instantiation is optional.

Parameters:
  • value – Data values to be stored (e.g. numpy array)
  • var (Theano variable) – To infer dtype and ndim
  • dtype – Can specify dtype (if not implied by var)
  • ndim – Can specify ndim (if not implied by var)
  • shape – Can specify shape (if not implied by value)
  • minibatch (bool, optional) – Use for minibatch data inputs (compare to full dataset inputs)
  • force_cast (bool, optional) – If True, force value to specified dtype
  • oversize (int, [1,2], optional) – Factor for OS shared memory allocation in excess of given value or shape
  • name – As in Theano variables
Returns:

used for data input to functions

Return type:

synkhronos.Data

Raises:

TypeError – If dtype and ndim are incompletely specified.
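Example (a minimal sketch):

    import numpy as np
    import synkhronos as synk

    # From existing values (dtype, ndim, and shape inferred from the array):
    x_data = synk.data(value=np.zeros((1000, 784), dtype='float32'))

    # Or as an empty container, to be filled later:
    y_data = synk.data(dtype='int32', ndim=1)
    y_data.set_value(np.arange(1000, dtype='int32'))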

class synkhronos.data_module.Data(ID, dtype, ndim, minibatch=False, name=None)

Type of object required as input to functions (instead of numpy arrays). May also be used as input to some collectives. The underlying memory is OS shared memory, for multi-process access. Presents an interface similar to a numpy array, with some additions and restrictions. The terms “numpy wrapper” and “underlying numpy array” refer to the same object, which is a view onto the underlying memory allocation.

__getitem__(k)

Provided to read from underlying numpy data array, as in array[k]. Full numpy indexing is supported.

__len__()

Returns len() of underlying numpy data array.

__setitem__(k, v)

Provided to write to underlying numpy data array, as in array[k] = v. Full numpy indexing is supported.

alloc_size

Returns the size of the underlying memory allocation (units – number of items). This may be larger than the size of the underlying numpy array, which may occupy only a portion of the allocation (always starting at the same memory address as the allocation).

condition_data(input_data, force_cast=False)

Test the conditioning that would be applied to input data when calling set_value() or synkhronos.data().

Parameters:
  • input_data – e.g., numpy array
  • force_cast (bool, optional) – force data type
Returns:

numpy array of shape and dtype that would be used

Return type:

numpy array

data

Returns the underlying numpy data array. In general, it is not recommended to manipulate this object directly, aside from reading from or writing to it (without changing its shape). It may be passed to other Python processes and used as shared memory.

dtype

Returns data type.

free_memory()

Removes all references to the underlying memory (and numpy wrapper) in the master and workers; this is the only way to shrink the allocation size.

minibatch

Returns whether this data is treated as a minibatch. When using the batch kwarg in a function call, all minibatch data inputs will have their selection indexes shifted so that the lowest overall index present in batch corresponds to index 0. (It is enforced that all data is long enough to meet all requested indexes.)

name

Returns the name (may be None)

ndim

Returns number of dimensions.

set_length(length, oversize=1)

Change length of underlying numpy array. Will induce memory reallocation if necessary (for length larger than current memory).

Parameters:
  • length (int) – New length of underlying numpy array
  • oversize (int, [1,2]) – Used only if reallocating memory

Warning

Currently, memory reallocation loses old data.

set_shape(shape, oversize=1)

Change shape of underlying numpy array. Will induce memory reallocation if necessary (for shape larger than current memory).

Parameters:
  • shape (list, tuple) – New shape of underlying numpy array
  • oversize (int, [1,2]) – Used only if reallocating memory

Warning

Currently, memory reallocation loses old data.

set_value(input_data, force_cast=False, oversize=1)

(Over)Write data values. Change length, reshape, and/or reallocate shared memory if necessary (applies eagerly in workers).

Parameters:
  • input_data – e.g. numpy array, fed into numpy.asarray()
  • force_cast (bool, optional) – force input data to existing dtype of data object, without error or warning
  • oversize (int, [1,2]) – Factor for oversizing the memory allocation relative to the input data size

Oversize applies only to underlying shared memory. The numpy array wrapper will have the exact shape of input_data.
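Example (a sketch of typical Data manipulation; x_data is assumed to be an existing 2-dimensional float32 Data object, and set_length is assumed to act on the 0-th dimension):

    import numpy as np

    x_data.set_value(np.random.rand(512, 10).astype('float32'), oversize=2)
    x_data[:256] = 0.                 # write through the numpy wrapper
    first_row = x_data[0]             # read as with a numpy array
    x_data.set_length(800)            # grow within the oversized allocation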

shape

Returns shape of underlying numpy data array.

size

Returns size of underlying numpy data array.

synkhronos.broadcast(shared_vars, values=None, nccl=True)

Broadcast the master’s values (or optionally the input values) to all GPUs, resulting in the same values for the Theano shared variables on all GPUs.

Parameters:
  • shared_vars (Theano shared variables) – one or list/tuple
  • values (None, optional) – if provided, one value must be given for each shared variable; otherwise the existing value in the master is used
  • nccl (bool, optional) – If True, use NCCL if available
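Example (a sketch; params is assumed to be a list of existing Theano shared variables):

    # After modifying parameter values in the master only (e.g. via
    # Function.as_theano() or a variable's own set_value()), push them to all GPUs:
    synk.broadcast(params)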
synkhronos.scatter(shared_vars, values, batch=None)

Scatter values across all workers. Values are scattered as evenly as possible, by 0-th index. Optional param batch can specify a subset of the input data to be scattered.

Parameters:
  • shared_vars (Theano shared variables) – one or list/tuple
  • values (synk.Data or numpy array) – one value for each variable
  • batch (None, optional) – Slice or list of indexes, selects a subset of the input data (by 0-th index) to scatter
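Example (a sketch; x_dataset is assumed to be a Theano shared variable and full_inputs a synk.Data object or numpy array):

    synk.scatter(x_dataset, full_inputs)                        # split evenly by 0-th index
    synk.scatter(x_dataset, full_inputs, batch=slice(0, 5000))  # scatter only the first 5000 rows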
synkhronos.gather(shared_vars, nd_up=1, nccl=True)

Gather and return values in Theano shared variables from all GPUs. Does not affect the values present in the variables.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • nd_up (int, [0,1] optional) – number of dimensions to add during concatenation of results
  • nccl (bool, optional) – If True, use NCCL if available

Warning

If a Theano shared variable has different shapes on different GPUs (e.g. some are one longer than others), then using NCCL yields slightly corrupted results. (Extra rows will be added or dropped so that the same shape is collected from each GPU.) CPU-based gather always accurately reflects the actual values on each GPU, regardless of shape mismatch.

Returns: gathered values
Return type: array (or tuple)
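Example (a sketch; x_dataset is assumed to be a Theano shared variable previously scattered across GPUs, and the nd_up comments reflect one reading of the parameter description above):

    stacked = synk.gather(x_dataset)            # default nd_up=1: results stacked along a new leading axis
    concat = synk.gather(x_dataset, nd_up=0)    # concatenate along the existing 0-th axis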
synkhronos.all_gather(shared_vars, nccl=True)

Each GPU gathers the values of each variable and overwrites the variable’s data with the gathered result (the number of dimensions cannot change).

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • nccl (bool, optional) – If True, use NCCL if available.
synkhronos.reduce(shared_vars, op='avg', in_place=True, nccl=True)

Reduce the values of the shared variables from all GPUs into the master. Workers’ values are not affected; by default the master’s value is overwritten.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • op (str, optional) – reduce operation (avg, sum, min, max, prod)
  • in_place (bool, optional) – If True, overwrite master variable data, otherwise return new array(s)
  • nccl (bool, optional) – If True, use NCCL if available
Returns:

if in_place==False, results of reduction(s)

Return type:

array (or tuple)

synkhronos.all_reduce(shared_vars, op='avg', nccl=True)

Reduce the values of the shared variables across all GPUs, overwriting the data stored in each GPU with the result.

Parameters:
  • shared_vars (Theano shared variable) – one, or list/tuple
  • op (str, optional) – reduce operation (avg, sum, min, max, prod)
  • nccl (bool, optional) – If True, use NCCL if available
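Example (a sketch of a typical data-parallel training step; f_train, x_data, y_data, and params are illustrative names for objects assumed to exist already):

    loss = f_train(x_data, y_data)        # each GPU applies its updates locally
    synk.all_reduce(params, op="avg")     # then average the parameters across GPUs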
synkhronos.set_value(rank, shared_vars, values, batch=None)

Set the value of Theano shared variable(s) in one GPU.

Parameters:
  • rank (int) – Which GPU to write to
  • shared_vars (Theano shared variable) – one, or list/tuple
  • values (synk.Data, numpy array) – one value for each variable
  • batch (None, optional) – slice or list of indexes, selects subset of input values to use
synkhronos.get_value(rank, shared_vars)

Get the value of Theano shared variable(s) from one GPU.

Parameters:
  • rank (int) – Which GPU to read from
  • shared_vars (Theano shared variable) – one, or list/tuple
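Example (a sketch; w is assumed to be a Theano shared variable):

    w_gpu0 = synk.get_value(0, w)         # read w from GPU 0
    synk.set_value(1, w, w_gpu0)          # write that value into GPU 1 only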
synkhronos.get_lengths(shared_vars)

Get lengths of Theano shared variable(s) from all GPUs.

Parameters:shared_vars (Theano shared variable) – one, or list/tuple
synkhronos.get_shapes(shared_vars)

Get shapes of Theano shared variable(s) from all GPUs.

Parameters:shared_vars (Theano shared variable) – one, or list/tuple