5. API Reference
synkhronos.fork(n_parallel=None, use_gpu=True, master_rank=0, profile_workers=False, max_n_var=1000, max_dim=16)
Forks a Python process for each additional GPU and initializes the GPUs. Call before building any Theano GPU variables or Synkhronos functions.
Parameters: - n_parallel (int or None, optional) – Number of GPUs to use (default: use all)
- use_gpu (bool, optional) – Inactive (reserved for a possible future CPU-only mode)
- master_rank (int, optional) – GPU to use in the master process
- profile_workers (bool, optional) – If True, records cProfiles of workers (see synkhronos/worker.py for details)
- max_n_var (int, optional) – Maximum number of variables in a function call
- max_dim (int, optional) – Maximum number of dimensions of any variable
Returns: Number of GPUs in use.
Return type: int
Raises: - NotImplementedError – If use_gpu == False
- RuntimeError – If already forked.
synkhronos.close()
Closes workers and joins their processes. Called automatically on exit.
synkhronos.function(inputs, outputs=None, bcast_inputs=None, updates=None, givens=None, sliceable_shareds=None, **kwargs)
Replacement for theano.function(), with a similar interface. Builds the underlying Theano functions, including support for function slicing.
Parameters: - inputs – as in Theano; to be scattered among workers
- outputs – as in Theano, with the option to specify a reduce operation (see notes below)
- bcast_inputs – as inputs in Theano; to be broadcast to all workers
- updates – as in Theano, with the option to specify a reduce operation (see notes below)
- givens – as in Theano
- sliceable_shareds – any implicit inputs (Theano shared variables) acting as data-parallel data (i.e., to be subjected to the kwarg batch_s and/or to function slicing) must be listed here
- **kwargs – passed on to all internal calls to theano.function()
Reduce Operations:
Outputs: May be specified simply as Theano tensor variables, as in normal Theano, or as two-tuples, (var, reduce-op), where reduce-op can be: "avg", "sum", "max", "min", "prod", or None. Default is "avg".
Updates: May be specified as a list of two-tuples, as in normal Theano, or may include triples, (var, update, reduce-op). Unlike for outputs, the reduce-op here applies only when using function slicing. Every slice is computed using the original values, and the update is accumulated over the slices. (This may impose some limits on the form of the update expression.) At the end of the function call, all updates are applied only locally, within each worker. This gives the user clear control over when to communicate.
Returns: Callable object, replacing a Theano function.
Return type: synkhronos.function_module.Function
Raises: - RuntimeError – If Synkhronos is not yet forked, or if already distributed
- TypeError – If incorrect format for arguments
- ValueError – If an entry in sliceable_shareds is not used in the function, or if an invalid reduce operation is requested.
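The per-output reduce operations listed above can be pictured in plain Python. This is an illustrative sketch only, not Synkhronos code: it shows how one output value collected from each worker would be combined under each documented reduce-op (the worker results are hypothetical numbers).

```python
# Illustrative sketch (not synkhronos code): how per-worker outputs are
# combined under each documented reduce-op.
import math

def reduce_outputs(worker_results, op="avg"):
    """Combine one output collected from every worker, per reduce-op."""
    if op is None:
        return worker_results          # no reduction: keep all values
    if op == "avg":
        return sum(worker_results) / len(worker_results)
    if op == "sum":
        return sum(worker_results)
    if op == "max":
        return max(worker_results)
    if op == "min":
        return min(worker_results)
    if op == "prod":
        return math.prod(worker_results)
    raise ValueError("invalid reduce operation: {}".format(op))

losses = [0.8, 1.2, 1.0, 1.0]          # e.g., one mean loss per GPU
print(reduce_outputs(losses))          # avg -> 1.0
print(reduce_outputs(losses, "max"))   # -> 1.2
```

The default "avg" is the natural choice when each worker computes a mean over its own data shard of (roughly) equal size.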
synkhronos.distribute()
Replicates all Synkhronos functions and their Theano shared variables in the worker processes / GPUs. It must be called after building the last Synkhronos function and before calling any Synkhronos function.
It pickles all underlying Theano functions into one file, which the workers unpickle. All Theano shared variable data is included, and correspondences between variables across functions are preserved. The pickle file is automatically deleted by a worker. The default file location is the directory synkhronos/pkl/, but this can be changed by modifying PKL_PATH in synkhronos/util.py.
Raises: RuntimeError – If not yet forked or if already distributed.
class synkhronos.function_module.Function(inputs, bcast_inputs, to_cpu, return_list=True, **kwargs)
Class of instances returned by synkhronos.function().

__call__(*args, output_subset=None, batch=None, batch_s=None, num_slices=1, **kwargs)
Callable as in a Theano function. When called, a Synkhronos function:
- Assigns input data evenly across all GPUs,
- Signals to workers to start and which function to call,
- Calls the underlying Theano function on the assigned data subset,
- Collects results from workers and returns them.
Parameters: - *args (Data) – Normal data inputs to the Theano function
- output_subset – as in Theano
- batch – indexes to select from scattering input data (see notes)
- batch_s – indexes to select from scattered implicit inputs (see notes)
- num_slices (int) – call the function over this many slices of the selected, scattered data and accumulate the results (avoids out-of-memory)
- **kwargs (Data) – Normal data inputs to the Theano function
Batching:
The kwarg batch can be of type int, slice, list (of ints), or numpy array (1-d, int). It applies before scattering, to the whole input data set. If of type int, it acts as data[:int].
The kwarg batch_s can be of type slice, list (of ints), or numpy array (1-d, int), or a list of entries all of one of those three types, with one entry for each GPU. It applies after scattering, to data already residing on the GPUs. If only one of the above types is provided, rather than a list of them, it is used in all GPUs.
In both batch and batch_s, full slice types are not supported; the start and stop fields must be ints and step must be None.
Slicing:
Function slicing via the num_slices kwarg applies within each worker, after individual worker data assignment. Results are accumulated within each worker and reduced only once at the end. Likewise, any updates are computed and accumulated using the original variable values, and the updates are applied only once at the end.
Raises: RuntimeError – If not distributed or if Synkhronos is closed.
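The batch selection and even 0-th-index scattering described above can be sketched in plain Python. This is an illustrative sketch only, not Synkhronos code: the helper names and the 3-GPU setup are hypothetical, and the real library operates on numpy arrays and GPU-resident data rather than lists.

```python
# Illustrative sketch (not synkhronos code): the `batch` kwarg applies
# before scattering; the selected data is then split as evenly as
# possible across GPUs by 0-th index.

def select_batch(data, batch=None):
    """Apply the `batch` kwarg to the whole input data set."""
    if batch is None:
        return data
    if isinstance(batch, int):
        return data[:batch]            # int acts as data[:int]
    if isinstance(batch, slice):
        return data[batch]             # start/stop ints, step None
    return [data[i] for i in batch]    # list of ints

def scatter_evenly(data, n_gpu):
    """Split the selected data as evenly as possible across GPUs."""
    base, extra = divmod(len(data), n_gpu)
    splits, start = [], 0
    for rank in range(n_gpu):
        stop = start + base + (1 if rank < extra else 0)
        splits.append(data[start:stop])
        start = stop
    return splits

data = list(range(10))                      # hypothetical input data
selected = select_batch(data, batch=8)      # -> [0, 1, ..., 7]
print(scatter_evenly(selected, n_gpu=3))    # -> [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Function slicing (num_slices) would then further subdivide each per-GPU split inside that worker, accumulating the results locally.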
as_theano(*args, **kwargs)
Calls the function in the master process only, as a normal Theano function.
build_inputs(*args, force_cast=False, oversize=1.0, minibatch=False, **kwargs)
Convenience method which internally calls synkhronos.data() for each input variable associated with this function. Provide data inputs as if calling the Theano function.
Parameters: - *args – data inputs
- force_cast (bool, optional) – see synkhronos.data()
- oversize (float [1,2], optional) – see synkhronos.data()
- minibatch (bool, optional) – see synkhronos.data()
- **kwargs – data inputs
The kwargs force_cast, oversize, and minibatch are passed to all calls to synkhronos.data().
Returns: Data object(s) for function input.
Return type: synkhronos.data_module.Data
name
As in Theano functions.

output_modes
Returns the reduce operations used to collect function outputs.

update_modes
Returns the reduce operations used to accumulate updates (applied only when slicing).
synkhronos.data(value=None, var=None, dtype=None, ndim=None, shape=None, minibatch=False, force_cast=False, oversize=1, name=None)
Returns a synkhronos.Data object, for data input to functions. Like a Theano variable, a Data object has a fixed ndim and dtype. Populating the object with actual data, or assigning a shape (which induces memory allocation), at instantiation is optional.
Parameters: - value – Data values to be stored (e.g., a numpy array)
- var (Theano variable) – To infer dtype and ndim
- dtype – Can specify dtype (if not implied by var)
- ndim – Can specify ndim (if not implied by var)
- shape – Can specify shape (if not implied by value)
- minibatch (bool, optional) – Use for minibatch data inputs (as opposed to full-dataset inputs)
- force_cast (bool, optional) – If True, force value to the specified dtype
- oversize (float [1,2], optional) – Factor for OS shared memory allocation in excess of the given value or shape
- name – As in Theano variables
Returns: Object used for data input to functions.
Return type: synkhronos.Data
Raises: TypeError – If the specification of dtype and ndim is incomplete.
class synkhronos.data_module.Data(ID, dtype, ndim, minibatch=False, name=None)
Type of object required as input to functions (instead of numpy arrays). May also be used as input to some collectives. The underlying memory is OS shared memory, for multi-process access. Presents a similar interface to a numpy array, with some additions and restrictions. The terms "numpy wrapper" and "underlying numpy array" refer to the same object, which is a view onto the underlying memory allocation.
__getitem__(k)
Reads from the underlying numpy data array, as in array[k]. Full numpy indexing is supported.

__len__()
Returns len() of the underlying numpy data array.

__setitem__(k, v)
Writes to the underlying numpy data array, as in array[k] = v. Full numpy indexing is supported.

alloc_size
Returns the size of the underlying memory allocation (in number of items). This may be larger than the size of the underlying numpy array, which may occupy only a portion of the allocation (always starting at the same memory address as the allocation).
condition_data(input_data, force_cast=False)
Tests the conditioning that would be applied to input data when calling set_value or synkhronos.data().
Parameters: - input_data – e.g., a numpy array
- force_cast (bool, optional) – force the data type
Returns: Numpy array of the shape and dtype that would be used.
Return type: numpy array
data
Returns the underlying numpy data array. In general, it is not recommended to manipulate this object directly, aside from reading from or writing to it (without changing its shape). It may be passed to other Python processes and used as shared memory.

dtype
Returns the data type.

free_memory()
Removes all references to the underlying memory (and numpy wrapper) in the master and workers; this is the only way to shrink the allocation size.

minibatch
Returns whether this data is treated as a minibatch. When using the batch kwarg in a function call, all minibatch data inputs have their selection indexes shifted so that the lowest overall index present in batch corresponds to index 0. (It is enforced that all data is long enough to cover all requested indexes.)

name
Returns the name (may be None).

ndim
Returns the number of dimensions.
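The index-shifting rule for minibatch data can be sketched in plain Python. This is an illustrative sketch only, not Synkhronos code, and the example indexes are hypothetical: the lowest index present in batch is subtracted from every index, so a minibatch-sized Data object only needs to cover the shifted range.

```python
# Illustrative sketch (not synkhronos code): indexes for minibatch data
# inputs are shifted so the lowest overall index in `batch` maps to 0.

def shift_for_minibatch(batch):
    """Shift selection indexes so the lowest index becomes 0."""
    low = min(batch)
    return [i - low for i in batch]

batch = [40, 41, 42, 45]              # hypothetical full-dataset indexes
print(shift_for_minibatch(batch))     # -> [0, 1, 2, 5]
```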
set_length(length, oversize=1)
Changes the length of the underlying numpy array. Induces memory reallocation if necessary (i.e., for a length larger than the current allocation).
Parameters: - length (int) – New length of the underlying numpy array
- oversize (float [1,2]) – Used only if reallocating memory
Warning: Currently, memory reallocation loses the old data.

set_shape(shape, oversize=1)
Changes the shape of the underlying numpy array. Induces memory reallocation if necessary (i.e., for a shape larger than the current allocation).
Parameters: - shape (list, tuple) – New shape of the underlying numpy array
- oversize (float [1,2]) – Used only if reallocating memory
Warning: Currently, memory reallocation loses the old data.
set_value(input_data, force_cast=False, oversize=1)
(Over)writes the data values. Changes the length, reshapes, and/or reallocates shared memory if necessary (applied eagerly in workers).
Parameters: - input_data – e.g., a numpy array; fed into numpy.asarray()
- force_cast (bool, optional) – Force the input data to the existing dtype of the data object, without error or warning
- oversize (float [1,2]) – Factor for oversizing the memory allocation relative to the input data size
Oversize applies only to the underlying shared memory. The numpy array wrapper will have the exact shape of input_data.

shape
Returns the shape of the underlying numpy data array.

size
Returns the size of the underlying numpy data array.
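The relationship between oversize, alloc_size, and the exact-shape numpy wrapper can be sketched in plain Python. This is an illustrative sketch only, not Synkhronos code: the function name and the exact rounding are assumptions, but it shows the documented idea that the allocation may exceed the wrapper so later growth (e.g., via set_length) need not reallocate.

```python
# Illustrative sketch (not synkhronos code): `oversize` in [1, 2] sizes
# the OS shared-memory allocation, while the numpy wrapper keeps the
# exact data size. Numbers are hypothetical.
import math

def plan_allocation(data_size, oversize=1.0):
    """Return (alloc_size, wrapper_size) in number of items."""
    if not 1.0 <= oversize <= 2.0:
        raise ValueError("oversize must be in [1, 2]")
    alloc_size = math.ceil(data_size * oversize)   # excess capacity
    return alloc_size, data_size                   # wrapper stays exact

alloc, wrapper = plan_allocation(1000, oversize=1.5)
print(alloc, wrapper)   # -> 1500 1000
```

With spare capacity in the allocation, a later length increase up to alloc_size can reuse the existing shared memory instead of reallocating (which currently loses the old data).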
synkhronos.broadcast(shared_vars, values=None, nccl=True)
Broadcasts the master's values (or, optionally, input values) to all GPUs, resulting in the same values for the Theano shared variables on all GPUs.
Parameters: - shared_vars (Theano shared variables) – one, or a list/tuple
- values (None, optional) – if included, must provide one for each shared variable input; otherwise the existing value in the master is used
- nccl (bool, optional) – if True, use NCCL if available
synkhronos.scatter(shared_vars, values, batch=None)
Scatters values across all workers. Values are scattered as evenly as possible, by 0-th index. The optional param batch can specify a subset of the input data to be scattered.
Parameters: - shared_vars (Theano shared variables) – one, or a list/tuple
- values (synk.Data or numpy array) – one value for each variable
- batch (None, optional) – slice or list of indexes; selects a subset of the input data (by 0-th index) to scatter
synkhronos.gather(shared_vars, nd_up=1, nccl=True)
Gathers and returns the values of Theano shared variables from all GPUs. Does not affect the values present in the variables.
Parameters: - shared_vars (Theano shared variable) – one, or a list/tuple
- nd_up (int, [0,1], optional) – number of dimensions to add during concatenation of the results
- nccl (bool, optional) – if True, use NCCL if available
Warning: If a Theano shared variable has different shapes on different GPUs (e.g., some have length one longer than others), then using NCCL yields slightly corrupted results. (Extra rows will be added or left off so that the same shape is collected from each GPU.) CPU-based gather always shows the actual values on each GPU accurately, regardless of shape mismatch.
Returns: Gathered values.
Return type: array (or tuple)
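One plausible reading of the nd_up kwarg, assumed here for illustration only (not confirmed by the source, and not Synkhronos code): nd_up=1 adds a leading "GPU" dimension (a stack of per-GPU arrays), while nd_up=0 concatenates the per-GPU arrays along the existing 0-th axis.

```python
# Illustrative sketch (not synkhronos code) of an assumed `nd_up`
# semantics; per-GPU values are hypothetical nested lists standing in
# for arrays.

def gather_values(per_gpu_values, nd_up=1):
    if nd_up == 1:
        return per_gpu_values                               # new leading dim
    if nd_up == 0:
        return [row for v in per_gpu_values for row in v]   # concatenate
    raise ValueError("nd_up must be 0 or 1")

per_gpu = [[[1, 2], [3, 4]], [[5, 6]]]   # 2 GPUs, uneven 2-D shards
print(gather_values(per_gpu, nd_up=0))   # -> [[1, 2], [3, 4], [5, 6]]
```

Note that with nd_up=1, uneven per-GPU lengths (as in the warning above) could not form a regular array, which is consistent with NCCL padding or truncating rows to a common shape.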
synkhronos.all_gather(shared_vars, nccl=True)
All GPUs gather the values of each variable, and each variable's data is overwritten with the gathered data (the ndim cannot change).
Parameters: - shared_vars (Theano shared variable) – one, or a list/tuple
- nccl (bool, optional) – if True, use NCCL if available
synkhronos.reduce(shared_vars, op='avg', in_place=True, nccl=True)
Reduces the values of the shared variables from all GPUs into the master. The workers' values are not affected; by default, the master's value is overwritten.
Parameters: - shared_vars (Theano shared variable) – one, or a list/tuple
- op (str, optional) – reduce operation (avg, sum, min, max, prod)
- in_place (bool, optional) – if True, overwrite the master variable data; otherwise return new array(s)
- nccl (bool, optional) – if True, use NCCL if available
Returns: If in_place == False, the result(s) of the reduction(s).
Return type: array (or tuple)
synkhronos.all_reduce(shared_vars, op='avg', nccl=True)
Reduces the values of the shared variables across all GPUs, overwriting the data stored on each GPU with the result.
Parameters: - shared_vars (Theano shared variable) – one, or a list/tuple
- op (str, optional) – reduce operation (avg, sum, min, max, prod)
- nccl (bool, optional) – if True, use NCCL if available
synkhronos.set_value(rank, shared_vars, values, batch=None)
Sets the value of Theano shared variable(s) on one GPU.
Parameters: - rank (int) – which GPU to write to
- shared_vars (Theano shared variable) – one, or a list/tuple
- values (synk.Data, numpy array) – one value for each variable
- batch (None, optional) – slice or list of indexes; selects a subset of the input values to use
synkhronos.get_value(rank, shared_vars)
Gets the value of Theano shared variable(s) from one GPU.
Parameters: - rank (int) – which GPU to read from
- shared_vars (Theano shared variable) – one, or a list/tuple
synkhronos.get_lengths(shared_vars)
Gets the lengths of Theano shared variable(s) from all GPUs.
Parameters: shared_vars (Theano shared variable) – one, or a list/tuple
synkhronos.get_shapes(shared_vars)
Gets the shapes of Theano shared variable(s) from all GPUs.
Parameters: shared_vars (Theano shared variable) – one, or a list/tuple