datamatrix.functional
A set of functions and decorators for functional programming. This module is typically imported as fnc for brevity:
from datamatrix import functional as fnc
What is functional programming?
Functional programming is a style of programming that is characterized by the following:
- Lack of statements: In its purest form, functional programming does not use any statements. Statements are things like assignments (e.g. x = 1), for loops, if statements, etc. Instead of statements, functional programs are chains of function calls.
- Short functions: In the purest form of functional programming, each function is a single expression. In Python, this can be implemented through lambda expressions.
- Referential transparency: Functions are referentially transparent when they always return the same result given the same set of arguments (i.e. they are stateless), and when they do not alter the state of the program (i.e. they have no side effects).
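To make the three characteristics above concrete, the following sketch contrasts an imperative function with a functional equivalent, using only the Python standard library. The function names are made up for illustration and are not part of datamatrix.

```python
from functools import reduce


# Imperative style: relies on statements (assignment, for, if) and a
# running total that is updated step by step.
def sum_even_squares_imperative(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


# Functional style: a single expression built from chained function calls.
# It has no side effects and always returns the same result for the same
# input, so it is referentially transparent.
sum_even_squares = lambda numbers: reduce(
    lambda acc, n: acc + n,
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
    0,
)

result_imperative = sum_even_squares_imperative(range(6))
result_functional = sum_even_squares(range(6))
print(result_imperative, result_functional)
```

Both versions compute the same value; the functional one simply expresses the computation without statements.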
function curry(fnc)
A currying decorator that turns a function with multiple arguments into a chain of partial functions, each of which takes at least a single argument. The input function may accept keywords, but the output function no longer does (i.e. currying turns all keywords into positional arguments).
Example:
from datamatrix import functional as fnc
@fnc.curry
def add(a, b, c):
    return a + b + c
print(add(1)(2)(3)) # Curried approach with single arguments
print(add(1, 2)(3)) # Partly curried approach
print(add(1)(2, 3)) # Partly curried approach
print(add(1, 2, 3)) # Original approach multiple arguments
Output:
6
6
6
6
Arguments:
fnc -- A function to curry.
- Type: callable
Returns:
A curried function that accepts at least the first argument, and returns a function that accepts the second argument, etc.
- Type: callable
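For readers curious how a currying decorator of this kind can work, here is a minimal sketch in plain Python. It is not the implementation used by datamatrix: simple_curry is a hypothetical helper that assumes the wrapped function only takes a fixed number of positional parameters.

```python
import inspect


def simple_curry(func):
    """Hypothetical helper, not the datamatrix implementation."""
    # Number of parameters that must be collected before calling func
    n_params = len(inspect.signature(func).parameters)

    def collect(*collected):
        # Once enough arguments have been gathered, call the function
        if len(collected) >= n_params:
            return func(*collected)
        # Otherwise return a function that waits for more arguments
        return lambda *more: collect(*collected, *more)

    return collect


@simple_curry
def add(a, b, c):
    return a + b + c


results = [add(1)(2)(3), add(1, 2)(3), add(1)(2, 3), add(1, 2, 3)]
print(results)
```

The key design choice is that every intermediate call returns a new function that remembers the arguments collected so far, which is why fully and partly curried calls give the same result.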
function filter_(fnc, obj)
Filters rows from a datamatrix or column based on a filter function (fnc).
If obj is a column, fnc should be a function that accepts a single value. If obj is a datamatrix, fnc should be a function that accepts a keyword dict, where column names are keys and cells are values. In both cases, fnc should return a bool indicating whether the row or value should be included.
New in v0.8.0: You can also directly compare a column with a function or lambda expression. However, this is different from filter_() in that it returns a datamatrix object and not a column.
Example:
from datamatrix import DataMatrix, functional as fnc
dm = DataMatrix(length=5)
dm.col = range(5)
# Create a column with only odd values
col_new = fnc.filter_(lambda x: x % 2, dm.col)
print(col_new)
# Create a new datamatrix with only odd values in col
dm_new = fnc.filter_(lambda **d: d['col'] % 2, dm)
print(dm_new)
Output:
col[1, 3]
+---+-----+
| # | col |
+---+-----+
| 1 | 1 |
| 3 | 3 |
+---+-----+
Arguments:
fnc -- A filter function.
- Type: callable
obj -- A datamatrix or column to filter.
- Type: BaseColumn, DataMatrix
Returns:
A new column or datamatrix.
- Type: BaseColumn, DataMatrix
function map_(fnc, obj)
Maps a function (fnc) onto rows of a datamatrix or cells of a column.
If obj is a column, the function fnc is mapped onto each cell of the column, and a new column is returned. In this case, fnc should be a function that accepts and returns a single value.
If obj is a datamatrix, the function fnc is mapped onto each row, and a new datamatrix is returned. In this case, fnc should be a function that accepts a keyword dict, where column names are keys and cells are values. The return value should be another dict, again with column names as keys, and cells as values. Columns that are not part of the returned dict are left unchanged.
New in v0.8.0: In Python 3.5 and later, you can also map a function onto a column using the @ operator:
dm.new = dm.old @ (lambda i: i*2)
Example:
from datamatrix import DataMatrix, functional as fnc
dm = DataMatrix(length=3)
dm.old = 0, 1, 2
# Map a 2x function onto dm.old to create dm.new
dm.new = fnc.map_(lambda i: i*2, dm.old)
print(dm)
# Map a 2x function onto the entire dm to create dm_new, using a fancy
# dict comprehension wrapped inside a lambda function.
dm_new = fnc.map_(
    lambda **d: {col: 2*val for col, val in d.items()},
    dm
)
print(dm_new)
Output:
+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 | 0 | 0 |
| 1 | 2 | 1 |
| 2 | 4 | 2 |
+---+-----+-----+
+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 | 0 | 0 |
| 1 | 4 | 2 |
| 2 | 8 | 4 |
+---+-----+-----+
Arguments:
fnc -- A function to map onto each row or each cell.
- Type: callable
obj -- A datamatrix or column to map fnc onto.
- Type: BaseColumn, DataMatrix
Returns:
A new column or datamatrix.
- Type: BaseColumn, DataMatrix
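The row-wise behaviour of map_(), in which columns missing from the returned dict are left unchanged, can be illustrated with a plain-Python analogy. The sketch below uses a list of dicts in place of a DataMatrix; map_rows is a hypothetical stand-in and does not reflect how datamatrix implements map_().

```python
def map_rows(fnc, rows):
    """Hypothetical stand-in for row-wise mapping over a list of dict rows."""
    # Each row is passed to fnc as keyword arguments. The returned dict is
    # merged into a copy of the row, so keys that fnc does not return keep
    # their original values, and the input rows are not modified.
    return [{**row, **fnc(**row)} for row in rows]


rows = [{'old': 0, 'label': 'a'}, {'old': 1, 'label': 'b'}]

# Only 'old' is returned by the mapped function, so 'label' stays as it was
new_rows = map_rows(lambda **d: {'old': d['old'] * 2}, rows)
print(new_rows)
```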
class memoize
Requires json_tricks
A memoization decorator that stores the result of a function call, and returns the stored value when the function is called again with the same arguments. That is, memoization is a specific kind of caching that improves performance for expensive function calls.
This decorator only works for return values that can be pickled, and arguments that can be serialized to json.
The memoized function becomes a callable object. To clear the memoization cache, call the .clear() function on the memoized function. The total size of all cached return values is available as the .cache_size property.
For a more detailed description, see:
Changed in v0.8.0: You can no longer pass the memoclear keyword to the memoized function. Use the .clear() function instead.
Example:
from datamatrix import functional as fnc
@fnc.memoize
def add(a, b):
    print('add(%d, %d)' % (a, b))
    return a + b
three = add(1, 2) # Storing result in memory
three = add(1, 2) # Re-using previous result
add.clear() # Clear cache, but only for the next call
three = add(1, 2) # Calculate again
@fnc.memoize(persistent=True, key='persistent-add')
def persistent_add(a, b):
    print('persistent_add(%d, %d)' % (a, b))
    return a + b
three = persistent_add(1, 2) # Writing result to disk
three = persistent_add(1, 2) # Re-using previous result
Output:
add(1, 2)
add(1, 2)
persistent_add(1, 2)
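The general idea behind memoization can be sketched with an in-memory dict, using only the standard library. This is a simplified illustration and not the datamatrix implementation: simple_memoize is a hypothetical name, there is no persistent (on-disk) cache, and clear() here simply empties the whole dict. Arguments are serialized to JSON to build the cache key, mirroring the requirement above that arguments be serializable to json.

```python
import functools
import json


def simple_memoize(func):
    """Hypothetical in-memory memoizer, not the datamatrix implementation."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a cache key from the JSON-serialized arguments
        key = json.dumps([args, kwargs], sort_keys=True)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    # Expose a clear() function that empties the cache
    wrapper.clear = cache.clear
    return wrapper


calls = []  # Records each time the wrapped function actually runs


@simple_memoize
def add(a, b):
    calls.append((a, b))
    return a + b


add(1, 2)    # Computed and stored
add(1, 2)    # Served from the cache
add.clear()  # Empty the cache
add(1, 2)    # Computed again
print(len(calls))
```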
function profile(*args, **kwds)
A context manager (with) for easy profiling, using cProfile. The results of the profile are written to the file specified in the path keyword (default=u'profile'), and the sorting order, as accepted by pstats.Stats.sort_stats(), is specified in the sortby keyword (default=u'cumulative').
Example:
from datamatrix import DataMatrix, functional as fnc
with fnc.profile(path=u'profile.txt', sortby=u'cumulative'):
    dm = DataMatrix(length=1000)
    dm.col = range(1000)
    dm.is_even = dm.col @ (lambda x: not x % 2)
Argument list:
*args: No description.
Keyword dict:
**kwds: No description.
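As a rough idea of how a profiling context manager can be built on top of cProfile and pstats, consider the sketch below. It is an assumption-laden illustration rather than the datamatrix implementation: simple_profile is a hypothetical name, and the report is written to an in-memory buffer instead of to a file.

```python
import cProfile
import contextlib
import io
import pstats


@contextlib.contextmanager
def simple_profile(sortby='cumulative'):
    """Hypothetical profiling context manager, not the datamatrix one."""
    profiler = cProfile.Profile()
    report = io.StringIO()
    profiler.enable()
    try:
        # Hand the (still empty) report buffer to the with block
        yield report
    finally:
        profiler.disable()
        # Sort the collected statistics and write them to the buffer
        stats = pstats.Stats(profiler, stream=report)
        stats.sort_stats(sortby).print_stats()


with simple_profile() as report:
    total = sum(i * i for i in range(1000))

text = report.getvalue()
print(text)
```

The report is only filled in once the with block has finished, because the statistics are gathered when the context manager exits.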
function setcol(dm, name, value)
Returns a new DataMatrix to which a column has been added or in which a column has been modified.
The main difference with regular assignment (dm.col = 'x') is that setcol() does not modify the original DataMatrix, and can be used in lambda expressions.
Example:
from datamatrix import DataMatrix, functional as fnc
dm1 = DataMatrix(length=5)
dm2 = fnc.setcol(dm1, 'y', range(5))
print(dm2)
Output:
+---+---+
| # | y |
+---+---+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+
Arguments:
dm -- A DataMatrix.
- Type: DataMatrix
name -- A column name.
- Type: str
value -- The value to be assigned to the column. This can be any value that is valid for a regular column assignment.
Returns:
A new DataMatrix.
- Type: DataMatrix
function stack_multiprocess(fnc, args, processes=None)
Facilitates multiprocessing for functions that return DataMatrix objects.
Specifically, stack_multiprocess() calls fnc() in separate processes, each time passing a different argument. Arguments are specified in args, which should be a list (or other iterable) of arguments that are passed to fnc() for each call separately. In other words, as many processes are launched as there are elements in args.
fnc() should be a function that accepts a single argument and returns a DataMatrix object. The resulting DataMatrix objects are stacked together (similar to ops.stack()) and returned as a single DataMatrix.
See also:
- https://docs.python.org/3/library/multiprocessing.html
- https://pydatamatrix.eu/1.0/operations#function-stack
Version note: New in 1.0.0.
Version note: As of 1.0.4, if one of the processes crashes, an error is shown with the Exception, but the main process doesn't crash.
Example:
from datamatrix import DataMatrix, functional as fnc
def get_dm(i):
    dm = DataMatrix(length=1)
    dm.s = i
    return dm
# This will launch five separate processes and return a single dm
dm = fnc.stack_multiprocess(get_dm, [1, 2, 3, 4, 5])
Arguments:
fnc -- A function to call. This function should accept a single argument and return a single DataMatrix.
- Type: callable
args -- A list of arguments that are passed separately to fnc().
Keywords:
processes -- The number of processes that are launched simultaneously, or None to launch one process for each core on the system.
- Type: None, int
- Default: None
Returns:
- Type: DataMatrix