I really like the corny API.
Added thanks to @tcbegley: `__getitem__` and `__setitem__`:

```python
brain = Brain()
brain['text'] = 'asdf'  # calls brain.learn()
brain['text']           # calls brain.recall()
# >>> 'asdf'
```
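For readers curious how the bracket notation maps onto the named methods, here is a minimal, hypothetical sketch of the delegation; a plain dict stands in for the real Plasma client, so this is the pattern, not the actual implementation:

```python
class BrainSketch:
    """Dict-backed stand-in showing how __setitem__/__getitem__
    can delegate to learn()/recall() (not the real Brain)."""
    def __init__(self):
        self._store = {}

    def learn(self, name, thing):
        self._store[name] = thing

    def recall(self, name):
        return self._store[name]

    def __setitem__(self, name, thing):
        self.learn(name, thing)

    def __getitem__(self, name):
        return self.recall(name)

brain = BrainSketch()
brain['text'] = 'asdf'   # delegates to brain.learn()
print(brain['text'])     # delegates to brain.recall() -> 'asdf'
```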
Updates:
Ability to start the underlying `plasma_store` process when you instantiate `Brain`:

```python
brain = Brain(start_process=True, size=100000000)
```

Also, if you have used `brain.dead(i_am_sure=True)` to kill the `plasma_store` process, you can restart it with the new method `brain.start(path='this/path', size=numberofbytes)` (both parameters are optional; the default is to reuse the previous size and path).
Fixed a bug that sometimes prevented assigning a new value to an existing name:

```python
# old error example
brain['a'] = 'asdf'
brain['a']
# >>> 'asdf'
brain['a'] = 5
# >>> Plasma Error - ObjectID already exists
```
New attributes: `brain.size` and `brain.mb`, the number of bytes (integer) and megabytes (e.g. `'50 MB'`) available in the `plasma_store`, respectively.
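To illustrate how the two attributes relate, here is a hypothetical helper (not part of the library) that formats a byte count the way `brain.mb` is described:

```python
def mb_string(num_bytes: int) -> str:
    # Hypothetical formatter mirroring brain.mb's style:
    # 50_000_000 bytes -> '50 MB'
    return f"{num_bytes / 1_000_000:g} MB"

print(mb_string(50_000_000))  # '50 MB'
```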
Updates:
Ability to resize the memory available in the underlying `plasma_store` process without losing any variables.

```python
brain['a'] = [1,2,3,4]
brain.size
# >>> 50000000
brain.resize(100000000)
# size changes
brain.size
# >>> 100000000
# all the values remain
brain['a']
# >>> [1,2,3,4]
```
You now have to specify NOT to start the process, rather than `Brain` assuming that the `plasma_store` process is already running.
Plus general bugfixes, API stabilization, and performance improvements.
Updates:
New functions:

```python
brain.used()        # how much space is used
brain.free()        # how much space is free
brain.size()        # dynamically find the size of the plasma_store
brain.object_map()  # see a dictionary of name: ObjectID pairs
```
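The helpers are related in an obvious way: used plus free space should equal the total size. A dict-backed sketch of that contract (a hypothetical stand-in; the real methods query the Plasma client):

```python
import sys

class StoreSketch:
    """Hypothetical stand-in for the plasma_store accounting,
    illustrating used() + free() == size()."""
    def __init__(self, capacity_bytes):
        self._capacity = capacity_bytes
        self._objects = {}  # name -> value (the real store maps ObjectIDs)

    def size(self):
        return self._capacity

    def used(self):
        # rough in-process approximation of bytes consumed
        return sum(sys.getsizeof(v) for v in self._objects.values())

    def free(self):
        return self.size() - self.used()

    def object_map(self):
        return dict(self._objects)

store = StoreSketch(10_000_000)
store._objects['a'] = [1, 2, 3, 4]
assert store.used() + store.free() == store.size()
```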
Bugfix: `brain.start()` and `brain.resize()` previously started a new `plasma_store` instance; now they don’t. The problem was in `brain.dead()`.
Update with release v0.2:

Big things! `brain-plasma` is stable again (with breaking changes around starting and killing Plasma instances), the documentation is better, and there is a new killer feature: namespaces!
RELEASE WITH BREAKING CHANGES

- Changed the parameter order of `learn()` to `('name', thing)`, which is more intuitive (but you should always use bracket notation)
- Removed the ability to start, kill, or resize the underlying Plasma instance (for stability)
- Added the ability to use unique namespaces to hold same-name values
- Newly available (implemented `__len__`, `__delitem__`, and `__contains__`):
  - `len(brain)` # >>> 5
  - `del brain['this']` # same as `brain.forget('this')`
  - `'this' in brain` # >>> True
Using namespaces:

```python
brain.namespace
# >>> 'default'
brain['this'] = 'default text object'

# change namespace
brain.set_namespace('newname')
brain['this'] = 'newname text object'

brain.set_namespace('default')
brain['this']
# >>> 'default text object'

brain.names(namespaces='all')
# >>> ['this', 'this']
brain.show_namespaces()
# >>> {'default', 'newname'}
brain.remove_namespace('newname')
brain.namespace
# >>> 'default'
```
I’m currently using the namespaces feature to back up persistent user state data that can’t be held client-side, and continuing to use the main storage features for quick big-object access.
I’d love some help on emulating dictionary indexed assignment behavior like in this helpful issue: https://github.com/russellromney/brain-plasma/issues/18
Hope this is useful for some folks!
Hi Russell,
Thanks for this tool but could you please tell me (and maybe others) how to use it with Dash?
I tried it in my app and I get this:

```
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1002 12:08:20.027307 2871042944 io.cc:168] Connection to IPC socket failed for pathname /tmp/plasma, retrying 5 more times
E1002 12:08:20.430019 2871042944 io.cc:168] Connection to IPC socket failed for pathname /tmp/plasma, retrying 4 more times
E1002 12:08:20.834305 2871042944 io.cc:168] Connection to IPC socket failed for pathname /tmp/plasma, retrying 3 more times
E1002 12:08:21.237042 2871042944 io.cc:168] Connection to IPC socket failed for pathname /tmp/plasma, retrying 2 more times
E1002 12:08:21.638428 2871042944 io.cc:168] Connection to IPC socket failed for pathname /tmp/plasma, retrying 1 more times
Traceback (most recent call last):
  File "index.py", line 8, in <module>
    from apps import triangles_app, ts_analysis, graphing, gtaa, play
  File "/Users/Desktop/webapp/apps/graphing.py", line 34, in <module>
    brain = Brain()
  File "/Users/Desktop/webapp/env/lib/python3.7/site-packages/brain_plasma/brain_plasma.py", line 20, in __init__
    self.client = plasma.connect(self.path,num_retries=5)
  File "pyarrow/_plasma.pyx", line 805, in pyarrow._plasma.connect
  File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Could not connect to socket /tmp/plasma
```

I am running the dash app in a virtualenv, with `plasma_store -m 50000000 -s /tmp/plasma` running in a separate terminal window.
Thanks!
I solved my problem by launching the plasma_store process with a path inside the dash webapp. Is this the best way to do it?
You should not need to do that. AFAIK, `/tmp` is used to create temporary socket files which are used to communicate with the Plasma instance. Changing the path should not change how brain-plasma works, though. Which system are you on?
I’m on macOS. Also, I couldn’t use `start_process=True` as an argument (I have brain-plasma 0.2).
That’s odd that it doesn’t work on Mac. Will you open an issue on GitHub with an example of the non-working code?
In v0.2 I removed that ability, as it made the tool unstable. I have updated the reference in the README on GitHub.
This is really cool! Thanks for making it @russellthehippo. I’m using it for a server and was wondering what the best way is to start a gunicorn/flask process with brain-plasma. I can’t start all the workers at the same time because they will all try to do the initial write of the dataframe to the same object. Right now, I’m running an initial script to read my data into the brain, and then starting the gunicorn workers.
It sounds like you need to have each worker check if the object exists already before it tries to load it. Maybe I misunderstand the question.
Well, I do, but these objects have to be loaded as soon as the worker starts, and all the workers try to load the same object at the same time. I guess there’s no obvious solution besides a two-stage initialisation.
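One way to express the check-before-load idea is a small helper, sketched here against a generic mapping (the helper name is hypothetical, and a dict stands in for the brain; the real Brain raises its own missing-name exception rather than `KeyError`):

```python
def ensure_loaded(store, name, loader):
    """Return store[name], running loader() only if the name is missing.
    Note: without a lock there is still a window between the check and
    the write, so two workers may both run loader(); gunicorn's --preload
    flag sidesteps this by loading once in the parent process."""
    try:
        return store[name]
    except KeyError:
        value = loader()
        store[name] = value
        return value

# usage, with a dict standing in for the brain
cache = {}
df = ensure_loaded(cache, 'big_dataframe', lambda: list(range(3)))
```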
Update with Release v0.3:

Summary

This release is the biggest yet on the path to production usefulness beyond a few large objects. I mostly rewrote `brain_plasma.Brain` and refactored it entirely: it now hashes names for direct access, speeding up read and write operations by several orders of magnitude thanks to fewer and more lightweight calls. The API is mostly the same. Custom exceptions are added to help users catch and understand errors better. Most functions are unit tested and can be checked with `pytest`.

The sum of these changes means `brain-plasma` can be used as a fast production backend similar to Redis, but with fast support for very large values as well as very small ones (and for pure Python objects rather than transformed values à la JSON), and for few as well as many values. I’m pretty excited about it.
Hashing speedup

Speedup results are drastic, especially when there are more than a dozen or so names in the store. This is because the old `Brain` called `client.list()` multiple times for most `Brain` interactions, which was admittedly a horrible design. The new `Brain` doesn’t call `client.list()` at all for most operations, including all reads and writes. The script `many_vals.py` compares the old and new `Brain`s (all values in seconds):
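The release notes don’t show the hashing scheme itself, but the idea can be sketched: Plasma ObjectIDs are 20 bytes, so any 20-byte digest of the namespace and name yields a deterministic ID that can be computed without listing the store. A hypothetical version (the real brain-plasma scheme may differ):

```python
import hashlib

def name_to_object_id_bytes(name: str, namespace: str = 'default') -> bytes:
    """Hypothetical: derive a 20-byte Plasma ObjectID directly from the
    name, so reads and writes need no store-wide client.list() call.
    SHA-1 conveniently produces exactly 20 bytes."""
    return hashlib.sha1(f'{namespace}::{name}'.encode()).digest()

oid = name_to_object_id_bytes('a')
assert len(oid) == 20                       # fits a plasma.ObjectID
assert oid == name_to_object_id_bytes('a')  # deterministic, no lookup needed
```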
```shell
plasma_store -m 10000000 -s /tmp/plasma
# new terminal
python many_vals.py
```

```
100 items:
learn:
  old:  3.6606647968292236
  hash: 0.030955076217651367
recall:
  old:  4.092543840408325
  hash: 0.017110824584960938
10 items:
learn:
  old:  0.32016992568969727
  hash: 0.005012035369873047
recall:
  old:  0.31406521797180176
  hash: 0.002324819564819336
```
Unit tests

Most functions are tested in `tests/`. Check them yourself or test your changes with:

```shell
pip install pytest
pytest
```
Exceptions

Custom exceptions are added to help users catch and understand errors better. Most errors specific to brain-plasma’s own functions (rather than generic Python errors) are defined as custom exceptions, and function docstrings mention which exceptions may be raised. The new exceptions can be imported en masse like:
```python
from brain_plasma.exceptions import (
    BrainNameNotExistError,
    BrainNamespaceNameError,
    BrainNamespaceNotExistError,
    BrainNamespaceRemoveDefaultError,
    BrainNameLengthError,
    BrainNameTypeError,
    BrainClientDisconnectedError,
    BrainRemoveOldNameValueError,
    BrainLearnNameError,
    BrainUpdateNameError,
)
```
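A typical use of these is catching a specific error instead of a bare `Exception`. Sketched here with a stand-in exception class of the same name so the snippet runs without a live plasma_store (the helper is hypothetical, not part of the library):

```python
# Stand-in with the same name as brain_plasma.exceptions.BrainNameNotExistError
class BrainNameNotExistError(Exception):
    pass

def recall_or_default(brain, name, default=None):
    """Hypothetical helper: fall back to a default for an unknown name."""
    try:
        return brain[name]
    except BrainNameNotExistError:
        return default

class FakeBrain:
    """Minimal stand-in that knows no names at all."""
    def __getitem__(self, name):
        raise BrainNameNotExistError(name)

assert recall_or_default(FakeBrain(), 'missing', default=0) == 0
```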
Other

Code is formatted with the excellent `black`. Markdown is formatted with Prettier.
Hello, I am using brain-plasma in production. I update my dataframes with kombu so I don’t have to query the database anymore, only at start. In my Dockerfile I call entrypoint.sh and use this line to start the store alongside gunicorn:

```shell
plasma_store -m 50000000 -s /tmp/plasma &
```

then I do

```shell
exec gunicorn src.app:server --bind 0.0.0.0:8000 --log-level=info --timeout=90
```

I hope this can be helpful for you.
Could Brain Plasma be used as an alternative backend for https://pythonhosted.org/Flask-Caching/#custom-cache-backends?
That would be great, I guess.
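Possibly: Flask-Caching custom backends implement a small get/set/delete interface. An untested sketch of what a Brain-backed adapter might look like (in real use it would subclass Flask-Caching's `BaseCache`; here a dict stands in for the brain so the shape of the API can be exercised, and since Plasma has no built-in expiry, `timeout` is ignored):

```python
class BrainCacheSketch:
    """Hypothetical Flask-Caching-style adapter around a Brain.
    Note: the real Brain raises BrainNameNotExistError rather than
    KeyError for unknown names, so a real adapter would catch that."""
    def __init__(self, brain, default_timeout=300):
        self.brain = brain
        self.default_timeout = default_timeout  # no TTL support in Plasma

    def get(self, key):
        try:
            return self.brain[key]
        except KeyError:
            return None

    def set(self, key, value, timeout=None):
        self.brain[key] = value
        return True

    def delete(self, key):
        try:
            del self.brain[key]
            return True
        except KeyError:
            return False

cache = BrainCacheSketch({})  # dict stand-in for Brain()
cache.set('x', 1)
print(cache.get('x'))  # 1
```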
Hi @mwveliz, thanks for the reply! Could you please explain in a bit more detail how you deal with the issue of multiple workers trying to write to plasma the initial dataframe? Thanks
Hi @dldx, I am not sure because I only use one worker, but maybe this way (using `--preload`) you could do it:

```shell
gunicorn --preload src.app:server --bind 0.0.0.0:8000 --log-level=info --timeout=90
```
as described on
and
Hope it can help you
Perfect! I had no idea about that preload option! Thank you.