wangshuide2020
7e29177656
change response of names interface from list to tree.
5 years ago
wangshuide2020
dd7e72e811
add notes for init files.
5 years ago
luopengting
b9c978830a
clean redundant code after removing lineage public APIs:
1. clean redundant code in lineage parsing and querier
2. delete get_summary_lineage()
3. modify related st and ut to use data_manager
5 years ago
ougongchang
956535eed2
ignore the graph if it contains invalid characters
5 years ago
mindspore-ci-bot
f79010d210
!699 do not raise exception to avoid printing error in the stdout.
Merge pull request !699 from wangshuide/wsd_merged_debug
5 years ago
wangshuide2020
84ff069109
do not raise exception to avoid printing error in the stdout.
5 years ago
ougongchang
d6ee87fdbe
set status to done when an exception occurs while reading a file
5 years ago
ougongchang
fe6f548dd2
escape some special strings to avoid frontend crashes and change the default http code to 400 in MindInsightException
5 years ago
Li Hongzhang
8fdce0280e
change names of input and output
5 years ago
Li Hongzhang
e7bb056aa2
move the try except inside
5 years ago
zhangyunshu
4a5f737899
debugger: bugfix, hide the large const tensor value in GraphProto
5 years ago
Li Hongzhang
da797e3e2f
parse pb files in executor
5 years ago
Li Hongzhang
b1c8d4d758
detect whether a summary exists
5 years ago
Li Hongzhang
bb5fb9b1e1
avoid brief cache too often
5 years ago
Li Hongzhang
c2210200fc
redefine reload interval
5 years ago
mindspore-ci-bot
5e6932f0f6
!601 Add the summary loading switch mechanism
Merge pull request !601 from LiHongzhang/fix_caching
5 years ago
yelihua
50e1400505
add debugger module
5 years ago
Li Hongzhang
990800239b
add summary loading switch mechanism
5 years ago
zhangyunshu
39c11a2ad3
datavisual: reconstruct debugger graph
5 years ago
wangshuide2020
b0fa50e2e5
use default np.float64 to create ndarray to reduce calculation precision errors
5 years ago
wangshuide2020
005eac7582
Store data with default datatype in TensorContainer and remove limitation of datatype.
5 years ago
wangshuide2020
9daf2ae128
kill child processes of a worker before the worker is killed by the gunicorn master.
5 years ago
wangshuide2020
d3b65356df
remove redundant data to save memory and simplify the tensorcontainer.
5 years ago
ougongchang
ad779f4e57
set the parameter output data type
5 years ago
yuximiao
1460ab4ab1
gpu profiler
5 years ago
Li Hongzhang
c4325055aa
use the full_name as node_name
5 years ago
wangshuide2020
e66e41006c
1. limit the number of tags in tensor visualization; 2. update the max steps per tensor tag to 20; 3. support querying a single train_job in the train_jobs interface.
5 years ago
mindspore-ci-bot
eaf9edbf5f
!458 extract the function of _event_parse so that its code line is reasonable.
Merge pull request !458 from wangshuide/fix_ci
5 years ago
luopengting
1c73d20cc7
start a new thread to load detail info
1. Start a new thread to load detail info. Loading detail info takes too much time,
so the summary list and lineage cannot be loaded in a timely manner.
2. Add a status for DetailCacheManager to indicate it is INIT, LOADING or DONE.
3. Update UT/ST.
5 years ago
wangshuide2020
6630ca0bf2
extract the function of _event_parse so that its code line is reasonable.
5 years ago
wangshuide2020
0d9af857fa
limit the tensor count in one step and one tag and the length of event string.
5 years ago
mindspore-ci-bot
9c6ae7c026
!445 fix deserialized protobuf data that sometimes cannot be pickled to be sent to another process
Merge pull request !445 from wenkai/wk4_0721_ci_pickle_error_fix
5 years ago
mindspore-ci-bot
08dc87def0
!443 Optimize number of max processes used for computing and limit number of thread running load_data_in_thread()
Merge pull request !443 from wenkai/perf_opt2_0721
5 years ago
wenkai
0877c6f7b1
Optimize number of max processes used for computing and limit number of thread running load_data_in_thread()
For the max processes used for computing, we need to make sure every summary directory has a process to load data. We also should not use too many processes, to avoid system problems (e.g. out of memory). So we calculate the max process count in _calc_default_max_processes_cnt.
For _load_data_in_thread_wrapper, because self._load_data_in_thread() creates a process pool when loading files, we cannot afford to run multiple self._load_data_in_thread() calls simultaneously. So we use a lock to make sure that only one self._load_data_in_thread() is running.
5 years ago
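A minimal sketch of the locking idea described in the commit above, for illustration only. It is not the MindInsight implementation: the `Loader` class, its counters, and the sleep that stands in for the process-pool work are all hypothetical, and it assumes that callers block on the lock rather than skip when a load is already running.

```python
import threading
import time


class Loader:
    """Hypothetical loader; only the method names echo the commit message."""

    def __init__(self):
        self._load_lock = threading.Lock()   # serializes whole load runs
        self._stats_lock = threading.Lock()  # protects the counters below
        self.active = 0       # loads running right now
        self.max_active = 0   # highest concurrency ever observed
        self.runs = 0         # completed loads

    def _load_data_in_thread(self):
        # Stand-in for the expensive work (the real code is said to
        # create a process pool here).
        with self._stats_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)
        with self._stats_lock:
            self.active -= 1
            self.runs += 1

    def _load_data_in_thread_wrapper(self):
        # Only one thread at a time may enter the expensive load.
        with self._load_lock:
            self._load_data_in_thread()


def run_demo(n_threads=4):
    """Start several threads that all try to load at the same time."""
    loader = Loader()
    threads = [
        threading.Thread(target=loader._load_data_in_thread_wrapper)
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return loader
```

Calling `run_demo()` should finish every requested load while the observed concurrency never exceeds one, which is the property the commit message asks for.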
wenkai
902972e3eb
fix deserialized protobuf data that sometimes cannot be pickled to be sent to another process.
1. delete original message in HistogramContainer
2. Wrap image content in ImageContainer
5 years ago
wenkai
26fabf4770
Refactor the mindinsight multiprocessing computing code to use a unified manager.
Main features:
1. Use the ComputingResourceManager to manage all computing workers.
2. Ensure a fair number of workers across summary directories at first, so that all summary directories in the cache are loaded simultaneously.
3. When a summary directory is loaded, its worker is released, and other unfinished summary directories can use the released workers to speed up. This solves the slow worker problem.
Code changes:
1. Added computing_resource_mgr.py
2. Passed ComputingResourceManager instances instead of workers_count
3. Simplified the _load_single_file() function a bit.
5 years ago
wangshuide2020
7877f33b70
Use multiple processes to calc events.
1. To accelerate summary file parsing, multiple processes are used. As the first step to mindinsight parsing performance optimization, we only made changes to _load_single_file function.
2. This PR will improve summary parsing throughput dramatically (about cpu_count times)
3. Changes are mainly about _load_single_file function
In the future, a more global concurrent computing framework is needed for mindinsight. See the gitee wiki doc for details.
5 years ago
wangshuide2020
5d9473de6d
supplement the docstring of Histogram class.
5 years ago
wangshuide2020
e8ffeb70ef
Support tensor visualization. 1. Tensor display in a table, supporting tensors of no more than two dimensions; 2. Tensor histogram visualization for all steps in cache.
5 years ago
mindspore-ci-bot
c363a86606
!404 Change the summary watcher for not calling analyse in user scripts.
Merge pull request !404 from yuximiao/master
5 years ago
yuximiao
c66be92ef9
change profiling watcher
5 years ago
mindspore-ci-bot
d17a6621ab
!399 delete lineage cache when parse failed
Merge pull request !399 from luopengting/fix_lineage_update
5 years ago
wenkai
c610544905
fix ZeroDivisionError when original bucket width is 0 by checking the width.
5 years ago
luopengting
9e6f852c5d
delete lineage cache when parse failed
5 years ago
wangshuide2020
0fad2218fd
add null byte check in the api of get_plugins
5 years ago
liangyongxiong
4913543f31
add robustness check for summary watcher in case of FileNotFound exception
5 years ago
wenkai
1895275edf
fix log levels that were too high
5 years ago
mindspore-ci-bot
4f3d07937d
!167 optimize redundant function codes
Merge pull request !167 from liangyongxiong/redundant-codes
5 years ago
liangyongxiong
b7b07bdc12
fix bugs of summary watcher when discovering profiler directory
5 years ago
liangyongxiong
b3a5f2b6cc
optimize redundant function codes
5 years ago