1. Add a new thread to load detail info. Loading detail info takes too much time, so the summary list and lineage cannot be loaded in time otherwise.
2. Add a status to DetailCacheManager to indicate whether it is INIT, LOADING or DONE (see the sketch after this list).
3. Update UT/ST.
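Below is a minimal sketch of points 1 and 2, assuming an enum-based status and a daemon thread; apart from DetailCacheManager itself, the names here are hypothetical and not the actual MindInsight implementation.

```python
import enum
import threading


class DetailCacheStatus(enum.Enum):
    """Loading status exposed by DetailCacheManager."""
    INIT = "INIT"
    LOADING = "LOADING"
    DONE = "DONE"


class DetailCacheManager:
    def __init__(self):
        self._status = DetailCacheStatus.INIT

    @property
    def status(self):
        # Callers can serve the summary list/lineage without waiting on detail info.
        return self._status

    def load_detail_info_async(self):
        # Hypothetical entry point: run the slow detail loading off the main path.
        threading.Thread(target=self._load_detail_info, daemon=True).start()

    def _load_detail_info(self):
        self._status = DetailCacheStatus.LOADING
        # ... slow detail loading work goes here ...
        self._status = DetailCacheStatus.DONE
```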
For the maximum number of processes used for computing, we need to make sure every summary directory has a process to load data, while not using so many processes that we cause system problems (e.g. running out of memory). So we calculate the maximum process count in _calc_default_max_processes_cnt.
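As an illustration only, the calculation could look like the sketch below; the real formula in _calc_default_max_processes_cnt is not reproduced here, and the half-of-CPU cap is an assumed bound.

```python
import os


def _calc_default_max_processes_cnt(summary_dir_count):
    """Give every summary directory a process, but cap the total so we do not
    exhaust system resources (e.g. memory)."""
    # Assumed cap: half the CPUs, and always at least one process.
    system_cap = max(1, (os.cpu_count() or 1) // 2)
    return max(1, min(summary_dir_count, system_cap))
```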
For _load_data_in_thread_wrapper: because self._load_data_in_thread() creates a process pool when loading files, we cannot afford to run multiple self._load_data_in_thread() calls simultaneously. So we use a lock to make sure that only one self._load_data_in_thread() is running at a time.
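A minimal sketch of that locking pattern, assuming a non-blocking acquire so a new load round is simply skipped while one is still running (the surrounding class is simplified and hypothetical):

```python
import threading


class _DataLoaderSketch:
    def __init__(self):
        self._load_data_lock = threading.Lock()

    def _load_data_in_thread_wrapper(self):
        # Skip this round if a previous _load_data_in_thread() is still running,
        # because each run creates its own process pool.
        if not self._load_data_lock.acquire(blocking=False):
            return
        try:
            self._load_data_in_thread()
        finally:
            self._load_data_lock.release()

    def _load_data_in_thread(self):
        # Creates a process pool and loads summary files (omitted here).
        pass
```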
Main features:
1. Use the ComputingResourceManager to manage all computing workers.
2. Ensure a fair worker count among summary directories at first, so every summary directory in the cache is loaded simultaneously.
3. When a summary directory finishes loading, its workers are released, and other unfinished summary directories can use the released workers to speed up. This solves the slow-worker problem (see the sketch after this list).
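An illustrative sketch of the worker-sharing behaviour described above; it is not the real computing_resource_mgr.py API, and the method names below are assumptions.

```python
import threading


class ComputingResourceManager:
    """Hand out a fair share of workers per summary directory and take workers
    back when a directory finishes, so slower directories can speed up."""

    def __init__(self, max_workers, directory_count):
        self._lock = threading.Lock()
        self._available = max_workers
        # Fair initial share so all directories in the cache load concurrently.
        self._fair_share = max(1, max_workers // max(1, directory_count))

    def acquire(self):
        """Grant up to one fair share of the currently idle workers."""
        with self._lock:
            granted = min(self._fair_share, self._available)
            self._available -= granted
            return granted

    def release(self, count):
        """Return workers from a finished directory to the shared pool."""
        with self._lock:
            self._available += count
```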
Code changes:
1. Added computing_resource_mgr.py
2. Passed ComputingResourceManager instances instead of workers_count
3. Simplified the _load_single_file() function a bit.
1. To accelerate summary file parsing, multiple processes are used. As the first step of mindinsight parsing performance optimization, we only made changes to the _load_single_file function (see the sketch after this list).
2. This PR improves summary parsing throughput dramatically (roughly by a factor of cpu_count).
3. The changes are mainly in the _load_single_file function.
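A hedged sketch of the per-file parallel parsing idea; the real _load_single_file is more involved, and the _parse_event helper below is hypothetical.

```python
from concurrent import futures


def _load_single_file(raw_events, executor=None):
    """Parse the events of one summary file, dispatching the CPU-heavy
    deserialization to a process pool so throughput scales with cpu_count."""
    executor = executor or futures.ProcessPoolExecutor()
    pending = [executor.submit(_parse_event, event) for event in raw_events]
    return [task.result() for task in pending]


def _parse_event(raw_event):
    # Hypothetical stand-in for the expensive per-event parsing work.
    return raw_event
```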
In the future, a more global concurrent computing framework is needed for mindinsight. See the gitee wiki doc for details.