**Read in: [English](README_en.md) | [中文](README.md)**

# JCS-pub (JointCloud Storage Public Infrastructure) + JCS-client (Ready-to-Use JointCloud Storage Client)

## Project Overview

JointCloud Storage is a storage service model built on a peer-to-peer JointCloud collaboration mechanism. It manages storage resources across multiple clouds and provides users with a unified data storage service. Its core idea is that every cloud keeps an independent and equal status and is connected in a non-intrusive manner. JointCloud Storage also promotes inter-cloud collaboration by integrating computing, storage, and networking resources across clouds to deliver high-quality storage services.

This project aims to turn JointCloud Storage into public infrastructure that is easy for individuals and enterprises alike to use. By simply installing the JCS-client, users can quickly access efficient JointCloud storage without deploying any additional components. At the same time, the system allows flexible customization of its features to meet diverse needs.

## Evolution Roadmap

<center>
<img src="docs/figs/roadmap_en.png" width=100% /></center>

## Features

### 1. Data Migration

- **Cross-Cloud Migration Support**: Allows users to migrate data across multiple cloud storage providers.
- **Policy-Driven Migration Engine**
  - Filtering Rules: Migrate files based on size, file extension, or directory path.
  - Scheduling Control: Define migration time windows.
  - Post-Migration Actions: Choose whether to retain or delete the original data after migration.
- **Migration Efficiency Optimization**: Improve migration performance and reduce bandwidth costs by leveraging JCS-pub.
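
The filtering rules above can be pictured as a predicate evaluated for each candidate file. The following is a minimal sketch, assuming a hypothetical `FilterRule` type with size, extension, and directory-path criteria; it is not the migration engine's actual rule format.

```go
package main

import (
	"path/filepath"
	"strings"
)

// FilterRule is a hypothetical rule type used only for illustration;
// the real migration engine may represent its rules differently.
type FilterRule struct {
	MinSize    int64    // files smaller than this are skipped (0 = no lower bound)
	MaxSize    int64    // files larger than this are skipped (0 = no upper bound)
	Extensions []string // allowed extensions such as ".log"; empty = any extension
	PathPrefix string   // only files under this directory are selected; "" = any path
}

// Match reports whether a file with the given path and size should be migrated.
func (r FilterRule) Match(path string, size int64) bool {
	if r.MinSize > 0 && size < r.MinSize {
		return false
	}
	if r.MaxSize > 0 && size > r.MaxSize {
		return false
	}
	if len(r.Extensions) > 0 {
		ext := strings.ToLower(filepath.Ext(path))
		ok := false
		for _, e := range r.Extensions {
			if strings.ToLower(e) == ext {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	if r.PathPrefix != "" {
		// Compare cleaned, slash-separated paths so that "data" matches
		// "data/a.log" but not "database/a.log".
		dir := filepath.ToSlash(filepath.Clean(r.PathPrefix))
		p := filepath.ToSlash(filepath.Clean(path))
		if p != dir && !strings.HasPrefix(p, dir+"/") {
			return false
		}
	}
	return true
}
```

The same kind of predicate would fit the data filtering used for local/remote sync, since it relies on the same three criteria.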

### 2. Cross-Cloud Data Storage

- **Unified Multi-Cloud View**: Store data across multiple cloud platforms while presenting a unified view to users, abstracting away multi-cloud complexity.
- **Multi-Level Disaster Recovery**: Provides cloud-level resilience through various redundancy strategies, including erasure coding, replication, and hybrid redundancy (erasure coding + replication).
- **Adaptive Redundancy Strategy**
  - Redundancy Scheme: Customize the number of replicas, the number of EC data blocks, and the coding algorithm.
  - Data Placement: Configure where each replica or EC block is stored.
- **Cross-Cloud Command Set**: Supports custom commands such as upload, download, random read, cross-cloud scheduling, and cross-cloud repair.
- **Multiple Data Access Methods**
  - REST API
  - Command-line interface
  - FUSE file system
- **Access Efficiency Optimization**: Boost cross-cloud data access speed and reduce traffic costs via JCS-pub.

### 3. Hybrid Local + Multi-Cloud Storage

- **Unified Hybrid Storage View**: Store data across both **local file systems** and **multiple remote cloud platforms**, while presenting a unified view and hiding underlying complexities.
- **Intelligent Data Collaboration Strategy**: Offers flexible policies for syncing data between local and remote storage.
  - Data Filtering: Dynamically select files for remote sync based on size, path, or extension.
  - Local Retention: Configure whether to keep local copies of remote data.
  - Bidirectional Sync: Independently configure local-to-cloud and cloud-to-local synchronization.
- **Collaboration Efficiency Optimization**: Enhance sync performance and reduce traffic costs using JCS-pub.
- **Multiple Data Access Methods**
  - REST API
  - Command-line interface
  - FUSE file system

### 4. Ready-to-Use Deployment

- Simply download and install the client; no additional components or configuration are needed.

### 5. JCS-pub

- **Unified Hybrid Storage View**: Supports unified access across **local file systems** and **multiple remote cloud platforms**, abstracting the complexity behind a consistent interface.
- **Dual Operation Modes**:
  - Standalone Mode: Fully offline usage without any external service dependencies.
  - Infrastructure-Connected Mode: Operates with public infrastructure support (**enhances performance and reduces bandwidth cost**).
- **Open Source Public Infrastructure**:
  - Users can self-deploy or connect to existing public infrastructure.
  - Free Public Infrastructure Access:
    - To obtain an account, password, and certificate, send an email to `song-jc@foxmail.com`. The application process is illustrated below.

<center>
<img src="docs/figs/application_en.png" width=70% /></center>

## Architecture Diagram

<center>
<img src="docs/figs/architecture_en.png" width=45% /></center>

### 1. Cloud Storage Space

- Before using the JointCloud Storage client, users must prepare their own cloud storage space, typically object storage buckets or directories in file storage services. Both public cloud storage services and private cloud deployments are supported.
- For tutorials on registering with mainstream public cloud storage services, see the [Guide](docs/公有云注册及使用教程.md).

### 2. JCS-pub

- The public infrastructure consists of multiple proxy nodes. These nodes collaborate with clients to perform cross-cloud operations such as data migration, upload, and download. They also optimize data transfer routes and support concurrent access, which improves performance and reduces bandwidth costs. This design prevents the client from becoming a performance bottleneck.

### 3. JCS-client

- The JointCloud Storage client is deployed on the user's server and serves as both a data service gateway and a metadata management node.
- Users manage their cloud storage data through the various interfaces the client provides.
- All metadata and cloud storage credentials remain on the client, under the user's control. When a proxy node needs to access a user's cloud storage, the client grants it temporary access permissions as needed.

## Installation

### 1. Third-Party Dependencies

The following components must be installed manually:

- `MySQL`: Version 8.0 or above. Create a database and a user account for the JCS client to use.

### 2. Docker Installation (Recommended)

Currently, only the JCS client is available as a Docker image. The `jcsctl` tool must be used as a precompiled executable: [Download here]().

Pull the Docker image:

```bash
docker pull jcs:latest
```

If you already have your configuration and certificate files prepared, mount the directory containing them into the container with the `-v` flag, and use the `-c` flag to specify the path to the configuration file (inside the container).

If you haven't prepared the configuration yet and want to generate it with the client's `init` command, you should still mount a local directory with `-v` so that the generated configuration and certificate files are saved to the host.

Here is an example command:

```bash
# Assuming the config files are on the host at /etc/jcsconfs
docker run -v /etc/jcsconfs:/opt/confs \
  jcs serve -c /opt/confs/config.json # Note: the config path is inside the container
```

### 3. Build from Source

Before compiling, make sure you have the following installed:

- `Go`: Version 1.23 or above
- `mage`: A Makefile-like build tool. GitHub repository: [Mage](https://github.com/magefile/mage)

After installing the dependencies, clone this repository and run the following command in the project root:

```powershell
mage bin
```

After successful execution, a build directory is created in the project root containing the compiled executables:

- `jcs`: The main JCS client
- `jcsctl`: Command-line tool for the client (add it to your `PATH` for convenience)
- `coordinator`: Coordination node of the JCS system (can be ignored if you use the public JCS infrastructure)
- `hub`: Central node of the JCS system (can also be ignored if you use the public infrastructure)

### 4. Install Precompiled Executables

Download and extract the binaries that match your system environment: [Download here]().

## Usage Guide

### 1. Generate Configuration File

Run the `init` command of the JCS client to start the configuration process, and follow the prompts to fill in the required information.

**Note**: If you plan to run the JCS client from the Docker image, all paths in the configuration must refer to locations inside the container.

After the command completes, the following files are generated:

- `config.json`: The full configuration file for the client.
- `ca_cert.pem`: Root certificate for the HTTP service.
- `ca_key.pem`: Root private key for the HTTP service (**must be stored securely by the user**).
- `server_cert.pem`, `server_key.pem`: Server certificate and key signed by the root key.
- `client_cert.pem`, `client_key.pem`: Client certificate and key signed by the root key, used by `jcsctl` or third-party programs.

All files except `ca_key.pem` are used while the client is running.

The configuration file fields are explained below:

```json
{
  "hubRPC": {
    "rootCA": "" // Path to the root certificate for communication with the Hub service
  },
  "coordinatorRPC": {
    "address": "127.0.0.1:5009", // Address of the Coordinator service
    "rootCA": "" // Path to the root certificate for communication with the Coordinator service, usually the same as the Hub's rootCA
  },
  "logger": {
    "output": "stdout", // Log output mode: stdout or file
    "outputFileName": "client", // Log file name (effective if output is file)
    "outputDirectory": "log", // Directory for log files (effective if output is file)
    "level": "debug" // Log level: debug, info, warn, error
  },
  "db": {
    "address": "127.0.0.1:3306", // MySQL database address
    "account": "root", // Database username
    "password": "123456", // Database password
    "databaseName": "cloudream" // Database name
  },
  "connectivity": {
    "testInterval": 300 // Interval in seconds to test connectivity with the Hub
  },
  "downloader": {
    "maxStripCacheCount": 100, // Maximum number of EC stripes cached during file reads
    "ecStripPrefetchCount": 1 // Number of EC stripes to prefetch during read operations
  },
  "downloadStrategy": {
    "highLatencyHub": 35 // Latency threshold in ms; Hubs above this latency are considered high latency
  },
  "tickTock": {
    "ecFileSizeThreshold": 5242880, // Minimum file size in bytes to apply EC encoding
    "accessStatHistoryWeight": 0.8 // Weight of the previous day's access count when updating today's access statistics (range 0~1)
  },
  "http": {
    "enabled": true, // Enable the HTTP service
    "listen": "127.0.0.1:7890", // HTTP service listen address
    "rootCA": "", // Path to the root certificate
    "serverCert": "", // Path to the server certificate
    "serverKey": "", // Path to the server private key
    "clientCerts": [], // List of client certificates signed by the root CA
    "maxBodySize": 5242880 // Maximum HTTP request body size in bytes
  },
  "mount": {
    "enabled": false, // Enable FUSE mount
    "mountPoint": "", // Mount directory path
    "gid": 0, // GID for files and folders in the mount directory
    "uid": 0, // UID for files and folders in the mount directory
    "dataDir": "", // Cache data directory
    "metaDir": "", // Metadata directory
    "maxCacheSize": 0, // Max cache size in bytes (0 means unlimited; not enforced in real time)
    "attrTimeout": "10s", // OS cache timeout for file attributes
    "uploadPendingTime": "30s", // Delay after file modification before upload starts
    "cacheActiveTime": "1m", // Time cached data remains active in memory before cleanup
    "cacheExpireTime": "1m", // Time cached files remain on disk after being removed from memory (effective only for fully uploaded files)
    "scanDataDirInterval": "10m" // Interval to scan the cache directory
  },
  "accessToken": {
    "account": "", // Account for logging into the public JCS system
    "password": "" // Password for the public JCS system
  }
}
```

**Note**: If you edit this JSON file manually, remove all comments, because JSON does not support comments.

### 2. Command Line

`jcsctl` is the command-line tool for managing the JCS client. It authenticates with the certificates generated during client initialization; the required files are `ca_cert.pem`, `client_cert.pem`, and `client_key.pem`.

At startup, `jcsctl` searches for these files in the following order:

- Paths specified via the command-line options `--ca`, `--cert`, and `--key`.
- The directory where the `jcsctl` executable resides.

By default, `jcsctl` attempts to connect to the client at `https://127.0.0.1:7890`. Use the `--endpoint` option to specify a different client address.

### 3. API

See the API documentation: [Access here](docs/JCS_pub_API_en.md)

## Testing & Evaluation

The test framework is under development. Stay tuned for updates, or join us in contributing!

## Custom Redundancy Strategy

Redundancy strategies significantly affect the read/write process and are tightly integrated with many parts of the codebase. When implementing your own strategy, it is highly recommended to review the existing implementations and use them as references to avoid omissions.

The following explanation is organized to prioritize clarity and understanding; it may not reflect the optimal implementation order. Read through the entire section before deciding where to begin.

### 1. Redundancy Transformation

When an object is first uploaded, it exists as a complete file in a single storage space, with a redundancy strategy of `None`. The transformation of its redundancy strategy is handled by the `ChangeRedundancy` task in the JCS client, which is triggered daily at midnight.

The transformation generally involves two steps:

- **Strategy Selection**: The system selects a redundancy strategy for the object based on predefined rules (e.g., object size). Transformation is not limited to non-redundant objects; an object can switch between different redundancy modes if needed.
- **Execution**: Based on the chosen strategy and the current one, a transformation plan is generated and executed. If you need to customize the execution plan or its instructions, refer to the later section on Custom Instructions.

See the implementation in the `change_redundancy.go` and `redundancy_recover.go` files in the `gitlink.org.cn/cloudream/jcs-pub/client/internal/ticktock` package.
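
Strategy selection can be as simple as a threshold on object size. The sketch below is only an illustration: the `Redundancy` type, its `Kind` strings, and the `rep(2)` / `ec(2,3)` parameters are hypothetical, and it reuses the `ecFileSizeThreshold` setting from the configuration (5242880 bytes) as the cutoff. The real rules in `change_redundancy.go` may weigh other factors.

```go
package main

import "fmt"

// Redundancy is a hypothetical stand-in for the client's redundancy types.
type Redundancy struct {
	Kind     string // "none", "rep", or "ec"
	Replicas int    // used when Kind == "rep"
	K, N     int    // used when Kind == "ec": K data blocks out of N total
}

func (r Redundancy) String() string {
	switch r.Kind {
	case "rep":
		return fmt.Sprintf("rep(%d)", r.Replicas)
	case "ec":
		return fmt.Sprintf("ec(%d,%d)", r.K, r.N)
	default:
		return "none"
	}
}

// selectRedundancy picks EC for objects at or above the threshold and plain
// replication for smaller ones (assumption: EC is not worth it for small files).
func selectRedundancy(size, ecThreshold int64) Redundancy {
	if size >= ecThreshold {
		return Redundancy{Kind: "ec", K: 2, N: 3}
	}
	return Redundancy{Kind: "rep", Replicas: 2}
}

// needsChange reports whether a transformation plan must be generated at all.
func needsChange(cur, target Redundancy) bool {
	return cur != target
}
```

Only when `needsChange` is true would the execution step need to build a plan that converts the current layout into the target one.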

### 2. Redundancy Shrinking

Redundancy shrinking uses a simulated annealing algorithm to optimize redundancy along three dimensions: **resilience**, **redundancy level**, and **access efficiency**.

It currently supports redundancy shrinking for the **replication** and **erasure coding** strategies.

This feature is optional. If you need it, refer to the `redundancy_shrink.go` file in the `gitlink.org.cn/cloudream/jcs-pub/client/internal/ticktock` package.

### 3. Object Download

Any functionality that involves downloading objects (not just HTTP-based downloads) must be adapted to support a new redundancy strategy. The download process includes:

- **Strategy Selection**
- **Strategy Execution**

The code for **strategy selection** is mainly located in the `gitlink.org.cn/cloudream/jcs-pub/client/internal/downloader/strategy` package.

The code for **strategy execution** is distributed across multiple parts of the project, including but not limited to the `gitlink.org.cn/cloudream/jcs-pub/client/internal/downloader` package.

We recommend using your IDE's search functionality to ensure full coverage when modifying related logic.
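
As an illustration of the selection step, the sketch below chooses which Hubs to read from: one source suffices for a replicated object, while an EC object needs `k` distinct blocks. Sources are ranked by latency, and the plan is flagged when any chosen Hub exceeds the `highLatencyHub` threshold (35 ms in the sample configuration). The `Source` and `Plan` types and this ranking rule are assumptions for illustration, not the logic in the `strategy` package.

```go
package main

import (
	"errors"
	"sort"
)

// Source is a hypothetical description of one stored block or replica.
type Source struct {
	Hub       string
	BlockIdx  int     // EC block index; -1 for a full replica
	LatencyMs float64 // measured latency to the Hub
}

// Plan lists the sources to read and whether any of them is a high-latency Hub.
type Plan struct {
	Sources     []Source
	HighLatency bool
}

// pickSources returns the `need` lowest-latency sources with distinct block
// indices: need = 1 for a replica, need = k for an EC object.
func pickSources(srcs []Source, need int, highLatencyMs float64) (Plan, error) {
	sorted := append([]Source(nil), srcs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].LatencyMs < sorted[j].LatencyMs })
	var plan Plan
	seen := map[int]bool{} // one source per block index
	for _, s := range sorted {
		if len(plan.Sources) == need {
			break
		}
		if seen[s.BlockIdx] {
			continue
		}
		seen[s.BlockIdx] = true
		plan.Sources = append(plan.Sources, s)
		if s.LatencyMs > highLatencyMs {
			plan.HighLatency = true
		}
	}
	if len(plan.Sources) < need {
		return Plan{}, errors.New("not enough distinct sources to reconstruct the object")
	}
	return plan, nil
}
```

The execution side would then open one stream per selected source and, for EC, decode the object from the `k` blocks.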

## Custom Instructions

Most of the currently implemented instructions are in the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2` package and serve as good references when developing custom instructions.

### 1. Writing Instruction Logic

Each instruction must implement the following interface:

```go
type Op interface {
	Execute(ctx *ExecContext, e *Executor) error
	String() string
}
```

- `String()`: Used for debugging; returns a description of the instruction.
- `Execute(ctx *ExecContext, e *Executor) error`: The core execution logic.
  - `ctx`: The execution context, which includes a cancelable context (`Context`) and custom values (`Values`). The values can be accessed with functions such as `GetValueByType` and `SetValueByType` from the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch/exec` package.
  - `e`: The executor, which provides access to a variable table through its methods such as `BindVar` and `PutVar`, or through generic helper functions such as `BindVar` and `BindArray` from the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch/exec` package.

If `Execute` returns a non-nil error, the entire plan is considered failed and all ongoing instructions are aborted.

Typical implementation steps:

- **Read Parameters**: Use `BindVar` or related functions to read the required inputs. The `VarIDs` should be defined when the instruction is created.
- **Process**: Implement your desired logic.
- **Output Results**: Use `PutVar` or similar functions to write results back to the variable table for use by subsequent instructions.

If you understand how the module works internally, you are free to implement any custom logic you need.

The data structure used with `PutVar`/`BindVar` must implement:

```go
type VarValue interface {
	Clone() VarValue
}
```

**Note**: All data transferred between instructions is serialized and deserialized as JSON. Avoid non-serializable fields in custom structs.

If you want to pass a stream (`io.ReadCloser`) between instructions, use the built-in `StreamValue` type instead of defining your own, because streams require special handling.

If an instruction produces a stream, consider using a `sync.WaitGroup` or a similar mechanism to make sure the stream has been fully consumed before `Execute` returns.

Finally, register your new instruction and data types using `UseOp` and `UseVarValue` from the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch/exec` package.

### 2. Writing DAG Nodes for Instructions

The `ioswitch` module expresses execution plans as DAGs (Directed Acyclic Graphs), in which each node corresponds to an instruction and edges represent data dependencies.

To define a custom node, implement the following interface:

```go
type Node interface {
	Graph() *Graph
	SetGraph(graph *Graph)
	Env() *NodeEnv
	InputStreams() *StreamInputSlots
	OutputStreams() *StreamOutputSlots
	InputValues() *ValueInputSlots
	OutputValues() *ValueOutputSlots
	GenerateOp() (exec.Op, error)
}
```

Key concepts:

- Each node has four kinds of slots: input and output slots for streams, and input and output slots for values.
- Each input slot accepts data from a single node; an output slot can fan out to multiple nodes.
- `Env()` specifies the environment where the instruction should run (e.g., `Driver`, `Hub`, `Any`). Most instructions can use `Any`.

You can embed the `NodeBase` struct from the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch/dag` package, which implements every method except `GenerateOp`.
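
The slot rules above (a single source per input slot, fan-out from output slots) can be modelled in a few lines. The types below are hypothetical miniatures, not the `dag` package's `StreamInputSlots`/`StreamOutputSlots`; they only show why connecting a second source to an input slot must be rejected while one output may feed many inputs.

```go
package main

import "errors"

type Node struct {
	Name    string
	Inputs  []*InputSlot
	Outputs []*OutputSlot
}

// InputSlot receives data from exactly one OutputSlot.
type InputSlot struct {
	Owner *Node
	From  *OutputSlot
}

// OutputSlot may fan out to any number of InputSlots.
type OutputSlot struct {
	Owner *Node
	To    []*InputSlot
}

// NewNode creates a node with nIn input slots and nOut output slots.
func NewNode(name string, nIn, nOut int) *Node {
	n := &Node{Name: name}
	for i := 0; i < nIn; i++ {
		n.Inputs = append(n.Inputs, &InputSlot{Owner: n})
	}
	for i := 0; i < nOut; i++ {
		n.Outputs = append(n.Outputs, &OutputSlot{Owner: n})
	}
	return n
}

// Connect links out -> in, refusing to give an input slot a second source.
func Connect(out *OutputSlot, in *InputSlot) error {
	if in.From != nil {
		return errors.New("input slot of " + in.Owner.Name + " already has a source")
	}
	in.From = out
	out.To = append(out.To, in)
	return nil
}
```

Keeping back-pointers in both directions lets an optimizer walk from any node to its producers and consumers when it rewrites the graph.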

### 3. Using Custom Instructions

The lifecycle of an execution plan has four stages: writing the `FromTo` definition, parsing the `FromTo` into a DAG, optimizing the DAG, and generating instructions from the DAG. You can extend any of the first three stages to integrate your custom instructions.

The `FromTo` model describes the structure of an execution plan:

- The `From` component defines where the data originates.
- The `To` component defines where the data is intended to go.

The plan parser uses predefined rules to convert the `FromTo` description into a DAG, which represents the data dependencies and logical flow between operations. This DAG is eventually translated into a sequence of executable instructions.
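
To make the stages concrete, here is a toy version of the pipeline: a `FromTo` with one source and two destinations is parsed into nodes, and instructions are generated in dependency order using Kahn's topological sort. Every type and the naming scheme are hypothetical; the real parser in `ioswitch2/parser` applies far richer rules, and the optimization stage is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// FromTo is a hypothetical miniature: one source, several destinations.
type FromTo struct {
	From string   // e.g. a storage space holding the full file
	Tos  []string // destinations that should receive the file
}

// toyNode is a DAG node; Deps lists the nodes whose output it consumes.
type toyNode struct {
	Op   string
	Deps []int
}

// parse turns a FromTo into DAG nodes: a read node feeding one write node per To.
func parse(ft FromTo) []toyNode {
	nodes := []toyNode{{Op: "read " + ft.From}}
	for _, to := range ft.Tos {
		nodes = append(nodes, toyNode{Op: "write " + to, Deps: []int{0}})
	}
	return nodes
}

// generate emits instructions in topological order (Kahn's algorithm).
func generate(nodes []toyNode) ([]string, error) {
	indeg := make([]int, len(nodes))
	users := make([][]int, len(nodes))
	for i, n := range nodes {
		for _, d := range n.Deps {
			indeg[i]++
			users[d] = append(users[d], i)
		}
	}
	var queue, order []int
	for i, d := range indeg {
		if d == 0 {
			queue = append(queue, i)
		}
	}
	for len(queue) > 0 {
		i := queue[0]
		queue = queue[1:]
		order = append(order, i)
		for _, u := range users[i] {
			if indeg[u]--; indeg[u] == 0 {
				queue = append(queue, u)
			}
		}
	}
	if len(order) != len(nodes) {
		return nil, errors.New("cycle detected: not a DAG")
	}
	ops := make([]string, len(order))
	for k, i := range order {
		ops[k] = fmt.Sprintf("%d: %s", k, nodes[i].Op)
	}
	return ops, nil
}
```

An optimization pass would sit between `parse` and `generate`, rewriting the node list before instructions are emitted.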

If necessary, you can define your own `From` and `To` types, as long as they implement the following interfaces:

```go
type From interface {
	GetStreamIndex() StreamIndex
}

type To interface {
	// The range of the file stream required by this To node.
	// The specific meaning of this value depends on DataIndex:
	// If DataIndex == -1, it refers to the entire file range.
	// If DataIndex >= 0, it refers to a specific shard (chunk) of the file.
	GetRange() math2.Range
	GetStreamIndex() StreamIndex
}
```

You'll also need to modify the DAG parsing logic in the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2/parser/gen` package.

**Note**: Make sure your custom DAG nodes implement:

```go
type FromNode interface {
	dag.Node
	GetFrom() ioswitch2.From
	Output() dag.StreamOutputSlot
}

type ToNode interface {
	dag.Node
	GetTo() ioswitch2.To
	Input() dag.StreamInputSlot
}
```

This ensures compatibility with the DAG optimization algorithms that rely on these interfaces.

In the optimization phase, you can add your own steps that match certain patterns and replace or merge nodes to improve efficiency or reduce the instruction count.

After implementing your optimizer, integrate it into the `Parse` function in the `gitlink.org.cn/cloudream/jcs-pub/common/pkgs/ioswitch2/parser` package. Carefully consider how your logic interacts with the other optimization steps.
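
An optimization step typically scans the graph for a pattern and rewrites it. The sketch below shows the idea on a simplified linear chain of hypothetical `range` steps, merging two consecutive range selections into one; the `step` representation and the `mergeRanges` pass are illustrative assumptions, not part of the `ioswitch2/parser` API.

```go
package main

// step is one node in a simplified linear plan.
type step struct {
	Kind   string // "range" or any other operation
	Offset int64  // for "range": start within the incoming stream
	Length int64  // for "range": number of bytes kept
}

// mergeRanges fuses consecutive "range" steps. Taking [o2, o2+l2) out of the
// output of [o1, o1+l1) equals taking [o1+o2, o1+o2+min(l2, l1-o2)) directly.
func mergeRanges(plan []step) []step {
	var out []step
	for _, s := range plan {
		if n := len(out); n > 0 && s.Kind == "range" && out[n-1].Kind == "range" {
			prev := out[n-1]
			length := prev.Length - s.Offset
			if s.Length < length {
				length = s.Length
			}
			if length < 0 {
				length = 0 // the second range starts past the end of the first
			}
			out[n-1] = step{Kind: "range", Offset: prev.Offset + s.Offset, Length: length}
			continue
		}
		out = append(out, s)
	}
	return out
}
```

Because the merged step replaces the previous one in place, a run of three or more consecutive ranges collapses into a single step, saving one instruction per merge.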

## License

MulanPSL v2

## Development Team

- **Planning & Architecture**: Han Bao
- **Design & Implementation**: Xihua Gong, Jiancong Song, Zhishi Ren, Kaixin Zhang, Junhui Kan
- **Advisors**: Yijie Wang
- **Technical Contact**: Jiancong Song (Email: song-jc@foxmail.com)
