398 Star 2.3K Fork 849

GVPWeBank/DataSphereStudio

Create your Gitee Account
Explore and code with more than 12 million developers,Free private repositories !:)
Sign up
Clone or Download
contribute
Sync branch
Cancel
Notice: Creating folder will generate an empty file .keep, because not support in Git
Loading...
README
Apache-2.0

DSS

License

English | 中文

Introduction

       DataSphere Studio (DSS for short) is WeDataSphere, a one-stop data application development management portal developed by WeBank.

       With the pluggable integrated framework design and the Linkis, a computing middleware, DSS can easily integrate various upper-layer data application systems, making data development simple and easy to use.

       DataSphere Studio is positioned as a data application development portal, and the closed loop covers the entire process of data application development. With a unified UI, the workflow-like graphical drag-and-drop development experience meets the entire lifecycle of data application development from data import, desensitization cleaning, data analysis, data mining, quality inspection, visualization, scheduling to data output applications, etc.

       With the connection, reusability, and simplification capabilities of Linkis, DSS is born with financial-grade capabilities of high concurrency, high availability, multi-tenant isolation, and resource management.

UI preview

       Please be patient, it will take some time to load gif.

DSS-V1.0 GIF

Core features

1. One-stop, full-process application development management UI

       DSS is highly integrated. Currently integrated components include(DSS version compatibility for the above components, please visit: Compatibility list of integrated components):

       1. Data Development IDE Tool - Scriptis

       2. Data Visualization Tool - Visualis (Based on the open source project Davinci contributed by CreditEase)

       3. Data Quality Management Tool - Qualitis

       4. Workflow scheduling tool - Schedulis

       5. Data Exchange Tool - Exchangis

       6. Data Api Service - DataApiService

       7. Streaming Application Development Management Tool - Streamis

       8. One-stop machine Learning Platform - Prophecis

       9. Workflow Task Scheduling Tool - DolphinScheduler (In Code Merging)

       10. Help documentation and beginner's guide - UserGuide (In Code Merging)

       11. Data Model Center - DataModelCenter (In development)

       DSS version compatibility for the above components, please visit: Compatibility list of integrated components.

       With a pluggable framework architecture, DSS is designed to allow users to quickly integrate new data application tools, or replace various tools that DSS has integrated. For example, replace Scriptis with Zeppelin, and replace Schedulis with DolphinScheduler...

DSS one-stop video

2. AppConn, based on Linkis,defines a unique design concept

       AppConn is the core concept that enables DSS to easily and quickly integrate various upper-layer web systems.

       AppConn, an application connector, defines a set of unified front-end and back-end three-level integration protocols, allowing external data application systems to easily and quickly becoming a part of DSS data application development.

       The three-level specifications of AppConn are: the first-level SSO specification, the second-level organizational structure specification, and the third-level development process specification.

       DSS arranges multiple AppConns in series to form a workflow that supports real-time execution and scheduled execution. Users can complete the entire process development of data applications with simple drag and drop operations.

       Since AppConn is integrated with Linkis, the external data application system shares the capabilities of resource management, concurrent limiting, and high performance. AppConn also allows sharable context across system level and thus makes external data application completely gets away from application silos.

3. Workspace, as the management unit

       With Workspace as the management unit, it organizes and manages business applications of various data application systems, defines a set of common standards for collaborative development of workspaces across data application systems, and provides user role management capabilities.

4. Integrated data application components

       DSS has integrated a variety of upper-layer data application systems by implementing multiple AppConns, which can basically meet the data development needs of users.

       If desired, new data application systems can also be easily integrated to replace or enrich DSS's data application development process. Click me to learn how to quickly integrate new application systems

Component Description DSS0.X compatible version (DSS0.9.1 recommended) DSS1.0 compatible version (DSS1.1.0 recommended)
Linkis Computing middleware Apache Linkis, by providing standard interfaces such as REST/WebSocket/JDBC/SDK, upper-layer applications can easily connect and access underlying engines such as MySQL/Spark/Hive/Presto/Flink. Linkis0.11.0 is recommended (*Released *) >= Linkis1.1.1 (released)
DataApiService (DSS has built-in third-party application tools) data API service. The SQL script can be quickly published as a Restful interface, providing Rest access capability to the outside world. Not supported DSS1.1.0 recommended (released)
Scriptis (DSS has built-in third-party application tools) support online writing of SQL, Pyspark, HiveQL and other scripts, and submit to [Linkis](https ://github.com/WeBankFinTech/Linkis) data analysis web tool. Recommended DSS0.9.1 (Released) Recommended DSS1.1.0 (Released)
Schedulis Workflow task scheduling system based on Azkaban secondary development, with financial-grade features such as high performance, high availability and multi-tenant resource isolation. Recommended Schedulis0.6.1 (released) >= Schedulis0.7.0 (Released)
EventCheck (a third-party application tool built into DSS) provides signal communication capabilities across business, engineering, and workflow. Recommended DSS0.9.1 (Released) Recommended DSS1.1.0 (Released)
SendEmail (DSS has built-in third-party application tools) provides the ability to send data, all the result sets of other workflow nodes can be sent by email DSS0.9.1 is recommended (released) Recommended DSS1.1.0 (Released)
Qualitis Data quality verification tool, providing data verification capabilities such as data integrity and correctness Qualitis0.8.0 is recommended (**Released **) >= Qualitis0.9.2 (Released)
Streamis Streaming application development management tool. It supports the release of Flink Jar and Flink SQL, and provides the development, debugging and production management capabilities of streaming applications, such as: start-stop, status monitoring, checkpoint, etc. Not supported >= Streamis0.2.0 (Released)
Prophecis A one-stop machine learning platform that integrates multiple open source machine learning frameworks. Prophecis' MLFlow can be connected to DSS workflow through AppConn. Not supported >= Prophecis 0.3.2 (Released)
Exchangis A data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources, the upcoming Exchangis1. 0, will work with DSS workflow not supported = Exchangis1.0.0 (Released)
Visualis A data visualization BI tool based on the secondary development of Davinci, an open source project of CreditEase, provides users with financial-level data visualization capabilities in terms of data security. Recommended Visualis0.5.0 = Visualis1.0.0 (Released)
DolphinScheduler Apache DolphinScheduler, a distributed and easily scalable visual workflow task scheduling platform, supports one-click publishing of DSS workflows to DolphinScheduler. Not supported DolphinScheduler1.3.X (Released)
UserGuide (DSS will be built-in third-party application tools) contains help documents, beginner's guide, Dark mode skinning, etc. Not supported >= DSS1.1.0 (Released)
DataModelCenter (the third-party application tool that DSS will build) mainly provides data warehouse planning, data model development and data asset management capabilities. Data warehouse planning includes subject domains, data warehouse hierarchies, modifiers, etc.; data model development includes indicators, dimensions, metrics, wizard-based table building, etc.; data assets are connected to Apache Atlas to provide data lineage capabilities . Not supported Planned in DSS1.2.0 (under development)
UserManager (DSS has built-in third-party application tools) automatically initialize all user environments necessary for a new DSS user, including: creating Linux users, various user paths, directory authorization, etc. Recommended DSS0.9.1 (Released) Planning
Airflow Supports publishing DSS workflows to Apache Airflow for scheduled scheduling. PR not yet merged Not supported

Demo Trial environment

       The function of DataSphere Studio supporting script execution has high security risks, and the isolation of the WeDataSphere Demo environment has not been completed. Considering that many users are inquiring about the Demo environment, we decided to first issue invitation codes to the community and accept trial applications from enterprises and organizations.

       If you want to try out the Demo environment, please join the DataSphere Studio community user group (Please refer to the end of the document), and contact WeDataSphere Group Robot to get an invitation code.

       DataSphereStudio Demo environment login page: click me to enter

Download

       Please go to the DSS Releases Page to download a compiled version or a source code package of DSS.

Compile and deploy

       Please follow Compile Guide to compile DSS from source code.

       Please refer to Deployment Documents to do the deployment.

Examples and Guidance

       You can find examples and guidance for how to use DSS in User Manual.

Documents

       For a complete list of documents for DSS1.0, see DSS-Doc

       The following is the installation guide for DSS-related AppConn plugins:

Architecture

DSS Architecture

Usage Scenarios

      DataSphere Studio is suitable for the following scenarios:

      1. Scenarios in which big data platform capability is being prepared or initialized but no data application tools are available.

      2. Scenarios in which users already have big data foundation platform capabilities but with only a few data application tools.

      3. Scenarios in which users have the ability of big data foundation platform and comprehensive data application tools, but suffers strong isolation and and high learning costs because those tools have not been integrated together.

      4. Scenarios in which users have the capabilities of big data foundation platform and comprehensive data application tools. but lacks unified and standardized specifications, while a part of these tools have been integrated.

Contributing

       Contributions are always welcomed, we need more contributors to build DSS together. either code, or doc, or other supports that could help the community.

       For code and documentation contributions, please follow the contribution guide.

Communication

       For any questions or suggestions, please kindly submit an issue.

       You can scan the QR code below to join our WeChat to get more immediate response.

communication

Who is using DSS

       We opened an issue for users to feedback and record who is using DSS.

       Since the first release of DSS in 2019, it has accumulated more than 700 trial companies and 1000+ sandbox trial users, which involving diverse industries, from finance, banking, tele-communication, to manufactory, internet companies and so on.

License

       DSS is under the Apache 2.0 license. See the License file for details.

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

将满足从数据交换、脱敏清洗、分析挖掘、质量检测、可视化展现、定时调度到数据输出等数据应用开发全流程场景需求。欢迎申请体验demo环境:https://sandbox.webank.com/wds/dss/#/register expand collapse
Java and 6 more languages
Apache-2.0
Cancel

Releases

No release

Contributors

All

Activities

Load More
can not load any more
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
Java
1
https://gitee.com/WeBank/DataSphereStudio.git
git@gitee.com:WeBank/DataSphereStudio.git
WeBank
DataSphereStudio
DataSphereStudio
master

Search