
中文说明 | English

Chinese BERT with Whole Word Masking

For further accelerating Chinese natural language processing, we provide Chinese pre-trained BERT with Whole Word Masking. Meanwhile, we also compare the state-of-the-art Chinese pre-trained models in depth, including BERT, ERNIE, and BERT-wwm.

Pre-Training with Whole Word Masking for Chinese BERT
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu

This repository is developed based on https://github.com/google-research/bert

You may also be interested in:

More resources by HFL: https://github.com/ymcui/HFL-Anthology

News

2021/1/27 All models now support TensorFlow 2. Please use the transformers library to access them, or download them from https://huggingface.co/hfl

2020/9/15 Our paper "Revisiting Pre-Trained Models for Chinese Natural Language Processing" is accepted to Findings of EMNLP as a long paper.

2020/8/27 We are happy to announce that our model is on top of GLUE benchmark, check leaderboard.

2020/3/23 The models in this repository now can be easily accessed through PaddleHub, check Quick Load

2020/2/26 We release a knowledge distillation toolkit TextBrewer

2020/1/20 Happy Chinese New Year! We've released RBT3 and RBTL3 (3-layer RoBERTa-wwm-ext-base/large), check Small Models

Past News

2019/12/19 The models in this repository can now be easily accessed through Huggingface-Transformers, check Quick Load

2019/10/14 We release RoBERTa-wwm-ext-large, check Download

2019/9/10 We release RoBERTa-wwm-ext, check Download

2019/7/30 We release BERT-wwm-ext, which was trained on larger data, check Download

2019/6/20 Initial version, pre-trained models could be downloaded through Google Drive, check Download

Guide

| Section | Description |
|---|---|
| Introduction | Introduction to BERT with Whole Word Masking (WWM) |
| Download | Download links for Chinese BERT-wwm |
| Quick Load | Learn how to quickly load our models through 🤗Transformers or PaddleHub |
| Model Comparison | Compare the models published in this repository |
| Baselines | Baseline results for several Chinese NLP datasets (partial) |
| Small Models | 3-layer Transformer models |
| Useful Tips | Several useful tips for using Chinese pre-trained models |
| English BERT-wwm | Download English BERT-wwm (by Google) |
| FAQ | Frequently Asked Questions |
| Citation | Citation |

Introduction

Whole Word Masking (wwm) is an upgraded version of BERT released by Google in late May 2019.

The following introduction is copied from the BERT repository.

In the original pre-processing code, we randomly select WordPiece tokens to mask. For example:

Input Text: the man jumped up , put his basket on phil ##am ##mon ' s head 

Original Masked Input: [MASK] man [MASK] up , put his [MASK] on phil [MASK] ##mon ' s head

The new technique is called Whole Word Masking. In this case, we always mask all of the tokens corresponding to a word at once. The overall masking rate remains the same.

Whole Word Masked Input: the man [MASK] up , put his basket on [MASK] [MASK] [MASK] ' s head

The training is identical -- we still predict each masked WordPiece token independently. The improvement comes from the fact that the original prediction task was too 'easy' for words that had been split into multiple WordPieces.

Important Note: the term Masking does not ONLY mean replacing a word with the [MASK] token. It can also take other forms, such as keeping the original word or randomly replacing it with another word.

In the Chinese language, it is straightforward to utilize whole word masking, as traditional Chinese text processing should include Chinese Word Segmentation (CWS). In the original BERT-base, Chinese by Google, the text is split into individual characters, neglecting CWS. In this repository, we utilize the Language Technology Platform (LTP) by Harbin Institute of Technology for CWS, and apply whole word masking to Chinese text.
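
To make the idea concrete, here is a minimal sketch (not the authors' data pipeline, and simplified to always substitute [MASK] instead of the keep/random-replace variants) of grouping WordPiece tokens into whole words and masking whole words at once:

import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]"):
    # Group WordPiece tokens into whole words: a piece starting with "##"
    # belongs to the same word as the preceding piece.
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1].append(tok)
        else:
            words.append([tok])
    # Sample whole words to mask; every piece of a chosen word is masked.
    n_mask = max(1, round(len(words) * mask_rate))
    masked = set(random.sample(range(len(words)), n_mask))
    out = []
    for i, word in enumerate(words):
        out.extend([mask_token] * len(word) if i in masked else word)
    return out

tokens = "the man jumped up , put his basket on phil ##am ##mon ' s head".split()
print(whole_word_mask(tokens))

For Chinese, the grouping comes from LTP word segmentation rather than "##" WordPiece markers, but the masking logic is the same.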

Download

As all models are 'BERT-base' variants, we do not indicate 'base' in the following model names.

  • BERT-base: 12-layer, 768-hidden, 12-heads, 110M parameters
| Model | Data | Google Drive | iFLYTEK Cloud |
|---|---|---|---|
| RBT6, Chinese | Wikipedia+Extended data[1] | - | TensorFlow (pw: XNMA) |
| RBT4, Chinese | Wikipedia+Extended data[1] | - | TensorFlow (pw: e8dN) |
| RBTL3, Chinese | Wikipedia+Extended data[1] | TensorFlow / PyTorch | TensorFlow (pw: vySW) |
| RBT3, Chinese | Wikipedia+Extended data[1] | TensorFlow / PyTorch | TensorFlow (pw: b9nx) |
| RoBERTa-wwm-ext-large, Chinese | Wikipedia+Extended data[1] | TensorFlow / PyTorch | TensorFlow (pw: u6gC) |
| RoBERTa-wwm-ext, Chinese | Wikipedia+Extended data[1] | TensorFlow / PyTorch | TensorFlow (pw: Xe1p) |
| BERT-wwm-ext, Chinese | Wikipedia+Extended data[1] | TensorFlow / PyTorch | TensorFlow (pw: 4cMG) |
| BERT-wwm, Chinese | Wikipedia | TensorFlow / PyTorch | TensorFlow (pw: 07Xj) |
| BERT-base, Chinese (Google) | Wikipedia | Google Cloud | - |
| BERT-base, Multilingual Cased (Google) | Wikipedia | Google Cloud | - |
| BERT-base, Multilingual Uncased (Google) | Wikipedia | Google Cloud | - |

PyTorch Version

If you need these models in PyTorch,

  1. Convert TensorFlow checkpoint into PyTorch, using 🤗Transformers

  2. Download from https://huggingface.co/hfl

Steps: select one of the models on the page above → click "list all files in model" at the end of the model page → download the bin/json files from the pop-up window
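
For option 1, a minimal sketch (assuming both torch and tensorflow are installed; the paths refer to the unzipped package described in the Note below, and the output directory name is only an example) loads the TensorFlow checkpoint with from_tf=True and re-saves it in PyTorch format:

from transformers import BertConfig, BertForPreTraining, BertTokenizer

config = BertConfig.from_json_file("chinese_wwm_L-12_H-768_A-12/bert_config.json")
model = BertForPreTraining.from_pretrained(
    "chinese_wwm_L-12_H-768_A-12/bert_model.ckpt.index", from_tf=True, config=config)
tokenizer = BertTokenizer("chinese_wwm_L-12_H-768_A-12/vocab.txt")

model.save_pretrained("chinese_wwm_pytorch")       # writes pytorch_model.bin + config.json
tokenizer.save_pretrained("chinese_wwm_pytorch")   # copies vocab.txt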

Note

The whole zip package is roughly 400MB and includes the following files:

chinese_wwm_L-12_H-768_A-12.zip
    |- bert_model.ckpt      # Model Weights
    |- bert_model.meta      # Meta info
    |- bert_model.index     # Index info
    |- bert_config.json     # Config file
    |- vocab.txt            # Vocabulary

bert_config.json and vocab.txt are identical to the original BERT-base, Chinese by Google.

Quick Load

Huggingface-Transformers

With Huggingface-Transformers, the models above can be easily accessed and loaded with the following code.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
model = BertModel.from_pretrained("MODEL_NAME")

Notice: Please use BertTokenizer and BertModel for loading these models. DO NOT use RobertaTokenizer/RobertaModel!

The actual model and its MODEL_NAME are listed below.

| Original Model | MODEL_NAME |
|---|---|
| RoBERTa-wwm-ext-large | hfl/chinese-roberta-wwm-ext-large |
| RoBERTa-wwm-ext | hfl/chinese-roberta-wwm-ext |
| BERT-wwm-ext | hfl/chinese-bert-wwm-ext |
| BERT-wwm | hfl/chinese-bert-wwm |
| RBT3 | hfl/rbt3 |
| RBTL3 | hfl/rbtl3 |
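
For example, to load RoBERTa-wwm-ext (note that, per the notice above, it is still loaded with the BERT classes):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
print(model.config.num_hidden_layers)  # 12 for the base-sized models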

PaddleHub

With PaddleHub, we can download and install the model with one line of code.

import paddlehub as hub
module = hub.Module(name="MODULE_NAME")

The actual model and its MODULE_NAME are listed below.

| Original Model | MODULE_NAME |
|---|---|
| RoBERTa-wwm-ext-large | chinese-roberta-wwm-ext-large |
| RoBERTa-wwm-ext | chinese-roberta-wwm-ext |
| BERT-wwm-ext | chinese-bert-wwm-ext |
| BERT-wwm | chinese-bert-wwm |
| RBT3 | rbt3 |
| RBTL3 | rbtl3 |

Model Comparison

We list comparisons of the models released in this project. ~BERT means inheriting the corresponding attribute from Google's original BERT.

| | BERT (Google) | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext | RoBERTa-wwm-ext-large |
|---|---|---|---|---|---|
| Masking | WordPiece | WWM[1] | WWM | WWM | WWM |
| Type | BERT-base | BERT-base | BERT-base | BERT-base | BERT-large |
| Data Source | wiki | wiki | wiki+ext[2] | wiki+ext | wiki+ext |
| Training Tokens # | 0.4B | 0.4B | 5.4B | 5.4B | 5.4B |
| Device | TPU Pod v2 | TPU v3 | TPU v3 | TPU v3 | TPU Pod v3-32[3] |
| Training Steps | ? | 100K (max_len=128) + 100K (max_len=512) | 1M (max_len=128) + 400K (max_len=512) | 1M (max_len=512) | 2M (max_len=512) |
| Batch Size | ? | 2,560 / 384 | 2,560 / 384 | 384 | 512 |
| Optimizer | AdamW | LAMB | LAMB | AdamW | AdamW |
| Vocabulary | 21,128 | ~BERT vocab[4] | ~BERT vocab | ~BERT vocab | ~BERT vocab |
| Init Checkpoint | Random Init | ~BERT weight | ~BERT weight | ~BERT weight | Random Init |

Baselines

We experiment on several Chinese datasets, ranging from sentence-level to document-level tasks.

We only list partial results here and kindly advise the readers to read our technical report.

Best Learning Rate:

| Dataset | BERT | ERNIE | BERT-wwm* |
|---|---|---|---|
| CMRC 2018 | 3e-5 | 8e-5 | 3e-5 |
| DRCD | 3e-5 | 8e-5 | 3e-5 |
| CJRC | 4e-5 | 8e-5 | 4e-5 |
| XNLI | 3e-5 | 5e-5 | 3e-5 |
| ChnSentiCorp | 2e-5 | 5e-5 | 2e-5 |
| LCQMC | 2e-5 | 3e-5 | 2e-5 |
| BQ Corpus | 3e-5 | 5e-5 | 3e-5 |
| THUCNews | 2e-5 | 5e-5 | 2e-5 |

  • BERT-wwm* represents all related models (BERT-wwm, BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large)

Note: To ensure the stability of the results, we run each experiment 10 times and report both the maximum and average scores.

Average scores are shown in brackets; maximum scores are outside the brackets.

CMRC 2018

The CMRC 2018 dataset is released by the Joint Laboratory of HIT and iFLYTEK Research. The model should answer questions based on the given passage, in a setup identical to SQuAD. Evaluation Metrics: EM / F1

| Model | Development | Test | Challenge |
|---|---|---|---|
| BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) |
| ERNIE | 65.4 (64.3) / 84.7 (84.2) | 69.4 (68.2) / 86.6 (86.1) | 19.6 (17.0) / 44.3 (42.8) |
| BERT-wwm | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) |
| BERT-wwm-ext | 67.1 (65.6) / 85.7 (85.0) | 71.4 (70.0) / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) |
| RoBERTa-wwm-ext | 67.4 (66.5) / 87.2 (86.5) | 72.6 (71.4) / 89.4 (88.8) | 26.2 (24.6) / 51.0 (49.1) |
| RoBERTa-wwm-ext-large | 68.5 (67.6) / 88.4 (87.9) | 74.2 (72.4) / 90.6 (90.0) | 31.5 (30.1) / 60.1 (57.5) |

DRCD

DRCD is also a span-extraction machine reading comprehension dataset, released by Delta Research Center. The text is written in Traditional Chinese. Evaluation Metrics: EM / F1

| Model | Development | Test |
|---|---|---|
| BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) |
| ERNIE | 73.2 (73.0) / 83.9 (83.8) | 71.9 (71.4) / 82.5 (82.3) |
| BERT-wwm | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) |
| BERT-wwm-ext | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) |
| RoBERTa-wwm-ext | 86.6 (85.9) / 92.5 (92.2) | 85.6 (85.2) / 92.0 (91.7) |
| RoBERTa-wwm-ext-large | 89.6 (89.1) / 94.8 (94.4) | 89.6 (88.9) / 94.5 (94.1) |

CJRC

CJRC is a Chinese judiciary reading comprehension dataset, released by the Joint Laboratory of HIT and iFLYTEK Research. Note that the data used in these experiments is NOT identical to the official release. Evaluation Metrics: EM / F1

| Model | Development | Test |
|---|---|---|
| BERT | 54.6 (54.0) / 75.4 (74.5) | 55.1 (54.1) / 75.2 (74.3) |
| ERNIE | 54.3 (53.9) / 75.3 (74.6) | 55.0 (53.9) / 75.0 (73.9) |
| BERT-wwm | 54.7 (54.0) / 75.2 (74.8) | 55.1 (54.1) / 75.4 (74.4) |
| BERT-wwm-ext | 55.6 (54.8) / 76.0 (75.3) | 55.6 (54.9) / 75.8 (75.0) |
| RoBERTa-wwm-ext | 58.7 (57.6) / 79.1 (78.3) | 59.0 (57.8) / 79.0 (78.0) |
| RoBERTa-wwm-ext-large | 62.1 (61.1) / 82.4 (81.6) | 62.4 (61.4) / 82.2 (81.0) |

XNLI

We use XNLI data for testing NLI task. Evaluation Metrics: Accuracy

| Model | Development | Test |
|---|---|---|
| BERT | 77.8 (77.4) | 77.8 (77.5) |
| ERNIE | 79.7 (79.4) | 78.6 (78.2) |
| BERT-wwm | 79.0 (78.4) | 78.2 (78.0) |
| BERT-wwm-ext | 79.4 (78.6) | 78.7 (78.3) |
| RoBERTa-wwm-ext | 80.0 (79.2) | 78.8 (78.3) |
| RoBERTa-wwm-ext-large | 82.1 (81.3) | 81.2 (80.6) |

ChnSentiCorp

We use ChnSentiCorp data for testing sentiment analysis. Evaluation Metrics: Accuracy

| Model | Development | Test |
|---|---|---|
| BERT | 94.7 (94.3) | 95.0 (94.7) |
| ERNIE | 95.4 (94.8) | 95.4 (95.3) |
| BERT-wwm | 95.1 (94.5) | 95.4 (95.0) |
| BERT-wwm-ext | 95.4 (94.6) | 95.3 (94.7) |
| RoBERTa-wwm-ext | 95.0 (94.6) | 95.6 (94.8) |
| RoBERTa-wwm-ext-large | 95.8 (94.9) | 95.8 (94.9) |

Sentence Pair Matching: LCQMC, BQ Corpus

LCQMC

Evaluation Metrics: Accuracy

| Model | Development | Test |
|---|---|---|
| BERT | 89.4 (88.4) | 86.9 (86.4) |
| ERNIE | 89.8 (89.6) | 87.2 (87.0) |
| BERT-wwm | 89.4 (89.2) | 87.0 (86.8) |
| BERT-wwm-ext | 89.6 (89.2) | 87.1 (86.6) |
| RoBERTa-wwm-ext | 89.0 (88.7) | 86.4 (86.1) |
| RoBERTa-wwm-ext-large | 90.4 (90.0) | 87.0 (86.8) |

BQ Corpus

Evaluation Metrics: Accuracy

| Model | Development | Test |
|---|---|---|
| BERT | 86.0 (85.5) | 84.8 (84.6) |
| ERNIE | 86.3 (85.5) | 85.0 (84.6) |
| BERT-wwm | 86.1 (85.6) | 85.2 (84.9) |
| BERT-wwm-ext | 86.4 (85.5) | 85.3 (84.8) |
| RoBERTa-wwm-ext | 86.0 (85.4) | 85.0 (84.6) |
| RoBERTa-wwm-ext-large | 86.3 (85.7) | 85.8 (84.9) |

THUCNews

THUCNews is released by Tsinghua University and contains news in 10 categories. Evaluation Metrics: Accuracy

| Model | Development | Test |
|---|---|---|
| BERT | 97.7 (97.4) | 97.8 (97.6) |
| ERNIE | 97.6 (97.3) | 97.5 (97.3) |
| BERT-wwm | 98.0 (97.6) | 97.8 (97.6) |
| BERT-wwm-ext | 97.7 (97.5) | 97.7 (97.5) |
| RoBERTa-wwm-ext | 98.3 (97.9) | 97.7 (97.5) |
| RoBERTa-wwm-ext-large | 98.3 (97.7) | 97.8 (97.6) |

Small Models

We list RBT3 and RBTL3 results on several NLP tasks. Note that, we only list test set results.

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | Params |
|---|---|---|---|---|---|---|---|---|
| RoBERTa-wwm-ext-large | 74.2 / 90.6 | 89.6 / 94.5 | 81.2 | 95.8 | 87.0 | 85.8 | 87.335 | 325M |
| RoBERTa-wwm-ext | 72.6 / 89.4 | 85.6 / 92.0 | 78.8 | 95.6 | 86.4 | 85.0 | 85.675 | 102M |
| RBTL3 | 63.3 / 83.4 | 77.2 / 85.6 | 74.0 | 94.2 | 85.1 | 83.6 | 80.800 | 61M (59.8%) |
| RBT3 | 62.2 / 81.8 | 75.0 / 83.9 | 72.3 | 92.8 | 85.1 | 83.3 | 79.550 | 38M (37.3%) |

Relative performance:

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | AVG-C |
|---|---|---|---|---|---|---|---|---|
| RoBERTa-wwm-ext-large | 102.2% / 101.3% | 104.7% / 102.7% | 103.0% | 100.2% | 100.7% | 100.9% | 101.9% | 101.2% |
| RoBERTa-wwm-ext | 100% / 100% | 100% / 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| RBTL3 | 87.2% / 93.3% | 90.2% / 93.0% | 93.9% | 98.5% | 98.5% | 98.4% | 94.3% | 97.35% |
| RBT3 | 85.7% / 91.5% | 87.6% / 91.2% | 91.8% | 97.1% | 98.5% | 98.0% | 92.9% | 96.35% |
  • AVG-C: average score of the classification tasks: XNLI, CSC, LCQMC, BQ
  • The number of parameters is calculated based on the XNLI classification task.
  • The relative parameter percentage is calculated against the RoBERTa-wwm-ext model.
  • RBT3: we initialize with the first three layers of RoBERTa-wwm-ext and continue training for 1M steps (a sketch of the initialization follows this list).
  • RBTL3: we initialize with the first three layers of RoBERTa-wwm-ext-large and continue training for 1M steps.
  • The name RBT comes from the syllables of 'RoBERTa', and 'L' stands for the large model.
  • Directly fine-tuning the first three layers of RoBERTa-wwm-ext-large on a downstream task gives poor performance. For example, on CMRC 2018 it only reaches 42.9/65.3, while RBTL3 reaches 63.3/83.4.
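
As a rough illustration of the RBT3 initialization described above (a sketch only, not the authors' training code; the subsequent 1M training steps are omitted and the output directory name is hypothetical), the 3-layer student can be built from the first three layers of RoBERTa-wwm-ext as follows:

from transformers import BertConfig, BertModel

teacher = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
config = BertConfig.from_pretrained("hfl/chinese-roberta-wwm-ext", num_hidden_layers=3)
student = BertModel(config)

# Copy the embeddings and the first three Transformer layers; the remaining
# layers of the teacher are discarded.
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for i in range(3):
    student.encoder.layer[i].load_state_dict(teacher.encoder.layer[i].state_dict())
student.save_pretrained("rbt3_init")  # hypothetical output directory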

Useful Tips

  • The initial learning rate is the most important hyper-parameter (for BERT or any other neural network) and should ALWAYS be tuned for better performance (a sketch of such a sweep follows this list).
  • As shown in the experimental results, BERT and BERT-wwm share almost the same best initial learning rate, so it is straightforward to reuse your BERT learning rate for BERT-wwm. However, ERNIE does not share this characteristic, so it is STRONGLY recommended to tune its learning rate.
  • As BERT and BERT-wwm were trained on Wikipedia data, they perform relatively better on formal text. ERNIE, in contrast, was trained on larger data including web text, which helps on casual text such as Weibo (microblogs).
  • For long-sequence tasks, such as machine reading comprehension and document classification, we suggest using BERT or BERT-wwm.
  • As these pre-trained models were trained on general-domain data, if your task data differs greatly from the pre-training data (Wikipedia for BERT/BERT-wwm), we suggest additional pre-training steps on the task data, as also suggested by Devlin et al. (2019).
  • As there are many possibilities in the pre-training stage (initial learning rate, global training steps, warm-up steps, etc.), our implementation may not be optimal even with the same pre-training data. Readers are advised to train their own model if seeking a further boost in performance. If pre-training is not feasible, choose the pre-trained model that was trained on a domain similar to the downstream task.
  • When dealing with Traditional Chinese text, use BERT or BERT-wwm.
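
As a concrete illustration of the first tip, here is a minimal, self-contained sketch of a learning-rate sweep (the toy dataset is only a placeholder for a real downstream dataset, and the model name and hyper-parameters are examples rather than recommendations):

import torch
from torch.utils.data import Dataset
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")

class ToyDataset(Dataset):
    # Placeholder two-class sentiment data; substitute a real dataset here.
    texts = ["这部电影很好看", "质量太差了"] * 8
    labels = [1, 0] * 8

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = tokenizer(self.texts[idx], truncation=True, padding="max_length",
                        max_length=32, return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

for lr in (2e-5, 3e-5, 5e-5):  # candidate initial learning rates
    model = BertForSequenceClassification.from_pretrained(
        "hfl/chinese-bert-wwm-ext", num_labels=2)
    args = TrainingArguments(output_dir=f"out_lr_{lr}", learning_rate=lr,
                             num_train_epochs=1, per_device_train_batch_size=8)
    trainer = Trainer(model=model, args=args,
                      train_dataset=ToyDataset(), eval_dataset=ToyDataset())
    trainer.train()
    print(lr, trainer.evaluate())  # keep the learning rate with the best dev score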

English BERT-wwm

We also repost English BERT-wwm (released by Google) here for your convenience.

FAQ

Q: How to use this model?
A: Use it as you would use the original BERT. Note that you do not need to do CWS on your text, as wwm only changes the pre-training input, not the input for downstream tasks.
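
A minimal sketch of this point: raw, unsegmented Chinese text goes straight into the tokenizer.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")

inputs = tokenizer("使用全词掩码的中文预训练模型", return_tensors="pt")  # no CWS needed
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)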

Q: Do you have any plans to release the code?
A: Unfortunately, we are not able to release the code at the moment. As the implementation is quite simple, we suggest reading issues #10 and #13.

Q: How can I download XXXXX dataset?
A: We only provide data that is publicly available; check the data directory. For copyright reasons, some datasets are not publicly available. In that case, please search on GitHub or contact the original authors for access.

Q: Do you have any plans on releasing the larger model? Say BERT-large-wwm?
A: If we could get significant gains from BERT-large, we will release a larger version in the future.

Q: You liar! I cannot reproduce the results! 😂
A: We use the simplest models for the downstream tasks. For example, for classification tasks we directly use run_classifier.py from Google. If you cannot reach the average scores that we report, there are likely bugs in your code. As there is randomness involved in reaching the maximum scores, there is no guarantee that you will reproduce them.

Q: I could get better performance than you!
A: Congratulations!

Q: How long did it take to train such a model?
A: Training was done on a Google Cloud TPU v3 (128G HBM) and took roughly 1.5 days. Note that in the pre-training stage we use the LAMB optimizer, which is optimized for large batches. For fine-tuning downstream tasks, we use the standard AdamWeightDecayOptimizer by default.

Q: Who is ERNIE?
A: The ERNIE in this repository refers to the model released by Baidu, not the model of the same name published by Tsinghua University.

Q: BERT-wwm does not perform well on some tasks.
A: The aim of this project is to provide researchers with a variety of pre-training models. You are free to choose one of these models. We only provide experimental results, and we strongly suggest trying these models in your own task. One more model, one more choice.

Q: Why not try more datasets?
A: To be honest: 1) no time to find more data; 2) no need; 3) no money.

Q: Say something about these models
A: Each has its own emphasis and merits. Development of Chinese NLP needs joint efforts.

Q: Any comments on the name of next generation of the pre-trained model?
A: Maybe ZOE: Zero-shOt Embeddings from language model

Q: Tell me a little bit more about RoBERTa-wwm-ext
A: We integrate whole word masking (wwm) into the RoBERTa model. Specifically, we:

  1. use whole word masking (but not dynamic masking)
  2. remove the Next Sentence Prediction (NSP) task
  3. directly use data generated with max_len=512 (rather than training with max_len=128 for several steps and then switching to max_len=512)
  4. extend the training steps (to 1M steps)

Citation

If you find the technical report or resources useful, please cite the following papers.

@inproceedings{cui-etal-2020-revisiting,
    title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
    author = "Cui, Yiming  and
      Che, Wanxiang  and
      Liu, Ting  and
      Qin, Bing  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
    pages = "657--668",
}
@article{chinese-bert-wwm,
  title={Pre-Training with Whole Word Masking for Chinese BERT},
  author={Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Yang, Ziqing and Wang, Shijin and Hu, Guoping},
  journal={arXiv preprint arXiv:1906.08101},
  year={2019}
 }

Disclaimer

This is NOT a project by Google, nor an official product of HIT or iFLYTEK. The experiments only represent empirical results under certain conditions and should not be taken as the nature of the respective models. The results may vary with different random seeds, computing devices, etc. The contents of this repository are for academic research purposes, and we do not provide any conclusive remarks. Users are free to use anything in this repository within the scope of the Apache-2.0 license. However, we are not responsible for direct or indirect losses caused by using the content of this project.

Acknowledgement

The first author of this project is partially supported by Google TensorFlow Research Cloud (TFRC) Program.

Issues

If there is any problem, please submit a GitHub Issue.

