How to Become a Contributor to Outstanding Open-Source Projects (Source Code Edition)

Updated: 2020-09-07 14:18. Source: 傳智播客

Overview

For programmers, becoming a contributor to an outstanding open-source project is a meaningful achievement, and certainly not an easy one. If you work in artificial intelligence, you will know open-source projects such as Google's TensorFlow and Facebook's PyTorch. This article explains how to become a contributor to projects like these.

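Preparation

1. You need a GitHub account and should already be familiar with the basics of hosting code on GitHub.

2. Outstanding open-source projects generally require you to sign a Contributor License Agreement (CLA). For TensorFlow, individuals sign the TF individual CLA and companies sign the TF corporate CLA; some projects under PyTorch require the Facebook CLA. Your code can only be accepted once the CLA is signed.

3. Make your code style consistent. Open-source projects generally ask for the Google Python Style; PyTorch follows it too, not to mention Google's own TensorFlow.

4. The code you contribute usually consists of classes or functions (documentation contributions aside), so you need unit tests. Like code comments, they are an indispensable part of sharing code. Without them your code will not be merged even if it is correct; you will be asked to provide unit tests in the end.

5. Many open-source projects require every .py file to begin with a license header. TensorFlow is one of them; its Python license header, which is easy to add, looks like this: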

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
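
A quick way to confirm that a script carries this header is a small check like the one below. This is a minimal sketch, not a tool used by TensorFlow itself: the names `has_license_header`, `write_temp_script` and `max_lines` are hypothetical, and the check only looks for one marker sentence from the header in a comment near the top of the file.

```python
import os
import tempfile

# Marker sentence taken from the license header shown above.
LICENSE_MARKER = "Licensed under the Apache License, Version 2.0"


def has_license_header(path, max_lines=20):
    """Return True if the marker appears in a comment within the first lines.

    `has_license_header` and `max_lines` are hypothetical names for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            if line.lstrip().startswith("#") and LICENSE_MARKER in line:
                return True
    return False


def write_temp_script(text):
    """Write `text` to a temporary .py file and return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".py", delete=False, encoding="utf-8"
    )
    with handle:
        handle.write(text)
    return handle.name


licensed = write_temp_script(
    "# Copyright 2015 The TensorFlow Authors. All Rights Reserved.\n"
    "#\n"
    '# Licensed under the Apache License, Version 2.0 (the "License");\n'
    "import os\n"
)
unlicensed = write_temp_script("import os\n")

licensed_ok = has_license_header(licensed)
unlicensed_ok = has_license_header(unlicensed)
print(licensed_ok, unlicensed_ok)

# Clean up the temporary files.
os.remove(licensed)
os.remove(unlicensed)
```

A check of this kind could be run over every changed file before committing.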

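Tools

Next we introduce the tools that help with the preparation work before contributing, such as code style checking and unit testing.

Code style tools

To meet the Google Style requirements, we first need a code style checker. Here we use the officially recommended pylint.

Installation: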

pip install pylint

Usage:

# Check a script with pylint; its default checks are based on PEP 8
# Here we pass a configuration file so that it follows Google Style instead
# myfile.py stands for the Python script you have written
pylint --rcfile=pylintrc myfile.py

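For the contents of pylintrc, see the project's pylintrc file.

The code we first write tends to be rather casual, so running pylint directly may flag so many problems that it becomes discouraging. For that reason, here is another tool recommended by many open-source projects: black. It fixes basic formatting problems in your code automatically (many issues are beyond what it can detect and still need pylint).

Installation: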

pip install black

Usage:

# -l sets the maximum length of each line
# The default is 88, but Google Style requires 80
# so we set it to 80 here
black myfile.py -l 80
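
black rewraps most code, but some lines, such as long comments, may still exceed the limit afterwards. The following is a minimal sketch of the 80-column rule using only the standard library; the helper name `find_long_lines` is hypothetical, and it is no substitute for pylint's own line-length check.

```python
def find_long_lines(source, max_length=80):
    """Return (line_number, length) pairs for lines longer than `max_length`.

    Line numbers start at 1. `find_long_lines` is a hypothetical helper.
    """
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]


sample = "\n".join(
    [
        "x = 1",
        "# " + "a" * 90,  # a 92-character comment line
        "y = 2",
    ]
)
long_lines = find_long_lines(sample)
print(long_lines)  # [(2, 92)]
```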

Code style example:

import tensorflow as tf  # the example uses the TF 1.x graph-collections API


def my_op(tensor_in, other_tensor_in, my_param, other_param=0.5,
          output_collections=(), name=None):
  """My operation that adds two tensors with given coefficients.

  Args:
    tensor_in: `Tensor`, input tensor.
    other_tensor_in: `Tensor`, same shape as `tensor_in`, other input tensor.
    my_param: `float`, coefficient for `tensor_in`.
    other_param: `float`, coefficient for `other_tensor_in`.
    output_collections: `tuple` of `string`s, name of the collection to
                        collect result of this op.
    name: `string`, name of the operation.

  Returns:
    `Tensor` of same shape as `tensor_in`, sum of input values with coefficients.

  Example:
    >>> my_op([1., 2.], [3., 4.], my_param=0.5, other_param=0.6,
              output_collections=['MY_OPS'], name='add_t1t2')
    [2.3, 3.4]
  """
  with tf.name_scope(name or "my_op"):
    tensor_in = tf.convert_to_tensor(tensor_in)
    other_tensor_in = tf.convert_to_tensor(other_tensor_in)
    result = my_param * tensor_in + other_param * other_tensor_in
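    # output_collections holds several collection names, so use the plural
    # tf.add_to_collections (TF 1.x graph-collections API).
    tf.add_to_collections(output_collections, result)
    return result


# t1 and t2 are tensors (or array-likes) defined elsewhere.
output = my_op(t1, t2, my_param=0.5, other_param=0.6,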
               output_collections=['MY_OPS'], name='add_t1t2')
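
The `Example:` section of a docstring like the one above can be checked mechanically with the standard-library `doctest` module. The sketch below rests on assumptions: `weighted_sum` is a hypothetical pure-Python stand-in for `my_op` (no TensorFlow involved), and the coefficients are chosen so that the floating-point results are exact.

```python
import doctest


def weighted_sum(values, other_values, my_param, other_param=0.5):
    """Adds two equal-length lists with the given coefficients.

    Args:
      values: `list` of `float`, first input.
      other_values: `list` of `float`, same length as `values`.
      my_param: `float`, coefficient for `values`.
      other_param: `float`, coefficient for `other_values`.

    Returns:
      `list` of `float`, element-wise weighted sum.

    Example:
      >>> weighted_sum([1., 2.], [3., 4.], my_param=0.5, other_param=0.25)
      [1.25, 2.0]
    """
    return [
        my_param * a + other_param * b for a, b in zip(values, other_values)
    ]


# Build a doctest from the docstring and run it.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    weighted_sum.__doc__,            # docstring containing the example
    {"weighted_sum": weighted_sum},  # names visible to the example
    "weighted_sum",                  # test name used in reports
    None,                            # no source file
    0,                               # starting line number
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

If the documented output ever drifts from what the function returns, `results.failed` becomes non-zero.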
               

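Unit testing tools

Unit tests matter a great deal in team development and are an important measure of code quality, so every complete piece of code should ship with a unit test script. Here we use unittest, Python's mainstream unit testing tool.

Installation: unittest is part of the Python standard library, so nothing needs to be installed.

Usage: only the core usage is shown here; see the unittest documentation for details.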

# Import the unittest package
import unittest

# First create a test class that holds all the functions you want to test.
# The class does not use __init__(self); use setUp(self) to define shared state.
# It must inherit from unittest.TestCase, and its name should start with Test.
class TestStringMethods(unittest.TestCase):
    # Inside the class are the test functions, one after another.
    # Their names must start with test (test_ by convention) to be discovered.
    # They normally take no parameters; the inputs are set up inside the body.
    # (You can also write a parameterised helper and call it from a test.)
    # Each test should use assert... methods to check the result.
    # Common ones: assertEqual, assertTrue, assertFalse,
    # assertRaises, assertIn, assertNotIn, assertIs, assertIsNot...
    def test_upper(self):
        # assertEqual checks that the two strings are equal
        self.assertEqual("foo".upper(), "FOO")

    def test_isupper(self):
        # assertTrue/assertFalse check that a condition is true/false
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    def test_split(self):
        # Any input will do
        s = "hello world"
        # assertIn checks membership in a list
        self.assertIn(s.split(), [["hello", "world"]])
        # Use "with self.assertRaises" to check for an exception:
        # split raises TypeError when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)


# The main guard is required when running this script with python;
# it can be omitted when using pytest (introduced later).
if __name__ == "__main__":
    # unittest.main runs every class that inherits from unittest.TestCase
    unittest.main()
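
The comments above mention `setUp` without showing it. Below is a minimal sketch of a shared fixture, run programmatically through `unittest.TestLoader` and `unittest.TextTestRunner` rather than `unittest.main()`. The class `TestListMethods` and its contents are made up for illustration.

```python
import io
import unittest


class TestListMethods(unittest.TestCase):
    def setUp(self):
        # setUp runs before every test method, so each test gets a fresh list.
        self.items = [3, 1, 2]

    def test_sorted(self):
        self.assertEqual(sorted(self.items), [1, 2, 3])

    def test_append(self):
        # Changes made here do not leak into the other tests.
        self.items.append(4)
        self.assertIn(4, self.items)
        self.assertEqual(len(self.items), 4)

    def test_index_error(self):
        # Indexing past the end of a list raises IndexError.
        with self.assertRaises(IndexError):
            self.items[10]


suite = unittest.TestLoader().loadTestsFromTestCase(TestListMethods)
# Send the runner's report to a buffer so it can be inspected or printed later.
report = io.StringIO()
result = unittest.TextTestRunner(stream=report, verbosity=2).run(suite)
print(report.getvalue())
print(result.wasSuccessful())
```

Running the suite this way returns a `TestResult` object, which is convenient when the outcome needs to be inspected from code.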

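Using decorators: class and function decorators are among the most frequently used features of unittest.

# Use the decorator below on a test class/function that must always be skipped; you have to give a reason
# @unittest.skip("Too good-looking to need testing, skip me!")

# Skip the test if the condition is true, e.g. depending on whether a GPU is available
# @unittest.skipIf(TEST_CUDA, "CUDA available")

# Skip the test unless the condition is true, e.g. checking whether a dependency is installed
# @unittest.skipUnless(has_unittest, "unittest dependencies are not installed")

# Marks a test that is expected to fail: a failure is reported as an expected failure instead of a failure.
# This is coarser-grained than checking for an exception inside the test with assertRaises.
# @unittest.expectedFailure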

import torch

# Probe for a dependency (unittest itself always imports; shown as a pattern)
try:
    import unittest
except ImportError:
    has_unittest = False
else:
    has_unittest = True

# True when a CUDA-capable GPU is available
TEST_CUDA = torch.cuda.is_available()


# Condition true: not skipped
@unittest.skipUnless(has_unittest, "unittest dependencies are not installed")
# Condition true: skipped; condition false: not skipped
@unittest.skipIf(TEST_CUDA, "CUDA available")
class TestStringMethods(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("foo".upper(), "FOO")

    @unittest.skip("Too good-looking to need testing, skip me!")
    def test_isupper(self):
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    @unittest.expectedFailure
    def test_split(self):
        s = "hello world"
        self.assertIn(s.split(), [["hello", "world"]])
        # An exception is expected here but none is raised, so assertRaises
        # fails; @unittest.expectedFailure records that as an expected failure
        with self.assertRaises(TypeError):
            s.split("ZMZ")


if __name__ == "__main__":
    unittest.main()
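
To see how each decorator is reported without needing PyTorch or a GPU, the outcome lists on the `TestResult` object can be inspected. This is a sketch under stated assumptions: `FEATURE_AVAILABLE` is a hypothetical flag standing in for checks such as `torch.cuda.is_available()`, and `TestDecorators` is an illustrative class.

```python
import io
import unittest

# Hypothetical flag; in a real suite this might be torch.cuda.is_available().
FEATURE_AVAILABLE = False


class TestDecorators(unittest.TestCase):
    def test_plain(self):
        self.assertEqual("foo".upper(), "FOO")

    @unittest.skip("demonstrating an unconditional skip")
    def test_always_skipped(self):
        self.fail("never runs")

    @unittest.skipUnless(FEATURE_AVAILABLE, "feature not available")
    def test_needs_feature(self):
        self.fail("never runs while FEATURE_AVAILABLE is False")

    @unittest.expectedFailure
    def test_known_failure(self):
        # str.split with a string separator does not raise TypeError, so
        # assertRaises fails and the test is recorded as an expected failure.
        with self.assertRaises(TypeError):
            "hello world".split("ZMZ")


suite = unittest.TestLoader().loadTestsFromTestCase(TestDecorators)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

skipped = len(result.skipped)
expected_failures = len(result.expectedFailures)
print(skipped, expected_failures, result.wasSuccessful())
```

Skips and expected failures do not make the run unsuccessful, which is why `wasSuccessful()` still reports a passing suite here.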

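Running your test script:

# pytest is recommended for running test scripts; it is often already present
# (for example in Anaconda), otherwise install it with "pip install pytest"
# With pytest the main guard is unnecessary and the output is easier to read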
pytest test_myfile.py

Output:

======================== test session starts =========================
platform linux -- Python 3.7.3, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /root
plugins: remotedata-0.3.1, celery-4.3.0, doctestplus-0.3.0, arraydiff-0.3, openfiles-0.3.2
collected 3 items

test_myfile.py sx.                                             [100%]

=========== 1 passed, 1 skipped, 1 xfailed in 0.34 seconds ===========

For real-world unit test scripts, see PyTorch Tests and TensorFlow Tests.

Process

Before becoming a contributor, make sure you can already use the project proficiently. Then decide which kind of source contribution you want to make: fixing a bug or implementing a new feature. For a first-time contributor, fixing a bug is the better choice. Unless your own practice has already shown you exactly what to contribute, the following steps are recommended:

Step 1:

Look for open issues in the project's GitHub Issues (TensorFlow Issues, PyTorch Issues). Reading the reported problems carefully saves a lot of time in finding something to work on. In the discussion you can also see the technical debate and any PRs already submitted, which helps you decide whether to get involved. (Many open-source projects label issues with "contributions welcome"; those are worth checking first.)

Step 2:

Once you know which problem you want to solve, and before writing any code, fork the project to your own GitHub account and clone that fork to the server you work on. Only then can you submit a PR later.

# For example:
git clone https://github.com/AITutorials/tensorflow.git

At this point, running git remote -v inside the cloned directory shows that it is connected only to your own remote repository (origin).

Next, connect it to the open-source project's remote repository (upstream):

# Add the connection, using tensorflow as an example
git remote add upstream https://github.com/tensorflow/tensorflow.git

# upstream is now listed
git remote -v

Then create your own branch. You can first take a look at the existing branches, including the remote ones:

# List local and remote branches
git branch -a

# Create and switch to your own local branch cnsync (it appears on your remote once pushed)
git checkout -b cnsync

Step 3:

After step 2 you have the project's source code and your own branch, so now it is your turn: coding plus review. The code style and unit testing tools you prepared earlier come into play here.

Step 4:

Commit your code and create a PR on GitHub.

# Add the changes to the staging area
git add .

# Commit the changes
git commit -m "Describe your change here"

# Push to your own remote repository
git push origin cnsync
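
Steps 2 through 4 boil down to a fixed sequence of git commands. The sketch below only assembles that sequence as argument lists and prints it; nothing is executed. The function name `contribution_commands` and the commit message are placeholders, while the URLs and the branch name are the ones used in the examples above.

```python
def contribution_commands(fork_url, upstream_url, branch, message):
    """Return the git commands of the fork-based workflow, in order.

    Each command is an argument list suitable for subprocess.run; this
    function itself runs nothing. Commands after the clone are assumed
    to be run inside the cloned directory.
    """
    return [
        ["git", "clone", fork_url],
        ["git", "remote", "add", "upstream", upstream_url],
        ["git", "checkout", "-b", branch],
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


commands = contribution_commands(
    fork_url="https://github.com/AITutorials/tensorflow.git",
    upstream_url="https://github.com/tensorflow/tensorflow.git",
    branch="cnsync",
    message="Describe your change here",
)
for command in commands:
    print(" ".join(command))
```

Keeping the commands as lists rather than one shell string avoids quoting problems if they are later passed to `subprocess.run`.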

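Note: although you only pushed to your own remote repository, your fork is linked to the source project's repository. That means you can now decide, from your own remote repository, whether to open a PR against the source project. (All of this can be done on the GitHub page of the project you just forked, including filling in the PR title and comment. Sometimes you also need to add a marker to the title, such as [Draft], [WIP] or [RFR].)

Step 5:

Wait patiently. If your PR is in the Ready For Review state, it will soon enter the automated testing pipeline and be picked up by the reviewers. Before long you will receive feedback: your solution may be accepted, or it may need more changes or tests.

Conclusion

In the end, with steady practice, you will become a contributor to an outstanding open-source project. So keep at it!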
