Project 3 Time Series Prediction Q&A



  • Post any questions about Project 3 here.



  • @haibo I was previewing the xgboost installation for Lecture 5 and don't quite understand step 2.
    1. I followed the official instructions. Installing xgboost on Mac requires gcc-6;
    "brew install gcc --without-multilib" failed, so I simply ran "brew install gcc", which succeeded. Does it matter that these two gcc installations differ?
    2. In the Python environment, "import xgboost" works, but "conda list" does not show the xgboost I installed. What is the reason?
    My Python is:
    Python 2.7.13 |Anaconda 4.3.0 (x86_64)| (default, Dec 20 2016, 23:05:08)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.



  • Installing xgboost (Mac) fails with the following error:
    Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/_r/shl728yj7bv5f0bnqsz4bkzr0000gn/T/pip-build-s_nglqhz/xgboost/



  • (screenshot attachment of the error; image not recoverable)



  • pip install bayesian-optimization fails with an error:
    pip install bayesian-optimization
    Collecting bayesian-optimization
    Using cached bayesian-optimization-0.4.0.tar.gz
    Requirement already satisfied: numpy>=1.9.0 in d:\program files\python35\lib\site-packages (from bayesian-optimization)
    Collecting scipy>=0.14.0 (from bayesian-optimization)
    Using cached scipy-0.18.1.tar.gz
    Collecting scikit-learn>=0.18.0 (from bayesian-optimization)
    Using cached scikit_learn-0.18.1-cp35-cp35m-win32.whl
    Installing collected packages: scipy, scikit-learn, bayesian-optimization
    Running setup.py install for scipy … error
    Complete output from command "d:\program files\python35\python.exe" -u -c "import setuptools, tokenize;__file__='C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\INTERN~1\AppData\Local\Temp\pip-siaet0cm-record\install-record.txt --single-version-externally-managed --compile:

    Note: if you need reliable uninstall behavior, then install
    with pip instead of using `setup.py install`:
    
      - `pip install .`       (from a git repo or downloaded source
                               release)
      - `pip install scipy`   (last SciPy release on PyPI)
    
    
    lapack_opt_info:
    lapack_mkl_info:
      libraries mkl_rt not found in ['d:\\program files\\python35\\lib', 'C:\\', 'd:\\program files\\python35\\libs']
      NOT AVAILABLE
    
    openblas_lapack_info:
      libraries openblas not found in ['d:\\program files\\python35\\lib', 'C:\\', 'd:\\program files\\python35\\libs']
      NOT AVAILABLE
    
    atlas_3_10_threads_info:
    Setting PTATLAS=ATLAS
    d:\program files\python35\lib\site-packages\numpy\distutils\system_info.py:1051: UserWarning: Specified path C:\projects\numpy-wheels\windows-wheel-builder\atlas-builds\atlas-3.10.1-sse2-32\lib is invalid.
      pre_dirs = system_info.get_paths(self, section, key)
    <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
      NOT AVAILABLE

    atlas_3_10_info:
    <class 'numpy.distutils.system_info.atlas_3_10_info'>
      NOT AVAILABLE
    
    atlas_threads_info:
    Setting PTATLAS=ATLAS
    <class 'numpy.distutils.system_info.atlas_threads_info'>
      NOT AVAILABLE
    
    atlas_info:
    <class 'numpy.distutils.system_info.atlas_info'>
      NOT AVAILABLE
    
    d:\program files\python35\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
        Atlas (http://math-atlas.sourceforge.net/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [atlas]) or by setting
        the ATLAS environment variable.
      self.calc_info()
    lapack_info:
      libraries lapack not found in ['d:\\program files\\python35\\lib', 'C:\\', 'd:\\program files\\python35\\libs']
      NOT AVAILABLE
    
    d:\program files\python35\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
        Lapack (http://www.netlib.org/lapack/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [lapack]) or by setting
        the LAPACK environment variable.
      self.calc_info()
    lapack_src_info:
      NOT AVAILABLE
    
    d:\program files\python35\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
        Lapack (http://www.netlib.org/lapack/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [lapack_src]) or by setting
        the LAPACK_SRC environment variable.
      self.calc_info()
      NOT AVAILABLE
    
    Running from scipy source directory.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\setup.py", line 415, in <module>
        setup_package()
      File "C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\setup.py", line 411, in setup_package
        setup(**metadata)
      File "d:\program files\python35\lib\site-packages\numpy\distutils\core.py", line 135, in setup
        config = configuration()
      File "C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\setup.py", line 335, in configuration
        config.add_subpackage('scipy')
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 1001, in add_subpackage
        caller_level = 2)
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 970, in get_subpackage
        caller_level = caller_level + 1)
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 907, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "scipy\setup.py", line 15, in configuration
        config.add_subpackage('linalg')
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 1001, in add_subpackage
        caller_level = 2)
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 970, in get_subpackage
        caller_level = caller_level + 1)
      File "d:\program files\python35\lib\site-packages\numpy\distutils\misc_util.py", line 907, in _get_configuration_from_setup_py
        config = setup_module.configuration(*args)
      File "scipy\linalg\setup.py", line 20, in configuration
        raise NotFoundError('no lapack/blas resources found')
    numpy.distutils.system_info.NotFoundError: no lapack/blas resources found
    
    ----------------------------------------
    

    Command ""d:\program files\python35\python.exe" -u -c "import setuptools, tokenize;__file__='C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\INTERN~1\AppData\Local\Temp\pip-siaet0cm-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\INTERN~1\AppData\Local\Temp\pip-build-ibldjt4q\scipy\



  • Solution for Mac:

    Step 1: in a terminal, run
    git clone --recursive https://github.com/dmlc/xgboost
    cd xgboost; cp make/minimum.mk ./config.mk; make -j4

    Step 2: in the terminal, install the XGBoost Python package:

    cd python-package; sudo python setup.py install

    When it prints
    "Finished processing dependencies for xgboost==0.6" the installation has succeeded (a quick import check is sketched below).
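
    To confirm that the Python package was picked up, a check along these lines should work (a minimal sketch; the version attribute should report 0.6 for this build):

    import xgboost as xgb
    # If the install above succeeded, this import resolves to the freshly
    # built package and prints its version.
    print(xgb.__version__)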



  • This reply has been deleted!


  • @Dennis_Wang Thanks for the information, Dennis. I just borrowed a Mac to test this. Here are my steps:

    (1) In a terminal, type "python --version" to verify that you are using Anaconda Python. The result should be "Python 3.6.0 :: Anaconda 4.3.0 (x86_64)".
    (2) Go to anaconda/bin by typing "cd ~/anaconda/bin/"
    (3) Type "git clone --recursive https://github.com/dmlc/xgboost"
    (4) "cd xgboost"
    (5) "./build.sh"
    (6) "cd python-package"
    (7) "python setup.py install --user"

    You may see the message "openMP not exists in the system, will build use single-thread". A single-threaded build is fine for tonight's lecture, but for performance you may want to install OpenMP (run "brew install clang-omp" in the terminal). A quick smoke test is sketched below.
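
    As an end-to-end sanity check after the install, a tiny training run along these lines should complete without errors (a minimal sketch on made-up toy data, not part of the original steps):

    import numpy as np
    import xgboost as xgb

    # Toy data: 100 samples, 5 features, binary labels.
    X = np.random.rand(100, 5)
    y = np.random.randint(2, size=100)

    # Wrap the data in xgboost's DMatrix and train a small model.
    dtrain = xgb.DMatrix(X, label=y)
    params = {'objective': 'binary:logistic', 'max_depth': 2}
    booster = xgb.train(params, dtrain, num_boost_round=5)

    # Predicting on the training matrix confirms the booster is usable.
    print(booster.predict(dtrain)[:5])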



  • @DONGXIANG_MAO Judging from the error messages, you do not seem to be using the Python inside Anaconda. My suggestion would be to use the Anaconda environment as your default development environment. A somewhat hacky workaround is to open a notebook from Anaconda and run the following code:

    import pip
    pip.main(['install', 'bayesian-optimization'])



  • Teacher, how do we do grid search + group CV to tune parameters?





  • @jone4291 Hi, this is a very good question. I just looked at sklearn's functions. For a classifier that sklearn itself provides, you can use GridSearchCV + GroupKFold for parameter tuning. See the code below, an example of logistic regression + GroupKFold (Datawarehouse is a custom data-loading class):

    from sklearn import linear_model
    from sklearn.model_selection import GridSearchCV, GroupKFold

    MyData = Datawarehouse()
    MyData.read_data()
    parameters = {
        'C': (0.3, 0.5, 0.7, 0.9),
        'max_iter': (1, 2, 3)
    }
    MyLR = linear_model.LogisticRegression(n_jobs=8, fit_intercept=True)
    gkf = GroupKFold(n_splits=2)
    grid_search = GridSearchCV(MyLR, parameters, n_jobs=8, cv=gkf, verbose=1)
    grid_search.fit(MyData.train_in, MyData.train_out, MyData.group)

    print("Performing grid search...")
    print("Best score: %0.9f" % grid_search.best_score_)
    print("Best parameters set:")
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print("\t%s: %r" % (param_name, best_parameters[param_name]))



  • @jone4291 But if you have a model you wrote yourself, you need to wrap it before it can go into GridSearchCV. You need to provide a class with fit, score, and get_params methods for GridSearchCV to call (a rough sketch follows below).

    Finally, if you also have your own hand-written cross validation, I do not see a great option at the moment. You would probably need to define the parameter grid yourself (sklearn's ParameterGrid can help) and then run a function evaluation for each parameter combination.
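
    A rough sketch of such a wrapper, assuming a hypothetical hand-written training routine train_my_model (not from the course code); inheriting from BaseEstimator provides get_params/set_params for free as long as the hyperparameters are plain __init__ arguments:

    from sklearn.base import BaseEstimator
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import ParameterGrid

    class MyModelWrapper(BaseEstimator):
        """Minimal estimator wrapper so a hand-written model can go into GridSearchCV."""

        def __init__(self, C=1.0, max_iter=10):
            # Hyperparameters must be stored under the same names as the
            # __init__ arguments; BaseEstimator.get_params relies on this.
            self.C = C
            self.max_iter = max_iter

        def fit(self, X, y):
            # Replace with your own training logic (train_my_model is hypothetical).
            self.model_ = train_my_model(X, y, C=self.C, max_iter=self.max_iter)
            return self

        def predict(self, X):
            return self.model_.predict(X)

        def score(self, X, y):
            # GridSearchCV calls score() to rank parameter settings.
            return accuracy_score(y, self.predict(X))

    # With a fully custom cross validation, loop over ParameterGrid instead:
    for params in ParameterGrid({'C': (0.3, 0.5, 0.7), 'max_iter': (1, 2, 3)}):
        model = MyModelWrapper(**params)
        # evaluate model here with your own CV scheme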



  • When running xgboost I get the error AttributeError: module 'xgboost' has no attribute 'DMatrix'. The environment is Mac, Python 3, Jupyter notebook. Thanks.



  • @randxie So the group k-fold we wrote in class means exactly the same thing as this GroupKFold?
    And what does "wrapping the model" mean? Taking the model we wrote in class as an example, do I need to write the fit, score, and get_params methods myself?



  • @jone4291 Hi, there is a subtle difference between my example and what we used in class. In class we used a group random split (GroupShuffleSplit), so you only need to set "gkf = GroupShuffleSplit(n_splits=2, random_state=0)" to get the corresponding cross validation. However, if you need a custom cross-validation scheme that sklearn does not provide, you will have to write a similar function yourself.

    For wrapping the model, see "http://stackoverflow.com/questions/20330445/how-to-write-a-custom-estimator-in-sklearn-and-use-cross-validation-on-it". In that case you need to write your own estimator that can be used inside GridSearchCV.



  • @Echo_Xiao There is too little information here to pinpoint the cause. You could first test whether xgboost can be imported in IPython, and also check whether a file or folder named xgboost is shadowing the package (see the quick check below).
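
    A quick way to check for the shadowing case (a small sketch; if the printed path points into your working directory rather than site-packages, a local xgboost.py file or xgboost/ folder is being imported instead of the real package):

    import xgboost
    # Should print something like .../site-packages/xgboost/__init__.py;
    # a path inside your project directory means the package is shadowed.
    print(xgboost.__file__)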



  • @randxie When we use the feature importances from logistic regression with L1 regularization to select features, how do we choose the regularization value?
    Can we use LogisticRegressionCV?



  • @jone4291 Hi, using LogisticRegressionCV is fine. One caveat, as mentioned in class, is that these methods only work if you have a reliable cross validation, and building a reliable cross validation for this EEG data is difficult. In that situation I would suggest a somewhat larger regularization, i.e. a smaller C (a sketch follows below).
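
    For reference, a minimal sketch of L1-based feature selection with a fixed, fairly strong penalty (the C value is only illustrative, and Datawarehouse is the custom loader from the earlier example):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    MyData = Datawarehouse()
    MyData.read_data()

    # A small C means a strong L1 penalty, driving more coefficients to exactly zero.
    lr = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
    lr.fit(MyData.train_in, MyData.train_out)

    # Keep only the features whose coefficients survived the penalty.
    selected = np.flatnonzero(lr.coef_.ravel())
    print('number of selected features:', selected.size)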



  • @randxie I used the group CV you mentioned earlier with LogisticRegressionCV, and it fails inside fit saying the groups value is missing:
    ValueError: The groups parameter should not be None
    Could you help me debug it?

    from sklearn.linear_model import LogisticRegressionCV
    MyData = Datawarehouse()
    MyData.read_data()

    gkf = GroupShuffleSplit(n_splits=6, random_state=0)

    lr_cv = LogisticRegressionCV(penalty='l1', Cs=(0.1, 1), solver='liblinear', cv=gkf)
    lr_cv = lr_cv.fit(MyData.train_in, MyData.train_out, MyData.group)
    print(np.count_nonzero(lr_cv.coef_))



  • @jone4291 Hi, here you probably need to use the GridSearchCV approach I mentioned earlier to tune the parameter. I looked at the LogisticRegressionCV code, and it does not seem to support group-based splits yet. A sketch of the workaround follows below.
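
    Roughly, the same search can be done like this (a hedged sketch reusing the Datawarehouse loader and GroupShuffleSplit from the earlier posts; the C grid is only illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, GroupShuffleSplit

    MyData = Datawarehouse()
    MyData.read_data()

    gss = GroupShuffleSplit(n_splits=6, random_state=0)
    lr = LogisticRegression(penalty='l1', solver='liblinear')
    grid_search = GridSearchCV(lr, {'C': (0.01, 0.1, 1.0)}, cv=gss)
    grid_search.fit(MyData.train_in, MyData.train_out, groups=MyData.group)

    # The best estimator is refit on the full training data by default.
    best_lr = grid_search.best_estimator_
    print('best C:', grid_search.best_params_['C'])
    print('nonzero coefficients:', np.count_nonzero(best_lr.coef_))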

