Setting Up a Single-Machine Machine Learning Development Environment with Jupyter

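1 Overview

  • Spark: data processing is built on Spark, so when the data volume grows the work can migrate seamlessly to a cluster environment.
  • Jupyter Notebook: provides the GUI environment
  • TensorFlow: provides the DL training engine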

2 Spark

  1. Download the latest Spark release: http://spark.apache.org/downloads.html
  2. Extract it locally, for example into /opt/
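
The later Toree step needs the path of the extracted Spark directory. A minimal sketch for locating it, assuming the archive was extracted under /opt/ into a directory whose name starts with "spark" (both the naming pattern and the helper name `find_spark_homes` are assumptions made for illustration). It relies only on the fact that a Spark distribution ships `bin/spark-submit`:

```python
import glob
import os


def find_spark_homes(base_dir="/opt"):
    """Return directories under base_dir that look like a Spark install.

    A directory counts if its name starts with "spark" and it contains
    bin/spark-submit. The naming pattern is an assumption, not a rule.
    """
    candidates = sorted(glob.glob(os.path.join(base_dir, "spark*")))
    return [
        path for path in candidates
        if os.path.isfile(os.path.join(path, "bin", "spark-submit"))
    ]
```

Calling `find_spark_homes()` returns a (possibly empty) list of candidate directories; any one of them could be passed as the Spark home later on.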

3 Jupyter Notebook

3.1 Install jupyter

    $ sudo pip install jupyter

3.2 Install the Scala kernel

  1. Download jupyter-scala: https://github.com/alexarchambault/jupyter-scala
  2. Run the installer:
    $ ./jupyter-scala
    
  3. Check that the installation succeeded:
    $ jupyter kernelspec list
    Available kernels:
      scala      /home/crackcell/.local/share/jupyter/kernels/scala
      python2    /usr/local/share/jupyter/kernels/python2
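
The check above can also be scripted. A rough sketch that parses the text printed by `jupyter kernelspec list` into a dictionary, assuming each kernel row is a name followed by a path with no spaces in it (the helper name `parse_kernelspec_list` is hypothetical, and the sample text is the listing shown above):

```python
def parse_kernelspec_list(output):
    """Map kernel name -> install path from `jupyter kernelspec list` text."""
    kernels = {}
    for line in output.splitlines():
        parts = line.split()
        # Kernel rows have exactly two fields: a name and a path.
        # The "Available kernels:" header also splits into two fields,
        # so lines ending with a colon are skipped.
        if len(parts) == 2 and not line.rstrip().endswith(":"):
            name, path = parts
            kernels[name] = path
    return kernels


sample = """Available kernels:
  scala      /home/crackcell/.local/share/jupyter/kernels/scala
  python2    /usr/local/share/jupyter/kernels/python2
"""
kernels = parse_kernelspec_list(sample)
print("scala" in kernels)
```

To run this against a live installation, the output of `subprocess.check_output(["jupyter", "kernelspec", "list"])` could be decoded and passed in instead of `sample`.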
    

3.3 Install the Spark kernel

  1. Install Apache Toree with pip:
    $ sudo pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
    
  2. Set the Spark home (replace /path/to/spark with your Spark install location):
    $ sudo jupyter toree install --spark_home=/path/to/spark --interpreters=Scala,PySpark,SQL
    
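  3. Start jupyter notebook and create a new "Apache Toree - PySpark" notebook to try it out. If the terminal reports a Scala runtime error, an older Toree built against an older Scala version is probably installed; remove it and reinstall the latest version. At the time of writing, Scala 2.11 was in use.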
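
For the "try it out" part, something along these lines could serve as a first cell in the PySpark notebook. It is only a sketch: it assumes the kernel exposes a ready-made SparkContext under the name `sc`, and the helper name `sum_of_squares` is invented for illustration. It uses nothing beyond the standard RDD calls `parallelize`, `map` and `sum`:

```python
def sum_of_squares(sc, n=100):
    """Distribute 1..n as an RDD and return the sum of the squares."""
    numbers = sc.parallelize(range(1, n + 1))
    return numbers.map(lambda x: x * x).sum()


# In a notebook cell (assumes the kernel predefines `sc`):
# print(sum_of_squares(sc))
```

If the kernel is wired to Spark correctly, the call with the default n=100 should come back with 338350, the closed-form value of 100 * 101 * 201 / 6.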

4 TODO TensorFlow

TODO

Date: Sat May 27 15:54:07 2017

Author: Menglong TAN

Created: 2017-05-31 Wed 14:57
