Jupyter notebooks to Org source + Tower of Babel

This post provides a simple example demonstrating how a shell script can be called with appropriate variables from any Org file in Emacs. The script essentially converts a Jupyter notebook to Org source, and Babel is leveraged to call the script with appropriate variables from any Org file. This reddit thread and blog post elucidate the advantages of using Babel and Org mode over Jupyter notebooks.

Directly editing code in a Jupyter notebook in a browser is not an attractive long term option and is inconvenient even in the short term. My preference is to have it all in Emacs, leveraging a versatile Org file where it is easy to encapsulate code in notebooks or projects within Org-headings. Thus, projects are integrated with the in-built task management and calendar of Org mode.

However, it may be a frequent necessity to access an external Jupyter notebook for which there is no Org source.

One solution is to start up a Jupyter server locally, open the file and then File >> save as a markdown file, which can be converted to an Org file using pandoc. Remarkably, the output code seems similar to the code blocks used in the R-markdown notebooks, rather than pure markdown markup. Therefore this markdown export should work fine in RStudio as well. However, unless the Jupyter server is always running on your machine, this is a relatively slow, multi-step process.

This SO discussion provided my answer, which is a 2 step script via the versatile pandoc. A workable solution, as a test conversion revealed. The headings and subheadings and code are converted into Org markup along with Org source blocks.

jupyter nbconvert notebook.ipynb --to markdown
pandoc notebook.md -o notebook.org

The next consideration was to have the above script or recipe handy for converting any Jupyter notebook to an Org file quickly.1 For the script to be referenced and called from any other location, the source block needs to be defined with a name and the necessary arguments, and also added into the org-babel library.

In this example the path to the Jupyter notebook, markdown file and resulting org file are specified as variables or arguments. Note that the absolute path to any file is required. Save the following in an Org file, named appropriately, like my-recipes.org

#+NAME: jupyter-to-org-current
#+HEADER:  :var path_ipynb="/Users/xxx/Jupyter_notebook"
#+HEADER: :var path_md = "Jupyter_notebook-markdown"
#+HEADER: :var path_org = "Jupyter-notebook-org"
#+BEGIN_SRC sh :results verbatim
jupyter nbconvert --to markdown $path_ipynb.ipynb --output $cwd/$path_md.md
pandoc $cwd/$path_md.md -o $cwd/$path_org.org
cp $path_ipynb.ipynb $cwd

The path_ipynb variable can be changed as required to point to the Jupyter notebook.2

All such blocks above can be stored in Org files and added to the Library of Babel (LOB) by including the following in the Emacs init configuration.

(org-babel-lob-ingest "/Users/shreyas/my_projects/my-recipes.org")

The named shell script source block can now be called from any Org file, with specified arguments and have the notebook. The script is called using the #+CALL function and using the name and arguments of the source block above.

#+CALL: jupyter-to-org-current(path_md="Jup-to-markdown", path_org="Markdown-to-org")

Therefore, the snippet above will convert a Jupyter notebook to a markdown file named Jup-to-markdown and then an Org file called Markdown-to-org. If an argument is not specified, the default value of the paths specified in the original source block will be used.

Of course, the #+CALL function used above is also too lengthy to remember and reproduce without headaches. This is also bound to happen as the number of such named code snippets increase. One solution (though not ideal) is to store the #+CALL as a snippet using M-x yas-new-snippet, and load it when needed using the excellent ivy-yasnippet package (see MELPA), with minimal exertions.

Further possibilities

It would be nice to improve the options available for modifications on the fly. Python may be an ‘easier’ option to write up for such activities rather than a shell script. For example, a script with the working directory being an additional /optional argument could be considered.

Another desirable factor in the resulting Org file would be iPython blocks in place of python. As a temporary solution, the python blocks could be converted to ipython blocks via a search and replace throughout the document. A lisp macro / source block could run after the above source block to facilitate the search and replace. 3

  1. In Scimax - it is possible to quickly start a new project using M-x nb-new, which creates a sub-folder in the specified projects folder and creates and opens a readme.org file for the project. ^
  2. The option C-u-cl is a messy way to quickly get the full file name path, the resulting path will need to be modified slightly. ^
  3. It is worth noting that a bunch of additional HTML blocks and hyperlinks are inserted via the above export procedure. It should be possible to add some hooks to clean up the org file after the export from pandoc. ^


comments powered by Disqus