BEL Nanopub vs BELScripts - future directions

Charlie Hoyt and I discussed future directions of BELScripts vs BEL Nanopubs. Here are my current thoughts and ideas on how to move forward. I’m looking for feedback.

Overview

Issues with BELScripts:

  • they are not easy to parse correctly - they require custom parsers
  • there are no tools that support writing these manually with BEL/Annotation completions
  • they do not allow adding additional metadata or citation data like we have in BEL Nanopubs
  • fall-through annotations can wreak havoc on your document: it’s very easy for a curator working with these manually to forget an UNSET statement and have annotations unexpectedly apply to many additional BEL Assertions (which is hard to track down afterwards).
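
To make the fall-through problem concrete, here is a minimal sketch in Python. It assumes a heavily simplified BELScript dialect with only `SET key = "value"` and `UNSET key` lines; it is not a real BELScript parser, and the function name and the two assertions are illustrative placeholders rather than curated statements.

```python
def apply_annotations(lines):
    """Pair each assertion with the annotations active when it was read.

    Simplified on purpose: only `SET key = "value"` and `UNSET key` are
    recognized; every other non-blank line is treated as a BEL Assertion.
    """
    active = {}
    results = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("SET "):
            key, _, value = line[4:].partition("=")
            active[key.strip()] = value.strip().strip('"')
        elif line.startswith("UNSET "):
            active.pop(line[6:].strip(), None)
        else:
            # Copy the dict so later SET/UNSET lines do not alter this record.
            results.append((line, dict(active)))
    return results


script = [
    'SET Disease = "diabetes mellitus"',
    "p(HGNC:AKT1) increases p(HGNC:MTOR)",
    # The curator forgot `UNSET Disease` here.
    'SET Species = "10090"',
    "p(MGI:Akt1) increases p(MGI:Mtor)",
]

for assertion, annotations in apply_annotations(script):
    print(assertion, annotations)
```

The second assertion silently inherits the Disease annotation from the first block, and nothing in the script itself flags that as a mistake.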

BEL Nanopubs were designed to capture the citation, assertions, annotations and metadata as a transportable, shareable unit and to be trivial to parse (any JSON/YAML parser will work). We still do not fully support writing large numbers of BEL Nanopubs manually, the way BEL curators currently do with BELScripts, using the BEL Nanopub Visual Studio Code extension or BioDati Studio, but these are things we are working on.
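
As a rough illustration of the "any JSON parser will work" point, the sketch below reads a nanopub-like document with Python’s standard `json` module. The field names and values here are assumptions chosen for illustration (the citation id is a placeholder); the Nanopub JSON Schema is the authority on the actual structure.

```python
import json

# Illustrative nanopub-like document; field names are assumptions and the
# citation id is a placeholder, not a real record.
nanopub_json = """
{
  "nanopub": {
    "citation": {"database": {"name": "PubMed", "id": "12345678"}},
    "assertions": [
      {"subject": "p(HGNC:AKT1)", "relation": "increases", "object": "p(HGNC:MTOR)"}
    ],
    "annotations": [
      {"type": "Species", "id": "TAX:9606", "label": "human"}
    ],
    "metadata": {"creator": "example-curator"}
  }
}
"""


def assertion_strings(nanopub):
    """Render each assertion as a 'subject relation object' string."""
    return [
        f'{a["subject"]} {a["relation"]} {a["object"]}'
        for a in nanopub["assertions"]
    ]


nanopub = json.loads(nanopub_json)["nanopub"]
print(assertion_strings(nanopub))
print([(a["type"], a["id"]) for a in nanopub["annotations"]])
```

Note that the annotations travel with the assertions inside one unit, so there is no SET/UNSET state to track across the document.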

Further, BEL Nanopubs don’t manage the header information (Namespace/Annotation definitions) as well as BELScripts do. It’s straightforward to add this; it just wasn’t a priority in the first iteration.

I’m personally pushing to deprecate BELScripts. That said, I’m quite open to a BELScript-type approach that is easier to build parsers for, is less likely to cause user errors, and allows non-lossy round-trip conversion to and from the BEL Nanopub format.

Large-scale Curation Support

Two things will improve this for BEL Nanopubs. First, we are currently rewriting the Nanopub editor in BEL Nanopub to make it a more streamlined form with keyboard shortcuts. Second, we are planning to add completion support for BEL Assertions and Annotations to the BEL Nanopub Visual Studio Code extension.

The BEL Nanopub Visual Studio Code extension in particular makes it easy to convert between JSON and YAML, and the YAML format is the easier of the two to work with manually.

BEL Nanopub format enhancements

I’m currently working on, or planning, the following:

  1. Better namespace/annotation definitions (e.g. BELScript header content)
  2. Simplify the citation database record
  3. Annotation specification that supports namespace annotations (e.g. type Disease, DOID:9351!diabetes mellitus) as well as numerical/time annotation (see BEP7)

Number one is straightforward. I’m thinking that the header will be represented in each nanopub as a URI that replaces the schema_uri in the Nanopub JSON Schema. The schema_uri makes the BEL Nanopubs self-documenting, but we can instead add a specification_uri to the Nanopubs (as an optional element) whose target contains the namespace, annotation and JSON Schema definitions and any additional overall metadata desired. We also want to make Namespace template URLs available in this specification so that any namespace:id can be linked to its source record.
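
A minimal sketch of how Namespace template URLs could be used, assuming the specification is available as a dictionary. The `namespaces` / `template_url` keys, the `{id}` placeholder and the example.org URLs are all hypothetical; none of this is a settled format.

```python
from urllib.parse import quote

# Hypothetical specification content; key names and URLs are placeholders.
specification = {
    "namespaces": {
        "HGNC": {"template_url": "https://example.org/hgnc/{id}"},
        "DOID": {"template_url": "https://example.org/doid/{id}"},
    }
}


def source_record_url(term, spec):
    """Return a link for a namespace:id term, or None if the namespace is unknown."""
    namespace, sep, identifier = term.partition(":")
    if not sep or not identifier:
        raise ValueError(f"not a namespace:id term: {term!r}")
    # Drop an optional '!label' suffix, e.g. 'DOID:9351!diabetes mellitus'.
    identifier = identifier.split("!", 1)[0].strip()
    entry = spec["namespaces"].get(namespace)
    if entry is None:
        return None
    return entry["template_url"].format(id=quote(identifier, safe=""))


print(source_record_url("DOID:9351!diabetes mellitus", specification))
print(source_record_url("XYZ:1", specification))
```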

Number two merges the citation database name and id into a single namespace:id pair, like the regular BEL Namespaces. The aim is to simplify and standardize citation database entries on the namespace approach. We can then have just one citation.id that matches either a namespace:id, a URI (e.g. one starting with https?://), or a simple reference string.
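
One possible way to tell the three forms of citation.id apart is sketched below. The regular expressions and the returned labels are assumptions for illustration, not part of any specification. The URI check runs first because a string like `https://...` would otherwise also look like a namespace:id pair.

```python
import re

URI_RE = re.compile(r"^https?://\S+$")
# Assumed shape: a namespace prefix without spaces, a colon, then an id.
NAMESPACE_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:\S+$")


def classify_citation_id(citation_id):
    """Classify a citation.id as 'uri', 'namespace:id', or 'reference'."""
    value = citation_id.strip()
    if URI_RE.match(value):
        return "uri"
    if NAMESPACE_ID_RE.match(value):
        return "namespace:id"
    # Anything else is treated as a free-text reference string.
    return "reference"


for example in [
    "PMID:12345678",  # placeholder id
    "https://example.org/papers/42",
    "Smith et al., unpublished data",
]:
    print(example, "->", classify_citation_id(example))
```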

Number three is challenging. The goal is to have documented annotation types so that annotations can be validated and parsed. Please look at BEP7 for more details, but here are some examples of what needs to be handled as annotations:

  • FoldChange = 2.5
  • Time = “24 h”
  • pvalue = 0.3
  • pvalue = 1E-10
  • Disease = DOID:9351!diabetes mellitus
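
To show the kind of parsing these examples call for, here is a rough Python sketch. It does not implement BEP7; the `Annotation` class, the "number" / "quantity" / "term" / "text" kinds and the regular expressions are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Annotation:
    name: str
    kind: str  # "number", "quantity", "term", or "text"
    value: Union[float, str]
    unit: Optional[str] = None
    label: Optional[str] = None


NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
NUMBER_RE = re.compile(rf"^{NUMBER}$")
QUANTITY_RE = re.compile(rf"^({NUMBER})\s*([A-Za-z%]+)$")
TERM_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_.]*):([^!]+)(?:!(.+))?$")


def parse_annotation(text):
    """Parse 'Name = value' into a typed Annotation (illustrative only)."""
    name, sep, raw = text.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in annotation: {text!r}")
    name = name.strip()
    # Remove surrounding straight or curly double quotes, if any.
    raw = raw.strip().strip('"“”').strip()
    if NUMBER_RE.match(raw):
        return Annotation(name, "number", float(raw))
    match = QUANTITY_RE.match(raw)
    if match:
        return Annotation(name, "quantity", float(match.group(1)), unit=match.group(2))
    match = TERM_RE.match(raw)
    if match:
        term = f"{match.group(1)}:{match.group(2).strip()}"
        return Annotation(name, "term", term, label=match.group(3))
    return Annotation(name, "text", raw)


for line in [
    "FoldChange = 2.5",
    'Time = "24 h"',
    "pvalue = 1E-10",
    "Disease=DOID:9351!diabetes mellitus",
]:
    print(parse_annotation(line))
```

A real specification would also need to say which units are allowed for each annotation type and how values are validated, which this sketch leaves open.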

If anyone would like to work on the specification for number three, I’d be very interested in the help.