Halyard is an extremely horizontally scalable Triplestore with support for Named Graphs, designed for the integration of extremely large Semantic Data Models and for the storage and SPARQL 1.1 querying of snapshots of the whole Linked Data universe. The Halyard implementation is based on the Eclipse RDF4J framework and the Apache HBase database, and it is written entirely in Java.
Build environment prerequisites are:
In the Halyard project root directory, execute the command:
Optionally, you can build Halyard from NetBeans or another Java IDE.
Halyard is expected to run on an Apache Hadoop cluster node with a configured Apache HBase client. The Apache Hadoop and Apache HBase components are not bundled with Halyard. The runtime requirements are:
Note: The recommended Apache Hadoop distributions are Hortonworks Data Platform (HDP) version 2.4.2 and Amazon Elastic MapReduce (EMR).
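As a quick sanity check before running Halyard on a node, the sketch below looks for the Hadoop and HBase client commands on the PATH. The command names `hadoop` and `hbase` are assumptions (the usual launcher scripts of those projects), and the helper `missing_tools` is hypothetical, not part of Halyard. Finding the commands does not prove that the HBase client is correctly configured.

```python
import shutil

# Assumed command names: the standard launcher scripts shipped with
# Apache Hadoop and Apache HBase. Halyard does not define this check.
REQUIRED_TOOLS = ("hadoop", "hbase")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("Hadoop and HBase client commands found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; on a real node the default `shutil.which` is used.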
Hortonworks Data Platform is a Hadoop distribution that includes all important parts of Hadoop; however, it does not directly provide the hardware or the core OS.
Installation of the whole HDP stack through Ambari is described in detail on the Hortonworks Data Platform - Apache Ambari Installation page.
It is possible to strip down the set of installed Hadoop components to the core ones, including ZooKeeper, and optionally Ambari Metrics for cluster monitoring.
Detailed documentation about Hortonworks Data Platform is accessible from http://docs.hortonworks.com
Amazon Elastic MapReduce is a service that provides both the hardware and the software stack needed to run Hadoop, with Halyard on top of it.
A sample Amazon EMR setup is described in detail in the Amazon EMR Management Guide - Getting Started.
For Halyard, it is important to perform the first two steps of the guide:
It is possible to strip down the set of provided components during Create Cluster by clicking on Go to advanced options and selecting just HBase and, optionally, Ganglia for cluster monitoring.
HBase for Halyard can run in either of the two Storage Modes offered by EMR.
Instance types with redundant storage space (for example d2.xlarge) are highly recommended when you plan to Bulk Load large datasets using Halyard.
Instance types with enough memory and fast disks for local caching (for example i2.xlarge) are recommended when the cluster will mainly serve data through Halyard.
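The component and instance type choices above can be sketched as a small helper that assembles a cluster definition. This is an illustration under stated assumptions, not an official Halyard or AWS recipe: the function `emr_cluster_request` is hypothetical, the dictionary keys follow the shape of the boto3 EMR `run_job_flow` parameters, using the same instance type for master and worker nodes is a simplification, and IAM roles, key pairs, and networking are left out. Depending on the EMR release, further application names may need to be listed explicitly.

```python
def emr_cluster_request(name, release_label, bulk_load, node_count, monitoring=False):
    """Build keyword arguments shaped like boto3's EMR `run_job_flow` call.

    `release_label` must be supplied by the caller; no particular EMR
    release is assumed here.
    """
    # Only the components named in this guide: HBase, optionally Ganglia.
    applications = [{"Name": "HBase"}]
    if monitoring:
        applications.append({"Name": "Ganglia"})

    # d2.xlarge for Bulk Load heavy clusters, i2.xlarge for serving data.
    instance_type = "d2.xlarge" if bulk_load else "i2.xlarge"

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": applications,
        "Instances": {
            # Assumption: master and workers share one instance type.
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": node_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("emr").run_job_flow(**request, JobFlowRole=..., ServiceRole=...)`, with the role names taken from your own account.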
Additional EMR Task Nodes can be used to host additional Halyard SPARQL Endpoints.
Detailed documentation about Amazon EMR is available at https://aws.amazon.com/documentation/emr/
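Once a Halyard SPARQL Endpoint is running, it can be queried over HTTP. The following sketch assumes the endpoint accepts the standard SPARQL 1.1 Protocol "query via GET" form; the helper `build_sparql_request` and any endpoint address passed to it are hypothetical. The sample query counts triples per Named Graph and uses only standard SPARQL 1.1.

```python
import urllib.parse
import urllib.request

# Counts triples per named graph; plain SPARQL 1.1, nothing Halyard-specific.
QUERY = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
"""


def build_sparql_request(endpoint_url, query=QUERY):
    """Build a SPARQL 1.1 Protocol GET request for `endpoint_url`.

    Assumes `endpoint_url` does not already contain a query string.
    """
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL query results.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
```

To send the request, pass it to `urllib.request.urlopen` with the address of your own endpoint and read the response body.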