Vagrantfile in your folder. Once this is in place, you can run vagrant up, which will create and start a virtual machine based on the ubuntu/trusty64 image. Finally, once the VM is up, you can run vagrant ssh to ssh into it and run Linux commands. Other useful commands are vagrant halt, which shuts down the running machine that Vagrant is managing, and vagrant destroy, which will remove all traces of the guest machine from your system.
vagrant.yml. First, we create the file vagrant.yml in the same folder as the Vagrantfile and insert into it the following contents:
[code]
---
vm_box: ubuntu/trusty64
vm_name: neo4j-node
vm_memory: 4096
vm_gui: false
[/code]
Bear in mind that, because this file is YAML, indentation is important. Next, we modify our Vagrantfile. We add the following lines at the top:
[code language="ruby"]
require 'yaml'
settings = YAML.load_file 'vagrant.yml'
[/code]
These lines import the required Ruby module and load our vagrant.yml file into a variable called settings. Now we can reference the configuration variables. For example, we can change the line:
[code language="ruby"]
config.vm.box = "ubuntu/trusty64"
[/code]
to
[code language="ruby"]
config.vm.box = settings['vm_box']
[/code]
This allows us to change the box later should we want to test our setup on a different operating system or version. At this point, we can also control other settings. If we are using Virtualbox, we can find the config.vm.provider "virtualbox" section of our Vagrantfile, uncomment it and modify it as follows:
[code language="ruby"]
config.vm.provider "virtualbox" do |vb|
  # Display the VirtualBox GUI when booting the machine
  vb.gui = settings['vm_gui']
  # Customize the amount of memory on the VM:
  vb.memory = settings['vm_memory']
  # Customize the name of the VM:
  vb.name = settings['vm_name']
end
[/code]
This allows us to control several settings of our Virtualbox VM (such as the name and the memory) based on the values we define in our vagrant.yml file. Now that we have a configuration file, we are ready to proceed to the next step.
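If a key is ever left out of vagrant.yml, settings['vm_memory'] and friends silently become nil. One way to guard against that is to overlay the file on a set of defaults. The following is only a sketch: the helper name load_settings and the DEFAULT_SETTINGS constant are made up for illustration, and an inline string stands in for the contents of vagrant.yml so the snippet runs on its own.

```ruby
require 'yaml'

# Fallback values used when a key is missing from vagrant.yml.
# These mirror the example file shown in this post.
DEFAULT_SETTINGS = {
  'vm_box'    => 'ubuntu/trusty64',
  'vm_name'   => 'neo4j-node',
  'vm_memory' => 4096,
  'vm_gui'    => false
}.freeze

# Hypothetical helper: parse YAML text and overlay it on the defaults.
def load_settings(yaml_text)
  parsed = YAML.safe_load(yaml_text) || {}
  raise ArgumentError, 'vagrant.yml must contain a mapping' unless parsed.is_a?(Hash)

  DEFAULT_SETTINGS.merge(parsed)
end

# In a Vagrantfile the text would come from File.read('vagrant.yml').
settings = load_settings("---\nvm_memory: 2048\n")

puts settings['vm_box']     # taken from the defaults
puts settings['vm_memory']  # taken from the YAML text
```

With this in place, the rest of the Vagrantfile can keep using settings['...'] exactly as before.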
config.vm.define method call. We want the number of nodes to be configurable, so we first add the following line to our vagrant.yml configuration file:
[code]
cluster_size: 3
[/code]
Now we have to edit our Vagrantfile in order to introduce a loop. The way to do this is to find the line containing:
[code language="ruby"]
Vagrant.configure("2") do |config|
[/code]
and insert the following lines after it:
[code language="ruby"]
# Loop with node_number taking values from 1 to the configured cluster size
(1..settings['cluster_size']).each do |node_number|
  # Define node_name by appending the node number to the configured vm_name
  node_name = settings['vm_name'] + "#{node_number}"
  # Define settings for each node
  config.vm.define node_name do |node|
[/code]
For the rest of the Vagrantfile we have to:
replace config with node
replace settings['vm_name'] with node_name
Because the loop and the config.vm.define call each open a new block, each of them also needs its own matching end before the final end of the Vagrantfile.
Now, when we run vagrant up, three machines will be created in Virtualbox: neo4j-node1, neo4j-node2 and neo4j-node3.
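To see how the pieces fit together, here is a minimal sketch of the multi-machine structure, assuming the settings shown in this post. A plain hash stands in for the values loaded from vagrant.yml, and the defined?(Vagrant) guard is an addition made purely so the snippet can also be run with plain ruby; a real Vagrantfile would not need it.

```ruby
# Stand-in for the values loaded from vagrant.yml.
settings = {
  'vm_box'       => 'ubuntu/trusty64',
  'vm_name'      => 'neo4j-node',
  'vm_memory'    => 4096,
  'vm_gui'       => false,
  'cluster_size' => 3
}

# Names produced by the loop: vm_name followed by the node number.
node_names = (1..settings['cluster_size']).map { |n| "#{settings['vm_name']}#{n}" }
puts node_names.join(', ') # prints "neo4j-node1, neo4j-node2, neo4j-node3"

# Vagrant defines the Vagrant constant itself, so this block runs under
# `vagrant up` and is skipped under plain `ruby`.
if defined?(Vagrant)
  Vagrant.configure('2') do |config|
    node_names.each do |node_name|
      config.vm.define node_name do |node|
        node.vm.box = settings['vm_box']
        node.vm.provider 'virtualbox' do |vb|
          vb.gui    = settings['vm_gui']
          vb.memory = settings['vm_memory']
          vb.name   = node_name
        end
      end # closes config.vm.define
    end   # closes the loop over nodes
  end
end
```

Note how every setting inside the loop hangs off node rather than config, and how the two extra end lines close the blocks opened by the loop and by config.vm.define.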
vagrant.yml configuration file:
[code]
vm_ip_prefix: 192.168.3
[/code]
The last part of the IP is going to be determined for each node by simply adding 10 to the node number. So we add the following lines to our Vagrantfile script:
[code language="ruby"]
# Determine node_ip based on the configured vm_ip_prefix
node_ip = settings['vm_ip_prefix'] + "." + "#{node_number + 10}"
# Create a private network, which allows access to the machine using node_ip
node.vm.network "private_network", ip: node_ip
[/code]
Now after we run vagrant up, we can test network connectivity. We should be able to run ping 192.168.3.11, ping 192.168.3.12 and ping 192.168.3.13 from our host and get a response. We should also be able to run the same commands from within each node and get a response.
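The address arithmetic is simple, but a typo in vm_ip_prefix or a very large cluster_size would only show up when Vagrant tries to configure the network. A small check can catch this earlier. This is a sketch only: the helper name node_ip and its offset parameter are hypothetical, and the 1..254 range check is an assumption about what counts as a usable last octet.

```ruby
require 'ipaddr'

# Hypothetical helper: build a node's address from the prefix and node number.
def node_ip(prefix, node_number, offset = 10)
  last_octet = node_number + offset
  unless (1..254).cover?(last_octet)
    raise ArgumentError, "last octet #{last_octet} is out of range"
  end

  ip = "#{prefix}.#{last_octet}"
  IPAddr.new(ip) # raises IPAddr::InvalidAddressError if the prefix is malformed
  ip
end

ips = (1..3).map { |n| node_ip('192.168.3', n) }
puts ips.inspect # prints ["192.168.3.11", "192.168.3.12", "192.168.3.13"]
```

Inside the loop of the Vagrantfile, the result of such a helper could be passed straight to node.vm.network "private_network", ip: ...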
neo4j_initial_hosts)
playbook.yml) and any variables that are known at this time and will be needed later by Ansible. For example, the Vagrant variable node_ip is passed to the Ansible variable node_ip_address, so whenever we use the expression {{ node_ip_address }} within an Ansible template it will be substituted with the actual IP address that was assigned to the particular node when it was created by Vagrant.
ansible-java8-oracle role folder
ansible-neo4j role folder
ansible-neo4j/defaults/main.yml
This file contains variables used by all other files.
ansible-neo4j/tasks/install_neo4j.yml
This file contains the tasks to be performed to install Neo4j.
ansible-neo4j/tasks/main.yml
This file contains all the tasks to be performed within the Neo4j role.
ansible-neo4j/tasks/install_neo4j_spatial.yml
ansible-neo4j/handlers/main.yml
This file contains tasks that are triggered in response to ‘notify’ actions called by other tasks. They will only be triggered once at the end of a ‘play’ even if notified by multiple different tasks.
ansible-neo4j/templates/neo4j.conf
This file contains all the configuration for Neo4j. The most interesting changes are:
1. Changes to paths of directories, security and upgrade settings:
[code language="ruby"]
# Paths of directories in the installation.
dbms.directories.data=/var/lib/neo4j/data
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j
dbms.directories.metrics=/var/lib/neo4j/metrics

# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
# dbms.directories.import=/var/lib/neo4j/import
dbms.directories.import=/vagrant/csv

# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
dbms.security.auth_enabled=false

# Enable this to be able to upgrade a store from an older version.
dbms.allow_upgrade=true
[/code]
2. Use of node_ip_address variable for network configuration:
[code language="ruby"]
#*****************************************************************
# Network connector configuration
#*****************************************************************

# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
dbms.connectors.default_listen_address=

# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.

# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
dbms.connectors.default_advertised_address=

# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.

# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
dbms.connector.https.listen_address=:7473
[/code]
3. High Availability Cluster configuration (using variables):
[code language="ruby"]
#*****************************************************************
# HA configuration
#*****************************************************************

# Uncomment and specify these lines for running Neo4j in High Availability mode.
# See the High Availability documentation at https://neo4j.com/docs/ for details.

# Database mode
# Allowed values:
# HA - High Availability
# SINGLE - Single mode, default.
# To run in High Availability mode uncomment this line:
dbms.mode=HA

# ha.server_id is the number of each instance in the HA cluster. It should be
# an integer (e.g. 1), and should be unique for each cluster instance.
ha.server_id=

# ha.initial_hosts is a comma-separated list (without spaces) of the host:port
# where the ha.host.coordination of all instances will be listening. Typically
# this will be the same for all cluster instances.
ha.initial_hosts=

# IP and port for this instance to listen on, for communicating cluster status
# information with other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
ha.host.coordination=:5001

# IP and port for this instance to listen on, for communicating transaction
# data with other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
ha.host.data=:6001

# The interval, in seconds, at which slaves will pull updates from the master. You must comment out
# the option to disable periodic pulling of updates.
ha.pull_interval=10
[/code]
vagrant up
which will create and configure three Virtualbox machines, which will start and join each other to form a Neo4j HA Cluster. (Note: if you have been following along and experimenting, you might want to run vagrant destroy first in order to start clean.) The cluster will be available as soon as the first machine is up and running. Every time another machine comes up, it will join the cluster and replicate the database. You can access the Neo4j GUI of each machine from your browser at:
http://192.168.3.11:7474 (neo4j-node1)
http://192.168.3.12:7474 (neo4j-node2)
http://192.168.3.13:7474 (neo4j-node3)
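The node addresses used throughout follow from the IP prefix and the node number, and the same values can produce the comma-separated host:port list that ha.initial_hosts expects. The sketch below shows one way to derive these per-node values in Ruby before handing them to Ansible. It is an illustration under assumptions: node_ip_address and neo4j_initial_hosts are the variable names mentioned in this post, while neo4j_server_id is a hypothetical name for whatever variable feeds ha.server_id.

```ruby
# Stand-in for the values loaded from vagrant.yml.
settings = { 'vm_ip_prefix' => '192.168.3', 'cluster_size' => 3 }

# Port that ha.host.coordination listens on in the neo4j.conf template.
COORDINATION_PORT = 5001

# Same rule as in the Vagrantfile: last octet is the node number plus 10.
node_ips = (1..settings['cluster_size']).map do |n|
  "#{settings['vm_ip_prefix']}.#{n + 10}"
end

# Comma-separated host:port list, without spaces, as ha.initial_hosts requires.
neo4j_initial_hosts = node_ips.map { |ip| "#{ip}:#{COORDINATION_PORT}" }.join(',')

# One hash of variables per node; neo4j_server_id is an assumed name.
node_vars = node_ips.each_with_index.map do |ip, index|
  {
    'node_ip_address'     => ip,
    'neo4j_server_id'     => index + 1,
    'neo4j_initial_hosts' => neo4j_initial_hosts
  }
end

puts neo4j_initial_hosts
node_vars.each { |vars| puts vars.inspect }
```

Each server id is unique and every node receives the same host list, which matches what the comments in the HA section of the template ask for.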
A sample CSV file has also been included in the GitHub repo of this post. You can load this file using the following command in Neo4j browser:
[code]
LOAD CSV FROM 'file:///genres.csv' AS line
CREATE (:Genre { GenreId: line[0], Name: line[1]})
[/code]
You can then check the results using:
[code]
MATCH (n) RETURN (n)
[/code]
This should return 115 Genre nodes and 0 relationships. To prove to yourself that this is actually a cluster and that the data is replicated automatically, you can run the above match command on any node and you should get the same result. You can also bring down a node (e.g. vagrant halt neo4j-node2). As long as the first node is running, all the other live nodes should be responsive and in sync. You can test each node using the match command above. If you bring a node back up (e.g. vagrant up neo4j-node2), then the node will rejoin the cluster and the latest state of the database will be replicated to this node as well. Finally, you can run :sysinfo on any node to see more information on the state of the cluster.
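For a quick check from the host without opening a browser, each node can also be asked for its role over HTTP. The sketch below assumes the HA status endpoint /db/manage/server/ha/available that Neo4j 3.x HA instances are documented to expose; treat that path as an assumption and verify it against the manual for your version. The helper names are hypothetical, and no credentials are sent because authentication is disabled in the configuration above.

```ruby
require 'net/http'
require 'uri'

NODE_IPS = ['192.168.3.11', '192.168.3.12', '192.168.3.13'].freeze

# Hypothetical helper: URI of the (assumed) HA status endpoint on the HTTP connector.
def ha_status_uri(ip, port = 7474)
  URI("http://#{ip}:#{port}/db/manage/server/ha/available")
end

# Returns the response body (expected to name the node's role),
# or a short note if the node cannot be reached.
def ha_role(ip)
  uri = ha_status_uri(ip)
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get(uri.path)
  end
  response.body.to_s.strip
rescue StandardError => e
  "unreachable (#{e.class})"
end

# Only contact the nodes when asked to: ruby check_cluster.rb check
if ARGV.first == 'check'
  NODE_IPS.each { |ip| puts "#{ip}: #{ha_role(ip)}" }
end
```

Running it while halting and restarting a node with vagrant halt and vagrant up should show that node dropping out and coming back.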