Fountain Node Monitor

Introduction

Fountain implements the node monitor specification of the SciDAC Scalable Systems Software project. It aggregates the status information of every node in a cluster in a scalable, reliable, and efficient manner while using a negligible amount of CPU time on each node. It can recover from single and multiple node failures, whether a node goes down unexpectedly or is taken offline for administrative purposes.

Why Fountain?

The name Fountain was chosen because it is a type of bamboo. Bamboo was the first SSS project developed in the SCL and Fountain was the second, so we wanted a similar name.

Installation

Fountain installation contains instructions and notes for installing Fountain on different platforms.

Development

Development Notes contains various development notes and ideas learned over the course of developing Fountain.

Network Monitoring contains thoughts and ideas for things to look into with Fountain's network support.

Documentation

Consult the Fountain/Documentation wiki page for detailed documentation and usage information.

Active Tickets


#5 Add support to detect extended length Infiniband port counters
#50 Add Load Average to the reports from Fountain
#64 Add a LinkStatus element
#80 Add a --with-moso argument to the configure script
#132 Think about retooling Fountain to use Goanna::Model
#143 Further Networking XML clarifications
#174 identify subnet manager
#176 asynchronous network discovery
#228 Report alternative response code when not all required attributes are met
#302 Incorrect symbol error values
#311 Need to support re-scan of IB network
#321 A failed forced network rediscovery should return a more verbose error message
#324 some fountain queries show partial data
#344 Periodically check 'Down' IB ports during each query
#368 fountainibPoll does not correctly report packets/sec values
#483 Host connecting to fountain with no reverse DNS get disconnected
#507 Incorrect data rate sanity checking
#519 Have Fountain node monitor report disk throughput somehow

Authors

Sam Miller - samm@scl.ameslab.gov

Brett Bode - brett@scl.ameslab.gov