Fountain Node Monitor
Introduction
Fountain implements the node monitor specification of the SciDAC Scalable Systems Software project. It aggregates the status information of every node in a cluster in a scalable, reliable, and efficient manner while using a negligible amount of CPU time on each node. It can recover from single and multiple node failures, whether a node goes down unexpectedly or is taken offline for administrative purposes.
Why Fountain?
The name Fountain was chosen because it is a type of bamboo. Bamboo was the first SSS project developed in the SCL and Fountain was the second, so we wanted a similar name.
Installation
Fountain installation contains instructions and notes for installing Fountain on different platforms.
Development
Development Notes contains notes and ideas gathered over the course of developing Fountain.
Network Monitoring contains thoughts and ideas to look into for Fountain's network support.
Documentation
Consult the Fountain/Documentation wiki page for detailed documentation and usage information.
Active Tickets
- #5: Add support to detect extended length Infiniband port counters
- #50: Add Load Average to the reports from Fountain
- #64: Add a LinkStatus element
- #80: Add a --with-moso argument to the configure script
- #132: Think about retooling Fountain to use Goanna::Model
- #143: Further Networking XML clarifications
- #174: identify subnet manager
- #176: asynchronous network discovery
- #228: Report alternative response code when not all required attributes are met
- #302: Incorrect symbol error values
- #311: Need to support re-scan of IB network
- #321: A failed forced network rediscovery should return a more verbose error message
- #324: some fountain queries show partial data
- #344: Periodically check 'Down' IB ports during each query
- #368: fountainibPoll does not correctly report packets/sec values
- #483: Host connecting to fountain with no reverse DNS get disconnected
- #507: Incorrect data rate sanity checking
- #519: Have Fountain node monitor report disk throughput somehow
Authors
Sam Miller - samm@scl.ameslab.gov
Brett Bode - brett@scl.ameslab.gov
