Copyright 2000, 2003 Hewlett-Packard Company.

Basic Design Outline
Netperf Version 4
ftp://ftp.cup.hp.com/dist/networking/benchmarks/netperf/netperf4_design.txt

Rick Jones
Hewlett-Packard Company
Cupertino, CA

Revision 0.6; March 19, 2003
   feedback and embellishments, strawman message formats
Revision 0.5; March 17, 2003
   further embellishments of commands and messages.
Revision 0.4; February 6, 2003
   Nuke the netsink
Revision 0.2; February 8, 2000

Introduction:

This document is intended to evolve into a design specification for netperf version 4, hereinafter referred to as "netperf4." The goal of netperf4 is to make it easier to run aggregate connection tests, as well as tests of FTP and/or DNS or other types of Internet services.

The core component of netperf4 is meant to be a sufficiently flexible, simple, multi-system test harness that allows incremental load increase and interactive loading control. Various tests can be plugged into this harness to create different benchmarks.

It is desired that netperf4 be "portable" - which is to say that, in the large, it is tied neither to a specific platform nor to a specific compiler. The extent to which netperf4 can be compiled and run on a given platform may depend on the capabilities of that platform. At a minimum, netperf4 is expected to compile and run under the following:

   *) HP-UX 11 or later
   *) Linux (of some suitable vintage/distribution)
   *) Windows (some suitable flavor(s))

Assuming that the work to support them isn't too burdensome and people step up to assist, it is believed that at some point netperf4 will also compile and run under (in no particular order):

   *) Solaris 8 or later
   *) *BSD
   *) Tru64
   *) AIX
   *) OpenVMS

To ease the development burden, netperf4 will, wherever possible, leverage the work of others.
That may include, but would not be limited to:

   *) glib/gtk+ - for eventloop handling and display support
   *) libxml2 - for XML support in config files, messages and reports

Thus, for a platform to support netperf4, it must also be able to support the works in the list above. If gtk+ is determined to be a core dependency then glib sort of follows. If a GUI is not determined to be a core dependency, glib will still be desired/required for eventloops and thread abstractions.

In addition, it is expected that one or more of the test suites of netperf will rely on the following:

   *) libcurl - for FTP and HTTP/HTTPS testing
   *) TBD - for DNS testing

It is desired that, just as netperf2 did, netperf4 become a "de facto" industry standard for benchmarking. As such it is expected to be developed and released under the terms of some suitable "open source" software license. The copyright holder for netperf4 is expected to remain the same as for netperf2 - that is, the Hewlett-Packard Company.

What netperf4 is Not

Netperf4 is _NOT_ expected to be a complete substitute for netperf2. It is expected that there will be situations where the overheads (either runtime, or system capability, or user interface) for netperf4 will be in excess of what some platforms can support or users will desire. Netperf2 will remain, and may migrate to make use of some concepts initially used in netperf4 (in particular the use of XML for messaging and reporting is a likely candidate).

The Big Picture:

There will be two process types in the netperf4 architecture. At present, the names are not fixed, but would be:

   *) netperf
   *) netserver

The netperf process is the process with which the user has direct interaction. Netperf will control the initiation, reporting, and termination of load. User interaction with netperf will be either through interactive commands or through scripting, perhaps via a TUI and/or GUI. A "web-based" interface, while presently a "popular" thing, is simply a "want."
The netperf process will establish "control connections" to one or more netserver processes, spread across one or more load generating client systems.

Each netserver process will have a control thread which is used to respond to commands from the netperf process. Those commands would include the creation and termination of load generating threads, and commands to transition those threads from an idle state to a loading and/or counting (actually tracking the load) state. All messages sent or received on the control connection will pass through the control thread.

On those platforms with support for SIGIO, a test "thread" need not be a thread separate from the "control thread." In such situations, there can be only one test per netserver process.

For tests such as FTP/HTTP download or DNS serving, the load generating threads in the netserver processes would talk directly to the FTP/DNS/Web server.

For the more classic netperf test types such as TCP_STREAM, TCP_RR and the like, netperf would talk to two netserver load generating threads via their respective netserver control threads (presumably but not necessarily in different systems). Netperf would configure both netserver load generating threads and would tell them to link up their data (test) connection.

As an aid to portability, all messages between netperf and netserver shall be XML messages encoded in 7-bit US ASCII. While this does place a greater burden on the coding, it will bypass any issues with endianness and byte ordering for multi-byte data types. It is also hoped that use of XML for the messages will make use of XML for the config and report files easier.

The netserver load generating thread FSM:

A load generating thread (aka "test") would have a 6 state FSM.
Those states would be:

   *) INIT - the state while the thread is setting up to generate load
   *) IDLE - the thread is ready to generate load
   *) LOAD - the thread is generating load, but not tracking results
   *) MEAS - the thread is generating load, and tracking results
   *) ERROR - the thread has encountered an unexpected error
   *) DEAD - the thread is terminating

A test is created and exists in the INIT state when netserver receives a request to create a test from the netperf. Once initialization is complete, the test replies to the netperf and transitions to the IDLE state.

While in the IDLE state, the test may be asked to transition to the LOAD state, or the DEAD state. Requests to transition the test to any other state will put the test into the ERROR state.

While in the LOAD state, the test may be asked to transition to the IDLE state, or the MEAS state. Requests to transition the test to any other state will put the test into the ERROR state.

While in the MEAS state, the test may be asked to transition to the LOAD state. Requests to transition the test to any other state will put the test into the ERROR state.

The ERROR state can be entered upon unexpected error while in the INIT, IDLE, LOAD, or MEAS states. While in the ERROR state, the test may be asked to transition to the DEAD state. Any other request will result in the test replying with an error message corresponding to the reason the test entered the ERROR state, and the test will remain in the ERROR state. A test in the ERROR state will not generate load.

The following picture is intended to aid in understanding the FSM.
It does not contain the associated messages received or sent:

                         +-------+
                         |       |
            +----error---| INIT  |
            |            |       |
            V            +-------+
            |                |  When
            |                V  Ready
            |            +-------+
            |            |       |
            +----error---| IDLE  |--------+
            |            |       |        |
            V            +-------+        |
            |       recv   |   ^   recv   |
            |       LOAD   V   |   IDLE   |
 +-------+  |            +-------+        |
 |       |  |            |       |        |
 | ERROR |<-+-error------| LOAD  |        |
 |       |  |            |       |        |
 +-------+  ^            +-------+        |
     |      |       recv   |   ^   recv   |  recv
     |      |       MEAS   V   |   NOCNT  |  DIE
     |      |            +-------+        |
     |      |            |       |        |
recv |      +-error------| MEAS  |        |
DIE  |                   |       |        |
     |                   +-------+        |
     |                                    |
     |                   +-------+        |
     |                   |       |        |
     +------------------>| DEAD  |<-------+
                         |       |
                         +-------+
                             |
                           -----
                            ---
                             -

Netperf/netserver control connection commands

Commands on the control connection are expected to be encapsulated in XML documents passed over the control connection. Test-specific items will be encapsulated as nodes within the document and will be opaque to the netserver control thread.

The root node of a message between a netperf/netserver/test instance will resemble the following XML snippet:

   ...content...

where "tonid" is the destination netserver/netperf ID, "totid" is the destination test ID, "fromnid" is the source netserver/netperf ID, and "fromtid" is the source test ID. The special case of "netserver" as either tonid or fromnid will identify the netserver process. The special case of "tnull" as either totid or fromtid will indicate that the message is for either the netperf or netserver accordingly.

The astute reader will notice that this "enables" a test instance in the context of one netserver to address a message to a test instance in the context of another netserver. This is deliberate to allow future test instances to communicate with one another when they need to coordinate their actions.
However, it is not expected to be implemented and debugged at the first release of netperf4 :) It is also expected that by default netperf will disallow such messages - the next paragraph will explain why :)

Why not simply have the test instances create their own out-of-band connections? Well, they could, but the desire to better enable netperf to function through firewalls suggests that having the messages flow through connections with known addressing is a good thing. Of course, this also represents an opportunity for a "covert channel" between test instances, and malicious test library code might exploit that for nefarious porpoises. Hence, once the functionality is enabled, the default in netperf will be to "block" such messages by aborting the entire test (after emitting an appropriate error message to the user of course :).

The following commands/messages will be exchanged on the netperf/netserver control connection and will (as appropriate) appear as XML entities embedded in the construct. Only one of these entities shall be embedded in any one construct.

new control connection - a new netserver process is created via platform-specific means (e.g. fork)

version - send the major, minor and micro version numbers for netperf. if the netserver believes these version numbers to be compatible with its own version numbers, the netserver responds with a version message carrying its major, minor and micro version numbers. otherwise netserver replies with an error message. when netperf receives a version message, it will compare the major, minor and micro version numbers against its own. if the netperf believes these to be compatible with its own, it will do nothing. otherwise it will close the control connection and report a version incompatibility to the user interface.

test - create a new test and init based on "workinfo."
this will include dynamically loading the specified test library and loading appropriate function pointers, and then setup of test-specific parameters. if initialization is successful, return an init message containing the test-specific post-initialization data, otherwise, transition the test to the DEAD state and return an error message

load - request that the test transition from IDLE to LOAD and start generating load, or from MEAS to LOAD and stop tracking its results while continuing to generate load. if the test can successfully transition to the LOAD state, send a LOAD message back, otherwise transition the test to the ERROR state and return an error message.

meas - request that test transition from LOAD to MEASure state and start tracking the results of its generation of load. if the test can successfully transition to the MEASure state, reply with a meas message, otherwise transition to the ERROR state and return an error message.

idle - request that test transition from LOAD to IDLE and stop generating load. if the transition to IDLE is successful, return an IDLE message, otherwise transition to the ERROR state and return an error message.

die - request that the test transition from either IDLE or ERROR to DEAD and simply fade away, freeing any test-specific resources not already freed. [Need there be a reply to this command?]

clear - request that test clear its statistics. unless some error is encountered in clearing the statistics, no message is returned. otherwise, an error message is returned and the test transitions to the ERROR state.

snap - take a snapshot of test 's statistics. if the statistics can be assembled, a snap message with the statistics is returned, otherwise return an error message and transition the test to the ERROR state.

totals - request "total" statistics (statistics since the beginning of the measurement interval) from test .
if the statistics can be assembled, a totals message with the statistics will be returned, otherwise, return an error message and transition the test to the ERROR state.

warning - when the netserver control thread or a test instance detects a non-fatal condition that allows testing to continue, it will send a warning message to netperf. a netperf will never send a warning message to a netserver or test instance. it is expected that this is exceedingly rare.

error - when the netserver control thread or a test detects a fatal error, an error message will be sent to the netperf. a netperf will never send an error message to a netserver or test instance. ASCII text of an error message will be embedded in the message.

close of control connection - upon detecting a close of the control connection a netserver will unceremoniously terminate, taking all tests with it. [depending on the nature of the test code, it may be necessary to give the tests the option of some clean-up] a netperf detecting close of the control connection will presume a catastrophic error in the netserver and act accordingly.

Netperf commands:

The following are described as "operations" because they may or may not correspond to "commands" in the sense of someone typing that command name at a prompt or what not. They are provided as a guide to the functionality expected to be implemented in the netperf process. As such, do not pay too much attention to syntax; consider only semantics. Syntax will follow after decisions on the UI(s) are made.

The netperf process will support the following "operations."

open - open a connection to a new netserver process on and return a client number.

close - terminate (including all threads) with extreme prejudice. The control thread of the corresponding netperf simply exit()'s, taking any and all tests with it.

test - create a new test instance on and initialize with . return a global thread number.
list - list all tests and their current state on

load - request that global test id begin to generate load. when is specified as "INIT", the command causes the request to go to all tests in the INIT state. a test id of "MEAS" will cause the request to go to all tests in the MEASure state. a test id of "ALL" will cause the request to be sent to all tests in either the INIT or MEASure states.

measure - request that global test id transition from the LOAD to the MEASure state. If is specified as "LOAD" or "ALL" the request will be sent to all tests in the LOAD state.

idle - request that global test id transition from the LOAD to the IDLE state. If is specified as "LOAD" or "ALL" the request will be sent to all tests in the LOAD state.

clear - request that global test id clear its accumulated statistics.

snap - request that global test id return statistics for the interval since the last snap command or entry into the MEASure state, whichever is most recent. If is specified as "MEAS" or "ALL" the request will be sent to all tests in the MEASure state.

How load generator test state transitions work:

When a message arrives on a control connection, the netserver control thread will compare the command in the message against the test state recorded in the per-test data structure. If the message is valid for the current state, the netserver control thread will then queue the message to the test and set a flag in the per-test data structure to "signal" (not in the Unix/gtk sense) the test that a message is present. The test will notice this "signal" and will consume the message and act accordingly.

[Question - perhaps it would be better to simply have the netserver control thread queue all messages to the test and let the test decide how to handle them?]

The test is generally expected to generate some sort of reply message after consuming the message(s) sent to it by the netperf via the netserver control thread.
This shall be accomplished with library code the test code can call to access the control socket in a manner otherwise opaque to the test code. When the test is a thread separate from the netserver control thread (the usual case?) this will involve queueing the message to the netserver control thread and "signalling" the thread in some manner - the idea is to hand the message off "quickly" and let the test get back to what it was doing before. When the test is not a thread separate from the netserver control thread, this will simply write directly to the control socket. This may block the test for some undesirable length of time if the nature of control traffic is to have more than one outstanding message at a time. Otherwise, it is expected that the socket buffers will be sufficiently large to allow the message to be queued to the control socket without blocking.

Since the netserver is expected to be dealing with many, Many, MANY test instances simultaneously, messages sent across the control connections, while generally expected to trigger replies of some sort, shall be asynchronous. That is, the netperf process will be written in an event-driven manner, and the sending of a control message is expected to update sufficient state information in the netserver process to enable processing the resulting reply.

What a load generating thread should do in the LOAD versus MEAS states:

There is a decision to be made wrt how a load generating thread should behave while in the LOAD or MEAS states. In particular, how transitions from one to the other should affect the results being counted.

For very simple (i.e. short) "transactions" in the load, we could simply state that the load generating thread does not start counting load until the first transaction it does after entering the MEAS state. That would likely be sufficient for something like the netperf TCP_STREAM test, or the TCP_RR, where it could simply start counting with the next transaction.
At the other end, it is likely that a transaction started while in MEAS would complete very close to the time of the request to return to the LOAD state.

However, for something like an FTP download of a 16 MB file over a simulated 56,000 bit per second link, the next "transaction" (i.e. download) could be 40 minutes away, and it could be 40 minutes before it completes. It seems, therefore, that any "long" test transaction has to be coded such that it can start and stop counting "in the middle."

When netperf will exit:

Netperf will exit whenever it encounters a fatal error. In general an error will be considered fatal if it precludes the possibility of further useful work being done by netperf. When netperf is being run interactively this shall include:

When netperf is being run non-interactively, fatal errors will include those of the flavors listed for interactive operation plus:

   *) receipt of any "error" messages from any netserver or test instance
   *) failure to establish a control connection
   *) failure of a control connection

More on the control connection:

The netperf config file (or other mechanism, we'll just use "config file" here for brevity) must include the ability to completely specify both endpoints of a control connection. By that we mean the six-tuple of local and remote IP address/hostname, local and remote port numbers, and local and remote addressing families (corresponding to IPv4, IPv6 and "don't care").

The defaults for the items in the six-tuple will be as follows:

   *) Local IP address/hostname - INADDR_ANY/assigned by the system
   *) Local port number - dynamically assigned by the system
   *) Local address family - AF_INET
   *) Remote IP address/hostname - "localhost"
   *) Remote port number - the netperf4 well-known port number (TBD)
   *) Remote address family - AF_INET

The routine "getaddrinfo()" will be used to convert the local and remote three-tuples into sockaddr structures that can be passed to bind()/connect() accordingly.
For each remote address info structure returned by getaddrinfo(), the control connection establishment code will try each of the local address info structures returned by getaddrinfo(). The first combination of remote/local address info that results in a successful call to connect() will be used for the control connection.

If no combination of local/remote address info results in a successful call to connect() then control connection establishment will fail and an error will be displayed to the user. If netperf is being run interactively, it will continue to execute; otherwise, netperf will abort.

Thus, the greatest control is exerted by the user when s/he specifies local and remote addressing information in the form of IP addresses (IPv4 or IPv6), explicit port numbers (numeric rather than names) and specific address families (AF_INET or AF_INET6 depending on the IP addresses provided). When hostnames are specified, the control connection can involve any of the IP addresses associated with the hostnames. If it is desired that hostnames be used and that only a single IP address associated with the hostname be used, then the hostname MUST resolve to a single IP address.

More about test endpoint addressing:

It is required that config files and test code be able to handle specification of full endpoint addressing information as defined by the type of test being executed. For "classic" netperf style tests that means that one must be able to specify addressing information for both ends of the "data" connection.

"Classic" netperf style tests have two test specifications - one for the "recv" side and one for the "send" side. Complete addressing info MAY span config specifications for both sides.

Whether or not a test utilizes an algorithm similar to that of the control connection is left as a decision for the designer of that test suite.
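The control connection establishment procedure described under "More on the control connection" can be sketched roughly as follows. This is only an illustration in Python, not netperf4 code: the function names (resolve, address_pairs, establish_control_connection) are hypothetical, the remote port has no default because the well-known port number is still TBD, and the "don't care" address family is assumed to map onto AF_UNSPEC.

```python
import socket


def resolve(host, port, family=socket.AF_INET, passive=False):
    """Turn one (host, port, family) three-tuple into getaddrinfo() results."""
    flags = socket.AI_PASSIVE if passive else 0
    return socket.getaddrinfo(host, port, family, socket.SOCK_STREAM, 0, flags)


def address_pairs(remote_infos, local_infos):
    """For each remote address info, yield every local address info in turn."""
    for remote in remote_infos:
        for local in local_infos:
            yield remote, local


def establish_control_connection(remote_host, remote_port,
                                 remote_family=socket.AF_INET,
                                 local_host=None, local_port=0,
                                 local_family=socket.AF_INET):
    """Return a socket connected via the first remote/local pair that works."""
    remotes = resolve(remote_host, remote_port, remote_family)
    # host None plus AI_PASSIVE yields the wildcard address (INADDR_ANY);
    # port 0 lets the system assign the local port dynamically.
    locals_ = resolve(local_host, local_port, local_family, passive=True)
    last_error = None
    for remote, local in address_pairs(remotes, locals_):
        family, socktype, proto, _canonname, remote_addr = remote
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.bind(local[4])
            sock.connect(remote_addr)
            return sock
        except OSError as exc:
            # This combination failed (for example, mismatched families);
            # remember why and move on to the next one.
            last_error = exc
            if sock is not None:
                sock.close()
    raise ConnectionError("no local/remote address combination connected") from last_error
```

A caller running non-interactively would treat the ConnectionError as fatal, while an interactive caller would report it and carry on. Passing socket.AF_UNSPEC for either family lets getaddrinfo() return both IPv4 and IPv6 candidates.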
Appendix M - Message formats:

The following are the formats of the messages exchanged between netperf and the netserver control thread/test. Some messages may contain test-suite-specific elements which are not described here.

In XML, if an element (entity) is normally described as:

However, if there is no text content to the element, this can be shortened to simply:

netperf messages, being XML constructs, will naturally follow the same pattern. In this document, the shorthand will be used.

Version message:

This is the message type that informs the other side of our version information. The "vers", "updt", and "fix" numbers (aka major, minor, micro) are encoded as attributes of the version element rather than as contents or sub-elements (this may not be the correct XML terminology...). The "req" attribute is a placeholder should it be necessary to know if this version message is an initial request, or a reply to a request. It is not presently expected to be necessary.

Whether attribute names should be full English or abbreviated is open to discussion. The tradeoff is between human readability and bytes on the network. Human readability is likely to win.

Snap message -

The interval attribute of a snap will specify the frequency with which "interval" messages should be sent back to the netserver after the initial, immediate interval message. A value of "0" (zero) seconds states that only one interval message should be sent. A non-zero value for the interval attribute means that interval messages should continue to be sent, approximately N seconds apart, until otherwise disabled.

Interval message - interval statistics in response to a snap or

The interval element will have as its attributes a start time in the style of a "Unix" timeval structure as returned in a gettimeofday() call - seconds and microseconds since the beginning of the Epoch. The other option is to have the format be in a full ASCII format for YYYY-MM-DD-HH:MM:SS.mm.
The decision again centers on how the time values will be used and whether things in log files should be more easily read by humans, or if conversions for math should be easier for the programmer.

Error message

Load message

This will cause the receiving test instance to transition to the LOAD state. The test instance is determined from the attributes. Request message and reply message are identical.

Measure message

This will cause the receiving test instance to transition to the MEASure state. The test instance is determined from the attributes. Request message and reply message are identical.

Idle message

This will cause the receiving test instance to transition to the IDLE state. The test instance is determined from the enclosing attributes. Request message and reply message are identical.

Test message

   ...test-specific contents...

Cause the netserver control thread to instantiate and initialize a test instance with a test ID based on the "tid" attribute. The "totid" attribute in the enclosing would be tnull as a test command is addressed to a netserver control thread and not a test instance. The test-specific contents are expected to be XML formatted so they may come, unchanged, from a test element in the config file (should one exist).

At this time, it is not certain if tids will be globally unique or only unique within the context of a netserver. Globally unique would perhaps be easier on the netserver, and may actually be required for the feature for linking two test instances together.

Initresult message

   ...test-specific contents...

This is the message sent from a test instance in response to the test message sent by netperf. It does not contain a "tid" attribute because that will be present in the "fromtid" attribute of the enclosing . The test-specific contents are expected to be XML formatted so they can go directly into an XML-formatted results or log file.
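The load, measure and idle messages above each ask a test instance to change state, and the netserver control thread is expected to check such a request against the test's current state. A minimal sketch of that check, built only from the transition rules given in the FSM section, might look like the following. The names State, ALLOWED and request_transition are hypothetical, and the handling of requests that arrive while a test is in INIT or DEAD is an assumption, since the text does not cover those cases.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    IDLE = "IDLE"
    LOAD = "LOAD"
    MEAS = "MEAS"
    ERROR = "ERROR"
    DEAD = "DEAD"


# Transitions a test may be *asked* to make, per the FSM description.
# INIT -> IDLE is not listed: it happens on its own once setup completes.
ALLOWED = {
    State.IDLE: {State.LOAD, State.DEAD},
    State.LOAD: {State.IDLE, State.MEAS},
    State.MEAS: {State.LOAD},
    State.ERROR: {State.DEAD},
}


def request_transition(current, requested):
    """Return (new_state, accepted) for a requested state change."""
    if requested in ALLOWED.get(current, set()):
        return requested, True
    if current is State.DEAD:
        # Assumption: a terminating test ignores further requests.
        return State.DEAD, False
    # Any other request puts the test into ERROR; a test already in ERROR
    # stays there and would reply with its error message.
    # Assumption: a request arriving during INIT is handled the same way.
    return State.ERROR, False
```

For example, request_transition(State.IDLE, State.MEAS) is rejected and leaves the test in ERROR, because MEAS may only be entered from LOAD.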
Die message

It is presumed that since a "die" message is addressed to a netserver control thread, it should have a tid attribute to identify the test instance to be terminated - the "totid" attribute of the enclosing likely being tnull.

Clear message

Request that the test instance (specified in the totid attribute of the enclosing ) clear all statistics - both interval and total.

Appendix T - Sample config file

Unless you have a decent understanding of XML, they may look rather confusing. Basically all the configuration data for netperf is contained in child-elements/entities of a element. It is highly desirable that the format for the "test" subelement be usable as the "test" command on the control connection.

The decisions on what should be attributes 'foo="bar"' and what should be child-elements are still somewhat open and subject to fluidity. One possible decision criterion (not necessarily rigorously applied here :) is whether or not something handling/passing-on an element needs/wants information contained within. If it may need information, then having that as an attribute may be preferable as it would not have to "walk" the child-elements - this probably makes more sense if you are familiar with libxml2 or XML in general. (Which the author does not necessarily claim to be himself... :)

sweb156.cup.hp.com SEND_TCP_STREAM libnettest_bsd.sl foo.bar.baz 32768 32768 4 0 bin.fred.ethel 8 0 1460 123.45 123.45 ::1 RECV_TCP_STREAM libnettest_bsd.sl bing.fred.ethel 32 32 4 8 0 123.45 CPU_UTIL libnettest_cpu.sl 0 4 95.3
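The stated goal that a config file's "test" subelement be usable as the "test" command on the control connection can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration: the enclosing element name "message", the element name "test", the node IDs "n0" and "n1", the tid "t0" and the placeholder child element are all hypothetical. Only the attribute names tonid, totid, fromnid, fromtid and tid, and the special value "tnull", come from the text above.

```python
import copy
import xml.etree.ElementTree as ET


def wrap_test_command(test_element, to_nid, from_nid):
    """Wrap a config-file test element, unchanged, for the control connection."""
    envelope = ET.Element("message", {  # "message" is an assumed element name
        "tonid": to_nid,
        "totid": "tnull",    # addressed to the control thread, not to a test
        "fromnid": from_nid,
        "fromtid": "tnull",  # sent by netperf itself, not by a test
    })
    # Copy so that the parsed config tree is left untouched.
    envelope.append(copy.deepcopy(test_element))
    return envelope


# A stand-in for a test element read from a config file; contents are opaque.
config_fragment = ET.fromstring(
    '<test tid="t0"><placeholder>test-specific contents</placeholder></test>'
)
envelope = wrap_test_command(config_fragment, to_nid="n1", from_nid="n0")

# With no encoding argument, ElementTree serializes to US-ASCII bytes.
wire_bytes = ET.tostring(envelope)
```

Keeping the test-specific contents as an opaque child element means the control thread only has to read attributes of the enclosing element and of the test element itself, which matches the attribute-versus-child-element criterion discussed above.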