Friday, September 18, 2009

Geospatial Performance Benchmarks "Apples to Apples"

I get a lot of requests for Performance Benchmarks of APOLLO vs. lots of other systems. We provide extremely analytical and detailed performance results of our server with every release. A couple of issues always crop up in any performance benchmark:

1. Features - first of all, NO competitive product can DO what APOLLO can do...so I find myself either A: Dumbing down APOLLO and the test to even be able to do an "apples to apples" test or B: not making it "apples to apples".

2. Return on Investment - ROI on software is not just performance, but HOW LONG IT TOOK TO SET UP, ADMINISTER and GET the feature operational in a production scenario!! I find myself spending such a HUGE part of my time getting the competitive software to even "work" to do the test.

I've been following the FOSS4G's Web Mapping Shootout announced for their 2009 conference. I get a really HUGE chuckle because their "shootout" couldn't be performed on a more CARTOON set of data and NON-REALISTIC use case. I don't know ONE client that requires one image and a handful of vector data sets (3 to be precise).

Our smallest benchmark has 459 7 band images...choke on that Open Source.

They should call it a "water gun fight" instead of a "Shootout".

Also, what will NOT be collected in the "shootout" is how long it took them to set up the system and service enable the data...how many "WEEKS" are you willing to struggle with that?

PERFORMANCE is about ROI on the investment and of course the ability of the system to handle a user load. Weigh both when you're looking at the numbers!!

10 comments:

Cédric@camptocamp said...

Could you provide the URL of a website delivering data from ERDAS APOLLO Server?
What is the price of one license of ERDAS APOLLO Server?
What are the yearly maintenance costs of ERDAS APOLLO Server?

Ian Turton said...

So are you saying it takes too long to set APOLLO up for you to compete even with this simple sample of data? God help you if FOSS4G does this with a bigger sample next year.

Ian
I think it's ArcServer that needs the small data set not the open source contestants.

darkblueB said...

I don't know the full details of the Webmap Server Shootout, but aren't vector renderings being tested? I don't see you mention that. Is it necessarily true that performance is only meaningfully measured while the number of raster layers is high? Aren't there at least several other dimensions of performance that could be measured well while the number of raster layers is held at a low figure?

You may know I maintain a wiki with listings for all the major Earth Browser projects. If you have any additions, you are invited to contribute...

Frank Warmerdam said...

Shawn,

I'm sorry we were unable to work with a dataset of the size you wanted. One of our objectives in the benchmark was to ensure folks could download the datasets and try things for themselves. That, and limited bandwidth into the benchmark servers makes tests with large datasets challenging this year. Perhaps you will participate next year if we can accommodate a scenario such as you propose. And one of our existing tests includes 512 files (albeit modestly sized).

You write "Our smallest benchmark has 459 7 band images...choke on that Open Source." Is there some reason you believe that open source packages would not be able to handle that? While I don't feel 7 band files are a typical case for those serving maps on the web (even imagery based maps) there are certainly ways to accomplish this with MapServer.

Also, you point out that a well rounded understanding of ROI for a software package needs to also take into account the amount of effort required to set up and deploy. That is true, but for the purposes of the benchmark we haven't tried to accomplish a broad ROI study. We are just examining maps/second performance. An ROI examination is interesting, but beyond the scope of what we intended for this presentation. I would add that examining ROI is very challenging to do fairly.

I hope you will be interested in involvement next year. I, for one, am eager to see what is choked on, and what is gobbled up.

Best regards,

Shawn Owston said...

Thanks for the comment Frank.

ERDAS was very interested in participating in the "shootout", but our proposal to "expand" the test data, and even our willingness to "donate" the data for the test, drew no interest for the public or presented numbers.

Hopefully next year we can participate on a larger set of data.

Our offer to donate data still stands for next year as well.

I've been monitoring the progress of the shootout and look forward to reviewing the results.

Shawn Owston said...

"Maps"/second measures "throughput" only and doesn't necessarily represent "maximum load".

Load is properly measured by determining the maximum number of simultaneous client connections making requests and the point at which the peak throughput is reached (i.e. "maps"/second). After this point, as more connections are made to the server, the slope of the "maps/second" vs. time curve will reach 0 and, as still more connections are made to the server, throughput will usually decrease.

This is usually only obtained and measured by a more complex testing scenario and data collection during the test scenario run.

"Stability" at the point of maximum load is also very important. That and the behavior of the system after maximum load is reached and when the system is "overloaded".

There are some hypotheses that can be made regarding load by simply collecting throughput, but those would then require more tests.
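
As a rough illustration of the measurement described above, the sketch below takes throughput samples collected at increasing connection counts and reports where the peak was reached and how much throughput was lost beyond it. The function names and the sample numbers are hypothetical, made up for illustration only; they are not results from APOLLO or from any shootout participant.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (simultaneous connections, maps per second)


def peak_throughput(samples: List[Sample]) -> Sample:
    """Return the (connections, maps_per_sec) sample with the highest throughput."""
    return max(samples, key=lambda s: s[1])


def degradation_after_peak(samples: List[Sample]) -> float:
    """Fraction of peak throughput lost at the highest connection count tested."""
    ordered = sorted(samples)          # order by connection count
    _, peak = peak_throughput(ordered)
    _, last = ordered[-1]              # throughput at the heaviest load tested
    return (peak - last) / peak


# Hypothetical numbers, for illustration only (not measured results).
samples = [(1, 12.0), (5, 48.0), (10, 80.0), (20, 100.0), (40, 90.0)]

if __name__ == "__main__":
    connections, rate = peak_throughput(samples)
    print(f"peak of {rate} maps/sec at {connections} connections")
    print(f"throughput lost at max load: {degradation_after_peak(samples):.0%}")
```

With these made-up samples the peak sits at 20 connections, and a 10% drop at 40 connections would be the kind of "overloaded" behavior worth investigating with further tests.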

Roger Andre said...

In order to form an opinion, I would like to know more about the data you wished to test. I see that it is 7-band data, but how big (in pixels) is each image, and what sort of style rules are to be applied? Are you interested in seeing band combinations, or just single bands with color ramps?

Thanks,

Roger

jlivni said...

Shawn,

While everyone has a different use case, I think it's fair to say there's a large percentage of webmaps out there in the wild that make use of just a few limited data layers.

That said, obviously some folks have more robust requirements. Perhaps rather than just dissing one set of attempts to set up a test scenario, you could set up your own test scenario (preferably one that covers something other software packages also market themselves as being able to do), and publish the source, and results, of your tests.

For example, if you are willing to donate some imagery for others to test with as you mention above, perhaps you could take your example dataset of 500 7-band images of whatever size you feel makes a good test, do whatever you want with them (serve up as wms?) and publish your speed results under various scenarios. Mix in some vector data, and different load testing, and so forth. Document how you did the load testing, and show examples of the results.

If you run it all on a standard server that others can duplicate (for example an EC2 virtual server), and share the dataset (for example on an EC2 shared EBS block) then it will be easy for others to try working under similar hardware situations for comparison. At that point, might be cool to also make a note of steps taken to set up the software, and approximate time it took to do so.

Assuming your apollo server is all that, I am sure you will be eager to make the results public :) And while I can't predict the future, I feel confident others in the open source world would likely take advantage of your generosity in providing the dataset and results to compare against by running their own tests and sharing their results.

Cheers,

-Josh

Paul Ramsey said...

Shawn,

The "throughput" we are measuring is pretty much exactly the "load" you are describing. It isn't a single-threaded throughput, it's a set of increasingly parallel tests, starting at only one thread, and increasing to a maximum of 40 concurrent threads. Like you said, the goal is to measure the point at which the maximum number of maps/second is reached, the place where the hardware is fully saturated with work. And also as you said, if the software is good, the maps/sec curve will stay flat thereafter as more and more load threads are added (and for the servers that did complete the benchmark, that is the pattern we are seeing).
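
A minimal sketch of a ramped, increasingly parallel test of the kind described above, assuming a stand-in request function. `fake_map_request`, `maps_per_second` and `ramp` are hypothetical names, and this is not the harness the shootout actually used; a real run would replace the stub with an HTTP map request against the server under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fake_map_request() -> None:
    """Stand-in for one map request (assumption: a real test would call the server)."""
    time.sleep(0.01)


def worker(request: Callable[[], None], n_requests: int) -> int:
    """Issue n_requests back to back from one thread and return the count completed."""
    for _ in range(n_requests):
        request()
    return n_requests


def maps_per_second(request: Callable[[], None], threads: int,
                    requests_per_thread: int) -> float:
    """Run `threads` concurrent workers and return the overall maps/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, request, requests_per_thread)
                   for _ in range(threads)]
        completed = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return completed / elapsed


def ramp(request: Callable[[], None], thread_counts: Iterable[int],
         requests_per_thread: int = 5) -> Dict[int, float]:
    """Measure throughput at each thread count in turn, starting from the lowest."""
    return {n: maps_per_second(request, n, requests_per_thread)
            for n in sorted(thread_counts)}


if __name__ == "__main__":
    for n, rate in ramp(fake_map_request, [1, 2, 4, 8]).items():
        print(f"{n:>3} threads: {rate:8.1f} maps/sec")
```

Because the stub only sleeps, throughput here keeps climbing with the thread count; against a real server the curve would be expected to flatten once the hardware is saturated.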

Paul

Shawn Owston said...

Hi Paul,

Thanks for the comment.

Nice work on the PostGIS 1.4, we look forward to supporting it in our future releases.

back to the thread...

There can be "overhead" to each thread on the application server side as the thread pool grows. 40 is a relatively good number to reach saturation, but it may not stress the thread pool of the connector on the application server, and it does not measure the results from higher thread numbers (e.g. 250-1000).

In essence, the 40 thread limit is only measuring the saturation at 40 maximum possible users.

Although Thread -> Client Session is not a 1:1 correlation, as higher connection loads are demanded of the server, there is an App Server overhead that must be considered. In essence, the 40 thread tests may not represent the actual number of threads required to handle a large number of concurrent users making requests at the same time.

You could "tweak" the test to increase the "wait time" of requests in each thread to, say, an average of 7 seconds, which would provide an average "human" map experience and enable the test scenario to add more concurrent threads simultaneously.

This methodology will identify "max users at max throughput", which is a very important measure.
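
One rough way to relate "wait time" to user counts is the interactive response time law from queueing theory (Little's Law applied to a closed system): users = throughput x (response time + think time). The sketch below applies it; the function name and all figures are hypothetical and chosen only to show the arithmetic, not taken from any benchmark run.

```python
def supported_users(maps_per_sec: float, response_time_s: float,
                    think_time_s: float) -> float:
    """Interactive response time law: N = X * (R + Z).

    X is throughput in maps/second, R the response time per map in seconds,
    and Z the "wait time" a user spends between requests, in seconds.
    """
    return maps_per_sec * (response_time_s + think_time_s)


if __name__ == "__main__":
    # Hypothetical figures: 100 maps/sec at saturation, 0.5 s per map.
    print(supported_users(100.0, 0.5, 0.0))  # back-to-back requests: 50.0
    print(supported_users(100.0, 0.5, 7.0))  # 7 s average wait time: 750.0
```

Under these assumed figures, the same saturated server corresponds to 50 threads firing back-to-back requests but to 750 users who each pause 7 seconds between maps, which is why thread count and user count are not interchangeable.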

cheers