The Quest for Accurate Data: Why it is Important


From the Bridge Trader - Jan/Feb 1999

"Data geeks" is a term often applied to our personnel here at Ermanometry Research. We've even been called data freaks. Sometimes the language gets still more colorful, and we love it! From our perspective, the strongest language is the greatest compliment, provided that it refers only to our obsession with accurate data. We consider it confirmation that we are doing our job.

Our research requires us to be compulsive about accurate data. One of the foundations of our work is that market movements are not random. This applies to all freely traded markets, cash and futures, from grains and metals to financials and equities. Our thesis that all markets conform to specific dynamic patterns, both in price and time, was not a preconception for which we sought evidence. It was developed from overwhelming evidence uncovered through painstaking data analysis. The book, Ermanometry-The Perfectly Patterned Stock Market, contains hundreds of pages of this evidence and of the methods used to decode market moves. Ermanometry measures moves of more than 60 years using increments no larger than a single trading day. We do not count in weeks, months or years. The permissible error factor on these massive moves is less than one-thousandth of one percent. Accurate data is imperative in this analytical environment.

For example, Ermanometry Research has projections for more than 16 time periods of major support or resistance for the DJIA and S&P 500 during 1999. Among the most significant are those centered on April 12 and September 1. If the indices exceed the highs of January 8, 1999, we expect them to be making historic highs about April 12. These projections result from the application of proprietary algorithms to the number of trading days between previous major turning points. We consider turning points to be those days on which the market reaches new intraday high or low extremes and then reverses. Closing prices are not considered.

Some of our algorithms require multipliers as large as four. Assume that a projection was based upon applying a multiplier of four to a move counted as 100 days. Assume that the true turning point actually occurred on day 99, but faulty data caused us to believe the turn occurred on day 100. Multiplying the incorrect total of 100 days by four, and then adding the resulting 400 days (accurate data would have given a total of 396) to day 100 of the previous move, would create an error of five days: four days from the multiplication, plus one more from adding the result to day 100 instead of day 99. Obviously, this is unacceptable. Thus a data error of only one day could cause our high probability projection of a major trend change in the indices to be shifted from the time period centered on April 12 to one a whole week later, centered on April 19.
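
The five-day error in the example above can be checked with a few lines of arithmetic. The sketch below is only a reading of that worked example, not the proprietary algorithms themselves. The function names are hypothetical, and it assumes the projection is placed "multiplier times the move length" trading days after the turning point that ends the move.

```python
def projected_turn_day(move_length: int, multiplier: int) -> int:
    """Day of the projected turn, counted from the start of the measured move.

    Assumption: the projection falls `multiplier * move_length` trading days
    after the turning point that ends the move (itself day `move_length`).
    """
    return move_length + multiplier * move_length


def projection_error(counted_length: int, true_length: int, multiplier: int) -> int:
    """Trading days by which a miscounted move shifts the projection."""
    return (projected_turn_day(counted_length, multiplier)
            - projected_turn_day(true_length, multiplier))


faulty = projected_turn_day(100, 4)   # 100 + 400 = 500
correct = projected_turn_day(99, 4)   # 99 + 396 = 495
print(faulty, correct, projection_error(100, 99, 4))  # 500 495 5
```

Under this reading, a miscount of one day shifts the projection by the multiplier plus one, which is why a single bad daily record can move a projection by a full trading week.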

The extent to which Ermanometry Research requires accurate data may not apply to the average trader/analyst. We believe in the KISS principle (keep it simple, stupid) and a trader should never get so involved in "details" that the big picture is obscured. A favorite expression of ours is... some people are so fervent over details they get caught in their own underwear. Nevertheless, Ermanometry has found many errors in the official records of major exchanges, regarding both the actual count of trading days and daily high/low prices, and all market participants should be aware of the potential for errors and the results of using bad data. A few bad ticks may not have much effect on moving averages and oscillators, but errors have a cumulative impact. Trendlines can be terribly skewed if the bad ticks include an important high or low.

Figure 1 illustrates an erroneous daily high that still resides in data banks 10 years after it occurred. It contains a "spike" that occurred on October 31, 1988, shown on the five-minute chart of the S&P 500 Index. If the analyst were using real-time data and small-increment time charts, the spike would have been obvious and a correction made. However, on an hourly chart the spike would not necessarily be evident. The error would be almost impossible to detect on a daily bar chart.

Figure 2

CME S&P 500 Index-Nov. 88
Time and Sales-10/31/88

Time   Ticks (in sequence)
2:21   27879  27888  27888  27878
2:22   27878  27878  27877
2:23   27877  27876  27875  27875  27939
2:24   27939  27873  27872  27872
2:25   27871  27871  27872
2:26   27871  27871  27873  27873  27873

Please note above that the last tick at 2:23 was 279.39, 64/100 away from the previous tick...the first tick at 2:24 repeated this aberration, and then prices returned to "normal". The 279.39 ticks were almost certainly an error.
NYFE NYSE Index Spot
Time and Sales-10/31/88
Time   Ticks (in sequence)
2:21   15682  15683  15683  15682
2:22   15682  15682  15682  15682
2:23   15681  15681  15681
2:24   15679  15679
2:25   15679  15679  15679  15679
2:26   15679  15679  15679  15679

This is the time and sales data for the NYSE Composite, the best surrogate for the S&P 500. This index did not reflect a sudden rise and fall at 2:23. This is conclusive proof that the data for the S&P was faulty.

Figure 2 is a "time and sales" listing for the S&P 500 Index and the NYSEC for five minutes on 10/31/88.
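
The kind of check described in the Figure 2 notes can be sketched in code. The example below uses the Figure 2 prints, read as one group of ticks per minute, in hundredths of an index point. It flags any tick that jumps too far from the last accepted tick, then looks at how much the NYSE Composite moved over the same window. The function names and the 30-hundredths threshold are assumptions chosen for illustration, not part of Ermanometry's method, and a real filter would need a threshold suited to the market and time frame.

```python
# Prices in hundredths of an index point, as printed in Figure 2.
SP500_TICKS = {
    "2:21": [27879, 27888, 27888, 27878],
    "2:22": [27878, 27878, 27877],
    "2:23": [27877, 27876, 27875, 27875, 27939],
    "2:24": [27939, 27873, 27872, 27872],
    "2:25": [27871, 27871, 27872],
    "2:26": [27871, 27871, 27873, 27873, 27873],
}
NYSE_TICKS = {
    "2:21": [15682, 15683, 15683, 15682],
    "2:22": [15682, 15682, 15682, 15682],
    "2:23": [15681, 15681, 15681],
    "2:24": [15679, 15679],
    "2:25": [15679, 15679, 15679, 15679],
    "2:26": [15679, 15679, 15679, 15679],
}

MAX_JUMP = 30  # assumed threshold, in hundredths of an index point


def flag_spikes(ticks_by_minute, max_jump=MAX_JUMP):
    """Return (minute, price) for ticks that jump more than max_jump
    away from the last accepted (unflagged) tick."""
    suspects = []
    last_good = None
    for minute, prices in ticks_by_minute.items():
        for price in prices:
            if last_good is not None and abs(price - last_good) > max_jump:
                suspects.append((minute, price))
            else:
                last_good = price
    return suspects


def price_range(ticks_by_minute):
    """High minus low over every tick in the window."""
    prices = [p for group in ticks_by_minute.values() for p in group]
    return max(prices) - min(prices)


suspects = flag_spikes(SP500_TICKS)
surrogate_range = price_range(NYSE_TICKS)
print(suspects)         # [('2:23', 27939), ('2:24', 27939)]
print(surrogate_range)  # 4
```

Comparing against the last accepted tick, not simply the previous tick, keeps the second 279.39 print from being accepted just because it matches the first. The surrogate check matters because a jump filter alone cannot tell a bad print from a genuine sudden move: here the NYSE Composite moved only 0.04 over the whole window.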

Figure 3
S&P 500 Index, Monday October 31, 1988
Total call volume 4,942; Total call open interest 224,261
Total put volume 4,581; Total put open interest 229,841
The Index: High 279.39; Low 277.14; Close 278.97, +0.44

Figure 3 shows the statistics printed in all of the financial papers on the next day. Please note that the erroneous tick is shown as the high for the day. This error is "forever" embedded in every historical data bank that Ermanometry has investigated. Vendors of historical data are in a difficult position. They may know of errors but if their data conflicts with the "official" data, the client will most likely assume that the official records are correct and the vendor's data is wrong. Therefore, the vendors will usually retain the faulty data rather than conflict with the official records.

A bad tick in an index is usually caused by a bad tick in one of the individual stocks in the index. The NYSE will normally correct the error in the individual stock data. However, the indices are calculated by outside vendors. Therefore, unless the outside vendor picks up the correction message sent by the NYSE for the individual stock and then recalculates the index and sends out a correction message to be inserted at the proper time, the index will remain uncorrected. It is a mistake to assume that these corrections will be made.

There is one recent development that may alleviate the problem of bad ticks in the DJIA. Dow Jones & Company, Inc. has recently canceled old licenses which allowed a multitude of outside data vendors to compute and distribute the various Dow Jones & Company, Inc. averages. Dow Jones & Company, Inc. will compute the averages and the Chicago Board of Trade will be the exclusive gateway for redistribution of the calculations to other vendors. At this time we do not know if the averages will be recomputed when bad ticks in individual stocks are corrected and the corrections in the averages then distributed. Even if these corrections in the averages are made, there is still the problem of inserting the corrections into individual data bases.

The type of error represented in Figure 1 is particularly insidious because an analyst who corrected the spike on an intraday chart may have assumed that the bad tick had been permanently eliminated. Unfortunately, since the bad tick represented the high for the day, those data feeds that recap the daily high/low, often received from other vendors, would still show the bad tick as the high. Thus the error would show on the daily chart even though the analyst had eliminated it on the intraday charts.
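
One way to catch this mismatch is to compare each recapped daily high with the high of one's own cleaned intraday data. The sketch below is illustrative only, and the function name is hypothetical. It uses the Figure 2 excerpt with the two 279.39 prints removed, and the high printed in Figure 3. Because the excerpt covers only five minutes, the size of the gap it reports is not the size of the true daily error. A real check would use the full session's ticks.

```python
def daily_high_discrepancy(clean_intraday_ticks, recap_high):
    """Amount by which a recapped daily high exceeds the high of the
    analyst's own cleaned intraday ticks (0 if it does not)."""
    own_high = max(clean_intraday_ticks)
    return max(0, recap_high - own_high)


# Figure 2 excerpt (hundredths of an index point), two 27939 prints removed.
excerpt_clean = [
    27879, 27888, 27888, 27878,          # 2:21
    27878, 27878, 27877,                 # 2:22
    27877, 27876, 27875, 27875,          # 2:23
    27873, 27872, 27872,                 # 2:24
    27871, 27871, 27872,                 # 2:25
    27871, 27871, 27873, 27873, 27873,   # 2:26
]
recap_high = 27939  # daily high as printed in Figure 3

gap = daily_high_discrepancy(excerpt_clean, recap_high)
print(gap)  # 51
```

Any nonzero result means the daily record contains a price that the cleaned intraday record does not, which is the signal that a correction never reached the daily recap.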

Some errors are innocuous, but Murphy's law appears to have undue influence upon when most errors occur. A disproportionate share of errors occurs at the end of explosive or panicky moves. These are the most chaotic moments and the environment in which errors thrive. The "end" of such moves often contains the extremes for the period and price action analyzed. Therefore, the analyst must consider all extremes suspect until verified. Remember, errors at extremes affect not only timing, but trendlines, oscillators, and almost every tool in the analyst's arsenal.

It is impossible to truly appreciate the large number of price corrections, insertions, deletions, etc. without having had the experience of watching the data stream printed out on the yellow paper tape from an old Western Union type ticker. Corrections will appear almost every few inches. Sometimes the entries are as simple as changing a bid or ask quote to an actual trade, or vice versa, and other times entire strings of trades are deleted. Very often these deleted trades actually took place, but they are "busted" (deleted) because the trades shouldn't have been executed.

Busted trades are most frequent in the futures pits. When trading is frenzied it is possible that a pit broker might not hear or see every bid/ask in the pit and the market will trade "through" a price that a broker is legitimately, diligently bidding or offering.

Assume that the market is trending down from 105 in very active trading. Conditions may or may not warrant a "fast market" designation which would invoke a different set of parameters governing pit rules. Fast market conditions will not be covered in this article because it would be an unnecessary complication:
- Broker A is diligently bidding for 10 contracts at 101.
- Across the pit, Broker B bids 100 for two contracts.
- Broker C, standing next to Broker B, receives an order to sell four contracts "at the market."
- Broker C sells two contracts to Broker B at 100. Broker B then drops his bid to 99, and Broker C sells him two more at 99.
- News hits the market, there are no more offers, and bids rise to 103. The next trade is 104.
- Broker A never got a chance to buy any contracts at 101.
- Conditions were such that Broker C neither saw nor heard Broker A's bid at 101. It was an honest error.
- Broker A's client, seeing the prints at 100 and 99, rightfully assumed that his order to buy at 101 had been filled.
- The pit committee would most likely bust the trades at 100 and 99, and a deletion message would be sent.
- Broker C's sales at 100 and 99 would be given to Broker A, who had been bidding 101. Thus the selling client would get a better price, and Broker A's buyer would have been filled on four of the 10 contracts he wanted to buy.
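
The scenario above can be traced in a short sketch. It is a simplified model, not a statement of any exchange's actual rules. The `Sale` class and `bust_and_reassign` function are hypothetical names, and the code assumes that every sale printed below the resting bid is busted and reassigned to the resting bidder at his bid price, up to the quantity he was bidding for.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sale:
    seller: str
    buyer: str
    price: int
    quantity: int


def bust_and_reassign(bid_broker: str, bid_price: int, bid_quantity: int,
                      sales: List[Sale]) -> Tuple[List[Sale], List[Sale]]:
    """Return (busted, reassigned) for sales that traded through a resting bid.

    Assumption: a sale is busted if it printed below the resting bid and
    went to a different buyer; it is then given to the resting bidder at
    the bid price until his quantity is exhausted.
    """
    busted = [s for s in sales if s.price < bid_price and s.buyer != bid_broker]
    reassigned = []
    remaining = bid_quantity
    for sale in busted:
        qty = min(sale.quantity, remaining)
        if qty == 0:
            break
        reassigned.append(Sale(sale.seller, bid_broker, bid_price, qty))
        remaining -= qty
    return busted, reassigned


# Broker C's two sales to Broker B while Broker A was bidding 101 for 10.
prints = [Sale("C", "B", 100, 2), Sale("C", "B", 99, 2)]
busted, reassigned = bust_and_reassign("A", 101, 10, prints)
filled = sum(s.quantity for s in reassigned)
print(filled)  # 4
```

For the data user, the important point is that the prints at 100 and 99 were real at the time they appeared. Only a feed that carries the later deletion message will remove them from the record.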

Murphy's law not only makes sure that most errors occur at the end of runs, but also that deletions or insertions will hover around the "even" prices, such as 100 or 150. This means that the point and figure chartist may have filled a box that should not have been filled, or not filled one that should have been. Erroneous data can cause charts to read like comic strips, and cause oscillators and moving averages to generate false buy or sell signals, particularly in short-term trading.

The analyst can take measures to protect the integrity of his data. The obvious answer is to be constantly vigilant. The best measures are to understand the differences between various data feeds and charting software. If short-term in and out trading is done, it is helpful for the analyst/trader to have a data source that automatically transmits all error messages and deletes, inserts, and corrects the data used.

There are many real-time data feeds available. Unfortunately, they are not all equal in their performance concerning corrections and speed. Another important factor is whether the analyst/trader can access the vendor's data base with his own computer, i.e., two-way communication, or is merely a passive recipient of the data. A passive recipient must manually make any corrections that he may find. The kind that he would know of would probably be limited to daily high/low prices or obvious spikes. Manual corrections are time consuming and aggravating, and the trader/analyst would still be unaware of the vast majority of corrections.

There are some data vendors that make corrections in the client's data even though the data is stored on the client's computer. However, the type of corrections made are usually not as comprehensive as those available to the client who has constant direct communication with the data base of a vendor that corrects every erroneous print.

The intent of this article is to make you, the reader, aware of data aberrations but not to make you paranoid about them. Bad ticks are not going to "make or break" your trading.

Trading discipline and money management are more important, and they should not be neglected while you get distracted or "caught in your own underwear" in the attempt to clean up every single tick.