The Waving Flag: ADLG Stats Are Broken (For Now)

Thursday 18 April 2024


I have a penchant for data analysis.  I blame my scientific background, but as a wargamer it comes in handy sometimes.

One such occasion was when I investigated the myth of "Super Armies" in Art de la Guerre (ADLG) in 2018.  I've continued my interest in this area, but I recently uncovered a significant problem with the data I've been using.

Background
In 2018 and 2020 I used the online ADLG results database to probe the perennial question of super armies.  In early 2021 I used the same data to look at the performance of the Yuan Chinese following their success at the World Championships from 2016 to 2019.

I found that the more an army is used, the more its performance (efficacy in ADLG-speak) tends towards 50%. So in short, the answer is no: there aren't any super armies in ADLG. Remember, by super armies I mean armies that will have a high degree of success regardless of who is using them.

What about V4?
Obviously, the April 2021 launch of V4 had the potential to disrupt this pattern and in 2022 I wrote a new spreadsheet to analyse just the V4 results and to look for new trends.

I did this work well before the database was big enough to reliably identify trends. I knew I'd have to wait a few years, but I wanted to monitor the growth of the database in the meantime.

Consistency?
In the 20 months since I first analysed the V4 data, the data set has grown considerably and it now contains some nineteen thousand records, but in the last nine months a problem has emerged with the data.

The records are presented as a table listing the number of victories, draws and defeats for each of the 300 army lists in V4. I expected the average efficacy across all the armies to be very close to 50% and relatively stable. This was the case until September last year, as this table shows:
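Why should the all-army average sit at 50%? The post doesn't give the exact efficacy formula, so the minimal sketch below assumes a victory scores 1 and a draw scores ½. The class and function names and the toy numbers are invented for illustration. When every result is pooled, victories and defeats cancel exactly and the answer is 50%. A simple mean of per-army efficacies need not land exactly on 50%, because an army with a handful of games counts as much as a popular one.

```python
from dataclasses import dataclass


@dataclass
class ArmyResults:
    name: str
    victories: int
    draws: int
    defeats: int

    @property
    def games(self) -> int:
        return self.victories + self.draws + self.defeats


def efficacy(victories: int, draws: int, defeats: int) -> float:
    """Share of result points won, ASSUMING a draw is worth half a victory."""
    games = victories + draws + defeats
    return (victories + 0.5 * draws) / games if games else 0.0


def pooled_efficacy(table: list[ArmyResults]) -> float:
    """Efficacy of every result in the table treated as a single pool."""
    return efficacy(
        sum(a.victories for a in table),
        sum(a.draws for a in table),
        sum(a.defeats for a in table),
    )


def mean_efficacy(table: list[ArmyResults]) -> float:
    """Unweighted mean of per-army efficacies (armies with no games are skipped)."""
    values = [efficacy(a.victories, a.draws, a.defeats) for a in table if a.games]
    return sum(values) / len(values)


# Invented toy table: A beat B twice and B beat C once (three games, six records).
toy = [
    ArmyResults("Army A", 2, 0, 0),
    ArmyResults("Army B", 1, 0, 2),
    ArmyResults("Army C", 0, 0, 1),
]
print(pooled_efficacy(toy))          # 0.5
print(round(mean_efficacy(toy), 3))  # 0.444
```

Pooling is the check that matters for spotting missing data. It can only move away from 50% if the victory and defeat totals disagree.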

The numbers in red show a trend of armies becoming better as more results are added to the database. This was unexpected and very strange. If correct, this would be a major change, so I began digging into the data.

What's going on?
The data set is not stable and has a problem.  The trend of increasing average efficacy suggests there were too many victories and not enough defeats!  This is very odd because every victory should have a corresponding defeat somewhere in the database.

This pattern isn't obvious when you look at individual armies, because the defeat matching each of an army's victories is recorded against a different army.

However, when all the results are pooled, the total number of victories must equal the total number of defeats. If, for some reason, this wasn't the case, then the average efficacy would either increase (too few defeats) or decrease (too few victories).
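The same argument can be written out as a short derivation. This again assumes a draw counts as half a victory, because the post doesn't state the exact formula. Let V, D and L be the total victories, draws and defeats across the whole database, and let m be the number of defeats that have gone missing:

```latex
\begin{aligned}
E  &= \frac{V + \tfrac{1}{2}D}{V + D + L}
   && \text{pooled efficacy}\\
E  &= \frac{V + \tfrac{1}{2}D}{2V + D} = \frac{1}{2}
   && \text{complete data, } L = V\\
E' &= \frac{V + \tfrac{1}{2}D}{V + D + (L - m)}
    = \frac{2V + D}{2\,(2V + D - m)} > \frac{1}{2}
   && \text{with } L = V \text{ and } m > 0 \text{ defeats missing}
\end{aligned}
```

By the same reasoning, missing victories would push E below ½.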

I quickly confirmed that the former was the case by analysing two data sets.  The first is for V3 (09 Jul 2020) alone and the second for V3 and V4 (24 Mar 2024).

As of 24 March 2024 there were 1,534 victories without a corresponding defeat.
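Here is a minimal sketch of that check. It assumes the per-army table has been exported as (army, victories, draws, defeats) rows, and the rows below are invented. In a complete database the surplus of victories over defeats should be zero. Each game also produces one record per player, so a drawn game appears twice, and the draw total should therefore be even.

```python
# Each row: (army, victories, draws, defeats), as exported from the per-army table.
Row = tuple[str, int, int, int]


def victory_surplus(rows: list[Row]) -> int:
    """Total victories minus total defeats; zero when every win has its matching loss."""
    victories = sum(row[1] for row in rows)
    defeats = sum(row[3] for row in rows)
    return victories - defeats


def draws_are_paired(rows: list[Row]) -> bool:
    """A drawn game is recorded once per player, so the draw total should be even."""
    return sum(row[2] for row in rows) % 2 == 0


# Invented rows: A beat B twice and they drew once, but one of B's defeats is missing.
rows = [("Army A", 2, 1, 0), ("Army B", 0, 1, 1)]
print(victory_surplus(rows))   # 1
print(draws_are_paired(rows))  # True
```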

Putting things right
At the end of March I got in touch with Hervé Caille (the author of ADLG).  He confirmed my findings and by last week had identified two underlying problems:

  • From the start (V3 as well as V4) there has been a problem with "phantom" games in the database, created when a player does not play in all the games in an event (one day of a two-day event, for example).
  • There is a problem handling French accented words, such that games labelled "Défaite" are not counted. Until September 2023 this was not a problem, but it is now. Hervé believes it's something to do with charsets.
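The post doesn't say exactly where the charset handling goes wrong. What follows only illustrates two common ways an accented label such as "Défaite" can stop matching a plain string comparison. In the first, UTF-8 bytes are read back as Latin-1 ("mojibake"). In the second, the word is stored in decomposed Unicode form. The helper `normalise_label` is hypothetical and is not part of the ADLG database.

```python
import unicodedata

DEFAITE = "D\u00e9faite"  # "Défaite", escaped so the code point is unambiguous


def normalise_label(raw: str) -> str:
    """Best-effort repair of a result label before comparing it (hypothetical helper)."""
    try:
        # Undo UTF-8 text that was decoded as Latin-1, e.g. "DÃ©faite" -> "Défaite".
        raw = raw.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        pass  # not that kind of mojibake; keep the text as it was
    # Compose accents so "e" + combining acute and a single "é" compare equal.
    return unicodedata.normalize("NFC", raw)


mojibake = DEFAITE.encode("utf-8").decode("latin-1")  # 'DÃ©faite'
decomposed = unicodedata.normalize("NFD", DEFAITE)    # looks identical, different code points
# "Victoire" is an ASSUMED label for a win, used only to pad the example.
labels = [DEFAITE, mojibake, decomposed, "Victoire"]

print(labels.count(DEFAITE))                                # 1: naive match misses two defeats
print(sum(normalise_label(s) == DEFAITE for s in labels))   # 3
```

The repair step is a heuristic. A genuine Latin-1 string that happens to be valid UTF-8 once re-encoded would be "repaired" wrongly, although that is rare for ordinary words.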

The first problem simply requires the deletion of the "phantom" games. The second is more complex, as Hervé would really like to use the original data containing French accents, but so far he hasn't found a way to do this.
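The post doesn't describe how the phantom games were actually identified. Every real game produces a record for each player, so one plausible approach is to keep only those records whose mirror image also exists: the same event and round, with player and opponent swapped. The record layout, field names and data below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GameRecord:
    event: str
    round_no: int
    player: str
    opponent: Optional[str]  # None if no opponent was recorded
    result: str


def split_phantoms(records: list[GameRecord]) -> tuple[list[GameRecord], list[GameRecord]]:
    """Separate records that have a mirror record from those that do not (suspected phantoms)."""
    seen = {(r.event, r.round_no, r.player, r.opponent) for r in records}
    real, phantom = [], []
    for r in records:
        mirror = (r.event, r.round_no, r.opponent, r.player)
        (real if mirror in seen else phantom).append(r)
    return real, phantom


# Invented data: Bob only played on day one, yet a round-3 result was recorded for Ann.
records = [
    GameRecord("Example Open", 1, "Ann", "Bob", "Victory"),
    GameRecord("Example Open", 1, "Bob", "Ann", "Defeat"),
    GameRecord("Example Open", 3, "Ann", "Bob", "Victory"),
]
real, phantom = split_phantoms(records)
print(len(real), len(phantom))  # 2 1
```

The sketch assumes two players meet at most once per round. Keying on player names rather than armies means that two players using the same army list are still paired correctly.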

If he can't solve the charset problem, he has the option of editing the database to replace "Défaite" with "Defeat". This is easy enough to do.

However, I think this raises an ongoing issue when importing new data: he would have to check that the new event data doesn't contain any accented words before it's imported.  Simple enough but a nuisance.
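Here is a rough sketch of that pre-import check, combined with the "Défaite" to "Defeat" substitution described above. The row format (one dict per game with a `result` column) and the helper names are hypothetical. The check looks only at the result column, because player names can legitimately carry accents and should not be rewritten.

```python
RESULT_FIXES = {"D\u00e9faite": "Defeat"}  # "Défaite" -> "Defeat", as suggested in the post


def accented_results(rows: list[dict[str, str]]) -> list[tuple[int, str]]:
    """Return (row index, result) for every result label containing non-ASCII characters."""
    return [(i, row["result"]) for i, row in enumerate(rows) if not row["result"].isascii()]


def fix_results(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return a copy of the rows with known accented result labels replaced."""
    return [{**row, "result": RESULT_FIXES.get(row["result"], row["result"])} for row in rows]


# Invented event data; "Victory" is an assumed label for a win.
new_event = [
    {"player": "Ren\u00e9e", "result": "D\u00e9faite"},
    {"player": "Ann", "result": "Victory"},
]
print(accented_results(new_event))  # [(0, 'Défaite')]
fixed = fix_results(new_event)
print(accented_results(fixed))      # []
print(fixed[0])                     # {'player': 'Renée', 'result': 'Defeat'}
```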

What's next?
I expect Hervé to repair the database fairly quickly.  When repaired the "missing" data will "reappear" and the efficacy data will be more reliable and truly representative of an army's performance.

When all is well I'll post an update in the comments.

1 comment :

Vexillia said...

One day after publishing the post, I've been told that the database has been fixed and the online data has been updated. I said it wouldn't take long.

For those interested, Hervé has resolved both issues: the main one relating to accented characters and the smaller one where players didn't play all games in an event.

Now the total number of victories matches the total number of defeats. A few absolutely tiny errors remain but they are nothing to worry about.

The number of V4 records has jumped to 21,309 now that all results are counted. This is a touch over 52% of the figure for V3. Not bad in such a short time.

Please note: because of the way the army statistics work, each game generates two records, one for each player.
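To illustrate the point, the sketch below turns a single game into two records whose results mirror each other. This is also why total victories and total defeats must balance once every record is counted. The data model, result labels and army names are assumptions.

```python
from collections import Counter

# Each result as seen from the other player's side (labels are illustrative).
MIRROR = {"Victory": "Defeat", "Defeat": "Victory", "Draw": "Draw"}


def game_to_records(army_a: str, army_b: str, result_a: str) -> list[tuple[str, str]]:
    """One game -> one (army, result) record per player."""
    return [(army_a, result_a), (army_b, MIRROR[result_a])]


# Invented games, each given as (army A, army B, result for A).
games = [
    ("Army A", "Army B", "Victory"),
    ("Army B", "Army C", "Draw"),
    ("Army C", "Army A", "Defeat"),
]
records = [rec for game in games for rec in game_to_records(*game)]
tally = Counter(result for _, result in records)

print(len(records))                         # 6: two records per game
print(tally["Victory"] == tally["Defeat"])  # True
```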

Salute The Flag
