Profiling Marvel and DC Characters

Ordinary Twilight
Published in Analytics Vidhya
10 min read · Jul 9, 2021

My attempt to describe the average comic book character.

My results at a glance.

Let’s set the scene: It’s after midnight. I just opened a dataset I downloaded online with 16,000 points. Tableau has never lagged this badly before. Yet I keep pushing on, because I am determined to answer this question before I finally go to sleep: If you were to run into a random comic book character, what would they look like?

From the raw data, my money was on this: a male character with a secret identity who is considered a good guy, is alive (not resurrected/ghost/zombie) and has black hair with blue eyes. It’s pretty likely that he’s from Marvel comics, since Marvel has more characters than DC according to the dataset I used, which pulled characters from the publishers’ Wikia pages. Admittedly, the chances of him being Spider-Man are also very high, because the webslinger has had 4043 appearances in comic books so far. So… which guess is right? Before you continue, it might be helpful to have the Tableau workbook open so that you can see the full charts!

Making sense of the data

  1. Groups: Eye and hair colours had a lot of categories, such as “multiple eyes”, “photochromatic eyes”, “dyed hair”, “no hair” etc. I grouped everything into an “Others” category since those individual categories were really small and only added chaos to the chart colours.
  2. Chart colours: Initially I just made the category colours accurate to the descriptors, but I realised that a tonal palette might be less jarring… and yet I went back to my original plan in the end. Oh well.
  3. Alive/Identity status: I’m going to assume that these reflect the time the dataset was made, because comic books rewrite history a lot… for example, Spider-Man once revealed his identity to the whole world, only to regret it so much that he made a deal with a literal devil to make his identity secret again. As for deaths, there used to be a saying that the only people in comic books who stayed dead were Bucky, Jason Todd and Uncle Ben. As of today, Bucky is the Winter Soldier, Jason Todd is Red Hood, and Uncle Ben is really the only one who’s stayed consistently dead all this time…
  4. Filtering: I ended up limiting the results to the top 10,000 characters by number of appearances, because the bottom few thousand were once-off characters who weren’t too significant outside of their ability to lag my computer.
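The grouping and filtering steps were done in Tableau, but the same idea can be sketched in plain Python. This is only an illustration under assumptions: the records, the field names (`name`, `hair`, `appearances`) and the helper functions are hypothetical, and the cut-offs are tiny so the toy data stays readable.

```python
from collections import Counter

# Made-up records for illustration; the field names are placeholders,
# not the dataset's actual column names.
characters = [
    {"name": "A", "hair": "Black hair", "appearances": 900},
    {"name": "B", "hair": "Black hair", "appearances": 700},
    {"name": "C", "hair": "Blond hair", "appearances": 500},
    {"name": "D", "hair": "Blond hair", "appearances": 300},
    {"name": "E", "hair": "Dyed hair", "appearances": 12},
    {"name": "F", "hair": "No hair", "appearances": 1},
]


def lump_rare(records, field, min_count):
    """Return copies of the records with rare values of `field` set to "Others"."""
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= min_count else "Others"}
        for r in records
    ]


def top_by_appearances(records, n):
    """Keep the n records with the most appearances."""
    return sorted(records, key=lambda r: r["appearances"], reverse=True)[:n]


grouped = lump_rare(characters, "hair", min_count=2)
top = top_by_appearances(grouped, n=5)
hair_counts = Counter(r["hair"] for r in top)
print(hair_counts)
```

Here "Dyed hair" and "No hair" each occur once, so both fall into "Others", and the character with the fewest appearances is dropped by the top-5 cut.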

I showed the initial draft of the dashboard to friends for their feedback, and this is what they asked:

Is it some kind of ecological fallacy to aggregate black hair and blue eyes? In isolation those two qualities are most prominent but together is this also true? I always associated blue eyes with blonde hair.

Off the top of my head Superman, Batman (occasionally), Wonder Woman and at least 3 different Robins have this particular hair/eye colour combination. For the older heroes, I could argue that printing limitations restricted the number of colours available. I’ve seen this affecting comic books before but I’m not sure if it applied here, and I’ve also heard fans speculate that the artists project themselves onto the characters, so if they’re mostly Caucasian it might make sense that they’ll stick to familiar combinations. Of course, blond hair/blue eyes is really common too. Just look at Thor, Captain America, Captain Marvel, Emma Frost etc. Could I really trust my assumptions? Of course not… cue a bit of extra digging using dummy variables which just mashed the hair and eye colours together (e.g. Blond hairBlue eyes and Black hairBlue eyes).
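To make my friends' worry concrete, here is a small Python sketch with made-up (hair, eye) pairs, not values from the dataset. It shows how the most common hair colour and the most common eye colour, taken separately, can point to a combination that is actually rare, which is exactly why the mashed-together label is worth checking.

```python
from collections import Counter

# Made-up (hair, eye) pairs for illustration only.
pairs = [
    ("Black", "Brown"), ("Black", "Brown"), ("Black", "Brown"),
    ("Black", "Blue"),
    ("Blond", "Blue"), ("Blond", "Blue"),
    ("Red", "Blue"), ("Red", "Blue"),
    ("Brown", "Blue"),
]

# Most common value of each attribute on its own.
hair_mode = Counter(h for h, _ in pairs).most_common(1)[0][0]
eye_mode = Counter(e for _, e in pairs).most_common(1)[0][0]

# Mash the two fields into one label, in the style of "Black hairBlue eyes".
combo_counts = Counter(f"{h} hair{e} eyes" for h, e in pairs)
combo_mode = combo_counts.most_common(1)[0][0]

print(hair_mode, eye_mode)   # the separate winners
print(combo_mode)            # the winning combination
```

In this toy data black hair and blue eyes each win on their own, yet only one character has both, while black hair with brown eyes is the most common pairing.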

Eventually, I found out that blond-haired, blue-eyed characters appear the most frequently, but the number of characters classified as such is surprisingly low. In fact, the category with the most characters (not appearances) is black hair with an unknown eye colour. My guess is that the vast majority of such characters don’t have a lot of appearances, hence the disparity between appearance numbers and character count. Another thing to note is that I might have been right in thinking that a lot of popular DC characters have black hair and blue eyes, as seen from the difference in the number of appearances between DC and Marvel for that category.

A high number of appearances doesn’t necessarily mean a high number of characters!

The next picture is the Kirby-esque bubble chart equivalent to the bar chart, which shows the large number of appearances due to black-haired, blue-eyed characters. There aren’t as many of them compared to blond hair and blue eyes, but their average number of appearances is higher. Perhaps these hair and eye combinations are reserved for main characters?

If you see a lot of people who look like Superman or Captain America… You aren’t wrong, they appear a lot.
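The difference between "most characters", "most appearances" and "highest average" is easy to mix up, so here is a minimal Python sketch of the three measures. The rows are invented to mimic the pattern described above and are not the real figures from the dataset.

```python
from collections import defaultdict

# Made-up rows: (hair/eye combination, appearances of one character).
rows = [
    ("Blond hairBlue eyes", 100),
    ("Blond hairBlue eyes", 80),
    ("Blond hairBlue eyes", 60),
    ("Black hairBlue eyes", 150),
    ("Black hairBlue eyes", 50),
    ("Black hairUnknown eyes", 2),
    ("Black hairUnknown eyes", 1),
    ("Black hairUnknown eyes", 1),
    ("Black hairUnknown eyes", 1),
]

# Tally the character count and total appearances per combination.
stats = defaultdict(lambda: {"characters": 0, "appearances": 0})
for combo, appearances in rows:
    stats[combo]["characters"] += 1
    stats[combo]["appearances"] += appearances

# Average appearances per character in each combination.
for s in stats.values():
    s["average"] = s["appearances"] / s["characters"]

most_characters = max(stats, key=lambda c: stats[c]["characters"])
most_appearances = max(stats, key=lambda c: stats[c]["appearances"])
highest_average = max(stats, key=lambda c: stats[c]["average"])

print(most_characters, most_appearances, highest_average, sep="\n")
```

Each measure picks a different winner in this toy data, which is the same kind of split the bar chart and bubble chart show.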

Before we move on to the next part, I’ll talk a bit about the remaining charts in the first dashboard.

Alignment: DC doesn’t seem to have a lot of neutral characters, and neither publisher has as many iconic villains as iconic heroes.

Identity: It seems like public identities make the bulk of the important characters instead of secret identities. This makes sense, seeing that the dataset is very much Marvel-biased in terms of the number of characters, and many of Marvel’s most significant characters don’t really have a secret identity… just compare the Justice League and the Avengers.

Totals: The chart colours reflect the decades in which the characters first showed up. Both publishers have very significant characters from the 1960s. Obviously, the longer a character has been around the more appearances they would have, but the 1960s for Marvel seems disproportionately large. Iconic characters from the Silver Age such as Spider-Man, the Avengers and the X-Men are probably to blame.

The Second Dashboard

Heroes vs Villains

It’s not exactly a direct revision of the first dashboard, but more of an alternative way of representing the data, with a focus on the classic contrasts between heroes and villains.

Your first thought while looking at the picture: “What on earth is this and why are there so many categories?” The characters here are categorised by the dummy variable I used before, which is called “Hair/eye colours”, a mashup of the values for hair and eye colours. Now your second question is probably: There is a filter right there on the dashboard, why don’t I use it to cut the tail…

  1. The first thing I wanted to note was the tail, especially for the villains chart. The tail of the villains chart for the number of appearances is much longer than that of the heroes chart. This indicates that we’re more likely to see weird hair/eye colour combinations (e.g. photocellular/unnaturally-coloured eyes) on villains than on heroes. This fits with comic book archetypes: a humanoid hero (usually a Caucasian-coded male) fighting a monstrous-looking villain.
  2. Next, it should be noted that blond-haired and blue-eyed heroes appear the most times, while brown-haired and brown/black-eyed villains appear the most frequently. This divide is probably deliberate, playing up to the whole “Light = Good and Dark = Bad” trope. I should note that quite a lot of heroes have dark hair and eyes while a lot of villains are blond-haired and blue-eyed, but the numbers are noticeably lower than their respective counterparts. A lot of villains fell in the black hair/unknown eye colour category, which sort of supports my “comic book artists think that black hair implies villainy” idea… and then I remembered that Tony Stark and Bruce Wayne have black hair. Oops.
  3. Neutral characters seem to appear the most frequently with black hair and blue eyes, and perhaps this colour contrast between “good” and “bad” is somewhat deliberate. Of course, this combination is very common among all the characters in this dataset, and since there aren’t a lot of neutral characters relative to heroes/villains, it’s a bit difficult to confirm the trend. To further complicate things, I noticed a few heroes included in the neutral category, probably because these heroes have had villainous stints before and are considered anti-heroes.
  4. I realised something interesting: Even though heroes appear far more frequently than villains, there are actually more villainous characters than heroic characters! One explanation I have for this phenomenon is that comic book titles tend to center themselves around a particular hero (e.g. Spider-Man) or a team of heroes (e.g. Justice League). Many villains end up being once-off characters (villain of the week) or supporting characters to the hero’s main story (e.g. the Rogues, who frequently battle the Flash). It is less common for a villain to become popular/memorable enough to return many times or get their own comic series (e.g. Loki, Joker etc.). In this dataset, Victor von Doom (aka Doctor Doom, who frequently fights the Fantastic Four) is the villain with the most appearances (721, peanuts compared to Spider-Man’s 4043 appearances).
  5. Another thing I got from the Alignment (By Gender) chart is how you’re more likely to find a villain in the “Not Applicable” gender category. After checking through the characters in this category, I concluded that it mostly has non-human characters such as robots and alien forces, and assigning a gender (such as agender/genderfluid/transgender, which were some of the other categories) to them wouldn’t really make sense. Again, this fits the “monstrous villain” trope, since most humanoid characters look human enough to be assigned a gender.

Next, I decided to compare the more significant characters, those with more than 300 appearances. I originally used 500 as the threshold but lowered it so I could see more villains.
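The effect of moving the threshold can be sketched in a few lines of Python. The records, the field names (`alignment`, `appearances`) and the `count_above` helper are all made up for illustration; in Tableau this was just a filter on the number of appearances.

```python
from collections import Counter

# Made-up records; "alignment" and "appearances" are placeholder field names.
characters = [
    {"name": "Hero 1", "alignment": "Good", "appearances": 900},
    {"name": "Hero 2", "alignment": "Good", "appearances": 550},
    {"name": "Hero 3", "alignment": "Good", "appearances": 320},
    {"name": "Villain 1", "alignment": "Bad", "appearances": 520},
    {"name": "Villain 2", "alignment": "Bad", "appearances": 370},
    {"name": "Villain 3", "alignment": "Bad", "appearances": 330},
    {"name": "Villain 4", "alignment": "Bad", "appearances": 40},
]


def count_above(records, threshold):
    """Count characters per alignment with strictly more than `threshold` appearances."""
    counts = Counter()
    for r in records:
        if r["appearances"] > threshold:
            counts[r["alignment"]] += 1
    return counts


at_500 = count_above(characters, 500)
at_300 = count_above(characters, 300)
print("more than 500:", dict(at_500))
print("more than 300:", dict(at_300))
```

In this toy data only one villain clears the 500 mark, while three clear 300, which is the kind of change that made the lower threshold more useful for comparing villains.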

Most persistent heroes
Most persistent villains
  1. First thing I noticed: Wow, there really aren’t a lot of prominent female villains… there are only two with more than 300 appearances: Felicia Hardy (aka Black Cat, 332 appearances) and Raven Darkholme (aka Mystique, 371 appearances). Both should be reasonably recognisable to Spider-Man and X-Men fans respectively. It is interesting to note that none of DC’s female villains survived the filter. In fact, only 14 villains managed to appear more than 300 times since the 1940s, and the top 8 spots are all male villains.
  2. I’m a little bit surprised by how low the number of significantly persistent villains is. However, after thinking about it, I guess this makes a bit of sense: A lot of the major villains for a particular story arc only appear for that story and get killed off when the heroes eventually win. Hence their appearances are limited in number even though their impact might be huge (e.g. that time the Anti-Monitor destroyed the multiverse during Crisis on Infinite Earths). Something to note is how Marvel seems to reuse their villains more than DC, at least among the top villains (table with their data in the fourth picture). A similar trend is seen among the heroes (third picture), but it should be noted that this dataset has a lot more Marvel characters than DC characters. Actually, if I think about it, Marvel has always had a much larger character roster than DC, if the thickness of my character encyclopedias counts as any indication. This is probably because Marvel went a lot crazier than DC when it came to the “Multiverse = infinite versions of everyone” idea. Of course, multiple multiverse resets over the years have probably changed the numbers again, and I wonder what a new dataset would look like.
  3. After taking a while to stare at the hero/villain tables, I realised why certain villains show up so frequently: Both publishers really like the idea of a rogues gallery. The top villains list consists of quite a few Spider-Man and Wolverine enemies, as well as a couple of Batman’s. It makes a lot of sense that Lex Luthor is so high up the list; after all, he’s Superman’s archenemy and all that.
Bar chart version of the first dashboard.

Lastly, here’s a bar chart version of the first dashboard! It shows the numbers more clearly, but we lose a lot of the eye/hair colour categories. I also tweaked the Totals chart to show how trigger-happy the publishers were. In general, they don’t kill off characters which have a lot of appearances. If they do kill them off, it’s rarely permanent. This trend is also there if I include all 16000+ characters… that nearly caused Tableau to crash but it was worth it…

Additionally, linking this project to the real world, we can see how comic books and perhaps other types of media show disproportionately more males than females… something to link to gender representation, perhaps… another thing is how popular blue eyes are in the comics, perhaps because lighter eyes can show more emotion than plain dark eyes… maybe that’s why people like to wear contact lenses… I jest. It’s possible that printing methods and cultural norms have influenced us too, but those are issues to explore with more datasets another day…

I’ll end this off with the tonally coloured version of the bubble chart, which looks a lot like a portal to another dimension. Thanks for reading, and I wonder where the portal would lead us…

Geeky note: The bubble charts remind me of the Kirby Krackle effect commonly found in older comic books…
