


Everything posted by RhialtoTheMarvellous

  1. What are people thinking of sending in? I'm thinking my Infinity Gauntlet #1, Crisis #1 at the very least.
  2. With the spine roll it's got, it's an 8.0. Fix that and it could be a 9.0, as it has a bunch of non-color-breaking stress lines and a few color-breaking stress lines on the spine. There is also a small indent on the front cover.
  3. 4.0, that corner isn't getting fixed by a press.
  4. Yes, I've seen that, it's a good overall tool and way to break down the grading system and your point is well taken. The classifications done on that page, if regarded as accurate, would be what we want to automate. Then the problem just becomes finding enough examples of each to train the machine on.
  5. I'm only now kind of getting a sense of the differences between machine learning and deep learning and understanding why deep learning requires a lot of data to become functional. I'm also realizing why you folks with more experience are suggesting training a dataset against much narrower criteria before going down this road. The reason I'm realizing all of this is that I actually went in and made a program (rather than the model builder tool I was using prior) to utilize TensorFlow, and when I got into the nitty-gritty of it I saw that the basics of image recognition in the consumer area are all oriented around transfer learning. The existing deep learning model you choose has already been pre-trained against millions of images that it can classify into thousands of different categories, and you build out a model that uses a subset of that data. This obviously isn't going to work to score a book under the standard comic rating system when the criteria for classification under the various scoring levels are not even known to the existing model. Hmmm, interesting. This does give me some impetus to start breaking things down into those smaller classes.
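The transfer-learning idea above can be sketched in miniature: a frozen "backbone" produces features, and only a small new head is trained on your own labels. Everything here (the random projection standing in for a pretrained network, the toy labels) is made up for illustration, not a real pretrained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained backbone": a fixed projection plus ReLU that stays
# frozen during training, the way a real pretrained network's layers do.
W_frozen = rng.normal(size=(4, 8))
def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy two-class data: the label is just the sign of the first input feature.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# Transfer-learning step: only the small new "head" is trained on our data.
F = backbone(X)                 # frozen features, computed once
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid head
    grad = p - y                             # gradient of log-loss w.r.t. logits
    w -= 0.1 * F.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = (((1.0 / (1.0 + np.exp(-(F @ w + b)))) > 0.5) == y).mean()
print(f"head accuracy on frozen features: {acc:.2f}")
```

The point of the sketch is only that the backbone's weights never change; all the learning happens in the tiny head, which is why transfer learning needs far less data than training from scratch.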
  6. I'm not sure why we are assuming that CGC takes detailed scans of books. Is there somewhere that they indicate this is part of their process? Regarding the interior defects, my assumption was the same as yours. These are outliers. Any cursory examination will show missing or damaged interior content of a book. If something in that regard is damaged, then that usually becomes the major factor in downgrading the book. These books are usually excluded from grading by default unless they are much older and rarer books, and those aren't really the ones that get graded in volume. The front and back cover details are what define the grade for comics being sent through in volume. Is that the case for a general learner? A spine wrap issue generally moves the book into a different scoring category, which you would want to account for in the model, but at the same time some books just come with the spine wrapped differently from different eras. Or are you referring to placement of the book in the image, i.e., a book that is placed incorrectly could be improperly classified?
  7. Not at all. I just wish I had something more interesting to share at this point than the number of scans that CGC has on their website.
  8. It's interesting that you say that. At one point I decided to test out my new scanner and took a 1200dpi scan of one of my books. The image ended up being huge of course. I opened it up and started zooming in on various areas of the book looking closely for defects. The image was so detailed that I found myself zooming in a lot and what I found is that zooming in on a 1200dpi image is like using a microscope on it. There are so many scratches and marks that are completely invisible to the naked eye that you can detect at that resolution. This is partly why I wanted to try just using 2D scans at first, because it occurred to me that even at a lower scanning resolution there are probably patterns a computer can detect that a human can't detect.
  9. 1. If there is one thing I've gotten out of this thread (and I've gotten more than that), it's the idea that I might run an experiment of this sort initially on binary characteristics, like spine ticks present or not, or creases in the cover or not, to see how well I could train a model in that regard. This would also be an easier experiment from a data collection perspective, as I'm sure everyone could come up with books both with and without spine ticks. 2. Well, that's part of the deep learning problem-solving aspect and one reason why machine learning can be so valuable: it can derive outcomes using combinations of factors that aren't always evident to humans. You give the machine a bunch of data on one end and a known set of results on the other, and then let it interpret the factors that differentiate the source from the results on its own. The big example of this is training a model to predict medical conditions. You give the machine publicly available health data of thousands of people and which ones get a certain condition and which ones don't, and then it can predict with a fair degree of accuracy whether an individual it is given is at risk for that condition. 4. There is definitely some standard necessary. I'm not yet sure what it is, but the CGC images are pretty weak in that regard if the CGC Registry is any indication. A lot of the things I pulled off there are not even scans. They are photos of a slabbed book with bad lighting or bad angles, or photos of just the score or the top of the holder.
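The folder-per-category layout a binary experiment like this would start from can be sketched as follows; the folder names (`spine_ticks` / `no_spine_ticks`) and file names are hypothetical, and the files here are empty placeholders rather than real scans.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one folder per category, a few "scans" in each.
root = Path(tempfile.mkdtemp())
for category in ("spine_ticks", "no_spine_ticks"):
    folder = root / category
    folder.mkdir()
    for i in range(3):
        (folder / f"book_{i:03}.jpg").touch()  # placeholder scan files

# Turn the folder layout into (image path, label) training pairs:
# the parent folder name is the label.
labels = {"no_spine_ticks": 0, "spine_ticks": 1}
dataset = sorted((p, labels[p.parent.name]) for p in root.rglob("*.jpg"))

print(len(dataset), "labelled examples")
```

Most image-classification toolkits (including the folder-based loaders in TensorFlow and ML.NET) expect essentially this structure, which is why collecting "with ticks" and "without ticks" examples is the whole data-gathering job.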
  10. My sample choice is obviously quite bad as well, but then again it would be difficult overall to find multiples of anything at different grades.
  11. I'm starting to realize that. It seems unlikely that without some coordination I could even get the data for this sort of sample on even one book. If I make the attributes more generalized so they can be applied to any book then that might work.
  12. I don't see a real downside to CGC sharing this sort of data overall. It's not like it cuts into their business having a machine that can grade comics. You're not so much paying for the number as you are for the certification aspect and encapsulation assuring that number is correct. If anything a machine evaluator would just be the equivalent of the "Hey can you spare a grade" forum, people will still have to examine the thing no matter what.
  13. Yeah, it's a deep learning algorithm. Right now I'm just fiddling with the model builder tool they overlay on top of the architecture. I need to get into the guts and make an actual program so I can tweak some of the options to see if I can get better results, but overall I think I probably need more data.
  14. I'm using ML.NET with TensorFlow as it's kind of the easy button to pull in and train a model, since I'm familiar with the .NET environment. I could go look up the algorithm it defaults to if you are curious. I pulled all of the images off of the CGC Registry to make an initial attempt at this because all of them have metadata showing the score. I made a scraper that went through each page and downloaded the front/back images. I only downloaded images where I found a front and a back, but that still ended up giving me a lot of random placeholders. I cleaned most of that out. One problem is that the distribution of data is highly skewed, as you can see from this screenshot. There are a ton of 9.8 items, about 5x as many as the next nearest category and roughly 7,000 times as many as the smallest category. The other problem is that the images overall are pretty low quality, don't really show defects, and are all in holders. This is interesting though, because it makes me think that there are a lot of folks throwing moderns at CGC to get a 9.8 rating on them to resell for more money.
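One common way to cope with skew like that 9.8 pile-up is to weight the training loss by inverse class frequency, so rare grades count for more and the model can't just answer "9.8" for everything. The counts below are invented to echo the skew described, not the real Registry numbers.

```python
# Made-up per-grade counts echoing the described skew: 9.8 dominates.
counts = {"9.8": 14000, "9.6": 2800, "9.4": 900, "9.0": 300, "8.0": 60, "2.0": 2}

total = sum(counts.values())
n_classes = len(counts)

# Standard inverse-frequency ("balanced") weighting:
#   weight = total / (n_classes * count)
# Rare grades get large weights, the dominant grade gets a small one.
weights = {grade: total / (n_classes * n) for grade, n in counts.items()}

for grade, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{grade}: weight {w:.2f}")
```

These per-class weights are exactly what APIs like Keras's `class_weight` argument expect; the alternative is resampling (down-sampling the 9.8s or duplicating the rare grades) before training.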
  15. Are you confused about this because there are multiple categories in the comic book example (10.0,9.9,9.8,9.6... etc) versus say the dog/cat example? That isn't really an issue. You can establish any number of different categories for the computer to evaluate as long as you have the sample data for those categories. You could have the computer evaluate dog/cat/fox/beaver if you wanted to. This is in fact what Google and other big data companies do for their image search.
  16. The difference between a dog and a cat to a computer is not black and white; it only seems that way from the human perspective. In reality, you're unconsciously examining a hundred different things about the image to contextualize it as a dog or a cat. If you give a computer a picture of a dog or a cat without any identifier, say image007.jpg, it has no idea what the heck it is at all. It can't classify what it shows in any meaningful way like a human can. It knows only things that it is programmed to understand by the operating system. It could tell you what the name of the file is, whether it's a PNG or JPEG, or how big it is in KB, but it won't be able to tell you that it's a picture of a dog or a cat. The idea behind machine learning is giving the machine the ability to do that, by training a model. You train a model using an existing dataset where a large set of data is broken into the categories you establish (dog or cat, or 9.8, 9.6, 9.4, etc.). For instance, you take 2000 images of dogs and put them all into a folder/category named dogs. Then you take 2000 images of cats and put them all into a folder/category named cats. You let the computer examine the image sets, and heuristic algorithms establish the attributes that the computer will use to identify the images inside of the model. This is something of a black box in that we don't necessarily know what the computer is building up in the model to discern the differences. The training algorithm will take a subset of the given data, say 3/4 of it, and build up the model, and then it will validate the model against the last 1/4 of the data (i.e., use it against the AI it has made without giving it any category information to see if it gets the answers right), giving you an idea of whether it works or not. It's an amazing thing. The only rub is obtaining the data. It's pretty easy to find pictures of dogs and cats, but not so much pics of comics with certain grades.
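The 3/4-train, 1/4-validate split described above can be sketched like this, with toy (file, label) pairs standing in for real scans; the file names are made up.

```python
import random

# Toy labelled dataset: (image_id, category) pairs standing in for real scans.
data = [(f"img_{i:04}.jpg", "dog" if i % 2 else "cat") for i in range(200)]

random.seed(42)      # fixed seed so the split is reproducible
random.shuffle(data)

# Hold out the last quarter for validation: the model never sees these labels
# during training, so accuracy on them estimates real-world performance.
cut = (3 * len(data)) // 4
train, validation = data[:cut], data[cut:]

print(len(train), "training examples,", len(validation), "held out")
```

The shuffle matters: without it, a split could put all of one category (or all of one grade) into the held-out quarter and make the validation score meaningless.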
  17. In what way do you find this comparison flawed? What attributes does the machine build up in the model from a few thousand images of dogs and cats which then allows it to take a completely new image and assign a value of dog or cat to it?
  18. As I noted in the OP, the interior stuff isn't going to be accounted for in this particular scenario, though in many cases staple rust is visible on cover areas. Given the parameters of this experiment you could fool the thing any number of ways even if the model is well trained. If you have a mint-looking set of covers and half the interior pages are cut out, then it won't likely be able to tell unless there is some definitive difference in the images given to it. I'm really just trying to keep it simple at first to see if I could compose some sort of model, but it was kind of a long shot asking for submissions on here anyway. I've already downloaded all of the images from the CGC Registry and built a model around that, but the images are a bit too spread out and varied in quality for it to get any sort of macro accuracy (ability to select across categories), and they are mostly all in holders, which limits the visibility. That said, it's much more likely that we could build a model around one of these attributes you've pointed out. For instance, page quality. Scan a sample interior page section from any book and have it discern the differences between white and off-white for page color. Or staple quality; I'm not sure if there is a metric around that or if they are just good or bad. If the metric was rusty or not rusty, I'm sure we could easily have it discern the difference, but at the same time that is probably an easy one for a human to eyeball.
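A page-quality discriminator along the lines suggested (white vs. off-white) could start as something as crude as a mean-brightness threshold on a grayscale scan, before reaching for a learned model at all. The synthetic "pages" and the 230 threshold below are assumptions for illustration; real scans would need the threshold calibrated against known examples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy grayscale "page scans" (0 = black, 255 = white). A real experiment
# would load actual interior-page scans instead.
white_page = rng.normal(245, 5, size=(64, 64)).clip(0, 255)
offwhite_page = rng.normal(215, 5, size=(64, 64)).clip(0, 255)

def page_quality(scan, threshold=230.0):
    """Classify a grayscale page patch by its mean brightness."""
    return "white" if scan.mean() > threshold else "off-white"

print(page_quality(white_page), page_quality(offwhite_page))
```

If a fixed threshold turned out to be too brittle (scanner differences, lighting), that is exactly where a small trained classifier on page patches would earn its keep.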
  19. I probably emphasized the visual too much in my post. The machine can see things in the image that aren't necessarily visible to the human eye, as it breaks down the image using algorithms that I can only pretend to understand. But I would go with: the more detailed the image, the better. Let's consider the problem. How does a machine evaluate whether a cat is present in a picture versus a dog? This is a solved problem, btw. You collect a lot of images of cats and put them into the cat category, then a lot of images of dogs and put them into the dog category, and let a convolutional neural network break down the images in each category for patterns. https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks The stage I'm at is gathering the dataset to see if I can feed the ML algorithm something it can use. For the gem mint problem, a machine evaluates the image based on the criteria it is given. The two criteria in this case are the score and the image. The image is evaluated by the machine at multiple different levels, meaning there are a bunch of criteria we don't necessarily understand that it is using to evaluate the image. This is the black box of the model. If you put enough data into the 10.0 category alongside the data from 9.9 and 9.8, then it will be able to discern the differences, because it breaks down the image along lines that aren't purely visual. That said, since there are few examples of 10.0 or 9.9, the limitation will be that you cannot rely on the machine to give you an accurate result in that regard, just as for any modern comic you could not rely on it to tell you something is a 2.0, since there aren't many examples of that. These are admitted limitations and something I already acknowledged in the original post. I still would like to try to gather some data in this regard, but you are making me think about this problem overall and what sort of other things we could do.
Rather than go more broad with the scoring criteria which confines us to a particular comic and the differences between that comic in each category we could try to identify various visual defects. For instance, spine ticks. Comics with and without visual spine ticks could be identified if we get enough examples.
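The convolutional layer behind that link boils down to sliding a small kernel over the image and recording how strongly each neighborhood matches it. A minimal sketch, with a toy 6x6 patch containing a dark vertical streak standing in for a spine tick, and a hand-written vertical-edge kernel of the kind a CNN's first layer often ends up learning:

```python
import numpy as np

# Toy 6x6 grayscale patch: mostly flat, with one dark vertical streak at
# column 2 standing in for a spine tick.
patch = np.full((6, 6), 200.0)
patch[:, 2] = 40.0

# A vertical-edge kernel: responds when brightness changes left-to-right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

# Slide the kernel over every 3x3 neighborhood ("valid" convolution)
# and record the response at each position: that grid is the feature map.
h, w = patch.shape[0] - 2, patch.shape[1] - 2
fmap = np.array([[np.sum(patch[i:i + 3, j:j + 3] * kernel) for j in range(w)]
                 for i in range(h)])

# Strong positive/negative responses flank the streak; flat areas give zero.
print(fmap)
```

A real CNN stacks many such kernels (learned, not hand-written) and feeds the feature maps into further layers, which is where the "black box" criteria come from.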
  20. It's not exactly that. I should have said that if we want it closer to the CGC scoring, then the scoring should come from CGC. The criteria for scoring are based on visual data. Comic defects are visual. If you aggregate enough information into the model in each classification, then the model will be able to discern the differences.
  21. Ideally it would be the raw scans of graded books, which is why it is a tall ask. Grading is somewhat subjective no matter how it is done, but overall, if we are looking to replicate the CGC grading process, then the actual grade should come from CGC. If that's mission impossible, then we could just take books scored by users and train the model using that, but obviously the results would be more reflective of a result you might get in "Hey buddy can you spare a grade." That said, even with user data this could still have value. The result you get from the model is not just one score. It evaluates the image and gives you multiple categories that it could fall into, with a percentage value for the certainty of each, the highest being the primary result. Ideally, if the model is working well, you get a certainty of 90+% on that primary result.
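The multi-category output with percentage certainties described here is typically a softmax over the model's raw scores. A sketch with made-up scores for one hypothetical book:

```python
import math

# Hypothetical raw scores ("logits") a model might emit for one scanned book.
logits = {"9.8": 4.1, "9.6": 1.3, "9.4": 0.2, "9.0": -1.5}

# Softmax turns raw scores into per-category certainties that sum to 100%.
# Subtracting the max first is the standard trick for numerical stability.
m = max(logits.values())
exps = {grade: math.exp(score - m) for grade, score in logits.items()}
z = sum(exps.values())
certainty = {grade: 100.0 * e / z for grade, e in exps.items()}

# The highest-certainty category is the primary result.
best = max(certainty, key=certainty.get)
print(best, f"{certainty[best]:.1f}%")
```

A well-trained model concentrating 90+% on one grade, as described, just means one logit stands well clear of the rest; a flat spread across neighboring grades is itself useful information that the book sits on a boundary.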