I have two separate folders of JPEG images (no labels yet), and I need to understand, quantitatively, how the content of one set stacks up against the other. Ultimately I want a full confusion matrix plus the four key scores: precision, recall, accuracy, and F1.

Because the images are unlabeled, the first task is to assign meaningful class labels that both datasets share. I’m flexible on how you achieve that: you might opt for manual annotation, leverage a pre-trained CNN, or use an active-learning loop that mixes both. Just let me know which route you prefer and why it will be reliable.

Once labeling is complete, please:

• Generate the confusion matrix that compares the two datasets class by class.
• Calculate precision, recall, accuracy, and F1 for each class and overall.
• Present the findings in a clear report (tables plus a brief interpretation) and include the scripts or notebooks you used.

Python with scikit-learn, PyTorch/TensorFlow, and pandas is perfect, but if another stack suits you better, say so. I’ll provide the image folders as soon as we start, and I’m happy to answer any clarifying questions about the project or domain. Looking forward to seeing how you tackle this.
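To make the deliverable concrete, here is a minimal sketch of the metrics step using scikit-learn. It assumes the labeling phase is already done and that one folder's labels are treated as the reference set and the other's as the predicted set (the `reference` and `predicted` lists below are toy placeholders, not real data).

```python
# Sketch only: assumes both image sets have already been assigned
# shared class labels. Toy label lists stand in for the real output
# of the labeling step.
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    f1_score,
    accuracy_score,
)

# Hypothetical labels: folder A as "reference", folder B as "predicted".
reference = ["cat", "dog", "dog", "bird", "cat", "bird"]
predicted = ["cat", "dog", "bird", "bird", "cat", "dog"]

# Fix the class order so rows/columns of the matrix are interpretable.
labels = sorted(set(reference) | set(predicted))

# Class-by-class confusion matrix (rows = reference, columns = predicted).
cm = confusion_matrix(reference, predicted, labels=labels)

# Per-class scores (average=None) and an overall accuracy.
precision = precision_score(reference, predicted, labels=labels, average=None)
recall = recall_score(reference, predicted, labels=labels, average=None)
f1 = f1_score(reference, predicted, labels=labels, average=None)
accuracy = accuracy_score(reference, predicted)

print(labels)
print(cm)
print(accuracy)
```

The per-class arrays plus the matrix feed directly into a pandas DataFrame for the report tables; overall macro/micro averages come from the same functions with `average="macro"` or `average="micro"`.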