Posts

Predicting Dropouts: How Regression Models Reveal Hidden Patterns in New York’s High Schools

A step-by-step data science journey from messy datasets to meaningful insights.

Quick Overview: What You’ll Learn

Before we dive in, here’s what this article covers:
Understanding the Problem: predicting high school dropout rates, why it matters, and how data can help.
From Raw to Ready: cleaning and preparing real-world education data for machine learning.
Exploring the Story in the Data (EDA): how visualization uncovers trends and data integrity issues.
Choosing the Right Model: why “one-size-fits-all” doesn’t work for predicting counts like dropouts.
Modeling in Action: comparing Linear Regression, Poisson, and Negative Binomial models.
Evaluating and Validating: how cross-validation checks that your model’s performance isn’t just luck.
Lessons and Takeaways: what this project teaches us about modeling, data storytelling, and real-world decision-making.

1. Understanding the Problem

Each year, educators and policymakers grapple with the same question: why...
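Why doesn’t one model fit count targets like dropouts? One common reason is overdispersion. A Poisson model assumes a count’s variance roughly equals its mean, and when the variance is much larger, a Negative Binomial model is usually the better candidate. The sketch below is our own illustration, not the article’s code: the function names, the 1.5 threshold, and the sample counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by mean; a Poisson model implies about 1."""
    m = mean(counts)
    if m <= 0:
        raise ValueError("counts must have a positive mean")
    return variance(counts) / m


def suggest_count_model(counts, threshold=1.5):
    """Hypothetical rule of thumb: treat ratios above `threshold` as overdispersed."""
    if dispersion_ratio(counts) > threshold:
        return "negative binomial"
    return "poisson"


# Made-up dropout counts for ten schools (illustrative only, not real NYC data)
dropouts = [0, 2, 1, 15, 3, 0, 40, 5, 2, 9]
print(round(dispersion_ratio(dropouts), 1), suggest_count_model(dropouts))  # prints 19.6 negative binomial
```

This raw ratio ignores predictors, so it is only a first screen. A fuller check would compare fitted Poisson and Negative Binomial regressions on the same covariates.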

When AI Learns to Lie: Inside the Neural Machinery of Machine Deception

What You’ll Learn in This Article:
The critical distinction between AI making mistakes (hallucination) and AI deliberately deceiving (lying).
How researchers discovered the “rehearsal process” in which AI practices lies before saying them.
The three-step assembly line AI systems use to construct deceptions.
Detection and control techniques that can identify and steer AI honesty in real time.
The disturbing trade-off between honesty and performance that creates economic incentives for deceptive AI.
Why this matters now and what it means for the future of AI safety.

Ask an AI a simple question: “What’s the capital of Australia?” It answers: “Canberra.” Now ask it to lie about the capital of Australia. It says: “Sydney.” This might seem like a parlor trick, but groundbreaking research from Carnegie Mellon University reveals something far more concerning: the AI knows the correct answer is Canberra, consciously decides to deceive you, and systematically plans how to construct that ...

Cracking the Code of Online Popularity: Lessons from Feature Selection and PCA

Predicting whether an article will go viral is a puzzle that blends data science with human behavior. In this project, we worked with a large dataset of online news articles, aiming to forecast popularity (measured as the number of shares) using dozens of explanatory variables. The assignment was straightforward in its goal but complex in its execution: reduce dimensionality, train models, and report performance. Along the way, we uncovered lessons about interpretability, complexity, and the limits of linear regression on messy, real-world data.

The Dimensionality Challenge

Our dataset contained nearly 40,000 articles with 60+ explanatory variables, ranging from keyword frequency to sentiment polarity. This posed the classic curse of dimensionality: having too many features relative to the predictive signal often leads to overfitting, inefficiency, and inscrutable models. To tackle this, we explored three modeling paths:
Full features: a baseline model with all predictors.
Feature...
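PCA is one way to compress many correlated predictors. The minimal sketch below is our own illustration, not the project’s code. It centers the data, builds the covariance matrix, finds the first principal component by power iteration, and reports the share of total variance that component explains. The function names and the toy data are hypothetical; a real analysis would typically use a library routine such as scikit-learn’s `PCA` and keep several components.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iters=500, seed=0):
    """Power iteration: (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def explained_variance_ratio(rows):
    """Fraction of total variance (trace of the covariance) captured by component 1."""
    cov = covariance_matrix(rows)
    _, eigenvalue = first_component(cov)
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Toy data: features 1 and 2 are nearly collinear, feature 3 barely varies
rows = [[1, 2.0, 0.5], [2, 4.1, 0.4], [3, 5.9, 0.6], [4, 8.2, 0.5], [5, 9.9, 0.45]]
print(f"PC1 explains {explained_variance_ratio(rows):.1%} of the variance")
```

With 60+ news-article features, you would keep enough components to reach a chosen cumulative share of variance and then regress shares on them. Where to set that cut-off is a modeling decision. The trade-off is that components no longer map to individual, interpretable predictors.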

KASA: Your Voice, Your Community, Your Success

Speech by Emmanuel Kasigazi, KASA President

Introduction

Good morning, everyone. My name is Emmanuel Kasigazi, and I’m honored to serve as President of the Katz School African Student Association (KASA). I’m studying Data Analytics and Visualization here; I’m originally from Uganda and hold an undergraduate degree in Information Systems. But more importantly, I’m here as someone who understands your journey. I’m a seasoned engineer and entrepreneur, and, just like you, I’m an immigrant. I arrived here last year. Before that, I spent time in Toronto, and years ago, I lived in South Sudan. I know what it’s like to live outside your home country. I understand what you might be going through and what you might be experiencing. Different countries, yes, but leaving home is leaving home, and that experience connects us all.

My Background

Back home, I’ve been a leader throughout my life. I ran companies with teams of employees, worked across various sectors from tech to branding, printing prod...

Venture Capital Is Just Business: Lessons from Chapati Stands, Microfinance, and AI

Why This Matters

When people hear “venture capital,” they picture boardrooms full of billionaires, complex financial models, and Silicon Valley jargon. But the deeper I’ve gone into this world, through programs like Venture Institute, NSF I-Corps, and my own entrepreneurial journey, the clearer a simple truth has become: venture capital is still business. Like running a chapati stand in high school or lending to SMEs in East Africa, it comes down to the same loop: Get resources. Add value. Deliver results. Grow trust and relationships. The magnitude changes, but the fundamentals don’t.

The People Game

One of my biggest insights from the Venture Institute is that venture is a people’s game. Limited Partners (LPs), not startups, are the customers of a fund. General Partners (GPs) live and die by relationships, just as I once did when one client made up 50% of my branding business and we nearly collapsed when they pulled out. In VC, no LP should ever hold more than 20% of a fund. Diversification isn’t just...

🚀 IT'S TIME! NSF I-Corps Summer 2025 NY Regional Lean Bootcamp kicks off Thursday! LET'S GOOO!! 🎉

A few months ago, I got accepted into the NSF I-Corps Summer 2025 Lean Bootcamp, a competitive startup training program backed by the National Science Foundation (NSF)! Since we kick off in just a few days (July 31st, a day before my birthday, no less), I figured it’s time I shared the story behind it all.

So... what IS this whole thing? Think of it as a three-level pyramid:

🔹 Level 1: The National Science Foundation (NSF) – "The Godfather"
The NSF is the U.S. government agency funding America’s scientific future. They invest billions in university research and back Nobel Prize winners. When they support a venture, it carries serious weight.

🔹 Level 2: I-Corps™ (Innovation Corps) – "The Entrepreneur's Bootcamp"
For d...