TITLE: STOCKPULSE: UNSUPERVISED ANOMALY FLAGS FOR DAILY STOCKS


INTRODUCTION:

This is me writing down exactly what we built, how it works, and how you can repeat it with zero drama. It looks at daily stock data and points at days that feel off. I tried to keep everything small and clean so it's easy to maintain. Training is fully unsupervised. Labels come in later only to pick a cutoff on validation and to grade results on test. We used AAPL daily bars so it's concrete.


WHAT THIS PROJECT DOES:

- Load one ticker’s daily bars over multiple years.

- Build a tiny feature set from Close and returns.

- Train Isolation Forest as the main detector and compute a single score I call wierdness.

- Remove trend and seasonality with STL, then run a spike test using generalized ESD.

- Detect change points with PELT so we catch larger regime shifts.

- Fuse the signals with a tiny rule so alerts are ranked into tiers.

- Export a simple CSV with six columns for Tableau so you can see everything fast.


SCOPE:

Daily data. One asset for the post, but you can swap tickers. The method stays unsupervised. Labels are not used for fitting, only for validation cutoff selection and test grading. No trading engine here. Just a clean flagger.


DATA:

We work off a standard OHLCV CSV with columns like date, open, high, low, close, volume. We only need Close to compute the features I actually use. That minimalist choice helps stability and makes plots easy to read later.


FEATURES:

- close_daily_return

- 20 day rolling mean of returns

- 20 day rolling std of returns

- 20 day z score of Close


These four cover short term moves, dispersion shifts, and relative level vs a local baseline. A lot of projects overdo feature counts. This set is enough to get meaningful wierdness without making the model twitchy.
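Here's a minimal pandas sketch of how I'd compute these four features. The column names other than close_daily_return are my own placeholders, not fixed by the project, and I assume the input frame has a Close column indexed by date.

    import pandas as pd

    def build_features(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
        # df: one row per trading day, sorted ascending, with a "Close" column
        feats = pd.DataFrame(index=df.index)
        feats["close_daily_return"] = df["Close"].pct_change()
        feats["ret_roll_mean_20"] = feats["close_daily_return"].rolling(window).mean()
        feats["ret_roll_std_20"] = feats["close_daily_return"].rolling(window).std()
        roll_mean = df["Close"].rolling(window).mean()
        roll_std = df["Close"].rolling(window).std()
        feats["close_zscore_20"] = (df["Close"] - roll_mean) / roll_std
        # drop the warm-up rows where the 20 day windows are not full yet
        return feats.dropna()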


PRIMARY DETECTOR:

We used Isolation Forest on the train split only. Hyperparams are plain and boring on purpose. The important thing is how we define the score. scikit-learn exposes score_samples and decision_function. decision_function equals score_samples minus an offset. To avoid confusion I lock one rule for the whole project: wierdness equals negative score_samples. Then I standardize using the train mean and std so wierdness is on a consistent scale across splits. Bigger wierdness means more anomalous. One definition. Everywhere. That makes thresholds, charts, and metrics sane.


HOW I CALCULATE WIERDNESS:

1. Call score_samples(X) on the trained Isolation Forest.

2. Multiply by negative one so higher means more anomalous.

3. Compute the train set mean and std of that wierdness.

4. Transform all splits with (wierdness − train_mean) divided by train_std.


Now wierdness is centered and scaled from the train distribution. That simple step removes headaches when you compare validation and test.
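A minimal sketch of that scoring rule, assuming X_train, X_val, and X_test are the feature matrices from the chronological split described in the next section. The hyperparameters here are illustrative, not the project's exact settings.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # fit on train only; plain hyperparameters (values here are just an example)
    iforest = IsolationForest(n_estimators=200, random_state=42)
    iforest.fit(X_train)

    def wierdness(model, X):
        # one rule everywhere: wierdness = negative score_samples
        return -model.score_samples(X)

    w_train = wierdness(iforest, X_train)
    mu, sigma = w_train.mean(), w_train.std()

    # standardize every split with the train stats only
    w_train_z = (w_train - mu) / sigma
    w_val_z = (wierdness(iforest, X_val) - mu) / sigma
    w_test_z = (wierdness(iforest, X_test) - mu) / sigma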


SPLITS:

We split the timeline in order. 60 percent train, 20 percent validation, 20 percent test. No random shuffle. Time matters. This prevents leakage and mirrors how you’d actually run it going forward.
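A tiny sketch of the split, assuming feats is the feature frame built earlier; the fractions match the 60/20/20 rule above.

    def chrono_split(df, train_frac=0.60, val_frac=0.20):
        # no shuffle: earliest 60% is train, next 20% validation, last 20% test
        n = len(df)
        i_train = int(n * train_frac)
        i_val = int(n * (train_frac + val_frac))
        return df.iloc[:i_train], df.iloc[i_train:i_val], df.iloc[i_val:]

    train_df, val_df, test_df = chrono_split(feats)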


CHOOSING THE CUTOFF:

On validation, we sweep a list of percentiles. For each percentile we compute a numeric cutoff on wierdness, then evaluate precision and recall against a simple validation label file. The selection rule is short: pick the smallest cutoff that achieves the target recall. If multiple cutoffs hit the target, choose the one with the higher precision. We log the chosen numeric cutoff, the approximate percentile, and the achieved recall and precision on validation. That cutoff is then frozen and used on test.
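Here is one way that sweep could look, assuming w_val is the standardized validation wierdness and y_val the 0/1 validation labels. The percentile grid and target recall are placeholders, and the tie-break (higher precision, then the smaller cutoff) is my reading of the rule above.

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    def pick_cutoff(w_val, y_val, target_recall=0.80, percentiles=range(80, 100)):
        candidates = []
        for p in percentiles:
            cut = float(np.percentile(w_val, p))
            pred = (w_val >= cut).astype(int)
            rec = recall_score(y_val, pred, zero_division=0)
            prec = precision_score(y_val, pred, zero_division=0)
            if rec >= target_recall:
                candidates.append({"percentile": p, "cutoff": cut,
                                   "recall": rec, "precision": prec})
        if not candidates:
            return None  # no cutoff hits the target recall; log and revisit the target
        # prefer higher precision among cutoffs that hit the target,
        # break ties with the smaller (more permissive) cutoff
        return max(candidates, key=lambda c: (c["precision"], -c["cutoff"]))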


SPIKE DETECTION:

Spikes should be detected after removing smooth trend and seasonal patterns. We run STL with robust set to true so the smoothing step down-weights outliers. We keep the residual series from STL. Then we run generalized ESD on that residual. ESD is built to detect up to k outliers. On long histories we run ESD in windows of roughly one trading year so old clusters don't mask new spikes. The output is a boolean spike flag per day.
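A sketch of that pipeline, assuming close is a pandas Series of Close prices. STL comes from statsmodels; the generalized ESD test is written out by hand since there isn't a single standard library call for it. The period, alpha, and max_outliers values are illustrative.

    import numpy as np
    from scipy import stats
    from statsmodels.tsa.seasonal import STL

    def generalized_esd(x, max_outliers=10, alpha=0.05):
        # returns indices of detected outliers in x (generalized ESD test)
        x = np.asarray(x, dtype=float)
        idx = np.arange(len(x))
        n = len(x)
        removed, n_out = [], 0
        for i in range(1, max_outliers + 1):
            m = n - i + 1              # points still in play at this step
            if m < 3:
                break
            mean, std = x.mean(), x.std(ddof=1)
            if std == 0:
                break
            dev = np.abs(x - mean)
            j = int(dev.argmax())
            R = dev[j] / std
            p = 1 - alpha / (2 * m)
            t = stats.t.ppf(p, m - 2)
            lam = (m - 1) * t / np.sqrt((m - 2 + t**2) * m)
            removed.append(idx[j])
            x = np.delete(x, j)
            idx = np.delete(idx, j)
            if R > lam:
                n_out = i
        return removed[:n_out]

    def spike_flags(close, period=5, max_outliers=10):
        # robust STL strips trend/seasonality; ESD runs on the residual,
        # windowed to roughly one trading year (~252 days)
        resid = STL(close, period=period, robust=True).fit().resid
        flags = np.zeros(len(resid), dtype=bool)
        for start in range(0, len(resid), 252):
            window = resid.iloc[start:start + 252]
            for k in generalized_esd(window.values, max_outliers=max_outliers):
                flags[start + k] = True
        return flags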


CHANGE POINTS:

For larger shifts we use PELT from the ruptures library. It's a penalized change point method with pruning so it stays efficient. We tune the penalty by sweeping a small grid and pick the last calm point before the number of detected change points explodes as the penalty drops. Two extra knobs matter in practice: min_size, which forces a minimum gap between change points, and jump, which subsamples candidate indices to speed things up. This gives a clean set of regime boundaries without chopping the series into confetti.
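A sketch with ruptures, assuming values is a 1-D numpy array (for example the Close series or the wierdness score). The cost model, penalty grid, min_size, and jump values are placeholders to tune, not the project's exact settings.

    import numpy as np
    import ruptures as rpt

    def change_point_flags(values, pen=10, min_size=20, jump=5):
        # PELT with an RBF cost; min_size forces spacing between change points,
        # jump subsamples candidate indices for speed
        algo = rpt.Pelt(model="rbf", min_size=min_size, jump=jump).fit(values)
        breakpoints = algo.predict(pen=pen)   # last element is always len(values)
        flags = np.zeros(len(values), dtype=bool)
        for b in breakpoints[:-1]:
            flags[b] = True
        return flags

    # short penalty sweep: count change points per penalty and pick the last
    # calm value before the count explodes as the penalty drops
    for pen in [40, 20, 10, 5, 3, 1]:
        n_cp = len(rpt.Pelt(model="rbf", min_size=20, jump=5)
                   .fit(values).predict(pen=pen)) - 1
        print(pen, n_cp)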


FUSION RULE:

We don’t want a single detector to decide everything. We fuse them with a small rule and a proximity window of one trading day around any change point.

- Tier A if there is an ESD spike near a change point, or if the IOF wierdness is extremely high around the top one percent on validation.

- Tier B if IOF flags and there is either a spike or a near CP context.

- Tier C if it’s just an isolated spike far from any CP, or a weak IOF day alone.


This keeps the top bucket strict and pushes mid-confidence days to a review tier so you can triage properly.
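A sketch of that rule, assuming boolean arrays iof_flag, shesd_flag, cp_flag, the standardized wierdness scores, and top1_cut set to roughly the top one percent wierdness value on validation. All the names here are mine.

    import numpy as np

    def fuse(iof_flag, shesd_flag, cp_flag, wierdness_z, top1_cut, window=1):
        # near_cp is true within +/- one trading day of any change point
        near_cp = np.zeros(len(cp_flag), dtype=bool)
        for i in np.flatnonzero(cp_flag):
            near_cp[max(0, i - window): i + window + 1] = True

        tiers = np.array([""] * len(cp_flag), dtype=object)
        for i in range(len(cp_flag)):
            if (shesd_flag[i] and near_cp[i]) or wierdness_z[i] >= top1_cut:
                tiers[i] = "A"   # spike near a CP, or extreme wierdness
            elif iof_flag[i] and (shesd_flag[i] or near_cp[i]):
                tiers[i] = "B"   # IOF plus a spike or near-CP context
            elif shesd_flag[i] or iof_flag[i]:
                tiers[i] = "C"   # isolated spike or weak IOF day alone
        return tiers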


OUTPUT FILE:

Everything is summarized into one CSV for Tableau with exactly six columns:

date, close, wierdness, iof_flag, shesd_flag, cp_flag
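Writing it is one DataFrame and one to_csv call, assuming the flag arrays and standardized wierdness from the steps above and a df indexed by date (the tableau/ folder has to exist first):

    import pandas as pd

    signals = pd.DataFrame({
        "date": df.index,
        "close": df["Close"].values,
        "wierdness": wierdness_z,
        "iof_flag": iof_flag.astype(int),
        "shesd_flag": shesd_flag.astype(int),
        "cp_flag": cp_flag.astype(int),
    })
    signals.to_csv("tableau/signals.csv", index=False)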

TABLEAU VISUAL:

We open tableau/signals.csv in Tableau Public. Put date on Columns and close on Rows. Add markers or colors for iof_flag, shesd_flag, and cp_flag. Then add a Reference Line from the Analytics pane set to the numeric wierdness cutoff I chose on validation. That line makes it clear which points cross the boundary. If you don't see Reference Line, check that your axis is continuous and that you're in the Analytics pane, not the Data pane.


METRICS I REPORT:

Training stays unsupervised, but we still want to grade. On test, at the frozen cutoff, we compute precision and recall. Then we compute the Precision-Recall curve plus Average Precision, and the ROC curve plus ROC AUC. We save a small metrics_summary.csv with the headline numbers and two files with the curve points so we can plot them later if needed. scikit-learn has clear functions for these so it's straightforward.
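A sketch of that grading step, assuming w_test_z is the standardized test wierdness, y_test the 0/1 test labels, and frozen_cutoff the value picked on validation. The curve file names are my placeholders; the project only fixes metrics_summary.csv.

    import pandas as pd
    from sklearn.metrics import (precision_score, recall_score,
                                 precision_recall_curve, average_precision_score,
                                 roc_curve, roc_auc_score)

    pred_test = (w_test_z >= frozen_cutoff).astype(int)

    summary = {
        "precision_at_cutoff": precision_score(y_test, pred_test, zero_division=0),
        "recall_at_cutoff": recall_score(y_test, pred_test, zero_division=0),
        "average_precision": average_precision_score(y_test, w_test_z),
        "roc_auc": roc_auc_score(y_test, w_test_z),
    }
    pd.DataFrame([summary]).to_csv("metrics_summary.csv", index=False)

    # curve points for later plotting
    prec, rec, _ = precision_recall_curve(y_test, w_test_z)
    fpr, tpr, _ = roc_curve(y_test, w_test_z)
    pd.DataFrame({"precision": prec, "recall": rec}).to_csv("pr_curve.csv", index=False)
    pd.DataFrame({"fpr": fpr, "tpr": tpr}).to_csv("roc_curve.csv", index=False)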


WHAT THE METRICS MEAN:

Precision is, out of everything we flagged, the fraction that were labeled events. Recall is, out of the labeled event days, the fraction we caught. PR AUC measures how well the ranking reaches high recall without flooding with false alarms. ROC AUC looks at ranking quality over all thresholds. Both help, but for rare events PR is usually more honest.


REPRO STEPS:

1. Build features.

2. Split chronologically into train, validation, test.

3. Fit Isolation Forest on train only.

4. Score wierdness and standardize using train mean and std.

5. Run STL residuals with robust true and period tuned reasonably.

6. Run generalized ESD for spikes, windowed on long series.

7. Run PELT for change points with a short penalty sweep, set min_size and jump.

8. Fuse IOF, spikes, and near-CP context into tiers using a proximity window of one trading day.

9. Build validation labels, pick the cutoff using the recall first rule, prefer higher precision when tied.

10. Evaluate on test, save metrics files.

11. Make tableau/signals.csv and visualize with the Reference Line.


CHECKS:

- Dates parsed to datetimes, sorted ascending, one row per day.

- Train wierdness roughly centered after z scaling, no obvious skew from bad casting.

- Validation cutoff actually hits the target recall number I set.

- Spike flags line up with residual bursts not just raw level moves.

- Penalty sweep shows a visible bend and the chosen penalty sits just before over segmentation.

- A handful of Tier A days look genuinely unusual on the price chart.


TUNING NOTES:

- If you get too many spikes, check STL period and keep robust true. Also trim max_outliers for ESD.

- If you get too many change points, raise the penalty or increase min_size. Set jump to a larger number if runtime matters.

- If recall is too low on validation, pick a higher recall target and accept the precision hit. Log that choice.

- If wierdness magnitudes look off between splits, re check that you standardized using train stats only, not mixed.

HOW TO ADAPT:

Swap the ticker and date range. Re run the exact steps. If you know your asset has a weekly or monthly cycle, adjust STL period accordingly. If your series is noisier, allow a slightly higher outlier ratio for ESD. If segmentation is messy, lift the penalty and raise min_size a bit. Always write down what you changed and why, with the date.


CONCLUSION:

This setup gives a solid first pass at anomaly flags on a stock series without labels. Isolation Forest catches distribution odd days. STL plus ESD catches spikes after removing smooth structure. PELT finds regime shifts. The fusion tiers keep the most serious days on top. The outputs are tiny and human friendly. It is not the final word and it doesn’t need to be. It is repeatable, readable, and easy to improve. That was the goal.

