Implement the softmax function, which converts a vector of raw scores (logits) into a probability distribution.
softmax(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
The output values all lie in (0, 1) and sum to 1.
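A direct translation of the formula above might look like this (a minimal NumPy sketch; the function name `softmax_naive` is illustrative):

```python
import numpy as np

def softmax_naive(x):
    # Exponentiate each logit, then normalize by the total.
    exps = np.exp(np.asarray(x, dtype=np.float64))
    return exps / exps.sum()

probs = softmax_naive([1.0, 2.0, 3.0])
```

This works for modest inputs, but `np.exp` overflows once a logit exceeds roughly 709 in float64, which motivates the stability fix described next.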
Numerical stability: A naive implementation can overflow for large x_i. The standard trick is to subtract the maximum value before exponentiating:
softmax(x_i) = \frac{e^{x_i - \max(x)}}{\sum_j e^{x_j - \max(x)}}
This is mathematically equivalent, since the common factor e^{-\max(x)} cancels between numerator and denominator, but it avoids overflow to inf in floating point.
Input: [1.0, 2.0, 3.0]
Output: [0.0900, 0.2447, 0.6652]  # sums to 1.0
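A stable implementation following the max-subtraction trick could be sketched as (NumPy assumed):

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=np.float64)
    shifted = x - x.max()        # subtract the max so the largest exponent is 0
    exps = np.exp(shifted)       # every entry is now <= 1, so no overflow
    return exps / exps.sum()

# Matches the example above.
probs = softmax([1.0, 2.0, 3.0])

# Logits this large would overflow a naive implementation, but remain finite here.
big = softmax([1000.0, 1001.0, 1002.0])
```

Because `x.max()` is subtracted from every entry, `softmax([1000.0, 1001.0, 1002.0])` returns the same distribution as `softmax([0.0, 1.0, 2.0])`.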