How to Compute the Derivative of a Sigmoid Function (fully worked example)
This is a sigmoid function:

s(x) = 1/(1+e^{-x})
The sigmoid function looks like this (made with a bit of MATLAB code):
x=-10:0.1:10;
s = 1./(1+exp(-x));
figure; plot(x,s); title('sigmoid');
Alright, now let’s put on our calculus hats…
Here’s how you compute the derivative of a sigmoid function
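First, let’s rewrite the original equation to make it easier to work with.

s(x) = 1/(1+e^{-x}) = (1+e^{-x})^{-1}

Now we take the derivative, using the chain rule:

d/dx s(x) = d/dx (1+e^{-x})^{-1}
          = -1 · (1+e^{-x})^{-2} · d/dx (1+e^{-x})
          = -(1+e^{-x})^{-2} · (-e^{-x})

Nice! We computed the derivative of a sigmoid! Okay, let’s simplify a bit.

d/dx s(x) = e^{-x}/(1+e^{-x})^2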
Okay! That looks pretty good to me. Let’s quickly plot it and see if it looks reasonable. Again, here’s some MATLAB code to check:
x=-10:0.1:10; % Test values.
s = 1./(1+exp(-x)); % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
figure; plot(x,s,'b*'); hold on; plot(x,ds,'r+'); legend('sigmoid', 'derivative-sigmoid','location','best')
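If eyeballing the plot is not convincing enough, a numerical check is another option. The sketch below is not part of the original code: it compares the analytic derivative e^{-x}/(1+e^{-x})^2 against a central-difference estimate of the sigmoid's slope. The names `sigmoid`, `dsigmoid`, `h`, and `max_err` are introduced here for illustration, and the step size is an arbitrary small value.

```matlab
% Numerical check of the analytic derivative using central differences.
sigmoid  = @(x) 1./(1+exp(-x));               % Sigmoid.
dsigmoid = @(x) exp(-x)./((1+exp(-x)).^2);    % Analytic derivative of sigmoid.

x = -10:0.1:10;   % Test values.
h = 1e-5;         % Step size (assumed; any small value should behave similarly).

ds_numeric = (sigmoid(x+h) - sigmoid(x-h))./(2*h);   % Central-difference slope.
max_err = max(abs(ds_numeric - dsigmoid(x)));         % Largest disagreement.

fprintf('max |numeric - analytic| = %g\n', max_err);
```

The printed value should be tiny compared with the derivative's peak of 0.25 at x = 0; a large value would point to a mistake in the algebra.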
Looks like a derivative. Good! But wait… there’s more!
If you’ve been reading some of the neural net literature, you’ve probably come across text that says the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).
[note that d/dx s(x) and s'(x) are the same thing, just different notation.]
[also note that Andrew Ng writes, f'(z) = f(z)(1 – f(z)), where f(z) is the sigmoid function, which is the exact same thing that we are doing here.]
So your next question should be: is the derivative we calculated earlier equivalent to s'(x) = s(x)(1-s(x))?
So, using Andrew Ng’s notation…
How does the derivative of a sigmoid f(z) equal f(z)(1-f(z))?
Swapping with our notation, we can ask the equivalent question:
How does the derivative of a sigmoid s(x) equal s(x)(1-s(x))?
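Okay we left off with…

s'(x) = e^{-x}/(1+e^{-x})^2

This part is not intuitive… but let’s add and subtract a 1 in the numerator (this does not change the equation).

s'(x) = (1 + e^{-x} - 1)/(1+e^{-x})^2
      = (1+e^{-x})/(1+e^{-x})^2 - 1/(1+e^{-x})^2
      = 1/(1+e^{-x}) - (1/(1+e^{-x}))^2
      = 1/(1+e^{-x}) · (1 - 1/(1+e^{-x}))    // factor out a 1/(1+e^{-x})

Hmmm…. look at that! There are actually two sigmoid functions there… Recall that the sigmoid function is s(x) = 1/(1+e^{-x}). Let’s replace them with s(x).

s'(x) = s(x)(1 - s(x))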
Just like Prof Ng said… 🙂
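The same identity can also be checked in the opposite direction, which avoids the add-and-subtract trick. A sketch, using only the definition s(x) = 1/(1+e^{-x}):

```latex
\begin{align*}
1 - s(x) &= 1 - \frac{1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}} \\
s(x)\,\bigl(1 - s(x)\bigr) &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
  = \frac{e^{-x}}{(1+e^{-x})^2}
\end{align*}
```

The last expression is the derivative computed earlier, the same quantity as `ds` in the MATLAB code.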
And for a sanity check, do they both show the same function?
x=-10:0.1:10; % Test values.
s = 1./(1+exp(-x)); % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2); % Derivative of sigmoid.
ds1 = s.*(1-s); % Another simpler way to compute the derivative of a sigmoid.
figure; plot(x,ds,'r+'); hold on; plot(x,ds1, 'go'); legend('(e^{-x})/((1+e^{-x})^2)','(s(x))(1-s(x))','location','best'); title('derivative of sigmoid')
Yes! They perfectly match!
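To put a number on "perfectly", here is a small sketch (not part of the original code; `max_diff` is a name introduced here) that compares the two expressions pointwise. Because they are evaluated through different floating-point operations, the difference is expected to be on the order of rounding error rather than exactly zero.

```matlab
x = -10:0.1:10;                     % Test values.
s = 1./(1+exp(-x));                 % Sigmoid.
ds = exp(-x)./((1+exp(-x)).^2);     % Derivative, first form.
ds1 = s.*(1-s);                     % Derivative, second form.

max_diff = max(abs(ds - ds1));      % Largest pointwise difference.
fprintf('max |ds - ds1| = %g\n', max_diff);
```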
So there you go. Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).
Nicely explained ! Thanks !
Really nice explanation. Thank you!
https://www.wolframalpha.com/input/?i=(1%2Be%5E-x)%5E-1
This shows the derivative having positive x in the exponents.
I think if you 1) rewrite my equation so the e^-x in the numerator goes to e^x in the denominator, 2) multiply my equation by e^x/e^x, and 3) expand the denominator in both the wolfram and my equation, they should be equal.
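Written out, the steps in this reply look roughly like the following. This is a sketch that assumes the form with positive exponents is e^{x}/(e^{x}+1)^2:

```latex
\begin{align*}
s'(x) &= \frac{e^{-x}}{(1+e^{-x})^2}
       = \frac{1}{e^{x}\,(1+e^{-x})^2}
       && \text{move } e^{-x} \text{ to the denominator as } e^{x} \\
      &= \frac{e^{x}}{e^{x}\cdot e^{x}\,(1+e^{-x})^2}
       && \text{multiply by } e^{x}/e^{x} \\
      &= \frac{e^{x}}{\bigl(e^{x}(1+e^{-x})\bigr)^2}
       = \frac{e^{x}}{(e^{x}+1)^2}
       && \text{since } e^{x}\cdot e^{-x} = 1
\end{align*}
```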
How come websites like wolfram alpha simplify the -x in the exponents to positive x? How do we get there? I don’t see it.. :-/
Hi Sefrin, could you include an example/link to explain what you mean?
for video tutorial on sigmoid and other activation functions: https://quickkt.com/tutorials/artificial-intelligence/deep-learning/activation-function/
Thanks Vinay! You created a nice visual summary of different activation functions.
Actually you do use the product rule but it is part of the chain rule. Hope this is clear.
Hi Scott, thanks for your comment! I agree this is confusing/misleading. I re-wrote to remove the reference to the product rule.
Very nice explanation but you are using the chain rule to differentiate not the product rule.
https://www.math.ucdavis.edu/~kouba/CalcOneDIRECTORY/chainruledirectory/ChainRule.html
Excellent walkthrough. For a guy just getting into activation fn’s, this really helps! Thanks so much!
You’re welcome Sid!
EASILY, the best blog post on finding the derivative of a sigmoid function. You didn’t leave any details out. Took me forever to wrap my head around this. The +1 – 1 thing is definitely not intuitive. Thanks for writing this.
happy to hear it helped!
Thanks! really helped with Prof. Hinton’s NNML Coursera lecture I was struggling to understand.
Glad it helped! It wasn’t obvious to me either 🙂
Superb!
Exactly what I was looking for!
🙂
Very detailed. Thank you !!
You’re welcome!
Thanks much. I was breaking my head on this today.
Glad it helped clear things up!
excellent. Thanks!