How to Compute the Derivative of a Sigmoid Function (fully worked example)

This is a sigmoid function:

$\boldsymbol{s(x) = \frac{1}{1 + e^{-x}}}$
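The rest of this post uses MATLAB. For anyone following along in Python instead, here is a minimal sketch of the same function using only the standard library. The name `sigmoid` and the list-based grid are illustrative choices, not part of the original code. This direct form calls `math.exp(-x)`, which raises `OverflowError` for x below roughly -709, so treat it as a sketch for moderate inputs like the ones plotted in this post.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid s(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

# Same grid as the MATLAB snippets: -10 to 10 in steps of 0.1 (201 points).
xs = [i / 10 for i in range(-100, 101)]
ys = [sigmoid(x) for x in xs]

print(sigmoid(0.0))       # midpoint of the S-curve
print(min(ys), max(ys))   # close to 0 and close to 1, but never reaching either
```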

The sigmoid function looks like this (made with a bit of MATLAB code):

```matlab
x = -10:0.1:10;
s = 1./(1+exp(-x));
figure; plot(x,s); title('sigmoid');
```

Alright, now let's put on our calculus hats…

Here’s how you compute the derivative of a sigmoid function

First, let’s rewrite the original equation to make it easier to work with. $\boldsymbol{s(x) = \frac{1}{1+e^{-x}} = (1)(1+e^{-x})^{-1} = (1+e^{-x})^{-1}}$

Now we take the derivative, using the power rule together with the chain rule:

$\boldsymbol{\frac{d}{dx}s(x) = \frac{d}{dx}((1+e^{-x})^{-1})}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-1-1)}) \frac{d}{dx}(1+ e^{-x})}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (\frac{d}{dx}(1) + \frac{d}{dx}(e^{-x}))}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (0 + e^{-x}(\frac{d}{dx}(-x)))}$

$\boldsymbol{\frac{d}{dx}s(x) = -1((1+e^{-x})^{(-2)}) (e^{-x})(-1)}$
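To double-check the chain-rule steps, here is a hedged Python sketch that keeps the outer factor $-1(1+e^{-x})^{-2}$ and the inner factor $e^{-x}(-1)$ separate, multiplies them, and compares the result with a central finite-difference estimate of the slope. The helper names and the step size `h = 1e-6` are assumptions chosen for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """s(x) = 1 / (1 + e^{-x})"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime_chain_rule(x: float) -> float:
    """Derivative assembled factor by factor, mirroring the steps above."""
    u = 1.0 + math.exp(-x)          # inner function u(x) = 1 + e^{-x}
    outer = -1.0 * u ** (-2)        # power rule: d/du u^{-1} = -1 * u^{-2}
    inner = math.exp(-x) * (-1.0)   # du/dx = 0 + e^{-x} * d/dx(-x) = e^{-x} * (-1)
    return outer * inner            # chain rule: multiply the two factors

def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Numerical slope estimate (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, 0.0, 0.5, 2.0):
    analytic = sigmoid_prime_chain_rule(x)
    numeric = central_difference(sigmoid, x)
    print(f"x = {x:4.1f}   chain rule = {analytic:.8f}   finite difference = {numeric:.8f}")
```

The two columns should agree to about eight decimal places; the small remaining gap comes from the finite-difference approximation, not from the calculus.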

Nice! We computed the derivative of a sigmoid! Okay, let's simplify a bit. The two factors of $-1$ multiply to $+1$:

$\frac{d}{dx}s(x) = ((1+e^{-x})^{(-2)}) (e^{-x})$

$\frac{d}{dx}s(x) = \frac{1}{(1+e^{-x})^{2}} (e^{-x})$

$\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}$
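The simplified form can also be checked numerically, in the same spirit as a plot. The Python sketch below (helper names are illustrative assumptions) samples the grid -10:0.1:10, compares $\frac{e^{-x}}{(1+e^{-x})^{2}}$ with the slope of the sampled sigmoid between neighboring grid points, and finds where the derivative is largest.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x: float) -> float:
    # Simplified result: e^{-x} / (1 + e^{-x})^2
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

xs = [i / 10 for i in range(-100, 101)]   # same grid as x = -10:0.1:10
s = [sigmoid(x) for x in xs]
ds = [sigmoid_prime(x) for x in xs]

# Centered slope of the sampled sigmoid at each interior grid point.
max_err = max(
    abs((s[i + 1] - s[i - 1]) / (xs[i + 1] - xs[i - 1]) - ds[i])
    for i in range(1, len(xs) - 1)
)

peak = max(range(len(xs)), key=lambda i: ds[i])   # index of the largest derivative
print(f"max difference from grid slope: {max_err:.1e}")
print(xs[peak], ds[peak])   # 0.0 0.25
```

With a grid spacing of 0.1 the centered slope is only accurate to roughly 1e-4, so a loose tolerance is appropriate here; the peak of 0.25 at x = 0 is where the sigmoid is steepest.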

Okay! That looks pretty good to me. Let's quickly plot it and see if it looks reasonable. Again, here's some MATLAB code to check:

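```matlab
x = -10:0.1:10;                      % Test values.
s = 1./(1+exp(-x));                  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2);    % Derivative of sigmoid.
figure;
plot(x,s,'b*'); hold on;
plot(x,ds,'r+');
legend('sigmoid', 'derivative-sigmoid','location','best')
```

The derivative is largest at x = 0, where the sigmoid is steepest, and drops toward zero where the sigmoid flattens out on both sides. Looks like a derivative. Good! But wait… there's more!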

If you’ve been reading some of the neural net literature, you’ve probably come across text that says the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).

[note that $\frac{d}{dx}s(x)$ and s'(x) are the same thing, just different notation.]

[also note that Andrew Ng writes f'(z) = f(z)(1 - f(z)), where f(z) is the sigmoid function, which is exactly the same thing we are doing here.]

So your next question should be: is the derivative we calculated earlier equivalent to s'(x) = s(x)(1-s(x))?

So, using Andrew Ng’s notation…

How does the derivative of a sigmoid f(z) equal f(z)(1-f(z))?

Swapping with our notation, we can ask the equivalent question:

How does the derivative of a sigmoid s(x) equal s(x)(1-s(x))?

Okay we left off with… $\frac{d}{dx}s(x) = \frac{(e^{-x})}{(1+e^{-x})^{2}}$

This part is not intuitive… but let's add and subtract 1 in the numerator (this does not change the value of the expression).

$\frac{d}{dx}s(x) = \frac{(e^{-x} + 1 -1)}{(1+e^{-x})^{2}}$

$\frac{d}{dx}s(x) = \frac{(1 + e^{-x} -1)}{(1+e^{-x})^{2}}$

$\frac{d}{dx}s(x) = \frac{(1 + e^{-x})}{(1+e^{-x})^{2}} - \frac{1}{(1+e^{-x})^{2}}$

$= \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^{2}}$

$= \frac{1}{(1+e^{-x})} - (\frac{1}{(1+e^{-x})}) (\frac{1}{(1+e^{-x})})$

Now factor out a $\frac{1}{(1+e^{-x})}$:

$= \frac{1}{(1+e^{-x})} (1 - \frac{1}{(1+e^{-x})})$
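Because each line of that derivation is only a rewrite, every form should evaluate to the same number. A minimal Python spot check (the function name `forms` and the sample points are arbitrary choices for illustration) evaluates all seven expressions at a few values of x; any differences should be at the level of floating-point rounding.

```python
import math

def forms(x: float) -> list[float]:
    """Evaluate every intermediate expression from the derivation at one x."""
    e_neg = math.exp(-x)   # e^{-x}
    d = 1.0 + e_neg        # 1 + e^{-x}
    return [
        e_neg / d ** 2,                      # starting point
        (e_neg + 1 - 1) / d ** 2,            # add and subtract 1 in the numerator
        (1 + e_neg - 1) / d ** 2,            # reorder the numerator
        (1 + e_neg) / d ** 2 - 1 / d ** 2,   # split into two fractions
        1 / d - 1 / d ** 2,                  # cancel one factor of (1 + e^{-x})
        1 / d - (1 / d) * (1 / d),           # write 1/d^2 as a product
        (1 / d) * (1 - 1 / d),               # factor out 1/(1 + e^{-x})
    ]

for x in (-4.0, -1.0, 0.0, 1.5, 4.0):
    values = forms(x)
    spread = max(values) - min(values)
    print(f"x = {x:4.1f}   value = {values[0]:.10f}   spread across forms = {spread:.1e}")
```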

Hmmm… look at that! There are actually two sigmoid functions there… Recall that the sigmoid function is $s(x) = \frac{1}{1 + e^{-x}}$. Let's replace them with s(x):

$s'(x) = \frac{d}{dx}s(x) = s(x) (1 - s(x))$

Just like Prof Ng said… 🙂

And for a sanity check, do they both show the same function?

```matlab
x = -10:0.1:10;                      % Test values.
s = 1./(1+exp(-x));                  % Sigmoid.
ds = (exp(-x))./((1+exp(-x)).^2);    % Derivative of sigmoid.
ds1 = s.*(1-s);                      % Another, simpler way to compute the derivative of a sigmoid.
figure;
plot(x,ds,'r+'); hold on;
plot(x,ds1,'go');
legend('(e^{-x})/((1+e^{-x})^2)','(s(x))(1-s(x))','location','best');
title('derivative of sigmoid')
```

Yes! They match perfectly!
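The same sanity check can be sketched in Python without plotting, by measuring the largest gap between the two formulas on the grid. The helper names below are hypothetical. Note that `ds_from_output` needs only the sigmoid's output s, not x itself; that is one practical reason the s(x)(1 - s(x)) form is convenient in neural-network code, where s(x) has usually already been computed.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ds_exp_form(x: float) -> float:
    # e^{-x} / (1 + e^{-x})^2, the form derived first
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def ds_from_output(s: float) -> float:
    # s(x) * (1 - s(x)); takes the sigmoid's output, not x
    return s * (1.0 - s)

xs = [i / 10 for i in range(-100, 101)]   # same grid as x = -10:0.1:10
max_diff = max(abs(ds_exp_form(x) - ds_from_output(sigmoid(x))) for x in xs)
print(f"largest difference between the two formulas on the grid: {max_diff:.1e}")
```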

So there you go. Hopefully this satisfies your mathematical curiosity about why the derivative of a sigmoid s(x) is equal to s'(x) = s(x)(1-s(x)).


25 thoughts on “How to Compute the Derivative of a Sigmoid Function (fully worked example)”

1. Rakesh says:

Nicely explained ! Thanks !

2. Beejal Vibhakar says:

Really nice explanation. Thank you!

3. Sefrin says:

https://www.wolframalpha.com/input/?i=(1%2Be%5E-x)%5E-1

This shows the derivative having positive x in the exponents.

1. Jeremy says:

I think if you 1) rewrite my equation so the e^-x in the numerator goes to e^x in the denominator, 2) multiply my equation by e^x/e^x, and 3) expand the denominator in both the wolfram and my equation, they should be equal.

4. Sefrin says:

How come websites like Wolfram Alpha simplify the -x in the exponents to positive x? How do we get there? I don't see it… :-/

1. Jeremy says:

Hi Sefrin, could you include an example/link to explain what you mean?

1. Jeremy says:

Thanks Vinay! You created a nice visual summary of different activation functions.

5. Scott Favorite says:

Actually you do use the product rule but it is part of the chain rule. Hope this is clear.

1. Jeremy says:

Hi Scott, thanks for your comment! I agree this is confusing/misleading. I rewrote it to remove the reference to the product rule.

6. Sid says:

Excellent walkthrough. For a guy just getting into activation fn’s, this really helps! Thanks so much!

1. Jeremy says:

You’re welcome Sid!

7. jnscollier says:

EASILY, the best blog post on finding the derivative of a sigmoid function. You didn’t leave any details out. Took me forever to wrap my head around this. The +1 – 1 thing is definitely not intuitive. Thanks for writing this.

1. Jeremy says:

happy to hear it helped!

8. marie zelenina (@mariezelenina) says:

Thanks! really helped with Prof. Hinton’s NNML Coursera lecture I was struggling to understand.

1. Jeremy says:

Glad it helped! It wasn’t obvious to me either 🙂

9. Balamurugan Balakrishnan says:

Superb!

10. Bendegúz Csirmaz says:

Exactly what I was looking for!

1. Jeremy says:

🙂

11. Shubham Juneja says:

Very detailed. Thank you !!

1. Jeremy says:

You’re welcome!

12. asimplewoman says:

Thanks much. I was breaking my head on this today.

1. Jeremy says:

Glad it helped clear things up!

13. Kevin Wang says:

excellent. Thanks!