Maths for Pages on Mac

Equations for the Australian Curriculum Year 10 and Mathematical Methods

Ingrid Aulike

This webpage shows how to make maths equations in Pages, but more importantly, I have ironed out the useless extra notation that all the texts put into the statistics units of Maths Methods. In essence this boils down to not using any form of the letter p for anything except the parameter in the Bernoulli, and not using \(\mu\) and \(\sigma^2\) until discussing the normal distribution. As it stands, one thing is represented by more than one notation, and one notation represents several things. My philosophy is one bit of notation to represent one thing.

How to put LaTeX equations in Pages

Why use LaTeX?

This video gives you an idea of how simple it is to typeset equations that look really great, once you know a few basics of LaTeX. The LaTeX below will be your cheat sheet! If you have access to a Mac, iPad or iPhone, try it out.

(The video uses an example from the Queensland Curriculum and Assessment Authority website. The maths is from Unit 4 of Mathematical Methods topic General Continuous Distributions, see below.)

We start with some simple LaTeX.

A few equations to get started

In LaTeX, simple maths is just written as you would expect it to be. If you type \[\text{3 + 5 = 8}\]in the Pages Equation Inserter, you will get \[3 + 5 = 8.\]Typing \[\text{3x - 12 = 3(x - 4)}\]in the Equation Inserter gives the equation \[3x - 12 = 3(x - 4)\]where the font for the text x is different to the font for the maths \(x\).

In Year 10 Maths and in Mathematical Methods there is a lot of solving of equations. If you enter this text in the Pages Equation Inserter: \[\text{3x+b = \frac{2(4 - c)}{5} \ne 7},\]you will get the equation typeset like this: \[ 3x+b = \frac{2(4-c)}{5} \ne 7.\]Another way of getting the same equation is by entering \[\text{3x+b = {2(4 - c) \over 5} \ne 7}\]in the Pages Equation Inserter. I prefer the \(\text{\frac}\) syntax to \(\text{\over}\).

You will want to typeset things like cube roots. \[ \text{a_i \ge \sqrt[3]{g^2}}\]is the LaTeX for \[ a_i \ge \sqrt[3]{g^2}.\]The root sign goes over all of \(g^2\) because the LaTeX, \(\text{g^2}\), is contained in braces like so: \(\text{{g^2}}\), so that the \(\text{\sqrt}\) applies to everything in the braces. If you don’t put braces around the \(\text{g^2}\), this is what happens: \[ a_i \ge \sqrt[3] g^2.\]For a plain square root rather than a cube root, just leave out the \(\text{[3]}\) from the LaTeX.

Different types of brackets

Let’s put something in brackets. How about the midpoint formula: \[ \left( \frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2} \right) \]which comes from \[ \text{ \left( \frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2} \right)}.\]

Often the brackets you ask for, with \(\text{\left(}\) and \(\text{\right)}\), come out an appropriate size, like the ones around the midpoint formula have done, but sometimes you will want to tweak the size of brackets. There are several fixed sizes that can also be used with square and curly brackets: \[ \left( \right) \big[ \big] \Big\{ \Big\} \bigg[ \bigg] \text{ and } \Bigg( \Bigg) \]are produced by \[\text{\left( \right) \big[ \big] \Big\{ \Big\} \bigg[ \bigg] and \Bigg( \Bigg)}.\]Because there is nothing between them, the ( and ) created with \(\text{\left(}\) and \(\text{\right)}\) are smaller than the ones LaTeX has produced in the midpoint formula. Notice too that (, ), [ and ] just work, but { and } need a backslash before them.

Aligning equations

You can align as many equations as you like, usually in the form of having equals signs under each other, as long as you put a double backslash at the end of each line except the last. You may wish to solve simultaneous equations for \(x\) and \(y\). The pair of equations \[\begin{align} x+y &= 4 \\ x-y &= -2 \end{align}\]is typeset with \[\text{\begin{align} x+y &= 4 \\ x-y &= -2 \end{align}}.\]The ampersand, \(\&\), marks the place that is aligned.


You will sometimes want to add or take away horizontal space, if something just doesn’t look right. Here are some numbers with different spaces between them: \[1 \! 2 3 \, 4 \: 5 \; 6 \quad 7 \qquad 8.\]These were produced with \[\text{1 \! 2 3 \, 4 \: 5 \; 6 \quad 7 \qquad 8.}\]The numbers 1 and 2 are closer together than normal because between them in the LaTeX is a “negative thin space” \(\text{\!}\). There is no spacing command between 2 and 3, and ordinary spaces are ignored in maths, so they sit together as 23. Numbers 3 and 4 are separated by a thin space \(\text{\,}\). Next we have a medium space \(\text{\:}\) and a thick space \(\text{\;}\). The second-to-last space, between 6 and 7, is called a \(\text{\quad}\), and finally the largest space, between 7 and 8, is a \(\text{\qquad}\).

Now that we have covered some of the basics of typing equations, here are equations from across the Australian Mathematical Methods curriculum. Not every formula in the curriculum has been included, but once you know what equation you want, the expressions below should have something similar in them that you can adapt.


\[ \angle AOB = 45^\circ \] \[ \text{\angle AOB = 45^\circ} \]Angle AOB is 45 degrees.

\[ \lambda \alpha \beta \delta \Delta \epsilon \] \[\text{ \lambda \alpha \beta \delta \Delta \epsilon }\]These are some of the LaTeX Greek symbols that are available. In some of the equations below, a variation of the epsilon is used. The variation is typeset \(\varepsilon\) and entered as \(\text{\varepsilon}\).

\[\tan \theta = \frac{\text{opposite}}{\text {adjacent}} \] \[\text{\tan \theta = \frac{\text{opposite}}{\text {adjacent}}} \]

\[ x = \frac{-b \pm \sqrt{ b^2-4ac}}{2a}\] \[\text{x = \frac{-b \pm \sqrt{ b^2 - 4ac }}{ 2a }}\]This is the formula to find the roots of the quadratic \(ax^2 + bx + c = 0\) where \(a \ne 0.\)
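It can help to check a typeset formula against what it computes. Here is a minimal Python sketch of the quadratic formula above; the function name quadratic_roots is hypothetical, and it returns only real roots.

```python
import math


def quadratic_roots(a: float, b: float, c: float) -> tuple[float, ...]:
    """Real roots of a*x**2 + b*x + c = 0 using x = (-b ± sqrt(b^2 - 4ac)) / (2a)."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return ()  # no real roots
    root = math.sqrt(discriminant)
    # The ± gives the two roots; they coincide when the discriminant is zero.
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


print(quadratic_roots(1, -5, 6))  # x^2 - 5x + 6 = (x - 2)(x - 3), prints (2.0, 3.0)
```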

\[\pi \approx 3.14\] \[\text{\pi \approx 3.14}\]

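\[y \propto x^{n - 1} + x^n - 1\] \[\text{y \propto x^{n - 1} + x^n - 1}\]It is easy to type \(\text{x^n - 1}\), giving \(x^n - 1\), when you intend \(x^{n-1}\): without braces, only the single character straight after \(\text{^}\) becomes the exponent.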

\[(x,y) \rightarrow (x + h, y + k)\] \[\text{(x,y) \rightarrow (x + h, y + k)}\]The point \((x, y)\) is mapped to the point \((x + h, y + k)\).

\[ x^\prime = x + h \text{ and } y^\prime = y + k\] \[\text{x^\prime = x + h \text{ and } y^\prime = y + k}\]The new coordinate \(x^\prime = x + h\) is read as “\(x\) prime” or “\(x\) dash”, and similarly \(y^\prime = y + k\) is “\(y\) prime” or “\(y\) dash”.

\[\deg(f+g) \le \max\{\deg(f), \deg(g)\}\] \[\text{\deg(f+g) \le \max \{\deg(f), \deg(g) \}}\]If \(f\) and \(g\) are polynomials, then the degree of the sum of the two polynomials is less than or equal to the maximum of the degrees of the individual polynomials. Notice \(\text{\max}\) and \(\text{\deg}\) are LaTeX functions. So many of them!

\[2 \pi^c = 360^\circ\] \[\text{2 \pi^c = 360^\circ}\]Two \(\pi\) radians is equal to 360 degrees. The symbol for radians, \({}^c\), is rarely used. Usually we just write, and think of, \(\frac{\pi}{3} = 60^\circ\). (But you should know that the units are called radians.)

\[\sin\left(\frac{\pi}{2}\right) = \tan\left(\frac{\pi}{4}\right)\] \[\text{\sin \left( \frac{\pi}{2} \right) = \tan \left( \frac{\pi}{4} \right)}\]LaTeX has a large number of functions like sine and tangent. There is no need to build a function like the logarithm or the exponential out of letters: \(\text{\log}\) and \(\text{\exp}\) already exist and give the best spacing.

\[y=a \cos n(t \pm \varepsilon) \pm b\] \[\text{y=a \cos n(t \pm \varepsilon) \pm b}\]

\[A = \{1,\, 3,\, 5,\, 25 \}\] \[\text{A = \{1,\, 3,\, 5,\, 25 \}}\]Sets are denoted with braces around them, which have to be typed with a backslash before each brace. This is because braces have a special role in LaTeX. For example, the \(\text{\frac}\) command has its two arguments in braces following it.

\[\varepsilon \setminus A = A^\prime \] \[\text{\varepsilon \setminus A = A^\prime} \]The universal set with the elements of \(A\) removed is called the complement of \(A\).

\[\Pr(A^\prime) = 1 - \Pr(A)\] \[\text{\Pr(A^\prime) = 1 - \Pr(A)}\]The probability of the complement of \(A\) equals 1 minus the probability of \(A\). Note that \(\text{\Pr}\) is a LaTeX function.

\[7 \in B\] \[\text{7 \in B}\] The number 7 is an element of the set \(B\).

\[B \not \subseteq A\] \[\text{B \not \subseteq A}\]\(B\) is not a subset of \(A\).

\[B \cap A = \varnothing\] \[\text{B \cap A = \varnothing}\]The intersection of \(B\) and \(A\) is the null set, which is also called the empty set.

\[B \cup A = \{ 1,\, 3,\, 5,\,7, \, 25 \}\] \[\text{B \cup A = \{ 1,\, 3,\, 5,\,7, \, 25 \}}\]The union of the sets \(A\) and \(B\) is the set including every element in \(A\) and every element in \(B\). If there is an element in \(A\) that is also in \(B\), we only include it once in the union.
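The set statements above map directly onto Python’s built-in set operations. In this sketch \(A\) comes from the text, while \(B = \{7\}\) is an assumption, deduced from \(B \cap A = \varnothing\) and \(B \cup A = \{1, 3, 5, 7, 25\}\).

```python
# A is from the text; B = {7} is an assumption deduced from
# B ∩ A = ∅ and B ∪ A = {1, 3, 5, 7, 25}.
A = {1, 3, 5, 25}
B = {7}

print(7 in B)         # 7 ∈ B: True
print(B <= A)         # B ⊆ A? False, so B ⊈ A
print(B & A)          # B ∩ A: set(), the empty set
print(sorted(B | A))  # B ∪ A: [1, 3, 5, 7, 25], each element listed once
```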

\[\mathbb{N} \subseteq \mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R}\] \[\text{\mathbb{N} \subseteq \mathbb{Z} \subseteq}\] \[\text{\mathbb{Q} \subseteq \mathbb{R}}\]The natural numbers are a subset of the integers, which are a subset of the rational numbers, which are a subset of the real numbers.

\[K = \{ \clubsuit\text{K}, \spadesuit\text{K}, {\color{red}{\diamondsuit}} \text{K},{\color{red}{\heartsuit}} \text{K} \}\] \[\text{K = \{ \clubsuit\text{K}, \spadesuit\text{K},} \] \[\text{{\color{red}{\diamondsuit}} \text{K},}\] \[\text{{\color{red}{\heartsuit}} \text{K} \}}\]The diamond and heart are filled with red in Pages but only outlined in other implementations of LaTeX.

\[n! = n \times (n-1) \times(n-2)\times \ldots \times 2\times 1 \] \[\text{n! = n \times (n-1) \times(n-2)\times \ldots \times 2\times 1 }\]

\[^nC_r = \frac{^nP_r}{r!} = \frac{n!}{r!(n-r)!}\] \[\text{ ^nC_r = \frac{^nP_r}{r!} = \frac{n!}{r!(n-r)!}}\]

\[{n \choose r} \text{ or } \binom{n}{r}\] \[\text{{n \choose r} or \binom{n}{r}}\]These are alternative notations for \({}^nC_r\).
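To check the identity \({}^nC_r = \frac{{}^nP_r}{r!} = \frac{n!}{r!(n-r)!}\) on a small case, here is a short Python sketch using the standard library’s math.perm, math.factorial and math.comb; the values \(n = 5\), \(r = 2\) are just an example.

```python
import math

n, r = 5, 2
nPr = math.perm(n, r)                     # n! / (n - r)!
nCr_from_perm = nPr // math.factorial(r)  # nCr = nPr / r!
nCr_direct = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(nPr, nCr_from_perm, nCr_direct, math.comb(n, r))  # 20 10 10 10
```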


There are no formulas or equations in this topic that use anything new from LaTeX.


\[f^\prime (x) = \lim_{h \rightarrow 0} \frac{f(x+h)-f(x)}{h}\] \[\text{f^\prime (x) = \lim_{h \rightarrow 0} \frac{f(x+h)-f(x)}{h}}\]

\[\frac{dy}{dx}=\lim_{\delta x \rightarrow 0}\frac{\delta y}{\delta x}\] \[\text{\frac{dy}{dx}=\lim_{\delta x \rightarrow 0}\frac{\delta y}{\delta x}}\]
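The limit definitions above can be explored numerically: as \(h\) shrinks, the difference quotient \(\frac{f(x+h)-f(x)}{h}\) settles towards \(f^\prime(x)\). This Python sketch assumes the example function \(f(x) = x^3\), for which \(f^\prime(2) = 12\).

```python
def f(x: float) -> float:
    """Example function f(x) = x^3, whose derivative is f'(x) = 3x^2."""
    return x ** 3


x = 2.0
for h in (0.1, 0.01, 0.001, 0.0001):
    difference_quotient = (f(x + h) - f(x)) / h
    print(h, difference_quotient)  # tends towards f'(2) = 12 as h gets smaller
```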

\[\frac{dy}{dx} = \frac{d}{dx}(x^n) = nx^{n-1}\] \[\text{\frac{dy}{dx} = \frac{d}{dx}(x^n) = nx^{n-1}}\]


\[f^{\prime \prime }(x) = n(n-1)x^{n-2}\] \[\text{f^{\prime \prime }(x) = n(n-1)x^{n-2}}\]

\[\frac{d^2y}{dx^2} = x e^x + 2e^x\] \[\text{\frac{d^2y}{dx^2} = x e^x + 2e^x}\]

\[\delta y \approx \frac{dy}{dx} \times \delta x\] \[\text{\delta y \approx \frac{dy}{dx} \times \delta x}\]
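To see how good the small-change approximation is, here is a Python sketch comparing the actual change in \(y\) with \(\frac{dy}{dx} \times \delta x\). The function \(y = x^2\) and the values \(x = 3\), \(\delta x = 0.01\) are assumed for illustration.

```python
def y(x: float) -> float:
    """Example function y = x^2, so dy/dx = 2x."""
    return x ** 2


x, delta_x = 3.0, 0.01
exact_delta_y = y(x + delta_x) - y(x)  # actual change in y, 0.0601 up to rounding
approx_delta_y = 2 * x * delta_x       # dy/dx × δx = 6 × 0.01 = 0.06
print(exact_delta_y, approx_delta_y)
```

The difference, 0.0001, is \((\delta x)^2\), which is why the approximation improves rapidly as \(\delta x\) gets smaller.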

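\[\int \!\!f(x)\,dx = F(x) + c\] \[\text{\int \!\! f(x)\,dx = F(x) + c}\]The two negative thin spaces \(\text{\!\!}\) pull \(f(x)\) closer to the integral sign, and the thin space \(\text{\,}\) separates \(dx\) from \(f(x)\).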

General discrete random variables

\[\text{E}(X) = \sum_x x\cdot \Pr(X=x)\] \[\text{\text{E}(X) = \sum_x x\cdot \Pr(X=x)}\]

\[\begin{align} \text{Var}(X) &= \text{E}\left([ X - \text{E}(X)] ^2\right) \\ &= \text{E}(X^2) - [\text{E}(X)]^2 \end{align}\]where \[\begin{align} \text{E}(X^2) &= \sum_x x^2 \cdot \Pr(X=x) \end{align}\]The only new LaTeX here is \(\text{Var}\), which is entered as \(\text{\text{Var}}\). Note that \(\text{\Pr}\) is a built-in LaTeX function that sets \(\Pr\) nicely, but there are no built-in functions for \(\text{E}\) and \(\text{Var}\), so they are entered with \(\text{\text{E}}\) and \(\text{\text{Var}}\).

There should be no \(P\), or \(\text{P}\), or \(p\) or \(\text{p}\) or \(p_i\) (or \(\mu\) or \(\sigma^2\)) used in this General discrete random variables subtopic, even though all the texts that I have seen use some of this notation. Just use the probability distribution function \(\Pr(X=x)\), the mean (also called the expected value) \(\text{E}(X)\), and the variance \(\text{Var}(X)\). Any other extra notation creates confusion.
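The formulas for \(\text{E}(X)\), \(\text{E}(X^2)\) and \(\text{Var}(X)\) can be checked with exact fractions. This Python sketch assumes a fair six-sided die as the example distribution; the helper name expected is hypothetical. It computes the variance both ways shown above, and the two agree.

```python
from fractions import Fraction

# Probability distribution function Pr(X = x) for a fair six-sided die (example).
pr = {x: Fraction(1, 6) for x in range(1, 7)}


def expected(g, dist):
    """E[g(X)] = sum over x of g(x) * Pr(X = x)."""
    return sum(g(x) * q for x, q in dist.items())


e_x = expected(lambda x: x, pr)                          # E(X)
e_x2 = expected(lambda x: x ** 2, pr)                    # E(X^2)
var_x = e_x2 - e_x ** 2                                  # E(X^2) - [E(X)]^2
var_x_def = expected(lambda x: (x - e_x) ** 2, pr)       # E([X - E(X)]^2)

print(e_x, e_x2, var_x, var_x_def)  # 7/2 91/6 35/12 35/12
```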

Bernoulli distribution

The Bernoulli is an example of a discrete distribution. It is rarely used in real life as it is so simple, but it is the basis for the binomial distribution. If we have a random variable \(Y\) that has two possible outcomes, “success” and “failure”, then our binary variable \(Y\) can be thought of as having probability distribution given by:\[\Pr(Y=y) = \begin{cases} p & \text{if } y=1, \text{ "success"} \\1-p & \text{if } y=0, \text{ "failure"}\end{cases}\]The payoff for not using \(p\) (or \(\text{P}\) and so on) for general discrete distributions, as suggested above, is immediate: the Bernoulli often has its parameter called \(p\) in university and Maths Methods textbooks (although sometimes \(\pi\) is used).

For the Bernoulli distribution,\[\text{E}(Y) = \sum_y y \cdot \Pr(Y=y) = 0\cdot(1-p) + 1\cdot p = p\] and\[\text{E}(Y^2) = \sum_y y^2 \cdot \Pr(Y=y) = 0^2\cdot(1-p) + 1^2\cdot p = p\]These expressions can be used to find \[\text{Var}(Y) = \text{E}(Y^2) -[\text{E}(Y)]^2 = p-p^2 = p(1-p)\]
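As a quick exact check of these Bernoulli results, here is a Python sketch with an assumed example value \(p = \frac{3}{10}\); any value between 0 and 1 would do.

```python
from fractions import Fraction

p = Fraction(3, 10)                 # example parameter value
pr = {1: p, 0: 1 - p}               # Pr(Y = 1) = p ("success"), Pr(Y = 0) = 1 - p
e_y = sum(y * q for y, q in pr.items())        # 1*p + 0*(1 - p)
e_y2 = sum(y ** 2 * q for y, q in pr.items())  # 1^2*p + 0^2*(1 - p)
var_y = e_y2 - e_y ** 2                        # p - p^2

print(e_y, e_y2, var_y, p * (1 - p))  # 3/10 3/10 21/100 21/100
```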

Binomial distribution

The binomial is an example of a discrete distribution. If we had \(n\) independent Bernoulli trials like the one above, each with the same underlying parameter \(p\) (for example, tossing a fair coin \(n\) times with success being getting heads, so that \(p=\frac{1}{2}\)), then the total number of successes in the \(n\) trials is a random variable that has a binomial distribution. The binomial random variable \(X\) with number of trials \(n\) and parameter \(p\) has the binomial probability distribution given by: \[\Pr(X=x) = {}^nC_x\, p^x(1-p)^{n-x}, \quad x = 0, 1, \ldots, n\]

For the binomial, some textbooks don’t use the formula for \(\text{E}[g(X)]\) (so far we have used \(g(X)=X\) and \(g(X)=X^2\)) to find the expected value and variance for general \(n\) and \(p\). One book just gives the results for the binomial \[\text{E}(X) = np\] and \[\text{Var}(X) = np(1-p)\]while another book applies the formulas \[\text{E}(X) = \text{E}(Y_1) + \text{E}(Y_2) + \ldots + \text{E}(Y_n)\]and \[\text{Var}(X) = \text{Var}(Y_1) + \text{Var}(Y_2) + \ldots + \text{Var}(Y_n)\]to get the results, where \(Y_1, Y_2, \ldots, Y_n\) are the underlying independent Bernoulli random variables, each with parameter \(p\).

It is possible, however, to find \(\text{E}(X)\) and \(\text{Var}(X)\) for the binomial with general \(n\) and \(p\) using \(\text{E}[g(X)] = \sum_x g(x) \cdot \Pr(X=x)\) with a thoughtful choice of \(g(X)\) for working out the variance. Two of the four textbooks show how to do this.
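Here is a Python sketch of the direct approach, summing \(g(x) \cdot \Pr(X=x)\) over the binomial distribution with exact fractions. The choice \(g(X) = X(X-1)\) for the variance is an assumption (a common one, whose expectation works out to \(n(n-1)p^2\)); the textbooks may use a different choice. The example values \(n = 10\), \(p = \frac{1}{4}\) and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pr(n: int, p: Fraction) -> dict[int, Fraction]:
    """Pr(X = x) = nCx p^x (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}


def expected(g, dist):
    """E[g(X)] = sum over x of g(x) * Pr(X = x)."""
    return sum(g(x) * q for x, q in dist.items())


n, p = 10, Fraction(1, 4)   # example values
pr = binomial_pr(n, p)

e_x = expected(lambda x: x, pr)
# Assumed choice of g: g(X) = X(X - 1), whose expectation is n(n - 1)p^2.
e_fact = expected(lambda x: x * (x - 1), pr)
var_x = e_fact + e_x - e_x ** 2   # because E(X^2) = E[X(X - 1)] + E(X)

print(sum(pr.values()), e_x, n * p, var_x, n * p * (1 - p))  # 1 5/2 5/2 15/8 15/8
```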


\[a^{\log_a\! x} = x\] \[\text{a^{\log_a\! x} = x}\]\(\log_a \!x\) is the power to which \(a\) must be raised to give \(x\).

General continuous random variables

See my video above for a Queensland Curriculum and Assessment Authority example of a general continuous random variable question, the stem of which is reproduced here: \[f(x) = \begin{cases} \frac{\textstyle 1}{\textstyle 1152}\,(144-x^2) & 0 \leq x \leq 12 \\ \phantom{\frac{\textstyle 1}{\textstyle 1152}\,(1} 0 & \text{otherwise} \end{cases}\]I have used the cases environment. As you can see in the video, the usual way to set out cases, and align (shown in the Aligning equations subsection above), is to write the LaTeX over several lines. Although it is not absolutely necessary, setting it out over several lines makes the equation easier to read when you need to edit it or find a mistake. In the last frames of my video you can see the LaTeX for this example.
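As a sketch of that multi-line setting out, here is one way the LaTeX for this density could be entered. It is reconstructed from the displayed equation, so the version in the video may differ in small details. The ampersand separates each expression from its condition, the double backslash ends a row, and \(\text{\phantom}\) inserts invisible space the width of its contents, which nudges the 0 across (assuming Pages accepts \(\text{\phantom}\), as the displayed version suggests).

```latex
f(x) = \begin{cases}
  \frac{\textstyle 1}{\textstyle 1152}\,(144-x^2) & 0 \leq x \leq 12 \\
  \phantom{\frac{\textstyle 1}{\textstyle 1152}\,(1} 0 & \text{otherwise}
\end{cases}
```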

For general continuous distributions the expected value is given by \[\text{E}(X) = \int_{-\infty}^\infty x f(x)\,dx\] \[\text{\text{E}(X) = \int_{-\infty}^\infty x f(x)\,dx}\]For a function of \(X\), \(g(X)\), the expected value is \[\text{E}[g(X)] = \int_{-\infty}^\infty g(x)f(x)\,dx\]

The formulas for the variance which applied to discrete random variables, also hold for continuous random variables:\[\begin{align} \text{Var}(X) &= \text{E}\left([ X - \text{E}(X)] ^2\right) \\ &= \text{E}(X^2) - [\text{E}(X)]^2 \end{align}\]
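For the QCAA example density these integrals can be approximated numerically. The Python sketch below uses composite Simpson’s rule, which is my choice of method rather than anything from the source; the helper names are hypothetical. Working by hand gives \(\text{E}(X) = 4.5\), \(\text{E}(X^2) = 28.8\) and \(\text{Var}(X) = 8.55\), which the numbers should match.

```python
def f(x: float) -> float:
    """QCAA example density: (144 - x^2)/1152 on [0, 12], 0 otherwise."""
    return (144 - x * x) / 1152 if 0 <= x <= 12 else 0.0


def simpson(g, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule for the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# f is zero outside [0, 12], so the integrals over (-inf, inf) reduce to [0, 12].
area = simpson(f, 0, 12)                        # total probability, should be 1
e_x = simpson(lambda x: x * f(x), 0, 12)        # E(X)
e_x2 = simpson(lambda x: x * x * f(x), 0, 12)   # E(X^2)
var_x = e_x2 - e_x ** 2                         # Var(X) = E(X^2) - [E(X)]^2
print(area, e_x, e_x2, var_x)  # approximately 1, 4.5, 28.8, 8.55
```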

I strongly recommend you continue not to use \(\mu\) or \(\sigma^2\) in formulas for general continuous random variables. Again, the unambiguous notation is \(\text{E}(X)\), the expected value or mean of \(X\), and \(\text{Var}(X)\), the variance of \(X\). For continuous random variables we now use the probability density function \(f(x)\), rather than the probability distribution function \(\Pr(X=x)\) that we used for discrete random variables.

The difference in the formulas for discrete and continuous random variables

For discrete random variables, we may be asked to find the probability that a random variable \(Y\) is equal to 7 or 8, say. We would consult the table of values and probabilities we’ve been given, or use our binomial formula (where we assume \(n\) is 8 or larger), or something similar. We would do this by summing over part of the probability distribution, i.e. finding \(\Pr(Y = 7) + \Pr(Y = 8)\).

For a continuous random variable, we might be asked to find the probability that the random variable \(X\) is between 7 and 8. We would do this by integrating the probability density function from 7 to 8, i.e. finding \(\Pr(7 \leq X \leq 8) =\int^{\scriptscriptstyle 8}_{\scriptscriptstyle 7} f(x)\,dx\).

Both discrete and continuous random variables have cumulative distribution functions, but Mathematical Methods is only concerned with the cumulative distribution functions of continuous variables. For continuous random variables, the cumulative distribution function \(F(x)\) is given by \[F(x)=\Pr(X \leq x) = \int^x_{-\infty} f(t)\,dt\]
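For the QCAA example density, integrating \(f(t)\) from 0 to \(x\) gives \(F(x) = \frac{1}{1152}\left(144x - \frac{x^3}{3}\right)\) for \(0 \leq x \leq 12\) (my own working, worth checking by hand). This Python sketch uses exact fractions to evaluate \(F\) and then \(\Pr(7 \leq X \leq 8) = F(8) - F(7)\), linking the probability integral and the cumulative distribution function.

```python
from fractions import Fraction


def F(x) -> Fraction:
    """CDF of the QCAA example density f(x) = (144 - x^2)/1152 on [0, 12].

    For 0 <= x <= 12, F(x) = (144x - x^3/3)/1152, from integrating f(t) from 0 to x.
    """
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x > 12:
        return Fraction(1)
    return (144 * x - x ** 3 / 3) / 1152


prob = F(8) - F(7)   # Pr(7 <= X <= 8) = integral of f from 7 to 8
print(F(12), prob, round(float(prob), 4))  # 1 263/3456 0.0761
```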

The normal distribution

The normal distribution is a continuous distribution. Here is the LaTeX for the normal probability density function:\[f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\] \[\text{f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}}\]

The usual method to find the mean and variance, by integrating \(x f(x)\) and \(x^2f(x)\) with respect to \(x\), is not straightforward for the normal distribution, because \(f(x)\) itself has no antiderivative that can be written using the functions met in Mathematical Methods. Instead, the probability density function is symmetric about \(x=\mu\), which is the trick that is used to find \[\text{E}(X)=\frac{1}{\sigma \sqrt{2\pi}}\int^\infty_{-\infty} x e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx =\mu\]The integral in \(\text{E}(X^2)\) can be found exactly with techniques beyond Mathematical Methods; at this level it is calculated numerically. After calculating \(\text{E}(X^2)\) it is easy to show that \(\text{Var}(X) = \text{E}(X^2) -[\text{E}(X)]^2 = \sigma^2\).
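Here is a Python sketch of that numerical calculation, using composite Simpson’s rule (my choice of method) and assumed example parameters \(\mu = 10\), \(\sigma = 2\). Because the density is negligible more than 10 standard deviations from \(\mu\), the infinite range is cut off there; the results should come out very close to \(\mu\) and \(\sigma^2\).

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1/(sigma sqrt(2 pi)) * exp(-(1/2)((x - mu)/sigma)^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def simpson(g, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule for the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


mu, sigma = 10.0, 2.0   # example parameter values
a, b = mu - 10 * sigma, mu + 10 * sigma  # finite range standing in for (-inf, inf)


def f(x: float) -> float:
    return normal_pdf(x, mu, sigma)


e_x = simpson(lambda x: x * f(x), a, b)       # E(X)
e_x2 = simpson(lambda x: x * x * f(x), a, b)  # E(X^2), found numerically
var_x = e_x2 - e_x ** 2                       # Var(X) = E(X^2) - [E(X)]^2
print(e_x, var_x)  # very close to mu = 10 and sigma^2 = 4
```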

At this stage, be very happy if you didn’t use \(\mu\) and \(\sigma\) notation prior to discussing the normal distribution. It is such a shame that calculators and textbooks, by using the notation \(\mu\) and \(\sigma\) outside of the normal distribution, are sowing confusion.

If \(X\) is a normal random variable with mean \(\mu\) and variance \(\sigma^2\), then the random variable \(Z=\frac{\textstyle X-\mu}{\textstyle \sigma}\) has the “standard” normal distribution, with mean 0 and variance 1. In other words, the transformed random variable \(Z\) is normally distributed and has expectation \(\text{E}(Z)=0\) and variance \(\text{Var}(Z)=1\).

How does what we have learned lead to the last topic in the Mathematical Methods curriculum, interval estimates for proportions? I will explain this in more detail later (coming soon), but here is an outline of the interval estimate for a proportion.

Consider a binomial random variable \(X\) with mean \(\text{E}(X) =np\) and variance \(\text{Var}(X) = np(1-p)\). Because \(\text{E}(aX) = a\,\text{E}(X)\) and \(\text{Var}(aX) = a^2\,\text{Var}(X)\) for any constant \(a\), taking \(a = \frac{1}{n}\) shows that the transformed random variable \(\frac{\textstyle X}{\textstyle n}\) has mean \[\text{E}\!\left(\frac{X}{n}\right) = p\] and variance \[\text{Var}\!\left(\frac{X}{n}\right) = \frac{p(1-p)}{n}.\]

The further transformed random variable \[\frac{\frac{\textstyle X}{\textstyle n} - p}{\sqrt{\frac{\textstyle p(1-p)}{\textstyle n}}}\]has approximately a standard normal distribution when \(n\) is large.

A note on LaTeX: with so many \(p\)’s and other letters floating around, they should all be typeset at the same size within any one expression to prevent confusion. This means using \(\text{\textstyle}\) to enlarge the \(p\)s, \(n\)s and \(X\) in fractions when necessary.

Because \(\frac{\frac{\textstyle X}{\textstyle n}\, - \,\textstyle p}{\sqrt{\frac{\textstyle p(1-p)}{\textstyle n}}}\) has approximately a standard normal distribution, and about 95% of a standard normal distribution lies between \(-1.96\) and \(1.96\), an approximate 95% confidence interval for \(p\) is based on the probability statement

\[\Pr\left(-1.96 \leq \frac{\frac{\textstyle X}{\textstyle n} - p}{\sqrt{\frac{\textstyle p(1-p)}{\textstyle n}}} \leq 1.96\right) \approx 95\%.\]

\[\text{\Pr\left(-1.96 \leq}\] \[\text{\frac{\frac{\textstyle X}{\textstyle n} - p}{\sqrt{\frac{\textstyle p(1-p)}{\textstyle n}}} }\] \[\text{\leq 1.96\right) \approx 95\%}.\]

The approximation (\(\approx\)) is because the random variable \(\frac{\textstyle X}{\textstyle n}\) has only an approximate normal distribution.

Rearranging the inequalities within the probability statement (multiplying through by the square root, subtracting \(\frac{\textstyle X}{\textstyle n}\), then multiplying by \(-1\), which reverses the inequalities), we get \[\Pr\left(\frac{\textstyle X}{\textstyle n}-1.96 \sqrt{\frac{\textstyle p(1-p)}{\textstyle n}} \leq \; p \;\leq \frac{\textstyle X}{\textstyle n}+1.96 \sqrt{\frac{\textstyle p(1-p)}{\textstyle n}}\right) \approx 95\%.\]This looks like it has the makings of an interval estimate for the unknown parameter \(p\), but there are two problems.

First, \(\frac{\textstyle X}{\textstyle n}\) is a random variable. To get an interval, we can use a realisation of the random variable: \(\frac{\textstyle x}{\textstyle n}\). Note that \(\frac{\textstyle x}{\textstyle n}\) is a point estimate for \(p\), nearly always referred to as \(\hat{p}\).

The other problem is that the variance of \(\frac{\textstyle X}{\textstyle n}\), \(\frac{\textstyle p(1-p)}{\textstyle n}\), includes the very parameter we are trying to find an interval estimate for! We replace \(p\) with \(\hat{p}\) in the expression for the variance as we expect the point estimate \(\hat{p}\) to be close to \(p\).

We can say \[\left(\hat{p}-1.96 \sqrt{\frac{\textstyle \hat{p}(1-\hat{p})}{\textstyle n}} , \,\hat{p} + 1.96 \sqrt{\frac{\textstyle \hat{p}(1-\hat{p})}{\textstyle n}}\right)\]is an interval estimate for the proportion \(p\). The interval is an approximate 95% confidence interval for \(p\), meaning that if we repeated the experiment a large number of times, and formed an interval like this each time, approximately 95% of those intervals would contain the true value of \(p\).
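The interval can be computed directly from a count of successes. In this Python sketch the function name wald_interval and the example data (40 successes in 100 trials) are made up for illustration; this interval is often called the Wald interval. The second part simulates the repeated-experiment interpretation described above, with an assumed true value \(p = 0.4\).

```python
import math
import random


def wald_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval p_hat ± z*sqrt(p_hat(1 - p_hat)/n), with p_hat = x/n."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)


# Example: 40 successes in 100 trials (made-up numbers).
low, high = wald_interval(40, 100)
print(round(low, 4), round(high, 4))  # 0.304 0.496

# Repeat the experiment many times with a known p and count how often
# the interval contains p; the proportion should be roughly 0.95.
rng = random.Random(1)      # fixed seed so the run is reproducible
p_true, n, repeats = 0.4, 100, 10_000
covered = 0
for _ in range(repeats):
    x = sum(rng.random() < p_true for _ in range(n))  # one binomial realisation
    lo, hi = wald_interval(x, n)
    covered += lo <= p_true <= hi
print(covered / repeats)
```

The simulated coverage should be close to, but not exactly, 0.95, since the interval is only approximate. For small \(n\), or \(p\) near 0 or 1, the coverage of this interval can fall noticeably below 95%.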


It is unnecessary and confusing to use notation like \(\hat{P}\) to represent the random variable \(\frac{\textstyle X}{\textstyle n}\).

Some textbooks use \(\hat{p}\) for the random variable as well as the estimate. In higher level statistics courses at university, \(\hat{\theta}\) is used for both the random variable and the estimate, and the statistics student is supposed to be able to discern from the context which is intended. It is a big ask to have beginner high school students discern the context when it is unnecessary. I suggest sticking with \(\frac{\textstyle X}{\textstyle n}\) for the random variable and keeping \(\hat{p}\) for the point estimate of \(p\).

In summary:

Using this notation we avoid such things as \(P(\hat{p}=\hat{p})\) and \(S(P)\). Both of these bits of nonsense appear in commonly used texts. I have no idea where that \(S\) comes from.