Special Relativity assumes time is a dimension, i.e. space-time is Minkowski space. There are thus four coordinates in this space, $x^i$, with the index $i$ taking the values 0, 1, 2, 3. Since time has different units than length, to describe space and time as parts of one space-time we have to multiply time by a constant of dimension length/time, i.e. a velocity. This constant is usually denoted $c$, so that $x^0 = ct$. We will come back to the meaning of this constant later.
The other ingredient of Special Relativity is that the laws of physics are the same for all observers moving with constant velocity relative to each other. That means there are sensible and well-defined transformations between these observers that preserve the form of the equations.
A Word or Two about Tensors
The way to achieve such sensible transformations is to make the equations "tensor equations", since a tensor does exactly what we want: it transforms in a well-defined way under a change from one observer's coordinate system to another's. The simplest sort of tensor is a scalar $\phi$, which doesn't transform at all: it's the same in all coordinate systems. That doesn't mean it has the same value at each point though, so it is actually a scalar field.
The next simplest tensor is a vector $V^i$, which has one index running from 0 to 3, corresponding to four entries: three for the spatial components and one for the time component. Again this can be a position-dependent quantity, so it's actually a vector field. The next tensor has two indices, $T^{ij}$, each running from 0 to 3, so it has 16 entries, and so on: $U^{ijklmn\ldots}$. The number of indices is also called the "rank" of a tensor. To transform a tensor from one coordinate system into another, one acts on it with the transformation matrix, once for every index. We will come to this transformation later.
Note that it is meaningless to say an object defined in only one inertial frame is a tensor. If you have it in only one frame, you can always make it into a tensor by just defining it in every other frame to be the appropriately transformed version.
The Scalar Product
A particularly important scalar for Special Relativity is the scalar product between two vectors. The scalar product is a symmetric bilinear form, which basically means it's given by a rank two tensor $g_{ij}$ that doesn't care in which order the indices come, $g_{ij} = g_{ji}$, and if you shovel in two vectors, out comes a scalar. It goes like this:
$$g_{ij} V^i U^j = \text{scalar},$$
where sums are taken over indices that appear twice, once up and once down. This is also known as Einstein's summation convention.
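To make the hidden sums visible, here is a minimal Python sketch that writes out $g_{ij} V^i U^j$ as two explicit loops. Using the Minkowski metric diag(1, -1, -1, -1) as the example for $g$, the function name, and the sample vectors are assumptions made for illustration.

```python
# Minkowski metric diag(1, -1, -1, -1), used here as an example for g_ij
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]


def scalar_product(g, V, U):
    """Einstein summation written out: sum over i and j of g_ij V^i U^j."""
    return sum(g[i][j] * V[i] * U[j] for i in range(4) for j in range(4))


V = [2.0, 1.0, 0.0, 0.0]
U = [3.0, 1.0, 1.0, 0.0]
print(scalar_product(ETA, V, U))  # 2*3 - 1*1 - 0*1 - 0*0 = 5.0
```

Because $g$ is symmetric, swapping the two vectors gives the same number.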
I used to have a photo of Einstein standing in front of a blackboard cluttered with sum symbols. Unfortunately I can't find it online; a reference would be highly welcome. That photo made it really clear why the convention was introduced. Today the summation convention is so common that it often isn't even mentioned. In fact, if you don't want your readers to sum over repeated indices, you have to tell them so explicitly.
The scalar product is a property of the space one operates in. It tells you what the length of a vector is, and what the angles between different vectors are. That means it describes how to do measurements in that space. The bilinear form you need for this is also called the "metric", and you can use it to raise and lower indices on vectors in the following way: $g_{ij} V^j = V_i$. Note how the indices on both sides match: if you leave out the indices that appear both up and down, the remaining indices have to be equal on both sides.
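Lowering an index is again just a sum over the repeated index. A minimal sketch, assuming the Minkowski metric diag(1, -1, -1, -1) of Special Relativity and a made-up sample vector; the helper name is hypothetical:

```python
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]


def lower_index(g, V):
    """Return the components V_i = g_ij V^j (sum over the repeated index j)."""
    return [sum(g[i][j] * V[j] for j in range(4)) for i in range(4)]


V_up = [2.0, 1.0, 2.0, 3.0]
V_down = lower_index(ETA, V_up)
print(V_down)  # [2.0, -1.0, -2.0, -3.0]

# Contracting the lower with the upper components gives the squared length V_i V^i
print(sum(V_down[i] * V_up[i] for i in range(4)))  # 4 - 1 - 4 - 9 = -10.0
```

For this metric, lowering an index simply flips the sign of the spatial components.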
Technically, the metric is a map from the tangent space to the cotangent space. In matrix notation, it turns a column vector $V$ (upper index) into a row vector $V^T g$ (lower index), and its inverse does the opposite; here the $T$ denotes the transpose. A lower index is also called "covariant", whereas upper indices are called "contravariant", just to give you some lingo. The index juggling is also called "Ricci calculus" and is one of the common ways to calculate in General Relativity. The other possibility is to go indexless via differential forms. If you use indices, here is some good advice: make sure you don't accidentally use an index twice for different purposes in one equation. You can produce all kinds of nonsense that way.
In Special Relativity, the metric is (in Cartesian coordinates) just a diagonal matrix with entries (1,-1,-1,-1), usually denoted $\eta_{ij}$. In the case of a curved space-time it is denoted $g_{ij}$ as I used above, but that general case is a different story and shall be told another time. So for now let us stick with Special Relativity, where the scalar product is defined through $\eta$.
Now what is a Lorentz transformation? Let us denote it by $\Lambda$. As mentioned above, you need one for every index of the tensor you want to transform. Say we want to get a vector $V$ from one coordinate system to the other. We apply a Lorentz transformation to it, so that in the new coordinate system we have $V' = \Lambda V$, where $V'$ is the same vector, but as seen in the other coordinate system. With indices this reads $V'^i = \Lambda^i{}_j V^j$. Similarly, the transposed vector transforms as $V'^T = V^T \Lambda^T$.
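As a sketch of "one $\Lambda$ per index", the following Python snippet transforms a vector and a rank two tensor. As the example $\Lambda$ it uses a boost along the x-axis with $\beta = 0.6$, with $\gamma$ on the diagonal and $\gamma\beta$ off the diagonal; this matrix form and all helper names are assumptions for illustration. Since the outer product $T^{ij} = V^i U^j$ is a rank two tensor, transforming it index by index must agree with building it from the transformed vectors.

```python
import math


def boost_x(beta):
    """Example Lorentz transformation: boost along x with velocity beta (in units of c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, gamma * beta, 0.0, 0.0],
            [gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform_vector(L, V):
    """V'^i = Lambda^i_j V^j: one Lambda for the single index."""
    return [sum(L[i][j] * V[j] for j in range(4)) for i in range(4)]


def transform_rank2(L, T):
    """T'^{ij} = Lambda^i_k Lambda^j_l T^{kl}: one Lambda for each index."""
    return [[sum(L[i][k] * L[j][l] * T[k][l] for k in range(4) for l in range(4))
             for j in range(4)]
            for i in range(4)]


def outer(V, U):
    """Rank two tensor T^{ij} = V^i U^j."""
    return [[V[i] * U[j] for j in range(4)] for i in range(4)]


L = boost_x(0.6)
V = [1.0, 0.0, 0.0, 0.0]
U = [2.0, 1.0, 0.0, 1.0]
print(transform_vector(L, V))  # approximately [1.25, 0.75, 0.0, 0.0]

T_new = transform_rank2(L, outer(V, U))
T_check = outer(transform_vector(L, V), transform_vector(L, U))
# Largest deviation between the two routes; only rounding error remains
print(max(abs(T_new[i][j] - T_check[i][j]) for i in range(4) for j in range(4)))
```

In matrix notation, `transform_rank2` computes $\Lambda T \Lambda^T$.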
Lorentz transformations are then just the group of linear transformations that preserve the length of all vectors, length as defined through the scalar product with $\eta$. You can derive it from this requirement. First note that a transformation that preserves the lengths of all vectors also preserves angles. Proof: Draw a triangle. If you fix the lengths of all sides, you can't change the angles either. Lorentz transformations are thus orthogonal transformations in Minkowski space. In particular, since the scalar product between any two vectors has to remain invariant,
$$V^T \eta\, U = V'^T \eta\, U' = V^T \Lambda^T \eta \Lambda\, U,$$
they fulfil (with and without indices)
$$\Lambda^i{}_j \, \eta_{ki} \, \Lambda^k{}_l = \eta_{jl} \quad\Leftrightarrow\quad \Lambda^T \eta \Lambda = \eta \qquad (1)$$
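The triangle argument relies on Euclidean intuition. One way to make the step from lengths to scalar products precise, which also works for the indefinite $\eta$, is the polarization identity: because $\eta$ is symmetric, the scalar product of two vectors is fixed by three lengths. For a linear $\Lambda$ that preserves the length of every vector, each term on the right is unchanged, and hence so is the left side:

```latex
V^T \eta\, U = \tfrac{1}{2}\Big[ (V+U)^T \eta\, (V+U) - V^T \eta\, V - U^T \eta\, U \Big]
```

Since this holds for arbitrary $V$ and $U$, it gives $V^T \Lambda^T \eta \Lambda U = V^T \eta U$ for all vectors, which is (1).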
If you forget for a moment that we have three spatial dimensions, you can derive the transformations directly from (1). Just insert that $\eta$ is diagonal with (in two dimensions) entries (1,-1), name the four entries of $\Lambda$ and solve for them. It helps to take the determinant of both sides of (1): since $\det \Lambda^T = \det \Lambda$ and $\det \eta \neq 0$, you find $(\det \Lambda)^2 = 1$, i.e. $|\det \Lambda| = 1$. We restrict ourselves to $\det \Lambda = +1$ to preserve orientation, and to $\Lambda^0{}_0 > 0$ to preserve the direction of time. You will be left with a matrix that has one unknown parameter $\beta$, in the following familiar form
$$\Lambda = \gamma \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix},$$
with $\gamma^{-2} = 1 - \beta^2$.
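For the two-dimensional case, the solution of (1) can be spelled out step by step. The labels $a, b, e, f$ for the four unknown entries are chosen here for illustration, and $\eta = \mathrm{diag}(1,-1)$:

```latex
\begin{gathered}
\Lambda = \begin{pmatrix} a & b \\ e & f \end{pmatrix}, \qquad
\Lambda^T \eta \Lambda = \begin{pmatrix} a^2 - e^2 & ab - ef \\ ab - ef & b^2 - f^2 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \\[4pt]
a > 0,\ \beta := e/a \;\Rightarrow\; a = \gamma,\ e = \gamma\beta,\ \gamma^{-2} = 1 - \beta^2 \\
ab = ef \;\Rightarrow\; b = \beta f, \qquad f^2 - b^2 = 1 \;\Rightarrow\; f^2 = \gamma^2 \\
\det\Lambda = af - be = f/\gamma = 1 \;\Rightarrow\; f = \gamma,\ b = \gamma\beta \\[4pt]
\Lambda = \gamma \begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix}
\end{gathered}
```

The condition $a > 0$ is the requirement that the direction of time is preserved, and $|\beta| < 1$ follows from $a^2 - e^2 = 1$.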
Now what about the parameter $\beta$? We can determine it by applying the Lorentz transformation to the worldline $(c\Delta t, \Delta x)$ of an observer at rest, i.e. with $\Delta x = 0$, and asking what his worldline $(c\Delta t', \Delta x')$ looks like in the other coordinate system. One finds $\Delta x'/\Delta t' = \beta c$. Thus, $\beta$ is the relative velocity of the observers in units of $c$.
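A quick numerical check of both steps, as a minimal Python sketch: it verifies that the two-dimensional boost $\gamma\begin{pmatrix}1&\beta\\\beta&1\end{pmatrix}$ satisfies (1), and then transforms the rest observer's worldline. The matrix form with $+\gamma\beta$ off the diagonal is the sign convention assumed here (the opposite sign just flips the direction of the relative velocity), $\beta = 0.6$ is an arbitrary example value, and the helper names are hypothetical.

```python
import math

ETA2 = [[1.0, 0.0], [0.0, -1.0]]  # two-dimensional Minkowski metric


def boost(beta):
    """Two-dimensional Lorentz transformation gamma * [[1, beta], [beta, 1]]."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, gamma * beta], [gamma * beta, gamma]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def apply(L, x):
    """x'^i = Lambda^i_j x^j for a two-component vector x = (c*dt, dx)."""
    return [L[0][0] * x[0] + L[0][1] * x[1], L[1][0] * x[0] + L[1][1] * x[1]]


c = 299_792_458.0  # m/s
beta = 0.6
L = boost(beta)

# Condition (1): Lambda^T eta Lambda should reproduce eta up to rounding
print(matmul(matmul(transpose(L), ETA2), L))

# Worldline step of an observer at rest: (c*dt, dx) with dx = 0
dt = 1.0
c_dt_new, dx_new = apply(L, [c * dt, 0.0])
velocity = dx_new / (c_dt_new / c)
print(velocity / c)  # approximately 0.6, i.e. beta
```

Note that the transformed time step is $\Delta t' = \gamma \Delta t$, which is already time dilation.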
One can generalize this derivation to three spatial dimensions by noticing that the two-dimensional case represents the situation in which the motion is aligned with one of the coordinate axes. One obtains the general case by doing the same for all three axes and adding spatial rotations to the group. The full group then has six generators (three boosts, three rotations), and it is called the Lorentz group, named after the Dutch physicist Hendrik Lorentz. Strictly speaking, the transformations with $\det \Lambda = +1$ and $\Lambda^0{}_0 > 0$ (the ones continuously connected to the identity, which is where the familiar form above comes from) form the "proper orthochronous Lorentz group", usually denoted $SO^+(3,1)$; the group with just $\det \Lambda = +1$ is $SO(3,1)$.
Once you have the group structure, you can go ahead and derive the addition theorem for velocities (by multiplying two Lorentz transformations with different velocities), as well as length contraction and time dilation (by applying Lorentz transformations to rulers and clocks).
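Here is a minimal Python sketch of the first of these derivations: multiplying two boosts along the same axis gives another boost, and reading off its $\beta$ reproduces the relativistic velocity addition $(\beta_1 + \beta_2)/(1 + \beta_1\beta_2)$. The two-dimensional matrix form, the helper names, and the sample velocities are assumptions for illustration.

```python
import math


def boost(beta):
    """Two-dimensional boost gamma * [[1, beta], [beta, 1]], beta in units of c."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, gamma * beta], [gamma * beta, gamma]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def beta_of(L):
    """Read off beta from a matrix of the form gamma * [[1, beta], [beta, 1]]."""
    return L[0][1] / L[0][0]


b1, b2 = 0.6, 0.8
combined = beta_of(matmul(boost(b1), boost(b2)))
print(combined)                   # about 0.9459, still below 1
print((b1 + b2) / (1 + b1 * b2))  # the addition theorem gives the same value
```

Equivalently, the rapidities $\operatorname{artanh}\beta$ simply add under this multiplication.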
Now let us consider some particles in this space-time with such nice symmetry properties. First, we introduce another important scalar invariant of Special Relativity, a particle's proper time $\tau$. Up to a factor $c$, $\tau$ is the length of the particle's worldline as measured with $\eta$, and an infinitesimally small step of proper time $d\tau$ is consequently
$$c^2\, d\tau^2 = c^2\, dt^2 - d\vec{x}^2$$
One obtains the proper time along a curve by integrating $d\tau$ over this curve. To relate $d\tau$ to $dt$, pull out a factor $dt^2$ and use $d\vec{x}/dt = \vec{v}$ to obtain
$$\gamma^2\, d\tau^2 = dt^2, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}.$$
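To illustrate the integration, here is a minimal Python sketch that computes the proper time along a worldline from $d\tau = dt/\gamma$ with a simple midpoint rule. Working in units where $c = 1$, the function name, and the sample velocity profile are assumptions for illustration.

```python
import math

C = 1.0  # units in which c = 1


def proper_time(velocity, t0, t1, steps=10_000):
    """Integrate d(tau) = dt / gamma = dt * sqrt(1 - v(t)^2 / c^2) with the midpoint rule."""
    dt = (t1 - t0) / steps
    tau = 0.0
    for n in range(steps):
        v = velocity(t0 + (n + 0.5) * dt)
        tau += dt * math.sqrt(1.0 - (v / C) ** 2)
    return tau


# Constant speed 0.6 c for 10 time units: gamma = 1.25, so tau = 10 / 1.25 = 8
print(proper_time(lambda t: 0.6, 0.0, 10.0))  # approximately 8.0
```

Any velocity profile with $|v| < c$ can be passed in; a particle at rest accumulates proper time equal to coordinate time.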
A massive particle's relativistic four-momentum is $p^i = m u^i$, where $u^i = dx^i/d\tau = \gamma\, dx^i/dt$ is the four-velocity of the particle, and $m$ is its invariant rest mass (sometimes denoted $m_0$). The rest mass is also a scalar. We then have for the spatial components ($a = 1, 2, 3$)
$$p^a = m \gamma v^a.$$
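A minimal Python sketch of the four-momentum, in units where $c = 1$ (an assumption for readability), with $u^0 = \gamma\, dx^0/dt = \gamma c$. It also checks that $\eta_{ij} p^i p^j = m^2 c^2$ comes out the same for any velocity, as it must for a scalar; the helper names and sample values are hypothetical.

```python
import math

C = 1.0  # units in which c = 1


def four_momentum(m, v):
    """p^i = m u^i with u^i = gamma * dx^i/dt = gamma * (c, v_x, v_y, v_z)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(va * va for va in v) / C ** 2)
    return [m * gamma * C] + [m * gamma * va for va in v]


def minkowski_square(p):
    """eta_ij p^i p^j with eta = diag(1, -1, -1, -1)."""
    return p[0] ** 2 - sum(pa * pa for pa in p[1:])


m = 2.0
for v in ([0.0, 0.0, 0.0], [0.6, 0.0, 0.0], [0.3, 0.4, 0.5]):
    p = four_momentum(m, v)
    print(p, minkowski_square(p))  # the second number is (m c)^2 = 4.0 up to rounding
```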
What is c?
Let us finally come back to the parameter $c$ that we introduced in the beginning. Squaring the previous expression (summing over spatial components), inserting $\gamma$ and solving for $v$, one obtains the particle's speed as a function of its momentum:
$$v = \frac{c\, p}{\sqrt{p^2 + m^2 c^2}}, \qquad p = |\vec{p}|.$$
In the limit $m \to 0$ one obtains $v = c$ for any nonzero $p$. Or the other way round: the only way to get $v = c$ is if the particle is massless, $m = 0$; for $m > 0$ the speed stays below $c$ for every finite momentum.
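A minimal Python sketch of this formula for an electron (the rounded mass value is the only physical input, and the sample momenta are arbitrary): the speed approaches $c$ as $p$ grows but stays below it, while for $m = 0$ it equals $c$. It also checks that inserting $v$ back into $p = m\gamma v$ recovers the momentum; the helper names are hypothetical.

```python
import math

C = 299_792_458.0       # speed constant c in m/s
M_ELECTRON = 9.109e-31  # electron rest mass in kg (rounded)


def speed(p, m):
    """Speed from momentum: v = c p / sqrt(p^2 + m^2 c^2)."""
    return C * p / math.sqrt(p ** 2 + (m * C) ** 2)


def momentum(v, m):
    """Inverse relation p = m gamma v."""
    return m * v / math.sqrt(1.0 - (v / C) ** 2)


for p in (1e-24, 1e-22, 1e-20):  # momenta in kg m/s
    v = speed(p, M_ELECTRON)
    print(f"p = {p:.0e}  v/c = {v / C:.6f}  p recovered = {momentum(v, M_ELECTRON):.3e}")

print(speed(1e-20, 0.0) / C)  # massless particle: 1.0 up to rounding
```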
So far there is no experimental evidence that photons, the particles that constitute light, have mass. Thus, light moves with speed c. However, note that in the derivation that got us here, light was not mentioned at all. There is no doubt that historically Einstein's path to Special Relativity came from Maxwell's equations, and many of his thought experiments are about light signals. But a priori, arguing from symmetry principles in Minkowski space as I did here, the constant c has nothing to do with light. Nowadays, this insight can get you an article in NewScientist.
Btw, note that c is indeed a constant. If you want to fiddle around with that, you'll have to mess up at least one step in this derivation.
See also: The Equivalence Principle