The first 4 parameters are passed in (in order) RCX, RDX, R8 and R9. XMM0 to XMM3 are used to pass floating point parameters.
Any further parameters are passed on the stack.
Parameters larger than 64bit are passed by address.
Even if the function uses less than 4 parameters the caller always provides space for 4 QWORD sized parameters on the stack. The callee is free to use them for any purpose, it is common to copy the parameters there if they would be spilled by another call.
For scalar return types, the return value is placed in RAX. If the return type is larger than 64bits (e.g. for structures) RAX is a pointer to that.
All registers used in parameter passing (RCX, RDX, R8, R9 and XMM0 to XMM3), RAX, R10, R11, XMM4 and XMM5 can be spilled by the callee. All other registers need to be preserved by the caller (e.g. on the stack).
The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address, this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in order to restore 16-byte alignment.
It is the callers job to clean the stack after a call.
Source: The history of calling conventions, part 5: amd64 Raymond Chen