So I'm writing an IMAP client for Emacs. The main reason is that I want an email client that works as if by osmosis, like the one on my phone: whenever it has a connection, messages magically enter a local cache, where they stay available even if the connection is lost. There are many ways to achieve that with existing Emacs email clients, possibly augmented with external tools, but I want one that works with minimal setup, and requires nothing but Emacs itself.
The first problem encountered when writing an IMAP client is parsing
what the server is sending. There are two IMAP parsers in the Emacs
source tree already (imap.el and nnimap.el), both fairly closely
tied to their respective clients, and both synchronous—they block
Emacs while they are waiting for data from the server. I want to
address both of those points: my IMAP client is going to have a
reusable asynchronous parser module.
Obviously, the first step in figuring out how to parse the protocol is to read the IMAP RFC. Naturally, you'd skip ahead to the section "Formal syntax", which defines in ABNF what various messages should look like: a sequence number followed by one of a few keywords followed by a quoted string or an atom, etc etc. That's where I started, adding a special case for each command response into my parser code.
But then I thought that this wasn't the right way to do it. The parser module wouldn't be independent of the module that uses it, since the calling module might want to use an extension that the parser module doesn't know about yet. Also, I would constantly need to come up with sensible data structures to represent the responses with.
So I decided to create a parser that would be able to parse IMAP responses with as little knowledge as possible about what those responses should look like. It turns out that most of IMAP consists of atoms, strings and lists. I figured that I should be able to parse everything following these simple rules:
- If it starts with a double quote, parse it as a quoted string.
- If it starts with a ( or [, descend, parse recursively and return it as a list.
- Otherwise, treat it as an atom, read until the next space character or closing ) or ], and return it as a string.
For example, this LIST response:
* LIST (\HasNoChildren) "." INBOX
gets parsed to this:
("*" "LIST" ("\\HasNoChildren") "." "INBOX")
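To make the three rules concrete, here is a minimal sketch of them in Python rather than Emacs Lisp, so that it can be run on its own. The function names are hypothetical and this is not the parser described in this entry; it ignores status responses, BODY[...] and literals, and only shows how far the plain rules go.

```python
CLOSERS = {"(": ")", "[": "]"}


def parse_imap_line(line):
    """Parse one logical line into nested lists of strings."""
    items, _ = _parse_items(line, 0, None)
    return items


def _parse_items(line, pos, closer):
    """Parse items from POS until CLOSER (or end of line); return (items, new_pos)."""
    items = []
    while pos < len(line):
        ch = line[pos]
        if ch == " ":
            pos += 1
        elif ch == closer:
            return items, pos + 1
        elif ch == '"':
            # Rule 1: a double quote starts a quoted string.
            text, pos = _parse_quoted(line, pos + 1)
            items.append(text)
        elif ch in CLOSERS:
            # Rule 2: ( or [ starts a nested list, parsed recursively.
            sub, pos = _parse_items(line, pos + 1, CLOSERS[ch])
            items.append(sub)
        else:
            # Rule 3: anything else is an atom, read up to a space, ) or ].
            start = pos
            pos += 1
            while pos < len(line) and line[pos] not in " )]":
                pos += 1
            items.append(line[start:pos])
    return items, pos


def _parse_quoted(line, pos):
    """Read a quoted string starting just after the opening quote."""
    chars = []
    while pos < len(line) and line[pos] != '"':
        if line[pos] == "\\" and pos + 1 < len(line):
            pos += 1  # a backslash escapes the next character
        chars.append(line[pos])
        pos += 1
    return "".join(chars), pos + 1


print(parse_imap_line(r'* LIST (\HasNoChildren) "." INBOX'))
```

Atoms and quoted strings both come back as plain strings, and lists as Python lists, which mirrors the shape of the Lisp result shown above.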
This mostly worked. One exception is that if the second word of a
line is one of OK, NO, BAD, BYE or PREAUTH, then the rest of
the line (described by resp-text in the RFC) is free-form
human-readable text, optionally preceded by a resp-text-code. For
example, this is part of a response to a SELECT command from the
Dovecot server:
* OK [UNSEEN 6] First unseen.
* OK [UIDVALIDITY 1381959933] UIDs valid
* OK [UIDNEXT 10] Predicted next UID
1 OK [READ-WRITE] Select completed (0.003 secs).
There is no reason why this free-form text couldn't contain unbalanced parentheses or anything else that might confuse a parser, and besides it doesn't make sense to split the text into words anyway. So that's a special case for the parser, and gets parsed to this:
("*" :ok :code "UNSEEN" :data "6" :text "First unseen.")
("*" :ok :code "UIDVALIDITY" :data "1381959933" :text "UIDs valid")
("*" :ok :code "UIDNEXT" :data "10" :text "Predicted next UID")
("1" :ok :code "READ-WRITE" :data nil :text "Select completed (0.003 secs).")
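The special case can be sketched along the same lines. This is a hypothetical Python illustration, not the actual implementation: it returns a dictionary instead of a property list, and it leaves the data after the response code as one raw string, whereas a real parser would probably want to parse it further.

```python
import re

STATUS_WORDS = {"OK", "NO", "BAD", "BYE", "PREAUTH"}

# Optional response code in square brackets at the start of the text,
# e.g. "[UNSEEN 6] " or "[READ-WRITE] ".
RESP_TEXT_CODE = re.compile(r"\[([^\] ]+)(?: ([^\]]*))?\] ?")


def parse_status_line(line):
    """Return a dict for a status response, or None for any other line."""
    parts = line.split(" ", 2)
    if len(parts) < 2 or parts[1].upper() not in STATUS_WORDS:
        return None
    tag, status = parts[0], parts[1].upper()
    text = parts[2] if len(parts) > 2 else ""
    code = data = None
    m = RESP_TEXT_CODE.match(text)
    if m:
        code, data = m.group(1), m.group(2)
        text = text[m.end():]
    # The remaining text is kept whole: it is free-form and not split into words.
    return {"tag": tag, "status": status, "code": code, "data": data, "text": text}


print(parse_status_line("* OK [UNSEEN 6] First unseen."))
print(parse_status_line("1 OK [READ-WRITE] Select completed (0.003 secs)."))
```

Lines for which this returns None would go through the generic atom, string and list rules instead.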
Then there is BODY, which when given as a fetch or message attribute
is followed immediately by an opening square bracket, e.g. BODY[].
I decided to treat it as if there were a space between BODY and the
bracket, and return them as two elements, the string "BODY" and the
list of items parsed inside the brackets.
And all of this applies in principle to every single line, except that
a physical line can end with a byte count in curly braces
(e.g. {42}), which means that the following bytes are literal data
to be treated as part of the current logical line. Fortunately, this
is independent from the parsing itself, so I have a function that
splits the data by logical lines and passes everything to the parse
function.
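As an illustration of that splitting step, here is a minimal Python sketch under a few assumptions of my own: the data arrives as bytes, lines end in CRLF, and the function (whose name is made up) returns any incomplete trailing data so that it can be retried when more bytes arrive. It is not the code from the client itself.

```python
import re

# A byte count in curly braces at the very end of a physical line.
LITERAL_AT_END = re.compile(rb"\{(\d+)\}$")


def split_logical_lines(data):
    """Split raw bytes into complete logical lines.

    Returns (lines, remainder), where remainder is the unconsumed tail
    belonging to a logical line that has not been fully received yet.
    """
    lines = []
    start = 0  # where the current logical line begins
    pos = 0    # where to look for the next CRLF
    while True:
        end = data.find(b"\r\n", pos)
        if end == -1:
            break
        m = LITERAL_AT_END.search(data[pos:end])
        if m:
            # Skip the literal bytes by count; they may contain CRLF themselves.
            literal_end = end + 2 + int(m.group(1))
            if literal_end > len(data):
                break  # the literal has not fully arrived yet
            pos = literal_end  # the logical line continues after the literal
        else:
            lines.append(data[start:end])
            start = pos = end + 2
    return lines, data[start:]


lines, rest = split_logical_lines(
    b"* 1 FETCH (BODY[] {5}\r\nhello)\r\n* OK done\r\n* par"
)
print(lines)
print(rest)
```

In this sketch each logical line still contains the byte count marker and the raw literal; how the literal is handed to the parse function is left open.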
So far, my little client is able to select a mailbox, search for unread messages and fetch the messages, and my parser is sufficient for all of that.
Emacs 24 introduced the possibility to open TLS network connections using the GnuTLS library directly, instead of using a command line tool as a wrapper. This is especially interesting for those who are stuck using Emacs on Windows, as the command line tools can be rather brittle on that platform.
However, there are some steps that need to be performed in order to get native GnuTLS to work. This page attempts to describe them.
Get a GnuTLS-enabled Emacs
The Windows binaries available for download from the GNU site are
compiled against GnuTLS, but if you compile your own Emacs, see the
file nt/INSTALL in the Emacs source distribution for instructions.
Find the GnuTLS DLLs
The first Google hit for "emacs gnutls windows" is this page. It says:
There's one way to find out if GnuTLS is available, by calling gnutls-available-p. This is a little bit trickier on the W32 (Windows) platform, but if you have the GnuTLS DLLs (available from http://sourceforge.net/projects/ezwinports/files/ thanks to Eli Zaretskii) in the same directory as Emacs, you should be OK.
On that page, I found:
gnutls-3.0.9-w32-bin.zip 2012-01-02 7.2 MB
Extract the GnuTLS DLLs
I first naïvely tried opening the zip file in Explorer and copying the
files from there, but that does nothing—it neither copies the files
nor displays any error message. You need to extract the zip file, and
then copy all DLL files to the bin directory where your Emacs is
installed, probably somewhere like
C:\Program Files (x86)\emacs-24.3\bin.
Restart Emacs and try it
At this point, if you restart Emacs and type:
M-: (gnutls-available-p) RET
you should see t in the echo area, which means that Emacs can find
the GnuTLS libraries.
Configure trust files
However, if you try to open a TLS connection, it will fail complaining
that certificate validation failed. This happens because GnuTLS needs
to have a set of CA certificates to verify the certificates of the
servers it connects to. It looks for CA certificates in the locations
specified in the variable gnutls-trustfiles, but none of the default
values work out of the box on Windows.
I'm not aware of any way to make GnuTLS use any certificates that come
with the Windows system, so you need to get a certificate bundle from
elsewhere. The cURL project provides such a bundle that
you can download. Download the cacert.pem file to a suitable
location, and point gnutls-trustfiles to it with customize-option.
Note that the file name is passed unexpanded to GnuTLS, so you cannot
use ~ as a shorthand for your home directory; use the full absolute
file name instead.
See if it works
Paste the following piece of code into the *scratch* buffer:
(condition-case e
    (delete-process
     (gnutls-negotiate
      :process (open-network-stream "test" nil "www.google.com" 443)
      :hostname "www.google.com"
      :verify-error t))
  (error e))
Then put point at the end and hit C-j. If nil gets inserted into
the buffer, then the certificate could be verified, and your setup
appears to be working.
Otherwise, you'll see an error like:
(error "Certificate validation failed www.google.com, verification code 66")
If so, a good place to start debugging is setting the variable
gnutls-log-level to a value greater than 0.
In Emacs 23 or later, you can use M-x proced to get a list of
running processes. You can sort and filter them in various ways; in my
experience, the most common end goal is to quickly find and kill
certain processes.
However, Proced currently doesn't work on OSX. I just spent a few minutes figuring out why, so I hope writing this could save someone else some effort.
Proced is not based on command line tools such as ps, but uses
Emacs Lisp functions implemented in C to get the list of processes
(list-system-processes) and to get process attributes
(process-attributes). list-system-processes works on OSX, but
process-attributes does not.
sysdep.c contains a few different implementations guarded by #ifdefs, and
this is the one chosen for OSX:
Lisp_Object
system_process_attributes (Lisp_Object pid)
{
  return Qnil;
}
In March 2010, this was discussed on the macosx-emacs mailing list, and a patch giving parts of the needed information yielded a screenshot, but it seems that it wasn't finished, nor merged into Emacs.
Update: I got bored and wrote a patch for it. It basically works, but there are some things that it currently doesn't retrieve, notably memory and CPU usage, and command line arguments.
Emacs' process editor (proced) works on many operating systems, but not OSX. Here is why.
In Erlang, if a process is running code from module a when module
a is reloaded, it doesn't get automatically upgraded to the new
version. The old version of the code is kept as long as any process
is using it, and the process is said to be running "old code".
However, only one old version is kept, so if the process doesn't switch to the new version before the module is reloaded for a second time, the process will be killed by the code server. This can be confusing, as there is often no trace of the process dying, and even if there is, you only get to know that the process was killed, but not why.
While chasing down a problem related to this, I came up with this patch to the code server:
diff --git a/lib/kernel/src/code_server.erl b/lib/kernel/src/code_server.erl
index 00ad923..c4d5fd6 100644
--- a/lib/kernel/src/code_server.erl
+++ b/lib/kernel/src/code_server.erl
@@ -1414,6 +1414,7 @@ do_purge(Mod0) ->
 do_purge([P|Ps], Mod, Purged) ->
     case erlang:check_process_code(P, Mod) of
         true ->
+            catch info_msg("Killing ~p for old code from ~p", [P, Mod]),
             Ref = erlang:monitor(process, P),
             exit(P, kill),
             receive
That led me straight to the module that was being reloaded, and let me fix the problem by ensuring that the process switched to new code.
This patch is probably not suitable for inclusion in the official Erlang/OTP sources, but I hope it can be useful when developing.
How can you know when the code server kills your process, and why?
So I was dusting off an old Windows XP machine, applying security
updates for the last few years, when suddenly I found that the machine
would crash with a BSOD a few minutes after boot. The BSOD said that
a DRIVER_IRQL_NOT_LESS_OR_EQUAL error had occurred in w22nt51.sys.
Googling that led to this article, which revealed that
this is a problem in the driver for the Intel 2200BG wireless network
card.
Broken network drivers usually lead to chicken-and-egg problems, and that article suggests a way around that, but fortunately I had an Ethernet cable at hand and was not entirely dependent on wireless network. (The cable was in my "give away or throw away" bag, so I'm grateful for my laziness and will reconsider some of the items in it.)
So I solved this by the following steps:
- Boot into "Safe mode" (without networking).
- Deactivate the wireless network card in the "System" section of the control panel, to stop the crashes.
- Boot into "Safe mode with networking" (though a normal boot would probably have been sufficient at this point) and connect the Ethernet cable.
- Download the new driver from the Intel Download Center.
- Install the driver.
- Reboot, to make sure the old driver is not in use (not sure if necessary).
- Reactivate the device in the control panel.
And it works!
So why am I subjecting myself to Windows XP again? Simply because the firmware upgrade tool for my HTC phone requires Windows. Also, now I have the opportunity to test whether my Emacs-based Jabber client works well with Windows. (One of these days…)
Some people like to read their email in Emacs. Of course, Emacs comes with not one but two email clients, the more advanced one being Gnus. Gnus lets you choose from a number of formats to store your email on your local disk. Perhaps you're using offlineimap to fetch your email from an IMAP server, and since offlineimap uses the Maildir format, you might think that using Gnus' nnmaildir backend is exactly what you're looking for.
However, as has been documented, there are some issues with nnmaildir. In particular, nnmaildir and offlineimap disagree about how to mark a message as read. Normally, offlineimap will synchronise that flag between the server and your local mail store, such that a message that you've read in one place eventually gets marked as read in the other, but in this case we're not so lucky; you'd have to mark old messages as read in both places, which pretty much defeats the purpose of offlineimap synchronisation.
At this point, most people give up on nnmaildir and install Dovecot, a small IMAP server, locally, make it serve the Maildir directory over IMAP, and point Gnus' nnimap backend to the local IMAP server. That's a very reasonable thing to do, and probably the most painless way.
Of course, that precludes the chance of becoming immortal by writing
code, so I dug into nnmaildir.el. I found that it paid no attention
to what the Maildir specification says about message
flags, but instead stores flags in a Gnus-specific format inside the
Maildir directory. The latter needs to be kept, since not all Gnus
marks can be represented as Maildir flags, but the most important ones
("read", "replied to" and "flagged") can be stored as part of the file
names of the messages. I came up with a patch and posted
it to the bug report for this feature. As of
2012-09-05, it's merged into Gnus' master branch, and the change was
included in GNU Emacs 24.3.
So there it is, ready to be tested. There appear to be some glitches, in that it might sometimes not mark a message as read that should be read, but so far it seems it's not in the habit of eating people's email.
Not fast enough
There is more to do about nnmaildir, though; my changes so far have
been about making it correct, so the next step is to make it fast. It
works fairly well for mailboxes with thousands of messages, but when
you get into the tens of thousands, there is a long delay when
starting Gnus. (The logical solution to that is to never exit Gnus,
but…) I'm getting ahead of myself here, but I've sprinkled debug
messages over the code and so far they suggest that most of that time
is spent in nnmaildir--grp-add-art. If I'm reading the
code correctly, it seems that it's entering every article, one by one,
into an ordered list called nlist. Obviously, in the worst case,
that has quadratic time complexity. My thoughts at this moment are:
- Does this need to be an ordered list? (Other parts of the code perform operations on subranges of this list, but maybe we could use a tree or something…)
- Do we really need to read all of this into memory? This is data about every single message in the folder; most likely we won't ever use most of it.
That's where I am right now. I'll keep working on this as time permits.
So, just a quick overview of where we are, what territory we're in: I like Emacs, I like Erlang, and I like running unit tests on my code. This is my attempt to gather some pieces together to fuse all of this into a seamless experience, so that whenever something goes wrong, I can quickly get to exactly where in the code the error occurred.
This will involve patches both to Emacs and to Erlang/OTP. My hope is that these changes will make it into upstream versions fairly soon, but this is where we are at the moment.
To start with, I run my unit tests from within Emacs, using M-x
compile. It takes any shell command (make test or rebar eunit
come to mind) and runs it inside a buffer in Emacs. That means that
you can use the normal Emacs editing commands to move around, search
for things, copy pieces of text, etc. Emacs also attempts to
highlight error and warning messages in the output, and turn them into
links to the corresponding position of the source code. With a stock
Emacs, there are many possibilities left untouched, and that is where
our journey begins.
Get stacktraces from eunit failures (pre-R15B02)
First of all, if you're running Erlang/OTP R15B or R15B01, you have a version of eunit that doesn't print stacktraces when an error occurs. That can be a serious time waster, compared to knowing what code was running three or four stack frames down from the error site (not to mention even knowing in what function the error occurred). So make sure that you're running R15B02 or later.
If you're stuck with an earlier version, fortunately there is a patch for that. I've written up instructions on how to apply it to your existing Erlang/OTP installation with minimal effort.
So before this patch, eunit output looks like this:
foo:6: my_test_ (module 'foo')...*failed*
::error:foo
And after the patch, we get this:
foo:6: my_test_ (module 'foo')...*failed*
::error:foo
in function foo:'-my_test_/0-fun-0-'/0 [foo.erl:6]
Drop compilation error regexp from Emacs Erlang mode
I received great satisfaction the other day when I submitted a patch to Erlang/OTP consisting entirely of line removals. I'm not sure at what point in history the code I removed was needed, but the comments suggest at some point around Emacs 19. My Emacs 24 does compilation error highlighting just as well without it.
Even better, in fact—once the Erlang mode had installed this regexp
globally, Emacs lost the ability to distinguish between compilation
errors (pink) and compilation warnings (orange). This regexp was
installed first in a long list (compilation-error-regexp-alist),
thus hiding more sophisticated regexps.
If you have Erlang checked out from GitHub, you can apply this patch
with the following commands (adjusting the git fetch command if
needed):
git fetch
git cherry-pick a87a9699735b0a25f99397fba9576f5756da54d3
Theoretically you could undo the changes that have been done to your Emacs session, but the simplest way is to just restart Emacs.
So far, so good: if you compile your Erlang code from within Emacs
with M-x compile, warnings will be orange instead of pink, and M-g
M-n (the next-error command) will skip warnings and jump directly
to errors.
Jump from failed test cases to code
So let's return to the output we got from eunit:
foo:6: my_test_ (module 'foo')...*failed*
That tells us that the test on line 6 in module foo failed. That's
pretty unambiguous, so there's no reason why I as a human should spend
any effort on finding that location when the computer can do it for
me.
This calls for a regular expression, of course. Here it is:
(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit
  (list
   "^ *\\(\\([^.:( \t\n]+\\):\\([0-9]+\\)\\):.*\\.\\.\\.\\(?:\\([^*]\\)\\|[*]\\)"
   ;; file
   (list 2 "%s.erl" "src/%s.erl" "test/%s.erl")
   ;; line
   3
   ;; column
   nil
   ;; type - need to match [^*] after the three dots to be info,
   ;; otherwise it's an error
   (cons nil 4)
   ;; highlight
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit)
The first line makes sure that any earlier attempts are purged before adding the new regexp, to avoid accumulating cruft. I did refine this quite a few times before arriving at this version ☺
I won't bore you with the details of that piece of code (see C-h v
compilation-error-regexp-alist if you're interested), but let me just
draw your attention to one of the lines:
(list 2 "%s.erl" "src/%s.erl" "test/%s.erl")
Since the text we're matching is the module name, not the file name, we need to tell Emacs how to make a file name out of it. I added two common subdirectory names to make it do the right thing in most cases.
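To see what the regexp captures and how the formats turn a module name into file names, here is a rough Python translation. It is only a sketch for experimenting outside Emacs: Emacs and Python regexp syntax differ, the function name is made up, and the passing-test line used in the second call is a hypothetical sample rather than output quoted above.

```python
import re

# Same groups as the Emacs regexp above: 1 = highlighted text,
# 2 = module name, 3 = line number, 4 = first character after "..."
# (only set when that character is not "*").
EUNIT_LINE = re.compile(
    r"^ *(([^.:( \t\n]+):([0-9]+)):.*\.\.\.(?:([^*])|[*])"
)
FILE_FORMATS = ("%s.erl", "src/%s.erl", "test/%s.erl")


def parse_eunit_line(line):
    """Return module, line, kind and candidate file names, or None."""
    m = EUNIT_LINE.match(line)
    if m is None:
        return None
    module = m.group(2)
    # Group 4 matching means the test did not fail, so the line is "info".
    kind = "info" if m.group(4) is not None else "error"
    return {
        "module": module,
        "line": int(m.group(3)),
        "kind": kind,
        # Candidates in the order in which they would be tried.
        "candidates": [fmt % module for fmt in FILE_FORMATS],
    }


print(parse_eunit_line("foo:6: my_test_ (module 'foo')...*failed*"))
print(parse_eunit_line("foo:6: my_test_ (module 'foo')...ok"))
```

The formats are tried in order, and the first file name that exists is the one that gets visited.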
However, there is a bug in Emacs that
prevents that from working: if we try this with our example above,
Emacs will just ask where the foo file is. So head to that bug
report and apply the patch, to continue your journey to instant link
bliss.
Done? Great! You're probably itching to try this out, and thus
frantically looking for a failing test case. There's a simpler way:
just put the example output above in a text file, and hit M-x
compilation-minor-mode and try all the links. To edit the text file
again, type M-x fundamental-mode.
Jump from stacktrace lines to code
Since R15B, Erlang stacktraces include file names and line numbers (and with the patch above, we got Eunit to display them to us). So wouldn't it be great to use that information to jump directly from the test output to the corresponding point in the code?
The output looks like this, and we might want to jump to any of the line numbers given:
foo: bza_test...*failed*
::error:{badmatch,c}
in function foo:b/0 [foo.erl:31]
in call from foo:a/0 [foo.erl:27]
in call from foo:bza_test/0 [foo.erl:23]
This calls for another regexp:
(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit-stacktrace compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit-stacktrace
  (list
   "^ *in \\(?:function\\|call from\\) .* \\[\\(\\([^:]+\\):\\([0-9]+\\)\\)\\]$"
   ;; file
   2
   ;; line
   3
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit-stacktrace)
So from an error like the one we saw above, we can now move point to a line in the stacktrace and hit Enter, and Emacs will take us to the right line and file. (You could of course just click on the stacktrace line, if you're into that kind of thing.)
Jump to failing assertions
And while we're at it, why not create links for assertion failures as
well? Eunit's assert macros (assert, assertEqual, assertMatch
etc) create error messages that look like this:
foo: foo_test (module 'foo')...*failed*
::error:{assertEqual_failed,[{module,foo},
{line,6},
{expression,"2"},
{expected,1},
{value,2}]}
in function foo:'-foo_test/0-fun-0-'/1 [foo.erl:6]
Here, the module name and the line number are on different lines, but that doesn't stop this regexp from working:
(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit-assert compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit-assert
  (list
   (concat
    "^\\(::error:{assert[A-Za-z]+_failed\\),"
    "[ \n]*\\[{module,\\([^}]+\\)},"
    "[ \n]*{line,\\([0-9]+\\)}")
   ;; file
   (list 2 "%s.erl" "src/%s.erl" "test/%s.erl")
   ;; line
   3
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit-assert)
Again, we have module names, not file names, so the same caveat applies.
Jump from stacktraces embedded in other output
The above snippets work very well as long as you get straightforward errors—something has crashed, and the error gets propagated up to your test function, and further up to Eunit, which formats a nice error report. But you're not always so lucky. You might have an error in a linked process:
foo: bar_test...
=ERROR REPORT==== 1-Aug-2012::19:56:58 ===
Error in process <0.79.0> with exit value: {badarith,[{foo,baz,0,[{file,"foo.erl"},{line,13}]}]}
*skipped*
undefined
*unexpected termination of test process*
::{badarith,[{foo,baz,0,[{file,"foo.erl"},{line,13}]}]}
Or there might be a catch somewhere deep in the code, and an error
gets propagated into a comparison or something:
foo: frobozz_test...*failed*
::error:{badmatch,
{ok,{'EXIT',
{badarg,
[{erlang,list_to_integer,[x],[]},
{foo,frobozz,1,[{file,"foo.erl"},{line,19}]},
{foo,frobozz_test,0,[{file,"foo.erl"},{line,16}]},
{eunit_test,'-function_wrapper/2-fun-0-',2,
[{file,[...]},{line,...}]},
{eunit_test,run_testfun,1,[{file,...},{...}]},
{eunit_proc,run_test,1,[{...}|...]},
{eunit_proc,with_timeout,3,[...]},
{eunit_proc,handle_test,2,...}]}}}}
in function foo:frobozz_test/0 [foo.erl:16]
So here you can see that the crash actually occurred on line 19, but the only stacktrace line that our regexps so far can recognise is for line 16. But since the information is there, let's spare ourselves the trouble of moving to that line manually:
(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-raw-stacktrace compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-raw-stacktrace
  (list
   "{file,\"\\([^\"]+\\)\"},[[:space:]]*{line,\\([0-9]+\\)}"
   ;; file
   1
   ;; line
   2
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-raw-stacktrace)
There it is. Now, every time the compilation output contains
{file,"something.erl"} and {line,42}, possibly separated by
whitespace, the file name will be turned into a link.
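For a quick check outside Emacs, the same pattern can be approximated in Python and run over part of the embedded stacktrace from the example above. This is a sketch with a made-up function name; Python's \s stands in for the [[:space:]] class of the Emacs regexp.

```python
import re

# Group 1 is the file name, group 2 the line number.
RAW_FRAME = re.compile(r'\{file,"([^"]+)"\},\s*\{line,([0-9]+)\}')


def find_frames(output):
    """Return (file, line) pairs for every stack frame found in the text."""
    return [(name, int(line)) for name, line in RAW_FRAME.findall(output)]


sample = """\
::error:{badmatch,
{ok,{'EXIT',
{badarg,
[{erlang,list_to_integer,[x],[]},
{foo,frobozz,1,[{file,"foo.erl"},{line,19}]},
{foo,frobozz_test,0,[{file,"foo.erl"},{line,16}]},
{eunit_test,'-function_wrapper/2-fun-0-',2,
[{file,[...]},{line,...}]},
"""

print(find_frames(sample))
```

Frames that Erlang has abbreviated, such as {file,[...]},{line,...}, contain no quoted file name and are therefore skipped.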
Conclusion
That is all I've been able to come up with for now. I've automated the boring things (finding where my test crashed) so I can spend more time and energy on the fun things (actually fixing the test)—which is what Emacs is all about, of course.
With this article I want to show that the Russian and Polish orthographies, although very different, express almost the same set of phonemes. I hope that this will help you, dear reader, to read either of the languages better.
Comments, corrections and criticism are always welcome.
Phonemes
By "phoneme" I mean the smallest meaningful part of the sequence of sounds that make up a word. In this text, I will transcribe the phonemes that a written word (Russian or Polish) represents using Latin letters in [brackets]. (Although this is a common way to show pronunciation, I'm completely uninterested in pronunciation in this text, since that would get in the way of comparing Polish and Russian.)
Both Russian and Polish use "softened" consonants. I will indicate those with a subscript J in my transcriptions, for example НЬ = Ń = [nⱼ]. The difference in spelling of softened consonants between the two languages will be a central topic in the following.
Softening in Russian
In Russian, softened consonants are generally expressed in writing by a following "soft" vowel. The "hard" vowels А, О, У, Ы and Э correspond to the "soft" vowels Я, Ё, Ю, И and Е. If a softened consonant is not followed by a vowel, the so-called soft sign is used instead: Ь
E.g.: медь [mⱼedⱼ] "copper", лёд [lⱼod] "ice".
(In these two examples, the final sound is actually pronounced as T instead of D, because of the devoicing of final consonants that occurs in both Russian and Polish, but in the transcription I nevertheless use D, to follow the written form of the original word.)
If a soft vowel appears in the beginning of a word, or after another vowel, it represents the sound [j] instead of softening: ясный [jasnyj] "clear".
Some consonants (Ж, Ш, Ц) are never soft, and some (Ч, Щ) are always soft.
Softening in Polish
In Polish, softened consonants are in principle expressed by letters with diacritical marks: Ć, Ń, Ś, Ź. However, if the vowel "i" appears after the softened consonant, the mark is removed (since "i" by itself indicates softening), and if another vowel follows, an extra "i" is inserted between the consonant and the vowel. This can cause spelling differences in different conjugations of the same word, for example: koń [konⱼ] "horse", konia [konⱼa] "of a horse", koni [konⱼi] "of horses".
The hard correspondent to Ć is not C, but T. Therefore I will write the soft C as [tⱼ] in my transcriptions, e.g.: ciasto [tⱼasto] "cake".
The soft correspondent to D is written DŹ (but without the diacritical sign when it appears before a vowel, according to the rules above). E.g.: dziki [dⱼiki] "wild".
The consonant L is exceptional. It is written Ł when hard and L when soft. E.g.: las [lⱼas] "forest", głodny [glodny] "hungry".
The consonant R is also exceptional. It is written R when it is hard and RZ when it is soft. E.g.: ręka [ręka] "hand", rzeka [rⱼeka] "river". (Ą and Ę are Polish nasal vowels that Russian no longer has. I leave them as is in the transcriptions, but will go into further detail below.)
The reader will probably protest against some of the above pairings, and will rightly remark that L/Ł, R/RZ, S/Ś sound completely different in Polish, but again I'd like to point out that I'm not interested in pronunciation; to compare Russian and Polish, these need to be treated as related phonemes.
Comparisons
Armed with this system for transforming written words from the two languages into a single transcription, we can begin by noticing that many words have the same phonemes (though not always the same meaning):
- кот = kot = [kot] "cat"
- конь = koń = [konⱼ] "horse"
- дети = dzieci = [dⱼetⱼi] "children"
- сеть = sieć = [sⱼetⱼ] "net"
- река = rzeka = [rⱼeka] "river"
- неделя "week" = niedziela "Sunday" = [nⱼedⱼelⱼa]
In some words we find a vowel change. Fairly often this is caused by the Old Slavic vowel Ѣ "yat", which in Russian became [ⱼe] but in Polish became either [ⱼa] or [ⱼe] depending on conjugation:
- белый [bⱼelyj] ≈ biały [bⱼaly] "white"
- лес [lⱼes] ≈ las [lⱼas] "forest"
- лесной [lⱼesnoj] ≈ leśny [lⱼesⱼny] "pertaining to (a) forest" (adjective)
- вера [vⱼera] ≈ wiara [vⱼara] "belief"
- место [mⱼesto] "place" ≈ miasto [mⱼasto] "city"
Sometimes Russian has an unstressed E where Polish has O:
- сестра [sⱼestra] ≈ siostra [sⱼostra] "sister"
- седло [sⱼedlo] ≈ siodło [sⱼodlo] "saddle"
The Polish nasal vowels Ą and Ę were originally written using the Cyrillic letters Ѫ "big yus" and Ѧ "little yus" (but in Polish those two sounds first collapsed into one and later separated again, such that it's not immediately obvious which of them was the original sound). In Russian, those sounds sometimes became [u], sometimes [ⱼa]:
- пять [pⱼatⱼ] ≈ pięć [pⱼętⱼ] "five"
- мясо [mⱼaso] ≈ mięso [mⱼęso] "meat"
- счастье [sĉⱼastje] ≈ szczęście [ŝĉęsⱼtⱼe] "happiness" (here [ⱼa] is written with А because of an orthographic rule)
- мука [muka] ≈ mąka [mąka] "flour"
- рука [ruka] ≈ ręka [ręka] "hand"
- мудрый [mudryj] ≈ mądry [mądry] "wise"
- буду [budu] ≈ będę [będę] "I will be"
In some cases Russian has [olo], [oro] or [ⱼerⱼe], while Polish lacks the first vowel:
- голос [golos] ≈ głos [glos] "voice"
- берег [bⱼerⱼeg] ≈ brzeg [brⱼeg] "coast"
- горох [goroĥ] ≈ groch [groĥ] "pea"
- молоко [moloko] ≈ mleko [mlⱼeko] "milk"
This article is intended as a gentle introduction to the UK tax system for immigrant workers. I've learnt a bit since I was "fresh off the boat", so I thought I'd share it in the hope that it will be useful to someone. Any comments or feedback are welcome, of course ☺
In the following, I will assume that you have one and only one job, and that you get a monthly salary. Mutatis mutandis, caveat emptor, etc.
The basics
In the UK, tax is collected by a government agency called HMRC, Her Majesty's Revenue and Customs. If you're just a normal employee, your tax will be deducted from your salary payments before you even get the money through a scheme called Pay As You Earn (PAYE). The taxman is usually happy to keep the relationship at that level, but as you will see below it is sometimes to your advantage to get involved with them.
To the HMRC, you are just a number, specifically a National Insurance (NI) number. Your employer will ask you for your NI number when you start working. If you don't have one, just say so, and then try to get one as soon as possible.
While your tax will be deducted from your monthly salary, the amount of tax you pay is actually decided by your total income during the current tax year. The tax year runs from the 6th of April to the 5th of April the following year. During the tax year, you only pay tax on the part of your income that exceeds the personal allowance (£7,475 for 2011-12). You pay 20% of the amount that exceeds the personal allowance up to a certain limit (£35,000 for 2011-12), and a higher rate for the amount exceeding that limit.
On top of that, you pay a few percent of your salary for National Insurance contributions, which I won't cover in this article.
The tax code
Based on what they know about you, the HMRC assigns you a tax code,
which is used by your employer to figure out how much tax to deduct
from your salary. The tax code looks something like 747L. (If your
tax code doesn't contain three digits and the letter L, you'd be
better off reading the HMRC page than this article.)
This means that your tax-free allowance for the year is £7,479 (replace
the L with a 9), and that this should be deducted from your salary
evenly across the year. (I'm aware that I said £7,475 above, but both
numbers come from the HMRC web site, so the confusion is not my
fault. And what is £4 between friends?…)
The HMRC should send you a letter notifying you about what tax code they have chosen, and you can also find it on every payslip.
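The "replace the L with a 9" rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, for codes made of three digits and the letter L; the function name `allowance_from_tax_code` is made up for the example and is not anything the HMRC provides.

```python
def allowance_from_tax_code(tax_code: str) -> int:
    """Return the annual tax-free allowance in pounds for a code like '747L'.

    Only handles the simple case covered in this article:
    exactly three digits followed by the letter L.
    """
    digits, suffix = tax_code[:-1], tax_code[-1:]
    if len(digits) != 3 or not (digits.isascii() and digits.isdigit()) or suffix != "L":
        raise ValueError(f"unsupported tax code: {tax_code!r}")
    # Replace the trailing L with a 9: '747L' -> 7479
    return int(digits + "9")


print(allowance_from_tax_code("747L"))  # 7479
```

Any other kind of tax code raises an error rather than guessing, since the rules for those are not covered here.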
Emergency tax codes
When the HMRC doesn't have enough information to assign the correct tax code, they give you something they call an emergency tax code. The name is misleading: there's nothing "emergency" about the tax code itself; in fact, it's quite likely that you will get the same tax code once your relations with HMRC have stabilised.
And this is the part where they take your money
You could probably see this coming: if you didn't start working at the beginning of the tax year (i.e., 6th of April), your Personal Allowance will make up a greater proportion of your salary than the HMRC thinks, and thus you should pay less tax each month. Of course, this isn't something the HMRC is eager to tell you.
For example, if your annual salary is £20,000 and you work during the entire tax year, your tax for the year would be £2,505:
(20,000 - 7,475) × 20% = 2,505
which is £208.75 per month. But if you started working in October, the personal allowance cancels out a greater part of your income:
(10,000 - 7,475) × 20% = 505
Split over six months, you should pay £84.17 per month. However, if you don't get your tax code right, you'd pay £208.75 per month (6 × £208.75 = £1,252.50 in total, against the £505 actually due), and as a result you would have overpaid £747.50 by the end of the tax year.
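If you want to redo this arithmetic with your own numbers, here is a minimal Python sketch. It assumes the 2011-12 personal allowance and the 20% rate quoted above, ignores the higher rate and National Insurance, and assumes that without a corrected tax code the allowance is spread evenly over twelve months. The function names are invented for this example.

```python
from decimal import Decimal

PERSONAL_ALLOWANCE = Decimal("7475")  # 2011-12 figure quoted in this article
BASIC_RATE = Decimal("0.20")          # basic rate only; higher rate not modelled


def tax_due(income_in_tax_year):
    """Tax owed on the income actually earned during the tax year."""
    taxable = max(Decimal(0), Decimal(income_in_tax_year) - PERSONAL_ALLOWANCE)
    return taxable * BASIC_RATE


def tax_deducted(annual_salary, months_worked):
    """Tax taken if the allowance is spread over twelve months regardless
    of when you started (an assumption made for this sketch)."""
    return tax_due(annual_salary) / 12 * months_worked


annual_salary = 20000
months_worked = 6  # started in October
income = Decimal(annual_salary) * months_worked / 12

owed = tax_due(income)
paid = tax_deducted(annual_salary, months_worked)
print("owed:", owed, "paid:", paid, "overpaid:", paid - owed)
```

Decimal is used instead of floats so that the pounds and pence come out exactly.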
What you can do about it
Basically, you should write a letter to the HMRC and ask them to stop taking too much money, or to pay back the amount you have overpaid.
To find out where to send your letter, you first need to know your employer's Taxpayer Reference. It might be indicated on your payslip, on your P60 form (see below), or you could ask your employer. Armed with that piece of knowledge, go to the tax office finder and type in the code, and you'll get the address of the tax office dealing with your tax.
During the tax year in question
In theory, the P46 form that you filled in when you started working should have saved you from all this trouble, as it gives the HMRC all the information they need to work out the correct tax code, but in practice you're not always that lucky. What you can do is ask the HMRC to give you a new tax code for the rest of the year, which would result in smaller tax deductions as they "pay back" by reducing future payments.
Always quote your National Insurance number, your employer's Taxpayer Reference, your address and your phone number in your letters.
After the tax year in question
Some time in April or May, your employer will give you a P60 form, which sums up your income and your tax payments during the previous tax year. This is very useful, since it contains all the information you need to claim back the overpaid tax. Use the formulas above to calculate how much tax you should have paid, and then write them a letter containing:
- your National Insurance number
- your employer's Taxpayer Reference
- the amount of tax you should have paid
- the amount of tax you actually paid
- the amount they owe you
- bank account details for repayment (sort code, account number, branch address)
Also enclose a copy of your P60 form.
You should get your refund within a few months, unless of course they manage to lose your request somehow, in which case you need to remind them.
Last weekend I was at the Language Show in London, mostly telling people about Esperanto at the stand of the Esperanto Association of Britain. (I had a lot of fun, of course, getting to tell people almost everything about my hobby.)
There were lots of exhibitors proposing various ways to learn various languages, but what stuck in my mind was the stand about Saaspel, a proposal for an alternative English orthography. Reforming their spelling system is something the English should have done a long time ago, as the pronunciation of a word generally has no relation to its spelling—which makes using the language harder for learners and native speakers alike. (The traditional spelling may have been a perfect fit for the language of Shakespeare's time, but is quite irrelevant now.)
Saaspel (or Sāspel, as written in its alternative form, with macrons to indicate long vowels) stands for "same sound—same spelling", which is a pretty good summary of how it works. Words are written as they are pronounced, with little (but still some) consideration given to their classical spelling. A few other rules and principles are:
A "long vowel", pronounced as in the alphabet, is written "long", either "aa ee ii oo" or "ā ē ī ō", depending on your taste. "U" gets special treatment: "use" → "yuz", "boot" → "buut" or "būt".
The vowels A, E, I and O, when short, generally correspond to the sound written with those letters in most continental European languages. For the same reason, "automatic" becomes "outomatic" but "sound" becomes "saund". Again, "U" gets special treatment and is used for the sound in "cut".
"K" is not used except in proper names. The letter "C" always stands for the "K" sound, never for "S".
Voiced "th", as in "this", is written "th", but voiceless "th", as in "thin", becomes "tt".
Often a consonant is enough for an entire syllable: "silabl" (syllable), "endd" (ended), "problm" (problem).
See their web site for more.
Though I'm not convinced that Saaspel is the perfect way to fix English spelling, I feel that it is a decent implementation of a good idea, so I'll try using it and see if I can make someone happy with it. (Ideally my effort would make the entire world adopt it, but if I can give at least one fellow human that warm fuzzy feeling of something done right, it's totally worth it.)