Diary entries in English.

So I'm writing an IMAP client for Emacs. The main reason is that I want an email client that works as if by osmosis, like the one on my phone: whenever it has a connection, messages magically enter a local cache, where they stay available even if the connection is lost. There are many ways to achieve that with existing Emacs email clients, possibly augmented with external tools, but I want one that works with minimal setup and requires nothing but Emacs itself.

The first problem encountered when writing an IMAP client is parsing what the server is sending. There are two IMAP parsers in the Emacs source tree already (imap.el and nnimap.el), both fairly closely tied to their respective clients, and both synchronous—they block Emacs while they are waiting for data from the server. I want to address both of those points: my IMAP client is going to have a reusable asynchronous parser module.

Obviously, the first step in figuring out how to parse the protocol is to read the IMAP RFC. Naturally, you'd skip ahead to the section "Formal syntax", which defines in ABNF what various messages should look like: a sequence number followed by one of a few keywords followed by a quoted string or an atom, and so on. That's where I started, adding a special case for each command response to my parser code.

But then I thought that this wasn't the right way to do it. The parser module wouldn't be independent of the module that uses it, since the calling module might want to use an extension that the parser module doesn't know about yet. Also, I would constantly need to come up with sensible data structures to represent the responses with.

So I decided to create a parser that would be able to parse IMAP responses with as little knowledge as possible about what those responses should look like. It turns out that most of IMAP consists of atoms, strings and lists. I figured that I should be able to parse everything following these simple rules:

  • If it starts with a double quote, parse it as a quoted string.
  • If it starts with a ( or [, descend, parse recursively and return it as a list.
  • Otherwise, treat it as an atom, read until the next space character or closing ) or ], and return it as a string.

For example, this LIST response:

* LIST (\HasNoChildren) "." INBOX

gets parsed to this:

("*" "LIST" ("\\HasNoChildren") "." "INBOX")
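As a rough illustration of the three rules above, here is a minimal sketch in Python rather than in the Emacs Lisp of the actual client. The names parse_items and parse_line are made up for this example, and the sketch assumes well-formed input: it handles neither the BODY[ case nor literals, and an unterminated quoted string would simply raise an error.

```python
def parse_items(text, pos=0, closer=None):
    """Parse items from text[pos:] until `closer` or the end of the text.

    Returns a tuple (items, position after the last character consumed).
    """
    items = []
    while pos < len(text):
        ch = text[pos]
        if ch == " ":
            pos += 1
        elif ch == closer:
            return items, pos + 1
        elif ch == '"':
            # Rule 1: quoted string; a backslash escapes the next character.
            pos += 1
            chars = []
            while text[pos] != '"':
                if text[pos] == "\\":
                    pos += 1
                chars.append(text[pos])
                pos += 1
            items.append("".join(chars))
            pos += 1  # skip the closing quote
        elif ch in "([":
            # Rule 2: descend, parse recursively, return the result as a list.
            sub, pos = parse_items(text, pos + 1, ")" if ch == "(" else "]")
            items.append(sub)
        else:
            # Rule 3: an atom runs up to the next space or closing ) or ].
            start = pos
            pos += 1
            while pos < len(text) and text[pos] not in " )]":
                pos += 1
            items.append(text[start:pos])
    return items, pos


def parse_line(line):
    """Parse one logical line into a nested list of strings."""
    items, _ = parse_items(line)
    return items
```

Applied to the LIST response above, parse_line returns the same nesting as the Lisp result, with Python lists in place of Lisp lists.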

This mostly worked. One exception is that if the second word of a line is one of OK, NO, BAD, BYE or PREAUTH, then the rest of the line (described by resp-text in the RFC) is free-form human-readable text, optionally preceded by a resp-text-code. For example, this is part of a response to a SELECT command from the Dovecot server:

* OK [UNSEEN 6] First unseen.
* OK [UIDVALIDITY 1381959933] UIDs valid
* OK [UIDNEXT 10] Predicted next UID
1 OK [READ-WRITE] Select completed (0.003 secs).

There is no reason why this free-form text couldn't contain unbalanced parentheses or anything else that might confuse a parser, and besides it doesn't make sense to split the text into words anyway. So that's a special case for the parser, and gets parsed to this:

("*" :ok :code "UNSEEN"      :data "6"          :text "First unseen.")
("*" :ok :code "UIDVALIDITY" :data "1381959933" :text "UIDs valid")
("*" :ok :code "UIDNEXT"     :data "10"         :text "Predicted next UID")
("1" :ok :code "READ-WRITE"  :data nil          :text "Select completed (0.003 secs).")
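The special case can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the actual client: the function name parse_status_line and the dictionary keys are hypothetical, chosen to mirror the :code, :data and :text keywords above, and the data part is kept as a raw string rather than parsed further.

```python
import re

STATUS_WORDS = {"OK", "NO", "BAD", "BYE", "PREAUTH"}

# Optional resp-text-code: "[CODE]" or "[CODE data]", then an optional space.
RESP_TEXT_CODE = re.compile(r"\[([^\] ]+)(?: ([^\]]*))?\] ?")


def parse_status_line(line):
    """Return a dict for a status response, or None for any other line."""
    parts = line.split(" ", 2)
    if len(parts) < 2 or parts[1].upper() not in STATUS_WORDS:
        return None
    rest = parts[2] if len(parts) > 2 else ""
    code = data = None
    m = RESP_TEXT_CODE.match(rest)
    if m:
        code, data = m.group(1), m.group(2)
        rest = rest[m.end():]
    # The remaining text is kept whole; it is never split into words.
    return {"tag": parts[0], "status": parts[1].upper(),
            "code": code, "data": data, "text": rest}
```

Because the free-form text is never tokenised, the unbalanced or stray parentheses it may contain cannot confuse the generic parser.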

Then there is BODY, which when given as a fetch or message attribute is followed immediately by an opening square bracket, e.g. BODY[]. I decided to treat it as if there were a space between BODY and the bracket, and return them as two elements, the string "BODY" and the list of items parsed inside the brackets.

And all of this applies in principle to every single line, except that a physical line can end with a byte count in curly braces (e.g. {42}), which means that the following bytes are literal data to be treated as part of the current logical line. Fortunately, this is independent from the parsing itself, so I have a function that splits the data by logical lines and passes everything to the parse function.
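The splitting step might look roughly like the following Python sketch. It is a simplification under stated assumptions: the function name split_logical_lines is made up, the buffer is assumed to contain only complete responses (a real asynchronous client has to cope with partial data), and each logical line is returned as a list of byte strings in which text segments and literals alternate.

```python
import re

# A physical line that ends with a byte count in curly braces, e.g. {42}.
LITERAL_SUFFIX = re.compile(rb"\{(\d+)\}\Z")


def split_logical_lines(data):
    """Split a bytes buffer into logical lines.

    Each logical line is a list of parts; a literal is stored as a part
    of its own, directly after the text segment that announced it.
    """
    lines = []
    current = []
    pos = 0
    while pos < len(data):
        end = data.find(b"\r\n", pos)
        if end == -1:
            break  # incomplete physical line; this sketch just stops here
        segment = data[pos:end]
        pos = end + 2
        m = LITERAL_SUFFIX.search(segment)
        if m:
            count = int(m.group(1))
            current.append(segment[:m.start()])
            # The next `count` bytes are literal data, CRLFs included.
            current.append(data[pos:pos + count])
            pos += count
        else:
            current.append(segment)
            lines.append(current)
            current = []
    return lines
```

Since the literal is taken by byte count, any line breaks inside it are left alone and do not end the logical line.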

So far, my little client is able to select a mailbox, search for unread messages and fetch the messages, and my parser is sufficient for all of that.

Posted Sat 03 May 2014 03:47:00 AM BST

Emacs 24 introduced the possibility to open TLS network connections using the GnuTLS library directly, instead of using a command line tool as a wrapper. This is especially interesting for those who are stuck using Emacs on Windows, as the command line tools can be rather brittle on that platform.

However, there are some steps that need to be performed in order to get native GnuTLS to work. This page attempts to describe them.

Get a GnuTLS-enabled Emacs

The Windows binaries available for download from the GNU site are compiled against GnuTLS, but if you compile your own Emacs, see the file nt/INSTALL in the Emacs source distribution for instructions.

Find the GnuTLS DLLs

The first Google hit for "emacs gnutls windows" is this page. It says:

There's one way to find out if GnuTLS is available, by calling gnutls-available-p. This is a little bit trickier on the W32 (Windows) platform, but if you have the GnuTLS DLLs (available from http://sourceforge.net/projects/ezwinports/files/ thanks to Eli Zaretskii) in the same directory as Emacs, you should be OK.

On that page, I found:

gnutls-3.0.9-w32-bin.zip    2012-01-02  7.2 MB

Extract the GnuTLS DLLs

I first naïvely tried opening the zip file in Explorer and copying the files from there, but that does nothing—it neither copies the files nor displays any error message. You need to extract the zip file, and then copy all DLL files to the bin directory where your Emacs is installed, probably somewhere like C:\Program Files (x86)\emacs-24.3\bin.

Restart Emacs and try it

At this point, if you restart Emacs and type:

M-: (gnutls-available-p) RET

you should see t in the echo area, which means that Emacs can find the GnuTLS libraries.

Configure trust files

However, if you try to open a TLS connection, it will fail complaining that certificate validation failed. This happens because GnuTLS needs to have a set of CA certificates to verify the certificates of the servers it connects to. It looks for CA certificates in the locations specified in the variable gnutls-trustfiles, but none of the default values work out of the box on Windows.

I'm not aware of any way to make GnuTLS use any certificates that come with the Windows system, so you need to get a certificate bundle from elsewhere. The cURL project provides such a bundle that you can download. Download the cacert.pem file to a suitable location, and point gnutls-trustfiles to it with customize-option. Note that the file name is passed unexpanded to GnuTLS, so you cannot use ~ as a shorthand for your home directory; use the full absolute file name instead.

See if it works

Paste the following piece of code into the *scratch* buffer:

(condition-case e
    (delete-process
     (gnutls-negotiate
      :process (open-network-stream "test" nil "www.google.com" 443)
      :hostname "www.google.com"
      :verify-error t))
  (error e))

Then put point at the end and hit C-j. If nil gets inserted into the buffer, then the certificate could be verified, and your setup appears to be working.

Otherwise, you'll see an error like:

(error "Certificate validation failed www.google.com, verification code 66")

If so, a good place to start debugging is setting the variable gnutls-log-level to a value greater than 0.

Posted Sat 26 Apr 2014 01:01:11 AM BST

In Emacs 23 or later, you can use M-x proced to get a list of running processes. You can sort and filter them in various ways; in my experience, the most common goal is to quickly find and kill certain processes.

However, Proced currently doesn't work on OSX. I just spent a few minutes figuring out why, so I hope writing this could save someone else some effort.

Proced is not based on command line tools such as ps, but uses Emacs Lisp functions implemented in C to get the list of processes (list-system-processes) and to get process attributes (process-attributes). list-system-processes works on OSX, but process-attributes does not. sysdep.c contains a few different implementations guarded by #ifdefs, and this is the one chosen for OSX:

Lisp_Object
system_process_attributes (Lisp_Object pid)
{
  return Qnil;
}

In March 2010, this was discussed on the macosx-emacs mailing list, and a patch retrieving part of the needed information got as far as producing a screenshot, but it seems that it was neither finished nor merged into Emacs.

Update: I got bored and wrote a patch for it. It basically works, but there are some things that it currently doesn't retrieve, notably memory and CPU usage, and command line arguments.

Emacs' process editor (proced) works on many operating systems, but not OSX. Here is why.

Posted Tue 03 Sep 2013 01:51:13 PM BST

In Erlang, if a process is running code from module a when module a is reloaded, it doesn't get automatically upgraded to the new version. The old version of the code is kept as long as any process is using it, and the process is said to be running "old code".

However, only one old version is kept, so if the process doesn't switch to the new version before the module is reloaded for a second time, the process will be killed by the code server. This can be confusing, as there is often no trace of the process dying, and even if there is, you only get to know that the process was killed, but not why.

While chasing down a problem related to this, I came up with this patch to the code server:

diff --git a/lib/kernel/src/code_server.erl b/lib/kernel/src/code_server.erl
index 00ad923..c4d5fd6 100644
--- a/lib/kernel/src/code_server.erl
+++ b/lib/kernel/src/code_server.erl
@@ -1414,6 +1414,7 @@ do_purge(Mod0) ->
 do_purge([P|Ps], Mod, Purged) ->
     case erlang:check_process_code(P, Mod) of
    true ->
+       catch info_msg("Killing ~p for old code from ~p", [P, Mod]),
        Ref = erlang:monitor(process, P),
        exit(P, kill),
        receive

That led me straight to the module that was being reloaded, and let me fix the problem by ensuring that the process switched to new code.

This patch is probably not suitable for inclusion in the official Erlang/OTP sources, but I hope it can be useful when developing.

How can you know when the code server kills your process, and why?

Posted Tue 20 Aug 2013 12:55:27 PM BST

So I was dusting off an old Windows XP machine, applying security updates for the last few years, when suddenly I found that the machine would crash with a BSOD a few minutes after boot. The BSOD said that a DRIVER_IRQL_NOT_LESS_OR_EQUAL error had occurred in w22nt51.sys. Googling that led to this article, which revealed that this is a problem in the driver for the Intel 2200BG wireless network card.

Broken network drivers usually lead to chicken-and-egg problems, and that article suggests a way around that, but fortunately I had an Ethernet cable at hand and was not entirely dependent on the wireless network. (The cable was in my "give away or throw away" bag, so I'm grateful for my laziness and will reconsider some of the items in it.)

So I solved this by the following steps:

  • Boot into "Safe mode" (without networking).
  • Deactivate the wireless network card in the "System" section of the control panel, to stop the crashes.
  • Boot into "Safe mode with networking" (though a normal boot would probably have been sufficient at this point) and connect the Ethernet cable.
  • Download the new driver from the Intel Download Center.
  • Install the driver.
  • Reboot, to make sure the old driver is not in use (not sure if necessary).
  • Reactivate the device in the control panel.

And it works!

So why am I subjecting myself to Windows XP again? Simply because the firmware upgrade tool for my HTC phone requires Windows. Also, now I have the opportunity to test whether my Emacs-based Jabber client works well with Windows. (One of these days…)

Posted Mon 08 Jul 2013 12:14:40 AM BST

Some people like to read their email in Emacs. Of course, Emacs comes with not one but two email clients, the more advanced one being Gnus. Gnus lets you choose from a number of formats to store your email on your local disk. Perhaps you're using offlineimap to fetch your email from an IMAP server, and since offlineimap uses the Maildir format, you might think that using Gnus' nnmaildir backend is exactly what you're looking for.

However, as has been documented, there are some issues with nnmaildir. In particular, nnmaildir and offlineimap disagree about how to mark a message as read. Normally, offlineimap will synchronise that flag between the server and your local mail store, such that a message that you've read in one place eventually gets marked as read in the other, but in this case we're not so lucky; you'd have to mark old messages as read in both places, which pretty much defeats the purpose of offlineimap synchronisation.

At this point, most people give up on nnmaildir and install Dovecot, a small IMAP server, locally, make it serve the Maildir directory over IMAP, and point Gnus' nnimap backend to the local IMAP server. That's a very reasonable thing to do, and probably the most painless way.

Of course, that precludes the chance of becoming immortal by writing code, so I dug into nnmaildir.el. I found that it paid no attention to what the Maildir specification says about message flags, but instead stores flags in a Gnus-specific format inside the Maildir directory. The latter needs to be kept, since not all Gnus marks can be represented as Maildir flags, but the most important ones ("read", "replied to" and "flagged") can be stored as part of the file names of the messages. I came up with a patch and posted it to the bug report for this feature. As of 2012-09-05, it's merged into Gnus' master branch, and the change was included in GNU Emacs 24.3.

So there it is, ready to be tested. There appear to be some glitches, in that it might sometimes not mark a message as read that should be read, but so far it seems it's not in the habit of eating people's email.

Not fast enough

There is more to do about nnmaildir, though; my changes so far have been about making it correct, so the next step is to make it fast. It works fairly well for mailboxes with thousands of messages, but when you get into the tens of thousands, there is a long delay when starting Gnus. (The logical solution to that is to never exit Gnus, but…) I'm getting ahead of myself here, but I've sprinkled debug messages over the code and so far they suggest that most of that time is spent in nnmaildir--grp-add-art. If I'm reading the code correctly, it seems that it's entering every article, one by one, into an ordered list called nlist. Obviously, in the worst case, that has quadratic time complexity. My thoughts at this moment are:

  • Does this need to be an ordered list? (Other parts of the code perform operations on subranges of this list, but maybe we could use a tree or something…)
  • Do we really need to read all of this into memory? This is data about every single message in the folder; most likely we won't ever use most of it.
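To make the quadratic worst case concrete, here is a small Python sketch. It is not modelled on the nnmaildir code; insert_one_by_one is a made-up function that keeps a list ordered by scanning from the front for every new element and counts how many elements it steps over.

```python
def insert_one_by_one(numbers):
    """Build an ordered list by linear-scan insertion.

    Returns (ordered list, number of elements stepped over).
    """
    ordered = []
    steps = 0
    for n in numbers:
        i = 0
        # Walk past every element smaller than the new one.
        while i < len(ordered) and ordered[i] < n:
            i += 1
            steps += 1
        ordered.insert(i, n)
    return ordered, steps


def sort_once(numbers):
    """Collect everything first, then sort a single time."""
    return sorted(numbers)
```

When the input arrives in ascending order, every insertion walks the whole list: for 1,000 articles that is 499,500 steps, and for ten times as many articles roughly a hundred times as many steps. Collecting everything and sorting once gives the same list without the repeated scans.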

That's where I am right now. I'll keep working on this as time permits.

nnmaildir, the Maildir implementation of Gnus, doesn't agree with offlineimap about what a read message looks like. I'm trying to fix that, and other issues with nnmaildir as well.

Posted Sat 25 Aug 2012 11:27:11 PM BST

So, just a quick overview of where we are, what territory we're in: I like Emacs, I like Erlang, and I like running unit tests on my code. This is my attempt to gather some pieces together to fuse all of this into a seamless experience, so that whenever something goes wrong, I can quickly get to exactly where in the code the error occurred.

This will involve patches both to Emacs and to Erlang/OTP. My hope is that these changes will make it into upstream versions fairly soon, but this is where we are at the moment.

To start with, I run my unit tests from within Emacs, using M-x compile. It takes any shell command—make test or rebar eunit come to mind—and runs it inside a buffer in Emacs. That means that you can use the normal Emacs editing commands to move around, search for things, copy pieces of text, etc. Emacs also attempts to highlight error and warning messages in the output, and turn them into links to the corresponding position of the source code. With a stock Emacs, there are many possibilities left untouched, and that is where our journey begins.

Get stacktraces from eunit failures (pre-R15B02)

First of all, if you're running Erlang/OTP R15B or R15B01, you have a version of eunit that doesn't print stacktraces when an error occurs. That can be a serious time waster, compared to knowing what code was running three or four stack frames down from the error site (not to mention even knowing in what function the error occurred). So make sure that you're running R15B02 or later.

If you're stuck with an earlier version, fortunately there is a patch for that. I've written up instructions on how to apply it to your existing Erlang/OTP installation with minimal effort.

So before this patch, eunit output looks like this:

foo:6: my_test_ (module 'foo')...*failed*
::error:foo

And after the patch, we get this:

foo:6: my_test_ (module 'foo')...*failed*
::error:foo
  in function foo:'-my_test_/0-fun-0-'/0 [foo.erl:6]

Drop compilation error regexp from Emacs Erlang mode

I received great satisfaction the other day when I submitted a patch to Erlang/OTP consisting entirely of line removals. I'm not sure at what point in history the code I removed was needed, but the comments suggest at some point around Emacs 19. My Emacs 24 does compilation error highlighting just as well without it.

Even better, in fact—once the Erlang mode had installed this regexp globally, Emacs lost the ability to distinguish between compilation errors (pink) and compilation warnings (orange). This regexp was installed first in a long list (compilation-error-regexp-alist), thus hiding more sophisticated regexps.

If you have Erlang checked out from Github, you can apply this patch with the following commands (adjusting the git-fetch command if needed):

git fetch
git cherry-pick a87a9699735b0a25f99397fba9576f5756da54d3

Theoretically you could undo the changes that have been done to your Emacs session, but the simplest way is to just restart Emacs.

So far, so good: if you compile your Erlang code from within Emacs with M-x compile, warnings will be orange instead of pink, and M-g M-n (the next-error command) will skip warnings and jump directly to errors.

Jump from failed test cases to code

So let's return to the output we got from eunit:

foo:6: my_test_ (module 'foo')...*failed*

That tells us that the test on line 6 in module foo failed. That's pretty unambiguous, so there's no reason why I as a human should spend any effort on finding that location when the computer can do it for me.

This calls for a regular expression, of course. Here it is:

(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit
  (list
   "^ *\\(\\([^.:( \t\n]+\\):\\([0-9]+\\)\\):.*\\.\\.\\.\\(?:\\([^*]\\)\\|[*]\\)"
   ;; file
   (list 2 "%s.erl" "src/%s.erl" "test/%s.erl")
   ;; line
   3
   ;; column
   nil
   ;; type - need to match [^*] after the three dots to be info,
   ;; otherwise it's an error
   (cons nil 4)
   ;; highlight
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit)

The first form makes sure that any earlier attempts are purged before adding the new regexp, to avoid accumulating cruft. I did refine this quite a few times before arriving at this version ☺

I won't bore you with the details of that piece of code (see C-h v compilation-error-regexp-alist if you're interested), but let me just draw your attention to one of the lines:

   (list 2 "%s.erl" "src/%s.erl" "test/%s.erl")

Since the text we're matching is the module name, not the file name, we need to tell Emacs how to make a file name out of it. I added two common subdirectory names to make it do the right thing in most cases.
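To see what the regexp and the file name patterns do together, outside of Emacs, here is a rough Python equivalent. It is only an illustration: the function name eunit_location is made up, the regexp is a simplified translation of the one above, and the function merely lists candidate file names without checking which of them exists.

```python
import re

# module:line: ... followed by three dots and, for a failure, an asterisk.
EUNIT_LINE = re.compile(r"^ *([^.:( \t\n]+):([0-9]+):.*\.\.\.(\*?)")


def eunit_location(output_line,
                   patterns=("%s.erl", "src/%s.erl", "test/%s.erl")):
    """Return (candidate file names, line number, failed?) or None."""
    m = EUNIT_LINE.match(output_line)
    if m is None:
        return None
    module, line, star = m.groups()
    # The output names a module, so try each pattern to get a file name.
    candidates = [pattern % module for pattern in patterns]
    return candidates, int(line), star == "*"
```

For the example line above, this yields the candidates foo.erl, src/foo.erl and test/foo.erl, line 6, and a failure flag.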

However, there is a bug in Emacs that prevents that from working: if we try this with our example above, Emacs will just ask where the foo file is. So head to that bug report and apply the patch, to continue your journey to instant link bliss.

Done? Great! You're probably itching to try this out, and thus frantically looking for a failing test case. There's a simpler way: just put the example output above in a text file, and hit M-x compilation-minor-mode and try all the links. To edit the text file again, type M-x fundamental-mode.

Jump from stacktrace lines to code

Since R15B, Erlang stacktraces include file names and line numbers (and with the patch above, we got Eunit to display them to us). So wouldn't it be great to use that information to jump directly from the test output to the corresponding point in the code?

The output looks like this, and we might want to jump to any of the line numbers given:

foo: bza_test...*failed*
::error:{badmatch,c}
  in function foo:b/0 [foo.erl:31]
  in call from foo:a/0 [foo.erl:27]
  in call from foo:bza_test/0 [foo.erl:23]

This calls for another regexp:

(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit-stacktrace compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit-stacktrace
  (list
   "^ *in \\(?:function\\|call from\\) .* \\[\\(\\([^:]+\\):\\([0-9]+\\)\\)\\]$"
   ;; file
   2
   ;; line
   3
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit-stacktrace)

So from an error like the one we saw above, we can now move point to a line in the stacktrace and hit Enter, and Emacs will take us to the right line and file. (You could of course just click on the stacktrace line, if you're into that kind of thing.)

Jump to failing assertions

And while we're at it, why not create links for assertion failures as well? Eunit's assert macros (assert, assertEqual, assertMatch etc) create error messages that look like this:

foo: foo_test (module 'foo')...*failed*
::error:{assertEqual_failed,[{module,foo},
                           {line,6},
                           {expression,"2"},
                           {expected,1},
                           {value,2}]}
  in function foo:'-foo_test/0-fun-0-'/1 [foo.erl:6]

Here, the module name and the line number are on different lines, but that doesn't stop this regexp from working:

(setq compilation-error-regexp-alist-alist
      (delq (assq 'erlang-eunit-assert compilation-error-regexp-alist-alist)
            compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-eunit-assert
  (list
   (concat
    "^\\(::error:{assert[A-Za-z]+_failed\\),"
    "[ \n]*\\[{module,\\([^}]+\\)},"
    "[ \n]*{line,\\([0-9]+\\)}")
   ;; file
   (list 2 "%s.erl" "src/%s.erl" "test/%s.erl")
   ;; line
   3
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-eunit-assert)

Again, we have module names, not file names, so the same caveat applies.

Jump from stacktraces embedded in other output

The above snippets work very well as long as you get straightforward errors—something has crashed, and the error gets propagated up to your test function, and further up to Eunit, which formats a nice error report. But you're not always so lucky. You might have an error in a linked process:

foo: bar_test...
=ERROR REPORT==== 1-Aug-2012::19:56:58 ===
Error in process <0.79.0> with exit value: {badarith,[{foo,baz,0,[{file,"foo.erl"},{line,13}]}]}

*skipped*
undefined
*unexpected termination of test process*
::{badarith,[{foo,baz,0,[{file,"foo.erl"},{line,13}]}]}

Or there might be a catch somewhere deep in the code, and an error gets propagated into a comparison or something:

foo: frobozz_test...*failed*
::error:{badmatch,
      {ok,{'EXIT',
          {badarg,
          [{erlang,list_to_integer,[x],[]},
           {foo,frobozz,1,[{file,"foo.erl"},{line,19}]},
           {foo,frobozz_test,0,[{file,"foo.erl"},{line,16}]},
           {eunit_test,'-function_wrapper/2-fun-0-',2,
               [{file,[...]},{line,...}]},
           {eunit_test,run_testfun,1,[{file,...},{...}]},
           {eunit_proc,run_test,1,[{...}|...]},
           {eunit_proc,with_timeout,3,[...]},
           {eunit_proc,handle_test,2,...}]}}}}
  in function foo:frobozz_test/0 [foo.erl:16]

So here you can see that the crash actually occurred on line 19, but the only stacktrace line that our regexps so far can recognise is for line 16. But since the information is there, let's spare ourselves the trouble of moving to that line manually:

(setq compilation-error-regexp-alist-alist
  (delq (assq 'erlang-raw-stacktrace compilation-error-regexp-alist-alist)
    compilation-error-regexp-alist-alist))
(add-to-list
 'compilation-error-regexp-alist-alist
 (cons
  'erlang-raw-stacktrace
  (list
   "{file,\"\\([^\"]+\\)\"},[[:space:]]*{line,\\([0-9]+\\)}"
   ;; file
   1
   ;; line
   2
   ;; column
   nil
   ;; type
   2
   ;; hyperlink
   1
   )))
(add-to-list 'compilation-error-regexp-alist 'erlang-raw-stacktrace)

There it is. Now, every time the compilation output contains {file,"something.erl"} and {line,42}, possibly separated by whitespace, the file name will be turned into a link.
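The same pattern is easy to try out on sample output in Python. This is just a sketch for experimenting with the regexp, using a made-up function name, raw_stacktrace_locations; the pattern is a direct translation of the Emacs one, with \s standing in for [[:space:]].

```python
import re

RAW_FRAME = re.compile(r'\{file,"([^"]+)"\},\s*\{line,([0-9]+)\}')


def raw_stacktrace_locations(text):
    """Return every (file name, line number) pair found in the text."""
    return [(name, int(line)) for name, line in RAW_FRAME.findall(text)]
```

Frames that the Erlang shell has abbreviated, such as {file,[...]},{line,...}, contain no quoted file name and are skipped.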

Conclusion

That is all I've been able to come up with for now. I've automated the boring things (finding where my test crashed) so I can spend more time and energy on the fun things (actually fixing the test)—which is what Emacs is all about, of course.

Some elisp snippets for improving highlighting and linking of output from Eunit (the Erlang unit test tool) when run inside Emacs.

Posted Wed 01 Aug 2012 11:08:33 PM BST

With this article I want to show that the Russian and Polish orthographies, although very different, express almost the same set of phonemes. I hope that this will help you, dear reader, to read either of the languages better.

Comments, corrections and criticism are always welcome.

Phonemes

By "phoneme" I mean the smallest meaningful part of the sequence of sounds that make up a word. In this text, I will transcribe the phonemes that a written word (Russian or Polish) represents using Latin letters in [brackets]. (Although this is a common way to show pronunciation, I'm completely uninterested in pronunciation in this text, since that would get in the way of comparing Polish and Russian.)

Both Russian and Polish use "softened" consonants. I will indicate those with a subscript J in my transcriptions, for example НЬ = Ń = [nⱼ]. The difference in spelling of softened consonants between the two languages will be a central topic in the following.

Softening in Russian

In Russian, softened consonants are generally expressed in writing by a following "soft" vowel. The "hard" vowels А, О, У, Ы and Э correspond to the "soft" vowels Я, Ё, Ю, И and Е. If a softened consonant is not followed by a vowel, the so-called soft sign is used instead: Ь

E.g.: медь [mⱼedⱼ] "copper", лёд [lⱼod] "ice".

(In these two examples, the final sound is actually pronounced as T instead of D, because of the devoicing of final consonants that occurs in both Russian and Polish, but in the transcription I nevertheless use D, to follow the written form of the original word.)

If a soft vowel appears in the beginning of a word, or after another vowel, it represents the sound [j] instead of softening: ясный [jasnyj] "clear".

Some consonants (Ж, Ш, Ц) are never soft, and some (Ч, Щ) are always soft.

Softening in Polish

In Polish, softened consonants are in principle expressed by letters with diacritical marks: Ć, Ń, Ś, Ź. However, if the vowel "i" appears after the softened consonant, the mark is removed (since "i" by itself indicates softening), and if another vowel follows, an extra "i" is inserted between the consonant and the vowel. This can cause spelling differences between inflected forms of the same word, for example: koń [konⱼ] "horse", konia [konⱼa] "of a horse", koni [konⱼi] "of horses".
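The spelling rule can be written down as a tiny Python function. This is only a sketch of the rule as stated here: the name spell_soft and the vowel list are my own choices for the example, and it covers just the four consonants with diacritical marks.

```python
SOFT_MARKED = {"c": "ć", "n": "ń", "s": "ś", "z": "ź"}
OTHER_VOWELS = set("aąeęoóu")


def spell_soft(consonant, next_letter=""):
    """Spell a softened consonant, given the letter that follows it."""
    if next_letter == "i":
        # "i" already indicates softening, so the mark is dropped.
        return consonant
    if next_letter in OTHER_VOWELS:
        # Before any other vowel, an extra "i" is inserted.
        return consonant + "i"
    # At the end of a word or before a consonant, use the marked letter.
    return SOFT_MARKED[consonant]
```

With this, "ko" + spell_soft("n") gives koń, "ko" + spell_soft("n", "a") + "a" gives konia, and "ko" + spell_soft("n", "i") + "i" gives koni.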

The hard correspondent to Ć is not C, but T. Therefore I will write the soft C as [tⱼ] in my transcriptions, e.g.: ciasto [tⱼasto] "cake".

The soft correspondent to D is written DŹ (but without the diacritical sign when it appears before a vowel, according to the rules above). E.g.: dziki [dⱼiki] "wild".

The consonant L is exceptional. It is written Ł when hard and L when soft. E.g.: las [lⱼas] "forest", głodny [glodny] "hungry".

The consonant R is also exceptional. It is written R when it is hard and RZ when it is soft. E.g.: ręka [ręka] "hand", rzeka [rⱼeka] "river". (Ą and Ę are Polish nasal vowels that Russian no longer has. I leave them as is in the transcriptions, but will go into further detail below.)

The reader will probably protest against some of the above pairings, and will rightly remark that L/Ł, R/RZ, S/Ś sound completely different in Polish, but again I'd like to point out that I'm not interested in pronunciation; to compare Russian and Polish, these need to be treated as related phonemes.

Comparisons

Armed with this system for transforming written words from the two languages into a single transcription, we can begin by noticing that many words have the same phonemes (though not always the same meaning):

  • кот = kot = [kot] "cat"
  • конь = koń = [konⱼ] "horse"
  • дети = dzieci = [dⱼetⱼi] "children"
  • сеть = sieć = [sⱼetⱼ] "net"
  • река = rzeka = [rⱼeka] "river"
  • неделя "week" = niedziela "Sunday" = [nⱼedⱼelⱼa]

In some words we find a vowel change. Fairly often this is caused by the Old Slavic vowel Ѣ "yat", which in Russian became [ⱼe] but in Polish became either [ⱼa] or [ⱼe] depending on conjugation:

  • белый [bⱼelyj] ≈ biały [bⱼaly] "white"
  • лес [lⱼes] ≈ las [lⱼas] "forest"
  • лесной [lⱼesnoj] ≈ leśny [lⱼesⱼny] "pertaining to (a) forest" (adjective)
  • вера [vⱼera] ≈ wiara [vⱼara] "belief"
  • место [mⱼesto] "place" ≈ miasto [mⱼasto] "city"

Sometimes Russian has an unstressed E where Polish has O:

  • сестра [sⱼestra] ≈ siostra [sⱼostra] "sister"
  • седло [sⱼedlo] ≈ siodło [sⱼodlo] "saddle"

The Polish nasal vowels Ą and Ę were originally written using the Cyrillic letters Ѫ "big yus" and Ѧ "little yus" (but in Polish those two sounds first collapsed into one and later separated again, such that it's not immediately obvious which of them was the original sound). In Russian, those sounds sometimes became [u], sometimes [ⱼa]:

  • пять [pⱼatⱼ] ≈ pięć [pⱼętⱼ] "five"
  • мясо [mⱼaso] ≈ mięso [mⱼęso] "meat"
  • счастье [sĉⱼastje] ≈ szczęście [ŝĉęsⱼtⱼe] "happiness" (here [ⱼa] is written with А because of an orthographic rule)
  • мука [muka] ≈ mąka [mąka] "flour"
  • рука [ruka] ≈ ręka [ręka] "hand"
  • мудрый [mudryj] ≈ mądry [mądry] "wise"
  • буду [budu] ≈ będę [będę] "I will be"

In some cases Russian has [olo], [oro] or [ⱼerⱼe], while Polish lacks the first vowel:

  • голос [golos] ≈ głos [glos] "voice"
  • берег [bⱼerⱼeg] ≈ brzeg [brⱼeg] "coast"
  • горох [goroĥ] ≈ groch [groĥ] "pea"
  • молоко [moloko] ≈ mleko [mlⱼeko] "milk"

With this article I want to show that the Russian and Polish orthographies, although very different, express almost the same set of phonemes.

Posted Sun 31 Jul 2011 09:41:24 PM BST

This article is intended as a gentle introduction to the UK tax system for immigrant workers. I've learnt a bit since I was "fresh off the boat", so I thought I'd share it in the hope that it will be useful to someone. Any comments or feedback are welcome, of course ☺

In the following, I will assume that you have one and only one job, and that you get a monthly salary. Mutatis mutandis, caveat emptor, etc.

The basics

In the UK, tax is collected by a government agency called HMRC, Her Majesty's Revenue and Customs. If you're just a normal employee, your tax will be deducted from your salary payments before you even get the money, through a scheme called Pay As You Earn (PAYE). The taxman is usually happy to keep the relationship at that level, but as you will see below it is sometimes to your advantage to get involved with them.

To the HMRC, you are just a number, specifically a National Insurance (NI) number. Your employer will ask you for your NI number when you start working. If you don't have one, just say so, and then try to get one as soon as possible.

While your tax will be deducted from your monthly salary, the amount of tax you pay is actually decided by your total income during the current tax year. The tax year runs from the 6th of April to the 5th of April the following year. During the tax year, you only pay tax on the part of your income that exceeds the personal allowance (£7,475 for 2011-12). You pay 20% of the amount that exceeds the personal allowance up to a certain limit (£35,000 for 2011-12), and a higher rate for the amount exceeding that limit.

On top of that, you pay a few percent of your salary for National Insurance contributions, which I won't cover in this article.

The tax code

Based on what they know about you, the HMRC assigns you a tax code, which is used by your employer to figure out how much tax to deduct from your salary. The tax code looks something like 747L. (If your tax code doesn't contain three digits and the letter L, you'd be better off reading the HMRC page than this article.) This means that your tax-free allowance for the year is £7,479 (replace the L with a 9), and that this should be deducted from your salary evenly across the year. (I'm aware that I said £7,475 above, but both numbers come from the HMRC web site, so the confusion is not my fault. And what is £4 between friends?…)
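The "replace the L with a 9" rule can be written down as a small sketch. The function name `allowance_from_tax_code` is my own invention, and the code only handles the simple three-digits-plus-L case described above; anything else is rejected rather than guessed at.

```python
import re


def allowance_from_tax_code(code):
    """Return the yearly tax-free allowance implied by a code such as '747L'.

    Only the simple form (three digits followed by L) is handled;
    other tax codes raise ValueError instead of being guessed at.
    """
    match = re.fullmatch(r"(\d{3})L", code.strip().upper())
    if match is None:
        raise ValueError("unsupported tax code: %r" % code)
    # Replace the trailing L with a 9: '747L' -> 7479
    return int(match.group(1) + "9")


allowance = allowance_from_tax_code("747L")
# Spread evenly across the year, this is the tax-free part of each monthly salary
monthly_allowance = allowance / 12
print(allowance, monthly_allowance)
```

For 747L this gives an allowance of £7,479 for the year, or £623.25 of each monthly salary that is not taxed.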

The HMRC should send you a letter notifying you about what tax code they have chosen, and you can also find it on every payslip.

Emergency tax codes

When the HMRC doesn't have enough information to assign the correct tax code, they give you something they call an emergency tax code. The name is misleading: there's nothing "emergency" about the tax code itself; in fact, it's quite likely that you will get the same tax code once your relations with HMRC have stabilised.

And this is the part where they take your money

You could probably see this coming: if you didn't start working at the beginning of the tax year (i.e., 6th of April), your Personal Allowance will make up a greater proportion of your salary than the HMRC thinks, and thus you should pay less tax each month. Of course, this isn't something the HMRC is eager to tell you.

For example, if your annual salary is £20,000 and you work during the entire tax year, your tax for the year would be £2,505:

(20,000 - 7,475) × 20% = 2,505

which is £208.75 per month. But if you started working in October, the personal allowance cancels out a greater part of your income:

(10,000 - 7,475) × 20% = 505

Split over six months, you should pay £84.17 per month. However, if you don't get your tax code right, you'd pay £208.75 per month, and as a result you would have overpaid £747.50 by the end of the tax year.
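If you want to run the same calculation for your own salary, here is a minimal sketch of it. It assumes the 2011-12 personal allowance and the 20% rate quoted above, ignores the higher rate and National Insurance, and all the names in it (`tax_due`, `pounds`, and so on) are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

PERSONAL_ALLOWANCE = Decimal("7475")  # 2011-12 figure quoted above
BASIC_RATE = Decimal("0.20")          # 20% rate quoted above


def pounds(amount):
    """Round an amount to whole pence."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def tax_due(income):
    """Tax for the year on the given income (20% rate only, no higher rate)."""
    taxable = max(income - PERSONAL_ALLOWANCE, Decimal("0"))
    return pounds(taxable * BASIC_RATE)


annual_salary = Decimal("20000")
months_worked = 6  # started working in October

# Deducted each month if tax is worked out as for a full year of salary
monthly_deduction = pounds(tax_due(annual_salary) / 12)

# Tax actually owed on the six months of salary earned in the tax year
income_this_year = annual_salary * months_worked / 12
owed = tax_due(income_this_year)

# Difference between what was deducted and what was owed
overpaid = monthly_deduction * months_worked - owed

print("deducted per month:", monthly_deduction)
print("owed for the year: ", owed)
print("owed per month:    ", pounds(owed / months_worked))
print("overpaid:          ", overpaid)
```

Six deductions of £208.75 add up to £1,252.50, while only £505 is owed, so the overpayment in this example comes to £747.50.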

What you can do about it

Basically, you should write a letter to the HMRC and ask them to stop taking too much money, or to pay back the amount you have overpaid.

To find out where to send your letter, you first need to know your employer's Taxpayer Reference. It might be indicated on your payslip, on your P60 form (see below), or you could ask your employer. Armed with that piece of knowledge, go to the tax office finder and type in the code, and you'll get the address of the tax office dealing with your tax.

During the tax year in question

In theory, the P46 form that you filled in when you started working should have saved you from all this trouble, as it gives the HMRC all the information they need to work out the correct tax code, but in practice you're not always that lucky. What you can do is ask the HMRC to give you a new tax code for the rest of the year, which would result in smaller tax deductions as they "pay back" by reducing future payments.

Always quote your National Insurance number, your employer's Taxpayer Reference, your address and your phone number in your letters.

After the tax year in question

Some time in April or May, your employer will give you a P60 form, which sums up your income and your tax payments during the previous tax year. This is very useful, since it contains all the information you need to claim back the overpaid tax. Use the formulas above to calculate how much tax you should have paid, and then write them a letter containing:

  • your National Insurance number
  • your employer's Taxpayer Reference
  • the amount of tax you should have paid
  • the amount of tax you actually paid
  • the amount they owe you
  • bank account details for repayment (sort code, account number, branch address)

Also enclose a copy of your P60 form.

You should get your refund within a few months, unless of course they manage to lose your request somehow, in which case you need to remind them.

Posted Sun 29 May 2011 11:41:25 AM BST

Last weekend I was at the Language Show in London, mostly telling people about Esperanto at the stand of the Esperanto Association of Britain. (I had a lot of fun, of course, getting to tell people almost everything about my hobby.)

There were lots of exhibitors proposing various ways to learn various languages, but what stuck in my mind was the stand about Saaspel, a proposal for an alternative English orthography. Reforming their spelling system is something the English should have done a long time ago, as the pronunciation of a word generally has no relation to its spelling—which makes using the language harder for learners and native speakers alike. (The traditional spelling may have been a perfect fit for the language of Shakespeare's time, but is quite irrelevant now.)

Saaspel (or Sāspel, as written in its alternative form, with macrons to indicate long vowels) stands for "same sound—same spelling", which is a pretty good summary of how it works. Words are written as they are pronounced, with little (but still some) consideration given to their classical spelling. A few other rules and principles are:

  • A "long vowel", pronounced as in the alphabet, is written "long", either "aa ee ii oo" or "ā ē ī ō", depending on your taste. "U" gets special treatment: "use" → "yuz", "boot" → "buut" or "būt".

  • The vowels A, E, I and O, when short, generally correspond to the sound written with those letters in most continental European languages. For the same reason, "automatic" becomes "outomatic" but "sound" becomes "saund". Again, "U" gets special treatment and is used for the sound in "cut".

  • "K" is not used except in proper names. The letter "C" always stands for the "K" sound, never for "S".

  • Voiced "th", as in "this", is written "th", but voiceless "th", as in "thin", becomes "tt".

  • Often a consonant is enough for an entire syllable: "silabl" (syllable), "endd" (ended), "problm" (problem).

See their web site for more.

Though I'm not convinced that Saaspel is the perfect way to fix English spelling, I feel that it is a decent implementation of a good idea, so I'll try using it and see if I can make someone happy with it. (Ideally my effort would make the entire world adopt it, but if I can give at least one fellow human that warm fuzzy feeling of something done right, it's totally worth it.)

How I found out about Saaspel…

Posted Sun 31 Oct 2010 02:38:01 AM GMT