Feature of the Week: Character encoding of imported files
For this week's feature of the week, I'll first briefly discuss what a "character encoding" is and then explain why it matters during BibTeX import.
At the lowest level, computers only understand zeros and ones. Hence, a mechanism is needed to encode symbols like letters and digits as sequences of zeros and ones. A "table" which assigns to each symbol its corresponding zero-one sequence is called a character encoding (or character set). This table allows a computer to interpret the data in a file and show the correct symbol on the screen (or on a printer). Unfortunately, several such character encodings exist. Depending on the chosen character encoding, the same sequence of ones and zeros might stand for different symbols. To correctly display a piece of data, the computer must know how to interpret it, that is, it must know its character encoding.
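To make this concrete, here is a small Python sketch (an illustration only, not part of BibSonomy) that reads the same two bytes with two different tables. The sample letter "ü" is an arbitrary choice.

```python
# The two bytes below are how UTF-8 stores the single letter "ü".
data = "ü".encode("utf-8")  # b'\xc3\xbc'

# Read the bytes with the matching table ...
as_utf8 = data.decode("utf-8")
# ... and with a different table, which maps every byte to its own symbol.
as_latin1 = data.decode("iso-8859-1")

print(as_utf8)    # ü
print(as_latin1)  # Ã¼
```

Both calls succeed, but only the first one shows the letter that was originally written.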
When you upload a BibTeX (or EndNote) file to BibSonomy, we face the same problem: we have to interpret the file with the correct character encoding. Typically, it's not possible to guess it reliably (an encoding is just one interpretation of the data, and several interpretations may look valid), so there is an option on the post_bibtex page which allows you to specify the character encoding of the file to upload. A click on the options link reveals a dropdown list with some typical character encodings. The default is "UTF-8", which is becoming more and more common. However, older files might have a different encoding like "ISO-8859-1" (also known as "latin1"). If you're unsure about your data, UTF-8 is a good choice. If this gives you errors during import or strange-looking characters afterwards, try another encoding. In Europe, "ISO-8859-1" is very common, too.
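If you want to check a file on your own machine before uploading, the "try UTF-8 first, then another encoding" advice can be sketched in Python roughly as follows. This is not how BibSonomy itself handles uploads; the function name decode_with_fallback and the sample BibTeX line are made up for this example.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings fit the data")


# Sample data: a BibTeX field as an older, Latin-1 based editor might have saved it.
latin1_bytes = "author = {Müller}".encode("iso-8859-1")

text, used = decode_with_fallback(latin1_bytes)
print(used, text)
```

Note that ISO-8859-1 assigns a symbol to every possible byte, so it never reports an error. That is why it comes last in the list, and why a decode without errors is no proof that the encoding is right: you still have to look at the result.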


4 Comments:
hi bibsonomy team,
i just noticed the scores for the recommended tags for the first time, what are those about and how are they calculated?
also, is it possible to change the bookmarklets so as to open the bookmarking / bibtexing page in a popup or new tab? (better for fast copy n paste of summaries and such)
and while i'm at it, can you put the upload private copy field to the defaults, i use that all the time. oh and something that really bothers me, i need bibtex output for bookmarks, seriously, this should be an obvious feature, close to half of my citations are web pages.
by the way sorry for being way off topic.
Dear Paul,
many thanks for your feedback. Let me try to answer you step by step :)
1) Scores of recommended tags: Our tag recommender analyzes the URL in question and applies a collaborative filtering algorithm to come up with personalized recommendations. The scores are based on the latter.
2) Bookmarklets: Thanks a lot for this hint. We're working on it and will make a modified bookmarklet available as soon as possible (we'll try before Christmas). Of course, we'll let you know here.
3) Upload private copy field: We're currently working on a major restructuring that moves all our web pages onto a new internal infrastructure. We'll take up your hint when restructuring the "post bibtex" page, so we plan to make this feature available with the next major release.
4) Bibtex output for Bookmarks: Once again an interesting hint. One question that we asked ourselves is how to construct the bibtex key in a consistent manner. Our idea was to use something like "url:HOSTNAME", e.g. "url:google". What do you think about this?
Best regards from Kassel,
Dominik
hi dominik,
thank you for your explanation. i have thought about your suggestion for the keys and it is probably the best solution, but one might run into problems when bookmarking several pages from the same domain, like wikipedia entries. but you also don't want the key to be a full url, that would be too long. i for one change the key for the publications to a format i'm used to anyway, so maybe you could point to the key field with a suggestion to change it to something that's easy to remember.
best regards, paul.
Dear Paul,
thanks for your comment. We're just about to add this feature. One clarification: the BibTeX key for each bookmark won't be stored in BibSonomy, but will be re-generated each time the bookmarks are exported in BibTeX format. As a result, we unfortunately cannot point to the key field and suggest changing it, but our current development resources don't allow for another option. We hope this is a pragmatic solution.
In case you have some more feedback and / or suggestions, just let us know.
Best,
Dominik