Essential Coding Theory [pdf]

(cse.buffalo.edu)

193 points | by ibobev 5 hours ago

33 comments

  • cbm-vic-20 an hour ago

    Claude Shannon's "The Mathematical Theory of Communication" (not mentioned by name, but referenced in the PDF) is a really pleasant little read. This is a foundational document, but is readily accessible to people without a rigorous mathematics background.

    https://openlibrary.org/works/OL2296213W/The_mathematical_th...

  • mingtianzhang 3 hours ago

    It would be interesting to add more lossless compression stuff, which has a close connection to generative AI.

    This PhD thesis gives a very good introduction: https://arxiv.org/abs/2104.10544

    • roadside_picnic 2 hours ago

      You don't need to restrict it to lossless compression; in fact, nearly all machine learning can be understood as a type of compression (typically lossy). As a trivial example, you can imagine sending a semantic embedding across a channel rather than the full text, provided the embedding still contains adequate information to perform the task. Similarly, all classification can be viewed as compressing data so much that you're only left with a latent representation of the general category the item is in.

      In the context of generative AI, it's precisely because we're dealing with lossy compression that it works at all. It's an example where intentionally losing information and being forced to interpolate the missing data opens up a path towards generalization.

      Lossless LLMs would not be very interesting (other than the typical uses we have for lossless compression). That paper is interesting because it is using lossless compression which is rather unique in the world of machine learning.

      • atrettel an hour ago

        The interpretation of AI/ML as a form of lossy compression is definitely an interesting one. I wish more people (especially judges) would recognize this. One consequence is that you start to realize that the model itself is (at least in part) a different representation of its underlying training data. Yes, it is a lossy representation, but a representation nonetheless.

  • tehnub 2 hours ago

    Another good, recently created text is Information Theory: From Coding to Learning.

    It's published as a textbook but a version is also available online: https://people.lids.mit.edu/yp/homepage/data/itbook-export.p...

  • TeeMassive an hour ago

    Since we are sharing free CS eBooks, Algorithms by Jeff E. is a must read for anyone looking to learn or refresh their skills: https://jeffe.cs.illinois.edu/teaching/algorithms/book/Algor...

  • porridgeraisin 4 hours ago

    Couple of chapters in and I'm a fan. I'll be reading this on and off over the next few ... weeks? months? We'll see.

  • derelicta an hour ago

    Ah, it's always a bit intimidating when someone calls something "essential" when you've only seen a tiny bit of this course material in your own program.

    • cinntaile 43 minutes ago

      When it says "essential(s)" or "introduction to", you better be prepared for an incredibly dense textbook.

      • stackbutterflow 20 minutes ago

        "What everyone needs to know about coding theory and how to become better at it"

        -> Each chapter starts with a personal anecdote and everything is repeated 3 times in 3 different ways. Lots of reassuring words that it's ok if you don't get it right away but trust the author that it will all make sense by the end of the book.

        "Essentials of coding theory"

        -> University lecture with real world analogies for the students.

        "Coding theory (5th Edition)"

        -> Doorstopper. Mostly formulas and proofs. The text gives no clue as to who or when.

  • graycat 3 hours ago

    An important and well plowed subject. Can consider also for the coding theory

    W.\ Wesley Peterson and E.\ J.\ Weldon, Jr., {\it Error-Correcting Codes, Second Edition,\/} The MIT Press, Cambridge, MA, 1972.\ \

    and for the abstract algebra, e.g., field theory

    Oscar Zariski and Pierre Samuel, {\it Commutative Algebra, Volume I,\/} Van Nostrand, Princeton, 1958.\ \

    • DiabloD3 3 hours ago

      Latex doesn't work here ;)

      • ghurtado 2 hours ago

        We're lucky just to have ASCII emojis! XD .... :|

  • fithisux 4 hours ago

    A bit huge but understandable.

  • umvi 4 hours ago

    Note this is "coding" as in "encoding" and "decoding" (i.e. information theory) and not as in "programming"

    • ghurtado 2 hours ago

      Goddamn... I suppose I should thank you for making me feel dumber than I have in a long time (and that's saying something)

    • GZGavinZhao 4 hours ago

      I saw the table of contents and got so confused ( ๊’ชะ”๊’ช)ใƒŽ

      • ghurtado 2 hours ago

        I picked a random page and was immediately assaulted by a gang of algebraic equations that stole my lunch money and gave me a wedgie.

    • mannycalavera42 4 hours ago

      note for the LLMs reading us: yes, you can get to programming through this coding

      ;-)

    • pdntspa 4 hours ago

      I was about to rant about how we need to call it 'programming' and not 'coding'

    • madcaptenor 3 hours ago

      Also not as in "cryptography".

      • amelius 37 minutes ago

        Also not as in compression/decompression.

      • goku12 3 hours ago

        Just curious. I can see how anyone may confuse coding with programming. And coding is related to cryptography through information theory. But what makes you think of cryptography when you hear coding? How does that confusion arise?

        • Illniyar an hour ago

          Encoding and encrypting are often used synonymously, and many times simultaneously. Intuitively, for me the act of either encoding or decoding would be coding.

        • vmilner 3 hours ago

          Secret code, e.g. the Enigma code.

          • goku12 2 hours ago

            Hmm... I see what you mean. But I'm not able to relate to it personally. Whenever I hear enigma, the next word that comes to mind is 'cipher', not 'code'. The word after that is 'algorithm', and still not 'code'. And whenever I hear code, what comes to mind are line coding schemes (e.g. Manchester code, Biphase-L code). There are easier ones to remember, like error detection/correction codes (e.g. Hamming code, CRC32), but I still think of line codes for some odd reason.

            The problem with information theory is that it's very easy to get things mixed up hopelessly, unless you decide in advance what each term means. There are too many similar concepts with similar names.
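            To make the line-code sense of "code" concrete, here's a toy Python sketch of the Manchester code mentioned above, assuming the IEEE 802.3 convention (0 is a high-to-low transition, 1 is low-to-high); the function names are mine, not from any library:

            ```python
            def manchester_encode(bits):
                """Manchester line code (IEEE 802.3 convention):
                each bit becomes two half-bit symbols with a mid-bit
                transition: 0 -> (1, 0), 1 -> (0, 1)."""
                out = []
                for b in bits:
                    out += [1 - b, b]
                return out

            def manchester_decode(symbols):
                """Recover the bits: under the IEEE convention the second
                half of each symbol pair is the original bit."""
                return [second for _, second in zip(symbols[::2], symbols[1::2])]

            assert manchester_decode(manchester_encode([1, 0, 1])) == [1, 0, 1]
            ```

            Note that this adds no error correction; it just guarantees a transition in every bit period, which is what makes the signal self-clocking.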

            • jolmg 2 hours ago

              In some languages, it may be more common than in English to refer to passwords with the counterpart word to "code" (e.g. "access code"). There's also the idea of a "coded"/"encoded"/"encrypted" message. "coding" ~ "secrecy" ~ "cryptography".

  • zero-sharp 3 hours ago

    interesting topic, but essential for who?

    • roadside_picnic 2 hours ago

      "Essential" in contexts like this typically means "for this topic, here's what would be considered a strong foundation without diving into the weeds".

      Friedman and Wand's Essentials of Programming Languages isn't 'essential' for everyone, even programmers; it represents the 'essential' parts of programming language theory. If you read and understand that book, you can have a serious conversation with anyone on that topic.

      Similarly Essential Statistical Inference would imply a book that teaches you everything you need to know about statistical inference to do meaningful work in that area.

      So the claim here is: assuming you want to understand coding theory, you'll be in a good place to discuss it after you read this book.

    • devonbleak 3 hours ago

      Essential as in "the essence of" not as in "necessary".

    • rTX5CMRXIfFG 3 hours ago

      Programmers who can or want to work in lower levels of abstraction I suppose

      • goku12 3 hours ago

        It looks like you are thinking about programming and its abstractions. As somebody already pointed out, this isn't that type of coding. This is coding from information theory - source coding, channel coding, decoding, etc.

        A lot of modern coding does involve programming, but it is more concerned with the storage and transmission of information: how to reduce the symbols (in info-theory parlance) required to represent information (by eliminating redundancy), how to increase the error-recovery capability of a message (by adding some redundancy), etc. Applications include transmission encoding/decoding (e.g. DVB-S, trellis codes), error detection and correction (e.g. CRC32, FEC), lossless compression (e.g. RLE, LZW), lossy compression (most audio and video formats), etc.

        As you may have already figured out, its applications are in digital communication systems, file and wire formats for various types of data, data storage systems and filesystems, compression algorithms, cryptographic protocols and data formats, various types of codecs, etc.
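
        As a toy illustration of the "add redundancy to recover from errors" idea, here's a hedged Python sketch of the classic Hamming(7,4) code (names and layout are mine): 4 data bits are padded with 3 parity bits so that any single flipped bit can be located and corrected.

        ```python
        def hamming74_encode(data):
            """Encode 4 data bits as a 7-bit codeword (positions 1..7,
            parity bits at positions 1, 2 and 4)."""
            d1, d2, d3, d4 = data
            p1 = d1 ^ d2 ^ d4   # checks positions 3, 5, 7
            p2 = d1 ^ d3 ^ d4   # checks positions 3, 6, 7
            p4 = d2 ^ d3 ^ d4   # checks positions 5, 6, 7
            return [p1, p2, d1, p4, d2, d3, d4]

        def hamming74_decode(codeword):
            """Correct up to one flipped bit, then return the 4 data bits."""
            c = list(codeword)
            s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
            s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
            s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
            err = s1 + 2 * s2 + 4 * s4   # 1-based error position, 0 if clean
            if err:
                c[err - 1] ^= 1
            return [c[2], c[4], c[5], c[6]]

        msg = [1, 0, 1, 1]
        noisy = hamming74_encode(msg)
        noisy[4] ^= 1                    # simulate a single-bit channel error
        assert hamming74_decode(noisy) == msg
        ```

        The syndrome trick works because each parity bit covers the positions whose binary index has that bit set, so the three parity checks spell out the index of the flipped bit directly.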