This is the tenth installment in a series of articles introducing the Ceylon language. Note that some features of the language may change before the final release.
An overview of the language module
The module ceylon.language contains classes and interfaces that are referred to in the language specification, other declarations they refer to, and a number of related declarations. Let's meet the main characters.
Just like Java, Ceylon has a class named Object.
shared abstract class Object()
        extends Void() {
    doc "A developer-friendly string representing the instance."
    shared formal String string;
    doc "Determine if this object belongs to the given Category
         or is produced by the iterator of the given Iterable
         object."
    shared Boolean element(Category|Iterable<Equality> category) {
        switch (category)
        case (is Category) {
            return category.contains(this);
        }
        case (is Iterable<Equality>) {
            if (is Equality self = this) {
                for (Equality x in category) {
                    if (x==self) {
                        return true;
                    }
                }
                fail {
                    return false;
                }
            }
            else {
                return false;
            }
        }
    }
}
In Ceylon, Object isn't the root of the type system. An expression of type Object has a definite, well-defined, non-null value. As we've seen, the Ceylon type system can also represent some more exotic types, for example Nothing, which is the type of null.
Therefore, Ceylon's Object has a superclass, named Void, which we already met in Part 1. All Ceylon types are assignable to Void. Expressions of type Void aren't useful for very much, since Void has no members or operations. You can't even narrow an expression of type Void to a different type. The one useful thing you can do with Void is use it to represent the signature of a method when you don't care about the return type, since a method declared void is considered to have return type Void, as we saw in Part 8.
As we also saw in Part 1, the type Nothing directly extends Void. All types that represent well-defined values extend Object, including:
- user-written classes,
- all interfaces, and
- the types that are considered primitive in Java, such as Integer, Float and Character.
Since an expression of type Object always evaluates to a definite, well-defined value, it's possible to obtain the runtime type of an Object, or narrow an expression of type Object to a more specific type.
Equality and identity
On the other hand, since Object is a supertype of types like Float which are passed by value at the level of the Java Virtual Machine, you can't use the === operator to test the identity of two values of type Object. Instead, there is a subclass of Object, named IdentifiableObject, which represents a type which is always passed by reference. The === operator accepts expressions of type IdentifiableObject. It's possible for a user-written class to directly extend Object, but most of the classes you write will be subclasses of IdentifiableObject. All classes with variable attributes must extend IdentifiableObject.
shared abstract class IdentifiableObject()
        extends Object()
        satisfies Equality {
    shared default actual Boolean equals(Equality that) {
        if (is IdentifiableObject that) {
            return this===that;
        }
        else {
            return false;
        }
    }
    shared default actual Integer hash {
        return identityHash(this);
    }
    shared default actual String string {
        ...
    }
}
IdentifiableObject defines a default implementation of the interface Equality, which is very similar to the equals() and hashCode() methods defined by java.lang.Object.
shared interface Equality {
    shared formal Boolean equals(Equality that);
    shared formal Integer hash;
}
Just like in Java, you can refine this default implementation in your own classes. This is the normal way to get customized behavior for the == operator. The only constraint is that, for subtypes of IdentifiableObject, x===y should imply x==y: equality should be consistent with identity.
Occasionally that's not what we want. For example, for numeric types, I don't care whether a value is of class Natural, Integer, or Whole when comparing it to 0. Fortunately, numeric types extend Object directly, and are not subject to the additional constraints defined by IdentifiableObject.
Thus, Ceylon is able to capture within the type system much of the behavior that Java introduces by fiat, through special-case rules in the language definition.
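As a rough illustration of the constraint that identity should imply equality, here is a minimal sketch in Python rather than Ceylon (the syntax shown in this series is pre-release and can't be run as written). The class Point and all variable names are hypothetical. Python's `is` plays the role of ===, and `__eq__`/`__hash__` play the role of equals() and hash.

```python
from fractions import Fraction


class Point:
    """Hypothetical class that refines the default, identity-based equality."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Analogous to refining equals(): compare by value, not by identity.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Analogous to refining hash: equal points must hash alike.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)

# Identity implies equality: the same instance is always equal to itself.
identity_implies_equality = (p is p) and (p == p)

# The converse need not hold: two distinct instances may still be equal.
equal_but_distinct = (p == q) and (p is not q)

# Numeric values compare by value regardless of their class,
# much as the text describes for comparing a number to 0.
zero_across_types = (0 == 0.0) and (0 == Fraction(0))
```

The point of the sketch is only the relationship between the two operators; it says nothing about how Ceylon implements IdentifiableObject.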
Operator polymorphism
Ceylon discourages the creation of intriguing executable ASCII art. Therefore, true operator overloading is not supported by the language. Instead, almost every operator (every one except the primitive ., (), is, and := operators) is considered a shortcut way of writing some more complex expression involving other operators and ordinary method calls. For example, the < operator is defined in terms of the interface Comparable<Other>, which we met in Part 5, and which has a method named smallerThan(), which is in turn defined in terms of another method named compare().
x<y
means, by definition,
x.smallerThan(y)
The equality operator == is defined in terms of the interface Equality, which has a method named equals().
x==y
means, by definition,
x.equals(y)
Therefore, it's easy to customize operators like < and == with specific behavior for our own classes, just by implementing or refining methods like compare() and equals(). Thus, we say that operators are polymorphic in Ceylon.
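The same idea, operators as shorthand for ordinary method calls, can be sketched in Python, whose `<` and `==` operators dispatch to `__lt__` and `__eq__`. This is an analogy under stated assumptions, not Ceylon code: the class Version and the snake_case method names are hypothetical, chosen to mirror compare(), smallerThan() and equals().

```python
class Version:
    """Hypothetical comparable type with a major and a minor number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def compare(self, other):
        """Return -1, 0 or 1, playing the role of compare()."""
        a = (self.major, self.minor)
        b = (other.major, other.minor)
        return (a > b) - (a < b)

    def smaller_than(self, other):
        """Defined in terms of compare(), like smallerThan()."""
        return self.compare(other) < 0

    def equals(self, other):
        """Value equality, like equals()."""
        return isinstance(other, Version) and self.compare(other) == 0

    # Python's operator hooks simply delegate to the ordinary methods,
    # so x < y means x.smaller_than(y) and x == y means x.equals(y).
    def __lt__(self, other):
        return self.smaller_than(other)

    def __eq__(self, other):
        return self.equals(other)

    def __hash__(self):
        return hash((self.major, self.minor))


older = Version(1, 2)
newer = Version(1, 10)
is_older = older < newer
same = older == Version(1, 2)
```

Only compare() carries real logic here; the operators fall out of it, which is the sense in which the operators are polymorphic.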
Apart from Comparable and Equality, which provide the underlying definition of comparison and equality operators, the following interfaces are also important in the definition of Ceylon's polymorphic operators:
- Summable supports the infix + operator,
- Invertable supports the prefix + and - operators,
- Numeric supports the other basic arithmetic operators,
- Slots supports bitwise operators,
- Comparable supports the comparison operators,
- Correspondence and Sequence support indexing and subrange operators, and
- Boolean is the basis of the logical operators.
Operator polymorphism is a little more flexible than you might imagine. Here's a quick example of this.
The Slots interface
The interface Slots is an abstraction of the idea of a set of slots which may each hold true or false. The bitwise operators &, |, and ~ are defined in terms of this interface. The most obvious subtype of Slots would be a Byte class, where the slots are the eight binary digits.
But the interface Set from the collections module also extends Slots. The slots of a Set are values which may or may not belong to the set. A slot holds true if the value it represents belongs to the Set. The practical value of this is to allow the use of the operator | for set union, the operator & for set intersection, and the infix ~ operator for set complement.
Set<Person> children = males|females ~ adults;
Yes, I realize that these aren't the traditional symbols representing these operations. But if you think carefully about the definition of these operations, I'm pretty sure you'll agree that these symbols are reasonable.
We could even define a Permission class that implements Slots, allowing us to write things like permissions&(read|execute).
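To make the Permission idea concrete, here is a minimal sketch in Python (an analogy, since Python's `|` and `&` also dispatch to methods). Everything in it is hypothetical: the article doesn't show the members of Slots, so the sketch simply stores the names of the slots that hold true.

```python
class Permission:
    """Hypothetical set of named slots; a slot holds true if its name is present."""

    def __init__(self, *names):
        self.names = frozenset(names)

    def __or__(self, other):
        # A slot is true in the result if it is true in either operand.
        return Permission(*(self.names | other.names))

    def __and__(self, other):
        # A slot is true in the result only if it is true in both operands.
        return Permission(*(self.names & other.names))

    def __eq__(self, other):
        return isinstance(other, Permission) and self.names == other.names

    def __hash__(self):
        return hash(self.names)

    def __bool__(self):
        # An empty permission set is falsy, which makes checks read naturally.
        return bool(self.names)


read = Permission("read")
write = Permission("write")
execute = Permission("execute")

permissions = read | write
# Which of read/execute does this permission set actually grant?
granted = permissions & (read | execute)
```

Here `granted` ends up equal to `read`, because write is masked out and execute was never granted.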
Numeric types
As we've mentioned several times before, Ceylon doesn't have anything like Java's primitive types. The types that represent numeric values are just ordinary classes. Ceylon has fewer built-in numeric types than other C-like languages:
- Natural represents the non-negative integers (the positive integers and zero),
- Integer represents signed integers,
- Float represents floating point approximations to the real numbers,
- Whole represents arbitrary-precision signed integers, and
- Decimal represents arbitrary-precision and arbitrary-scale decimals.
Natural, Integer and Float have 64-bit precision by default. Eventually, you'll be able to specify that a value has 32-bit precision by annotating it small. But note that this annotation is really just a hint that the compiler is free to ignore (and it currently does).
Numeric literals
There are only two kinds of numeric literals: literals for Naturals, and literals for Floats:
Natural one = 1;
Float oneHundredth = 0.01;
Float oneMillion = 1.0E+6;
The digits of a numeric literal may be grouped using underscores. If the digits are grouped, then groups must contain exactly three digits.
Natural twoMillionAndOne = 2_000_001;
Float pi = 3.141_592_654;
Very large or very small numeric literals may be qualified by one of the standard SI unit prefixes: m, u, n, p, f, k, M, G, T, P.
Float red = 390.0n; //n (nano) means E-9
Float galaxyDiameter = 900.0P; //P (peta) means E15
Float hydrogenRadius = 25.0p; //p (pico) means E-12
Float usGovDebt = 14.33T; //T (tera) means E12
Float brainCellSize = 4.0u; //u (micro) means E-6
Natural deathsUnderCommunism = 94M; //M (mega) means E6
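The arithmetic behind these prefixes can be sketched as a small Python helper. It is illustrative only and is not how the Ceylon compiler works. The function name `expand` is hypothetical, the exponents for m, f, k and G are the standard SI values (the examples above only spell out n, P, p, T, u and M), and rejecting a small-magnitude prefix on a literal without a decimal point is an assumption of this sketch.

```python
# Decimal exponent for each prefix listed above.
SI_EXPONENTS = {
    "m": -3, "u": -6, "n": -9, "p": -12, "f": -15,
    "k": 3, "M": 6, "G": 9, "T": 12, "P": 15,
}


def expand(literal):
    """Expand a literal such as '390.0n' or '94M' into a plain number."""
    prefix = literal[-1]
    if prefix not in SI_EXPONENTS:
        raise ValueError("no SI prefix in " + literal)
    exponent = SI_EXPONENTS[prefix]
    mantissa = literal[:-1].replace("_", "")  # digit grouping is cosmetic
    if "." in mantissa:
        # Float literal: let the parser apply the exponent exactly once.
        return float(mantissa + "e" + str(exponent))
    if exponent < 0:
        # Assumption: a whole-number literal cannot be scaled below one.
        raise ValueError("fractional result for whole-number literal " + literal)
    return int(mantissa) * 10 ** exponent


red = expand("390.0n")            # 390.0E-9
us_gov_debt = expand("14.33T")    # 14.33E12
count = expand("94M")             # 94 followed by six zeros
```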
Numeric widening
I mentioned earlier that Ceylon doesn't have implicit type conversions, not even built-in conversions for numeric types. Assignment does not automatically widen (or narrow) numeric values. Instead, we need to call one of the operations (well, attributes, actually) defined by the interface Number.
Whole zero = 0.whole; //explicitly widen from Natural
Decimal half = 0.5.decimal; //explicitly widen from Float
Usefully, the unary prefix operators + and - always widen Natural to Integer:
Integer negativeOne = -1;
Integer three = +3;
You can use all the operators you're used to from other C-style languages with the numeric types. You can also use the ** operator to raise a number to a power:
Float diagonal = (length**2.0+width**2.0)**0.5;
Of course, if you want to use the increment ++ operator, decrement -- operator, or one of the compound assignment operators such as +=, you'll have to declare the value variable.
Since it's quite noisy to explicitly perform numeric widening in numeric expressions, the numeric operators automatically widen their operands, so we could write the expression above like this:
Float diagonal = (length**2+width**2)**(1.0/2);
The built-in widening conversions are the following:
- Natural to Integer, Float, Whole, or Decimal
- Integer to Float, Whole, or Decimal
- Float to Decimal
- Whole to Decimal
But these conversions aren't defined by special-case rules in the language specification.
Numeric operator semantics
Operators in Ceylon are, in principle, just abbreviations for some expression involving a method call. So the numeric types all implement the Numeric interface, refining the methods plus(), minus(), times(), divided() and power(), and the Invertable interface, refining inverse. The numeric operators are defined in terms of these methods of Numeric. The numeric types also implement the interface Castable, which enables the widening conversions we just mentioned.
shared interface Castable<in Types> {
    shared formal CastValue as<CastValue>()
            given CastValue satisfies Types;
}
The type parameter Types uses a special trick. The argument to Types should be the union of all types to which the implementing type is castable.
For example, simplifying slightly the definitions in the language module:
shared class Natural(...)
        extends Object()
        satisfies Castable<Natural|Integer|Float|Whole|Decimal> &
                  Numeric<Natural> &
                  Invertable<Integer> {
    ...
}
shared class Integer(...)
        extends Object()
        satisfies Castable<Integer|Float|Whole|Decimal> &
                  Numeric<Integer> &
                  Invertable<Integer> {
    ...
}
shared class Float(...)
        extends Object()
        satisfies Castable<Float|Decimal> &
                  Numeric<Float> &
                  Invertable<Float> {
    ...
}
These declarations tell us that Integer can be widened to Float, Whole, or Decimal, but that Float can only be widened to Decimal. So we can infer that the expression -1 * 0.4 is of type Float.
Therefore, the definition of a numeric operator like * can be represented, completely within the type system, in terms of Numeric and Castable:
Result product<Left,Right,Result>(Left x, Right y)
        given Result of Left|Right satisfies Numeric<Result>
        given Left satisfies Castable<Result> & Numeric<Left>
        given Right satisfies Castable<Result> & Numeric<Right> {
    return x.as<Result>().times(y.as<Result>());
}
Don't worry too much about the performance implications of all this — in practice, the compiler is permitted to optimize the types Natural, Integer, and Float down to the virtual machine's native numeric types.
The value of all this — apart from eliminating special cases in the language definition and type checker — is that a library can define its own specialized numeric types, without losing any of the nice language-level syntax support for numeric arithmetic and numeric widening conversions.
There's more...
If you're interested, you can check out a complete list of Ceylon's operators along with a discussion of their precedence.
In Part 11 we're going to come back to the subject of object initialization, and deal with a subtle problem affecting languages like Java and C#.
Here's a quote from the language specification that explains the design constraints we've operated under with respect to numeric types:
Of all the problems we've come across in designing the language, this was for me personally one of the most interesting.
Thanks for your series of articles about Ceylon. It is much appreciated!
A couple of comments:
Shouldn't this be the other way around?
----
Nice!
----
Integer vs. Whole: I'd prefer Integer to be of arbitrary-precision and have Int64 and Int32 for restricted integers (making clear their ranges).
----
Furthermore I'd suggest to drop the prefix small, because it is just a hint which can be ignored. That is not really useful (and has haunted C for a long time).
----
That's a good idea which I first saw nicely done in the programming language Cecil by Craig Chambers (very nice type system with multiple dispatch, Cecil Publications).
But shouldn't this read
instead of ? I'd prefer a C-like default for signed (they are the most usable default, as in Java) and use 0U (suffix) for Natural / unsigned.
But this syntax kinda conflicts (u/U confusion) with ISO prefixes which are IMO pointless and I'd just remove them (especially since the common ones will result in the 1000/1024 confusion)
yes, of course, thanks. Fixed.
The thing is, I really think that precision shouldn't be modeled within the numeric types. It's just not an issue of typing.
I will check it out, I had never seen this approach taken elsewhere.
It's an enumerated type bound. The expression type N has to be exactly one of the operand types X, Y.
The default of Natural is definitely going to cause some controversy. My take is that use of unsigned values in Ceylon is a lot more common than in most other languages because the sequence index type is Natural.
This is something we will need a bit more practical experience with to know if it's truly the right decision or not.
(One possibility would be to try making Natural a subtype of Integer.)
And unsigned values simply occur very often in the real world. Including anytime you count stuff.
You are right. Instead there should probably be specialized types outside the numeric hierarchy like Byte, Word, Word32 or Word64 (or whatever) for working with the bits in a low-level fashion (where I need to specify the word width).
Ah, I see. Of course. Thanks for clearing that up.
Are hexadecimal literals allowed?
Not natively, but we're experimenting with the idea of a generic syntax for handling things like dates, times, URLs, email addresses, hexadecimal numbers, regular expressions, etc. i.e. a generic way of embedding micro-languages. I'm not totally certain what's the best path to make this work yet, but there are a couple of possibilities. I'll discuss it in a future blog.
I would love to see support for more literals. One niche language that does this is REBOL, it has literals for times, dates, urls, file paths, binary data, and more: REBOL datatypes
Interestingly, Ceylon reminds me of REBOL a bit already, they also have the aim of being able to write code in a declarative style for GUIs and structured data, while remaining in a general purpose language.
Yes, but as soon as you subtract (compare) your count, you can easily get a signed number.
So, unless the subtraction on unsigned returns signed, it's better to have signed as default.
Sure, but if you're going to be dealing with exceptions, wouldn't it be much better to get those exceptions earlier (i.e. where subtraction results in a negative integer) rather than later (i.e. where you try to index a sequence by a negative index)?
I have not looked at REBOL yet, but will do. What is definitely on the cards is a literal syntax, where you can define your own literal format that will be parsed at compile time. But we haven't really properly figured out the details of how this would work yet, so I have not talked about it here.
Interesting. Yes, in this context, support for more kinds of literals definitely comes up.
Yes, I think that is exactly right. There are always low-level applications where you need an efficient way to represent binary data. But it is a good idea to have separate data types for that, simply because the requirements are different (namely arithmetic on the one hand, and representing a specific binary pattern on the other hand).
How does operator polymorphism work for non-numeric types? Mainly I'm thinking of types from linear algebra (matrices, vectors). A vector has additive (interface Summable) operations (over vectors) and multiplicative (interface Numeric, I guess?) operations (over numbers), but can this be expressed in Ceylon so that things like scaling a vector work (s*V = V*s where s is a number and V a vector)?
How would you iterate over a complement of a Set? :D
Adam
I assume that the type Decimal will be implemented using something similar to java.math.BigDecimal. It would be better to delay the implementation of Decimal and wait for the JVM to support 128-bit values. It would not allow arbitrary precision, but at least it would be efficient on modern CPUs. If anyone needs arbitrary precision, then he or she can rely on BigDecimal.
I propose making the numeric type hierarchy more math-specific, using mathematical terms for algebraic structures such as groups, rings, fields, modules, vector spaces, and algebras.
That would make math packages more Ceylon-friendly and logical.
That's certainly something to look into for the future.
Is there precedent for the name Whole? It sounds a bit unclear to me. How about BigInteger?
As in whole number. The problem with BigInteger, apart from verbosity, is it's actually incorrect. A BigInteger may be or , depending.
Ah, but according to the Wikipedia article, can ambiguously refer to naturals, naturals without 0, or integers. I would venture to say that BigInteger, if not always accurate, will confuse people less.
Another possibility would be to make both Natural and Integer unbounded, and let the compiler and VM optimize them down to 32 or 64 bits. But the speed cost might be too high.
This lib can help
This is much better
Hello,
One important property of datatypes is their immutability, which makes them suitable for reuse (low memory usage), sharing across threads (multithreading), use as keys in HashMaps (datatype consistency), etc. I have not seen anything about immutable types on your blog yet. Do you have datatype immutability on the cards?
Also, another immutable type is important with interned Strings which can shrink memory usage quite a lot. Ruby and other languages have a nice syntax for interning. Do you plan to also have literal support for interned Strings? Or will you handle this with the pluggable literals API?
Thanks,
Djano
@Djano: I can tell you that immutable types are supported. Whenever a type inherits directly from Object instead of IdentifiableObject, it won't be allowed to have variable attributes. This is used, for example, in the numeric types and enumerations.